
Understanding the Basics of Query Reformulation

Search engines have evolved to better understand the intent behind a user’s search query. Google draws on user-generated page content as well as other factors to understand the context of a search, rather than just matching keywords on the page against what the user types. As part of this evolution, Google has released multiple updates (such as Hummingbird and RankBrain) that focus on identifying the intent behind a query by looking at multiple factors, including the user’s previous search history and the relevance of their prior questions to the current search.

Query Reformulation is the modification of a user’s original search so that the query is matched with the most appropriate response. This post describes a specific methodology for rewriting user queries, as outlined in a patent, which suggests that Google may use prior searches within a single session to generate a better response for the current search.

This approach to Query Reformulation goes beyond simply replacing keywords. It builds a fuller understanding of a query’s context by looking at the queries in a session in sequence.


The Mudville Stadium Example

The patent illustrates how Google might change the queries users submit with an example built around a fictional venue called Mudville Stadium. The name comes from the well-known 19th-century poem Casey at the Bat.

In the patent’s example, a person starts a session with queries such as “Mudville Stadium” or “Mudville Nine”, and also submits an unrelated search: “Barack Obama.” The user then finishes with one final ambiguous question: “What is the capacity?” A typical keyword search engine cannot answer this, because the query does not say what the capacity refers to. It could be the capacity of a computer’s hard disk drive, a fuel tank, or a stadium.

The patent, however, describes a technique that analyses earlier queries within the same session. By recognising “Mudville Stadium” as significant within the recent query history, the engine infers that a user asking about “capacity” means the capacity of Mudville Stadium, and it produces a candidate rewrite: “What is the capacity of Mudville Stadium?”

This candidate rewrite should be more likely to surface the answer the user actually wanted. The example highlights the natural language processing (NLP) steps involved in information retrieval: the engine must understand not only the current question but also the context surrounding it.

The Mechanics of Query Rewriting Using Previous Queries

The core of this patent lies in the systematic process of using session data to refine searches. This process helps answer the question of whether Google changes user search query inputs to better match available content. The methodology described involves several distinct stages that transform a raw user input into a precise command for the search index.

Step 1: Receiving and Logging:

First, the search engine receives a query from the user. At the same time, it retrieves the log of previous queries made by that same user during the current active session. This creates a contextual window.

Step 2: Generating Candidates:

The system then generates multiple candidate query rewrites. These are essentially different versions of the current query, modified by appending or integrating terms from the previous searches. For instance, it might combine the current query with the very first query of the session, or with the most recent one, to see if a logical sentence is formed.
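As a rough illustration of this step, the sketch below builds one candidate per prior query in the session. The function name `generate_candidates` and the simple "<question> of <prior query>" template are assumptions made for this example. The patent does not specify how terms are integrated, and a real system would likely use far richer language handling.

```python
def generate_candidates(current_query, session_queries):
    """Build one candidate rewrite per prior query in the session.

    Hypothetical helper: the "<question> of <prior query>?" template is
    an assumption for illustration, not the patent's actual method.
    """
    base = current_query.strip().rstrip("?")
    candidates = []
    seen = set()
    for prior in session_queries:
        rewrite = f"{base} of {prior.strip()}?"
        key = rewrite.lower()
        if key not in seen:  # skip duplicate rewrites from repeated queries
            seen.add(key)
            candidates.append(rewrite)
    return candidates


session = ["Mudville Stadium", "Mudville Nine", "Barack Obama"]
candidates = generate_candidates("What is the capacity?", session)
for candidate in candidates:
    print(candidate)
```

Note that nothing is filtered at this stage: the nonsensical candidate about Barack Obama is produced alongside the useful one, and it is left to the scoring step to separate them.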

Step 3: Scoring and Selection:

Once these candidates are generated, they must be evaluated. The system scores each candidate rewrite on quality and relevance, and the scoring is multifaceted. One factor is the age of the prior queries: a query made five seconds ago is likely more relevant than one made ten minutes ago.
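One simple way to picture the age factor is a weight that decays as a prior query gets older. The exponential form, the `recency_weight` name, and the two-minute half-life below are all assumptions chosen for illustration. The source only says that newer queries are likely more relevant, not how that is computed.

```python
def recency_weight(age_seconds, half_life_seconds=120.0):
    """Return a weight in (0, 1] that halves every `half_life_seconds`.

    Hypothetical decay function: both the exponential shape and the
    120-second half-life are illustrative assumptions.
    """
    return 0.5 ** (age_seconds / half_life_seconds)


recent = recency_weight(5)    # prior query made five seconds ago
older = recency_weight(600)   # prior query made ten minutes ago
print(recent > older)
```

Under these assumed settings, the five-second-old query keeps almost all of its weight while the ten-minute-old query keeps very little, which matches the intuition described above.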

Factors Influencing the Rewrite Score

The scoring system uses data gathered from users at large to validate its decisions. For example, if many users search for a movie title and then search for “cast”, the system can be more confident in rewriting a later query of “who’s in it” as “who’s in [movie title]”.

Entity identification is another important aspect of scoring. If a candidate rewrite includes a term that has previously been identified as an entity (for example an athlete, a famous person, a landmark, or a brand name), that candidate receives a score increase. The system compares the terms in the current query with terms from prior queries that correlate strongly with entities.

This scoring logic matters for multi-intent keywords: it helps the engine decide what a term with several meanings refers to by associating it with an entity established earlier in the session. Once the highest-scoring candidate is identified, Google returns and displays the results for the rewritten query as if no query reformulation had taken place.
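The factors above can be combined in a small toy scorer. Everything in this sketch is hypothetical: the `PriorQuery` fields, the `ENTITY_BOOST` multiplier, the `AFFINITY` table (standing in for aggregate user behaviour, such as how often an attribute is asked about a given type of entity), and all of the numbers. It is meant only to show how recency, entity recognition, and aggregate data might interact, not how Google actually weights them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PriorQuery:
    text: str
    age_seconds: float
    entity_type: Optional[str]  # None if no entity was recognised (assumed field)


HALF_LIFE = 120.0    # assumed recency half-life, in seconds
ENTITY_BOOST = 1.5   # assumed multiplier for recognised entities
# Assumed stand-in for aggregate user data: how plausibly an attribute
# follows a given entity type. Values are made up for illustration.
AFFINITY = {
    ("capacity", "stadium"): 0.9,
    ("capacity", "team"): 0.1,
    ("capacity", "person"): 0.05,
}
DEFAULT_AFFINITY = 0.01


def score(prior, attribute):
    """Combine recency, entity boost, and attribute affinity into one score."""
    recency = 0.5 ** (prior.age_seconds / HALF_LIFE)
    boost = ENTITY_BOOST if prior.entity_type else 1.0
    affinity = AFFINITY.get((attribute, prior.entity_type), DEFAULT_AFFINITY)
    return recency * boost * affinity


def best_rewrite(current_query, attribute, session):
    """Return the rewrite built from the highest-scoring prior query."""
    base = current_query.strip().rstrip("?")
    best = max(session, key=lambda prior: score(prior, attribute))
    return f"{base} of {best.text}?"


session = [
    PriorQuery("Mudville Stadium", age_seconds=90, entity_type="stadium"),
    PriorQuery("Mudville Nine", age_seconds=60, entity_type="team"),
    PriorQuery("Barack Obama", age_seconds=20, entity_type="person"),
]
print(best_rewrite("What is the capacity?", "capacity", session))
```

With these made-up numbers, “Barack Obama” is the most recent query but scores poorly because “capacity” is rarely asked about a person, so the stadium rewrite is selected. This is why recency alone would not be enough in the Mudville example.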

Testing the Theory in Real Time

While patents provide a fascinating window into the potential inner workings of Google, they do not always reflect what is currently live in the search results. The author of the analysis attempted to replicate the Mudville example to see whether Google customises search results in real time based on this logic.

They searched for “Mudville Stadium,” “Mudville Nine,” and “Barack Obama,” followed by “what is the capacity.” In this test, Google did not rewrite the query to provide the stadium capacity. The author also tried with a real location, “Lincoln Financial Field,” but experienced similar mixed results. This suggests that while the capability exists in patent form, it may not be universally applied to every query or may require stronger signals to trigger.

Future Implications and Google MUM

The inconsistent test results do not rule out this behaviour. Other indicators suggest that Google is moving toward using this type of session data.

For example, the addition of “Quality Visit” scores in Google Analytics 360 can be read as a sign of Google’s interest in user engagement metrics and user behaviour. The announcement of new badges for businesses in the upper tier of visited sites in their category further illustrates how user interactions feed back into the ranking system.

These indicators lend support to the theory behind Query Reformulation. User signals are essential in determining how the system should improve the way it retrieves and ranks information. Another element to consider, as technology continues to evolve, is how Google will use more complex models, such as Google MUM, to handle multi-turn searches and many tasks at once.

These longer-term and broader types of searches will benefit from Google MUM’s ability to analyse and comprehend both text and images at the same time. Theoretically, Google MUM should be able to complete these types of search-related rewrites with greater fluency and proficiency than what was outlined in the 2019 patent.

Takeaways for Digital Marketers

For professionals in the digital space, such as a digital marketing agency in Dubai, understanding these mechanics is vital. It highlights the importance of creating content that clearly establishes entities and their attributes.

If Google is rewriting queries to include entity names, your content must explicitly connect those entities to the relevant answers. For example, explicitly stating “The capacity of Lincoln Financial Field is…” rather than just saying “The capacity is…” allows the engine to map the data correctly. This ensures that when Google Query Rewriting occurs, your content is the precise match for the reformulated question.

In summary, this patent reveals a logical step in the evolution of search: moving from isolated keyword matching to a fluid, conversational understanding of user intent. By analysing prior queries, Google can bridge the gap between what a user says and what they actually mean, so that the question is answered effectively. The way Google changes query strings ultimately leads to a more satisfying search experience. While not every search confirms this behaviour today, the trajectory towards context-aware, session-based Query Reformulation is clear.

About the Author Bhupender Singh
