Query rewriting for RAG: how to improve retrieval accuracy

In many retrieval-augmented generation (RAG) pipelines, the user's original query doesn't match the language of the datasets or knowledge bases. As a result, the retriever returns irrelevant documents, and the answers built on them frustrate users.

This guide will help you learn:

What's meant by query rewriting in RAG.
Why query rewriting is so important in RAG.
How query rewriting works within the RAG system.
The problems that query rewriting solves, such as user intent mismatches, poor retrieval quality, and hallucinations.
The techniques used for query rewriting, including query expansion, query decomposition, query paraphrasing, multi-query generation, and step-back prompting.
How you can implement query rewriting step by step within a RAG system and evaluate it with retrieval metrics.
How query rewriting compares to reranking and query expansion.

Let's start by defining what query rewriting means in a RAG system.

What is query rewriting in RAG?

Query rewriting in search refers to modifying a user query before it reaches the retriever. The intention is to turn the original query into a clearer, more structured version that matches the language used in the datasets and knowledge bases.

This is not the same as query expansion, where you add related terms to give more context. Query rewriting involves changing the structure or wording of the search query to retrieve more relevant documents.

Why is query rewriting important for RAG?

Query rewriting is important because, in any RAG system, retrieval quality largely determines how accurate the LLM's final response will be. The better the retrieval, the better the generation.

User queries often use different language from the datasets or knowledge bases, so the retrieved results can include weak or irrelevant documents.

With query rewriting, you can solve this problem at the root. The user query is converted into a form the retrieval layer handles better. This can include changing the wording, adding missing context, or restructuring the query so the retriever can locate more relevant documents.

When retrieval improves, the LLM's answers are better grounded in the source documents as well.

Now, let's see how query rewriting works in RAG.

How does query rewriting work in RAG?

Query rewriting happens before the retriever starts searching the knowledge base or datasets. A well-rewritten query can increase retrieval accuracy and, as a result, reduce hallucinations in the final LLM response.

Here's what a typical query rewriting workflow inside a retrieval-augmented generation pipeline looks like:

First, the user submits a natural language question. It can be vague, incomplete, or phrased differently from the documents already in the knowledge base.
The rewrite model transforms the query. An LLM or a query transformation algorithm analyzes the user's intent and generates a rewritten query that's more retrieval-friendly.
The rewritten query then reaches the retriever, which runs a vector search (or a hybrid search that combines keyword and vector matching) to locate relevant documents.
The next step is for the reranker to improve document relevance. A reranking model scores each retrieved document against the query and reorders the results so the most relevant ones come first.
Finally, the LLM uses the retrieved documents as context to generate a grounded answer that users will find relevant.
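
To make the flow concrete, here's a minimal Python sketch of the workflow, assuming each stage is a plain function you supply. The names `answer`, `rewrite`, `retrieve`, `rerank`, and `generate` are hypothetical placeholders rather than any library's API, and the lambdas in the dry run are toy stand-ins for a real LLM, retriever, and reranker.

```python
from typing import Callable

# Hypothetical stage signatures; plug in your own LLM client, retriever, and reranker.
Rewrite = Callable[[str], str]                   # original query -> rewritten query
Retrieve = Callable[[str, int], list[str]]       # query, k -> candidate documents
Rerank = Callable[[str, list[str]], list[str]]   # query, docs -> docs, best first
Generate = Callable[[str, list[str]], str]       # question, context -> answer


def answer(query: str, rewrite: Rewrite, retrieve: Retrieve,
           rerank: Rerank, generate: Generate,
           k: int = 20, top_n: int = 5) -> str:
    """Run the workflow: rewrite -> retrieve -> rerank -> generate."""
    rewritten = rewrite(query)                       # step 2: retrieval-friendly query
    candidates = retrieve(rewritten, k)              # step 3: vector or hybrid search
    context = rerank(rewritten, candidates)[:top_n]  # step 4: keep the most relevant
    return generate(query, context)                  # step 5: answer the user's question


# Toy stand-ins for a quick dry run.
docs = ["Reset your password from the account page.", "Invoices are emailed monthly."]
result = answer(
    "forgot pw",
    rewrite=lambda q: "how to reset a forgotten password",
    retrieve=lambda q, k: [d for d in docs if "password" in d.lower()][:k],
    rerank=lambda q, ds: ds,
    generate=lambda q, ctx: f"{q} -> {ctx[0]}",
)
print(result)  # forgot pw -> Reset your password from the account page.
```

Retrieving with the rewritten query but passing the original query to `generate` keeps the final answer tied to what the user actually asked.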

What problems does query rewriting solve?

Query rewriting addresses several retrieval problems in RAG systems.

The most common ones include:

Ambiguous user queries: A short or vague query may not clearly express user intent. Query rewriting clarifies the query so that the retriever can locate the correct documents.
Missing context in search queries: Many queries lack important details. Rewriting adds context that helps the retrieval system search datasets more effectively.
Vocabulary mismatch: Users often use different vocabulary from that of the datasets or knowledge bases, which can lead to irrelevant results. Rewriting translates the user's wording into the terms the documents use.
Conversational follow-up questions: In chat-based workflows, follow-up queries often depend on earlier turns. With query rewriting, the system can turn each follow-up into a complete, standalone query before retrieval.
Enterprise search challenges: Large datasets across internal business systems often require query transformation to retrieve relevant documents efficiently.

Next, we will look at the most common query rewriting techniques.

What are common query rewriting techniques?

Various techniques are used in query rewriting to improve retrieval accuracy.

The goal of each technique is the same: to improve the original query so the retriever can locate more relevant documents across knowledge bases.

Let's explore them one by one.

1. Query expansion

In query expansion, related terms or concepts are added to the original query so that the retrieval system can retrieve a broader range of documents. This technique helps address vocabulary mismatch between user queries and stored datasets.

For example, a search query about 'heart attack treatment' can be expanded to include terms such as 'myocardial infarction' or 'cardiac care.'
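
A minimal sketch of dictionary-based expansion in Python, assuming a small hand-written synonym map (`SYNONYMS` and `expand_query` are illustrative names). In practice the related terms might come from a domain thesaurus or an LLM; the point here is only that related terms are appended to the original query rather than replacing it.

```python
# Hypothetical, hand-written synonym map (phrase -> related terms).
SYNONYMS: dict[str, list[str]] = {
    "heart attack": ["myocardial infarction", "cardiac care"],
    "high blood pressure": ["hypertension"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Append related terms for every known phrase found in the query."""
    lowered = query.lower()
    extra: list[str] = []
    for phrase, related in synonyms.items():
        if phrase in lowered:
            for term in related:
                if term not in extra:
                    extra.append(term)
    # The original wording stays first, so the user's own terms are still searched.
    return " ".join([query, *extra])


print(expand_query("heart attack treatment"))
# heart attack treatment myocardial infarction cardiac care
```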

2. Query decomposition

Whereas query expansion broadens a single query, query decomposition breaks a complex query down into sub-queries that are easier for the retrieval system to process.

Each sub-query retrieves relevant documents from the dataset, and the LLM combines the results to generate the final answer.

This technique comes in handy when users ask multi-part questions that require information from different sources.
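
The sketch below assumes the sub-questions come from an LLM prompted to split the question; here a lambda stands in for that call, and `decompose_and_retrieve` and `build_context` are hypothetical helper names. Each sub-question gets its own retrieval pass, documents already retrieved for an earlier sub-question are skipped, and the results are grouped per sub-question so the LLM can combine them.

```python
from typing import Callable


def decompose_and_retrieve(question: str,
                           split: Callable[[str], list[str]],
                           retrieve: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Retrieve documents per sub-question, skipping documents already retrieved."""
    seen: set[str] = set()
    results: dict[str, list[str]] = {}
    for sub in split(question):           # e.g. an LLM asked to list the sub-questions
        docs = [d for d in retrieve(sub) if d not in seen]
        seen.update(docs)
        results[sub] = docs
    return results


def build_context(results: dict[str, list[str]]) -> str:
    """Group documents under their sub-question so the LLM can combine the answers."""
    sections = []
    for sub, docs in results.items():
        body = "\n".join(f"- {d}" for d in docs) or "- (no documents found)"
        sections.append(f"Sub-question: {sub}\n{body}")
    return "\n\n".join(sections)


# Toy corpus keyed by sub-question; the lambdas stand in for an LLM and a retriever.
corpus = {
    "What is the refund policy for EU orders?": ["Refunds are accepted within 30 days."],
    "How long does shipping take for EU orders?": ["EU shipping takes 3-5 business days."],
}
results = decompose_and_retrieve(
    "What is the refund policy, and how long does shipping take for EU orders?",
    split=lambda q: list(corpus),
    retrieve=lambda sub: corpus.get(sub, []),
)
print(build_context(results))
```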

3. Query paraphrasing

Query paraphrasing rewords the query while keeping its meaning, so it's more likely to match the phrasing used in the documents. Large language models generate alternative versions of the user's question so the retriever can search across more linguistic variations.

By having multiple paraphrased versions, the system increases the likelihood of retrieving relevant documents from the knowledge base.
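
Paraphrases are usually produced with a prompt like the one in this Python sketch, which only builds the prompt and parses the model's numbered reply. The prompt wording, the helper names, and the sample reply are assumptions for illustration; the actual LLM call is left out.

```python
import re

PARAPHRASE_PROMPT = """Rewrite the search query below in {n} different ways.
Keep the meaning identical; vary the wording and terminology.
Return one query per line, numbered 1 to {n}.

Query: {query}"""


def build_paraphrase_prompt(query: str, n: int = 3) -> str:
    return PARAPHRASE_PROMPT.format(n=n, query=query)


def parse_paraphrases(llm_output: str, original: str) -> list[str]:
    """Turn a numbered LLM reply into a de-duplicated list, original query first."""
    queries = [original]  # always search with the user's own wording too
    seen = {original.lower()}
    for line in llm_output.splitlines():
        text = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()  # drop "1." or "2)" prefixes
        if text and text.lower() not in seen:
            queries.append(text)
            seen.add(text.lower())
    return queries


# Example reply (made up) in the format the prompt asks for.
reply = "1. How do I reset my password?\n2) Steps to recover a forgotten password\n\n3. Reset password"
print(parse_paraphrases(reply, "reset password"))
# ['reset password', 'How do I reset my password?', 'Steps to recover a forgotten password']
```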

4. Multi-query generation

In multi-query generation, you generate several rewritten versions of a single user request, each targeting a different perspective on the user's question.

The retriever runs each rewritten query against the datasets and collects the retrieved documents.

After this, the system uses aggregation techniques and selects the most relevant documents for the generator stage.

The goal of this technique is to improve the overall recall and sift through a broader range of information to get the best results.
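
The aggregation method can vary. One common option is reciprocal rank fusion (RRF), sketched below in Python: each document's score is the sum of 1 / (k + rank) over every result list it appears in, with k = 60 as a commonly used default. The document IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists from several rewritten queries into one ranking.

    Documents ranked highly by many query variants accumulate the largest scores.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.__getitem__, reverse=True)


runs = [
    ["doc_a", "doc_b", "doc_c"],   # results for rewritten query 1
    ["doc_b", "doc_a", "doc_d"],   # results for rewritten query 2
    ["doc_b", "doc_e"],            # results for rewritten query 3
]
print(reciprocal_rank_fusion(runs))
# ['doc_b', 'doc_a', 'doc_e', 'doc_c', 'doc_d']
```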

5. Step-back prompting

In step-back prompting, the system first generates a broader or more general question before searching the dataset.

Why? So the model can identify the concept behind the user's query.

After the general search, the system refines the search to the specific user question. This technique is especially effective when the original query lacks sufficient context.
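
Here's a minimal Python sketch of step-back retrieval, assuming two callables you provide: `complete` (a wrapper around your LLM) and `retrieve` (your search backend). The prompt text, the function name, and the toy knowledge base are all illustrative. Documents for the broader question go first as background, followed by documents for the exact question.

```python
from typing import Callable

STEP_BACK_PROMPT = (
    "Rewrite the question below as a broader, more general question about the "
    "underlying concept it depends on.\n\nQuestion: {question}\nBroader question:"
)


def step_back_retrieve(question: str,
                       complete: Callable[[str], str],
                       retrieve: Callable[[str, int], list[str]],
                       k: int = 3) -> tuple[str, list[str]]:
    """Retrieve for a step-back question and the original one, then merge the results."""
    broader = complete(STEP_BACK_PROMPT.format(question=question)).strip()
    concept_docs = retrieve(broader, k)    # background on the general concept
    specific_docs = retrieve(question, k)  # details for the exact question
    # Concept documents first, then specifics; dict.fromkeys drops duplicates in order.
    merged = list(dict.fromkeys(concept_docs + specific_docs))
    return broader, merged


# Toy knowledge base and a fake LLM reply, for illustration only.
kb = {
    "How does TCP congestion control work?": ["TCP overview", "Congestion control basics"],
    "Why did my TCP throughput drop after packet loss?": ["Congestion control basics",
                                                          "Loss recovery"],
}
broader, docs = step_back_retrieve(
    "Why did my TCP throughput drop after packet loss?",
    complete=lambda prompt: "How does TCP congestion control work?",
    retrieve=lambda q, k: kb.get(q, [])[:k],
)
print(broader)  # How does TCP congestion control work?
print(docs)     # ['TCP overview', 'Congestion control basics', 'Loss recovery']
```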

When should you use query rewriting?

You should use query rewriting whenever the original query, as written, is unlikely to lead the retriever to the relevant documents.

Here are the common scenarios where you can use query rewriting:

Multi-turn chat conversations: In conversational search, follow-up questions depend on prior context. Query rewriting turns each follow-up into a standalone query so the retriever doesn't miss context from earlier turns.
Domain-specific corpora: Query rewriting helps in domains with specialized datasets, such as medical, legal, or technical documentation.
Enterprise knowledge bases: Internal company datasets are large and spread across business systems, so queries often need rewriting to retrieve the right documents.
Long-tail queries: Highly specific queries sometimes need to be rewritten so the retriever can match them with the proper documents.
Ambiguous user questions: Resolving ambiguity before retrieval keeps the retriever from fetching documents for the wrong interpretation.
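
For multi-turn chat, a common pattern is to ask an LLM to condense the conversation and the follow-up into one standalone query before retrieval. The Python sketch below only builds that prompt; the template wording, `build_condense_prompt`, and the `max_turns` cutoff are assumptions, and sending the prompt to a model is left to your LLM client.

```python
CONDENSE_PROMPT = """Given the conversation and a follow-up question, rewrite the
follow-up as a standalone search query that includes any context it depends on.
Do not answer the question.

Conversation:
{history}

Follow-up: {question}
Standalone query:"""


def build_condense_prompt(history: list[tuple[str, str]], question: str,
                          max_turns: int = 6) -> str:
    """Format the most recent (role, text) turns so the LLM can resolve words like 'it'."""
    recent = history[-max_turns:] if max_turns > 0 else []  # drop older turns
    lines = "\n".join(f"{role}: {text}" for role, text in recent)
    return CONDENSE_PROMPT.format(history=lines, question=question)


history = [
    ("user", "How do I export a report as a PDF?"),
    ("assistant", "Open the report and choose Export, then PDF."),
]
prompt = build_condense_prompt(history, "Can I schedule it weekly?")
print(prompt)
# Send `prompt` to your LLM. A good standalone rewrite would be something like
# "How do I schedule a weekly PDF export of a report?"
```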

Now, let's see how to implement query rewriting in a RAG system.

How do you implement query rewriting?

To implement query rewriting within a RAG pipeline, you need to transform the query before it reaches the retriever.

Here's a typical step-by-step implementation process:

1. Capture and normalize the user query

The first step is to capture the user query and normalize it so it can be processed consistently by the rewrite model.

This typically involves trimming whitespace, normalizing characters and formatting, and removing unnecessary tokens.

Normalization reduces noise in the retrieval system and gives the rewriting stage a clean baseline to transform.

Developers can also capture metadata, such as session context or user-intent signals, that can influence the rewritten query.
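
A minimal normalization step in Python might look like this. It's a sketch under a few assumptions: Unicode NFKC normalization, HTML-tag stripping, and whitespace collapsing are the only cleanup rules, and `NormalizedQuery` and `normalize_query` are hypothetical names. Casing is deliberately left alone because the LLM rewriter can use it (for example, to spot acronyms).

```python
import re
import unicodedata
from dataclasses import dataclass, field


@dataclass
class NormalizedQuery:
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. session ID, locale, prior turns


def normalize_query(raw: str, **metadata) -> NormalizedQuery:
    """Clean a raw user query before it is passed to the rewrite model."""
    text = unicodedata.normalize("NFKC", raw)  # e.g. full-width "ＰＤＦ" -> "PDF"
    text = re.sub(r"<[^>]+>", " ", text)       # remove stray HTML tags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace and newlines
    return NormalizedQuery(text=text, metadata=metadata)


q = normalize_query("  How   do I\n export a <b>ＰＤＦ</b> report? ", session_id="abc123")
print(q.text)      # How do I export a PDF report?
print(q.metadata)  # {'session_id': 'abc123'}
```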

This normalized query serves as input to the rewriting stage.

2. Generate a rewritten query using an LLM

In this step, an LLM transforms the original query into a rewritten query that better matches what the user is looking for. This is a prompt engineering task: few-shot prompting, where the prompt includes a handful of example rewrites, is a common way to steer the output.

The goal is to generate one or more queries from the original one to improve recall across the relevant datasets.
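
Below is a Python sketch of a few-shot rewrite prompt. `complete` is assumed to be any function that sends a prompt to your LLM and returns its text (for example, a thin wrapper around your provider's SDK); the prompt, its examples, and `rewrite_query` are illustrative, not a specific framework's API. If the model returns nothing usable, the function falls back to the original query.

```python
from typing import Callable

FEW_SHOT_PROMPT = """Rewrite the user's search query so it matches the wording used in
product documentation. Keep the original intent. Return only the rewritten query.

Query: forgot pw
Rewritten: How to reset a forgotten account password

Query: app slow after update
Rewritten: Troubleshooting slow performance after a software update

Query: {query}
Rewritten:"""


def rewrite_query(query: str, complete: Callable[[str], str]) -> str:
    """Ask an LLM for a retrieval-friendly rewrite; fall back to the original query."""
    output = complete(FEW_SHOT_PROMPT.format(query=query)).strip()
    if not output:
        return query
    first_line = output.splitlines()[0].strip()
    # Some models echo the label from the examples; strip it if present.
    rewritten = first_line.removeprefix("Rewritten:").strip()
    return rewritten or query


def fake_llm(prompt: str) -> str:
    """Stands in for a real model call so the example runs offline."""
    return " Best large language models for customer support question-answering\n"


print(rewrite_query("Best model for customer questions", fake_llm))
# Best large language models for customer support question-answering
```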

Now we have a rewritten query that the retrieval system can easily understand.

3. Retrieve relevant documents

Next up, the query is passed to the retriever. Retrieval involves using vector search or hybrid search to find relevant documents in the datasets.

Embeddings from the embedding model enable the system to match based on semantic meaning rather than exact keywords.
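
To keep the example self-contained, the Python sketch below uses an in-memory list of documents and a toy bag-of-words vector in place of a real embedding model, so unlike real embeddings it only matches shared words. The ranking logic (cosine similarity, keep the top k) is the same idea a vector search API applies at scale. `embed`, `cosine`, and `search` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector.
    Real embeddings capture meaning; this one only matches shared words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every document against the query and return the top-k (score, doc) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    "How to reset a forgotten account password",
    "Billing and invoice settings",
    "Password security best practices",
]
for score, doc in search("reset forgotten password", docs, k=2):  # the rewritten query
    print(f"{score:.3f}  {doc}")
# 0.655  How to reset a forgotten account password
# 0.289  Password security best practices
```

With the raw query `forgot pw`, every score in this toy index is 0, while the rewritten query ranks the password-reset document first.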

At this stage, the system retrieves the top k relevant documents that match the rewritten query.

4. Generate the final response

In the final step, the retrieved documents, along with the user's original query, are sent to the LLM to generate a response. Because the context was retrieved with a better query, the response is more likely to be accurate and grounded, with less room for hallucination.
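
The sketch below assembles the generation prompt in Python: numbered context passages from retrieval, followed by the user's original question. The instruction wording, the character budget (a crude stand-in for token counting), and `build_answer_prompt` are assumptions; pass the result to whatever LLM client you use.

```python
def build_answer_prompt(original_query: str, documents: list[str],
                        max_chars: int = 4000) -> str:
    """Number the retrieved passages and append the user's original question."""
    context: list[str] = []
    used = 0
    for i, doc in enumerate(documents, start=1):
        if used + len(doc) > max_chars:  # crude size budget; real systems count tokens
            break
        context.append(f"[{i}] {doc}")
        used += len(doc)
    return (
        "Answer the question using only the context below. Cite passages by "
        "their [number]. If the context is not sufficient, say so.\n\n"
        "Context:\n" + "\n".join(context)
        + f"\n\nQuestion: {original_query}\nAnswer:"
    )


prompt = build_answer_prompt(
    "forgot pw",
    ["Reset your password from Settings > Account.", "Reset links expire after one hour."],
)
print(prompt)
```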

This architecture connects query rewriting, retrieval, and generation into a single RAG workflow.

Now we will examine the limitations of query rewriting in real-world RAG systems.

What are the limitations of query rewriting?

Query rewriting is certainly useful in RAG systems, but it has limitations. Here are some of the key ones:

Over-expansion of queries: Some query rewriting strategies add too many terms or generate overly broad queries. This can cause the retriever to return irrelevant documents rather than improve retrieval quality.
Semantic drift: A rewritten query may unintentionally change the meaning of the original query. When this happens, it's likely that the system will search for the wrong information.
Latency and cost: Since query rewriting requires an extra LLM call before retrieval, the cost and latency can increase.
Evaluation complexity: It's hard to measure how much a rewritten query actually improves retrieval. Developers have to rely on benchmarks, metrics, and side-by-side comparisons, which take time to set up.
Not a complete solution: Query rewriting isn't the be-all and end-all fix for AI search. You still need to worry about metadata filtering, reranking, and strong embeddings for better results.

Now, we'll see how you can evaluate query rewriting performance.

How do you evaluate query rewriting performance?

Developers usually evaluate query rewriting by running retrieval with both the original and the rewritten query and comparing the results on specific metrics. These metrics include:

Recall@k: Measures the fraction of relevant documents that appear among the top-k retrieved results. If recall@k is higher for the rewritten query, the rewrite is surfacing more of the relevant documents than the original query.
MRR (Mean Reciprocal Rank): Averages 1/rank of the first relevant document across queries. A high MRR means the first relevant document tends to appear near the top of the results.
NDCG (Normalized Discounted Cumulative Gain): Measures ranking quality by evaluating how well the retrieval system orders relevant documents in the results.
Answer accuracy: Determines whether the final generated response actually answers the user's question.
Human evaluation: Domain experts review the results to assess relevance, grounding, and factual correctness.
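
The retrieval metrics are straightforward to compute yourself. This Python sketch uses binary relevance (a document is either relevant or not) and compares an original and a rewritten query on made-up result lists; the function names are illustrative.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document (0 if none is retrieved)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mrr(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: gains discounted by log2(rank + 1), divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


relevant = {"d1", "d4"}
original_run = ["d7", "d3", "d1", "d9", "d2"]    # results for the original query
rewritten_run = ["d1", "d4", "d3", "d7", "d9"]   # results for the rewritten query

for name, run in [("original", original_run), ("rewritten", rewritten_run)]:
    print(name,
          round(recall_at_k(run, relevant, 3), 3),
          round(reciprocal_rank(run, relevant), 3),
          round(ndcg_at_k(run, relevant, 3), 3))
# original 0.5 0.333 0.307
# rewritten 1.0 1.0 1.0
```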

A sound approach is to combine offline benchmarks with online experiments, such as A/B tests on live traffic.

How does query rewriting compare to reranking?

The main difference between query rewriting and reranking is where they operate in the retrieval process.

Query rewriting modifies the search query before the retriever searches for documents. Reranking reorders the documents after retrieval.

Here are the other differences between the two:

Primary goal: Query rewriting aims to improve recall, while reranking aims to improve precision through ordering.
Computational cost: Query rewriting often requires an additional LLM step before retrieval, while reranking applies additional scoring to the retrieved documents.
Impact on retrieval quality: Query rewriting changes which documents the retriever returns in the first place, while reranking can only reorder the documents that were already retrieved.

Most RAG systems actually combine both methods to improve retrieval performance.

How does query rewriting differ from query expansion?

The main difference between query rewriting and query expansion is how they modify or transform the original query before retrieval.

Query rewriting transforms the original query into a new one that better reflects the user's intent.

Query expansion, on the other hand, focuses on adding related search terms.

Here are the key differences:

Query structure: Query rewriting modifies the query's initial structure. Query expansion adds to the existing structure.
Impact on retrieval: Query rewriting improves how the retriever interprets the user's question, while query expansion improves recall by searching additional related terms.

Let's say we have the following original query: 'Best model for customer questions.'

A rewritten query could look something like this: 'Best large language models for customer support question-answering,' or this: 'Which LLMs perform best for automated customer support question-answering in production systems?'

For the same original query, query expansion would add related queries alongside the original:

'Best LLM for customer support chatbots,' 'Top AI models for customer service automation,' and 'LLM performance on customer support datasets.'

How does query rewriting fit into RAG pipelines?

Query rewriting is a preprocessing step in the RAG pipeline that occurs before the retriever retrieves documents. It serves to improve the initial user query so it's easier for the retriever to understand.

When a user's query first enters the system, it is processed by a query rewriting component. This step may use LLMs, prompt templates, or ML algorithms to generate a better query that more clearly reflects the user's intent.

The rewritten query is converted into embeddings and passed to the retriever, which performs a vector or hybrid search to find relevant documents in the knowledge bases or datasets.

After retrieval, the system will apply semantic rankers or reranking models to reorder the retrieved documents based on their relevance.

The top-ranked documents are then passed to the LLM, which generates a response grounded in the retrieved context.

What tools support query rewriting for RAG?

There are several tools and frameworks that actively support query rewriting in RAG systems. Let's have a look:

Meilisearch: A search engine that supports RAG architectures with fast hybrid search. Meilisearch doesn't rewrite queries for you, but it supports LLM-based workflows that rely on rewriting: developers generate a rewritten query with an LLM and send it to Meilisearch, which retrieves relevant documents using vector search, filtering, and relevance tuning. Higher-quality retrieved context, in turn, leads to better generated answers.
Azure AI Search: Azure AI Search provides built-in support for hybrid search and semantic ranking. Developers can pair LLM-based query rewriting with these retrieval features to improve retrieval quality.
LangChain: An open-source framework that helps developers orchestrate RAG pipelines. It provides utilities for implementing query rewriting using LLM prompts, multi-query generation, and query transformation workflows before retrieval.

These tools allow developers to fit query rewriting into production RAG systems.

Why query rewriting for RAG is becoming essential

Query rewriting is crucial for improving retrieval accuracy in modern RAG systems, since user queries often don't match the language of the datasets and knowledge bases.

A rewritten query better reflects user intent and improves the retriever's ability to locate relevant documents.

How Meilisearch strengthens query rewriting for RAG in production systems

Meilisearch supports RAG architectures through fast hybrid search, vector search, and flexible relevance tuning. Developers can make the most of Meilisearch's capabilities by integrating LLM-based query rewriting with Meilisearch APIs to retrieve relevant documents from large datasets.

Maya Shin
