Building a RAG system with Meilisearch: a comprehensive guide

Carolina Ferreira

Developer Advocate @ Meilisearch

17 min read

Retrieval Augmented Generation (RAG) has become an essential component of modern AI applications, enabling more accurate and controllable responses from Large Language Models (LLMs). While vector databases are the standard for RAG, Meilisearch stands out as a fast, open-source alternative with AI-powered search, exceptional relevancy, and remarkable speed.

This guide will walk you through building and optimizing a RAG system using Meilisearch.

Understanding RAG

RAG is a process that enhances LLM outputs by grounding them in external, retrievable data. Instead of relying solely on the model's trained knowledge, RAG systems first retrieve relevant information from a curated knowledge base, then use this context to generate responses.

The typical RAG workflow consists of three main steps:

  1. Retrieval: query the knowledge base to find relevant documents or passages
  2. Augmentation: combine the retrieved information with the user's query
  3. Generation: use an LLM to generate a response based on both the query and retrieved context
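
As a rough illustration of these three steps, the sketch below wires them together in plain Python. The keyword-overlap scoring and the `fake_llm` stub are placeholders assumed for this example only; in the rest of this guide, Meilisearch handles retrieval and GPT-4 handles generation.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, knowledge_base: list[str], limit: int = 2) -> list[str]:
    """Step 1: rank passages by how many query words they share (toy scoring)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p)), p) for p in knowledge_base]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in ranked[:limit] if score > 0]


def augment(query: str, passages: list[str]) -> str:
    """Step 2: combine the retrieved passages with the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt: str, llm) -> str:
    """Step 3: hand the augmented prompt to any callable that wraps an LLM."""
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return "stub answer"
```

Keeping the three steps as separate functions makes it easy to swap the toy retriever for a real search engine without touching the rest of the pipeline.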

Key components of RAG

A RAG system comprises three essential components:

  • External data source: External data sources are the foundation of a RAG system. These sources, such as knowledge bases or technical documentation, provide the information the LLM uses to generate responses. The quality of this data directly impacts performance; it must be well organized and regularly updated for accuracy and relevancy.

  • Vector store: The vector store serves as the bridge between raw data and the LLM. It holds vector embeddings (numerical representations of meaning) generated from your text. These vectors allow efficient similarity searches, enabling quick retrieval of relevant information. Modern tools like Meilisearch combine keyword search with semantic similarity to deliver fast and scalable results.

  • Large Language Model: The LLM is the system's intelligence, responsible for understanding user queries and generating coherent, relevant responses. It combines user queries with context retrieved from the vector store to produce accurate replies. Models like GPT-4, Claude, or Llama 2 excel at creating human-like responses within the constraints of the provided context.

Why LLMs need RAG: overcoming key limitations

Large Language Models excel at general knowledge but face two significant limitations:

  • they struggle with specialized domain-specific information
  • they are limited by their training cutoff, so they rely on knowledge that can lag months or even years behind current developments.

RAG lets you tackle both challenges at once. For instance, a legal firm can enhance their LLM's capabilities by incorporating not only their historical case archives but also the latest court decisions and regulatory changes. A healthcare provider might integrate both established medical literature and recent clinical trials or updated treatment protocols.

The ability to continuously update your knowledge base ensures that your LLM-powered applications can provide accurate, up-to-date responses that combine deep domain expertise with the latest information in your field.

How to optimize document retrieval in RAG Systems

Efficient information retrieval is crucial for RAG. Without precise and relevant document retrieval, even the most advanced LLMs can produce inaccurate or incomplete responses. The goal is to ensure that only the most relevant, contextually rich documents are retrieved in response to a query.

Choosing the right document retrieval system is a crucial step in this process. Meilisearch offers a fast, open-source search engine that supports keyword searches and more advanced AI-powered search approaches that combine exact word matching with semantic search. This dual capability makes it an ideal tool for RAG systems, where the goal is to retrieve not only documents that match keywords but also those that are semantically related.

Meilisearch offers a range of features specifically suited for RAG systems:

  • Easy embedder integration: Meilisearch automatically generates vector embeddings, enabling high-quality semantic retrieval with minimal setup and flexibility to choose the latest embedder models.
  • Hybrid search capabilities: Combine keyword and semantic (vector-based) search to deliver broader, more accurate document retrieval.
  • Speed and performance: Meilisearch delivers ultra-fast response times, ensuring that retrieval is never a bottleneck in your LLM workflow.
  • Customizable relevancy: Adjust ranking rules and sort documents based on attributes like freshness or importance, to prioritize the most valuable results. Set a relevancy threshold to exclude less relevant results from the search.

Once you've established your retrieval system, the next step is to optimize how your data is stored, indexed, and retrieved. The following strategies – document chunking, metadata enrichment, and relevancy tuning – will ensure that every search query returns the most useful and contextually relevant information.

How to chunk documents to maximize relevancy

Breaking down documents into optimal-sized chunks is crucial for effective retrieval. Chunks should be large enough to maintain context but small enough to be specific and relevant. Consider semantic boundaries like paragraphs or sections rather than arbitrary character counts.
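
A minimal sketch of paragraph-based chunking follows. It assumes paragraphs are separated by blank lines, and the 800-character default is an arbitrary illustration rather than a recommended value.

```python
def chunk_by_paragraph(text: str, max_chars: int = 800) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than being
    cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a paragraph boundary
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at paragraph boundaries keeps each chunk readable on its own, at the cost of somewhat uneven chunk sizes.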

Enriching metadata to boost search precision

Enhance your documents with rich metadata to improve retrieval accuracy. Include categories, tags, timestamps, authors, and other relevant attributes. For example, tagging technical documentation with specific product versions can significantly improve retrieval quality.
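
As an illustration, the sketch below attaches metadata to a chunk and builds a Meilisearch filter expression for one product version. The field names (`category`, `product_version`, `tags`) and the helper names are hypothetical, chosen only for this example.

```python
from datetime import date, datetime, time, timezone


def to_timestamp(day: date) -> int:
    """Convert a date to a Unix timestamp (midnight UTC) for easy sorting."""
    return int(datetime.combine(day, time.min, tzinfo=timezone.utc).timestamp())


def enrich_chunk(chunk_id: int, text: str, *, category: str, version: str,
                 tags: list[str], published: date) -> dict:
    """Attach metadata to a chunk so it can later be filtered and sorted."""
    return {
        "id": chunk_id,
        "content": text,
        "category": category,
        "product_version": version,
        "tags": tags,
        "publication_date": to_timestamp(published),
    }


def version_filter(version: str) -> str:
    """Build a filter expression that restricts results to one product version."""
    return f'product_version = "{version}"'
```

The filter string can be passed as the `filter` search parameter once `product_version` has been added to the index's filterable attributes.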

Adjusting relevancy for accurate results

Fine-tune your search parameters based on your specific use case. Adjust the hybrid search semantic ratio to balance conceptual understanding and exact matching based on the needs of your domain. Use the ranking score threshold to filter out low-quality matches, but be careful not to set it too high and miss valuable contextual information.

Setting up Meilisearch for RAG

The quality of the retrieval system directly impacts the accuracy and reliability of generated responses. Meilisearch stands out as a search engine for RAG implementations, thanks to its AI-powered search capabilities, customizable document processing, and advanced ranking controls.

Set Meilisearch up

Unlike traditional vector stores that rely solely on semantic search, Meilisearch combines vector similarity with full-text search, giving you the best of both worlds.

First, you need to create a Meilisearch project and activate the AI-powered search feature.

Then, you need to configure the embedder of your choice. We are going to use an OpenAI embedder, but Meilisearch also supports embedders from HuggingFace, Ollama, and any embedder accessible via a RESTful API:

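A minimal sketch of such a configuration, assuming the official `meilisearch` Python SDK. The embedder name `default`, the index name `docs`, and the choice of `text-embedding-3-small` are assumptions made for illustration.

```python
def build_embedder_settings(api_key: str) -> dict:
    """Return embedder settings for an OpenAI embedder named 'default'."""
    return {
        "default": {  # embedder name (assumed); referenced again at search time
            "source": "openAi",
            "apiKey": api_key,
            "model": "text-embedding-3-small",  # assumed model choice
        }
    }


def configure_embedder(index, api_key: str):
    """Send the settings to Meilisearch through the SDK's update_embedders."""
    return index.update_embedders(build_embedder_settings(api_key))


# Usage, assuming a reachable Meilisearch instance:
#   import meilisearch
#   client = meilisearch.Client("http://localhost:7700", "MEILISEARCH_API_KEY")
#   configure_embedder(client.index("docs"), "OPEN_AI_API_KEY")
```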

Note: You'll need to replace OPEN_AI_API_KEY with your OpenAI API key.

Smart document processing with Meilisearch's document template

Meilisearch’s document template allows you to customize embeddings for each document, ensuring only the most relevant fields are included.

Customizing your document processing helps you:

  • Increase retrieval relevance with precise embeddings
  • Lower costs by reducing unnecessary tokens
  • Ensure consistency across different document types
  • Support domain-specific needs for unique data formats
  • Iterate and refine embedding strategies as your system evolves

Here’s an example document from the Meilisearch documentation:


To optimize the embeddings for this document, we’ve decided to focus on the most meaningful fields:

  • Headings: The values of hierarchy_lvl0 to hierarchy_lvl3 will be included in the embeddings to retain document structure and context
  • Content: The value of content will be embedded as it provides the essential text needed for semantic search

Other fields, like publication_date, will be excluded from embeddings but remain available for sorting. This allows Meilisearch to sort by date while keeping embeddings lean and focused on relevancy.
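
One way to express this choice is a Liquid `documentTemplate` that emits the headings and the content only. The sketch below builds such a template and previews in Python roughly what it would render; the sample document is hypothetical, and the preview function is an approximation, not Meilisearch's own renderer.

```python
HEADING_FIELDS = ["hierarchy_lvl0", "hierarchy_lvl1", "hierarchy_lvl2", "hierarchy_lvl3"]

# Liquid template: emit each heading that is present, then the content.
DOCUMENT_TEMPLATE = "".join(
    "{% if doc." + field + " %}{{ doc." + field + " }}. {% endif %}"
    for field in HEADING_FIELDS
) + "{{ doc.content }}"


def preview_embedding_text(doc: dict) -> str:
    """Approximate in Python what the template renders for one document."""
    headings = "".join(
        f"{doc[field]}. " for field in HEADING_FIELDS if doc.get(field) is not None
    )
    return headings + doc["content"]


# Hypothetical document containing the fields discussed above.
example_doc = {
    "id": "hybrid-search-1",
    "hierarchy_lvl0": "Learn",
    "hierarchy_lvl1": "AI-powered search",
    "hierarchy_lvl2": "Hybrid search",
    "hierarchy_lvl3": None,
    "content": "Hybrid search combines keyword and semantic search.",
    "publication_date": 1733011200,  # Unix timestamp, not part of the template
}
```

The template string would be set as the `documentTemplate` value in the embedder settings, so `publication_date` never reaches the embedding model.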

Meilisearch customizable ranking rules

Meilisearch offers fine-grained control over result ranking, enabling you to customize how search results are ordered and prioritized. This control ensures that users see the most relevant content first, tailored to your specific business or domain needs.

Unlike fixed ranking systems, Meilisearch allows you to define your own ranking rules. This flexibility helps you prioritize certain types of content, promote newer or more relevant results, and create a search experience that aligns with user expectations.

For instance, we have added a custom rule to the default ranking rules that promotes newer documents:

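A sketch of what this could look like with the Python SDK's `update_ranking_rules` method. Placing the custom rule last and naming the field `publication_date` (stored as a numeric timestamp) are assumptions for this example.

```python
DEFAULT_RANKING_RULES = ["words", "typo", "proximity", "attribute", "sort", "exactness"]


def ranking_rules_with_recency(date_field: str = "publication_date") -> list[str]:
    """Append a descending custom rule so newer documents win ties."""
    return DEFAULT_RANKING_RULES + [f"{date_field}:desc"]


def apply_ranking_rules(index, rules: list[str]):
    """Send the rules to Meilisearch through the SDK's update_ranking_rules."""
    return index.update_ranking_rules(rules)
```

Because ranking rules are applied in order, a rule placed last only breaks ties left by the rules before it; moving it earlier gives recency more weight.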

Index your documents

After setting up Meilisearch and preparing your data using best practices like document chunking and metadata enrichment, you can now push your data to Meilisearch.

Meilisearch accepts data in .json, .ndjson, and .csv formats. There are several ways to upload your documents:

  • Drag and drop files into the Cloud UI.
  • Use the API via the /indexes/{index_uid}/documents route.
  • Call the corresponding method from your preferred SDK.

💡 Note: Your documents must have a unique identifier (id). This is crucial for Meilisearch to identify and update records correctly.

Here’s how to upload documents using the Python SDK:

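One way to do this, sketched under the assumption that the documents live in a JSON file containing an array of objects. The helper names and the batch size are illustrative; `add_documents` is the SDK method doing the actual work.

```python
import json
from pathlib import Path


def load_documents(path: str) -> list[dict]:
    """Read a JSON array of documents and check that each one has an id."""
    documents = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [doc for doc in documents if "id" not in doc]
    if missing:
        raise ValueError(f"{len(missing)} document(s) have no 'id' field")
    return documents


def index_documents(index, documents: list[dict], batch_size: int = 1000) -> list:
    """Send documents to Meilisearch in batches via add_documents."""
    tasks = []
    for start in range(0, len(documents), batch_size):
        tasks.append(index.add_documents(documents[start:start + batch_size]))
    return tasks


# Usage, assuming a reachable Meilisearch instance and an index named "docs":
#   import meilisearch
#   client = meilisearch.Client("http://localhost:7700", "MEILISEARCH_API_KEY")
#   index_documents(client.index("docs"), load_documents("documents.json"))
```

Indexing is asynchronous: each call returns a task, so documents become searchable once the corresponding task has been processed.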

Perform an AI-powered search

Perform AI-powered searches with q and hybrid to retrieve search results using the embedder you configured earlier.

Meilisearch will return a mix of semantic and full-text matches, prioritizing results that match the query's meaning and context. You can fine-tune this balance using the semanticRatio parameter:

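A minimal sketch of a hybrid query through the Python SDK's `search` method, assuming an embedder named `default`. A `semanticRatio` of 0.0 is pure keyword search and 1.0 is pure semantic search.

```python
def hybrid_search(index, query: str, semantic_ratio: float = 0.5,
                  embedder: str = "default", limit: int = 5) -> dict:
    """Run a hybrid search, balancing keyword and semantic results."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return index.search(query, {
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
        "limit": limit,
    })


# Usage, assuming `index` is a meilisearch index object:
#   hybrid_search(index, "how do I configure an embedder?", semantic_ratio=0.9)
```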

This flexible control lets you:

  • Optimize the balance to fit your specific use case.
  • Adapt in real-time based on query patterns.
  • Combine the strengths of both methods, ensuring you don't miss key results.

This dual approach ensures you won't miss relevant results that might slip through the cracks of pure semantic search, while maintaining the benefits of semantic understanding.

Quality control with ranking score threshold

The rankingScoreThreshold parameter ensures that only high-quality results are included in the search response. It works in tandem with the ranking score, a numeric value ranging from 0.0 (poor match) to 1.0 (perfect match). Any result with a ranking score below the specified rankingScoreThreshold is excluded.

By setting a ranking score threshold, you can:

  • Filter out low-relevance results to improve overall result quality
  • Provide better context for RAG systems, ensuring LLMs work with higher-quality data
  • Reduce noise in search results, minimizing irrelevant information
  • Customize relevancy to align with your specific use case needs

The following query only returns results with a ranking score of 0.3 or higher:

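A short sketch of such a query, again assuming the Python SDK. `showRankingScore` is added here so that each hit carries a `_rankingScore` field you can inspect while tuning the threshold.

```python
def search_above_threshold(index, query: str, threshold: float = 0.3) -> dict:
    """Search and let Meilisearch drop hits scoring below the threshold."""
    return index.search(query, {
        "rankingScoreThreshold": threshold,
        "showRankingScore": True,  # exposes _rankingScore on every hit
    })
```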

Ready to build your RAG system? Now that we've set up Meilisearch, we'll walk you through the steps to create one.

Implementing RAG with Meilisearch

We'll build a RAG system using the Meilisearch documentation as our example knowledge base, demonstrating how to retrieve, process, and generate accurate, context-aware responses.

Key technologies used

Our implementation leverages several key technologies:

  • FastAPI: powers the API that handles user queries
  • Meilisearch: retrieves the relevant content
  • OpenAI's GPT-4: generates human-like, contextual responses
  • LangChain: orchestrates the AI workflow by chaining the search and LLM response generation.

How the system works

When a user submits a question, the system follows these steps:

  • User input: The user submits a query to the API
  • Content retrieval: Meilisearch searches for the most relevant content using a combination of keyword and semantic search
  • Context construction: the system builds a hierarchical context from the search results
  • LLM generation: the context and user query are sent to GPT-4 to generate an accurate, practical response
  • Response delivery: the system returns the LLM-generated answer along with the sources used to generate it

Setting up the environment

API keys and credentials are stored as environment variables in a .env file. We use dotenv to load them.

Here's how key services are initialized:

  • Meilisearch client: connects to the Meilisearch instance using the host and API key.
  • OpenAI client: authenticates the GPT-4 LLM via an API key
  • FastAPI application: sets up the web API for users to interact with the system
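
The sketch below covers only the configuration part of this step, using the standard library. The variable names are assumptions; in the application you would call `load_dotenv()` from `python-dotenv` first and then pass these values to the Meilisearch, OpenAI, and FastAPI objects.

```python
import os

# Hypothetical variable names; use whatever your .env file defines.
REQUIRED_VARS = ("MEILISEARCH_HOST", "MEILISEARCH_API_KEY", "OPEN_AI_API_KEY")


def load_settings(environ=os.environ) -> dict:
    """Collect the required settings and fail early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first request.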

Configuring CORS middleware

To ensure the system can handle requests from different origins (like frontend clients), we configure Cross-Origin Resource Sharing (CORS) for the FastAPI app. This allows cross-origin requests from any domain.


Defining the Query Data Model

The Query class defines the data structure for incoming POST requests. This ensures that only queries with a valid question are accepted.


How it works:

  • Input validation: FastAPI will automatically validate that incoming POST requests contain a valid question field of type string
  • Data parsing: The incoming query is parsed into a Query object that can be used inside the endpoint

Defining the API endpoint

The API exposes a single POST endpoint (/query) where users send a query. This endpoint retrieves relevant content, constructs a context, and returns an answer from GPT-4.


Querying Meilisearch for relevant documents

The system queries Meilisearch using a hybrid search approach that combines semantic search (70%) with keyword search (30%). It also enforces a rankingScoreThreshold of 0.4, ensuring only high-quality results are included.

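A sketch of this retrieval step using the values quoted above. The embedder name `default` and the `limit` of 5 are assumptions; `index` is expected to be a Meilisearch index object from the Python SDK.

```python
def retrieve_documents(index, question: str, limit: int = 5) -> list[dict]:
    """Return the hits for a question: 70% semantic, 30% keyword."""
    results = index.search(question, {
        "hybrid": {"semanticRatio": 0.7, "embedder": "default"},
        "rankingScoreThreshold": 0.4,  # drop hits scoring below 0.4
        "limit": limit,
    })
    return results["hits"]
```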

Constructing the context for GPT-4

Once Meilisearch returns the search results, the system processes them to create a structured context. The context preserves the hierarchical structure of the documents, ensuring that headings and subheadings are retained.

Context construction process

  • Extract Hierarchical Data: the system pulls hierarchical levels (hierarchy_lvl0, hierarchy_lvl1, etc.) from the search results.
  • Concatenate context: the headings and main content are combined to create a clear, readable context.
  • Separate Sections: each document's context is separated using "---" to improve clarity for GPT-4.
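
The process above can be sketched in plain Python as follows. Joining headings with " > " is a formatting choice made for this example; the field names follow the document structure described earlier.

```python
HIERARCHY_FIELDS = [f"hierarchy_lvl{level}" for level in range(4)]


def build_context(hits: list[dict]) -> str:
    """Turn search hits into one context string, one section per document."""
    sections = []
    for hit in hits:
        # Keep only the heading levels that are present and non-empty.
        headings = [hit[field] for field in HIERARCHY_FIELDS if hit.get(field)]
        parts = []
        if headings:
            parts.append(" > ".join(headings))
        if hit.get("content"):
            parts.append(hit["content"])
        if parts:
            sections.append("\n".join(parts))
    return "\n---\n".join(sections)
```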

Generating a response with GPT-4

The assembled context is passed to GPT-4 along with the user's question. A precise prompt ensures responses are:

  • practical and implementation-focused
  • based on actual documentation
  • clear about limitations when information isn't available
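
A prompt along these lines could be written as a plain string template. The wording below is a hypothetical example, not the prompt used in the original implementation; with LangChain, the same text would go into a prompt template object.

```python
PROMPT_TEMPLATE = """You are an assistant answering questions about the Meilisearch documentation.
Answer using only the context below.
Focus on practical, implementation-oriented guidance.
If the context does not contain the answer, say so instead of guessing.

Context:
{context}

Question: {question}

Answer:"""


def build_prompt(context: str, question: str) -> str:
    """Fill the template with the retrieved context and the user's question."""
    return PROMPT_TEMPLATE.format(context=context, question=question)
```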

Running the LLMChain with LangChain

  • Create LLMChain: this links GPT-4 to the formatted prompt.
  • Send input: the user query and context are sent to the LLM for processing.
  • Return response: the LLM's response is returned to the user.

Assembling the final API response

The final API response includes:

  • LLM-generated answer
  • Sources (URLs and hierarchy of the documents used)
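
A sketch of how such a response could be assembled from the LLM answer and the search hits. The `url` field name is an assumption about the indexed documents.

```python
HIERARCHY_FIELDS = [f"hierarchy_lvl{level}" for level in range(4)]


def build_response(answer: str, hits: list[dict]) -> dict:
    """Bundle the generated answer with the sources it was based on."""
    sources = []
    for hit in hits:
        sources.append({
            "url": hit.get("url"),  # None when the document has no URL
            "hierarchy": [hit[field] for field in HIERARCHY_FIELDS if hit.get(field)],
        })
    return {"answer": answer, "sources": sources}
```

Returning the sources alongside the answer lets users verify a response against the documentation it came from.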

Handling errors and exceptions

To avoid system crashes, all exceptions are caught and returned as an error response.


Running the application

Finally, you can run the API locally using Uvicorn. This command starts the FastAPI app on localhost:8000.


At this point, your RAG system is live, able to retrieve relevant context and generate precise answers using Meilisearch and GPT-4.

How to evaluate the performance of your RAG system

Ensuring high-quality content in RAG systems

Maintain high standards for your document base. Regularly audit and update your content to ensure accuracy and relevance. Remove duplicate or outdated information that might dilute search results. Establish a process for validating and updating information to maintain the knowledge base's integrity.

Monitoring performance to identify bottlenecks

Implement monitoring to track retrieval effectiveness. Watch for patterns in failed queries or consistently low-ranking results. Use this data to refine your document processing and search parameters. Monitor both technical metrics (like response times) and quality metrics (like relevancy scores) to ensure optimal performance. This can be easily done through the Meilisearch Cloud monitoring metrics and analytics dashboards.

Collecting user feedback

User feedback is one of the most valuable sources for improving the performance of your RAG system. While metrics like query latency or relevancy scores provide technical insight, user feedback reveals real-world problems.

By collecting and analyzing feedback, you can identify issues that are harder to detect with system metrics alone, such as:

  • False positives: When irrelevant results are returned for a query
  • Missed context: When the system fails to retrieve a document that users expected to see
  • Slow responses: When users experience slow loading times or incomplete responses

User feedback can guide you in fine-tuning your Meilisearch configuration. It might highlight the need to adjust sorting to prioritize more recent documents, raise the rankingScoreThreshold to filter out low-relevance results, optimize the documentTemplate to embed more relevant context, or chunk large documents into smaller, more targeted sections to improve retrieval accuracy.

Key takeaways: maximizing RAG performance with Meilisearch

Implementing RAG with Meilisearch provides several key advantages:

  • Flexibility: easily integrates with various data sources and LLMs.
  • Performance: delivers fast retrieval times and efficient resource usage.
  • Accuracy: combines keyword and semantic search for more precise results.
  • Scalability: handles large, growing knowledge bases with ease.

Meilisearch's robust features and high performance make it a strong foundation for production-ready RAG implementations. To get the most out of your system, focus on:

  • Data preparation and indexing: Ensure your knowledge base is clean, organized, and well-structured
  • Domain-specific fine-tuning: Adjust ranking rules, relevance thresholds, and embedding strategies for your unique context
  • Continuous evaluation: Use user feedback, system metrics, and LLM responses to optimize system performance
  • Knowledge base updates: Regularly review and update content to keep responses accurate and relevant

As Meilisearch and LLM technology continue to evolve, future advancements will bring even greater efficiency, accuracy, and flexibility to RAG systems – making them an increasingly valuable approach for AI-powered applications.

Carolina joined Meilisearch in 2020 as a Developer Advocate. With a background in translation and teaching, she discovered programming by chance and quickly became passionate about it. She has worked in DevRel and tech support and is now transitioning into a Solution Engineer role, enjoying the diverse challenges along the way. Outside of work, she loves staying active, music, cinema, traveling, and exploring new cuisines—one of her favorite parts of any trip.