24 Apr 2025

What is full-text search and how does it work?

See what full-text search is, the benefits, the different types, and many use cases. Discover how these search engines actually work.

Carolina Ferreira, Developer Advocate @ Meilisearch (@CarolainFG)

Full-text search offers instant, accurate results from mountains of text.

It’s a powerful tool for querying text-based information, with fast retrieval and high relevance. And it also improves user experience.

Full-text search is commonly used in e-commerce, content management systems, and legal research. Meilisearch, for example, creates an index by tokenizing text and storing it efficiently using LMDB.

It then ranks results using a configurable relevancy algorithm based on typo tolerance, proximity, and term matching rules.

Curious how it works and why it matters? This article explains it.

What is full-text search?

Full-text search is a powerful search technique that searches for the user query in the entirety of a text document or dataset. It’s like a super-smart librarian who can instantly scan every page of every book in a library and hand you exactly what you need.

Unlike keyword matching restricted to specific fields, full-text search parses entire documents, whether a single document or many stored in a database, and finds every relevant match for the user’s search.

It peers into documents and sections of documents that traditionally may not be searched, including product descriptions, bibliographies, and supplementary material. Once parsed, it indexes and stores every word from each section in a catalog, making it available for search.

What are the benefits of full-text search?

Full-text search offers distinct advantages for both users and systems. For instance, it:

  1. Provides high-speed query performance by retrieving results from large datasets in milliseconds. This is possible because it analyzes all the words in a set of documents ahead of time and indexes them for faster information retrieval.
  2. Enhances user accessibility by accommodating natural language queries, typographical errors, and synonyms. Thus, it offers an intuitive and user-friendly solution for user queries.
  3. Improves result relevance and precision over traditional search methods. It does this through ranking algorithms that prioritize matches based on contextual significance—like in neural search—and semantic similarity rather than mere word occurrence.

These benefits make it invaluable across applications, from e-commerce to document management.

What are the different types of full-text search?

Full-text search encompasses multiple approaches, each tailored to specific needs. Below is an overview of the primary types:


  • Basic Search: Searches for exact word matches, such as “apple,” across all documents and returns every document containing the word. It is efficient and saves on computational power but lacks precision for complex queries.
  • Wildcard Search: Employs symbols (e.g., “appl*”) to match variations like “apply” or “application.” While it is suitable for incomplete inputs, it may introduce irrelevant results, e.g., returning “apple” when variations of “apply” are required.
  • Fuzzy Search: Tolerates errors (e.g., “appel” matches “apple”) and closely similar renditions (e.g., “apple” and “apples”) using similarity algorithms. It is ideal for correcting misspellings or catching results with slight variations, such as those between US and UK English.
  • Phrase Search: Requires exact sequences (e.g., “red apple” in quotes) to ensure precise contextual matches. It is useful when searching for an exact string of words in a particular sequence.
  • Boolean Search: Combines terms with operators (e.g., “apple AND orange NOT banana”) for refined control over results. It lets users join several conditions in one query, broadening or narrowing the results as needed.
  • Proximity Search: Specifies word distance (e.g., “apple orange” ~5, “apple NEAR orange”) to capture contextual relationships within a defined range. This is especially useful when specific words are expected to appear close together, which helps extract the most relevant information from the document.
  • Faceted Search: Enables filtering by different facets of a topic (e.g., “fruits” refined by “yellow” or “citrus”) for fine control over search results. It is widely used in structured datasets where the user is looking for specific attributes.
  • Range Search: Targets numerical (e.g., price:4-20), alphabetical (e.g., size:S-M), or temporal ranges (e.g., date:04/02/24-10/02/25) to retrieve results within user-specified limits. It is particularly effective for quantitative filtering.

Each method, often integrated within tools like Meilisearch, addresses a distinct search requirement.

Where does this power show up in the real world? Let’s examine some practical applications of these capabilities.

What are the different use cases of full-text search?

Full-text search supports a wide range of industries, functions, and applications.

  • E-commerce search: Facilitates product discovery by indexing descriptions, specifications, and reviews. Useful for precise queries like “waterproof hiking boots.”
  • Document management systems: Enhances retrieval from extensive collections of PDFs, Word files, and other text formats. Useful for access to critical information like a specific news report.
  • Customer support & help centers: Allows quick problem fixes by indexing FAQs, articles, and troubleshooting guides. Useful for locating solutions using queries such as “fix login error.”
  • Medical & legal research: Expedites analysis of case law, medical records, and research papers within vast data volumes. Useful for getting precedent on a traffic accident case.
  • Social media & content platforms: Improves content indexing for news websites, blog pages, and video platforms using text, captions, and metadata. Useful for better discovery of trends, e.g., dalgona coffee.

These applications show how full-text search is the backbone of efficient, versatile information retrieval.

But how does it go from raw data to searchable results? Let’s examine the indexing functionality of full-text search.

How does full-text search work?

In modern search engines, full-text search is a complicated multi-step procedure. Its main goals are relevance and speed, and here we’ll give a quick breakdown of how it all works.

In many search systems, the first step is to crawl or ingest the content meant to be searched. This can be achieved with bots, direct upload of files, or through APIs and databases. The exact combination of methods will depend on the use case. For instance, Google relies on crawlers, while a company's internal search engine mainly uses databases or ingests files directly.

After this, the text needs to be processed and normalized. Techniques like tokenization (more on this later), lowercasing, stop word removal, and stemming ensure the text is clean and standardized, so searching won’t miss relevant results due to trivial differences in the text, such as words being uppercase or lowercase.

Then comes the inverted index, which you can think of as the heart of the whole process. It’s a data structure that maps terms to the documents in which they appear.

“coffee” → [Doc1, Doc3, Doc5] 

“beans” → [Doc3, Doc4]

The inverted index allows for a fast lookup of which documents contain a given search term without scanning each document's full contents.

In addition to the terms, the indexes can also store term positions (useful for phrase searching), frequency data (how often the terms appear in a document), and metadata such as titles, tags, and others.
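To make the idea concrete, here is a minimal Python sketch of an inverted index that also records term positions. The document contents and the `build_inverted_index` helper are made up for illustration; a real engine would tokenize and normalize the text far more carefully.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase, then split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Hypothetical documents, chosen to match the mapping shown above.
docs = {
    "Doc1": "fresh coffee daily",
    "Doc3": "coffee beans roasted",
    "Doc4": "vanilla beans",
    "Doc5": "iced coffee and hot coffee",
}
index = build_inverted_index(docs)

# Finding the documents for a term is a dictionary lookup, not a scan.
print(sorted(index["coffee"]))       # ['Doc1', 'Doc3', 'Doc5']
print(sorted(index["beans"]))        # ['Doc3', 'Doc4']
# Positions (for phrase search) and frequency of "coffee" in Doc5.
print(index["coffee"]["Doc5"])       # [1, 4]
print(len(index["coffee"]["Doc5"]))  # 2
```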

Now, we’re at the stage when the user begins searching. Query processing requires the system to tokenize and normalize the user query in the same way it processed the documents at indexing time, so that query terms line up with the terms stored in the inverted index. User intent enhancements, such as synonym matching, can also be added at this stage.

Finally, once the matches are found, comes the scoring and ranking. Since not all matches are equally relevant, the search engine uses algorithms to rank and sort the results by relevance.

TF-IDF (term frequency-inverse document frequency) and BM25 (inspired by TF-IDF but based on a probabilistic model) are the “classic” ways to score results. Modern techniques include vector-based search (semantic search), which converts queries and documents into numerical vectors to match based on meaning rather than keywords.
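As an illustration of classic scoring, the sketch below implements one common formulation of BM25 in Python. The parameter values (`k1=1.2`, `b=0.75`), the smoothed IDF variant, and the sample documents are assumptions chosen for the example, not settings taken from any particular engine.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    scores = {}
    for doc_id, tokens in docs.items():
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in docs.values() if term in other)
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            freq = term_counts[term]
            # Length normalization dampens long documents.
            norm = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

# Hypothetical, already tokenized documents.
docs = {
    "Doc1": ["coffee", "beans"],
    "Doc2": ["coffee", "coffee", "beans", "grinder"],
    "Doc3": ["tea", "leaves"],
}
scores = bm25_scores(["coffee"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['Doc2', 'Doc1', 'Doc3']
```

Doc2 ranks first because it mentions “coffee” twice, although the length normalization keeps its lead over the shorter Doc1 modest.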

Now that we roughly understand how full-text search works, we’ll explore the stages and technologies in depth, using our own Meilisearch as an example.

A deep dive into the process of full-text search with Meilisearch

Indexing transforms raw data into a searchable format, a critical step in full-text search. Meilisearch optimizes this process through a structured approach, and it’s worth understanding how.

We’ll take you through how Meilisearch tackles these processes step by step.

A performant storage engine

Meilisearch employs a custom-built storage engine based on records called “documents” grouped into collections called "indexes," which are designed for efficiency and scalability.

Behind the scenes, Meilisearch uses the LMDB (Lightning Memory-Mapped Database) key-value store, which handles datasets ranging from small collections to millions of records and stores them as a collection of key-value pairs. Such a storage configuration maintains low memory usage, quick access times, and high performance for an optimal user experience.

For instance, LMDB avoids synchronization-related issues by permitting only a single writing process at a time. This allows it to provide users with fast access to up-to-date, consistent data.

However, Meilisearch preprocesses data via tokenization before the key-value store comes into the picture.

From words to tokens

Raw text, such as a product description, doesn’t just get dumped into the database. It’s first segmented into tokens: small searchable units—the first process in tokenization.

For this purpose, Meilisearch uses (and maintains) the open-source tokenizer named Charabia. Charabia allows users to configure the fields they want to be made searchable and subsequently tokenized.

The second step in tokenization is normalization, or, more simply, rule-based organization. Since each language has its own peculiarities, this is a highly language-specific task in which words may be made lowercase and diacritical marks, such as accents, removed.

In the following example, “Le café de Nicolas” is segmented into “le,” “cafe,” “de,” and “nicolas”. Meilisearch also strips out noise (stop words like “the”) and normalizes terms (like going from “Le café de Nicolas” to “le cafe de nicolas”) to make them easier to classify.

image1.png

Together, segmentation and normalization make searches smarter, faster, and less literal by organizing tokens with the appropriate data structure.
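The two steps can be sketched in a few lines of Python using only the standard library. This is not Charabia’s API: the `segment` and `normalize` helpers and the tiny stop-word list are hypothetical, and whitespace splitting only works for languages that separate words with spaces.

```python
import unicodedata

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"le", "de"}

def segment(text):
    """Split text into raw tokens on whitespace."""
    return text.split()

def normalize(token):
    """Lowercase a token and strip diacritical marks such as accents."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    # After NFD decomposition, accents are separate combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

tokens = [normalize(token) for token in segment("Le café de Nicolas")]
print(tokens)  # ['le', 'cafe', 'de', 'nicolas']

# Optionally drop stop words before indexing.
searchable = [token for token in tokens if token not in STOP_WORDS]
print(searchable)  # ['cafe', 'nicolas']
```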

Storing tokens

Once tokenized, those individual tokens need a home. Modern full-text search engines like Meilisearch lean on clever data structures for efficient storage and rapid retrieval.

Each data structure has benefits and drawbacks when it comes to powering features like prefix search, typo tolerance, and geo search. Our team therefore carefully selected the ones best suited to our search engine without compromising on speed.

Let us now peek at the data structures powering Meilisearch.

Inverted index

Inverted indexes are the core trick. An inverted index maps tokens to the documents they’re in alongside their positions in said documents. See the image below for an example of how it maps the words “alice,” “hello,” and “word” to respective documents.

image6.png

It’s fast because, as the name suggests, it flips the usual document-to-word lookup. Since it stores each word once and associates it with the documents in which it appears, an inverted index does not need to browse every document to find a searched word.

Meilisearch creates almost 20 inverted indexes per document index, making it one of the most frequently occurring data structures. However, to provide a search-as-you-type experience, the engine needs to do a lot of preprocessing and define search schema during indexing, including word prefixes, filterable attributes, etc.

Roaring bitmap

Meilisearch uses roaring bitmaps to compress lists of document IDs associated with each token. They’re tiny in memory but quick to query, perfect for scaling to large datasets.

Moreover, they store large sets of integers and perform set operations like union, intersection, and difference. These operations help refine search results by selecting documents based on their inter-relationships.
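The set operations themselves are easy to picture. In the sketch below, plain Python sets stand in for roaring bitmaps (which offer the same operations with a much more compact representation), and the document IDs are invented for the example.

```python
# Hypothetical document IDs associated with two tokens and one filter.
docs_with_sun = {1, 4, 7, 9}
docs_with_flower = {4, 5, 9, 12}
docs_out_of_stock = {4, 7, 12}

# Union: documents containing either word.
either = docs_with_sun | docs_with_flower
# Intersection: documents containing both words.
both = docs_with_sun & docs_with_flower
# Difference: documents containing both words, minus the excluded ones.
both_available = both - docs_out_of_stock

print(sorted(either))          # [1, 4, 5, 7, 9, 12]
print(sorted(both))            # [4, 9]
print(sorted(both_available))  # [9]
```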

Finite-state transducer

Finite-state transducers (FSTs) store token prefixes and variations in a compact, string-dependent manner. They represent a sequence of states with strings arranged in ascending lexicographic (alphabetical or numerical) order. Because of their compactness, they’re a smaller and faster alternative to inverted indexes.

An FST is sometimes known as a word dictionary because it contains all indexed words in a dataset. Meilisearch utilizes two main FSTs: one for storing all words of the dataset and the other for storing the most recurrent prefixes.

Meilisearch’s dependence on FSTs allows it to support compression and lazy decompression techniques while also handling wildcards or autocompletes with minimal overhead. This includes retrieving word subsets that match specific syntax rules or patterns, such as prefixes, by using regular expression-like automata.
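To show what a prefix lookup over an ordered word dictionary looks like, here is a simplified Python stand-in. It uses a sorted list and binary search rather than an actual FST, so it illustrates the lookup behavior only, not the compression; the word list and the `words_with_prefix` helper are hypothetical.

```python
from bisect import bisect_left

def words_with_prefix(sorted_words, prefix):
    """Return every word starting with prefix from a sorted word list."""
    # Binary search finds the first word that is >= the prefix.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        # Words sharing a prefix are contiguous in lexicographic order,
        # so we can stop at the first word that no longer matches.
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

dictionary = sorted(["sun", "sunflower", "sunny", "summer", "flower", "flow"])
print(words_with_prefix(dictionary, "sun"))   # ['sun', 'sunflower', 'sunny']
print(words_with_prefix(dictionary, "flow"))  # ['flow', 'flower']
```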

R-tree

R-trees manage spatial or range-based data such as coordinates or numerical values, powering Meilisearch’s geo-search feature.

To optimize full-text search queries like “restaurants within 5 miles,” R-trees associate geographical coordinates with relevant document identifiers. This allows users to find nearby points within a specific area or points that intersect with other spatial objects.
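For intuition, the sketch below shows the brute-force version of a radius query in Python: it computes the great-circle (haversine) distance from a center point to every place. This is not an R-tree; a spatial index exists precisely so the engine does not have to run this check against every document. The place names and coordinates are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_MILE = 1.609344

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(places, center, radius_km):
    """Return the names of places no farther than radius_km from center."""
    lat, lon = center
    return [
        name
        for name, (place_lat, place_lon) in places.items()
        if haversine_km(lat, lon, place_lat, place_lon) <= radius_km
    ]

# Hypothetical restaurants: one close to the center, one far away.
places = {
    "near_cafe": (48.8606, 2.3376),
    "far_bistro": (48.8049, 2.1204),
}
center = (48.8566, 2.3522)
print(within_radius(places, center, 5 * KM_PER_MILE))  # ['near_cafe']
```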

Together, these components make indexing fast and flexible. But what happens when you hit “search”? The following section addresses exactly how search terms are processed at search time.

Search time: query processing

Indexing sets the stage; query processing steals the show. When you type a query, Meilisearch doesn’t just throw back random matches; it’s deliberate and precise.

Modern search experiences only require you to start typing to get results. To achieve such a search-as-you-type experience, Meilisearch precomputes a list of the most frequent prefixes so it can return results for them without a moment’s delay.

For typo tolerance, Meilisearch uses FSTs together with the Levenshtein algorithm. This algorithm calculates the Levenshtein distance or the “cost” of transforming one string into another. In other words, it quantifies the number of transformations required for a word to be converted into another word.

For example, transformations could be in the form of:

  • insertions, e.g., hat -> chat
  • deletions, e.g., tiger -> tier
  • substitutions, e.g., cat -> hat
  • transpositions or swaps, e.g., scared -> sacred

FSTs generate all possible variations of a word within a user-specified edit distance. Thus, they enable the search engine to calculate the Levenshtein distance accurately and detect typos by comparing user input against a dictionary of “valid” words.
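The four transformations listed above can be computed with a small dynamic-programming routine. Because the list counts a swap of adjacent letters as a single edit, the Python sketch below uses the optimal string alignment variant (often grouped under Damerau-Levenshtein) rather than the plain Levenshtein distance, which would count a swap as two substitutions. It compares two strings directly and does not reproduce the FST-based approach.

```python
def edit_distance(a, b):
    """Edit distance allowing insert, delete, substitute, and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

print(edit_distance("hat", "chat"))       # 1 (insertion)
print(edit_distance("tiger", "tier"))     # 1 (deletion)
print(edit_distance("cat", "hat"))        # 1 (substitution)
print(edit_distance("scared", "sacred"))  # 1 (transposition)
```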

When processing a search request, considerations like whether the user has finished typing or whether the query has any typos pop up.

Query graph

Each time it receives a search query, Meilisearch parses it into a graph structure that outlines terms and their relationships. This structure lets Meilisearch plan the fastest way to fetch results.

For instance, the query “the sun flower” is split into “the,” “sun,” and “flower,” with logical connections (e.g., AND) guiding the search path. Additionally, it may be transformed via:

  • Concatenation: the sunflower
  • Substitution: the sun flowed
  • Addition: the sun flowers

A more complex query, such as “the sun flower is facing the su,” would be dealt with in a more extended manner (graphic courtesy of our D2-powered internal debugging tool):

image7.png

As illustrated above, the graph represents different variations of the search query. The engine precomputes the word variations (and their Levenshtein distances) for each term in the query. Moreover, it determines whether the last term in the query is a prefix, i.e., not followed by a space, in order to summon the prefix database.
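A heavily simplified Python sketch of this analysis step might look as follows. It only covers term splitting, concatenation of adjacent terms, and prefix detection; the `analyze_query` helper and the word dictionary are hypothetical, and the real query graph tracks many more variations.

```python
def analyze_query(query, dictionary):
    """Split a query into terms and derive a few simple variations."""
    terms = query.lower().split()
    # The last term counts as a prefix when no trailing space has been
    # typed yet, meaning the word may still be incomplete.
    last_is_prefix = bool(terms) and not query.endswith(" ")
    # Concatenate adjacent terms, keeping only words known to the index.
    concatenations = [
        left + right
        for left, right in zip(terms, terms[1:])
        if left + right in dictionary
    ]
    return {
        "terms": terms,
        "last_is_prefix": last_is_prefix,
        "concatenations": concatenations,
    }

# Hypothetical set of words present in the index.
known_words = {"the", "sun", "flower", "sunflower", "flowers", "flowed"}

result = analyze_query("the sun flower", known_words)
print(result["terms"])           # ['the', 'sun', 'flower']
print(result["concatenations"])  # ['sunflower']
print(result["last_is_prefix"])  # True
```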

Now that you have a query graph, what do you do with it?

At the filtering stage, Meilisearch narrows potential results to the document IDs generated during the indexing process that fulfill the filter criteria.

Next, it takes the query terms and their query graph variations and searches for matching words in the FSTs. If a word is considered a prefix, it also looks it up in the prefix FST. It then looks up the matching words in the inverted index to retrieve the corresponding document IDs.

Finally, the engine performs an intersection to identify the documents that contain the words in the query graph and meet the filter criteria.

Let's take an example to better understand query processing. Suppose you have a dataset of songs where a user searches for “John Lennon.” The user wants to retrieve only John Lennon songs released between 1957 and 1975.

First, Meilisearch retrieves the document IDs of songs within that time frame. After ensuring that the words in the query graph exist in the FST, Meilisearch retrieves the document IDs that contain either John, Lennon, or both. It also retrieves possible variations, but we are leaving them out for simplicity.

image3.png

Finally, only the overlap (subset) of the two sets of document IDs is considered. This means only document IDs that appear in both sets are kept. In other words, Meilisearch retains the document IDs of songs released between 1957 and 1975 that contain either John, Lennon, or both.
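The same flow can be sketched in Python with a tiny illustrative dataset. The song records and both helper functions are assumptions made for the example; in the engine, each set of IDs would come from the indexes rather than from scanning the documents.

```python
# Small illustrative dataset keyed by document ID.
songs = {
    1: {"title": "Imagine", "artist": "John Lennon", "year": 1971},
    2: {"title": "Hey Jude", "artist": "The Beatles", "year": 1968},
    3: {"title": "Woman", "artist": "John Lennon", "year": 1980},
    4: {"title": "Rocket Man", "artist": "Elton John", "year": 1972},
}

def ids_matching_filter(songs, start_year, end_year):
    """Document IDs of songs released within the given years (inclusive)."""
    return {
        song_id for song_id, song in songs.items()
        if start_year <= song["year"] <= end_year
    }

def ids_matching_words(songs, words):
    """Document IDs whose artist field contains at least one query word."""
    wanted = {word.lower() for word in words}
    return {
        song_id for song_id, song in songs.items()
        if wanted & set(song["artist"].lower().split())
    }

filtered = ids_matching_filter(songs, 1957, 1975)
matched = ids_matching_words(songs, ["John", "Lennon"])

print(sorted(filtered))            # [1, 2, 4]
print(sorted(matched))             # [1, 3, 4]
# Only IDs present in both sets survive.
print(sorted(filtered & matched))  # [1, 4]
```

Both remaining documents match at least one query word and satisfy the filter; deciding which of them comes first is the job of the ranking stage.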

But what happens when a whole list of documents matches the search query? How does the engine decide which is more relevant and, thus, the first search result?

That’s where relevancy calculations come into play.

Relevancy

Not all matches are equal. Word variations, such as John Lebon, may also turn up. That’s why Meilisearch ranks search results using factors like term frequency (how often “John” appears), proximity (are “John” and “Lennon” close?), and field weight (prioritizing titles over body text).

It’s tuned to feel intuitive so that the best stuff lands at the top. This combo of smart parsing and ranking makes searches snappy and spot-on.

Meilisearch sorts documents in the search results using bucket sort. This algorithm allows the ranking of documents based on a set of rules. By default, Meilisearch prioritizes rules in the following order:

  1. Number of matched words: documents containing all query terms are ranked first
  2. Number of typos: documents matching query terms with fewer typos are ranked first
  3. Proximity between matched query terms: documents where query terms occur close together and in the same order as the query string are ranked first
  4. Presence and position of query terms at attributes: documents containing query terms in more important attributes and at the beginning of attributes are ranked first
  5. User-defined parameters: documents satisfying user criteria set at the query time are ranked first
  6. Exactness of matches: documents whose matched words are most similar to the query words, such as exact matches rather than prefix or typo matches, are ranked first

Meilisearch applies these rules sequentially, sorting results step by step. If two documents are tied after applying one rule, it uses the next rule to break the tie.
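This tie-breaking behavior can be sketched in Python as a recursive bucket sort. The three rules, the field names, and the sample documents below are assumptions for illustration; they mirror only the first three default rules and do not reflect how Meilisearch computes those values internally.

```python
def bucket_sort(docs, rules):
    """Rank docs by applying rules in order; later rules only break ties."""
    if not rules or len(docs) <= 1:
        return list(docs)
    first_rule, remaining_rules = rules[0], rules[1:]
    # Group documents that the current rule cannot tell apart.
    buckets = {}
    for doc in docs:
        buckets.setdefault(first_rule(doc), []).append(doc)
    ranked = []
    # Lower keys are better; each bucket is refined by the next rule.
    for key in sorted(buckets):
        ranked.extend(bucket_sort(buckets[key], remaining_rules))
    return ranked

# Each rule returns a key where a lower value means a better rank.
rules = [
    lambda doc: -doc["matched_words"],  # more matched words first
    lambda doc: doc["typos"],           # fewer typos first
    lambda doc: doc["proximity"],       # closer query terms first
]

# Hypothetical per-document match statistics.
docs = [
    {"id": "A", "matched_words": 2, "typos": 0, "proximity": 3},
    {"id": "B", "matched_words": 2, "typos": 0, "proximity": 1},
    {"id": "C", "matched_words": 2, "typos": 1, "proximity": 1},
    {"id": "D", "matched_words": 1, "typos": 0, "proximity": 0},
]
ranked_ids = [doc["id"] for doc in bucket_sort(docs, rules)]
print(ranked_ids)  # ['B', 'A', 'C', 'D']
```

Document D has the best proximity value but still lands last, because an earlier rule (number of matched words) has already placed it in a lower bucket.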

Note that these rules are fully customizable, meaning you can add, delete, and reorder them as needed. Read more in the relevancy documentation.

By default, Meilisearch returns up to 1000 documents per search. However, it prioritizes delivering the most relevant results rather than all matching results. In this way, Meilisearch prioritizes efficiency and precision over exhaustive results to ensure an optimized search experience.

Frequently Asked Questions (FAQs)

How does full-text search compare to keyword-based search?

Full-text search scans entire documents for matches and understands semantic context and relevance. On the other hand, traditional keyword-based search just looks for exact terms in specific fields, leaving out any other instance of relevance. Thus, full-text search offers greater depth, adaptability, and finesse, making it perfect for natural language queries.

What are the disadvantages of full-text search?

Full-text search requires significant storage and computational resources for indexing, making it resource-heavy. Without optimization, it may process complex queries slowly or return less relevant, sometimes even spurious, results.

What are some popular full-text search engines?

Meilisearch is a popular option for fast, user-friendly search functionality. Others include Elasticsearch (powerful but complex), Solr (enterprise-focused), and Algolia (optimized for ease of use).

How does full-text search compare to vector search?

Full-text search focuses on textual matches and keyword relevance in text-based datasets, while vector search uses machine learning to identify semantic similarities. While they’re both powerful in their own right, they serve distinct yet complementary purposes.

What are some alternatives to full-text search?

One alternative to full-text search is vector search, which targets meaning-based retrieval. Other options include SQL queries for structured data and regular expressions for small-scale pattern matching. Each method suits specific use cases, so what fits one application might not fit another.

Delivering the best results with full-text search

Full-text search is a robust searching mechanism for accessing and managing vast textual data. Its speed, flexibility, and relevance support diverse applications from e-commerce to research.

Meilisearch takes it further through efficient indexing and sophisticated query processing. It’s not just about finding stuff; it’s about finding the right stuff, fast. Structures like inverted indexes and relevancy algorithms help whether you’re tokenizing text or ranking results.

With the speed at which AI-powered search is advancing, the possibilities of understanding queries and documents are endless, starting with semantic search.

Transform your search experience now

Meilisearch, an open-source search solution, delivers rapid, accurate full-text search capabilities with features like fuzzy matching and faceted filtering. Built with end users and developers in mind, it provides smooth experiences across the board and offers advanced search modalities like hybrid and federated search.
