Horizontal scaling with sharding

Meilisearch 1.19 EE introduces experimental sharding support to enable horizontal scaling.

26 Aug 2025 · 4 min read

Laurent Cazanove, Developer Experience Engineer (@StriftCodes)

In this article

How Meilisearch sharding works
Distribution algorithm
Important: Enterprise Edition and licensing
How to set up sharding
Prerequisites
Step 1: Configure the network on each node
Step 2: Add documents as usual
Step 3: Monitor the distributed task
Step 4: Search your distributed index
Moving forward

Meilisearch uses sharding to distribute a single index across multiple server instances, thereby going beyond the limits of a single machine. This enables you to handle massive datasets and high-traffic loads by scaling your search infrastructure horizontally.

This guide will walk you through what sharding is, how it benefits you, and how to implement it step-by-step.

How Meilisearch sharding works

Meilisearch sharding is designed for simplicity and robustness. Instead of you manually splitting data, Meilisearch handles the distribution automatically.

When you send a batch of documents to any node in your sharded cluster, the receiving node instantly forwards the entire payload to every other node. Each node then independently decides which documents it's responsible for and only indexes its assigned subset.

This design was chosen for its reliability. It avoids a single point of failure for data distribution and simplifies the logic, ensuring that every node has the opportunity to process the data without complex coordination.

This also means you can send your indexing tasks to any node, making it easy to load balance your writes.

Distribution algorithm

To ensure a document is always stored on the same shard, Meilisearch uses a consistent hashing algorithm (Rendezvous Hashing using twox-hash) on each document's primary key.

For this algorithm to work, you must set a primaryKey for your index before adding documents. The primaryKey is the unique identifier used as the input for the hash function. Without it, Meilisearch cannot guarantee that a specific document will consistently land on the same shard, which would lead to data duplication and inconsistencies.
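As a rough illustration of the idea, here is a minimal Python sketch of rendezvous hashing. It is not Meilisearch's implementation: it uses `hashlib.blake2b` from the standard library as a stand-in for twox-hash, and the function names (`score`, `owner`, `documents_for`) are hypothetical. Each node scores every primary key, and the node with the highest score keeps the document, which is how every node can pick its own subset from the full payload without talking to the others.

```python
import hashlib


def score(node, primary_key):
    """Deterministic pseudo-random weight for a (node, primary key) pair.

    Assumption: blake2b stands in for twox-hash, which Meilisearch uses.
    """
    data = f"{node}:{primary_key}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def owner(nodes, primary_key):
    """Rendezvous hashing: the node with the highest score owns the key."""
    return max(nodes, key=lambda node: score(node, primary_key))


def documents_for(node, nodes, documents, primary_key_field="id"):
    """Subset of a forwarded payload that a given node would index."""
    return [
        doc for doc in documents
        if owner(nodes, str(doc[primary_key_field])) == node
    ]


nodes = ["node-1", "node-2", "node-3"]
documents = [{"id": i, "title": f"Movie {i}"} for i in range(1, 10)]

# Every node receives the same payload and keeps only its own share.
for node in nodes:
    print(node, [doc["id"] for doc in documents_for(node, nodes, documents)])
```

Because the score depends only on the node name and the primary key, the same document always maps to the same node for a given set of nodes, regardless of which node received the request.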

Important: Enterprise Edition and licensing

Sharding is an advanced feature available exclusively in Meilisearch Enterprise Edition (EE). All Meilisearch Cloud users get access to Enterprise Edition features.

A note on self-hosting: The EE features are governed by the Business Source License 1.1, which allows you to use, test, and develop with sharding for free in non-production environments. A commercial license agreement with Meilisearch is required to use sharding in a production environment.

To get started with Meilisearch Enterprise Edition, get in touch.

How to set up sharding

Prerequisites

Meilisearch Enterprise Edition: Ensure you are running a version of Meilisearch that includes this feature (v1.19+).
Identical Setup: All Meilisearch instances in your cluster must run the exact same version and share the same MEILI_MASTER_KEY.
Primary Key: The index you intend to shard must have a primaryKey configured before you begin adding documents.

As of Meilisearch 1.19, network and sharding are experimental features. See how to enable experimental features.
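The prerequisites above could be prepared with requests along these lines. This is a sketch under assumptions: it assumes the experimental flag is named `network` on the `/experimental-features` route, and it creates the index with its primary key through `POST /indexes`. The helper names, the node address, and the `MASTER_KEY` placeholder are illustrative. Whether these calls must be repeated on every node is not stated here, so the sketch targets a single node and only builds the requests without sending them.

```python
import json
import urllib.request


def build_request(base_url, method, path, payload, api_key):
    """Build (but do not send) a JSON request for a Meilisearch node."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def prerequisite_requests(base_url, api_key, index_uid, primary_key):
    """Requests that prepare one node before sharding is configured."""
    return [
        # Assumption: the experimental flag is called "network".
        build_request(base_url, "PATCH", "/experimental-features",
                      {"network": True}, api_key),
        # Create the index with an explicit primary key before adding documents.
        build_request(base_url, "POST", "/indexes",
                      {"uid": index_uid, "primaryKey": primary_key}, api_key),
    ]


prepared = prerequisite_requests(
    "http://node-1-address:7700", "MASTER_KEY", "movies", "id"
)
for request in prepared:
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```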

Step 1: Configure the network on each node

Sharding is enabled and configured through the network API (the /network route). First, update the network object on every node in your cluster to enable sharding.

On all nodes:

{
  "sharding": true
}

Then, you need to configure each node to make it aware of the network topology: give it a name and list the other nodes it will communicate with. For a 3-node cluster, update the network as follows:

On node-1:

{
  "self": "node-1",
  "remotes": {
    "node-2": { 
      "url": "http://node-2-address:7700", 
      "searchApiKey": "your-search-api-key",
      "writeApiKey": "your-write-api-key"
    },
    "node-3": { 
      "url": "http://node-3-address:7700", 
      "searchApiKey": "your-search-api-key",
      "writeApiKey": "your-write-api-key"
    }
  }
}

(Repeat this for node-2 and node-3, updating the self value and remotes accordingly.)

self: A unique name for the current node.
remotes: A map of all other nodes in the network.
writeApiKey: This API key must have the documents.add permission and is used to securely authorize the forwarding of documents between nodes.
searchApiKey: This API key must have permission to search in the relevant indexes.
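Writing one configuration per node by hand gets repetitive as the cluster grows. The sketch below derives each node's payload from a single map of node names to URLs. The function name `network_payloads` and the placeholder keys are illustrative. The payload shape follows the node-1 example above, and the `sharding` flag is left to the separate update described earlier.

```python
import json


def network_payloads(nodes, search_api_key, write_api_key):
    """Return {node name: network payload} for every node in the cluster.

    `nodes` maps each node name to its base URL. Each payload lists every
    node except the one it is meant for.
    """
    payloads = {}
    for name in nodes:
        remotes = {
            other: {
                "url": url,
                "searchApiKey": search_api_key,
                "writeApiKey": write_api_key,
            }
            for other, url in nodes.items()
            if other != name
        }
        payloads[name] = {"self": name, "remotes": remotes}
    return payloads


cluster = {
    "node-1": "http://node-1-address:7700",
    "node-2": "http://node-2-address:7700",
    "node-3": "http://node-3-address:7700",
}
payloads = network_payloads(cluster, "your-search-api-key", "your-write-api-key")
print(json.dumps(payloads["node-2"], indent=2))
```

Each payload would then be sent to its own node, so node-2 receives the entry stored under `"node-2"`.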

Step 2: Add documents as usual

You add documents the same way you always have, by sending a request to /indexes/{index_uid}/documents. You can send this request to any node in the cluster. Meilisearch handles the rest under the hood.

curl -X POST 'http://node-1:7700/indexes/movies/documents' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  --data-binary @movies.json
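Since any node accepts writes, a client can spread its indexing requests across the cluster. The following sketch rotates through the nodes in round-robin order. The function name, node URLs, and sample documents are illustrative, and round-robin is only one possible balancing strategy. The requests are built but not sent.

```python
import json
import urllib.request


def add_documents_requests(node_urls, index_uid, batches, api_key):
    """Build one POST request per batch, rotating through the nodes."""
    requests = []
    for position, batch in enumerate(batches):
        node_url = node_urls[position % len(node_urls)]
        requests.append(urllib.request.Request(
            url=f"{node_url}/indexes/{index_uid}/documents",
            data=json.dumps(batch).encode("utf-8"),
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {api_key}",
            },
        ))
    return requests


node_urls = ["http://node-1:7700", "http://node-2:7700", "http://node-3:7700"]
batches = [
    [{"id": 1, "title": "Alien"}],
    [{"id": 2, "title": "Heat"}],
    [{"id": 3, "title": "Ran"}],
    [{"id": 4, "title": "Jaws"}],
]
pending = add_documents_requests(node_urls, "movies", batches, "YOUR_API_KEY")
for request in pending:
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```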

Step 3: Monitor the distributed task

The task response has been enhanced to give you visibility into the distributed operation. The task object now includes a root-level network field.

When you query the task on the original node, you will see the status of the operation on the other shards:

{
  "uid": 123,
  "status": "processing",
  // ... other task fields
  "network": {
    "remoteTasks": {
      "node-2": { "taskUid": 56 },
      "node-3": { "taskUid": 89 }
    }
  }
}
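To follow the operation on every shard, you can read the remote task UIDs from the network field and look each one up on its own node. The sketch below only computes the URLs to poll. It assumes the standard `/tasks/{taskUid}` route on each remote, and the helper name `remote_task_urls` and the node addresses are illustrative. Entries without a `taskUid` are skipped, since the exact shape of a failed remote entry is not described here.

```python
def remote_task_urls(task, remotes):
    """Map each remote node name to the URL of its copy of the task.

    `task` is the task object returned by the original node and `remotes`
    maps node names to base URLs.
    """
    remote_tasks = task.get("network", {}).get("remoteTasks", {})
    return {
        node: f"{remotes[node]}/tasks/{info['taskUid']}"
        for node, info in remote_tasks.items()
        if "taskUid" in info
    }


task = {
    "uid": 123,
    "status": "processing",
    "network": {
        "remoteTasks": {
            "node-2": {"taskUid": 56},
            "node-3": {"taskUid": 89},
        }
    },
}
remotes = {
    "node-2": "http://node-2-address:7700",
    "node-3": "http://node-3-address:7700",
}
urls = remote_task_urls(task, remotes)
for node, url in urls.items():
    print(node, url)
```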

Step 4: Search your distributed index

To search a distributed index, use federated search: it queries every shard and consolidates the results in a single response.

In practice, you make a multi-search request with one query per node. The queries use the same parameters and differ only in federationOptions.remote, which names the node to query. In our example, the multi-search request would look like this:

{ 
  // enables federated search
  "federation": {},
  "queries": [ 
    { 
      "indexUid": "movies", 
      "q": "star wars",
      "federationOptions": { 
        "remote": "node-1" 
      }
    }, 
    { 
      "indexUid": "movies", 
      "q": "star wars",
      "federationOptions": { 
        "remote": "node-2" 
      }
    }, 
    { 
      "indexUid": "movies", 
      "q": "star wars",
      "federationOptions": { 
        "remote": "node-3" 
      }
    }
  ] 
}
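Because the queries differ only in their remote, the request body can be generated from the list of node names. This is a minimal sketch: the function name `federated_query` is illustrative, and the resulting body is meant to be posted to the `/multi-search` route of one node.

```python
import json


def federated_query(index_uid, q, node_names, **search_params):
    """Build a federated multi-search body with one query per node."""
    return {
        "federation": {},  # enables federated search
        "queries": [
            {
                "indexUid": index_uid,
                "q": q,
                **search_params,
                "federationOptions": {"remote": name},
            }
            for name in node_names
        ],
    }


body = federated_query("movies", "star wars", ["node-1", "node-2", "node-3"])
print(json.dumps(body, indent=2))
```

Extra search parameters passed as keyword arguments, such as `limit=5`, are copied into every query so that all shards are searched with the same settings.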

Moving forward

Sharding is a powerful feature that enables horizontal scaling and high availability for your search infrastructure. By distributing your index across multiple nodes, you can handle larger datasets and higher query loads with ease.

We’re looking forward to seeing sharding evolve thanks to early adopters’ feedback. Our roadmap includes supporting dynamic network topology changes, automatic settings propagation, and improving multi-search on sharded nodes.

Meilisearch Cloud users can get started by asking the support team to enable sharding for their projects. Companies interested in self-hosting Meilisearch Enterprise Edition can get in touch with our sales team.
