Scale without limits: introducing sharding & replication in Meilisearch Cloud
Meilisearch Cloud now supports sharding and replication - letting your search infrastructure scale horizontally, stay available during updates, and serve users from the nearest node. Here is what that means and who it is for.

Day 1 of Meilisearch Launch Week. New releases dropping every day - keep reading to the end to see what else is coming.
Every search deployment eventually hits a wall.
Maybe your index has grown to a size where a single instance struggles to keep up. Maybe you have just shipped a product to users across three continents and you are watching latency climb for the users farthest from your instance. Or maybe your on-call team gets paged every time you push a Meilisearch update because there is no failover during the restart window.
These are real, painful problems. And up until now, solving them on Meilisearch Cloud meant contacting us and waiting.
Today, that changes. Sharding and replication are now available on Meilisearch Cloud - letting your search infrastructure grow horizontally, stay available, and serve users from the nearest node, all configured and managed by our team so you do not have to handle the infra yourself.
Two problems, two solutions
Scaling search is not one problem - it is two distinct ones, and they need different answers.
Too much data is a sharding problem. When a single instance no longer has the capacity to index and serve your full dataset at the response times you need, you need to distribute the data across multiple nodes. Each node holds a slice (a "shard") of the full index, and queries fan out across all of them before results are merged.
Too much risk is a replication problem. When you cannot afford for search to go down - even for the ten seconds it takes Meilisearch to restart during an update - you need multiple nodes holding identical copies of your data. If one goes down, traffic shifts to another instantly.
Meilisearch Cloud now handles both.
Sharding: horizontal scaling for large datasets
Sharding distributes your data across multiple Cloud nodes. Each node holds a shard - a portion of the total index - and all nodes work in parallel to handle queries and indexing.
When should you shard? When your dataset has grown beyond what a single instance can handle at acceptable performance. Not sure where that line is? We help you figure it out. Running scale tests together and determining the right shard count before configuring your cluster is part of how we work with teams today.
How queries work: when a search request comes in, it fans out to all shard nodes. Each node searches its slice, returns results, and the leader merges and ranks them before responding. The latency impact is minimal; the parallelism is the point.
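The fan-out-and-merge pattern described above can be sketched in a few lines. This is an illustrative simulation, not Meilisearch's actual implementation: the shard contents, scores, and function names are all hypothetical.

```python
# Illustrative sketch of shard fan-out and merge: each shard ranks its own
# slice, and the leader merges the partial results into one ranked list.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard contents: (document_id, relevance_score) pairs.
SHARDS = [
    [("doc-1", 0.92), ("doc-4", 0.55)],
    [("doc-7", 0.81), ("doc-2", 0.40)],
    [("doc-3", 0.99)],
]

def search_shard(shard, limit):
    """Each shard returns its own top hits, ranked by score."""
    return sorted(shard, key=lambda hit: hit[1], reverse=True)[:limit]

def federated_search(limit=3):
    # Fan out to every shard in parallel, then merge and re-rank globally.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: search_shard(s, limit), SHARDS)
    merged = [hit for partial in partials for hit in partial]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)[:limit]

print(federated_search())
# -> [('doc-3', 0.99), ('doc-1', 0.92), ('doc-7', 0.81)]
```

Each shard only pays the cost of searching its own slice, which is why adding shards keeps latency flat as the dataset grows.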
An important nuance on migrations: sharding does not require a full data migration. When a remote is added, only a partial migration is involved - you are not starting from scratch. This is what allows you to address bigger datasets as you grow, without a forced reindex of everything.
One thing sharding does not solve: availability. If one shard node goes down, you lose access to part of your data. For that, you need replication - or a combination of both.
Replication: high read availability and geo-distribution
Replication keeps multiple nodes in sync with the same complete dataset. When one goes down, the others keep serving reads seamlessly.
Architecture: Meilisearch Cloud uses a static leader-based model. Configuration changes go to the leader node, which proxies them automatically to all remotes. This keeps the cluster consistent and predictable.
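For a sense of what the leader's view of its remotes looks like, here is a configuration fragment in the shape of Meilisearch's experimental network API. The instance names, URL, and key below are placeholders, and on Meilisearch Cloud this is configured for you rather than set by hand:

```json
{
  "self": "ms-00",
  "remotes": {
    "ms-01": {
      "url": "https://ms-01.example.meilisearch.io",
      "searchApiKey": "<search-api-key-for-ms-01>"
    }
  }
}
```

Because all configuration changes flow through the leader and are proxied to every remote, the cluster cannot drift into a state where nodes disagree about its topology.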
A note on scope: replication today covers high read availability. Write high availability - eliminating the leader as a single point of failure for writes - is not included in this release.
Three use cases replication covers:
1. High read availability - failover during updates
When Meilisearch restarts during an upgrade, the window is short (typically around ten seconds), but it is enough to interrupt user-facing search. With replication, traffic shifts to another node automatically. Users keep getting results throughout.
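The failover behavior is conceptually simple: if the first replica does not answer, traffic moves to the next one. Here is a minimal client-side sketch with hypothetical node URLs - on Meilisearch Cloud the routing layer handles this for you:

```python
# Minimal failover sketch: try each replica in order and fall back
# when one is unavailable (e.g. restarting during an upgrade).
NODES = ["https://replica-1.example.com", "https://replica-2.example.com"]

class NodeDown(Exception):
    pass

def query_node(url, q, down):
    """Stand-in for an HTTP search request; raises if the node is offline."""
    if url in down:
        raise NodeDown(url)
    return {"node": url, "hits": [q]}

def search_with_failover(q, down=frozenset()):
    for url in NODES:
        try:
            return query_node(url, q, down)
        except NodeDown:
            continue  # shift traffic to the next replica
    raise RuntimeError("all replicas unavailable")

# With replica-1 offline, the query is served by replica-2:
print(search_with_failover("chair", down={NODES[0]})["node"])
```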
2. Geo-replication - serve users from the nearest instance
Distribute nodes across regions and route each user to the one that will respond fastest. This is live for replication-only setups today. Optimal geo-routing with sharding combined is on the roadmap.
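The routing decision itself reduces to "pick the replica with the lowest latency for this user." A toy sketch with made-up region names and measurements - real deployments typically do this with DNS-based or anycast routing rather than application code:

```python
# Hypothetical latency measurements from one user's location (ms).
REPLICA_LATENCY_MS = {
    "eu-west": 12,
    "us-east": 95,
    "ap-south": 210,
}

def nearest_replica(latencies):
    """Route the user to the region that responds fastest."""
    return min(latencies, key=latencies.get)

print(nearest_replica(REPLICA_LATENCY_MS))  # -> eu-west
```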
3. The combination - scale and availability together
Sharding and replication are fully composable. If your dataset is too large for a single instance and you need high read availability, you can run both simultaneously.
Composability: running both at once
One of the design principles behind this release is that sharding and replication are not mutually exclusive modes - they are composable building blocks.
Here is what that looks like in practice:
- Sharding only: split a large dataset across multiple nodes, each holding a fraction of the total index. Scales query and indexing throughput horizontally.
- Replication only: two or three nodes all holding the same data. High read availability, automatic failover, geo-routing.
- Sharding and replication: 5 shards x 2 replicas = 10 nodes. Every shard is replicated. You get scale and availability.
Billing follows the same logic: you pay for the number of nodes you run. Two replicas means twice the machines, roughly twice the cost - the same model you would see from most cloud infrastructure providers.
What this looks like in practice
One enterprise team we worked with recently had reached a point where their index had grown past 100 million documents. A single instance was no longer meeting their performance targets.
Together, we ran a scale test to identify where a single node saturated. Based on those results, we determined the right number of shards - more than our initial estimate, once we saw the actual load profile. We configured the cluster, validated performance, and they were live.
For monitoring, they set up webhooks to track indexing task success across all nodes. When a task failed on one instance, their system detected it and knew to requeue. A single webhook server receives events from every node, giving a natural aggregate view of the whole cluster's health without a custom aggregation layer.
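A receiver like the one described above can be very small. The sketch below shows the core logic - the event field names (`node`, `taskUid`, `status`) are assumptions for illustration, not the exact payload Meilisearch webhooks deliver:

```python
# Sketch of a webhook receiver that aggregates task events from every
# node in the cluster and flags failed indexing tasks for requeueing.
import json

REQUEUE = []  # (node, task_uid) pairs to retry

def handle_task_event(raw_body: bytes) -> str:
    """Process one webhook delivery from any node in the cluster."""
    event = json.loads(raw_body)
    if event.get("status") == "failed":
        # Record which node and task failed so the task can be resent.
        REQUEUE.append((event["node"], event["taskUid"]))
    return event.get("status")

# One receiver sees events from every node - an aggregate cluster view:
handle_task_event(b'{"node": "shard-1", "taskUid": 42, "status": "succeeded"}')
handle_task_event(b'{"node": "shard-3", "taskUid": 43, "status": "failed"}')
print(REQUEUE)  # -> [('shard-3', 43)]
```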
This kind of setup - 100M+ documents distributed across multiple shards, monitored via webhooks - is exactly what our team configures for enterprise customers today. If your needs look similar, that is a good reason to get in touch.
Who is this for?
You need sharding if:
- Your dataset is large enough that a single instance is noticeably slow, or you are approaching index size limits
- You need more indexing throughput than one node can provide
- You are planning for data growth and want headroom without a forced migration later
You need replication if:
- You have an uptime requirement that cannot tolerate even brief search interruptions during updates
- Your users are distributed geographically and latency varies noticeably by region
- You want a reliable failover safety net for production
You need both if:
- You have a large dataset and a high read availability requirement
- You are running an enterprise-grade, always-on search deployment
If you are not sure, starting with a single Cloud instance is completely fine. When the time comes, we will help you figure out the right architecture and handle the setup for you.
What is coming next
This release is the foundation. Here is what is already in progress:
- True optimal geo-replication - combining sharding and replication for globally distributed routing, so each user always hits the nearest node regardless of how the data is sharded
- Self-serve replication UI - configure replication directly from the Cloud dashboard, no need to contact us
- Improved cluster observability - an aggregated task view across all nodes in your cluster, so you can monitor the health of your sharded or replicated setup without tracking each remote individually
Part of something bigger
This is Day 1 of Meilisearch Launch Week. Every day this week, we are shipping something new - from features that change how you configure and monitor Meilisearch Cloud, to improvements deeper in the engine. Stay tuned.
Ready to scale?
Sharding and replication on Meilisearch Cloud are available today as an enterprise add-on, configured by our team based on your specific setup and scale requirements. If your team is hitting data volume or availability limits, get in touch and we will help you figure out the right architecture.
Have questions about how sharding or replication works under the hood? Drop by our Discord or check out the documentation.


