Optimize indexing performance by analyzing batch statistics
Indexing performance can vary significantly depending on your dataset, index settings, and hardware. The batch object provides information about the progress of asynchronous indexing operations. The progressTrace field within the batch object offers a detailed breakdown of where time is spent during the indexing process. Use this data to identify bottlenecks and improve indexing speed.
Understanding the progressTrace
progressTrace is a hierarchical trace showing each phase of indexing and how long it took.
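Each entry pairs a path of step names, nested from the outermost step to the innermost subtask and separated by >, with the time spent in that step. For example, an entry for the extracting word proximity subtask of the indexing step with a duration of 33.71s tells you: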
- The step occurred during indexing.
- The subtask was extracting word proximity.
- It took 33.71 seconds.
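To find the phases worth optimizing, it can help to rank the trace entries by duration. The following is a minimal sketch, assuming progressTrace is a flat JSON object whose keys are step paths separated by > and whose values are duration strings such as 33.71s. The accepted unit suffixes, the helper names, and the example trace are assumptions made for illustration, not output from a real batch.

```python
import re

# Assumed duration suffixes and their size in seconds; adjust if your
# Meilisearch version formats durations differently.
_UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_DURATION = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(ns|µs|us|ms|s)\s*$")


def parse_duration(text):
    """Convert a duration string such as '33.71s' into seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def slowest_steps(progress_trace, limit=5):
    """Return (path, seconds) pairs sorted from slowest to fastest."""
    rows = []
    for key, duration in progress_trace.items():
        # Split 'a > b > c' into ('a', 'b', 'c').
        path = tuple(part.strip() for part in key.split(">"))
        rows.append((path, parse_duration(duration)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Hypothetical trace for illustration only; not output from a real batch.
example_trace = {
    "processing tasks > indexing > extracting word proximity": "33.71s",
    "processing tasks > indexing > extracting words": "12.50s",
    "processing tasks > indexing > waiting for database writes": "850.00ms",
}

for path, seconds in slowest_steps(example_trace):
    print(f"{seconds:8.2f}s  {path[-1]}")
```

If a trace also contains entries for parent steps, their durations likely include the time of their subtasks, so compare entries at the same depth when ranking.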
Key phases and how to optimize them
computing document changes and extracting documents
| Description | Optimization |
|---|---|
| Meilisearch compares incoming documents to existing ones. | No direct optimization possible. Process duration scales with the number and size of incoming documents. |
extracting facets and merging facet caches
| Description | Optimization |
|---|---|
| Extracts and merges filterable attributes. | Keep the number of filterable attributes to a minimum. |
extracting words and merging word caches
| Description | Optimization |
|---|---|
| Tokenizes text and builds the inverted index. | Ensure the searchable attributes list only includes the fields you want to be checked for query word matches. |
extracting word proximity and merging word proximity
| Description | Optimization |
|---|---|
| Builds data structures for phrase and attribute ranking. | Lower the precision of this operation by setting proximityPrecision to byAttribute. |
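One way to apply this setting is through the index settings endpoint. The sketch below only builds the HTTP request with Python's standard library and does not send it. The instance URL, the index name movies, and the API key are placeholders, and the PATCH /indexes/{index_uid}/settings route and the proximityPrecision key are stated as assumptions to check against the API reference for your Meilisearch version.

```python
import json
import urllib.request


def proximity_precision_request(base_url, index_uid, api_key, precision="byAttribute"):
    """Build (but do not send) a settings update for proximity precision."""
    if precision not in ("byWord", "byAttribute"):
        raise ValueError(f"unsupported precision: {precision!r}")
    body = json.dumps({"proximityPrecision": precision}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/indexes/{index_uid}/settings",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder values; replace with your own instance, index, and key.
request = proximity_precision_request("http://localhost:7700", "movies", "MASTER_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Send with urllib.request.urlopen(request) once the values are real.
```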
waiting for database writes
| Description | Optimization |
|---|---|
| Time spent writing data to disk. | No direct optimization possible. Either the disk is too slow or you are writing too much data in a single operation. Avoid HDDs (hard disk drives). |
waiting for extractors
| Description | Optimization |
|---|---|
| Time spent waiting for CPU-bound extraction. | No direct optimization possible. Indicates a CPU bottleneck. Use more cores or scale horizontally with sharding. |
post processing facets > strings bulk / numbers bulk
| Description | Optimization |
|---|---|
| Processes equality or comparison filters. | Disable unused filter features, such as comparison operators on string values. Reduce the number of sortable attributes. |
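Disabling unused filter features can be expressed per attribute in the settings payload. The following is a sketch of such a payload, assuming your Meilisearch version accepts the object form of filterableAttributes with attributePatterns and features fields. The attribute names genre and release_year and the helper function are hypothetical.

```python
import json


def filterable_attribute(pattern, *, equality=True, comparison=False, facet_search=False):
    """Describe one filterable attribute with only the needed features on."""
    return {
        "attributePatterns": [pattern],
        "features": {
            "facetSearch": facet_search,
            "filter": {"equality": equality, "comparison": comparison},
        },
    }


# Hypothetical attributes: 'genre' only needs equality filters, while
# 'release_year' also needs comparison operators such as > and <.
settings = {
    "filterableAttributes": [
        filterable_attribute("genre"),
        filterable_attribute("release_year", comparison=True),
    ],
    # Keep this list as short as your sorting needs allow.
    "sortableAttributes": ["release_year"],
}

print(json.dumps(settings, indent=2))
```

The printed JSON is the body you would send as an index settings update.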
post processing facets > facet search
| Description | Optimization |
|---|---|
| Builds structures for the facet search API. | If you don’t use the facet search API, disable it. |
Embeddings
| Trace key | Description | Optimization |
|---|---|---|
| writing embeddings to database | Time spent saving vector embeddings. | Use embedding vectors with fewer dimensions. Disable embedding regeneration on document update. Consider enabling binary quantization. |
post processing words > word prefix *
| Description | Optimization |
|---|---|
| Builds prefix data for autocomplete. Allows matching documents that begin with a specific query term, instead of only exact matches. | Disable prefix search (prefixSearch: disabled). This can severely impact search result relevancy. |
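To see why disabling prefix search affects relevancy, consider a toy model of the difference between prefix matching and exact matching. This is an illustration of the concept only, not how Meilisearch implements search; the function and the sample documents are hypothetical.

```python
def matching_docs(docs, term, prefix_search=True):
    """Return ids of docs containing a word that matches `term`."""
    hits = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        if prefix_search:
            # Prefix search: any word starting with the term counts.
            matched = any(word.startswith(term) for word in words)
        else:
            # No prefix search: the term must equal a whole word.
            matched = term in words
        if matched:
            hits.append(doc_id)
    return hits


# Hypothetical documents for illustration.
docs = {1: "Star Wars", 2: "Stargate", 3: "A Star Is Born"}

print(matching_docs(docs, "star"))                       # [1, 2, 3]
print(matching_docs(docs, "star", prefix_search=False))  # [1, 3]
```

With prefix search off, a user who has typed only part of a word no longer matches documents containing the longer word, which is the trade-off to weigh against the indexing time saved.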
post processing words > word fst
| Description | Optimization |
|---|---|
| Builds the word FST (finite state transducer). | No direct action possible, as the FST size reflects the number of distinct words in the database. Using documents with fewer searchable words may improve operation speed. |