Choosing an embedding model is not just about quality. Cost, indexing speed, search latency, dimensions, and domain specialization all matter. In most cases, a smaller, cheaper model will serve you better than the largest available option.

Available providers

Meilisearch supports a wide range of embedding providers, each with different models, pricing, and strengths:
| Provider | Models | Strengths |
|---|---|---|
| OpenAI | text-embedding-3-small, text-embedding-3-large | Straightforward setup, good general quality |
| Cohere | embed-v4.0, embed-english-v3.0, embed-multilingual-v3.0 | Latest v4 supports text and images, strong multilingual |
| Voyage AI | voyage-4, voyage-4-lite, voyage-4-large | High quality, flexible dimensions, domain-specific models |
| Jina | jina-embeddings-v4, jina-embeddings-v5-text-small/nano | v4 supports text, images, and PDFs, 32K context |
| Mistral | mistral-embed | Good for existing Mistral users |
| Google Gemini | gemini-embedding-001 | High dimensions (3072), Google ecosystem |
| Cloudflare | bge-small/base/large, embeddinggemma, qwen3 | Edge network, low latency, free tier |
| AWS Bedrock | Titan v2, Nova, Cohere Embed v4 on Bedrock | AWS ecosystem, multimodal options |
| HuggingFace (local) | Any compatible model | No API costs, full control |
| HuggingFace Inference | bge, MiniLM, mpnet, multilingual-e5, and more | Scalable open-source models, hundreds available |

Smaller models are often better

Bigger is not always better. In a hybrid search setup, Meilisearch merges keyword results and semantic results into a single ranked list, weighted by the semanticRatio search parameter. Full-text search already handles exact matches very well, so the semantic side only needs to capture general meaning, not every nuance. This means a small, fast embedding model is often enough: the quality difference between a 384-dimension model and a 3072-dimension model is rarely worth the extra cost and latency when the keyword side is already covering precise queries.

Prioritize cheaper, faster models unless you have a specific reason to need more dimensions or higher embedding quality. Models like text-embedding-3-small, voyage-4-lite, jina-embeddings-v5-text-nano, or embed-english-light-v3.0 are excellent starting points.
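For example, a minimal embedder configuration using a small OpenAI model might look like this, sent to the embedder settings endpoint (PATCH /indexes/{index_uid}/settings/embedders). The embedder name default, the API key, and the document fields in the template are placeholders:

```json
{
  "default": {
    "source": "openAi",
    "model": "text-embedding-3-small",
    "apiKey": "<YOUR_OPENAI_API_KEY>",
    "documentTemplate": "A document titled {{doc.title}}: {{doc.description}}"
  }
}
```

Keeping documentTemplate short also reduces the number of tokens sent to the provider, which lowers indexing cost.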

What to look for

Cost and rate limits

Embedding providers charge per token or per request. For large datasets, embedding costs add up during indexing. Consider:
  • Free tiers: Cloudflare Workers AI and local HuggingFace models have no per-request cost
  • Rate limits: free-tier accounts on paid providers may slow down indexing significantly. Meilisearch handles retries automatically, but higher tiers index faster
  • Re-indexing: Meilisearch caches embeddings and only re-generates them when document content changes, reducing ongoing costs

Dimensions

Lower-dimension models are faster to index, use less memory, and produce faster searches. Higher dimensions can capture more semantic nuance but with diminishing returns.
| Dimensions | Trade-off |
|---|---|
| 384 | Fast, low memory, good for most use cases |
| 768-1024 | Balanced quality and performance |
| 1536-3072 | Higher quality, slower, more memory |
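Some providers let you request truncated embeddings from a larger model instead of switching models entirely. As a sketch, assuming an OpenAI embedder, the dimensions setting asks the provider for shorter vectors (the embedder name and key are placeholders):

```json
{
  "default": {
    "source": "openAi",
    "model": "text-embedding-3-large",
    "dimensions": 1024,
    "apiKey": "<YOUR_OPENAI_API_KEY>"
  }
}
```

This keeps much of the larger model's quality while shrinking the index and speeding up vector search.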

Domain specialization

Some providers offer models specialized for specific domains:
  • Legal, medical, financial: check if your provider has domain-specific models or fine-tuned variants
  • Multilingual: if your content is not in English, choose a model with explicit multilingual support (Cohere’s multilingual models, Jina v3/v5, or multilingual BGE models)
  • Code: some models are optimized for code search
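For multilingual content, one low-cost option is running an open multilingual model locally. A minimal sketch, assuming the huggingFace source and the intfloat/multilingual-e5-small model:

```json
{
  "default": {
    "source": "huggingFace",
    "model": "intfloat/multilingual-e5-small"
  }
}
```

Note that e5 models were trained with "query:" and "passage:" prefixes, so you may want to reflect that in your documentTemplate for best results.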

Indexing speed

Embedding generation is the main bottleneck during indexing. Two factors affect speed:
  • API latency: cloud providers add network round-trip time per batch. Providers with edge networks (Cloudflare) or regional endpoints (Bedrock) can be faster
  • Model size: larger models take longer to compute embeddings, even on the provider side

Maximize performance with composite embedders

If you need the best possible indexing speed and search latency, consider using a composite embedder. This lets you configure different embedders for indexing and for search queries:
  • Indexing: use a cloud provider (Cloudflare Workers AI, HuggingFace Inference Endpoints, or any REST API) to generate high-quality embeddings at scale without impacting your Meilisearch server
  • Search: use a local HuggingFace model (like BAAI/bge-small-en-v1.5) running inside Meilisearch for near-instant query embedding with zero API latency
This combination gives you the throughput of a cloud API for indexing with the speed of a local model at search time. Both embedders must produce compatible embeddings: in practice, the same model (and therefore the same number of dimensions) served in two different ways.
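At the time of writing, composite embedders are an experimental feature that must be enabled before use. A hedged sketch, assuming a HuggingFace Inference Endpoint serving BAAI/bge-small-en-v1.5 behind a hypothetical URL — the request and response templates shown here depend on your endpoint's actual API and may need adjusting:

```json
{
  "default": {
    "source": "composite",
    "searchEmbedder": {
      "source": "huggingFace",
      "model": "BAAI/bge-small-en-v1.5"
    },
    "indexingEmbedder": {
      "source": "rest",
      "url": "https://<your-inference-endpoint>/embed",
      "request": { "inputs": ["{{text}}", "{{..}}"] },
      "response": ["{{embedding}}", "{{..}}"]
    }
  }
}
```

Because both embedders serve the same model, queries and documents land in the same vector space.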

User-provided embeddings

If you work with non-textual content (images, audio) or already generate embeddings in your pipeline, you can supply pre-computed vectors directly. See search with user-provided embeddings.
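As a toy sketch with 3-dimensional vectors (real models produce hundreds of dimensions), declare a userProvided embedder with a fixed dimension count:

```json
{
  "default": {
    "source": "userProvided",
    "dimensions": 3
  }
}
```

then attach a vector to each document under the _vectors field, keyed by embedder name:

```json
{
  "id": 1,
  "title": "Example document",
  "_vectors": { "default": [0.12, -0.07, 0.33] }
}
```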

Decision flowchart