Vector search
Vector search is an experimental technology that uses Large Language Models to retrieve search results based on the meaning and context of a query.
This feature can improve search relevancy for queries that do not match keywords in your dataset, allow your users to search images and other non-textual media, suggest related products in webshops, and create conversational search interfaces.
Vector search is available to all users. Meilisearch Cloud is the recommended way of using vector search.
Using vector search
Activate vector search
If using Meilisearch Cloud, navigate to your project overview and find "Experimental features", then check the "vector store" box.
Alternatively, use the /experimental-features route to activate vector search during runtime:
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"vectorStore": true
}'
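You can check that the feature is enabled by querying the same route with a GET request (shown here against a default local instance):
curl \
-X GET 'http://localhost:7700/experimental-features/'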
Meilisearch Cloud AI-powered search waitlist
To ensure proper scaling of Meilisearch Cloud's latest AI-powered search offering, you must join the waitlist before activating vector search. You will not be able to activate vector search in the Cloud interface or via the /experimental-features route until your sign-up has been approved.
Generate vector embeddings
To use vector search, first configure the embedders index setting. You may configure multiple embedders for an index.
Embedders generate vector data from your documents. Meilisearch natively supports the following embedders:
- OpenAI
- Hugging Face
- Ollama
- any embedder with a REST interface
It is also possible to supply custom embeddings. In this case, you must generate the embeddings manually and include them as a field in your documents.
Generate auto-embeddings
Use the embedders index setting of the update /settings endpoint to configure one or more embedders for an index:
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"default": {
"source": "openAi",
"apiKey": "anOpenAiApiKey",
"model": "text-embedding-3-small",
"documentTemplate": "A movie titled {{doc.title}} whose description starts with {{doc.overview|truncatewords: 20}}",
"dimensions": 1536
}
}
}'
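Like other settings updates, this request is processed asynchronously and returns a summarized task object. If you want to confirm the embedder was applied, you can poll the tasks endpoint with the returned taskUid (the uid below is only an example):
curl \
-X GET 'http://localhost:7700/tasks/1'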
It is mandatory to pass an OpenAI API key through the OPENAI_API_KEY environment variable or the apiKey field when using an OpenAI embedder. Generate an API key from your OpenAI account. Use tier 2 keys or above for optimal performance.
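For example, on a self-hosted instance and with placeholder values, the environment variable can be set when launching Meilisearch:
OPENAI_API_KEY=anOpenAiApiKey ./meilisearch --master-key aMasterKey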
model is a mandatory field indicating a compatible OpenAI model. Meilisearch supports the following OpenAI models: text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large.
documentTemplate is an optional field. Use it to customize the data you send to the embedder. It is highly recommended you configure a custom template for your documents.
dimensions is an optional field. It must be greater than zero and lower than or equal to the chosen model's number of dimensions. This option is not compatible with text-embedding-ada-002. Fewer dimensions may improve performance, but lead to lower search result accuracy.
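As a sketch, the embedder configured earlier could instead request fewer dimensions than the model's default of 1536 (the value 512 below is only an illustration):
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"default": {
"source": "openAi",
"apiKey": "anOpenAiApiKey",
"model": "text-embedding-3-small",
"dimensions": 512
}
}
}'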
`documentTemplate` usage
documentTemplate must be a Liquid template. Use {{ doc.attribute }} to access the attribute field value of your documents. Meilisearch also exposes a {{ fields }} array containing one object per document field, which you may access with {{ field.name }} and {{ field.value }}.
Any field you refer to in this way must exist in all documents or an error will be raised at indexing time.
For best results, use short strings indicating the type of document in that index, only include highly relevant document fields, and truncate long fields.
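For illustration, assuming documents with title and overview fields, the template from the earlier example could also be written against the fields array rather than naming attributes directly (a sketch using a standard Liquid for loop; adapt the wording and fields to your own schema):
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"default": {
"source": "openAi",
"apiKey": "anOpenAiApiKey",
"model": "text-embedding-3-small",
"documentTemplate": "A movie. {% for field in fields %}{{ field.name }}: {{ field.value }}. {% endfor %}"
}
}
}'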
Generate custom embeddings manually
You may also provide custom embeddings. In this case, you must manually update your embeddings when adding, updating, and removing documents from your index.
Configure the embedders index setting:
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"image2text": {
"source": "userProvided",
"dimensions": 3
}
}
}'
Then, use the /documents endpoint to upload vectorized documents. Store vector data in your documents' _vectors field:
curl -X POST -H 'content-type: application/json' \
'localhost:7700/indexes/products/documents' \
--data-binary '[
{ "id": 0, "_vectors": {"image2text": [0, 0.8, -0.2]}, "text": "frying pan" },
{ "id": 1, "_vectors": {"image2text": [1, -0.2, 0]}, "text": "baking dish" }
]'
Distribution shift
For mathematical reasons, the _rankingScore of semantic search results tends to be closely grouped around an average value that depends on the embedder and model used. This may result in relevant semantic hits being underrepresented and irrelevant semantic hits being overrepresented compared with keyword search hits.
Use distribution when configuring an embedder to correct the returned _rankingScores of semantic hits with an affine transformation:
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"default": {
"source": "huggingFace",
"model": "MODEL_NAME",
"distribution": {
"mean": 0.7,
"sigma": 0.3
}
}
}
}'
Configuring distribution requires a certain amount of trial and error, in which you must perform semantic searches and monitor the results. Based on their _rankingScores and relevancy, add the observed mean and sigma values for that index.
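One way to observe these scores is to run test searches with the showRankingScore search parameter enabled, together with the hybrid parameter described further down this page, and compare the _rankingScores of hits you consider very relevant, somewhat relevant, and irrelevant (a sketch against the movies index and default embedder configured above; the query text is only an example):
curl -X POST -H 'content-type: application/json' \
'localhost:7700/indexes/movies/search' \
--data-binary '{
"q": "a heartwarming family adventure",
"hybrid": { "semanticRatio": 1, "embedder": "default" },
"showRankingScore": true
}'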
distribution is an optional field compatible with all embedder sources. It must be an object with two fields:
- mean: a number between 0 and 1 indicating the semantic score of "somewhat relevant" hits before using the distribution setting
- sigma: a number between 0 and 1 indicating the average absolute difference in _rankingScores between "very relevant" hits and "somewhat relevant" hits, and between "somewhat relevant" hits and "irrelevant" hits
Changing distribution does not trigger a reindexing operation.
Vector search with auto-embeddings
Perform searches with q and hybrid to retrieve search results using both keyword and semantic search:
curl -X POST -H 'content-type: application/json' \
'localhost:7700/indexes/products/search' \
--data-binary '{
"q": "kitchen utensils",
"hybrid": {
"semanticRatio": 0.9,
"embedder": "default"
}
}'
hybrid is an object and accepts two fields:
- semanticRatio: a number between 0 and 1. 0 indicates full keyword search, 1 indicates full semantic search. Defaults to 0.5
- embedder: a string indicating one of the embedders configured for the queried index. Defaults to "default"
hybrid can be used together with other search parameters, including filter and sort:
curl -X POST -H 'content-type: application/json' \
'localhost:7700/indexes/products/search' \
--data-binary '{
"q": "kitchen utensils",
"hybrid": {
"semanticRatio": 0.9,
"embedder": "default"
},
"filter": "price < 10",
"sort": ["price:asc"]
}'
Vector search with user-provided embeddings
Use the vector search parameter to perform vector searches:
curl -X POST -H 'content-type: application/json' \
'localhost:7700/indexes/products/search' \
--data-binary '{ "vector": [0, 1, 2] }'
vector must be an array of numbers indicating the search vector. You must generate these yourself when using vector search with user-provided embeddings.
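How you produce these vectors is up to you, as long as their dimensions match the embedder's configuration. For illustration only, if your custom embeddings came from OpenAI, a query vector could be generated with a request like the following and then passed to Meilisearch as the vector parameter (assumes an OpenAI API key; any other embedding pipeline works the same way):
curl 'https://api.openai.com/v1/embeddings' \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H 'Content-Type: application/json' \
--data-binary '{
"input": "kitchen utensils",
"model": "text-embedding-3-small"
}'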
vector can be used together with other search parameters, including filter and sort:
curl -X POST -H 'content-type: application/json' \
'localhost:7700/indexes/products/search' \
--data-binary '{
"vector": [0, 1, 2],
"filter": "price < 10",
"sort": ["price:asc"]
}'
Other resources
Check out the Meilisearch blog post for a guide on implementing semantic search with LangChain.
Deactivate vector search
Manually remove all embedder configuration from your index:
curl \
-X DELETE 'http://localhost:7700/indexes/movies/settings/embedders' \
-H 'Content-Type: application/json'
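You can verify that no embedders remain by sending a GET request to the same route (assuming the same index as above):
curl \
-X GET 'http://localhost:7700/indexes/movies/settings/embedders'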
WARNING
If you don't remove all embedders, Meilisearch will continue auto-generating embeddings for your documents. This will happen even if vectorStore has been set to false and may lead to unexpected expenses when using OpenAI's paid tiers.
If using Meilisearch Cloud, navigate to your project overview and find "Experimental features", then uncheck the "vector store" box.
Alternatively, use the /experimental-features route:
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"vectorStore": false
}'
More information
Consult the feature discussion on GitHub for the latest information on using vector search with Meilisearch. This feature is undergoing active development and any feedback you might have is welcome.