Vector search

    Vector search is an experimental technology that uses large language models to retrieve search results based on the meaning and context of a query.

    This feature can improve search relevancy for queries that do not match keywords in your dataset, allow your users to search images and other non-textual media, suggest related products in webshops, and create conversational search interfaces.

    Vector search is available to all users. Meilisearch Cloud is the recommended way of using vector search.

    If using Meilisearch Cloud, navigate to your project overview and find "Experimental features", then check the "vector store" box.

    [Screenshot: a section of the project overview interface titled "Experimental features", with two options: "Score details" and "Vector store". "Vector store" is turned on.]

    Alternatively, use the /experimental-features route to activate vector search during runtime:

    curl \
      -X PATCH 'http://localhost:7700/experimental-features/' \
      -H 'Content-Type: application/json'  \
      --data-binary '{
        "vectorStore": true
      }'
    
    Meilisearch Cloud AI-powered search waitlist

    To ensure proper scaling of Meilisearch Cloud's latest AI-powered search offering, you must join the waitlist before activating vector search. You will not be able to activate vector search in the Cloud interface or via the /experimental-features route until your sign-up has been approved.

    Generate vector embeddings

    To use vector search, first configure the embedders index setting. You may configure multiple embedders for an index.

    Embedders generate vector data from your documents. Meilisearch natively supports generating embeddings with the openAi and huggingFace embedder sources.

    It is also possible to supply custom embeddings. In this case, you must generate the embeddings manually and include them as a field in your documents.

    Generate auto-embeddings

    Use the embedders index setting of the update settings endpoint to configure one or more embedders for an index:

    curl \
      -X PATCH 'http://localhost:7700/indexes/movies/settings' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "embedders": {
          "default": {
            "source":  "openAi",
            "apiKey": "anOpenAiApiKey",
            "model": "text-embedding-3-small",
            "documentTemplate": "A movie titled {{doc.title}} whose description starts with {{doc.overview|truncatewords: 20}}",
            "dimensions": 1536
          }
        }
      }'
    

    When using an OpenAI embedder, you must pass an OpenAI API key through the OPENAI_API_KEY environment variable or the apiKey field. Generate an API key from your OpenAI account. Use tier 2 keys or above for optimal performance.

    model is a mandatory field indicating a compatible OpenAI model. Meilisearch supports the following OpenAI models: text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large.

    documentTemplate is an optional field. Use it to customize the data you send to the embedder. It is highly recommended you configure a custom template for your documents.

    dimensions is an optional field. It must be larger than zero and smaller than or equal to the chosen model's number of dimensions. This option is not compatible with text-embedding-ada-002. Fewer dimensions may improve performance, but lead to lower search result accuracy.

    documentTemplate usage

    documentTemplate must be a Liquid template. Use {{ doc.attribute }} to access the attribute field value of your documents. Meilisearch also exposes a {{ fields }} array containing one object per document field, which you may access with {{ field.name }} and {{ field.value }}.

    Any field you refer to in this way must exist in all documents or an error will be raised at indexing time.

    For best results, use short strings indicating the type of document in that index, only include highly relevant document fields, and truncate long fields.
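As an illustration of the fields array, a hypothetical fallback template could iterate over every document field to build a generic description. This is a sketch for demonstration only; for production, prefer a handwritten template that names only the relevant fields, as in the configuration example above:

```liquid
A movie. {% for field in fields %}{{ field.name }}: {{ field.value }}. {% endfor %}
```

A template like this works for any document shape, but it violates the "only include highly relevant fields" guideline, so treat it as a starting point rather than a recommendation.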

    Generate custom embeddings manually

    You may also provide custom embeddings. In this case, you must manually update your embeddings whenever you add, update, or remove documents in your index.

    Configure the embedder index setting:

    curl \
      -X PATCH 'http://localhost:7700/indexes/products/settings' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "embedders": {
          "image2text": {
            "source":  "userProvided",
            "dimensions": 3
          }
        }
      }'
    

    Then, use the /documents endpoint to upload vectorized documents. Store vector data in your documents' _vectors field:

    curl -X POST -H 'content-type: application/json' \
    'localhost:7700/indexes/products/documents' \
    --data-binary '[
        { "id": 0, "_vectors": {"image2text": [0, 0.8, -0.2]}, "text": "frying pan" },
        { "id": 1, "_vectors": {"image2text": [1, -0.2, 0]}, "text": "baking dish" }
    ]'
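Before uploading, it can be useful to check that every vector matches the dimensions declared for its embedder, since Meilisearch expects each vector to have exactly the configured number of dimensions. The helper below is an illustrative sketch, not part of any official client library:

```python
# Illustrative helper: verify that each document's _vectors entries match
# the dimensions configured for the corresponding embedder.
EMBEDDER_DIMENSIONS = {"image2text": 3}  # mirrors the settings payload above

def validate_vectors(documents, embedder_dimensions=EMBEDDER_DIMENSIONS):
    """Return a list of (document id, embedder name, problem) tuples."""
    problems = []
    for doc in documents:
        for name, vector in doc.get("_vectors", {}).items():
            expected = embedder_dimensions.get(name)
            if expected is None:
                problems.append((doc["id"], name, "unknown embedder"))
            elif len(vector) != expected:
                problems.append(
                    (doc["id"], name, f"expected {expected} dimensions, got {len(vector)}")
                )
    return problems

documents = [
    {"id": 0, "_vectors": {"image2text": [0, 0.8, -0.2]}, "text": "frying pan"},
    {"id": 1, "_vectors": {"image2text": [1, -0.2, 0]}, "text": "baking dish"},
]
assert validate_vectors(documents) == []  # both documents are well-formed
```

Running a check like this before every upload catches dimension mismatches early, before they surface as indexing errors.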
    

    Distribution shift

    For mathematical reasons, the _rankingScores of semantic search results tend to be closely grouped around an average value that depends on the embedder and model used. This may result in relevant semantic hits being underrepresented and irrelevant semantic hits being overrepresented compared with keyword search hits.

    Use distribution when configuring an embedder to correct the returned _rankingScores of the semantic hits with an affine transformation:

    curl \
      -X PATCH 'http://localhost:7700/indexes/movies/settings' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "embedders": {
          "default": {
            "source":  "huggingFace",
            "model": "MODEL_NAME",
            "distribution": {
              "mean": 0.7,
              "sigma": 0.3
            }
          }
        }
      }'
    

    Configuring distribution requires a certain amount of trial and error, in which you must perform semantic searches and monitor the results. Based on the returned _rankingScores and their relevancy, set the mean and sigma values you observe for that index.

    distribution is an optional field compatible with all embedder sources. It must be an object with two fields: mean, a number between 0 and 1 indicating the observed average score, and sigma, a number between 0 and 1 indicating the observed spread of scores.

    Changing distribution does not trigger a reindexing operation.
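To build intuition for what the correction does, the sketch below applies an affine rescaling that re-centers scores drawn from the observed distribution (mean 0.7, sigma 0.3, as in the example above) around 0.5. This illustrates the principle only; it is not Meilisearch's exact internal formula:

```python
def rescale_score(score, observed_mean=0.7, observed_sigma=0.3,
                  target_mean=0.5, target_sigma=0.5):
    """Affinely map a ranking score from the observed distribution toward
    the target one, clamped to the valid [0, 1] score range.
    Illustrative only: Meilisearch's internal transformation may differ."""
    rescaled = target_mean + (score - observed_mean) * (target_sigma / observed_sigma)
    return max(0.0, min(1.0, rescaled))

# A score sitting exactly at the observed mean lands on the target mean...
assert rescale_score(0.7) == 0.5
# ...while scores above the mean are spread further up the scale.
assert round(rescale_score(0.85), 6) == 0.75
```

The effect is that tightly clustered semantic scores are spread back out, making them easier to compare against keyword search scores.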

    Vector search with auto-embeddings

    Perform searches with q and hybrid to retrieve search results using both keyword and semantic search:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{
        "q": "kitchen utensils",
        "hybrid": {
          "semanticRatio": 0.9,
          "embedder": "default"
        }
      }'
    

    hybrid is an object and accepts two fields: semanticRatio, a number between 0 and 1 indicating the proportion of semantic search in the results (0 is purely keyword search, 1 is purely semantic search), and embedder, the name of the embedder configured in the embedders index setting.

    hybrid can be used together with other search parameters, including filter and sort:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{
        "q": "kitchen utensils",
        "hybrid": {
          "semanticRatio": 0.9,
          "embedder": "default"
        },
        "filter": "price < 10",
        "sort": ["price:asc"]
      }'
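As a mental model, semanticRatio weights how strongly the semantic score counts relative to the keyword score when results are merged. The sketch below illustrates that weighting; Meilisearch's actual result fusion is more involved than a single weighted sum:

```python
def blend(keyword_score, semantic_score, semantic_ratio=0.9):
    """Weighted blend of keyword and semantic relevancy scores.
    semantic_ratio = 1.0 is purely semantic, 0.0 is purely keyword.
    Illustrative only; not Meilisearch's internal merging algorithm."""
    return semantic_ratio * semantic_score + (1 - semantic_ratio) * keyword_score

# With semanticRatio 0.9, the semantic score dominates the blend.
assert blend(0.0, 1.0) == 0.9
assert round(blend(1.0, 0.0), 6) == 0.1
```

Raising semanticRatio therefore favors hits that are close in meaning to the query, while lowering it favors exact keyword matches.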
    

    Vector search with user-provided embeddings

    Use the vector search parameter to perform vector searches:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{ "vector": [0, 1, 2] }'
    

    vector must be an array of numbers indicating the search vector. You must generate these yourself when using vector search with user-provided embeddings.
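For example, assuming a hypothetical embed_query function backed by the same model that produced your document vectors, the request body could be assembled like this (embed_query is a placeholder for your own pipeline, not a real library call):

```python
import json

def embed_query(text):
    """Placeholder for your own embedding pipeline. In practice this would
    call the same model used to generate your documents' _vectors data."""
    # Hard-coded purely for illustration; the vector's length must match
    # the dimensions configured for the embedder (3 in the example above).
    return [0.0, 1.0, 2.0]

# Assemble the search request body for POST /indexes/products/search.
payload = {"vector": embed_query("kitchen utensils")}
body = json.dumps(payload)
```

The resulting body is what the curl example above sends as --data-binary.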

    vector can be used together with other search parameters, including filter and sort:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{
        "vector": [0, 1, 2],
        "filter": "price < 10",
        "sort": ["price:asc"]
      }'
    
    Other resources

    Check out the Meilisearch blog post for a guide on implementing semantic search with LangChain.

    Deactivate vector search

    Manually remove all embedder configuration from your index:

    curl \
      -X DELETE 'http://localhost:7700/indexes/movies/settings/embedders' \
      -H 'Content-Type: application/json'
    
    WARNING

    If you don't remove all embedders, Meilisearch will continue auto-generating embeddings for your documents. This will happen even if vectorStore has been set to false and may lead to unexpected expenses when using OpenAI's paid tiers.

    If using Meilisearch Cloud, navigate to your project overview and find "Experimental features", then uncheck the "vector store" box.

    Alternatively, use the /experimental-features route:

    curl \
      -X PATCH 'http://localhost:7700/experimental-features/' \
      -H 'Content-Type: application/json'  \
      --data-binary '{
        "vectorStore": false
      }'
    

    More information

    Consult the feature discussion on GitHub for the latest information on using vector search with Meilisearch. This feature is undergoing active development and any feedback you might have is welcome.