Vector search

    Vector search is an experimental technology that uses Large Language Models to retrieve search results based on the meaning and context of a query.

    This feature can improve search relevancy for queries that do not match keywords in your dataset, allow your users to search images and other non-textual media, suggest related products in webshops, and create conversational search interfaces.

    Vector search is available to all users. Meilisearch Cloud is the recommended way of using vector search.

    If using Meilisearch Cloud, navigate to your project overview and find "Experimental features", then check the "vector store" box.

    [Screenshot: the "Experimental features" section of the project overview, showing two options, "Score details" and "Vector store". "Vector store" is turned on.]

    Alternatively, use the /experimental-features route to activate vector search at runtime:

    curl \
      -X PATCH 'http://localhost:7700/experimental-features/' \
      -H 'Content-Type: application/json'  \
      --data-binary '{
        "vectorStore": true
      }'
    

    Generate vector embeddings

    To use vector search, first configure the embedders index setting. You may configure multiple embedders for an index.

    Embedders generate vector data from your documents. Meilisearch natively supports OpenAI and HuggingFace embedders.

    It is also possible to use custom embedders. In this case, you must generate the embeddings manually and include them as a field in your documents.

    Generate auto-embeddings with OpenAI

    Use the embedders index setting of the update /settings endpoint to configure one or more embedders for an index:

    curl \
      -X PATCH 'http://localhost:7700/indexes/movies/settings' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "embedders": {
          "default": {
            "source":  "openAi",
            "apiKey": "anOpenAiApiKey",
            "model": "text-embedding-ada-002",
            "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
          }
        }
      }'
    

    It is mandatory to pass an OpenAI API key through the OPENAI_API_KEY environment variable or the apiKey field when using an OpenAI embedder. Generate an API key from your OpenAI account. Use tier 2 keys or above for optimal performance.

    documentTemplate is an optional field you can use to customize the data you send to the embedder. It is highly recommended you configure a custom template for your documents.

    `documentTemplate` usage

    documentTemplate must be a Liquid template. Use {{ doc.attribute }} to access the attribute field value of your documents. Any field you refer to in this way must exist in all documents or an error will be raised at indexing time.

    For best results, use short strings indicating the type of document in that index, only include highly relevant document fields, and truncate long fields.
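    For illustration, here is a rough Python approximation of how the template above might render for one document. Meilisearch itself renders Liquid; the document below is made up, and the truncatewords helper only mimics Liquid's filter of the same name:

```python
def truncatewords(text: str, count: int) -> str:
    # Mimics Liquid's truncatewords filter: keep the first `count`
    # words and append an ellipsis when the text was shortened.
    words = text.split()
    if len(words) <= count:
        return text
    return " ".join(words[:count]) + "..."

# Hypothetical document for illustration
doc = {
    "title": "Carol",
    "overview": "In 1950s New York, a department-store clerk who "
                "dreams of a better life falls for an older, married "
                "woman, and the two begin an affair that will change "
                "their lives.",
}

# Approximate rendering of:
# "A movie titled '{{doc.title}}' whose description starts with
#  {{doc.overview|truncatewords: 20}}"
rendered = (
    f"A movie titled '{doc['title']}' whose description starts with "
    f"{truncatewords(doc['overview'], 20)}"
)
print(rendered)
```

The embedder then receives this short, focused string instead of the raw document, which typically produces more relevant embeddings.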

    Generate auto-embeddings with HuggingFace

    The HuggingFace embedder computes embeddings locally. This is a resource-intensive operation and might affect indexing performance.

    Use the embedders index setting of the update /settings endpoint to configure one or more embedders for an index:

    curl \
      -X PATCH 'http://localhost:7700/indexes/movies/settings' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "embedders": {
          "default": {
            "source": "huggingFace",
            "model": "bge-base-en-v1.5",
            "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
          }
        }
      }'
    

    documentTemplate is an optional field you can use to customize the data you send to the embedder. It is highly recommended you configure a custom template for your documents.

    model is mandatory and must indicate a BERT embedding model.

    `documentTemplate` usage

    documentTemplate must be a Liquid template. Use {{ doc.attribute }} to access document field values. Meilisearch also exposes a {{ fields }} array containing one object per document field, which you may access with {{ field.name }} and {{ field.value }}.

    For best results, use short strings indicating the type of document in that index, only include highly relevant document fields, and truncate long fields.
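    As a sketch (the document is made up), a template that loops over {{ fields }}, such as "{% for field in fields %}{{ field.name }}: {{ field.value }} {% endfor %}", would flatten every attribute into the embedded text. The Python below approximates the result:

```python
# Hypothetical document for illustration
doc = {"title": "Carol", "genre": "Drama", "year": 2015}

# Approximation of the {{ fields }} array Meilisearch exposes:
# one object per document field, with "name" and "value" keys
fields = [{"name": k, "value": v} for k, v in doc.items()]

# Approximate rendering of a template that loops over {{ fields }}
rendered = " ".join(f"{f['name']}: {f['value']}" for f in fields)
print(rendered)  # title: Carol genre: Drama year: 2015
```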

    Custom embeddings

    You may also provide custom embeddings. In this case, you must manually update your embeddings whenever you add, update, or remove documents from your index.

    Configure the embedders index setting:

    curl \
      -X PATCH 'http://localhost:7700/indexes/movies/settings' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "embedders": {
          "image2text": {
            "source":  "userProvided",
            "dimensions": 3
          }
        }
      }'
    

    Then, use the /documents endpoint to upload vectorized documents. Store vector data in your documents' _vectors field:

    curl -X POST -H 'content-type: application/json' \
    'localhost:7700/indexes/products/documents' \
    --data-binary '[
        { "id": 0, "_vectors": {"image2text": [0, 0.8, -0.2]}, "text": "frying pan" },
        { "id": 1, "_vectors": {"image2text": [1, -0.2, 0]}, "text": "baking dish" }
    ]'
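    The same payload can be assembled programmatically. The sketch below is a minimal Python example; embed() is a hypothetical stand-in for whatever model you use, and the vectors it returns must have as many components as the dimensions configured for the embedder (3 in this example):

```python
import json

def embed(text: str) -> list[float]:
    # Placeholder: a real implementation would call an embedding model.
    # Its output length must match the embedder's configured "dimensions".
    fake = {"frying pan": [0, 0.8, -0.2], "baking dish": [1, -0.2, 0]}
    return fake[text]

# Attach each document's embedding under _vectors, keyed by embedder name
documents = [
    {"id": i, "text": text, "_vectors": {"image2text": embed(text)}}
    for i, text in enumerate(["frying pan", "baking dish"])
]

# This JSON string is what you would send to the /documents endpoint
payload = json.dumps(documents)
print(payload)
```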
    

    Vector search with auto-embeddings

    Perform searches with q and hybrid to retrieve search results using both keyword and semantic search:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{ 
        "q": "kitchen utensils",
        "hybrid": {
          "semanticRatio": 0.9,
          "embedder": "default"
        }
      }'
    

    hybrid is an object and accepts two fields: semanticRatio, a number between 0.0 and 1.0 setting the proportion of semantic search in the results, where 0.0 is pure keyword search and 1.0 is pure semantic search (defaults to 0.5), and embedder, the name of the configured embedder to use.

    hybrid can be used together with other search parameters, including filter and sort:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{
        "q": "kitchen utensils",
        "hybrid": {
          "semanticRatio": 0.9,
          "embedder": "default"
        },
        "filter": "price < 10",
        "sort": ["price:asc"]
      }'
    

    Vector search with user-provided embeddings

    Use the vector search parameter to perform vector searches:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{ "vector": [0, 1, 2] }'
    

    vector must be an array of numbers indicating the search vector. You must generate these yourself when using vector search with user-provided embeddings.
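    To see why the query vector must come from the same embedding space as your documents, consider this Python sketch, which ranks the documents from the earlier example by cosine similarity (used here only to illustrate the idea of vector similarity):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the vectors' magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Document vectors from the earlier _vectors example
docs = {"frying pan": [0, 0.8, -0.2], "baking dish": [1, -0.2, 0]}

# The query vector must be produced by the same model as the documents;
# otherwise the distances are meaningless
query = [0, 1, 2]

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # frying pan
```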

    vector can be used together with other search parameters, including filter and sort:

    curl -X POST -H 'content-type: application/json' \
      'localhost:7700/indexes/products/search' \
      --data-binary '{
        "vector": [0, 1, 2],
        "filter": "price < 10",
        "sort": ["price:asc"]
      }'
    
    Other resources

    Check out the Meilisearch blog post for a tutorial on implementing semantic search with LangChain.

    Deactivate vector search

    Manually remove all embedder configuration from your index:

    curl \
      -X DELETE 'http://localhost:7700/indexes/movies/settings/embedders' \
      -H 'Content-Type: application/json'
    
    WARNING

    If you don't remove all embedders, Meilisearch will continue auto-generating embeddings for your documents. This will happen even if vectorStore has been set to false and may lead to unexpected expenses when using OpenAI's paid tiers.

    If using Meilisearch Cloud, navigate to your project overview and find "Experimental features", then uncheck the "vector store" box.

    Alternatively, use the /experimental-features route:

    curl \
      -X PATCH 'http://localhost:7700/experimental-features/' \
      -H 'Content-Type: application/json'  \
      --data-binary '{
        "vectorStore": false
      }'
    

    More information

    Consult the feature discussion on GitHub for the latest information on using vector search with Meilisearch. This feature is undergoing active development and any feedback you might have is welcome.