Computing Hugging Face embeddings with the GPU

    This guide is aimed at experienced users working with a self-hosted Meilisearch instance. It shows you how to compile a Meilisearch binary that generates Hugging Face embeddings with an Nvidia GPU.

    Prerequisites

    Install CUDA

    Follow Nvidia's CUDA installation instructions.

    Verify your CUDA install

    After installing CUDA on your machine, run the following command in your terminal:

    nvcc --version | head -1
    

    If CUDA is working correctly, you will see the following response:

    nvcc: NVIDIA (R) Cuda compiler driver
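
    If you want to script this check before starting a long build, the sketch below wraps the same command in a small preflight function. The function names is_nvcc_banner and check_cuda_toolchain are illustrative and not part of CUDA or Meilisearch; the check only assumes that nvcc prints the banner shown above as its first line.

```shell
# Illustrative preflight check; the function names are hypothetical helpers.

# Returns 0 when the given line starts with the nvcc banner shown above.
is_nvcc_banner() {
  case "$1" in
    "nvcc: NVIDIA (R) Cuda compiler driver"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0 only if nvcc is on PATH and its first output line is the banner.
check_cuda_toolchain() {
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found on PATH; install CUDA or adjust PATH" >&2
    return 1
  fi
  first_line="$(nvcc --version | head -1)"
  if ! is_nvcc_banner "$first_line"; then
    echo "unexpected nvcc output: $first_line" >&2
    return 1
  fi
  echo "CUDA toolchain OK: $first_line"
}

# Example: only start the build when the check passes.
# check_cuda_toolchain && cargo build --release --features cuda
```

    Running the check first avoids discovering a missing or misconfigured CUDA toolchain partway through compilation.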
    

    Compile Meilisearch

    First, clone the Meilisearch repository and move into its directory:

    git clone https://github.com/meilisearch/meilisearch.git
    cd meilisearch
    

    Then, compile the Meilisearch binary with the cuda feature enabled:

    cargo build --release --features cuda
    

    This might take a while. Once the compiler is done, you should have a CUDA-compatible Meilisearch binary, which Cargo places in target/release by default.

    Run your freshly compiled binary:

    ./target/release/meilisearch
    

    Next, enable the vector store experimental feature:

    curl \
      -X PATCH 'http://localhost:7700/experimental-features/' \
      -H 'Content-Type: application/json' \
      --data-binary '{ "vectorStore": true }'
    

    Then add the Hugging Face embedder to your index settings, replacing INDEX_NAME with the name of your index:

    curl \
      -X PATCH 'http://localhost:7700/indexes/INDEX_NAME/settings/embedders' \
      -H 'Content-Type: application/json' \
      --data-binary '{ "default": { "source": "huggingFace" } }'
    

    Meilisearch will return a summarized task object and place your request on the task queue:

    {
      "taskUid": 1,
      "indexUid": "INDEX_NAME",
      "status": "enqueued",
      "type": "settingsUpdate",
      "enqueuedAt": "2024-03-04T15:05:43.383955Z"
    }
    

    Use the task object's taskUid to monitor the task status. The Hugging Face embedder will be ready to use once the task's status changes to succeeded.
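
    As a rough sketch of what monitoring can look like, the functions below poll the task route (GET /tasks/{taskUid}) until the task leaves the queue. The helper names extract_status and wait_for_task, the one-second polling interval, and the 60-attempt limit are assumptions made for illustration. The sketch also assumes an instance running without a master key, as in the commands above; a protected instance would need an Authorization header on the curl call.

```shell
# Illustrative helpers; the names and the polling limits are assumptions.

# Reads a task JSON document on stdin and prints the value of its "status" field.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
}

# Polls GET /tasks/<uid> once per second until the task succeeds, fails,
# is canceled, or the attempt limit is reached.
# Arguments: task uid, optional host (default http://localhost:7700),
# optional number of attempts (default 60).
wait_for_task() {
  task_uid="$1"
  host="${2:-http://localhost:7700}"
  attempts="${3:-60}"
  while [ "$attempts" -gt 0 ]; do
    status="$(curl -s "$host/tasks/$task_uid" | extract_status)"
    case "$status" in
      succeeded)
        echo "task $task_uid succeeded"
        return 0
        ;;
      failed|canceled)
        echo "task $task_uid ended with status: $status" >&2
        return 1
        ;;
    esac
    attempts=$((attempts - 1))
    sleep 1
  done
  echo "timed out waiting for task $task_uid" >&2
  return 1
}

# Example, using the taskUid from the response above:
# wait_for_task 1
```

    The first time a Hugging Face embedder is configured, the task may stay in the processing state for a while, so a longer attempt limit may be appropriate.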

    Conclusion

    You have seen how to compile a Meilisearch binary that uses your Nvidia GPU to compute vector embeddings. This should significantly speed up indexing when using the Hugging Face embedder.