Question 1

What content types does multimodal search support?

Accepted Answer

Meilisearch multimodal search supports images (JPEG, PNG, WebP, GIF), video thumbnails and transcripts, audio transcripts, and text documents. Any content that can be converted to an embedding vector can be indexed and searched together in a single index.

Question 2

Do I need to generate embeddings myself?

Accepted Answer

No. Once you configure an embedder, Meilisearch generates embeddings automatically during indexing. You point it at your content and it handles the rest, so no separate ML pipeline or pre-processing step is required.
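As a rough illustration of what "configuring an embedder" can look like, the sketch below builds a settings body intended for the index's embedder settings route (PATCH /indexes/{index_uid}/settings/embedders). It uses a text-only OpenAI embedder for simplicity. The embedder name, the model choice, and the document fields (title, description) are assumptions made for the example, not values taken from this page.

```python
import json


def build_embedder_settings(name: str, api_key: str) -> dict:
    """Build a request body for the embedder settings of one index.

    The embedder name and the document fields referenced in the template
    are hypothetical; adapt them to your own documents.
    """
    return {
        name: {
            "source": "openAi",
            "apiKey": api_key,
            "model": "text-embedding-3-small",
            # Template rendered per document before it is sent to the provider.
            "documentTemplate": "A product named {{doc.title}}: {{doc.description}}",
        }
    }


settings = build_embedder_settings("default", "<OPENAI_API_KEY>")
print(json.dumps(settings, indent=2))
```

Once settings like these are accepted, documents added to the index are embedded as part of indexing, which is the behavior the answer above describes.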

Question 3

Which embedding providers support image and multimodal content?

Accepted Answer

Meilisearch has native integrations with providers that support multimodal embeddings, including OpenAI CLIP, Google Vertex AI, and other vision-language models. You can also use any custom provider via the REST embedder endpoint.
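For the custom-provider path, a REST embedder is described by a URL plus request and response templates. The sketch below assembles such a configuration under several assumptions: the endpoint URL, model name, vector size, and the exact request and response shapes are hypothetical and depend entirely on the provider you call. It shows a text-only setup; image or other multimodal inputs need additional provider-specific configuration that is not shown here.

```python
def build_rest_embedder(url: str, api_key: str, dimensions: int) -> dict:
    """Build a REST embedder configuration for a custom provider.

    The request/response shapes below are placeholders: match them to the
    JSON your provider actually accepts and returns.
    """
    return {
        "source": "rest",
        "url": url,
        "apiKey": api_key,
        "dimensions": dimensions,
        "documentTemplate": "{{doc.title}}",  # hypothetical document field
        # {{text}} is replaced with the rendered document template.
        "request": {"model": "example-embedding-model", "prompt": "{{text}}"},
        # {{embedding}} marks where the vector sits in the provider response.
        "response": {"embedding": "{{embedding}}"},
    }


rest_embedder = build_rest_embedder(
    url="https://embeddings.example.com/v1/embed",  # hypothetical endpoint
    api_key="<PROVIDER_API_KEY>",
    dimensions=1024,
)
```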

Question 4

How does text-to-image search work?

Accepted Answer

Text-to-image search uses multimodal embedding models that map both text and images into the same vector space. When a user types "sunset over mountains", the query is embedded and compared against the stored image embeddings, and the closest vectors are returned as visually matching results. Because the match happens in vector space, no manual tagging is required.
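The shared-vector-space idea can be shown with a toy example. The three-dimensional vectors below are made up for illustration (real models produce hundreds or thousands of dimensions), and cosine similarity is used as a common choice of comparison, not necessarily the exact metric Meilisearch applies internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up image embeddings; a real multimodal model would produce these.
image_embeddings = {
    "sunset_mountains.jpg": [0.9, 0.1, 0.2],
    "city_at_night.jpg": [0.1, 0.9, 0.3],
    "beach_sunrise.jpg": [0.7, 0.2, 0.6],
}

# Made-up embedding standing in for the text query "sunset over mountains".
query_embedding = [0.8, 0.1, 0.3]

# Rank images from most to least similar to the query.
ranked = sorted(
    image_embeddings,
    key=lambda name: cosine_similarity(query_embedding, image_embeddings[name]),
    reverse=True,
)
print(ranked)
```

The text query never touches filenames or tags; ranking depends only on how close each image vector is to the query vector.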

Question 5

Can I combine text and image search in the same query?

Accepted Answer

Yes. With hybrid retrieval, you can combine keyword-based text search with vector-based image search. The semanticRatio parameter lets you tune the balance between exact keyword matches and semantic visual similarity for each query.
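A minimal sketch of a hybrid search request body follows. It builds the JSON for the index search route (POST /indexes/{index_uid}/search). The embedder name "default", the query text, and the limit are assumptions for the example. A semanticRatio of 0 leans entirely on keyword matching and 1 entirely on semantic similarity.

```python
def build_hybrid_search(
    query: str,
    semantic_ratio: float,
    embedder: str = "default",
    limit: int = 10,
) -> dict:
    """Build a search body that blends keyword and vector results."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return {
        "q": query,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
        "limit": limit,
    }


# Favor semantic (visual) similarity but keep some weight on exact keywords.
payload = build_hybrid_search("red leather handbag", semantic_ratio=0.7)
```

Because the ratio is part of each request, it can be tuned per query, for example lower for SKU-style lookups and higher for descriptive, visual queries.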

Multimodal Search

How multimodal search works

Search every content type

Text-to-image search

Visual similarity

Video & audio search

Hybrid retrieval

Multi-embedder support

Metadata filtering

Works with multimodal embedding providers

Google

Cohere

Voyage AI

Jina AI

Custom

Built for every use case

E-commerce

Frequently asked questions