Beyond text

Multimodal Search

Search beyond text. Let users find images, video, and audio the same way they search words, by meaning, not just file names or tags.

How multimodal search works

Every content type is transformed into a unified vector space, enabling cross-modal search in milliseconds.

Your content library

Images: 12,483 photos
Videos: 3,291 clips
Audio: 8,104 tracks
Text: 47,220 docs

AI understands every format

Example query: golden retriever

golden_retriever_beach.jpg (photo, 2.4 MB): 98%
puppy_training_outdoors.mp4 (video, 4 min 32 sec): 94%
Best dog breeds for active families (article, 8 min read): 89%

Any content type
Photos, clips, tracks, and docs live in a single index
One text query
Type what you're looking for. No special syntax needed.
Cross-modal results
Words return images. Phrases surface videos. All ranked together.
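
To make the idea of a unified vector space concrete, here is a minimal sketch of cross-modal ranking. It assumes every item, whatever its type, has already been turned into a vector by some embedding model; the three-dimensional vectors and the `quarterly_report.pdf` entry are invented for illustration, and the engine's real ranking is more involved than plain cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, items):
    """Return (name, kind, score) tuples sorted by descending similarity."""
    scored = [(name, kind, cosine(query_vec, vec)) for name, kind, vec in items]
    return sorted(scored, key=lambda row: row[2], reverse=True)

# Toy vectors, invented for illustration. Real embeddings have hundreds
# or thousands of dimensions and come from an embedding model.
library = [
    ("quarterly_report.pdf", "text", [0.0, 0.1, 0.9]),
    ("puppy_training_outdoors.mp4", "video", [0.7, 0.6, 0.1]),
    ("golden_retriever_beach.jpg", "image", [0.9, 0.1, 0.0]),
]

query = [1.0, 0.2, 0.0]  # stands in for the embedded text "golden retriever"
results = rank(query, library)

for name, kind, score in results:
    print(f"{score:.2f}  {kind:5}  {name}")
```

Because the image, the video, and the document all live in the same space, one sort produces a single ranked list across content types.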

Search every content type

From images to video to audio – one unified search API for all your media.

Text-to-image search

Find images with natural language. Describe what you want, get matches instantly.

Visual similarity

Find visually similar content via vector embeddings. Images, thumbnails, product photos.
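
As a rough sketch, a "more like this" lookup could be expressed as a request to Meilisearch's similar-documents route. The index name `media`, the document ID, and the embedder name `image` are hypothetical; the snippet only builds the request and does not send it, so check the field names against the API reference for your version.

```python
def similar_request(index_uid, document_id, embedder, limit=5):
    """Build the path and JSON body for a similar-documents lookup.

    Assumes the POST /indexes/{index_uid}/similar route, which takes the ID
    of a reference document and the name of the embedder to compare with.
    """
    path = f"/indexes/{index_uid}/similar"
    body = {"id": document_id, "embedder": embedder, "limit": limit}
    return path, body

# Hypothetical index, document ID, and embedder name.
path, body = similar_request("media", "golden_retriever_beach", "image")
print(path, body)
```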

Video & audio search

Index and search video transcripts, audio files, and rich media alongside text docs.

Hybrid retrieval

Combine keyword and vector search. Tune the semantic ratio to fit your use case.
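
A minimal sketch of what tuning the semantic ratio might look like in a search request body. It assumes the `hybrid` search parameter with its `embedder` and `semanticRatio` fields; the embedder name `multimodal` is hypothetical. A ratio of 0 leans fully on keyword matching and 1 fully on vector similarity.

```python
def hybrid_search_body(query, embedder, semantic_ratio=0.5, limit=20):
    """Build a search body that blends keyword and vector results."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    return {
        "q": query,
        "limit": limit,
        "hybrid": {
            "embedder": embedder,             # name of a configured embedder
            "semanticRatio": semantic_ratio,  # 0 = keyword only, 1 = vector only
        },
    }

# Mostly semantic, with some weight left for exact keyword matches.
search_body = hybrid_search_body("golden retriever", "multimodal", semantic_ratio=0.8)
print(search_body)
```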

Multi-embedder support

Different embedding models per content type. Composite embedders mix multiple sources.
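
One way to picture per-content-type embedders is an index with two named embedders, each with its own vector size. The sketch below assumes the `userProvided` embedder source, where vectors are computed outside the engine and attached to documents under `_vectors`; the embedder names and the tiny dimensions are chosen for readability only. Composite embedders are not covered here.

```python
# Embedder settings keyed by name. The names "image" and "text" and the
# dimensions are illustrative; real models produce much larger vectors.
EMBEDDERS = {
    "image": {"source": "userProvided", "dimensions": 4},
    "text": {"source": "userProvided", "dimensions": 3},
}

def check_document(doc, embedders):
    """Return a list of problems found in the document's _vectors field."""
    problems = []
    for name, vector in doc.get("_vectors", {}).items():
        if name not in embedders:
            problems.append(f"unknown embedder: {name}")
        elif len(vector) != embedders[name]["dimensions"]:
            expected = embedders[name]["dimensions"]
            problems.append(f"{name}: expected {expected} dimensions, got {len(vector)}")
    return problems

good_doc = {
    "id": "golden_retriever_beach",
    "_vectors": {"image": [0.9, 0.1, 0.0, 0.2], "text": [0.8, 0.1, 0.1]},
}
bad_doc = {"id": "broken", "_vectors": {"image": [0.9, 0.1], "audio": [0.5]}}

print(check_document(good_doc, EMBEDDERS))
print(check_document(bad_doc, EMBEDDERS))
```

Validating vectors before upload is a design choice in this sketch, not something the page prescribes; it simply catches mismatched models early.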

Metadata filtering

Filter by tags, dimensions, duration, format, or any custom attribute.
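
Filters are passed as an expression string in the search request. The helper below is a small sketch of composing one from metadata; the attribute names `format`, `duration`, and `tags` are assumptions about how a media index might be modeled, and each would need to be declared filterable on the index. It handles string and numeric values only.

```python
def quote(value):
    """Quote strings for a filter expression; leave numbers bare."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)

def build_filter(equals=None, at_most=None, any_of=None):
    """Join equality, upper-bound, and membership clauses with AND."""
    clauses = []
    for attr, value in (equals or {}).items():
        clauses.append(f"{attr} = {quote(value)}")
    for attr, value in (at_most or {}).items():
        clauses.append(f"{attr} <= {quote(value)}")
    for attr, values in (any_of or {}).items():
        joined = ", ".join(quote(v) for v in values)
        clauses.append(f"{attr} IN [{joined}]")
    return " AND ".join(clauses)

# Hypothetical attributes: MP4 clips of five minutes or less, tagged dog or training.
expr = build_filter(
    equals={"format": "mp4"},
    at_most={"duration": 300},
    any_of={"tags": ["dog", "training"]},
)
print(expr)
```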

Works with multimodal embedding providers

Native support for models that understand images, video, and text.

5 models from 4 providers

Google (1 model): gemini-embedding-2
Cohere (1 model): Embed 4
Voyage AI (1 model): voyage-multimodal-3.5
Jina AI (2 models): jina-embeddings-v4, jina-clip-v2

Custom: any provider

Meilisearch is compatible with any embedding model that exposes a REST API.
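
For a custom provider, a REST embedder configuration might look roughly like the sketch below. It assumes the `rest` embedder source with `request` and `response` templates that use `{{text}}` and `{{embedding}}` placeholders; the URL, model name, and dimensions are hypothetical. The `render` helper only illustrates the placeholder idea and is not how the engine itself is implemented.

```python
# Hypothetical provider: URL, model name, and dimensions are placeholders.
REST_EMBEDDER = {
    "source": "rest",
    "url": "https://embeddings.example.com/v1/embed",
    "dimensions": 1024,
    # Shape of the JSON sent to the provider; {{text}} marks the input slot.
    "request": {"model": "example-multimodal-model", "input": "{{text}}"},
    # Shape of the JSON returned; {{embedding}} marks where the vector sits.
    "response": {"data": [{"embedding": "{{embedding}}"}]},
}

def render(template, text):
    """Fill every {{text}} placeholder in a nested request template."""
    if template == "{{text}}":
        return text
    if isinstance(template, dict):
        return {key: render(value, text) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, text) for value in template]
    return template

payload = render(REST_EMBEDDER["request"], "golden retriever")
print(payload)
```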

Built for every use case

From e-commerce visual search to enterprise media libraries, multimodal search powers discovery across industries.

E-commerce

Let shoppers search by photo or description. Find products visually similar to what they already love.

Frequently asked questions

What content types does multimodal search support?

Meilisearch multimodal search supports images (JPEG, PNG, WebP, GIF), video thumbnails and transcripts, audio transcripts, and text documents. Any content that can be converted to an embedding vector can be indexed and searched together in a single unified index.

Ready to get started?

Run Multimodal Search on Meilisearch Cloud, or self-host the open-source engine.