Multimodal Embeddings: Content API
The Content API lets you embed text, images, audio, video, and PDFs through a single portable interface. Instead of using provider-specific methods, you describe what you want to embed and the library handles the how.
Why the Content API?
The legacy EmbedDocuments/EmbedQuery API works great for text. But when you need to embed an image alongside a description, or a video with context, you need a way to express "these things belong together." That's what the Content API does.
```text
Legacy API:  EmbedDocuments(["text1", "text2"]) → []Embedding
Content API: EmbedContent(Content{text + image}) → Embedding
```
Both APIs coexist — use whichever fits. The Content API adds multimodal support without replacing anything.
Core Concepts
The Content API has four building blocks:
```text
Content                          ← One thing to embed
├── Parts[]                      ← The pieces that make it up
│   ├── TextPart("a cat photo")  ← Text piece
│   └── ImagePart(source)        ← Image/video/audio/PDF piece
│       └── BinarySource         ← Where the data comes from (URL, file, bytes)
├── Intent                       ← What you're using the embedding for (optional)
└── Dimension                    ← Output vector size override (optional)
```
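To make the tree concrete, here is a hypothetical, simplified Go model of the same shape. The type, field, and method names below (Content, Part, BinarySource, Modality, modalities) are illustrative assumptions that mirror the diagram; they are not the library's actual definitions, and in the real API you build these values through the helper constructors described in the sections that follow.

```go
package main

import "fmt"

// Hypothetical, simplified model of the Content tree. NOT the library's real
// type definitions; it only mirrors the diagram's shape.

type Modality string

const (
	ModalityText  Modality = "text"
	ModalityImage Modality = "image"
)

// BinarySource says where non-text data comes from (set exactly one field).
type BinarySource struct {
	URL    string
	Path   string
	Bytes  []byte
	Base64 string
}

// Part is one piece of the Content: text, or a binary source of some modality.
type Part struct {
	Modality Modality
	Text     string        // used when Modality == ModalityText
	Source   *BinarySource // used for image/video/audio/PDF parts
}

// Content is one semantic unit; all of its Parts map to a single vector.
type Content struct {
	Parts     []Part
	Intent    string // optional; "" means provider default
	Dimension int    // optional; 0 means provider default
}

// modalities lists the distinct modalities used by a Content, in first-seen order.
func (c Content) modalities() []Modality {
	seen := map[Modality]bool{}
	var out []Modality
	for _, p := range c.Parts {
		if !seen[p.Modality] {
			seen[p.Modality] = true
			out = append(out, p.Modality)
		}
	}
	return out
}

func main() {
	c := Content{
		Parts: []Part{
			{Modality: ModalityText, Text: "a cat photo"},
			{Modality: ModalityImage, Source: &BinarySource{Path: "cat.png"}},
		},
	}
	fmt.Println(len(c.Parts), c.modalities()) // 2 [text image]
}
```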
Content
A Content is one semantic unit you want to embed — a document, a query, or a media item. It contains one or more Parts that the provider combines into a single embedding vector.
```go
// A photo with its description → one embedding
content := embeddings.NewContent([]embeddings.Part{
	embeddings.NewTextPart("A lioness hunting at sunset"),
	embeddings.NewPartFromSource(
		embeddings.ModalityImage,
		embeddings.NewBinarySourceFromFile("lioness.png"),
	),
})
```
Part
A Part is one piece of content — text, an image, a video clip. Each part has a modality that declares what type of content it is:
| Modality | What it represents | Part constructor |
|---|---|---|
| `ModalityText` | Plain text | `NewTextPart("...")` |
| `ModalityImage` | Image (PNG, JPEG, WebP, GIF) | `NewPartFromSource(ModalityImage, source)` |
| `ModalityVideo` | Video (MP4) | `NewPartFromSource(ModalityVideo, source)` |
| `ModalityAudio` | Audio (MP3, WAV) | `NewPartFromSource(ModalityAudio, source)` |
| `ModalityPDF` | PDF document | `NewPartFromSource(ModalityPDF, source)` |
For single-modality Content shortcuts, see the Convenience Constructors table below.
Not every provider supports every modality. See Provider Support below.
BinarySource
A BinarySource tells the library where to find non-text content. You don't construct it directly — use one of the helpers:
```go
// From a URL (provider fetches it)
embeddings.NewBinarySourceFromURL("https://example.com/cat.jpg")

// From a local file (library reads and encodes it)
embeddings.NewBinarySourceFromFile("/path/to/photo.png")

// From raw bytes already in memory
embeddings.NewBinarySourceFromBytes(imageBytes)

// From a base64-encoded string
embeddings.NewBinarySourceFromBase64(b64String)
```
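The helpers differ only in where the bytes come from. As a rough sketch of what "reads and encodes it" could mean, the snippet below assumes local sources are sent inline as base64 while URLs pass through for the provider to fetch. The source type and the inlinePayload and sameAcrossSources functions are hypothetical, and the library's actual encoding step may differ.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"os"
)

// source is a hypothetical stand-in for a BinarySource; set exactly one field.
type source struct {
	url  string // fetched by the provider
	path string // read by the client
	data []byte // already in memory
	b64  string // already base64-encoded
}

// inlinePayload resolves a source. URLs pass through untouched (isURL=true);
// file, bytes, and base64 sources all become the same base64 string.
func inlinePayload(s source) (string, bool, error) {
	switch {
	case s.url != "":
		return s.url, true, nil
	case s.path != "":
		data, err := os.ReadFile(s.path)
		if err != nil {
			return "", false, err
		}
		return base64.StdEncoding.EncodeToString(data), false, nil
	case s.data != nil:
		return base64.StdEncoding.EncodeToString(s.data), false, nil
	case s.b64 != "":
		return s.b64, false, nil
	}
	return "", false, errors.New("empty source")
}

// sameAcrossSources writes raw to a temp file and checks that the file,
// bytes, and base64 forms all resolve to an identical payload.
func sameAcrossSources(raw []byte) (bool, error) {
	f, err := os.CreateTemp("", "binarysource-*.bin")
	if err != nil {
		return false, err
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(raw); err != nil {
		f.Close()
		return false, err
	}
	if err := f.Close(); err != nil {
		return false, err
	}

	fromFile, _, err := inlinePayload(source{path: f.Name()})
	if err != nil {
		return false, err
	}
	fromBytes, _, _ := inlinePayload(source{data: raw})
	fromB64, _, _ := inlinePayload(source{b64: base64.StdEncoding.EncodeToString(raw)})
	return fromFile == fromBytes && fromBytes == fromB64, nil
}

func main() {
	same, err := sameAcrossSources([]byte("fake PNG bytes"))
	fmt.Println(same, err) // true <nil>
}
```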
Intent
An Intent tells the provider why you're embedding this content. Providers that support intents (like Gemini and VoyageAI) use this to optimize the embedding for your use case.
```go
// Embedding a query to search against stored documents
query := embeddings.NewTextContent("how do lionesses hunt?",
	embeddings.WithIntent(embeddings.IntentRetrievalQuery),
)

// Embedding a document to be searched later
doc := embeddings.NewTextContent("Lionesses hunt cooperatively...",
	embeddings.WithIntent(embeddings.IntentRetrievalDocument),
)
```
When to use which intent:
| Intent | Use when... | Example |
|---|---|---|
| `IntentRetrievalQuery` | Embedding a search query | User types "find sunset photos" |
| `IntentRetrievalDocument` | Embedding content to be searched | Indexing a photo description |
| `IntentClassification` | Categorizing content | Sorting images into categories |
| `IntentClustering` | Grouping similar content | Finding related documents |
| `IntentSemanticSimilarity` | Comparing two items | Checking if two descriptions match |
Intents are optional. If you skip them, the provider uses its default behavior.
Not all providers support all intents. Gemini supports all five; VoyageAI supports only IntentRetrievalQuery and IntentRetrievalDocument. An unsupported intent returns a clear error instead of silently degrading to default behavior.
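One way to picture portable intents is a per-provider translation table that fails loudly on gaps. In this sketch the Intent constants, the providerIntents table, and mapIntent are hypothetical; the right-hand values are the providers' own request parameters (Gemini's task_type and VoyageAI's input_type), but how the library maps onto them internally is an assumption.

```go
package main

import "fmt"

// Intent is a hypothetical stand-in for the library's portable intent constants.
type Intent string

const (
	IntentRetrievalQuery     Intent = "retrieval_query"
	IntentRetrievalDocument  Intent = "retrieval_document"
	IntentClassification     Intent = "classification"
	IntentClustering         Intent = "clustering"
	IntentSemanticSimilarity Intent = "semantic_similarity"
)

// Per-provider translation tables (assumed mapping onto real request fields:
// Gemini task_type, VoyageAI input_type).
var providerIntents = map[string]map[Intent]string{
	"gemini": {
		IntentRetrievalQuery:     "RETRIEVAL_QUERY",
		IntentRetrievalDocument:  "RETRIEVAL_DOCUMENT",
		IntentClassification:     "CLASSIFICATION",
		IntentClustering:         "CLUSTERING",
		IntentSemanticSimilarity: "SEMANTIC_SIMILARITY",
	},
	"voyageai": {
		IntentRetrievalQuery:    "query",
		IntentRetrievalDocument: "document",
	},
}

// mapIntent returns the provider-native value, "" for "no intent" (provider
// default), or an error when the provider cannot honor the intent.
func mapIntent(provider string, intent Intent) (string, error) {
	table, ok := providerIntents[provider]
	if !ok {
		return "", fmt.Errorf("unknown provider %q", provider)
	}
	if intent == "" {
		return "", nil // intents are optional: defer to the provider default
	}
	native, ok := table[intent]
	if !ok {
		return "", fmt.Errorf("provider %q does not support intent %q", provider, intent)
	}
	return native, nil
}

func main() {
	v, err := mapIntent("voyageai", IntentRetrievalQuery)
	fmt.Println(v, err) // query <nil>

	_, err = mapIntent("voyageai", IntentClustering)
	fmt.Println(err) // provider "voyageai" does not support intent "clustering"
}
```

Returning an error for a missing entry, rather than falling back to the default, matches the "never silently degrade" behavior described above.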
Convenience Constructors
For single-modality content, use the shorthand constructors instead of building Content structs manually:
| Modality | Shorthand | Equivalent verbose form |
|---|---|---|
| Text | `NewTextContent("...")` | `Content{Parts: []Part{NewTextPart("...")}}` |
| Image (URL) | `NewImageURL(url)` | `Content{Parts: []Part{NewPartFromSource(ModalityImage, NewBinarySourceFromURL(url))}}` |
| Image (file) | `NewImageFile(path)` | `Content{Parts: []Part{NewPartFromSource(ModalityImage, NewBinarySourceFromFile(path))}}` |
| Video (URL) | `NewVideoURL(url)` | `Content{Parts: []Part{NewPartFromSource(ModalityVideo, NewBinarySourceFromURL(url))}}` |
| Video (file) | `NewVideoFile(path)` | `Content{Parts: []Part{NewPartFromSource(ModalityVideo, NewBinarySourceFromFile(path))}}` |
| Audio (URL) | `NewAudioURL(url)` | `Content{Parts: []Part{NewPartFromSource(ModalityAudio, NewBinarySourceFromURL(url))}}` |
| Audio (file) | `NewAudioFile(path)` | `Content{Parts: []Part{NewPartFromSource(ModalityAudio, NewBinarySourceFromFile(path))}}` |
| PDF (URL) | `NewPDFURL(url)` | `Content{Parts: []Part{NewPartFromSource(ModalityPDF, NewBinarySourceFromURL(url))}}` |
| PDF (file) | `NewPDFFile(path)` | `Content{Parts: []Part{NewPartFromSource(ModalityPDF, NewBinarySourceFromFile(path))}}` |
All constructors accept optional ContentOption arguments for intent, dimension, and provider hints:
```go
// Embed text for retrieval
query := embeddings.NewTextContent("how do lionesses hunt?",
	embeddings.WithIntent(embeddings.IntentRetrievalQuery),
)

// Embed with custom output dimensions
doc := embeddings.NewTextContent("document text",
	embeddings.WithDimension(256),
)
```
For mixed-part content, use NewContent with Part helpers:
```go
content := embeddings.NewContent([]embeddings.Part{
	embeddings.NewTextPart("A lioness hunting at sunset"),
	embeddings.NewPartFromSource(
		embeddings.ModalityImage,
		embeddings.NewBinarySourceFromFile("lioness.png"),
	),
})
```
Common Recipes
Embed text
```go
ef, err := gemini.NewGeminiEmbeddingFunction(gemini.WithEnvAPIKey())
if err != nil {
	log.Fatal(err)
}

content := embeddings.NewTextContent("What is Chroma?")
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
	log.Fatal(err)
}
```
Embed an image from a URL
```go
content := embeddings.NewImageURL("https://example.com/cat.jpg")
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
	log.Fatal(err)
}
```
Embed an image from a local file
```go
content := embeddings.NewImageFile("/path/to/photo.png")
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
	log.Fatal(err)
}
```
Embed text + image together
When you combine parts, the provider fuses them into a single embedding that captures both the text and visual content:
```go
content := embeddings.NewContent([]embeddings.Part{
	embeddings.NewTextPart("A lioness hunting at sunset"),
	embeddings.NewPartFromSource(
		embeddings.ModalityImage,
		embeddings.NewBinarySourceFromFile("lioness.png"),
	),
})
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
	log.Fatal(err)
}
```
Verbose construction: the Convenience Constructors table above shows the equivalent verbose Content{} struct literals for each modality.
Embed a batch of items
Use EmbedContents to embed multiple content items in one call. Each item produces its own embedding:
```go
contents := []embeddings.Content{
	embeddings.NewTextContent("The golden hour on the Serengeti"),
	embeddings.NewImageFile("lioness.png"),
	embeddings.NewContent([]embeddings.Part{
		embeddings.NewTextPart("A lioness pouncing on prey"),
		embeddings.NewPartFromSource(
			embeddings.ModalityVideo,
			embeddings.NewBinarySourceFromFile("the_pounce.mp4"),
		),
	}),
}
results, err := ef.EmbedContents(context.Background(), contents)
if err != nil {
	log.Fatal(err)
}
// results[0] = text embedding, results[1] = image embedding, results[2] = text+video embedding
```
Embed with an intent
```go
content := embeddings.NewTextContent("how do lionesses hunt?",
	embeddings.WithIntent(embeddings.IntentRetrievalQuery),
)
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
	log.Fatal(err)
}
```
Provider Support
| Provider | Models | Modalities | Mixed Parts | Intents |
|---|---|---|---|---|
| Gemini | `gemini-embedding-2-preview` | text, image, audio, video, PDF | yes | all 5 |
| Gemini | `gemini-embedding-001` (legacy) | text only | no | all 5 |
| VoyageAI | `voyage-multimodal-3.5` | text, image, video | yes | query, document |
| VoyageAI | `voyage-2` (default) | text only | no | query, document |
| Roboflow | CLIP | text, image | no (one part per Content) | none |
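The table translates directly into a pre-flight check you can run before calling a provider. The caps and part types and the checkContent function below are hypothetical simplifications (the library exposes its own capability data, described under Capability inspection); the capability values are copied from the Roboflow CLIP and voyage-multimodal-3.5 rows.

```go
package main

import "fmt"

// caps is a hypothetical capability summary mirroring the Provider Support
// table; it is not the library's Capabilities type.
type caps struct {
	modalities        map[string]bool
	supportsMixedPart bool
}

// part is a hypothetical minimal part: only its modality matters here.
type part struct{ modality string }

// checkContent rejects content a provider cannot embed, before any API call.
func checkContent(c caps, parts []part) error {
	if len(parts) == 0 {
		return fmt.Errorf("content has no parts")
	}
	if len(parts) > 1 && !c.supportsMixedPart {
		return fmt.Errorf("provider accepts one part per content, got %d", len(parts))
	}
	for _, p := range parts {
		if !c.modalities[p.modality] {
			return fmt.Errorf("unsupported modality %q", p.modality)
		}
	}
	return nil
}

func main() {
	// Values taken from the Provider Support table.
	roboflowCLIP := caps{modalities: map[string]bool{"text": true, "image": true}}
	voyageMM := caps{
		modalities:        map[string]bool{"text": true, "image": true, "video": true},
		supportsMixedPart: true,
	}

	textAndImage := []part{{"text"}, {"image"}}
	fmt.Println(checkContent(voyageMM, textAndImage))      // <nil>
	fmt.Println(checkContent(roboflowCLIP, textAndImage))  // provider accepts one part per content, got 2
	fmt.Println(checkContent(voyageMM, []part{{"audio"}})) // unsupported modality "audio"
}
```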
See the Embeddings page for provider setup, API keys, and option functions.
Advanced
Custom output dimensions
Some providers support truncated embeddings for storage efficiency. Use WithDimension:
```go
content := embeddings.NewTextContent("document text",
	embeddings.WithDimension(256),
)
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
	log.Fatal(err)
}
// emb.Len() == 256
```
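Providers that offer reduced dimensions commonly train their models so that a prefix of the full vector is itself a usable embedding (Matryoshka-style representation learning). Assuming that property, a 256-dimension result corresponds to the first 256 components; because a prefix is no longer unit length, it is usually re-normalized before comparing vectors with dot products. The sketch below shows only that arithmetic, client-side, using a hypothetical truncateAndNormalize helper; it is not what the library or any provider does internally.

```go
package main

import (
	"fmt"
	"math"
)

// truncateAndNormalize keeps the first dim components of a full embedding and
// rescales them to unit L2 norm, so dot products behave like cosine similarity.
func truncateAndNormalize(full []float32, dim int) ([]float32, error) {
	if dim <= 0 || dim > len(full) {
		return nil, fmt.Errorf("dimension %d out of range 1..%d", dim, len(full))
	}
	out := make([]float32, dim)
	copy(out, full[:dim])

	var sumSq float64
	for _, v := range out {
		sumSq += float64(v) * float64(v)
	}
	norm := math.Sqrt(sumSq)
	if norm == 0 {
		return out, nil // an all-zero prefix cannot be normalized
	}
	for i := range out {
		out[i] = float32(float64(out[i]) / norm)
	}
	return out, nil
}

func main() {
	full := []float32{3, 4, 12} // toy 3-d "embedding"
	v, _ := truncateAndNormalize(full, 2)
	fmt.Println(v) // [0.6 0.8]
}
```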
Provider hints
For provider-specific options that don't have a portable equivalent, use WithProviderHints:
```go
content := embeddings.NewTextContent("classify this",
	embeddings.WithProviderHints(map[string]any{
		"task_type": "CLASSIFICATION", // Gemini-specific
	}),
)
```
Warning: WithProviderHints bypasses portable intent mapping. It is an escape hatch; prefer WithIntent when a neutral constant fits your use case.
Capability inspection
Check what a provider supports at runtime:
```go
// any(ef) lets the type assertion compile whether ef is a concrete type
// (as returned by a provider constructor) or already an interface value.
if capAware, ok := any(ef).(embeddings.CapabilityAware); ok {
	caps := capAware.Capabilities()
	fmt.Println("Modalities:", caps.Modalities)         // e.g. [text image audio video pdf]
	fmt.Println("Mixed parts:", caps.SupportsMixedPart) // e.g. true
	fmt.Println("Intents:", caps.Intents)               // e.g. [retrieval_query retrieval_document ...]
}
```
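The check above relies on Go's optional-interface idiom: the type assertion succeeds only when the embedding function implements CapabilityAware. The self-contained sketch below shows the idiom with a conservative fallback. Every type here (Capabilities, CapabilityAware, EmbeddingFunction, legacyEF, multimodalEF) is a hypothetical stand-in, and treating a provider that does not report capabilities as text-only is an assumption, not documented library behavior.

```go
package main

import "fmt"

// Hypothetical stand-ins; the real interfaces live in the embeddings package.
type Capabilities struct {
	Modalities        []string
	SupportsMixedPart bool
}

type CapabilityAware interface {
	Capabilities() Capabilities
}

type EmbeddingFunction interface {
	Name() string
}

// legacyEF implements only the base interface.
type legacyEF struct{}

func (legacyEF) Name() string { return "legacy" }

// multimodalEF also reports its capabilities.
type multimodalEF struct{}

func (multimodalEF) Name() string { return "multimodal" }
func (multimodalEF) Capabilities() Capabilities {
	return Capabilities{Modalities: []string{"text", "image"}, SupportsMixedPart: true}
}

// capabilitiesOf asserts the optional interface and falls back to an
// assumed text-only capability set when it is not implemented.
func capabilitiesOf(ef EmbeddingFunction) Capabilities {
	if ca, ok := ef.(CapabilityAware); ok {
		return ca.Capabilities()
	}
	return Capabilities{Modalities: []string{"text"}}
}

// supports reports whether a modality appears in the capability list.
func supports(c Capabilities, modality string) bool {
	for _, m := range c.Modalities {
		if m == modality {
			return true
		}
	}
	return false
}

func main() {
	for _, ef := range []EmbeddingFunction{legacyEF{}, multimodalEF{}} {
		fmt.Println(ef.Name(), supports(capabilitiesOf(ef), "image"))
	}
	// legacy false
	// multimodal true
}
```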
Compatibility with Legacy API
Both APIs coexist indefinitely — neither is deprecated.
| Use case | Recommended API |
|---|---|
| Text-only embeddings | EmbedDocuments / EmbedQuery |
| Mixed media (text + images + video) | EmbedContent / EmbedContents |
| Portable intents or per-request dimensions | EmbedContent / EmbedContents |
Existing providers automatically work with the Content API when retrieved through the registry. The registry wraps them with built-in adapters.
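As a rough illustration of what such an adapter has to do, the sketch below wraps a text-only EmbedDocuments provider so it can accept Content values, rejecting anything that is not a single text part. All names (contentAdapter, toDocuments, fakeText, errNotText, and the simplified Part and Content types) are hypothetical, embeddings are plain []float32 for brevity, and the registry's built-in adapters are internal and may behave differently, for example in how they report errors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Simplified, hypothetical types; embeddings are plain []float32 here.
type Part struct {
	Modality string
	Text     string
}

type Content struct{ Parts []Part }

// textEmbedder is the shape of a legacy, text-only provider.
type textEmbedder interface {
	EmbedDocuments(ctx context.Context, docs []string) ([][]float32, error)
}

var errNotText = errors.New("legacy provider accepts only single-part text content")

// toDocuments converts Content values to plain strings, one per item,
// preserving order so results line up with the inputs.
func toDocuments(contents []Content) ([]string, error) {
	docs := make([]string, 0, len(contents))
	for i, c := range contents {
		if len(c.Parts) != 1 || c.Parts[0].Modality != "text" {
			return nil, fmt.Errorf("content %d: %w", i, errNotText)
		}
		docs = append(docs, c.Parts[0].Text)
	}
	return docs, nil
}

// contentAdapter exposes a Content-style batch method on top of a textEmbedder.
type contentAdapter struct{ inner textEmbedder }

func (a contentAdapter) EmbedContents(ctx context.Context, contents []Content) ([][]float32, error) {
	docs, err := toDocuments(contents)
	if err != nil {
		return nil, err
	}
	return a.inner.EmbedDocuments(ctx, docs)
}

// fakeText is a toy legacy provider: each "embedding" is [byte length, word count].
type fakeText struct{}

func (fakeText) EmbedDocuments(_ context.Context, docs []string) ([][]float32, error) {
	out := make([][]float32, len(docs))
	for i, d := range docs {
		out[i] = []float32{float32(len(d)), float32(len(strings.Fields(d)))}
	}
	return out, nil
}

func main() {
	a := contentAdapter{inner: fakeText{}}
	ctx := context.Background()

	embs, err := a.EmbedContents(ctx, []Content{{Parts: []Part{{Modality: "text", Text: "What is Chroma?"}}}})
	fmt.Println(embs, err) // [[15 3]] <nil>

	_, err = a.EmbedContents(ctx, []Content{{Parts: []Part{{Modality: "image"}}}})
	fmt.Println(errors.Is(err, errNotText)) // true
}
```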