Multimodal Embeddings — Content API

The Content API lets you embed text, images, audio, video, and PDFs through a single portable interface. Instead of using provider-specific methods, you describe what you want to embed and the library handles the how.

Why the Content API?

The legacy EmbedDocuments/EmbedQuery API works great for text. But when you need to embed an image alongside a description, or a video with context, you need a way to express "these things belong together." That's what the Content API does.

Legacy API:  EmbedDocuments(["text1", "text2"])     → []Embedding
Content API: EmbedContent(Content{text + image})    → Embedding

Both APIs coexist — use whichever fits. The Content API adds multimodal support without replacing anything.

Core Concepts

The Content API has four building blocks (Content, Part, BinarySource, and Intent), plus an optional output-dimension override:

Content                          ← One thing to embed
├── Parts[]                      ← The pieces that make it up
│   ├── TextPart("a cat photo")  ← Text piece
│   └── ImagePart(source)        ← Image/video/audio/PDF piece
│       └── BinarySource         ← Where the data comes from (URL, file, bytes)
├── Intent                       ← What you're using the embedding for (optional)
└── Dimension                    ← Output vector size override (optional)
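
As a rough mental model, the tree above could be written as the Go types below. This is a simplified sketch for illustration only: the type names, field names, and string values are assumptions, not the library's actual declarations.

```go
package main

import "fmt"

// Hypothetical, simplified mirror of the concepts in the tree above.
// These are NOT the library's real type definitions.
type Modality string

const (
	ModalityText  Modality = "text"
	ModalityImage Modality = "image"
)

// Intent describes what the embedding will be used for.
type Intent string

// BinarySource records where non-text data comes from.
// In this sketch exactly one field is expected to be set.
type BinarySource struct {
	URL   string
	File  string
	Bytes []byte
}

// Part is one piece of a Content: either text or a binary source.
type Part struct {
	Modality Modality
	Text     string
	Source   *BinarySource
}

// Content is one semantic unit that yields a single embedding.
type Content struct {
	Parts     []Part
	Intent    Intent // optional; empty means provider default
	Dimension int    // optional; 0 means provider default
}

// Modalities lists the distinct modalities used by a Content,
// in order of first appearance.
func (c Content) Modalities() []Modality {
	seen := map[Modality]bool{}
	var out []Modality
	for _, p := range c.Parts {
		if !seen[p.Modality] {
			seen[p.Modality] = true
			out = append(out, p.Modality)
		}
	}
	return out
}

func main() {
	c := Content{Parts: []Part{
		{Modality: ModalityText, Text: "A lioness hunting at sunset"},
		{Modality: ModalityImage, Source: &BinarySource{File: "lioness.png"}},
	}}
	fmt.Println(c.Modalities()) // [text image]
}
```

The point of the shape is that intent and dimension hang off the Content as a whole, while modality and data source belong to each individual Part.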

Content

A Content is one semantic unit you want to embed — a document, a query, or a media item. It contains one or more Parts that the provider combines into a single embedding vector.

{% codetabs group="lang" %} {% codetab label="Go" %}

// A photo with its description → one embedding
content := embeddings.NewContent([]embeddings.Part{
    embeddings.NewTextPart("A lioness hunting at sunset"),
    embeddings.NewPartFromSource(
        embeddings.ModalityImage,
        embeddings.NewBinarySourceFromFile("lioness.png"),
    ),
})
{% /codetab %}

Part

A Part is one piece of content — text, an image, a video clip. Each part has a modality that declares what type of content it is:

| Modality | What it represents | Part constructor |
| --- | --- | --- |
| ModalityText | Plain text | NewTextPart("...") |
| ModalityImage | Image (PNG, JPEG, WebP, GIF) | NewPartFromSource(ModalityImage, source) |
| ModalityVideo | Video (MP4) | NewPartFromSource(ModalityVideo, source) |
| ModalityAudio | Audio (MP3, WAV) | NewPartFromSource(ModalityAudio, source) |
| ModalityPDF | PDF document | NewPartFromSource(ModalityPDF, source) |

For single-modality Content shortcuts, see the Convenience Constructors table below.

Not every provider supports every modality. See Provider Support below.

BinarySource

A BinarySource tells the library where to find non-text content. You don't construct it directly — use one of the helpers:

{% codetabs group="lang" %} {% codetab label="Go" %}

// From a URL (provider fetches it)
embeddings.NewBinarySourceFromURL("https://example.com/cat.jpg")

// From a local file (library reads and encodes it)
embeddings.NewBinarySourceFromFile("/path/to/photo.png")

// From raw bytes already in memory
embeddings.NewBinarySourceFromBytes(imageBytes)

// From a base64-encoded string
embeddings.NewBinarySourceFromBase64(b64String)
{% /codetab %}
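
To give a feel for what "reads and encodes" can involve for a local file, here is a minimal standard-library sketch. The helpers encodeBytes and encodeFile are hypothetical and are not part of the library; the library's real encoding and MIME detection may work differently.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

// encodeBytes is a hypothetical helper (not a library function).
// It sniffs the MIME type from the leading bytes and returns the
// base64 payload that a provider request might carry.
func encodeBytes(data []byte) (string, string) {
	return http.DetectContentType(data), base64.StdEncoding.EncodeToString(data)
}

// encodeFile reads a local file and encodes it with encodeBytes.
func encodeFile(path string) (string, string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", "", fmt.Errorf("read %s: %w", path, err)
	}
	mimeType, b64 := encodeBytes(data)
	return mimeType, b64, nil
}

func main() {
	// Write a tiny file that starts with the PNG signature so the
	// sketch runs without needing a real image on disk.
	path := filepath.Join(os.TempDir(), "content-api-sketch.png")
	if err := os.WriteFile(path, []byte("\x89PNG\r\n\x1a\n"), 0o600); err != nil {
		panic(err)
	}
	defer os.Remove(path)

	mimeType, b64, err := encodeFile(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(mimeType, b64)
}
```

A URL source skips this step entirely because the provider fetches the data itself, which is why it is usually the cheapest option when the media is already hosted somewhere reachable.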

Intent

An Intent tells the provider why you're embedding this content. Providers that support intents (like Gemini and VoyageAI) use this to optimize the embedding for your use case.

{% codetabs group="lang" %} {% codetab label="Go" %}

// Embedding a query to search against stored documents
query := embeddings.NewTextContent("how do lionesses hunt?",
    embeddings.WithIntent(embeddings.IntentRetrievalQuery),
)

// Embedding a document to be searched later
doc := embeddings.NewTextContent("Lionesses hunt cooperatively...",
    embeddings.WithIntent(embeddings.IntentRetrievalDocument),
)
{% /codetab %}

When to use which intent:

| Intent | Use when... | Example |
| --- | --- | --- |
| IntentRetrievalQuery | Embedding a search query | User types "find sunset photos" |
| IntentRetrievalDocument | Embedding content to be searched | Indexing a photo description |
| IntentClassification | Categorizing content | Sorting images into categories |
| IntentClustering | Grouping similar content | Finding related documents |
| IntentSemanticSimilarity | Comparing two items | Checking if two descriptions match |

Intents are optional. If you skip them, the provider uses its default behavior.

Not all providers support all intents

Gemini supports all five. VoyageAI supports only IntentRetrievalQuery and IntentRetrievalDocument. Unsupported intents return a clear error — they never silently degrade.
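
The "fail loudly" behavior described above can be pictured with a small self-contained sketch. The Intent type, its string values, ErrUnsupportedIntent, and checkIntent are local stand-ins invented for this example; the library's actual error values and validation code are not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// Intent and the constants below are local stand-ins, not the library's types.
type Intent string

const (
	IntentRetrievalQuery    Intent = "retrieval_query"
	IntentRetrievalDocument Intent = "retrieval_document"
	IntentClassification    Intent = "classification"
)

// ErrUnsupportedIntent is a hypothetical sentinel error for this sketch.
var ErrUnsupportedIntent = errors.New("unsupported intent")

// checkIntent returns nil when the provider can honor the intent,
// and a descriptive error otherwise. An empty intent is always
// accepted because it means "use the provider default".
func checkIntent(provider string, supported []Intent, want Intent) error {
	if want == "" {
		return nil
	}
	for _, s := range supported {
		if s == want {
			return nil
		}
	}
	return fmt.Errorf("%s: %w: %q", provider, ErrUnsupportedIntent, want)
}

func main() {
	voyage := []Intent{IntentRetrievalQuery, IntentRetrievalDocument}
	fmt.Println(checkIntent("voyageai", voyage, IntentRetrievalQuery)) // <nil>
	fmt.Println(checkIntent("voyageai", voyage, IntentClassification))
}
```

Returning an error instead of dropping the intent keeps indexing and querying consistent: a caller never ends up with vectors that were produced under a different task setting than the one requested.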

Convenience Constructors

For single-modality content, use the shorthand constructors instead of building Content structs manually:

{% codetabs group="lang" %} {% codetab label="Go" %}

| Modality | Shorthand | Equivalent verbose form |
| --- | --- | --- |
| Text | NewTextContent("...") | Content{Parts: []Part{NewTextPart("...")}} |
| Image (URL) | NewImageURL(url) | Content{Parts: []Part{NewPartFromSource(ModalityImage, NewBinarySourceFromURL(url))}} |
| Image (file) | NewImageFile(path) | Content{Parts: []Part{NewPartFromSource(ModalityImage, NewBinarySourceFromFile(path))}} |
| Video (URL) | NewVideoURL(url) | Content{Parts: []Part{NewPartFromSource(ModalityVideo, NewBinarySourceFromURL(url))}} |
| Video (file) | NewVideoFile(path) | Content{Parts: []Part{NewPartFromSource(ModalityVideo, NewBinarySourceFromFile(path))}} |
| Audio (URL) | NewAudioURL(url) | Content{Parts: []Part{NewPartFromSource(ModalityAudio, NewBinarySourceFromURL(url))}} |
| Audio (file) | NewAudioFile(path) | Content{Parts: []Part{NewPartFromSource(ModalityAudio, NewBinarySourceFromFile(path))}} |
| PDF (URL) | NewPDFURL(url) | Content{Parts: []Part{NewPartFromSource(ModalityPDF, NewBinarySourceFromURL(url))}} |
| PDF (file) | NewPDFFile(path) | Content{Parts: []Part{NewPartFromSource(ModalityPDF, NewBinarySourceFromFile(path))}} |

{% /codetab %}

All constructors accept optional ContentOption arguments for intent, dimension, and provider hints:

{% codetabs group="lang" %} {% codetab label="Go" %}

// Embed text for retrieval
query := embeddings.NewTextContent("how do lionesses hunt?",
    embeddings.WithIntent(embeddings.IntentRetrievalQuery),
)

// Embed with custom output dimensions
doc := embeddings.NewTextContent("document text",
    embeddings.WithDimension(256),
)
{% /codetab %}

For mixed-part content, use NewContent with Part helpers:

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewContent([]embeddings.Part{
    embeddings.NewTextPart("A lioness hunting at sunset"),
    embeddings.NewPartFromSource(
        embeddings.ModalityImage,
        embeddings.NewBinarySourceFromFile("lioness.png"),
    ),
})
{% /codetab %}

Common Recipes

Embed text

{% codetabs group="lang" %} {% codetab label="Go" %}

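// Create a Gemini embedding function; the API key is read from the environment.
ef, err := gemini.NewGeminiEmbeddingFunction(gemini.WithEnvAPIKey())
if err != nil {
    log.Fatal(err)
}

content := embeddings.NewTextContent("What is Chroma?")
emb, err := ef.EmbedContent(context.Background(), content)
if err != nil {
    log.Fatal(err)
}
fmt.Println("dimensions:", emb.Len())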
{% /codetab %}

Embed an image from a URL

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewImageURL("https://example.com/cat.jpg")
emb, err := ef.EmbedContent(context.Background(), content)
{% /codetab %}

Embed an image from a local file

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewImageFile("/path/to/photo.png")
emb, err := ef.EmbedContent(context.Background(), content)
{% /codetab %}

Embed text + image together

When you combine parts, the provider fuses them into a single embedding that captures both the text and visual content:

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewContent([]embeddings.Part{
    embeddings.NewTextPart("A lioness hunting at sunset"),
    embeddings.NewPartFromSource(
        embeddings.ModalityImage,
        embeddings.NewBinarySourceFromFile("lioness.png"),
    ),
})
emb, err := ef.EmbedContent(context.Background(), content)
{% /codetab %}

Verbose construction

The Convenience Constructors table above shows the equivalent verbose Content{} struct literals for each modality.

Embed a batch of items

Use EmbedContents to embed multiple content items in one call. Each item produces its own embedding:

{% codetabs group="lang" %} {% codetab label="Go" %}

contents := []embeddings.Content{
    embeddings.NewTextContent("The golden hour on the Serengeti"),
    embeddings.NewImageFile("lioness.png"),
    embeddings.NewContent([]embeddings.Part{
        embeddings.NewTextPart("A lioness pouncing on prey"),
        embeddings.NewPartFromSource(
            embeddings.ModalityVideo,
            embeddings.NewBinarySourceFromFile("the_pounce.mp4"),
        ),
    }),
}
results, err := ef.EmbedContents(context.Background(), contents)
// results[0] = text embedding, results[1] = image embedding, results[2] = text+video embedding
{% /codetab %}

Embed with an intent

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewTextContent("how do lionesses hunt?",
    embeddings.WithIntent(embeddings.IntentRetrievalQuery),
)
emb, err := ef.EmbedContent(context.Background(), content)
{% /codetab %}
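
Once a query and a set of documents have been embedded, retrieval usually comes down to comparing vectors. The sketch below ranks toy document vectors against a toy query vector using cosine similarity. It works on plain []float32 slices with made-up values and does not call the library; the helper names cosine and nearest are hypothetical, and in practice a vector store typically performs this step.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 for mismatched lengths, empty input, or zero vectors.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest returns the index of the document vector most similar
// to the query, or -1 when docs is empty.
func nearest(query []float32, docs [][]float32) int {
	best, bestScore := -1, math.Inf(-1)
	for i, d := range docs {
		if s := cosine(query, d); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	// Toy vectors standing in for real embeddings.
	query := []float32{1, 0}
	docs := [][]float32{{0, 1}, {1, 1}, {2, 0}}
	fmt.Println("best match:", nearest(query, docs)) // best match: 2
}
```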

Provider Support

| Provider | Models | Modalities | Mixed Parts | Intents |
| --- | --- | --- | --- | --- |
| Gemini | gemini-embedding-2-preview | text, image, audio, video, PDF | yes | all 5 |
| Gemini | gemini-embedding-001 (legacy) | text only | no | all 5 |
| VoyageAI | voyage-multimodal-3.5 | text, image, video | yes | query, document |
| VoyageAI | voyage-2 (default) | text only | no | query, document |
| Roboflow | CLIP | text, image | no (one part per Content) | none |

See the Embeddings page for provider setup, API keys, and option functions.

Advanced

Custom output dimensions

Some providers can return truncated embeddings, which take less storage. Use WithDimension to request a specific output size:

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewTextContent("document text",
    embeddings.WithDimension(256),
)
emb, err := ef.EmbedContent(context.Background(), content)
// emb.Len() == 256
{% /codetab %}
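
To make "truncated" concrete: one common way to shorten an embedding is to keep its first n components and re-normalize the result to unit length. The sketch below shows that idea on a plain slice. Whether a given provider does exactly this on the server side is an assumption, and the helper truncate is hypothetical rather than a library function.

```go
package main

import (
	"fmt"
	"math"
)

// truncate keeps the first n components of v and rescales the result
// to unit L2 norm. It does not modify v.
func truncate(v []float32, n int) ([]float32, error) {
	if n <= 0 || n > len(v) {
		return nil, fmt.Errorf("dimension %d out of range 1..%d", n, len(v))
	}
	out := make([]float32, n)
	copy(out, v[:n])

	var sum float64
	for _, x := range out {
		sum += float64(x) * float64(x)
	}
	if sum == 0 {
		return out, nil // all zeros: nothing to normalize
	}
	norm := float32(math.Sqrt(sum))
	for i := range out {
		out[i] /= norm
	}
	return out, nil
}

func main() {
	// Toy vector standing in for a real embedding.
	full := []float32{3, 4, 12}
	short, err := truncate(full, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(short), short)
}
```

Vectors produced at different dimensions are not comparable with each other, so a collection should be indexed and queried with the same dimension setting.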

Provider hints

For provider-specific options that don't have a portable equivalent, use WithProviderHints:

{% codetabs group="lang" %} {% codetab label="Go" %}

content := embeddings.NewTextContent("classify this",
    embeddings.WithProviderHints(map[string]any{
        "task_type": "CLASSIFICATION",  // Gemini-specific
    }),
)
{% /codetab %}

Warning

WithProviderHints bypasses portable intent mapping. It's an escape hatch — prefer WithIntent when a neutral constant fits your use case.

Capability inspection

Check what a provider supports at runtime:

{% codetabs group="lang" %} {% codetab label="Go" %}

if capAware, ok := ef.(embeddings.CapabilityAware); ok {
    caps := capAware.Capabilities()
    fmt.Println("Modalities:", caps.Modalities)         // e.g. [text image audio video pdf]
    fmt.Println("Mixed parts:", caps.SupportsMixedPart) // e.g. true
    fmt.Println("Intents:", caps.Intents)               // e.g. [retrieval_query retrieval_document ...]
}
{% /codetab %}
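
Capability data like this can be used to reject a request before any network call is made. The following sketch uses a local Capabilities struct whose field names mirror the snippet above, but its exact shape (string modalities, no intents) and the validate helper are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Capabilities is a local stand-in for the provider capability data.
type Capabilities struct {
	Modalities        []string
	SupportsMixedPart bool
}

// validate checks the modalities of a content's parts against the
// provider's capabilities and returns the first problem it finds.
func validate(caps Capabilities, partModalities []string) error {
	if len(partModalities) == 0 {
		return errors.New("content has no parts")
	}
	if len(partModalities) > 1 && !caps.SupportsMixedPart {
		return fmt.Errorf("provider accepts one part per content, got %d", len(partModalities))
	}
	for _, m := range partModalities {
		supported := false
		for _, s := range caps.Modalities {
			if s == m {
				supported = true
				break
			}
		}
		if !supported {
			return fmt.Errorf("modality %q is not supported", m)
		}
	}
	return nil
}

func main() {
	// Capabilities resembling a text+image provider without mixed parts.
	caps := Capabilities{Modalities: []string{"text", "image"}}
	fmt.Println(validate(caps, []string{"image"}))         // <nil>
	fmt.Println(validate(caps, []string{"text", "image"})) // one part per content
	fmt.Println(validate(caps, []string{"video"}))         // unsupported modality
}
```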

Compatibility with Legacy API

Both APIs coexist indefinitely — neither is deprecated.

| Use case | Recommended API |
| --- | --- |
| Text-only embeddings | EmbedDocuments / EmbedQuery |
| Mixed media (text + images + video) | EmbedContent / EmbedContents |
| Portable intents or per-request dimensions | EmbedContent / EmbedContents |

Existing providers automatically work with the Content API when retrieved through the registry. The registry wraps them with built-in adapters.
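
The adapter idea can be sketched as a thin wrapper that forwards single text parts to a text-only embedder and rejects everything else. All names below (legacyEmbedder, contentEmbedder, textAdapter, fakeLegacy, embed) are hypothetical and simplified; the registry's built-in adapters may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// legacyEmbedder is a simplified stand-in for a text-only provider.
type legacyEmbedder interface {
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
}

type part struct{ modality, text string }

type content struct{ parts []part }

// contentEmbedder is a simplified stand-in for the Content API surface.
type contentEmbedder interface {
	EmbedContent(ctx context.Context, c content) ([]float32, error)
}

// textAdapter exposes a legacy embedder through the content interface.
type textAdapter struct{ inner legacyEmbedder }

func (a textAdapter) EmbedContent(ctx context.Context, c content) ([]float32, error) {
	if len(c.parts) != 1 || c.parts[0].modality != "text" {
		return nil, errors.New("adapter: only single text-part content is supported")
	}
	out, err := a.inner.EmbedDocuments(ctx, []string{c.parts[0].text})
	if err != nil {
		return nil, fmt.Errorf("adapter: %w", err)
	}
	if len(out) != 1 {
		return nil, fmt.Errorf("adapter: expected 1 embedding, got %d", len(out))
	}
	return out[0], nil
}

// fakeLegacy stands in for a real provider: it "embeds" each text
// as a one-dimensional vector holding the text's byte length.
type fakeLegacy struct{}

func (fakeLegacy) EmbedDocuments(_ context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i, t := range texts {
		out[i] = []float32{float32(len(t))}
	}
	return out, nil
}

// embed is a small convenience wrapper that supplies a background context.
func embed(e contentEmbedder, c content) ([]float32, error) {
	return e.EmbedContent(context.Background(), c)
}

func main() {
	a := textAdapter{inner: fakeLegacy{}}
	emb, err := embed(a, content{parts: []part{{modality: "text", text: "hello"}}})
	fmt.Println(emb, err) // [5] <nil>
}
```

In this sketch the adapter returns an error for non-text parts rather than ignoring them, in keeping with the rule stated earlier that unsupported requests should not silently degrade.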