Embeddings
Convert text to vectors. RAG, search, classification, clustering — all the same primitive.
An embedding is a fixed-length vector that represents the meaning of a piece of text. Two pieces of text that mean similar things will have similar vectors (cosine similarity near 1). Embeddings power semantic search, clustering, classification, and retrieval-augmented generation (RAG). With the AI SDK, use embed for a single input and embedMany for batches.
Quick example
import { embed } from "ai"
const { embedding } = await embed({
model: "openai/text-embedding-3-large",
baseURL: "https://synapse.garden/api/v1",
apiKey: process.env.MG_KEY,
value: "The quick brown fox jumps over the lazy dog.",
})
console.log(embedding.length) // 3072 for text-embedding-3-large
console.log(embedding.slice(0, 5)) // [0.012, -0.034, 0.089, ...]Batch embed
Embedding hundreds of strings one at a time wastes round-trips. Use embedMany:
import { embedMany } from "ai"
const { embeddings } = await embedMany({
model: "openai/text-embedding-3-large",
values: chunks, // string[]
})
// embeddings.length === chunks.length
// each embeddings[i] is a number[] of dimension 3072

The AI SDK batches under the hood: when the input exceeds the provider's per-request limit (typically 2048 inputs), it splits the call into multiple requests automatically.
OpenAI-compatible API
import OpenAI from "openai"

// Point the official OpenAI client at the gateway
const client = new OpenAI({
  baseURL: "https://synapse.garden/api/v1",
  apiKey: process.env.MG_KEY,
})

const res = await client.embeddings.create({
  model: "openai/text-embedding-3-large",
  input: ["The quick brown fox", "Jumps over the lazy dog"],
})

for (const item of res.data) {
  console.log(item.index, item.embedding.length)
}

Available embedding models
Filter the catalog by the Embeddings modality on /models. Common choices:
| Model | Dimension | Max tokens | Notes |
|---|---|---|---|
| openai/text-embedding-3-large | 3072 | 8192 | OpenAI flagship; best quality |
| openai/text-embedding-3-small | 1536 | 8192 | OpenAI default; cheap and capable |
| cohere/embed-v4 | 1024 | 8192 | Multilingual; 100+ languages |
| voyage/voyage-3-large | 1024 | 32000 | Long context; strong on code/legal |
| google/gemini-embedding-exp | 768 | 8192 | Google embeddings |
Reducing dimensions
OpenAI's v3 models support dimension truncation — request a smaller vector and they truncate-then-renormalize. Useful for cutting storage and query costs:
await client.embeddings.create({
model: "openai/text-embedding-3-large",
input: "...",
dimensions: 1024, // truncate from 3072
})

Quality drops gracefully — 1024 is usually fine for most retrieval tasks.
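If a provider doesn't expose a dimensions parameter, you can approximate the same trick client-side. A minimal sketch, assuming the model tolerates truncation (OpenAI's v3 models were trained for it; other models may degrade more):

function truncateEmbedding(vec: number[], dims: number): number[] {
  // Keep the first `dims` components, then renormalize to unit length
  const head = vec.slice(0, dims)
  const norm = Math.hypot(...head)
  return head.map((x) => x / norm)
}

const small = truncateEmbedding(embedding, 1024)
console.log(Math.hypot(...small).toFixed(3)) // "1.000", still a unit vector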
Building a RAG pipeline
import { embed, embedMany, generateText } from "ai"
// 1. Index your corpus (do this once at ingest time)
const chunks = chunkText(myDocument, { size: 800, overlap: 100 }) // your own splitter; sketch below
const { embeddings } = await embedMany({
model: "openai/text-embedding-3-large",
values: chunks,
})
// Persist (chunks[i], embeddings[i]) tuples to your vector store —
// Postgres pgvector, Pinecone, Weaviate, Qdrant, etc.
await db.insert("docs", chunks.map((c, i) => ({ text: c, vector: embeddings[i] })))
// 2. Query (do this every request)
const userQuestion = "How do I rotate an API key?"
const { embedding: queryVec } = await embed({
model: "openai/text-embedding-3-large",
value: userQuestion,
})
const top = await db.query`
SELECT text, 1 - (vector <=> ${queryVec}) AS similarity -- <=> is pgvector's cosine-distance operator
FROM docs
ORDER BY vector <=> ${queryVec}
LIMIT 5
`
// 3. Generate with retrieved context
const { text } = await generateText({
model: "openai/gpt-5.4-mini",
system: "Answer using only the context provided.",
prompt: `Context:\n${top.map((t) => t.text).join("\n---\n")}\n\nQuestion: ${userQuestion}`,
})
console.log(text)

For better recall, rerank the top-K candidates with a cross-encoder before generation — see Reranking.
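The chunkText helper above is not part of the AI SDK; it stands in for whatever splitter you use. A minimal sketch that packs paragraphs up to a size budget and carries a character overlap between chunks (hypothetical implementation; tune the numbers to your corpus, per the caveats below):

function chunkText(
  doc: string,
  { size, overlap }: { size: number; overlap: number },
): string[] {
  const paragraphs = doc.split(/\n{2,}/)
  const chunks: string[] = []
  let current = ""
  for (const p of paragraphs) {
    if (current && current.length + p.length > size) {
      chunks.push(current)
      // Carry the tail of the previous chunk forward as overlap
      current = current.slice(-overlap)
    }
    // Note: a single paragraph longer than `size` passes through as one chunk
    current = current ? current + "\n\n" + p : p
  }
  if (current) chunks.push(current)
  return chunks
}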
Cosine similarity
import { cosineSimilarity } from "ai"
const a = await embed({ model: "...", value: "..." })
const b = await embed({ model: "...", value: "..." })
const score = cosineSimilarity(a.embedding, b.embedding)
// 1.0 = identical, 0.0 = unrelated, -1.0 = opposite

For pairwise scoring across many candidates, batch the query through a vector DB rather than computing similarity in JS — it's orders of magnitude faster.
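That said, for a few hundred candidates an in-process scan is perfectly workable. A sketch, assuming candidates shaped like the rows persisted in the RAG example above:

import { cosineSimilarity } from "ai"

// Rank candidate chunks against a query vector; keep the best k
function topK(
  queryVec: number[],
  candidates: { text: string; vector: number[] }[],
  k = 5,
) {
  return candidates
    .map((c) => ({ text: c.text, score: cosineSimilarity(queryVec, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}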
Storing embeddings
Recommended vector stores:
| Store | Best for | Notes |
|---|---|---|
| Postgres + pgvector | Most apps | Same DB as your relational data; HNSW index |
| Pinecone | Managed; massive scale | Pay-as-you-go |
| Weaviate | Hybrid (vector + keyword) | Open source + managed |
| Qdrant | Speed + filtering | Rust-based; very fast |
| LanceDB | Embedded / local | SQLite for vectors |
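If you take the pgvector route, the docs table from the RAG example needs the extension, a fixed-dimension column, and an HNSW index. A one-time setup sketch using the same hypothetical db client as above; note that pgvector indexes at most 2,000 dimensions, so either embed with dimensions: 1024 (see Reducing dimensions) or use the halfvec type for full 3072-dim vectors:

await db.query`CREATE EXTENSION IF NOT EXISTS vector`
await db.query`
  CREATE TABLE IF NOT EXISTS docs (
    id bigserial PRIMARY KEY,
    text text NOT NULL,
    vector vector(1024) -- must match the embedding dimension you store
  )
`
await db.query`
  CREATE INDEX IF NOT EXISTS docs_vector_idx
  ON docs USING hnsw (vector vector_cosine_ops)
`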
Caveats
- Pick one model and stick with it. Embeddings from different models live in different vector spaces — you can't mix them. If you swap models, re-index your whole corpus.
- Normalize before storing. Most vector DBs assume unit vectors for cosine similarity. The big providers return normalized vectors, but verify with Math.hypot(...vec) ≈ 1.
- Don't embed raw HTML. Strip tags, run through a text extractor, optionally summarize. The model's tokens are precious.
- Chunk smartly. 600–1200 character chunks with 10–15% overlap is the standard. Boundary on paragraph or sentence ends, not arbitrary character counts.
- Cache where possible. If you embed the same text twice, you pay twice. Cache by SHA-256 of the input; a sketch follows.
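A minimal sketch of that cache, keyed on the SHA-256 of model plus input (in-memory Map here; swap in Redis or a DB table for persistence, and assuming the string model IDs used throughout this page):

import { createHash } from "node:crypto"
import { embed } from "ai"

const cache = new Map<string, number[]>() // use Redis/Postgres in production

async function embedCached(model: string, value: string): Promise<number[]> {
  // Key on model + input: identical text embeds differently under different models
  const key = createHash("sha256").update(`${model}\n${value}`).digest("hex")
  const hit = cache.get(key)
  if (hit) return hit
  const { embedding } = await embed({ model, value })
  cache.set(key, embedding)
  return embedding
}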
Pricing
Embedding models are billed per million input tokens. Output dimension doesn't affect cost. Browse /models filtered by Embeddings for live rates.
A typical RAG pipeline:
- Ingest 1M words ≈ 1.3M tokens → one-time cost (depends on the model)
- Each query embeds ~50 tokens → near-zero per-query cost
- The big spend is the LLM doing the actual generation, not the embedding step
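To make that concrete, a back-of-envelope calculation (the per-token price below is illustrative only; check /models for live rates):

const PRICE_PER_M_TOKENS = 0.13 // illustrative, USD per million input tokens

const ingestTokens = 1_300_000 // 1M words ≈ 1.3M tokens
const ingestCost = (ingestTokens / 1_000_000) * PRICE_PER_M_TOKENS
console.log(ingestCost.toFixed(2)) // "0.17", one-time

const queryTokens = 50
const queryCost = (queryTokens / 1_000_000) * PRICE_PER_M_TOKENS
console.log(queryCost.toFixed(7)) // "0.0000065", per query, effectively free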