Definition

Embedding

An embedding is a dense vector that represents the meaning of text, code, or an image — the basis for semantic search, clustering, and RAG.

Concretely, an embedding model maps inputs (text, code, images) to dense numeric vectors — typically hundreds to a few thousand dimensions — in a space where semantically similar things land near each other. "Cat" and "kitten" produce nearby vectors; "cat" and "refinance" do not. Embeddings are the plumbing behind semantic search, RAG, recommendation, and clustering.
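As a toy illustration of "nearby vectors": the three-dimensional vectors below are invented for the example (real models emit hundreds of dimensions), but the geometry — similar meanings, small angle — is the same idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented 3-dimensional "embeddings", purely to show the geometry.
cat       = [0.90, 0.80, 0.10]
kitten    = [0.85, 0.75, 0.20]
refinance = [0.10, 0.20, 0.95]

print(cosine_similarity(cat, kitten))     # high: similar meaning
print(cosine_similarity(cat, refinance))  # low: unrelated meaning
```

With these made-up numbers, cat/kitten scores near 1.0 while cat/refinance scores far lower — exactly the property a trained embedding model provides at scale.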

Why it matters

For developer tools, embeddings enable semantic code search — "find functions that parse dates" — where keyword search fails because the word "parse" may not appear in the code. They also power the retrieval half of RAG, which is how many AI tools give LLMs access to knowledge that doesn't fit in the context window.

Claude Code and other agentic coding CLIs don't always use embeddings directly — their file tools do direct keyword/glob search by default — but embeddings often show up in MCP servers that wrap codebase search, docs search, or company knowledge.

How it works

An embedding model (OpenAI's text-embedding-3, Voyage AI's code embedders, Cohere, open models like bge or e5) is a neural network trained so that semantically related inputs produce vectors with high cosine similarity. To use one:

  1. Embed your corpus once and store the vectors in a vector database
  2. At query time, embed the query with the same model
  3. Compute similarity (cosine, dot product) and return the nearest K results

Dimensions are a tradeoff: higher dimensions capture more nuance but cost more storage and compute. Typical sizes are 768, 1024, 1536, or 3072.
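The three steps above can be sketched end to end. The `embed` function here is a hypothetical stand-in for a real model call (in practice you would hit an embedding API); it uses a crude bag-of-characters count only so the sketch runs, and the "vector database" is just an in-memory list.

```python
import math

def embed(text: str) -> list[float]:
    """Hypothetical placeholder for a real embedding model call.
    A crude bag-of-characters vector, only to make the sketch runnable."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Embed the corpus once and store the vectors.
corpus = [
    "function that parses dates",
    "coupon validation logic",
    "refinance calculator",
]
index = [(doc, embed(doc)) for doc in corpus]

# 2. At query time, embed the query with the same model.
query_vec = embed("date parsing helper")

# 3. Compute similarity and return the nearest K results.
k = 2
top_k = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)[:k]
for doc, _ in top_k:
    print(doc)
```

Swapping the placeholder `embed` for a real model call and the list for a vector database gives the production shape of the same pipeline; the ranking logic is unchanged.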

How it's used

Embeddings in developer tooling:

  • Semantic code search ("find the function that validates coupons")
  • Duplicate detection across large codebases
  • Docs search — RAG over product documentation
  • Long-term memory for agents ("what have we talked about before?")
  • Similarity joins in data pipelines

FAQ

Do I need an embedding model to use Claude Code?

No. Claude Code's built-in tools do direct file and grep search, which is often enough. Embeddings matter when you integrate external corpora via an MCP server.

Can I use a single embedding model for both text and code?

Yes, but code-specific embedders (voyage-code, bge-code, etc.) tend to beat general-purpose models on code tasks. Pick based on the workload.

How often should I re-embed?

Whenever the underlying content changes. Most teams run embedding jobs on commit hooks or nightly builds.

Related terms

  • RAG — the main consumer of embeddings
  • Vector database — where embeddings are stored
  • Token — unrelated numerically but conceptually adjacent
  • LLM — typically the downstream generator after retrieval
  • Fine-tuning — a related adaptation technique