Embedding
An embedding is a dense vector that represents the meaning of text, code, or an image — the basis for semantic search, clustering, and RAG.
An embedding is a dense numeric vector — typically hundreds to a few thousand dimensions — produced by a model that maps inputs (text, code, images) to a space where semantically similar things are near each other. "Cat" and "kitten" produce nearby vectors; "cat" and "refinance" do not. Embeddings are the plumbing behind semantic search, RAG, recommendation, and clustering.
Why it matters
For developer tools, embeddings enable semantic code search — "find functions that parse dates" — where keyword search fails because the word "parse" may not appear in the code. They also power the retrieval half of RAG, which is how many AI tools give LLMs access to knowledge that doesn't fit in the context window.
Claude Code and other agentic coding CLIs don't always use embeddings directly — their file tools do direct keyword/glob search by default — but embeddings often show up in MCP servers that wrap codebase search, docs search, or company knowledge.
How it works
An embedding model (OpenAI's text-embedding-3, Voyage AI's code embedders, Cohere, open models like bge or e5) is a neural network trained so that semantically related inputs produce vectors with high cosine similarity. To use one:
- Embed your corpus once and store the vectors in a vector database
- At query time, embed the query with the same model
- Compute similarity (cosine, dot product) and return the nearest K results
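The steps above can be sketched in a few lines of NumPy. The vectors here are tiny hand-made stand-ins for real model output (a production model returns hundreds of dimensions), and the function names are illustrative, not from any particular library:

```python
import numpy as np

# Toy 4-dimensional "embeddings" standing in for real model output.
# In practice these come from calling an embedding model over your corpus.
corpus = {
    "parse_date": np.array([0.9, 0.1, 0.0, 0.2]),
    "validate_coupon": np.array([0.1, 0.8, 0.3, 0.0]),
    "format_timestamp": np.array([0.8, 0.2, 0.1, 0.3]),
}

def cosine(a, b):
    # Cosine similarity: dot product of the vectors, normalized by their lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec, corpus, k=2):
    # Score every stored vector against the query and return the K nearest.
    scored = [(name, cosine(query_vec, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Pretend this vector is the embedded query "functions that parse dates".
query = np.array([0.85, 0.15, 0.05, 0.25])
results = top_k(query, corpus)  # parse_date ranks first
```

A vector database does the same comparison, but with approximate nearest-neighbor indexes so it scales past brute-force scans.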
Dimensions are a tradeoff: higher dimensions capture more nuance but cost more storage and compute. Typical sizes are 768, 1024, 1536, or 3072.
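The storage side of that tradeoff is easy to estimate. A back-of-envelope sketch, assuming float32 vectors and ignoring index overhead:

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw storage for float32 embedding vectors, before any index overhead."""
    return num_vectors * dims * bytes_per_float

# One million chunks at two common dimension sizes:
small = index_size_bytes(1_000_000, 768)    # ~3 GB
large = index_size_bytes(1_000_000, 3072)   # ~12 GB, 4x the storage and compute
```

Doubling dimensions doubles both storage and the cost of every similarity comparison, which is why many teams start at 768 or 1024 and only move up if retrieval quality demands it.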
How it's used
Embeddings in developer tooling:
- Semantic code search ("find the function that validates coupons")
- Duplicate detection across large codebases
- Docs search — RAG over product documentation
- Long-term memory for agents ("what have we talked about before?")
- Similarity joins in data pipelines
Related terms
- RAG — the main consumer of embeddings
- Vector database — where embeddings are stored
- Token — the unit of input an embedding model reads; not the same kind of number, but conceptually adjacent
- LLM — typically the downstream generator after retrieval
- Fine-tuning — a related adaptation technique
FAQ
Do I need an embedding model to use Claude Code?
No. Claude Code's built-in tools do direct file/grep search, which is often enough. Embeddings matter when you integrate external corpora via an MCP server.
Can I use a single embedding for both text and code?
Yes, but code-specific embedders (voyage-code, bge-code, etc.) tend to beat general-purpose models on code tasks. Pick based on the workload.
How often should I re-embed?
Whenever the underlying content changes. Most teams run embedding jobs on commit hooks or nightly builds.
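One common way to keep those jobs cheap is to re-embed only what changed, by comparing content hashes against the hashes stored with each vector. A minimal sketch (the helper names are hypothetical, not from any specific tool):

```python
import hashlib

def content_hash(text: str) -> str:
    # Stable fingerprint of a document's current content.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_docs(docs: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return ids whose content is new or changed since the last embedding run."""
    return [
        doc_id for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]

docs = {"readme": "Install with pip.", "api": "POST /v1/embed"}
stored = {"readme": content_hash("Install with pip."), "api": "old-hash"}
changed = stale_docs(docs, stored)  # only "api" needs re-embedding
```

The same pattern works at file, chunk, or function granularity; the finer the granularity, the less you re-embed per commit.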