Definition

Embedding

An embedding is a dense vector that represents the meaning of text, code, or an image — the basis for semantic search, clustering, and RAG.

Concretely, an embedding model maps inputs (text, code, images) to dense numeric vectors — typically hundreds to a few thousand dimensions — in a space where semantically similar things land near each other. "Cat" and "kitten" produce nearby vectors; "cat" and "refinance" do not. Embeddings are the plumbing behind semantic search, RAG, recommendation, and clustering.
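As a toy illustration of "nearby vectors": the three-dimensional vectors below are invented for the example (real models emit hundreds of dimensions), but the geometry — similar meanings, small angle — is the same idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented 3-dimensional "embeddings", purely to show the geometry.
cat       = [0.90, 0.80, 0.10]
kitten    = [0.85, 0.75, 0.20]
refinance = [0.10, 0.20, 0.95]

print(cosine_similarity(cat, kitten))     # high: similar meaning
print(cosine_similarity(cat, refinance))  # low: unrelated meaning
```

With these made-up numbers, cat/kitten scores near 1.0 while cat/refinance scores far lower — exactly the property a trained embedding model provides at scale.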

Why it matters

For developer tools, embeddings enable semantic code search — "find functions that parse dates" — where keyword search fails because the word "parse" may not appear in the code. They also power the retrieval half of RAG, which is how many AI tools give LLMs access to knowledge that doesn't fit in the context window.

Claude Code and other agentic coding CLIs don't always use embeddings directly — their file tools do direct keyword/glob search by default — but embeddings often show up in MCP servers that wrap codebase search, docs search, or company knowledge.

How it works

An embedding model (OpenAI's text-embedding-3, Voyage AI's code embedders, Cohere, open models like bge or e5) is a neural network trained so that semantically related inputs produce vectors with high cosine similarity. To use one:

  1. Embed your corpus once and store the vectors in a vector database
  2. At query time, embed the query with the same model
  3. Compute similarity (cosine, dot product) and return the nearest K results

Dimensions are a tradeoff: higher dimensions capture more nuance but cost more storage and compute. Typical sizes are 768, 1024, 1536, or 3072.
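The three steps above can be sketched end to end. The `embed` function here is a hypothetical stand-in for a real model call (in practice you would hit an embedding API); it uses a crude bag-of-characters count only so the sketch runs, and the "vector database" is just an in-memory list.

```python
import math

def embed(text: str) -> list[float]:
    """Hypothetical placeholder for a real embedding model call.
    A crude bag-of-characters vector, only to make the sketch runnable."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Embed the corpus once and store the vectors.
corpus = [
    "function that parses dates",
    "coupon validation logic",
    "refinance calculator",
]
index = [(doc, embed(doc)) for doc in corpus]

# 2. At query time, embed the query with the same model.
query_vec = embed("date parsing helper")

# 3. Compute similarity and return the nearest K results.
k = 2
top_k = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)[:k]
for doc, _ in top_k:
    print(doc)
```

Swapping the placeholder `embed` for a real model call and the list for a vector database gives the production shape of the same pipeline; the ranking logic is unchanged.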

How it's used

Embeddings in developer tooling:

  • Semantic code search ("find the function that validates coupons")
  • Duplicate detection across large codebases
  • Docs search — RAG over product documentation
  • Long-term memory for agents ("what have we talked about before?")
  • Similarity joins in data pipelines

FAQ

Do I need an embedding model to use Claude Code?

No. Claude Code's built-in tools do direct file and grep search, which is often enough. Embeddings matter when you integrate external corpora via an MCP server.

Can I use a single embedding model for both text and code?

Yes, but code-specific embedders (voyage-code, bge-code, etc.) tend to beat general-purpose models on code tasks. Pick based on the workload.

How often should I re-embed?

Whenever the underlying content changes. Most teams run embedding jobs on commit hooks or nightly builds.

Related terms

  • RAG — the main consumer of embeddings
  • Vector database — where embeddings are stored
  • Token — unrelated numerically but conceptually adjacent
  • LLM — typically the downstream generator after retrieval
  • Fine-tuning — a related adaptation technique