Context window

The context window is the maximum number of tokens an LLM can consider at once — the hard limit on how much conversation and code it can see.

Definition

The context window is the maximum number of tokens an LLM can consider in a single forward pass. It's a hard limit on everything the model sees at once: system prompt, conversation history, tool definitions, tool observations, and any retrieved snippets. When the input exceeds it, the client must truncate, summarize, or fail.

Why it matters

The context window is the single biggest practical constraint on agentic coding. Every file read, every bash output, every diff, and every human instruction consumes tokens from a budget that's often 200k or 1M tokens. On long tasks the window fills, and the agent either forgets earlier steps or wastes turns recompacting.
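The shared-budget idea can be sketched in a few lines. This is an illustrative model, not a real client API: the `ContextBudget` class and the rough 4-characters-per-token heuristic are assumptions for the sketch.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English and code.
    (A heuristic assumption; real clients use the model's tokenizer.)"""
    return max(1, len(text) // 4)

class ContextBudget:
    """Every message in a session draws from one shared token pool."""

    def __init__(self, window: int = 200_000):
        self.window = window
        self.used = 0

    def add(self, text: str) -> bool:
        """Record a message; return False if it would overflow the window."""
        cost = estimate_tokens(text)
        if self.used + cost > self.window:
            return False
        self.used += cost
        return True

    @property
    def remaining(self) -> int:
        return self.window - self.used
```

File reads, tool outputs, and instructions all call the same `add`; once `remaining` nears zero, the client has to truncate, summarize, or restart.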

Models with larger windows — Kimi (via Kimi CLI), Claude (via Claude Code), Gemini — handle bigger codebases without losing state. Running several agents in a SpaceSpider grid layout, each with its own window, is a practical way to sidestep the limit: split the task, give each agent a bounded scope.

How it works

Every input character is first tokenized (see token). The model computes attention over all tokens in the window, so compute scales roughly quadratically with window length — which is why "cheap frontier model + long context" often isn't as cheap as the per-token price suggests.
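The quadratic scaling above is worth making concrete. A minimal back-of-envelope helper (the function name and the 40k baseline are illustrative assumptions; real serving costs also depend on KV caching, batching, and hardware):

```python
def relative_attention_cost(tokens: int, baseline: int = 40_000) -> float:
    """Attention compute of a `tokens`-long window relative to a
    `baseline`-token window, under the rough O(n^2) model."""
    return (tokens / baseline) ** 2

# A 10x longer window costs ~100x the attention compute:
print(relative_attention_cost(400_000))  # 100.0
```

This is why a per-token price that looks cheap can still add up fast when you routinely fill a very long window.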

The client manages what fits:

  • Rolling truncation — drop the oldest messages
  • Summarization — replace old turns with a compact summary
  • RAG — retrieve only the most relevant chunks instead of dumping everything
  • Caching — providers like Anthropic let you mark prefixes as cacheable so long system prompts don't recompute
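The first two strategies above can be sketched as follows. This is a minimal illustration, not a real client's implementation: the token heuristic is an assumption, and the summarizer is a stub where a real client would call the model.

```python
def rough_tokens(text: str) -> int:
    """Crude heuristic: ~4 characters per token (an assumption)."""
    return max(1, len(text) // 4)

def truncate_oldest(messages: list[str], budget: int) -> list[str]:
    """Rolling truncation: drop the oldest messages until the rest fit."""
    kept = list(messages)
    while kept and sum(rough_tokens(m) for m in kept) > budget:
        kept.pop(0)
    return kept

def summarize_old_turns(messages: list[str], keep_recent: int = 4) -> list[str]:
    """Summarization: fold old turns into one compact stand-in message.
    A real client would ask the model to write the summary; this stub
    only records how many turns were folded away."""
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [f"[summary of {len(old)} earlier turns]"] + recent
```

Truncation is cheap but lossy; summarization preserves more signal at the cost of an extra model call, which is why clients often combine the two.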

How it's used

Practical context-window techniques:

  • Keep system prompts tight so more room is left for the task
  • Use subagents for scoped investigation so the parent's window stays clean
  • Prefer targeted read_file calls over dumping whole directories via cat
  • For very large repos, pair with embeddings + retrieval
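The "targeted reads over dumping" point can be illustrated with a small helper that returns only the lines matching a pattern plus a little surrounding context, rather than an entire file. The name `targeted_read` is hypothetical, not a real tool:

```python
import re

def targeted_read(text: str, pattern: str, context: int = 2) -> str:
    """Return only the lines matching `pattern`, plus `context` lines
    around each match, instead of the whole file's contents."""
    lines = text.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            keep.update(range(max(0, i - context),
                              min(len(lines), i + context + 1)))
    return "\n".join(lines[i] for i in sorted(keep))
```

A 2,000-line file read this way might cost a few hundred tokens instead of tens of thousands, leaving the rest of the window for the actual task.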

See /blog/managing-context-in-claude-code for deeper strategies.

Related terms

  • Token — the atomic unit of the window
  • LLM — where the window lives
  • RAG — how to cheat the window size
  • Subagent — the standard context-hygiene tool
  • Hallucination — what overflowed context often causes

FAQ

Is a bigger context window always better?

No. Larger windows cost more (compute is quadratic) and performance can degrade on the "lost in the middle" problem — models over-weight the beginning and end and under-attend the middle. A compact, well-organized 40k window often beats a sprawling 400k one.

How do I check token usage during a session?

Most CLIs expose a status line or command showing the current token count (Claude Code has /context; Codex CLI offers a similar status view). Watch it on long sessions to decide when to compact or restart.
