LLM
A large language model (LLM) is a neural network — almost always a transformer — trained on very large text corpora to predict the next token given the previous tokens. "Large" here means billions to trillions of parameters. Modern LLMs like Claude, GPT, Gemini, Qwen, and Kimi are the engines behind every serious AI developer tool.
Why it matters
Every agentic coding CLI is just a thin orchestration loop around an LLM. When you run Claude Code, Codex CLI, or Qwen Code, the CLI packages your files and instructions into a prompt, sends it to an LLM, and interprets the response as tool use calls. The model is doing the thinking; the CLI is doing the plumbing.
Understanding LLMs — their context window, their tendency to hallucinate, the effect of a system prompt — makes you a better user of every AI coding tool, including the ones SpaceSpider hosts.
How it works
At inference time an LLM takes a sequence of input tokens and produces a probability distribution over the next token. The client samples from that distribution (with temperature, top-p, top-k parameters tuning how random the pick is), appends the chosen token, and repeats. This happens one token at a time, which is why you see streaming output.
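The sampling step can be sketched in a few lines. This is a toy illustration of temperature plus nucleus (top-p) sampling over a made-up logit table, not any provider's actual implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_p=0.95):
    """Pick a next token from raw scores (toy sketch).

    Temperature rescales the distribution (lower = sharper, more
    deterministic); top-p keeps only the smallest set of tokens whose
    cumulative probability reaches top_p, then samples among them.
    """
    # Softmax with temperature.
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(s - m) for t, s in scaled.items()}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}

    # Nucleus (top-p) filtering: sort by probability, keep the head.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize over the survivors and sample.
    z = sum(p for _, p in kept)
    r = random.random() * z
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]

# Hypothetical logits for the next token after "def foo():" context.
logits = {"def": 4.0, "class": 2.5, "import": 2.0, "banana": -3.0}
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```

With temperature 0.0-ish you get greedy, repeatable output; higher values trade determinism for variety, which is why coding tools usually run near the low end.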
Key properties developers care about:
- Context window — the maximum number of tokens a model can attend to (8k, 200k, 1M+)
- Training cutoff — the date after which the model has no knowledge (without retrieval)
- Capability tier — frontier vs. smaller/cheaper models, with tradeoffs in speed and cost
- Tool use — whether the model can emit structured function calls
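Tool use in practice means the client declares tools as JSON-Schema-style descriptions and the model replies with a structured call instead of prose. The exact field names vary by provider; this sketch follows the common shape (the `run_tests` tool is a hypothetical example):

```python
import json

# A tool declaration in the JSON-Schema style most provider APIs use.
# Field names are illustrative; each API has its own variant.
run_tests_tool = {
    "name": "run_tests",
    "description": "Run the project's test suite and return the output.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Test file or directory"},
        },
        "required": ["path"],
    },
}

# The kind of structured call the model might emit in response.
tool_call = {"name": "run_tests", "input": {"path": "tests/"}}

print(json.dumps(tool_call))
```

The client, not the model, actually executes the call; the model only names the tool and fills in arguments that validate against the schema.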
Post-training includes instruction tuning, RLHF, and often fine-tuning on code-specific data, which is what makes coding-specialized models good at diffs, compilation errors, and test output.
How it's used
In an agentic CLI loop:
- Client builds a prompt: system prompt + conversation history + available tools
- LLM emits text or a tool call
- Client executes the tool call, adds the result to the conversation
- Loop back to step 2 until the model replies with plain text instead of a tool call
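The loop above can be sketched in a few lines. `call_llm` and `execute_tool` are hypothetical stand-ins for a provider SDK and a local tool runner; real CLIs add streaming, permissions, and error handling around this core:

```python
# Minimal agentic-loop sketch; the two helpers are placeholders,
# not a real API.

def call_llm(messages, tools):
    """Placeholder for a provider SDK call; returns (text, tool_call)."""
    return "All tests pass.", None

def execute_tool(call):
    """Placeholder that would run the requested tool locally."""
    return "ok"

def agent_loop(user_request, tools, max_turns=10):
    messages = [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": user_request},
    ]
    for _ in range(max_turns):
        text, tool_call = call_llm(messages, tools)
        if tool_call is None:
            return text  # plain-text answer: the task is done
        # Record the call, run it, and feed the result back in.
        messages.append({"role": "assistant", "content": str(tool_call)})
        messages.append({"role": "tool", "content": execute_tool(tool_call)})
    return "Stopped after max_turns."

print(agent_loop("Fix the failing test", tools=[]))
```

Everything the model "knows" about your repo arrives through `messages`, which is why context management matters so much.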
Techniques like RAG, embedding-based retrieval, and context summarization keep long-running tasks within the context window so conversations stay productive.
Related terms
- Token — the atomic unit LLMs operate on
- Context window — how much the model can see at once
- Hallucination — the failure mode you care about
- Fine-tuning — how models get specialized
- RAG — bolting external knowledge onto an LLM
FAQ
Can I run an LLM locally?
Yes — open-weights models (Llama, Qwen, Mistral, DeepSeek) run on consumer GPUs via llama.cpp, vLLM, or Ollama. Frontier closed models (Claude, GPT-4/5, Gemini Ultra) can't run locally because their weights aren't released.
Why are coding-specialized LLMs better at code?
They've seen far more code during training and are often fine-tuned on diffs, test cases, and error output. The architecture is the same; the data is what differs.