Definition

LLM

An LLM is a neural network trained on massive text corpora to predict the next token. Modern LLMs power coding agents, chat, and tool-using assistants.

A large language model (LLM) is a neural network — almost always a transformer — trained on very large text corpora to predict the next token given the previous tokens. "Large" here means billions to trillions of parameters. Modern LLMs like Claude, GPT, Gemini, Qwen, and Kimi are the engines behind most modern AI developer tools.

Why it matters

Every agentic coding CLI is just a thin orchestration loop around an LLM. When you run Claude Code, Codex CLI, or Qwen Code, the CLI packages your files and instructions into a prompt, sends it to an LLM, and interprets the response as text or tool-use calls. The model is doing the thinking; the CLI is doing the plumbing.

Understanding LLMs — their context window, their tendency to hallucinate, the effect of a system prompt — makes you a better user of every AI coding tool, including the ones SpaceSpider hosts.

How it works

At inference time an LLM takes a sequence of input tokens and produces a probability distribution over the next token. The client samples from that distribution (with temperature, top-p, and top-k parameters tuning how random the pick is), appends the chosen token, and repeats. This happens one token at a time, which is why you see streaming output.
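The sampling step can be sketched in plain Python. This is a toy version — real inference runs over tensors on a GPU — but the temperature and top-k arithmetic is the same:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick the next token id from raw logits.

    temperature < 1 sharpens the distribution (more deterministic);
    temperature > 1 flattens it (more random).
    top_k restricts sampling to the k most likely tokens.
    """
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Optionally keep only the k most probable tokens.
    candidates = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]
    # Sample from the remaining candidates in proportion to their probability.
    weights = [probs[i] for i in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# top_k=1 is greedy decoding: always pick the argmax.
print(sample_next_token([2.0, 0.5, -1.0, 0.1], temperature=1.0, top_k=1))  # 0
```

Top-p (nucleus) sampling works the same way, except the candidate set is the smallest prefix of tokens whose cumulative probability exceeds p rather than a fixed count.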

Key properties developers care about:

  • Context window — the maximum number of tokens a model can attend to (8k, 200k, 1M+)
  • Training cutoff — the date after which the model has no knowledge (without retrieval)
  • Capability tier — frontier vs. smaller/cheaper models, with tradeoffs in speed and cost
  • Tool use — whether the model can emit structured function calls
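Tool use is worth making concrete. The exact wire format varies by provider, but a tool definition and the call a model emits typically look something like the following — the `read_file` tool, its schema, and the path are made-up examples, not any vendor's actual API:

```python
import json

# A tool definition the client advertises to the model
# (JSON-Schema-style parameter description, common across providers).
read_file_tool = {
    "name": "read_file",
    "description": "Read a file from the workspace",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

# A tool call as a model might emit it: a tool name plus
# JSON-encoded arguments, which the client must parse and execute.
raw_call = '{"name": "read_file", "arguments": "{\\"path\\": \\"src/main.py\\"}"}'

call = json.loads(raw_call)
args = json.loads(call["arguments"])
print(call["name"], args["path"])  # read_file src/main.py
```

The key point: the model only produces structured text; the client is responsible for validating the arguments and actually running the tool.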

Post-training includes instruction tuning, RLHF, and often fine-tuning on code-specific data, which is what makes coding-specialized models good at diffs, compilation errors, and test output.

How it's used

In an agentic CLI loop:

  1. Client builds a prompt: system prompt + conversation history + available tools
  2. LLM emits text or a tool call
  3. Client executes the tool call, adds the result to the conversation
  4. Repeat from step 2 until the model replies with plain text instead of a tool call
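The steps above can be sketched as a minimal Python loop. The `model` callable, the `tools` dict, and the message shapes are assumptions standing in for a real provider SDK:

```python
def run_agent(model, tools, user_message, max_steps=10):
    """Minimal agentic loop. `model` is a callable taking the message list
    and returning a response dict; `tools` maps tool names to Python
    functions. Both are illustrative stand-ins, not a vendor API."""
    messages = [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_steps):
        response = model(messages)            # steps 1-2: prompt, get a reply
        messages.append(response)
        if response.get("tool_call") is None:
            return response["content"]        # plain text: task is done
        call = response["tool_call"]
        result = tools[call["name"]](**call["args"])   # step 3: run the tool
        messages.append({"role": "tool", "content": str(result)})
        # step 4: loop — the model sees the tool result on the next turn
    return None  # gave up after max_steps

# A tiny fake model and tool, just to exercise the loop end to end.
def fake_model(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "2 files found", "tool_call": None}
    return {"role": "assistant", "content": "",
            "tool_call": {"name": "list_files", "args": {"dir": "."}}}

print(run_agent(fake_model, {"list_files": lambda dir: ["a.py", "b.py"]},
                "How many files?"))  # 2 files found
```

Real CLIs add plumbing around this skeleton — permission prompts before running tools, retry logic, streaming — but the control flow is this loop.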

Techniques like RAG, embedding-based retrieval, and context summarization keep long tasks within the context window, so conversations stay productive instead of overflowing it.
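A minimal sketch of the summarization half, assuming a crude 4-characters-per-token estimate and a stub summary where a real CLI would ask the LLM to summarize the dropped turns:

```python
def trim_history(messages, budget, count_tokens=lambda m: len(m["content"]) // 4):
    """Keep a conversation under a token budget by dropping the oldest
    turns (never the system prompt) and leaving a stub summary in their
    place. The 4-chars-per-token heuristic and the stub summary are
    illustrative assumptions, not what any particular CLI does."""
    system, rest = messages[0], messages[1:]
    total = sum(count_tokens(m) for m in messages)
    dropped = 0
    while rest and total > budget:
        total -= count_tokens(rest.pop(0))   # drop the oldest turn first
        dropped += 1
    if dropped:
        rest.insert(0, {"role": "assistant",
                        "content": f"[summary of {dropped} earlier messages]"})
    return [system] + rest

history = [{"role": "system", "content": "s" * 40}] + \
          [{"role": "user", "content": "x" * 400} for _ in range(5)]
trimmed = trim_history(history, budget=300)
print(len(trimmed))  # system + summary stub + the 2 most recent turns -> 4
```

The design choice to protect the system prompt and the most recent turns reflects what actually matters in a coding session: the instructions and the current task state.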

Related terms

  • Token — the atomic unit LLMs operate on
  • Context window — how much the model can see at once
  • Hallucination — the failure mode you care about
  • Fine-tuning — how models get specialized
  • RAG — bolting external knowledge onto an LLM

FAQ

Can I run an LLM locally?

Yes — open-weights models (Llama, Qwen, Mistral, DeepSeek) run on consumer GPUs via llama.cpp, vLLM, or Ollama. Frontier closed models (Claude, GPT-4/5, Gemini Ultra) don't run locally.

Why are coding-specialized LLMs better at code?

They've seen far more code during training and are often fine-tuned on diffs, test cases, and error output. The architecture is the same; the data is what differs.