Definition

Fine-tuning

Fine-tuning adapts a pre-trained LLM to a specific task or domain by continuing training on a smaller, targeted dataset.

Fine-tuning is the process of continuing to train a pre-trained LLM on a smaller, focused dataset to specialize it for a task or domain. The base model already "knows" language from pretraining; fine-tuning teaches it to answer in a specific format, adopt a voice, or handle a narrow workload (like your company's ticketing system) better than generic prompting could.

Why it matters

Before fine-tuning, strong prompt engineering and RAG are cheaper, faster to iterate on, and don't lock you to a frozen model checkpoint. Most teams should exhaust those first. Fine-tuning wins when:

  • The task has a consistent format you can't reliably prompt into
  • Quality matters more than flexibility
  • You need to shave tokens from each call by not re-specifying the task
  • You're running a smaller self-hosted model and need it to follow instructions reliably

For agentic coding, most users never fine-tune — they use off-the-shelf Claude, GPT, or Qwen through CLIs like Claude Code, Codex CLI, and Qwen Code. Fine-tuning is usually the domain of platform teams building a specialized AI product.

How it works

Fine-tuning methods vary along two axes: which parameters are updated, and what objective is trained against.

Parameter scope, from most to least expensive:

  • Full fine-tuning — update all model weights. Needs large GPUs and a lot of data. Rare outside labs.
  • LoRA / QLoRA — train small low-rank adapters on top of frozen base weights. Fast, cheap, runs on consumer hardware.
  • Prefix / prompt tuning — learn soft prompts rather than touching weights. Even lighter.

Training objective:

  • Supervised fine-tuning (SFT) — standard next-token loss on input/output pairs.
  • RLHF / DPO / ORPO — preference-based training on comparison pairs.

After training, the adapter or full checkpoint is deployed. You call it like any other model.
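To see why LoRA is so much cheaper than full fine-tuning, here is a minimal numeric sketch (using NumPy, not a real training framework): instead of updating a frozen d×k weight matrix W, LoRA trains a low-rank pair B (d×r) and A (r×k) and adds their scaled product at inference time. The dimensions and scaling factor below are illustrative, not from any particular model.

```python
import numpy as np

# LoRA in one layer: freeze W, train only B and A, and use
# W + (alpha / r) * B @ A as the effective weight.
d, k, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))   # frozen pretrained weight
B = np.zeros((d, r))              # B starts at zero, so the adapted
A = rng.standard_normal((r, k))   # model is initially identical to the base

def adapted(x):
    """Forward pass through the layer with the adapter applied."""
    return x @ (W + (alpha / r) * B @ A).T

full_params = d * k               # what full fine-tuning would train
lora_params = d * r + r * k       # what LoRA trains
print(f"trainable: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

With these (hypothetical) dimensions, the adapter holds about 0.4% of the layer's parameters, which is why LoRA fits on consumer hardware.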

How it's used

Typical fine-tuning projects:

  • Domain-specific copilot — medical, legal, scientific text
  • Structured output — consistent JSON shape without heavy schema prompts
  • Style matching — corporate voice, documentation tone
  • Small-model specialization — LoRA a 7B model into something useful for one narrow task

Most coding CLIs don't need fine-tuning — frontier base models already handle code well. If your team runs a private model, fine-tuning on internal code can be worthwhile.

FAQ

Should I fine-tune for better code completion?

Rarely. Base frontier models already outperform most fine-tuned smaller ones for general coding. Fine-tune when you have a specific, repetitive format the base model won't respect even with strong prompting.

How much data do I need?

For LoRA on a narrow task, a few hundred high-quality examples often suffice. For broader behavior changes, thousands to tens of thousands. Data quality dominates quantity.
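For a sense of what those examples look like: SFT data is commonly a JSONL file of prompt/response pairs. The field names and the ticket-triage task below are hypothetical — real frameworks each expect their own schema — but the shape is representative.

```python
import json

# Hypothetical SFT examples for a ticket-triage task; each line of the
# JSONL file is one prompt/response pair.
examples = [
    {"prompt": "Ticket: App crashes on login after update 3.2",
     "response": '{"category": "bug", "priority": "high"}'},
    {"prompt": "Ticket: How do I export my data to CSV?",
     "response": '{"category": "question", "priority": "low"}'},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity-check: every line round-trips and has both fields.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(len(rows), "examples written")
```

A few hundred lines of this quality — consistent format, unambiguous labels — beat thousands of noisy ones.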

Related terms