Definition

Sandbox

A sandbox is an execution boundary that limits what an AI coding agent can access — which directories it can read or write, which processes it can spawn, which network hosts it can reach. For agentic coding CLIs like Codex CLI and Claude Code, sandboxing is how you give the agent shell access without giving it the keys to your whole machine.

Why it matters

Autonomous agents run shell commands. A confused agent with no constraints can rm -rf the wrong directory, exfiltrate credentials, or push to the wrong remote. Sandboxes turn "I trust this agent completely" into "the agent can only do what its sandbox permits" — which is the right posture for production use.

Even for solo devs, sandboxing prevents the class of mistakes where the agent hallucinates a command that happens to be destructive. SpaceSpider itself doesn't impose a sandbox — it runs whatever CLI you pick in a PTY — but most of the CLIs it hosts ship their own sandbox mechanisms.

How it works

Sandbox implementations depend on the OS:

  • macOS — Seatbelt profiles, applied through sandbox-exec, deny classes of operations such as file access outside allowed paths, network access, and process execution. Codex CLI uses this for its sandboxed execution mode.
  • Linux — Landlock restricts filesystem access and seccomp filters system calls. Neither requires root.
  • Windows — AppContainer or Job Objects, though cross-platform agent CLIs often skip fine-grained Windows sandboxing.
  • Containers — Docker or Podman as a coarser sandbox for the whole agent process.
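
To make the container option concrete, here is a minimal sketch that builds (but does not run) a `docker run` command confining a process to one project directory. It assumes Docker is installed and the agent's toolchain fits in a single image. The helper name `docker_sandbox_argv`, the `/workspace` mount point, and the image name are hypothetical choices, not what any particular agent CLI does. All flags used are standard `docker run` options.

```python
import os
import shlex


def docker_sandbox_argv(project_dir: str, image: str, command: list[str],
                        allow_network: bool = False) -> list[str]:
    """Build a `docker run` argv that confines `command` to one project directory.

    Hypothetical helper: the /workspace mount point and this flag set are
    illustrative, not taken from any specific agent CLI.
    """
    project_dir = os.path.abspath(project_dir)
    argv = [
        "docker", "run",
        "--rm",                                  # delete the container on exit
        "--read-only",                           # container root filesystem is read-only
        "--tmpfs", "/tmp",                       # writable scratch space, discarded on exit
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege gain via setuid binaries
        "-v", f"{project_dir}:/workspace",       # the project is the only host path mounted
        "-w", "/workspace",                      # run the command from inside the project
    ]
    if not allow_network:
        argv += ["--network", "none"]            # only a loopback interface inside
    return argv + [image, *command]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Docker.
    print(shlex.join(docker_sandbox_argv(".", "python:3.12-slim", ["pytest", "-q"])))
```

Because only the project directory is mounted, a destructive command run inside the container cannot reach the host's home directory or system paths. The granularity is coarser than Seatbelt or Landlock: the whole agent process shares one boundary.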

Common restrictions:

  • Writes limited to the project directory (no home, no system)
  • Network disabled or restricted to an allow-list
  • Process spawning limited to specific binaries
  • Read access limited to project + standard libs
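
The restrictions above can be modeled as a small policy object. This is a hypothetical sketch: `SandboxPolicy`, `can_write`, `can_read`, `can_connect`, and `can_spawn` are illustrative names, not any CLI's API. The checks are purely lexical, whereas real sandboxes such as Seatbelt or Landlock enforce them in the kernel and also account for symlinks, which this code does not.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy object mirroring the common restrictions listed above."""
    writable_roots: tuple[str, ...]                 # e.g. the project directory only
    readable_roots: tuple[str, ...] = ()            # extra read-only paths (stdlib, toolchain)
    allowed_hosts: frozenset[str] = frozenset()     # empty set means network disabled
    allowed_binaries: frozenset[str] = frozenset()  # absolute paths the agent may spawn


def _within(path: str, roots: tuple[str, ...]) -> bool:
    # Expects absolute paths. Normalize lexically so "project/../.ssh" cannot
    # escape a root. Symlinks are NOT resolved here.
    p = PurePosixPath(posixpath.normpath(path))
    return any(p.is_relative_to(root) for root in roots)


def can_write(policy: SandboxPolicy, path: str) -> bool:
    return _within(path, policy.writable_roots)


def can_read(policy: SandboxPolicy, path: str) -> bool:
    # Anything writable is also readable.
    return _within(path, policy.writable_roots + policy.readable_roots)


def can_connect(policy: SandboxPolicy, host: str) -> bool:
    return host in policy.allowed_hosts


def can_spawn(policy: SandboxPolicy, binary: str) -> bool:
    return posixpath.normpath(binary) in policy.allowed_binaries


# A "dev"-style profile: project-only writes, network limited to package registries.
dev = SandboxPolicy(
    writable_roots=("/home/me/project",),
    readable_roots=("/usr/lib", "/usr/bin"),
    allowed_hosts=frozenset({"pypi.org", "registry.npmjs.org"}),
    allowed_binaries=frozenset({"/usr/bin/git", "/usr/bin/python3"}),
)
```

With this profile, writing `/home/me/project/src/main.py` is allowed, while `/home/me/project/../.bashrc` normalizes to `/home/me/.bashrc` and is denied. Comparing whole path components also keeps a sibling such as `/home/me/project-evil` outside the root.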

Codex CLI exposes a full-auto mode that assumes its sandbox is active, letting the agent run commands without per-step approval.
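
The relationship between an approval mode and the sandbox can be sketched as a single decision function. This is a hypothetical illustration, not Codex CLI's implementation: the `ApprovalMode` values, `needs_approval`, and `step` are invented names, and falling back to per-step approval when no sandbox is active is an assumption made for this sketch.

```python
from enum import Enum
from typing import Callable


class ApprovalMode(Enum):
    # Hypothetical modes; illustrative only, not any CLI's actual option names.
    ASK_EVERY_STEP = "ask"
    FULL_AUTO = "full-auto"


def needs_approval(mode: ApprovalMode, sandbox_active: bool) -> bool:
    """True if a human must approve the next shell command."""
    # Full-auto skips approval only because the sandbox bounds the damage.
    # Assumption of this sketch: with no active sandbox, always ask.
    return not (mode is ApprovalMode.FULL_AUTO and sandbox_active)


def step(command: str, mode: ApprovalMode, sandbox_active: bool,
         ask: Callable[[str], bool]) -> str:
    """Decide what happens to one agent command; `ask` prompts the user."""
    if needs_approval(mode, sandbox_active) and not ask(command):
        return "rejected"
    return "run"  # a real agent would now execute `command` inside the sandbox
```

Passing `ask` as a callback keeps the policy testable and lets a terminal UI, a GUI dialog, or a CI rule supply the human decision.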

How it's used

Typical sandbox configurations:

  • Dev: project-directory-only writes, network allow-listed to package registries
  • CI: strict sandbox plus no network, to keep builds reproducible
  • Exploratory: loose sandbox, trust the agent, rely on checkpoints for recovery

FAQ

Does sandboxing slow the agent down?

Syscall filtering adds negligible overhead (microseconds). The larger cost comes when the sandbox blocks a command the agent needed and the agent has to find another approach, but that is the point.

Is SpaceSpider sandboxed?

SpaceSpider itself runs in Tauri's process model with capability-based permissions (see capabilities/default.json). The CLIs it hosts run with whatever sandbox they configure internally. We don't add an extra layer on top of each CLI.

Related terms

  • Plan mode — read-only mode, a related containment tool
  • Checkpoint — recovery vs. prevention
  • Hook — can add custom guardrails on top of a sandbox
  • Codex CLI — canonical sandbox-first CLI
  • Autonomous agent — the thing being contained