Sandbox your AI agent's shell, keep its files on the host

The collision problem

Fan an agent out across several worktrees and the runs start stepping on each other. Two runs both want port 3000. One pip install mutates a shared environment another run depends on. A runaway dev server outlives the task that spawned it. You want each agent's commands boxed into an isolated container, but you still want host-side git to be instant, because diffing and merging worktrees on the host is the whole point of running them in parallel.

Those two wants seem to pull against each other. They don't.

Execution-surface-only sandboxing

The trick is to sandbox only the part that actually needs isolation — command execution — and leave everything else on the host. harnext agents run shell commands through a bash tool and a background-shell manager. The new release routes both through a single injectable seam, the CommandExecutor. File tools (read, edit, write) keep operating on the host worktree; only command execution is sent into a container via docker exec. A bind mount ties the two together, so a host-side edit is visible in the container the instant it's written.

File tools stay on the host; only command execution crosses into the container. The bind mount keeps both sides in sync.

The split shows up in two createAgentSession options: cwd is where the file tools operate (the host worktree), and execCwd is where commands run (the container's bind-mount target, e.g. /work).
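Because cwd and execCwd name the same directory seen from two sides of the bind mount, any host path inside the worktree has a predictable container path. The sketch below illustrates that mapping; toContainerPath is a hypothetical helper, not part of harnext, and it assumes the worktree is mounted at /work.

```typescript
import * as path from 'node:path';

/**
 * Map a host path inside the worktree to the path the container sees.
 * Hypothetical helper for illustration; assumes the worktree is
 * bind-mounted at `mountTarget` (e.g. /work).
 */
export function toContainerPath(
  hostPath: string,
  hostWorktree: string,
  mountTarget = '/work',
): string {
  const rel = path.relative(hostWorktree, hostPath);
  // Anything outside the worktree is not covered by the bind mount.
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`${hostPath} is outside the worktree ${hostWorktree}`);
  }
  // Container paths are POSIX regardless of the host OS.
  return path.posix.join(mountTarget, ...rel.split(path.sep));
}
```

On a POSIX host, toContainerPath('/home/me/wt/src/a.ts', '/home/me/wt') yields '/work/src/a.ts', and the worktree root itself maps to '/work'.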

Why a seam beats replacing the tools

Before this, the only way to change where commands ran was to replace the entire tool set — which silently disabled background shells and forced you to re-implement truncation, timeouts, output streaming, and abort handling by hand. The CommandExecutor owns only where and how a command runs, so foreground bash and background shells both flow through one tiny implementation and inherit every existing behavior for free. The default executor reproduces the old host behavior exactly. Three long-standing footguns go away with it:

Custom tools no longer silently disable background shells.
run_in_background never silently degrades to a blocking foreground run.
The executor owns env construction, so the host's process.env can't leak into the sandbox.
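The idea of a single seam can be shown in miniature. In the sketch below, SketchExecutor, SketchSpawnOptions, RecordingExecutor, runForeground, and startBackground are all hypothetical, simplified stand-ins; the real CommandExecutor contract may differ. The point is only that both tool paths call the same injected object, so swapping that object changes where every command runs.

```typescript
// Hypothetical, simplified shapes for illustration only; the real
// CommandExecutor contract in harnext may differ.
interface SketchSpawnOptions {
  cwd: string;
}

interface SketchExecutor {
  spawn(command: string, opts: SketchSpawnOptions): unknown;
}

/** Test double that records where each command would have run. */
class RecordingExecutor implements SketchExecutor {
  readonly calls: string[] = [];

  spawn(command: string, opts: SketchSpawnOptions): unknown {
    this.calls.push(`${opts.cwd}$ ${command}`);
    return {};
  }
}

/** Stand-in for a foreground bash tool: it only knows the seam. */
function runForeground(executor: SketchExecutor, command: string, execCwd: string): unknown {
  return executor.spawn(command, { cwd: execCwd });
}

/** Stand-in for a background-shell manager: same seam, same executor. */
function startBackground(executor: SketchExecutor, command: string, execCwd: string): unknown {
  return executor.spawn(command, { cwd: execCwd });
}

const recorder = new RecordingExecutor();
runForeground(recorder, 'npm test', '/work');
startBackground(recorder, 'npm run dev', '/work');
```

Neither stand-in knows whether the executor targets the host or a container, which is why a single replacement covers both paths.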

The whole executor

A sandbox is one class. spawn returns anything that looks like a child process; dispose tears the container down when the session ends. This DockerExecutor is distilled from the end-to-end verification that passed 22/22 checks against a real node:22-bookworm-slim container:

TypeScript

import { spawn } from 'node:child_process';
import type {
  CommandExecutor,
  ChildProcessLike,
  ExecutorSpawnOptions,
} from '@harnext/core';

export class DockerExecutor implements CommandExecutor {
  constructor(
    private readonly containerId: string,
    /** Clean env for the container — the host's process.env never leaks. */
    private readonly containerEnv: NodeJS.ProcessEnv = {},
  ) {}

  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike {
    const env = { ...this.containerEnv, ...(opts.env ?? {}) };
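    const envFlags = Object.entries(env).flatMap(([k, v]) =>
      // Skip unset values so they are not passed as the literal string "undefined".
      v === undefined ? [] : ['-e', `${k}=${v}`],
    );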
    const child = spawn(
      'docker',
      ['exec', '-w', opts.cwd, ...envFlags, this.containerId, 'sh', '-c', command],
      { stdio: ['ignore', 'pipe', 'pipe'] },
    );
    if (opts.signal) {
      const onAbort = () => child.kill('SIGTERM');
      if (opts.signal.aborted) onAbort();
      else {
        opts.signal.addEventListener('abort', onAbort, { once: true });
        child.on('close', () => opts.signal!.removeEventListener('abort', onAbort));
      }
    }
    return child;
  }

  async dispose(): Promise<void> {
    await new Promise<void>((resolve) => {
      const p = spawn('docker', ['rm', '-f', this.containerId], { stdio: 'ignore' });
      p.on('close', () => resolve());
      p.on('error', () => resolve());
    });
  }
}
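The env handling in spawn can be pulled out into a pure function, which makes the no-leak property easy to check in isolation. This is a minimal sketch: buildEnvFlags and CleanEnv are hypothetical names, not harnext API. It never reads process.env, and it drops undefined values so they are not passed to docker exec as the string "undefined".

```typescript
/** A plain string map; deliberately not derived from process.env. */
type CleanEnv = Record<string, string | undefined>;

/**
 * Merge a base container env with per-command overrides and turn the
 * result into `docker exec` flags: ['-e', 'KEY=value', ...].
 * Hypothetical helper for illustration.
 */
export function buildEnvFlags(base: CleanEnv, overrides: CleanEnv = {}): string[] {
  const merged: CleanEnv = { ...base, ...overrides };
  return Object.entries(merged).flatMap(([key, value]) =>
    // Unset values are skipped rather than stringified.
    value === undefined ? [] : ['-e', `${key}=${value}`],
  );
}
```

Overrides win over the base map, so a per-command value replaces the container-wide default for the same key.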

Wiring it up is the cwd / execCwd split plus a closed tool set:

TypeScript

// 1. Bind-mount the worktree into a per-worktree container:
//    docker run -d --rm -v <hostWorktree>:/work <image> sleep infinity
// 2. Route command execution into it; keep read/edit/write on the host:
const { session } = await createAgentSession({
  provider,
  modelId,
  cwd: hostWorktree,   // read / edit / write operate here (host)
  execCwd: '/work',    // bash + background shells run here (container)
  executor: new DockerExecutor(containerId, {
    PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
  }),
  closedToolSet: true, // exact, auditable tool set — no host-spawned MCP
});

// ...run the agent...

await session.dispose(); // tears down background shells AND removes the container
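Step 1 in the comment above can be scripted so that each worktree gets its own container ID to hand to DockerExecutor. A minimal sketch, assuming Docker is on the PATH; dockerRunArgs and startContainer are hypothetical helper names, and the flags mirror the docker run command shown in the comment.

```typescript
import { execFileSync } from 'node:child_process';

/**
 * Build the argv for a long-lived, per-worktree container.
 * Hypothetical helper; mirrors:
 *   docker run -d --rm -v <hostWorktree>:/work <image> sleep infinity
 */
export function dockerRunArgs(
  hostWorktree: string,
  image: string,
  mountTarget = '/work',
): string[] {
  return ['run', '-d', '--rm', '-v', `${hostWorktree}:${mountTarget}`, image, 'sleep', 'infinity'];
}

/** Start the container and return its ID. */
export function startContainer(hostWorktree: string, image: string): string {
  // `docker run -d` prints the new container's ID on stdout.
  const stdout = execFileSync('docker', dockerRunArgs(hostWorktree, image), {
    encoding: 'utf8',
  });
  return stdout.trim();
}
```

Passing the arguments as an array through execFileSync avoids shell quoting problems when the worktree path contains spaces.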

Why closedToolSet

Without it, MCP servers and the skill tool would inject and spawn on the host — outside your sandbox. closedToolSet yields exactly the resolved tools, which is what you want when the whole point is to contain execution.

What you get

Multiple worktrees build and run the same project concurrently in isolated containers — no port conflicts, no shared-dependency drift — while host-side git diff and merge logic runs untouched against the worktree path. One seam, verified end-to-end, with all the bash semantics you already rely on still in place.

Go deeper

Custom sandbox (Docker)

The full guide: the CommandExecutor contract, the verified DockerExecutor, wiring, lifecycle, composition, and gotchas.

Design & implementation

Background in QualityUnit/harnext#43; the implementation landed in #47.
