SDK

Running agents in a custom sandbox (Docker)

Route an agent's shell commands into a per-worktree container with the pluggable command executor, while file operations stay on the host.

harnext runs shell commands through a bash tool and a background-shell manager. The CommandExecutor seam lets you decide where and how those commands run — for example, inside a Docker container — without touching the file tools or re-implementing any of the existing truncation, timeout, streaming, or abort behavior.

Concept: execution-surface-only sandboxing

The goal is to isolate only the part that needs it. File tools (read, edit, write) keep operating on the host worktree, so host-side git diff and merge logic is untouched. Only command execution is routed into a container via docker exec. A bind mount ties the worktree to the container's working directory, so a host-side edit is visible in the container instantly.

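[Diagram: on the HOST, the worktree holds your files (read · edit · write · git; cwd → ./worktree). It is bind-mounted into a CONTAINER reached through docker exec, which hosts the sandboxed shell (bash + background shells; execCwd → /work). The CommandExecutor routes command execution from the host into the container. Callouts: edits sync instantly; host-side git stays fast; no port / dep collisions.]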

The split is two options on createAgentSession:

  • cwd — where the file tools operate (the host worktree).
  • execCwd — where commands run, when it differs from cwd (the container's bind-mount target, e.g. /work).
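Because cwd and execCwd name the same directory tree through the bind mount, a host path inside the worktree can be translated to its container-side equivalent by swapping the prefix. The sketch below shows one way to do that. `toExecPath` is a hypothetical helper, not part of harnext, and it assumes the container uses POSIX paths.

```typescript
import * as path from 'node:path';

/**
 * Hypothetical helper (not a harnext API): map a host path inside `cwd`
 * to the matching path under `execCwd`, the bind-mount target.
 */
function toExecPath(cwd: string, execCwd: string, hostPath: string): string {
  const root = path.resolve(cwd);
  // Relative inputs are interpreted against the worktree root.
  const rel = path.relative(root, path.resolve(root, hostPath));
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`${hostPath} is outside the worktree ${cwd}`);
  }
  // The container side always uses POSIX separators, whatever the host OS.
  return path.posix.join(execCwd, ...rel.split(path.sep));
}
```

A helper like this is only needed when your own code has to refer to a file from a command; harnext resolves the command working directory from execCwd itself.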

The CommandExecutor interface

An executor owns process creation. spawn returns a ChildProcessLike — anything that quacks like a Node child process — and the optional dispose is awaited when the session is disposed.

TypeScript
import type { Readable } from 'node:stream';

interface ChildProcessLike {
  stdout: Readable | null;
  stderr: Readable | null;
  pid?: number;
  kill(signal?: NodeJS.Signals): boolean;
  on(event: 'close', cb: (code: number | null) => void): unknown;
  on(event: 'error', cb: (err: Error) => void): unknown;
}

interface ExecutorSpawnOptions {
  cwd: string;             // command working dir (container-side path for a sandbox)
  env?: NodeJS.ProcessEnv; // optional — the executor OWNS env; host process.env never leaks
  signal?: AbortSignal;    // the executor kills its process on abort
}

interface CommandExecutor {
  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike;
  dispose?(): void | Promise<void>;  // awaited on session.dispose()
}
  • env is yours to construct. harnext does not pass the host process.env through — your executor decides exactly what the sandbox sees.
  • signal is a kill contract. When it aborts, the executor must terminate the process it started.
  • dispose runs on teardown. session.dispose() awaits executor.dispose?.(), so a container can be removed there.
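To make the three contracts concrete without Docker, here is a sketch of an in-memory test double. `FakeChild` and `FakeExecutor` are hypothetical names, not harnext APIs, and the interfaces are copied locally only so the block compiles on its own. It records each spawn, honors the abort signal, and notes when dispose was called.

```typescript
import { EventEmitter } from 'node:events';
import { PassThrough } from 'node:stream';
import type { Readable } from 'node:stream';

// Local copies of the interfaces above so the sketch is self-contained.
interface ChildProcessLike {
  stdout: Readable | null;
  stderr: Readable | null;
  pid?: number;
  kill(signal?: NodeJS.Signals): boolean;
  on(event: 'close', cb: (code: number | null) => void): unknown;
  on(event: 'error', cb: (err: Error) => void): unknown;
}
interface ExecutorSpawnOptions {
  cwd: string;
  env?: NodeJS.ProcessEnv;
  signal?: AbortSignal;
}
interface CommandExecutor {
  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike;
  dispose?(): void | Promise<void>;
}

/** Hypothetical fake child: no real process; closes on finish() or kill(). */
class FakeChild extends EventEmitter implements ChildProcessLike {
  readonly stdout = new PassThrough();
  readonly stderr = new PassThrough();
  killed = false;
  private closed = false;

  /** Write canned output, then close with the given exit code. */
  finish(code: number | null, output = ''): void {
    if (this.closed) return;
    this.closed = true;
    this.stdout.end(output);
    this.stderr.end();
    // Real child processes emit 'close' asynchronously; mimic that.
    setImmediate(() => this.emit('close', code));
  }

  kill(_signal?: NodeJS.Signals): boolean {
    if (this.closed) return false;
    this.killed = true;
    this.finish(null);
    return true;
  }
}

/** Hypothetical test double: records spawns and honors the abort contract. */
class FakeExecutor implements CommandExecutor {
  readonly calls: Array<{ command: string; cwd: string; env: NodeJS.ProcessEnv }> = [];
  disposed = false;

  spawn(command: string, opts: ExecutorSpawnOptions): FakeChild {
    // The executor owns env: only what the caller passed is recorded.
    this.calls.push({ command, cwd: opts.cwd, env: { ...(opts.env ?? {}) } });
    const child = new FakeChild();
    const onAbort = () => child.kill('SIGTERM');
    if (opts.signal?.aborted) onAbort();
    else opts.signal?.addEventListener('abort', onAbort, { once: true });
    return child;
  }

  dispose(): void {
    this.disposed = true;
  }
}
```

A double like this can stand in for a real executor in unit tests, where the interesting assertions are usually which commands were spawned, in which directory, and whether an abort reached the child.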

A worked example: DockerExecutor

This reference is distilled from the end-to-end verification that passed 22/22 checks against a real node:22-bookworm-slim container. It runs commands with docker exec, constructs a clean env, and wires the abort signal to a SIGTERM.

TypeScript
import { spawn } from 'node:child_process';
import type {
  CommandExecutor,
  ChildProcessLike,
  ExecutorSpawnOptions,
} from '@harnext/core';

export class DockerExecutor implements CommandExecutor {
  constructor(
    private readonly containerId: string,
    /** Clean env for the container — the host's process.env never leaks. */
    private readonly containerEnv: NodeJS.ProcessEnv = {},
  ) {}

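  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike {
    const env = { ...this.containerEnv, ...(opts.env ?? {}) };
    // Skip unset entries so an undefined value is never passed as the literal string "undefined".
    const envFlags = Object.entries(env)
      .filter(([, v]) => v !== undefined)
      .flatMap(([k, v]) => ['-e', `${k}=${v}`]);
    // Run the command inside the container; stdin is closed, stdout/stderr are piped back.
    const child = spawn(
      'docker',
      ['exec', '-w', opts.cwd, ...envFlags, this.containerId, 'sh', '-c', command],
      { stdio: ['ignore', 'pipe', 'pipe'] },
    );
    const { signal } = opts;
    if (signal) {
      // Abort contract: terminate the process this executor started.
      const onAbort = () => child.kill('SIGTERM');
      if (signal.aborted) onAbort();
      else {
        signal.addEventListener('abort', onAbort, { once: true });
        // Drop the listener once the command has finished.
        child.on('close', () => signal.removeEventListener('abort', onAbort));
      }
    }
    return child;
  }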

  async dispose(): Promise<void> {
    await new Promise<void>((resolve) => {
      const p = spawn('docker', ['rm', '-f', this.containerId], { stdio: 'ignore' });
      p.on('close', () => resolve());
      p.on('error', () => resolve());
    });
  }
}

Wiring it up

Start a per-worktree container that bind-mounts the worktree, then point execCwd at the mount and pass the executor.

TypeScript
// 1. Bind-mount the worktree into a per-worktree container:
//    docker run -d --rm -v <hostWorktree>:/work <image> sleep infinity
// 2. Route command execution into it; keep read/edit/write on the host:
const { session } = await createAgentSession({
  provider,
  modelId,
  cwd: hostWorktree,   // read / edit / write operate here (host)
  execCwd: '/work',    // bash + background shells run here (container)
  executor: new DockerExecutor(containerId, {
    PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
  }),
  closedToolSet: true, // exact, auditable tool set — no host-spawned MCP
});

// ...run the agent...

await session.dispose(); // tears down background shells AND removes the container
Use closedToolSet for sandboxes
Without it, MCP servers and the skill tool are injected into the session and spawn on the host, outside your sandbox. Setting closedToolSet: true yields exactly the resolved tools, which keeps the tool set auditable and contained.

Session options

Option | Purpose
executor | Where shell commands run (foreground + background). Default = host child_process.
execCwd | Command working dir when it differs from cwd (the file-tool dir), e.g. a container bind-mount target like /work.
toolOverrides | Swap individual tools by name without losing the background-shell trio.
buildTools | Transform the default tool list.
closedToolSet | Yield exactly the resolved tools, with no MCP / skill injection (auditable, sandbox-safe).
disableSkillTool | Skip the skill tool only.

Lifecycle

Create the container before the session (or lazily on first spawn), bind-mount the worktree, and let session.dispose() clean up: it tears down any background shells and then awaits executor.dispose(), which removes the container. Because the bind mount makes host edits instantly visible inside the container, there's nothing to copy or sync between the two surfaces.
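The "create the container before the session" step can be sketched as follows. `dockerRunArgs` and `startWorktreeContainer` are hypothetical helpers, not harnext APIs; they wrap the same `docker run -d --rm -v <hostWorktree>:/work <image> sleep infinity` command shown in the wiring comment and assume the docker CLI is on the host PATH.

```typescript
import { execFile } from 'node:child_process';
import * as path from 'node:path';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

/** Hypothetical helper: argv for a per-worktree container that idles until removed. */
function dockerRunArgs(hostWorktree: string, image: string, mountPoint = '/work'): string[] {
  // -v treats a non-absolute source as a named volume, so resolve the path first.
  const source = path.resolve(hostWorktree);
  return ['run', '-d', '--rm', '-v', `${source}:${mountPoint}`, image, 'sleep', 'infinity'];
}

/** Hypothetical helper: start the container and return the ID that `docker run -d` prints. */
async function startWorktreeContainer(hostWorktree: string, image: string): Promise<string> {
  const { stdout } = await execFileAsync('docker', dockerRunArgs(hostWorktree, image));
  return stdout.trim();
}
```

The returned ID is what the DockerExecutor constructor expects as containerId, so the container's lifetime then follows the session: created here, removed in dispose.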

Composition

The executor changes where commands run; the tool-shaping options change which tools exist. They compose.

Swap a single tool

toolOverrides replaces tools by name while keeping everything else — crucially, the background-shell trio survives.

TypeScript
const { session } = await createAgentSession({
  // provider, modelId, and cwd omitted for brevity; see "Wiring it up"
  executor: new DockerExecutor(containerId),
  execCwd: '/work',
  toolOverrides: {
    // replace just the 'web_fetch' tool; bash + background shells untouched
    web_fetch: myProxiedWebFetchTool,
  },
});

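Transform the whole list

buildTools receives the default tool list and returns the list the session uses. This example drops web_search.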

TypeScript
const { session } = await createAgentSession({
  // provider, modelId, and cwd omitted for brevity; see "Wiring it up"
  executor: new DockerExecutor(containerId),
  execCwd: '/work',
  buildTools: (tools) => tools.filter((t) => t.name !== 'web_search'),
});
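When several filters accumulate, it can help to build the buildTools callback from small named pieces. The sketch below assumes only what the example above shows, namely that each tool exposes a `name`; `NamedTool`, `denyTools`, and `pipeTools` are hypothetical names, not harnext APIs.

```typescript
/** Assumed minimal tool shape: the only field relied on here is `name`. */
interface NamedTool {
  name: string;
}

type ToolTransform<T extends NamedTool> = (tools: T[]) => T[];

/** Hypothetical helper: drop the listed tools by name. */
function denyTools<T extends NamedTool>(...names: string[]): ToolTransform<T> {
  const denied = new Set(names);
  return (tools) => tools.filter((t) => !denied.has(t.name));
}

/** Hypothetical helper: apply several transforms from left to right. */
function pipeTools<T extends NamedTool>(...steps: ToolTransform<T>[]): ToolTransform<T> {
  return (tools) => steps.reduce((acc, step) => step(acc), tools);
}

// Usage sketch: a transform that could be passed as the buildTools option.
const withoutWebTools = pipeTools<NamedTool>(
  denyTools('web_search'),
  denyTools('web_fetch'),
);
```

Each step stays a plain list-in, list-out function, so the same pieces can be unit-tested without creating a session.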

Gotchas

  • Env isolation. The executor owns env — if a command needs a variable, put it in containerEnv (or merge from opts.env). The host process.env is never forwarded.
  • Abort kills the container process, not the container. Wire opts.signal to a kill so an aborted command stops; the container itself lives until dispose.
  • Working-dir mapping. opts.cwd passed to spawn is the command-side path (e.g. /work), resolved from execCwd — not the host worktree path.
  • Keep MCP off the host. Use closedToolSet so MCP servers don't spawn outside the sandbox.
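For the env-isolation point, one way to build containerEnv is an explicit allowlist, so that any variable reaching the sandbox was named on purpose. `pickEnv` is a hypothetical helper, not a harnext API, and the variable names in the usage lines are placeholders.

```typescript
/** Hypothetical helper: copy only allowlisted variables that are actually set. */
function pickEnv(source: NodeJS.ProcessEnv, allow: readonly string[]): NodeJS.ProcessEnv {
  const out: NodeJS.ProcessEnv = {};
  for (const key of allow) {
    const value = source[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Usage sketch: forward two host variables explicitly and pin PATH for the container.
const containerEnv: NodeJS.ProcessEnv = {
  ...pickEnv(process.env, ['CI', 'NPM_TOKEN']),
  PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
};
```

The resulting object has the shape the DockerExecutor constructor takes as its second argument.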
Source
The pluggable command executor was designed in QualityUnit/harnext#43 and implemented in #47 (building on the background-shell work in #42 / #45). See the announcement post for the why.