
Running agents in a custom sandbox (Docker)

Route an agent's shell commands into a per-worktree container with the pluggable command executor, while file operations stay on the host.

harnext runs shell commands through a bash tool and a background-shell manager. The CommandExecutor seam lets you decide where and how those commands run — for example, inside a Docker container — without touching the file tools or re-implementing any of the existing truncation, timeout, streaming, or abort behavior.

Concept: execution-surface-only sandboxing

The goal is to isolate only the part that needs it. File tools (read, edit, write) keep operating on the host worktree, so host-side git diff and merge logic is untouched. Only command execution is routed into a container via docker exec. A bind mount ties the worktree to the container's working directory, so a host-side edit is visible in the container instantly.

The split is two options on createAgentSession:

cwd — where the file tools operate (the host worktree).
execCwd — where commands run, when it differs from cwd (the container's bind-mount target, e.g. /work).
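Because the same files are visible under two different paths, it can help to translate a host path into its container-side equivalent, for example when a command needs an absolute path to a file the agent just edited. The following is a minimal sketch, not part of harnext: toContainerPath is a hypothetical helper, and it assumes POSIX-style absolute paths on both sides and a worktree that is bind-mounted at execCwd.

```typescript
import * as path from 'node:path';

/**
 * Hypothetical helper (not a harnext API): map a host path inside the worktree
 * to the matching path under the container's bind-mount target.
 */
export function toContainerPath(hostWorktree: string, execCwd: string, hostPath: string): string {
  // Path of the file relative to the worktree root, e.g. "src/index.ts".
  const rel = path.posix.relative(hostWorktree, hostPath);
  // Anything that climbs out of the worktree is not covered by the bind mount.
  if (rel === '..' || rel.startsWith('../')) {
    throw new Error(`${hostPath} is outside the worktree ${hostWorktree}`);
  }
  return path.posix.join(execCwd, rel);
}
```

With a worktree at /home/me/wt mounted at /work, toContainerPath('/home/me/wt', '/work', '/home/me/wt/src/index.ts') returns '/work/src/index.ts', and a path outside the worktree throws instead of silently pointing at a file the container cannot see.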

The CommandExecutor interface

An executor owns process creation. spawn returns a ChildProcessLike — anything that quacks like a Node child process — and the optional dispose is awaited when the session is disposed.

TypeScript

interface ChildProcessLike {
  stdout: Readable | null;
  stderr: Readable | null;
  pid?: number;
  kill(signal?: NodeJS.Signals): boolean;
  on(event: 'close', cb: (code: number | null) => void): unknown;
  on(event: 'error', cb: (err: Error) => void): unknown;
}

interface ExecutorSpawnOptions {
  cwd: string;             // command working dir (container-side path for a sandbox)
  env?: NodeJS.ProcessEnv; // optional — the executor OWNS env; host process.env never leaks
  signal?: AbortSignal;    // the executor kills its process on abort
}

interface CommandExecutor {
  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike;
  dispose?(): void | Promise<void>;  // awaited on session.dispose()
}

env is yours to construct. harnext does not pass the host process.env through — your executor decides exactly what the sandbox sees.
signal is a kill contract. When it aborts, the executor must terminate the process it started.
dispose runs on teardown. session.dispose() awaits executor.dispose?.(), so a container can be removed there.
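Since the interface is this small, it can be faked for tests that should not depend on Docker. The sketch below is an illustration only: FakeChild and FakeExecutor are hypothetical names, and the interfaces are re-declared locally (copied from the listing above) so the block compiles on its own rather than importing from harnext.

```typescript
import { EventEmitter } from 'node:events';
import { Readable } from 'node:stream';

// Local copies of the interfaces listed above, so this sketch is self-contained.
interface ChildProcessLike {
  stdout: Readable | null;
  stderr: Readable | null;
  pid?: number;
  kill(signal?: NodeJS.Signals): boolean;
  on(event: 'close', cb: (code: number | null) => void): unknown;
  on(event: 'error', cb: (err: Error) => void): unknown;
}
interface ExecutorSpawnOptions {
  cwd: string;
  env?: NodeJS.ProcessEnv;
  signal?: AbortSignal;
}
interface CommandExecutor {
  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike;
  dispose?(): void | Promise<void>;
}

/** Hypothetical stand-in for a child process: canned stdout, then a 'close' event. */
export class FakeChild extends EventEmitter implements ChildProcessLike {
  readonly stdout: Readable;
  readonly stderr: Readable = Readable.from([]);
  killed = false;
  private closed = false;

  constructor(output: string) {
    super();
    this.stdout = Readable.from([output]);
    // Defer 'close' so the caller can attach listeners first.
    setImmediate(() => {
      if (!this.killed) this.finish(0);
    });
  }

  kill(_signal?: NodeJS.Signals): boolean {
    if (this.killed || this.closed) return false;
    this.killed = true;
    setImmediate(() => this.finish(null)); // null exit code: ended by a signal
    return true;
  }

  private finish(code: number | null): void {
    if (this.closed) return;
    this.closed = true;
    this.emit('close', code);
  }
}

/** Hypothetical test double: records every spawn and honors the abort contract. */
export class FakeExecutor implements CommandExecutor {
  readonly calls: Array<{ command: string; cwd: string }> = [];
  disposed = false;
  private readonly output: string;

  constructor(output = '') {
    this.output = output;
  }

  spawn(command: string, opts: ExecutorSpawnOptions): FakeChild {
    this.calls.push({ command, cwd: opts.cwd });
    const child = new FakeChild(this.output);
    const signal = opts.signal;
    if (signal) {
      const onAbort = () => child.kill('SIGTERM');
      if (signal.aborted) onAbort();
      else {
        signal.addEventListener('abort', onAbort, { once: true });
        child.on('close', () => signal.removeEventListener('abort', onAbort));
      }
    }
    return child;
  }

  dispose(): void {
    this.disposed = true;
  }
}
```

A test can then assert on calls, abort the signal to check that the child is killed, and confirm that dispose ran, all without starting a container.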

A worked example: DockerExecutor

This reference is distilled from the end-to-end verification that passed 22/22 checks against a real node:22-bookworm-slim container. It runs commands with docker exec, constructs a clean env, and wires the abort signal to a SIGTERM.

TypeScript

import { spawn } from 'node:child_process';
import type {
  CommandExecutor,
  ChildProcessLike,
  ExecutorSpawnOptions,
} from '@harnext/core';

export class DockerExecutor implements CommandExecutor {
  constructor(
    private readonly containerId: string,
    /** Clean env for the container — the host's process.env never leaks. */
    private readonly containerEnv: NodeJS.ProcessEnv = {},
  ) {}

  spawn(command: string, opts: ExecutorSpawnOptions): ChildProcessLike {
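    // Merge the clean container env with per-call overrides. Skip undefined values
    // so they are not passed to the container as the literal string "undefined".
    const env = { ...this.containerEnv, ...(opts.env ?? {}) };
    const envFlags = Object.entries(env).flatMap(([k, v]) =>
      v === undefined ? [] : ['-e', `${k}=${v}`],
    );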
    const child = spawn(
      'docker',
      ['exec', '-w', opts.cwd, ...envFlags, this.containerId, 'sh', '-c', command],
      { stdio: ['ignore', 'pipe', 'pipe'] },
    );
    if (opts.signal) {
      const onAbort = () => child.kill('SIGTERM');
      if (opts.signal.aborted) onAbort();
      else {
        opts.signal.addEventListener('abort', onAbort, { once: true });
        child.on('close', () => opts.signal!.removeEventListener('abort', onAbort));
      }
    }
    return child;
  }

  async dispose(): Promise<void> {
    await new Promise<void>((resolve) => {
      const p = spawn('docker', ['rm', '-f', this.containerId], { stdio: 'ignore' });
      p.on('close', () => resolve());
      p.on('error', () => resolve());
    });
  }
}

Wiring it up

Start a per-worktree container that bind-mounts the worktree, then point execCwd at the mount and pass the executor.

TypeScript

// 1. Bind-mount the worktree into a per-worktree container:
//    docker run -d --rm -v <hostWorktree>:/work <image> sleep infinity
// 2. Route command execution into it; keep read/edit/write on the host:
const { session } = await createAgentSession({
  provider,
  modelId,
  cwd: hostWorktree,   // read / edit / write operate here (host)
  execCwd: '/work',    // bash + background shells run here (container)
  executor: new DockerExecutor(containerId, {
    PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
  }),
  closedToolSet: true, // exact, auditable tool set — no host-spawned MCP
});

// ...run the agent...

await session.dispose(); // tears down background shells AND removes the container
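The wiring example assumes a containerId already exists. One way to obtain it is sketched below, under stated assumptions: buildDockerRunArgs and startSandboxContainer are hypothetical helpers, not harnext APIs. They run the docker run command from the comment above and read the container ID that docker run -d prints on stdout.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

/** Hypothetical helper: argv for the `docker run` line shown in the wiring comment. */
export function buildDockerRunArgs(
  hostWorktree: string,
  image: string,
  mountPoint = '/work',
): string[] {
  return ['run', '-d', '--rm', '-v', `${hostWorktree}:${mountPoint}`, image, 'sleep', 'infinity'];
}

/** Hypothetical helper: start the container and return its ID. Needs a local Docker daemon. */
export async function startSandboxContainer(hostWorktree: string, image: string): Promise<string> {
  // `docker run -d` prints the new container's ID on stdout.
  const { stdout } = await execFileAsync('docker', buildDockerRunArgs(hostWorktree, image));
  return stdout.trim();
}
```

Passing the arguments as an array through execFile, rather than building a shell string, avoids quoting problems when the worktree path contains spaces.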

Use closedToolSet for sandboxes

Without it, MCP servers and the skill tool are injected into the tool set and spawn on the host, outside your sandbox. closedToolSet: true yields exactly the resolved tools, which keeps the tool set auditable and contained.

Session options

Option	Purpose
`executor`	Where shell commands run (foreground + background). Default = host `child_process`.
`execCwd`	Command working dir when it differs from `cwd` (the file-tool dir) — e.g. a container bind-mount target like `/work`.
`toolOverrides`	Swap individual tools by name without losing the background-shell trio.
`buildTools`	Transform the default tool list.
`closedToolSet`	Yield exactly the resolved tools — no MCP / skill injection (auditable, sandbox-safe).
`disableSkillTool`	Skip the `skill` tool only.

Lifecycle

Create the container before the session (or lazily on first spawn), bind-mount the worktree, and let session.dispose() clean up: it tears down any background shells and then awaits executor.dispose(), which removes the container. Because the bind mount makes host edits instantly visible inside the container, there's nothing to copy or sync between the two surfaces.
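The "lazily on first spawn" variant can be sketched as a small memoizing wrapper. This is an illustration under assumptions, not harnext code: LazyContainer is a hypothetical name, and the start and remove callbacks are supplied by the caller (for example, synchronous wrappers around docker run -d and docker rm -f). They are synchronous here because CommandExecutor.spawn returns its child process synchronously.

```typescript
/** Hypothetical helper: create the container on first use, remove it only if it was created. */
export class LazyContainer {
  private id: string | undefined;
  private readonly start: () => string;
  private readonly remove: (id: string) => void;

  constructor(start: () => string, remove: (id: string) => void) {
    this.start = start;
    this.remove = remove;
  }

  /** Return the container ID, starting the container on the first call only. */
  ensure(): string {
    if (this.id === undefined) this.id = this.start();
    return this.id;
  }

  /** Remove the container if one was started; safe to call more than once. */
  dispose(): void {
    if (this.id === undefined) return;
    const id = this.id;
    this.id = undefined;
    this.remove(id);
  }
}
```

An executor could call ensure() at the top of spawn and dispose() from its own dispose, so a session that never runs a command never starts a container.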

Composition

The executor changes where commands run; the tool-shaping options change which tools exist. They compose.

Swap a single tool

toolOverrides replaces tools by name while keeping everything else — crucially, the background-shell trio survives.

TypeScript

const { session } = await createAgentSession({
  executor: new DockerExecutor(containerId),
  execCwd: '/work',
  toolOverrides: {
    // replace just the 'web_fetch' tool; bash + background shells untouched
    web_fetch: myProxiedWebFetchTool,
  },
});

Transform the whole list

TypeScript

const { session } = await createAgentSession({
  executor: new DockerExecutor(containerId),
  execCwd: '/work',
  buildTools: (tools) => tools.filter((t) => t.name !== 'web_search'),
});
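The inline filter generalizes to a reusable deny list. The sketch below assumes only what the example above shows, namely that each tool exposes a string name; denyTools is a hypothetical helper, not a harnext export.

```typescript
/** Hypothetical helper: build a buildTools-style transform that drops tools by name. */
export function denyTools(...names: string[]) {
  const denied = new Set(names);
  // Generic over the tool type, so the caller's tool objects pass through unchanged.
  return <T extends { name: string }>(tools: T[]): T[] =>
    tools.filter((t) => !denied.has(t.name));
}
```

Under that assumption it could be passed as buildTools: denyTools('web_search', 'web_fetch'), which keeps the remaining tools in their original order.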

Gotchas

Env isolation. The executor owns env — if a command needs a variable, put it in containerEnv (or merge from opts.env). The host process.env is never forwarded.
Abort kills the container process, not the container. Wire opts.signal to a kill so an aborted command stops; the container itself lives until dispose.
Working-dir mapping. opts.cwd passed to spawn is the command-side path (e.g. /work), resolved from execCwd — not the host worktree path.
Keep MCP off the host. Use closedToolSet so MCP servers don't spawn outside the sandbox.
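For the env-isolation point, one way to build containerEnv is an explicit allowlist, so that every variable the sandbox sees is named in code. This is a minimal sketch: pickEnv is a hypothetical helper, and which variables to allow is a decision for your own setup.

```typescript
/** Hypothetical helper: copy only allowlisted, defined variables from a source env. */
export function pickEnv(
  source: Record<string, string | undefined>,
  allow: readonly string[],
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of allow) {
    const value = source[key];
    // Unset variables are skipped rather than forwarded as empty or "undefined".
    if (value !== undefined) out[key] = value;
  }
  return out;
}
```

Calling pickEnv(process.env, [...]) deliberately forwards the listed host values, so the list should stay short and free of secrets; anything not named in it never reaches the container.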

Source

The pluggable command executor was designed in QualityUnit/harnext#43 and implemented in #47 (building on the background-shell work in #42 / #45). See the announcement post for the why.
