
Image input

Send images to the model — paste them with Ctrl+V in the CLI, or pass URLs, data URIs, file paths, and base64 to session.prompt() in the SDK.

harnext accepts multimodal input. Interactively you paste an image from the clipboard; programmatically you pass images alongside the text prompt and harnext resolves each to the base64 form providers expect.

SDK: session.prompt(text, images)

session.prompt() takes an optional second argument: a list of ImageInput values. The union has three shapes, which together cover four sources (raw base64, http(s) URLs, data: URIs, and local file paths):

```typescript
type ImageInput =
  | ImageContent                       // { type: 'image', data: '<base64>', mimeType?: string }
  | { url: string; mimeType?: string } // http(s) URL or data: URI
  | string;                            // http(s) URL, data: URI, or local file path
```
```typescript
import { createAgentSession, type ImageInput } from '@harnext/core';

// `provider` and `modelId` are assumed to be configured elsewhere.
const { session } = await createAgentSession({ provider, modelId });

const images: ImageInput[] = [
  { url: 'https://example.com/cat.png' },                    // http(s) → fetched
  'data:image/png;base64,iVBORw0KGgo…',                      // data: URI → decoded
  '/path/to/local.jpg',                                      // file path → read
  { type: 'image', data: '<b64>', mimeType: 'image/png' },   // raw base64
];

await session.prompt('describe this', images);
```
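Because a bare string can be any of three sources, a resolver has to tell them apart before it can fetch, decode, or read. The sketch below shows one way to do that dispatch. It assumes prefix checks on `http://`, `https://` and `data:` decide the case, and that anything else is a file path. `classifyImageInput` and `ImageSource` are hypothetical names, not harnext exports. The documented types are re-declared locally so the snippet runs on its own.

```typescript
// Local copies of the documented shapes, so the sketch runs without @harnext/core.
interface ImageContent {
  type: "image";
  data: string; // base64 payload
  mimeType?: string;
}
type ImageInput = ImageContent | { url: string; mimeType?: string } | string;

// Hypothetical tagged result: which source an input came from.
type ImageSource =
  | { kind: "base64"; data: string; mimeType?: string }
  | { kind: "http"; url: string; mimeType?: string }
  | { kind: "data-uri"; data: string; mimeType?: string }
  | { kind: "file"; path: string };

// Matches base64 data URIs such as "data:image/png;base64,iVBOR...".
// Assumption: only the base64 variant of data: URIs is handled here.
const DATA_URI = /^data:([^;,]+)?(?:;[^,]*)?;base64,(.*)$/;

function classifyUrl(url: string, mimeType?: string): ImageSource {
  if (/^https?:\/\//i.test(url)) return { kind: "http", url, mimeType };
  const m = DATA_URI.exec(url);
  // An explicit mimeType wins over the one embedded in the data: URI (assumption).
  if (m) return { kind: "data-uri", data: m[2], mimeType: mimeType ?? m[1] };
  throw new Error(`unsupported URL: ${url.slice(0, 40)}`);
}

function classifyImageInput(input: ImageInput): ImageSource {
  if (typeof input === "string") {
    // A string is a URL or data: URI if it looks like one; otherwise a file path.
    if (/^(https?:|data:)/i.test(input)) return classifyUrl(input);
    return { kind: "file", path: input };
  }
  // Only ImageContent carries a `type` field, so this narrows the union.
  if ("type" in input) {
    return { kind: "base64", data: input.data, mimeType: input.mimeType };
  }
  return classifyUrl(input.url, input.mimeType);
}
```

A Windows path such as `C:\img.png` has none of the URL prefixes, so it falls through to the file branch.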

How resolution works

  • resolveImages(inputs) (and the single resolveImage(input)) normalize every form to ImageContent: http(s) URLs are fetched, file paths are read, data: URIs are decoded, and the bytes are base64-encoded.
  • MIME type is inferred from the data: URI, the HTTP response, or the file extension; you can also set it explicitly with mimeType.
  • Base64-only transport. pi-ai carries images as base64; each provider transform then emits the per-API shape. You always hand harnext source images; it deals with the wire format.
  • Size cap. MAX_IMAGE_BYTES (20 MB) bounds a single image, so an oversized input fails fast locally rather than at the provider.
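The sketch below makes the "one base64 form, many wire shapes" point concrete by rendering a single ImageContent for three public chat APIs. The target shapes are the image blocks documented for the Anthropic Messages API, OpenAI Chat Completions, and Gemini (camelCase `inlineData`, as in Google's JS SDK). `toWireBlock` and the `Api` names are hypothetical; this is not pi-ai's actual transform code.

```typescript
interface ImageContent {
  type: "image";
  data: string; // base64, without a "data:" prefix
  mimeType?: string;
}

type Api = "anthropic-messages" | "openai-chat" | "gemini";

// Hypothetical: render one base64 image in the shape each API expects.
function toWireBlock(img: ImageContent, api: Api): unknown {
  // Assumption: resolution has already filled in mimeType; fail loudly if not.
  const mime = img.mimeType;
  if (!mime) throw new Error("mimeType must be resolved before transport");
  switch (api) {
    case "anthropic-messages":
      // Anthropic Messages API: image block with a base64 source.
      return { type: "image", source: { type: "base64", media_type: mime, data: img.data } };
    case "openai-chat":
      // OpenAI Chat Completions: the image travels as a data: URL.
      return { type: "image_url", image_url: { url: `data:${mime};base64,${img.data}` } };
    case "gemini":
      // Gemini generateContent: an inline-data part.
      return { inlineData: { mimeType: mime, data: img.data } };
  }
}
```

The same base64 string appears in all three outputs; only the envelope changes, which is why callers never need to care about the wire format.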
Exports
From @harnext/core: resolveImages, resolveImage, the ImageInput and ImageContent types, and MAX_IMAGE_BYTES. Resolve and validate ahead of time if you want to surface errors before prompting.
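One cheap check you can run before prompting is the decoded size of an image you already hold as base64, computed from the string length without decoding it. This is a sketch with these assumptions:

- The cap applies to decoded bytes.
- "20 MB" means 20 × 1024 × 1024. Real code should import MAX_IMAGE_BYTES from @harnext/core rather than hard-code a value.
- The input is padded base64.

`decodedByteLength` and `assertWithinCap` are hypothetical helpers.

```typescript
// Assumption: 20 MB read as 20 MiB. Prefer the exported MAX_IMAGE_BYTES in real code.
const ASSUMED_MAX_IMAGE_BYTES = 20 * 1024 * 1024;

// Every 4 base64 characters encode 3 bytes; trailing "=" padding removes 1 or 2.
function decodedByteLength(b64: string): number {
  const clean = b64.replace(/\s+/g, "");
  if (clean.length % 4 !== 0) throw new Error("expected padded base64 (length % 4 === 0)");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return (clean.length / 4) * 3 - padding;
}

// Throws before any network call if the image would exceed the cap.
function assertWithinCap(b64: string, max: number = ASSUMED_MAX_IMAGE_BYTES): number {
  const bytes = decodedByteLength(b64);
  if (bytes > max) {
    throw new Error(`image is ${bytes} bytes; limit is ${max}`);
  }
  return bytes;
}
```

For example, the 12-character string `iVBORw0KGgo=` (the PNG signature) decodes to 8 bytes.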

CLI: paste with Ctrl+V

At the interactive prompt, Ctrl+V grabs an image from the system clipboard, base64-encodes it, and attaches it to your next message. A 🖼 N chip in the footer shows how many images are pending, and image-only prompts (empty text) are allowed — paste and press enter.
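The paste flow above can be modelled as a small queue:

- Each paste base64-encodes the clipboard bytes.
- The footer chip reflects the queue length.
- Submit is allowed when there is text or at least one pending image.

The sketch below is built on those rules. `PendingImages` and its methods are hypothetical, not harnext's TUI code. The `image/png` default is an assumption.

```typescript
import { Buffer } from "node:buffer";

interface ImageContent {
  type: "image";
  data: string;
  mimeType?: string;
}

class PendingImages {
  private items: ImageContent[] = [];

  // Called on Ctrl+V when the clipboard held image bytes.
  // Assumption: clipboard images arrive as PNG unless told otherwise.
  add(bytes: Uint8Array, mimeType: string = "image/png"): void {
    this.items.push({ type: "image", data: Buffer.from(bytes).toString("base64"), mimeType });
  }

  // Footer chip text such as "🖼 2"; empty when nothing is pending.
  chip(): string {
    return this.items.length > 0 ? `🖼 ${this.items.length}` : "";
  }

  // Image-only prompts are allowed: text may be empty if an image is attached.
  canSubmit(text: string): boolean {
    return text.trim().length > 0 || this.items.length > 0;
  }

  // Hand the images to the outgoing message and clear the queue.
  take(): ImageContent[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```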

Clipboard tools per OS

| OS | Tool used |
|---|---|
| Linux | xclip (X11) or wl-paste (Wayland) |
| macOS | pngpaste, falling back to pbpaste |
| Windows | PowerShell clipboard access |

  • No image on the clipboard? Ctrl+V pastes the clipboard text instead, as usual.
  • No clipboard tool installed? harnext prints a one-time hint on how to install one instead of failing silently.
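The per-OS table and the missing-tool behaviour amount to a lookup plus a first-available search. This sketch rests on three assumptions:

- A Wayland session is detected through the WAYLAND_DISPLAY environment variable.
- Tool availability comes from an injected probe, for example a `which` lookup.
- The hint is printed at most once per process.

`clipboardCandidates`, `pickClipboardTool` and the hint wording are hypothetical. harnext's real logic may differ.

```typescript
type Platform = "linux" | "darwin" | "win32";

// Candidate tools in preference order, following the per-OS table.
function clipboardCandidates(platform: Platform, env: Record<string, string | undefined>): string[] {
  switch (platform) {
    case "linux":
      // Assumption: prefer wl-paste on Wayland sessions, xclip otherwise.
      return env.WAYLAND_DISPLAY ? ["wl-paste", "xclip"] : ["xclip", "wl-paste"];
    case "darwin":
      return ["pngpaste", "pbpaste"];
    case "win32":
      return ["powershell"];
  }
}

let hintShown = false; // the install hint is printed at most once

// Return the first installed tool; if none is found, log a hint (once) and return undefined.
function pickClipboardTool(
  candidates: string[],
  isInstalled: (tool: string) => boolean,
  log: (msg: string) => void = console.error,
): string | undefined {
  const tool = candidates.find(isInstalled);
  if (tool === undefined && !hintShown) {
    hintShown = true;
    log(`No clipboard tool found; install one of: ${candidates.join(", ")}`);
  }
  return tool;
}
```

Injecting `isInstalled` keeps the selection logic testable without touching the real system. A caller would pass `process.platform` and `process.env` into `clipboardCandidates`.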
Source
Multimodal input shipped in QualityUnit/harnext#48. See the announcement post for the motivation and the API reference for the exported types.