Release
Multimodal input: paste an image with Ctrl+V, or pass a URL from the SDK
harnext now takes images. In the terminal, press Ctrl+V to attach whatever's on your clipboard. In the SDK, hand session.prompt() a URL, a data: URI, a file path, or raw base64, and harnext fetches and encodes the rest.
Ctrl+V in the terminal
Copy a screenshot, a design mock, a diagram — then press Ctrl+V at the prompt. harnext grabs the image from the system clipboard, base64-encodes it, and attaches it to your next message. A 🖼 N chip in the footer shows how many images are pending, and image-only prompts (no text) are allowed, so you can just paste and hit enter.
It's cross-platform: xclip / wl-paste on Linux, pngpaste / pbpaste on macOS, PowerShell on Windows. If the clipboard holds text instead of an image, Ctrl+V pastes the text as usual. If no clipboard tool is installed, you get a one-time hint on how to install one — never a silent failure.
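To make the fallback behaviour concrete, here is a minimal sketch of how a per-platform tool lookup could work. It is not harnext's actual code: the names pickClipboardTool, CLIPBOARD_TOOLS, and ToolChoice are hypothetical, the preference order within each platform is an assumption, and the check for whether a tool is installed is passed in as a function so the sketch stays self-contained.

```typescript
// Hypothetical sketch: names and ordering are assumptions, not harnext's code.
type Platform = 'linux' | 'darwin' | 'win32';

// Tools named in the release notes, keyed by Node's process.platform values.
// The order within each list is an assumed preference order.
const CLIPBOARD_TOOLS: Record<Platform, string[]> = {
  linux: ['xclip', 'wl-paste'],
  darwin: ['pngpaste', 'pbpaste'],
  win32: ['powershell'],
};

type ToolChoice =
  | { kind: 'tool'; name: string }
  | { kind: 'hint'; message: string };

// Return the first installed tool for the platform, or an install hint
// when none is available, so the caller never fails silently.
function pickClipboardTool(
  platform: Platform,
  isInstalled: (tool: string) => boolean,
): ToolChoice {
  const candidates = CLIPBOARD_TOOLS[platform];
  const found = candidates.find(isInstalled);
  if (found !== undefined) {
    return { kind: 'tool', name: found };
  }
  return {
    kind: 'hint',
    message: `No clipboard tool found; install one of: ${candidates.join(', ')}`,
  };
}
```

A caller would show the hint message once and remember that it has done so; that bookkeeping is left out of the sketch.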
Images from the SDK
Programmatically, session.prompt() takes an optional second argument: a list of images. Each one can be an http(s) URL, a data: URI, a local file path, or an already-encoded base64 block — mix and match freely.
import { createAgentSession, type ImageInput } from '@harnext/core';
const { session } = await createAgentSession({ provider, modelId });
await session.prompt('describe this', [
  { url: 'https://example.com/cat.png' },                    // http(s) → fetched
  'data:image/png;base64,iVBORw0KGgo…',                      // data: URI
  '/path/to/local.jpg',                                      // file path → read
  { type: 'image', data: '<b64>', mimeType: 'image/png' },   // raw base64
]);
Under the hood, resolveImages() fetches URLs, reads files, decodes data URIs, and normalizes everything to base64 with a resolved MIME type. That's because pi-ai transports base64 only, and each provider transform emits the shape its API expects. There's also a MAX_IMAGE_BYTES cap (20 MB), so an oversized image fails loudly up front rather than at the provider.
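For a feel of what that normalization involves, here is a rough sketch of three of the steps: telling string inputs apart, splitting a base64 data: URI into MIME type and payload, and checking the decoded size against a cap. This is an illustration under assumptions, not the real resolveImages(). Every name below (classifyImageString, parseDataUri, decodedByteLength, assertWithinCap, ASSUMED_MAX_IMAGE_BYTES) is hypothetical, and reading "20 MB" as 20 × 1024 × 1024 bytes is a guess.

```typescript
// Hypothetical sketch; harnext's real resolveImages() may work differently.
type ImageKind = 'url' | 'data-uri' | 'file-path';

interface Base64Image {
  data: string;     // base64 payload, no "data:" prefix
  mimeType: string; // e.g. "image/png"
}

// Assumption: "20 MB" is interpreted as 20 MiB here.
const ASSUMED_MAX_IMAGE_BYTES = 20 * 1024 * 1024;

// Decide how a plain string input should be resolved.
function classifyImageString(input: string): ImageKind {
  if (/^https?:\/\//i.test(input)) return 'url';
  if (/^data:/i.test(input)) return 'data-uri';
  return 'file-path'; // anything else is treated as a local path
}

// Split "data:<mime>;base64,<payload>" into its parts.
function parseDataUri(uri: string): Base64Image {
  const comma = uri.indexOf(',');
  if (!uri.startsWith('data:') || comma === -1) {
    throw new Error('not a data: URI');
  }
  const header = uri.slice('data:'.length, comma);
  if (!header.endsWith(';base64')) {
    throw new Error('only base64 data: URIs are handled in this sketch');
  }
  const mimeType = header.slice(0, -';base64'.length);
  if (mimeType === '') {
    throw new Error('data: URI has no MIME type');
  }
  return { mimeType, data: uri.slice(comma + 1) };
}

// Number of bytes a base64 string decodes to, without decoding it.
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}

// Throw early, with a clear message, when an image exceeds the cap.
function assertWithinCap(
  image: Base64Image,
  maxBytes: number = ASSUMED_MAX_IMAGE_BYTES,
): void {
  const size = decodedByteLength(image.data);
  if (size > maxBytes) {
    throw new Error(`image is ${size} bytes, over the ${maxBytes}-byte cap`);
  }
}
```

Computing the size from the base64 length avoids decoding a large payload just to reject it. URL fetching and file reading are left out because they need network and filesystem access.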
@harnext/core exports resolveImages, resolveImage, the ImageInput and ImageContent types, and the MAX_IMAGE_BYTES constant, so you can resolve and validate images yourself before prompting.
Read the docs
The docs cover the ImageInput forms, resolveImages, MIME resolution, the size cap, and the Ctrl+V clipboard flow per OS.
Multimodal input landed in QualityUnit/harnext#48.