Multimodal input: paste an image with Ctrl+V, or pass a URL from the SDK

Ctrl+V in the terminal

Copy a screenshot, a design mock, a diagram — then press Ctrl+V at the prompt. harnext grabs the image from the system clipboard, base64-encodes it, and attaches it to your next message. A 🖼 N chip in the footer shows how many images are pending, and image-only prompts (no text) are allowed, so you can just paste and hit enter.

harnext

[Ctrl+V] attached screenshot.png · 1 image pending

❯ why is this layout broken on mobile?

main · 🖼 1 · ⚙ 0 Background Jobs

It's cross-platform: xclip / wl-paste on Linux, pngpaste / pbpaste on macOS, PowerShell on Windows. If the clipboard holds text instead of an image, Ctrl+V pastes the text as usual. If no clipboard tool is installed, you get a one-time hint on how to install one — never a silent failure.
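
To make the per-OS fallback concrete, here is a minimal sketch of how a clipboard reader might pick its tool. It is not harnext's actual code: the function name `clipboardImageCommands`, the Wayland check via `WAYLAND_DISPLAY`, the exact arguments, and the PowerShell script are all assumptions made for illustration. `pbpaste` is left out because this sketch only covers the image path.

```typescript
// Hypothetical sketch, not harnext's implementation.
type ClipboardCommand = { cmd: string; args: string[] };

// Assumed PowerShell one-liner: prints the clipboard image as base64 PNG.
const WINDOWS_SCRIPT = [
  'Add-Type -AssemblyName System.Windows.Forms;',
  'Add-Type -AssemblyName System.Drawing;',
  '$img = [System.Windows.Forms.Clipboard]::GetImage();',
  'if ($img) {',
  '  $ms = New-Object System.IO.MemoryStream;',
  '  $img.Save($ms, [System.Drawing.Imaging.ImageFormat]::Png);',
  '  [Convert]::ToBase64String($ms.ToArray())',
  '}',
].join(' ');

// Returns the commands to try, in order, for a Node-style platform string.
function clipboardImageCommands(
  platform: string,
  env: Record<string, string | undefined> = {},
): ClipboardCommand[] {
  switch (platform) {
    case 'linux': {
      const wayland: ClipboardCommand = { cmd: 'wl-paste', args: ['--type', 'image/png'] };
      const x11: ClipboardCommand = {
        cmd: 'xclip',
        args: ['-selection', 'clipboard', '-t', 'image/png', '-o'],
      };
      // Try the Wayland tool first when a Wayland session is detected.
      return env.WAYLAND_DISPLAY ? [wayland, x11] : [x11, wayland];
    }
    case 'darwin':
      // `pngpaste -` writes the clipboard image to stdout.
      return [{ cmd: 'pngpaste', args: ['-'] }];
    case 'win32':
      return [{ cmd: 'powershell', args: ['-NoProfile', '-Command', WINDOWS_SCRIPT] }];
    default:
      // Unknown platform: the caller can show the install hint instead.
      return [];
  }
}
```

An empty result, or every candidate failing to run, is the point where a one-time install hint would be shown rather than failing silently.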

Images from the SDK

Programmatically, session.prompt() takes an optional second argument: a list of images. Each one can be an http(s) URL, a data: URI, a local file path, or an already-encoded base64 block — mix and match freely.

TypeScript

import { createAgentSession, type ImageInput } from '@harnext/core';

// `provider` and `modelId` come from your own configuration (not shown).
const { session } = await createAgentSession({ provider, modelId });

await session.prompt('describe this', [
  { url: 'https://example.com/cat.png' },                    // http(s) → fetched
  'data:image/png;base64,iVBORw0KGgo…',                      // data: URI
  '/path/to/local.jpg',                                      // file path → read
  { type: 'image', data: '<b64>', mimeType: 'image/png' },   // raw base64
]);

Under the hood, resolveImages() fetches URLs, reads files, decodes data URIs, and normalizes everything to base64 with a resolved MIME type, because pi-ai transports base64 only and each provider transform emits the shape its API expects. A MAX_IMAGE_BYTES cap (20 MB) makes an oversized image fail loudly up front rather than at the provider.
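
The normalization step can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the real resolveImages(): the names `ImageInputSketch`, `classify`, `parseDataUri`, `base64ByteLength`, and `assertWithinCap` are hypothetical, "20 MB" is assumed to mean 20 × 1024 × 1024 bytes, and the network and file reads are left out so the block stays self-contained.

```typescript
// Hypothetical stand-in for the ImageInput forms described above.
type ImageInputSketch =
  | string
  | { url: string }
  | { type: 'image'; data: string; mimeType: string };

type Source = 'url' | 'data-uri' | 'file' | 'base64';

// Assumption: the 20 MB cap is counted in binary megabytes.
const MAX_IMAGE_BYTES_SKETCH = 20 * 1024 * 1024;

// Decide how an input would need to be resolved.
function classify(input: ImageInputSketch): Source {
  if (typeof input !== 'string') return 'url' in input ? 'url' : 'base64';
  if (/^https?:\/\//i.test(input)) return 'url';
  if (input.startsWith('data:')) return 'data-uri';
  return 'file'; // any other string is treated as a local path
}

// Split "data:<mime>;base64,<payload>" into its MIME type and payload.
function parseDataUri(uri: string): { mimeType: string; data: string } {
  const comma = uri.indexOf(',');
  const header = uri.slice(0, comma);
  if (!uri.startsWith('data:') || comma === -1 || !header.endsWith(';base64')) {
    throw new Error('expected a base64 data: URI');
  }
  return {
    mimeType: header.slice('data:'.length, -';base64'.length),
    data: uri.slice(comma + 1),
  };
}

// Decoded size of padded base64 text, computed without decoding it.
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith('==') ? 2 : b64.endsWith('=') ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// Fail early, before anything is sent to a provider.
function assertWithinCap(b64: string, cap: number = MAX_IMAGE_BYTES_SKETCH): void {
  const size = base64ByteLength(b64);
  if (size > cap) {
    throw new Error(`image is ${size} bytes, over the ${cap}-byte cap`);
  }
}
```

Checking the size from the base64 length alone avoids allocating a decoded buffer just to reject it, which is one reasonable way to make the failure cheap as well as loud.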

Exports

@harnext/core exports resolveImages, resolveImage, the ImageInput and ImageContent types, and the MAX_IMAGE_BYTES constant — so you can resolve and validate images yourself before prompting.

Read the docs

Image input

The ImageInput forms, resolveImages, MIME resolution, the size cap, and the Ctrl+V clipboard flow per OS.

Implementation

Multimodal input landed in QualityUnit/harnext#48.
