Skip to main content

browser.interact

Executes a user-supplied ordered array of browser actions against a headless Chromium session. No LLM in the loop — the caller specifies every step explicitly. Returns base64 screenshots taken during the run, extracted element content, and an assertions_passed flag so the agent can branch on success.

This is the deterministic, fast option when speed and reproducibility matter. It's also the building block for the AI-driven web.test, which decomposes natural-language test instructions into browser.interact action sequences via Playwright MCP's accessibility tree.

Headless ≠ visible. Runs against an off-screen Chromium. Operators watching a session via noVNC see nothing when this pack runs. When the user is watching — or the task is "drive a browser so I can see it" — reach for the desktop.* REST primitives instead.

Inputs

FieldTypeRequiredDefaultNotes
urlstringyesAbsolute starting URL. Egress-guarded.
actionsarrayyesOrdered array of action objects (schema below).

Action schema

Each entry in actions is one of:

actionRequired fieldsOptionalBehavior
navigateurlNavigate to a new URL mid-flow. Egress-guarded.
clickselectorCSS-selector click via CDP.
typeselector, valueType into an input. Supports ${vault:...} placeholders.
waitms (default 1000)Sleep N milliseconds.
screenshotCapture a viewport-sized base64 PNG → appended to output.screenshots.
extractselectorformat (text|html, default text)Read content from an element → output.extractions[selector].
assert_texttextAssert text appears in the current page body. Failure → assertions_passed: false, but subsequent actions still run.

Outputs

FieldTypeNotes
output.urlstringEcho of the input URL.
output.steps_completednumberActions that ran before completion or first non-fatal failure.
output.screenshotsarrayBase64 viewport PNGs in capture order. Use browser.screenshot_url for full-page captures.
output.extractionsobjectMap of selector → extracted content.
output.assertions_passedbooleanTrue iff every assert_text matched.

Vault credentials needed

None for the pack itself. If type actions reference ${vault:NAME} in value, those credentials must exist + be granted to the calling actor. Credential-driven workflows usually belong in web.login_and_fetch which is purpose-built for them.

Use it from your agent (OpenClaw chat-UI worked example)

Prompt (sent in OpenClaw chat UI / openclaw-cli agent):

:Use the helmdeck__browser-interact tool against https://example.com with these actions: take a screenshot, extract the h1 text content, and assert that the text "Example Domain" is on the page. Report what you extracted and whether the assertion passed.

Tool call (3 calls, no failures):

{
"name": "helmdeck__browser-interact",
"arguments": {
"url": "https://example.com",
"actions": [
{
"action": "screenshot"
},
{
"action": "extract",
"selector": "h1"
},
{
"action": "assert",
"assertion": "text includes",
"value": "Example Domain"
}
]
}
}

Pack response (truncated):

{"error":"invalid_input","message":"action[2]: unknown action \"assert\" (supported: click, type, focus, screenshot, extract, assert_text, wait, execute)"}
{
"name": "helmdeck__browser-interact",
"arguments": {
"actions": [
{
"action": "screenshot"
},
{
"action": "extract",
"selector": "h1"
},
{
"action": "assert_text",
"selector": "h1",
"value": "Example Domain"
}
],
"url": "https://example.com"
}
}

Pack response (truncated):

{"error":"invalid_input","message":"action[2] assert_text: text required"}
{
"name": "helmdeck__browser-interact",
"arguments": {
"url": "https://example.com",
"actions": [
{
"action": "screenshot"
},
{
"action": "extract",
"selector": "h1"
},
{
"action": "assert_text",
"selector": "h1",
"text": "Example Domain"
}
]
}
}

Pack response (truncated):

{"pack":"browser.interact","version":"v1","output":{"assertions_passed":true,"extractions":{"h1":"Example Domain"},"screenshots":["/9j/4AAQSkZJRgABAQAAAQABAAD/4gHYSUNDX1BST0ZJTEUAAQEAAAHIAAAAAAQwAABtbnRyUkdCIFhZWiAH4AABAAEAAAAAAABhY3NwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA9tYAAQAAAADTLQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlkZXNjAAAA8AAAACRyWFlaAAABFAAAABRnWFlaAAABKAAAABRiWFlaAAABPAAAABR3dHB0AAABUAAAABRyVFJDAAABZAAAAChnVFJDAAABZAAAAChiVFJDAAABZAAAAChjcHJ0AAABjAAAADxtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJYWVogAAAAAAAAb6IAADj1AAADkFhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDAAMCA
...[truncated]

Agent's final reply:

Extracted h1 text:Example Domain
Assertion result: Passed (the page contains the text “Example Domain”).

Verified via OpenClaw 2026.5.6 + helmdeck v0.9.0-dev + openrouter/openai/gpt-oss-120b on 2026-05-07 (cost: $0.0118).

Developer reference (curl)

curl -fsS -X POST http://localhost:3000/api/v1/packs/browser.interact \
-H "Authorization: Bearer $JWT" -H 'Content-Type: application/json' \
-d '{
"url": "https://example.com",
"actions": [
{"action": "screenshot"},
{"action": "extract", "selector": "h1", "format": "text"},
{"action": "assert_text", "text": "Example Domain"}
]
}'

Real captured response (screenshot bytes truncated):

{
"pack": "browser.interact",
"version": "v1",
"output": {
"url": "https://example.com",
"steps_completed": 3,
"screenshots": ["/9j/4AAQSkZJRgABAQAAAQABAAD/...<base64 PNG truncated>..."],
"extractions": {
"h1": "Example Domain"
},
"assertions_passed": true
},
"session_id": "9cd1d6d5-246a-4a13-bdb6-9f6f16fcf8e4"
}

For chained login + scrape workflows:

{
"url": "https://app.example.com/login",
"actions": [
{"action": "type", "selector": "#username", "value": "demo"},
{"action": "type", "selector": "#password", "value": "${vault:demo-app-pw}"},
{"action": "click", "selector": "#submit"},
{"action": "wait", "ms": 2000},
{"action": "assert_text", "text": "Welcome"},
{"action": "extract", "selector": ".user-id", "format": "text"}
]
}

Error codes

CodeTriggers
invalid_inputMalformed actions array; unknown action; selector missing for an action that requires it
session_unavailableEngine has no CDP factory (sidecar absent)
handler_failedA non-assertion CDP call fails mid-sequence (selector not found, navigate timed out). The response still reports steps_completed so the caller sees how far it got

assert_text failures are not error codes — they leave assertions_passed: false and let the rest of the sequence run.

Session chaining

needs_session: true. The engine creates an ephemeral session per pack call. To share state across multiple browser.interact calls, fold them into one larger actions array (cheaper than multiple session spin-ups).

Async behavior

Synchronous. A 10-step action array against a typical SaaS site runs in 5–15 seconds. The per-session timeout still applies — heavy wait actions count against it.

When to use which browser pack

PackUse when
browser.screenshot_urlOne-shot snapshot of a public URL. No interaction.
browser.interactDeterministic multi-step automation: known selectors, known order. Fast, no LLM, no human watching.
web.scrape_spaSingle page, structured extraction by JSON-Schema. The model gives helmdeck the schema; helmdeck does the extraction.
web.scrapeFirecrawl-backed clean-markdown scrape. No selectors.
web.testNatural-language testing. The LLM plans each step; this pack underlies the assertions.
vision.*When selectors don't exist or the page is non-DOM (Canvas, video, PDF).
desktop.* primitivesWhen a human is watching the session and you want them to see the actions happen.

See also