
web.test

The natural-language browser-testing pack. The caller hands in a URL, a plain-English instruction — "log in as alice, open Settings, make sure 2FA is enabled" — and an LLM model name; the pack drives Playwright MCP through it step by step, using the gateway model to decompose each turn's accessibility-tree snapshot into the next action. Optionally, the caller supplies a list of substring assertions that are checked against the final snapshot.

Why Playwright MCP and not selectors? Accessibility-tree snapshots return structured [ref=eN] identifiers the model addresses directly — much smaller token footprint than CSS selectors, much more deterministic for weak models, no vision step required.
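For illustration, a snapshot line and the action the model emits against it might look like the following (the `browser_click` argument shape is a sketch of Playwright MCP's schema, not captured output):

```
# accessibility-snapshot excerpt — the [ref=eN] is the stable handle
- link "Learn more" [ref=e6] [cursor=pointer]

# the model's next action addresses the ref directly, no CSS selector
{"tool": "browser_click", "arguments": {"element": "Learn more link", "ref": "e6"}}
```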

Inputs

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `url` | string | yes | — | Absolute http(s) URL to test against. Egress-guarded. |
| `instruction` | string | yes | — | Plain-English description of what the agent should verify. |
| `model` | string | yes | — | Provider/model that drives the action loop, e.g. `openrouter/openai/gpt-oss-120b`, `openai/gpt-4o`, `anthropic/claude-sonnet-4.6`. |
| `max_steps` | number | no | 8 | Cap on the plan loop (excludes the seed `browser_navigate` + initial snapshot). |
| `assertions` | array | no | `[]` | Substrings checked against the final accessibility snapshot. Substring match, case-sensitive. |
| `_session_id` | string | yes (chained) | — | Standard chained input. |

Outputs

| Field | Type | Notes |
|---|---|---|
| `url` | string | Echo of the input URL. |
| `completed` | boolean | `true` only when the model emitted `done` AND all assertions passed. |
| `steps` | array | Per-step trace: `{tool, arguments, result, is_error, reasoning}`. |
| `steps_used` | number | Total steps emitted (including the seed navigation and final snapshot). |
| `final_snapshot` | string | The accessibility-tree dump taken after the last action; assertions are checked against it. |
| `assertions_passed` | boolean | `true` if every supplied assertion is a substring of `final_snapshot`. Always `true` when `assertions` is empty. |
| `reason` | string | Plain-English exit reason (`"model emitted done"`, `"max_steps (8) reached without done"`, `"egress guard blocked mid-test navigation"`, etc.). |
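The assertion semantics are simple enough to restate as a sketch (a hypothetical helper mirroring the documented behaviour, not the pack's actual code): case-sensitive substring containment over the final snapshot, with an empty list passing vacuously.

```python
def check_assertions(final_snapshot: str, assertions: list[str]) -> bool:
    """Every assertion must be a case-sensitive substring of final_snapshot;
    an empty assertion list passes (matches assertions_passed semantics)."""
    return all(a in final_snapshot for a in assertions)

snapshot = '- heading "Example Domain" [level=1] [ref=e3]'
print(check_assertions(snapshot, ["Example Domain"]))  # True
print(check_assertions(snapshot, ["example domain"]))  # False: case-sensitive
print(check_assertions(snapshot, []))                  # True: vacuous pass
```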

Vault credentials needed

None. The model is selected by the model argument and resolved through the AI provider keys configured in the AI Providers UI panel.

Use it from your agent (OpenClaw chat-UI worked example)

Prompt (sent in OpenClaw chat UI / openclaw-cli agent):

Use helmdeck__web-test to verify that https://example.com loads correctly. Instruction: "navigate to the page, confirm the heading 'Example Domain' is present, then exit". Use model openrouter/openai/gpt-oss-120b. Set assertions=["Example Domain"]. Tell me whether completed=true and which assertions passed.

Tool call (1 call, no failures):

{
  "name": "helmdeck__web-test",
  "arguments": {
    "url": "https://example.com",
    "instruction": "navigate to the page, confirm the heading 'Example Domain' is present, then exit",
    "assertions": [
      "Example Domain"
    ],
    "max_steps": 10,
    "model": "openrouter/openai/gpt-oss-120b"
  }
}

Pack response (truncated):

{"pack":"web.test","version":"v1","output":{"assertions_passed":true,"completed":true,"final_snapshot":"### Open tabs\n- 0: (current) [Example Domain](https://example.com/)\n- 1: [New Tab](chrome://new-tab-page/)\n### Page\n- Page URL: https://example.com/\n- Page Title: Example Domain\n- Console: 1 errors, 0 warnings\n### Snapshot\n```yaml\n- generic [ref=e2]:\n - heading \"Example Domain\" [level=1] [ref=e3]\n - paragraph [ref=e4]: This domain is for use in documentation examples without needing permission. Avoid use in operations.\n - paragraph [ref=e5]:\n - link \"Learn more\" [ref=e6] [cursor=pointer]:\n - /url: https://iana.org/domains/example\n```","reason":"model emitted done","steps":[{"tool":"browser_navigate","arguments":{"url":"https://example.com"},"result":"### Ran
...[truncated]

Agent's final reply:

Result: completed = true
Assertions passed: true (the heading “Example Domain” was found).

Verified via OpenClaw 2026.5.6 + helmdeck v0.9.0-dev + openrouter/openai/gpt-oss-120b on 2026-05-07 (cost: $0.1750).

Developer reference (curl)

curl -fsS -X POST http://localhost:3000/api/v1/packs/web.test \
-H "Authorization: Bearer $JWT" -H 'Content-Type: application/json' \
-d '{
"url": "https://example.com",
"instruction": "navigate to the page and confirm the heading Example Domain is present",
"model": "openrouter/openai/gpt-oss-120b",
"max_steps": 3,
"assertions": ["Example Domain"]
}'

Live capture depends on your gateway-model configuration; the response shape matches the schema above, with steps populated by per-action records.
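The same call can be scripted. A minimal Python sketch (assuming the endpoint and bearer token from the curl example, and that your deployment returns the JSON shape above; `build_request` and `call_web_test` are illustrative helpers, not part of the pack):

```python
import json
import urllib.request

def build_request(base_url: str, jwt: str, **arguments):
    """Assemble the POST pieces for the web.test pack endpoint."""
    return (
        f"{base_url}/api/v1/packs/web.test",
        {"Authorization": f"Bearer {jwt}", "Content-Type": "application/json"},
        json.dumps(arguments).encode(),
    )

def call_web_test(base_url: str, jwt: str, **arguments) -> dict:
    """POST the pack's input fields and decode the JSON reply."""
    endpoint, headers, body = build_request(base_url, jwt, **arguments)
    req = urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A caller would then inspect `response["output"]["completed"]` and `["assertions_passed"]`, as in the worked example above.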

Error codes

| Code | Triggers | Captured response |
|---|---|---|
| `invalid_input` | `url` empty | `{"error":"invalid_input","message":"url is required"}` |
| `invalid_input` | `instruction` empty | `{"error":"invalid_input","message":"instruction is required"}` |
| `invalid_input` | `model` empty | `{"error":"invalid_input","message":"model is required (provider/model)"}` |
| `invalid_input` | URL resolves to a blocked range | `{"error":"invalid_input","message":"egress denied: …"}` |
| `session_unavailable` | Sidecar built without Playwright MCP | `session has no Playwright MCP endpoint; rebuild the sidecar with T807a` |
| `handler_failed` | Playwright MCP initialize failed after retries | `playwright mcp initialize (after retries): …` |
| `handler_failed` | Mid-test action returned an error from MCP | `<tool> (step N): …` |
| `handler_failed` | Model returned an unparseable plan | `plan step N: no parseable plan JSON in model response: …` |
| `internal` | Pack registered without a gateway dispatcher | `web.test registered without a gateway dispatcher` |
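A caller-side dispatch over these codes might look like this (hypothetical helper; the codes are the ones tabulated above, the return labels are illustrative):

```python
def classify_failure(response: dict) -> str:
    """Map a pack error payload to a coarse caller action, per the error-code table."""
    code = response.get("error")
    if code == "invalid_input":
        return "fix-request"      # bad url/instruction/model, or egress-denied URL
    if code == "session_unavailable":
        return "rebuild-sidecar"  # Playwright MCP endpoint missing from the sidecar
    if code == "handler_failed":
        return "retry-or-inspect" # MCP init/action failure or unparseable plan
    if code == "internal":
        return "report-bug"       # pack registered without a gateway dispatcher
    return "ok"                   # no error field: inspect output as usual
```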

Session chaining

Requires a session. Always chained — each web.test run gets a fresh session by default (no cookie leak between tests). Pass _session_id to reuse a session; useful for "log in via fill_form_by_label, then web.test the post-login state".
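A chained second call would pass the prior session id through `_session_id`; for example (the session-id value is a placeholder, and the other arguments are illustrative):

```
{
  "name": "helmdeck__web-test",
  "arguments": {
    "url": "https://example.com/account",
    "instruction": "confirm the logged-in account page shows the user's email",
    "model": "openrouter/openai/gpt-oss-120b",
    "_session_id": "<session id from the earlier fill_form_by_label call>"
  }
}
```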

Async behavior

Synchronous, but wall-clock time scales with max_steps × per-step LLM latency. A 5-step run against an open-weight model can take 30–60 seconds. For CI usage, cap max_steps aggressively and rely on the substring assertions rather than waiting for the model to emit done voluntarily.

See also