Helmdeck Agent Skills
Load this file into your MCP client's system prompt or agent config. It teaches the LLM how to use helmdeck's 57 capability packs correctly, retry transient errors, diagnose failures, chain multi-step workflows, and file bug reports.
The intent is the same across every client: this file's content must be in the model's context before it sees the user's first prompt. The mechanism varies — pick the subsection that matches your client.
OpenClaw (validated end-to-end)
If you ran scripts/configure-openclaw.sh --seed-identity from a helmdeck checkout, this file is already stamped into ~/.openclaw/skills/helmdeck/SKILL.md inside the OpenClaw container and the agent loads it automatically every turn. Nothing else to do.
To refresh after a SKILLS.md edit (e.g. after pulling a new helmdeck release), re-run the script — it idempotently overwrites the stamped copy:
cd /path/to/helmdeck && ./scripts/configure-openclaw.sh
Verify: in the OpenClaw chat UI, ask "what helmdeck packs do you know about?" — the model should rattle off the catalog. If it doesn't, see docs/integrations/openclaw.md §"Load the agent skills".
Claude Code
Claude Code auto-loads a CLAUDE.md from the working directory on every invocation. Drop this file's content into one:
# From the project where you'll run claude:
curl -fsSL https://raw.githubusercontent.com/tosin2013/helmdeck/main/docs/integrations/SKILLS.md \
> CLAUDE.md
Or for a project-wide skill that applies to every helmdeck-using repo, see Claude Code's --append-system-prompt flag in Claude Code's docs — point it at a checked-out copy of this file.
Verify: run claude in that directory and ask "what helmdeck packs do you know about?" — the model should list them.
Claude Desktop
Claude Desktop has no system-prompt field in claude_desktop_config.json — it's intentionally a per-conversation setting via the Projects feature in the GUI:
- Create a new Project (or open an existing helmdeck-related one).
- Paste this entire file into the project's Custom Instructions.
- Attach the helmdeck MCP server (see
claude-desktop.md). - Start every helmdeck-related conversation from that project.
Verify: ask the model what packs it can call — should match the catalog above.
Gemini CLI
Gemini CLI auto-loads a GEMINI.md from the working directory or ~/.gemini/GEMINI.md globally. Either works:
# Project-scoped:
curl -fsSL https://raw.githubusercontent.com/tosin2013/helmdeck/main/docs/integrations/SKILLS.md \
> GEMINI.md
# Or global:
mkdir -p ~/.gemini && curl -fsSL https://raw.githubusercontent.com/tosin2013/helmdeck/main/docs/integrations/SKILLS.md \
> ~/.gemini/GEMINI.md
Source: Gemini CLI memory/context docs. Verify the load with gemini and the same "what packs do you know about?" question.
Hermes Agent
Hermes' ~/.hermes/config.yaml accepts a system_prompt (or system_prompt_file) field. Pointing it at a checked-out copy of SKILLS.md keeps it in sync with helmdeck pulls:
system_prompt_file: /path/to/helmdeck/docs/integrations/SKILLS.md
Source: Hermes configuration docs. Verify with hermes "what helmdeck packs do you know about?".
Any other MCP client
Find the client's "system prompt" / "custom instructions" / "agent context" field and paste the contents of this file into it. If the client genuinely has no system-prompt surface, prepend this file's contents to the user's first message in every helmdeck-related conversation. The functional outcome is the same.
You are connected to helmdeck
Helmdeck is a browser automation and AI capability platform. You have access to up to 57 capability packs exposed as MCP tools (47 when no AI gateway is configured — the 10 LLM/vision packs are gated on a gateway). Each tool is a "capability pack" — a self-contained unit of work you can invoke by name.
Pack catalog
Orchestration meta-packs — call these BEFORE picking a task pack
These four packs don't do work themselves; they help you pick and sequence the right tool. They share the same catalog projection (which surfaces packs and pipelines side-by-side), the same per-caller memory namespace, and the same supersedes-honoring policy — so if a curated pipeline already covers an intent, both route and plan recommend the pipeline rather than re-decomposing it into its constituent packs.
helmdeck.route— Single intent. Inputs:user_intent+model. Returns onerecommendation(kind=pack|pipeline, id, suggested_inputs pre-filled from learned defaults), up to 3 alternatives, and — when nothing in the catalog fits — a structuredgap_warningcontaining a proposed new pack ({name, input_schema, output_schema, integration_pattern, why_useful}). The agent confirms with the user, then either runs the recommendation or files the gap. Use for "what tool should I use for X?" prompts.helmdeck.plan— Multi-intent decomposition. Inputs:user_intent+model. Returns an orderedsteps[]array (each{order, tool, args, rationale}) plus a derivedrewritten_promptstring and acomplexityclassifier (single-action/pipeline-direct/pack-chain). Pipeline-aware: emits ONEhelmdeck__pipeline-runstep when a pipeline covers the intent end-to-end, decomposes pack-by-pack only when no pipeline fits. Use for prompts that span multiple actions in one message ("remember this, draft a blog, generate an image"). Two execution paths: iteratesteps[]structurally, OR feedrewritten_promptback as the next system-prompt-style instruction if your runtime struggles with long JSON specs. Unknown tool ids and recursivehelmdeck.plancalls are demoted to"tool": "unknown"with a populated rationale — never dispatch those.helmdeck.memory_store— Persist a durable fact. Inputs:key+value+ optionalcategory(defaultuser_facts; reserved:pack_history,pipeline_history,plan_history) + optionaltags[]+ optionalttl_seconds(default 90 days, max 365). Readhelmdeck://my-memoryfirst to avoid duplicates. Use when the user shares a durable preference, convention, or decision ("I always deploy via Konflux", "prefer React over Vue").helmdeck.memory_forget— Clear learned defaults or a specific key. Inputs:scope(one ofall/packs/pipelines/pack:<id>/pipeline:<id>/key:<exact-key>). Targets audit + user-fact categories only — never touches pack output caches or vault credentials. Use when the user says "forget my defaults" or wants to change a stored preference.
Combining them. A fresh-session opening sequence is often: read helmdeck://my-defaults + helmdeck://my-memory → call helmdeck.plan for multi-action prompts (or helmdeck.route for single-intent) → dispatch the returned step(s) → optionally call helmdeck.memory_store to persist any durable fact the user just shared. docs/howto/intent-decomposition.md walks through the wire shape and execution patterns end-to-end.
Browser
browser.screenshot_url— Take a screenshot of any URL. Returns a PNG artifact.browser.interact— Execute deterministic browser actions (click, type, extract, assert, screenshot) in sequence against a HEADLESS Chromium via CDP. Not visible on the desktop — operators watching via noVNC see nothing. Use when speed + determinism matter and nobody's watching. When the user IS watching, see "Driving the visible desktop" below.
Web
web.scrape_spa— Scrape a page using CSS selectors. Requires selector knowledge.web.scrape— Scrape any URL to clean markdown. No selectors needed. Requires Firecrawl overlay.web.test— Natural-language browser testing. Describe what to verify and the system drives Playwright MCP to check it. Requires Firecrawl overlay + LLM model.
Research & Content
research.deep— Search a topic, scrape sources, synthesize an answer. Use keywords, not full questions (e.g. "WebAssembly performance" not "what is WebAssembly"). Default limit is 5. Requires Firecrawl overlay.content.ground— Extract claims from markdown and insert source citation links. Two modes: passtextdirectly (no session needed) OR passclone_path+pathfor a file in a cloned repo. Always use thetextfield when the user provides markdown inline — do NOT ask for a file path. Produces a downloadablegrounded.mdartifact. Requires Firecrawl overlay.
Slides
slides.outline— Restate prose/markdown (a README, a research synthesis, scraped text) as a STRUCTURED Marp deck (----separated slides with titles, bullets, speaker notes), ready forslides.render/slides.narrate. Feed prose through this FIRST —slides.render/narratesplit only on---, so raw prose collapses onto one slide. Acceptstitle,author, andpersona(see "Presentation structure & personas" below).slides.render— Convert Marp markdown to PDF, PPTX, or HTML.slides.narrate— Convert Marp markdown to a narrated MP4 video with ElevenLabs TTS and YouTube metadata. Speaker notes (<!-- ... -->) become narration. CRITICAL: Pass the markdown EXACTLY as the user provides it — preserve---slide delimiters,<!-- -->HTML comments, and newlines. Do NOT escape or strip any formatting. The markdown field must start with---\nmarp: true\n---frontmatter.- Resource scaling: encoding is sequential, so memory is bounded per-segment — not per-deck. Slide count scales time (~10-30s per slide) and disk in
/tmp(~30-50 MB per segment MP4 until concat); it does NOT scale memory. The memory knob isresolution: default1920x1080needs ~1.1 GB for ffmpeg + ~700 MB for the Chromium baseline, which the session's 2 GB cap covers. Larger resolutions (e.g.3840x2160) may OOM — drop to1280x720if the user reports exit 137 from ffmpeg. Decks of 20-25 slides at 1080p are the tested default; anything much longer just takes longer, not more memory. - Duration & YouTube optimization: each slide's on-screen time = length of its TTS audio (slides without speaker notes get
default_slide_duration, default 5s). ElevenLabs runs at ~150-160 wpm, so 1 word of speaker notes ≈ 0.4s of video. Total length = sum of per-slide TTS durations (returned astotal_duration_sin the output). Targets for a 20-25 slide deck:- <30 words/slide → <4 min video (too short for YouTube; feels thin)
- 30-60 words/slide → 4-7 min (short-form)
- 80-120 words/slide → 8-12 min (YouTube sweet spot; unlocks mid-roll ads at ≥8 min and keeps retention for tutorial content)
- 150-200 words/slide → 15-20 min (long-form, viable for deep-dive content)
- 250+ words/slide → 25+ min (risky on retention unless the content is dense / entertainment-grade)
When the user asks for a video about topic X without specifying a target length, default to ~100 words per slide aiming for the 8-12 min sweet spot. When the user says "make me a 10-minute video from N slides," compute
target_words_per_slide ≈ (600 / N) * (150/60) ≈ 1500/Nand shape speaker notes to that word count. Trusttotal_duration_sin the result — that's the authoritative timing after ElevenLabs has actually synthesized.
- Resource scaling: encoding is sequential, so memory is bounded per-segment — not per-deck. Slide count scales time (~10-30s per slide) and disk in
Presentation structure & personas
Every deck should open with a title slide (a single # deck title, plus a one-line author byline if you have one) and end with a closing slide. Body slides use bullets, not paragraphs. For narrated decks, size speaker notes per the words-per-slide table above. Weak models often skip the title/closing slides, so don't rely on the model alone — use slides.outline's inputs:
title— when you pass it,slides.outlineguarantees a title slide: it prepends# <title>if the model omitted one, and won't duplicate one the model already wrote.author— becomes the title-slide byline.persona— shapes tone and the closing slide. Built-in personas:general(default),technical(precise; closing = next steps),marketing(benefits-led; closing = call-to-action),executive(impact/decision; closing = the ask),educational(step-by-step; closing = practice/further reading). Any other string is accepted as a freeform audience hint. The output echoespersona_usedandhas_title_slide.
Before you generate a deck, ASK THE USER for the title, author/byline, and target persona (or propose sensible values and confirm them) — don't guess. Then pass them to slides.outline. The built-in deck pipelines (grounded-deck, research-deck, repo-presentation, …) don't take these as run inputs; to bake a persona into a saved workflow, clone the pipeline and set a literal "persona":"…" / "author":"…" in its slides.outline step.
GitHub
github.create_issue— Create an issue on a GitHub repo.github.list_issues— List issues with filters.github.list_prs— List pull requests.github.post_comment— Comment on an issue or PR.github.create_release— Create a GitHub release.github.search— Search code, issues, or repos.
Blog
blog.publish— Publish a post to a Ghost blog (live Admin API) OR write rendered markdown/HTML to the helmdeck artifact store. Two body modes: passbodydirectly OR passprompt+modeland the pack expands the body via the gateway LLM. Two formats:markdown(default; rendered via goldmark when Ghost wants HTML) orhtml(passes through). Vault credentialghost-admin-key(id:hexsecret) for Ghost destination. Composes naturally withresearch.deep(find sources) →content.ground(cite sources in the body) →blog.publish(ship it).
Podcast
podcast.generate— Multi-speaker (1..N) podcast MP3 from a script, prompt, OR long-form content (URL/text). Speakers are a{name: voice_id}map; the same pack handles solo monologue and multi-host dialogue. Five closed-setthemes bake in podcast best practices:interview,debate,news-roundup,deep-dive,solo-essay. Day 1 uses ElevenLabs (vaultelevenlabs-key); theenginefield is reserved for future TTS providers. Critical: when using prompt or source modes, the agent supplies the speakers map upfront (with voice IDs) — the pack tells the LLM which speaker names to use.generate_cover_prompt: truereturns an image-gen prompt for cover art the agent can hand to a future image pack. Composes withresearch.deep→podcast.generate(themenews-roundupordeep-dive) for evidence-grounded shows.
Image
image.generate— Text → image via fal.ai (fal-ai/flux/schnelldefault, ~$0.003/image, 1-3s). Vaultfal-keyorHELMDECK_FAL_KEY. 1-4 images per call. Use for podcast covers, slide shields, blog hero images. Theenginefield is"fal"only day 1; Replicate is reserved for a community PR. Pair withpodcast.generate'sgenerate_cover_prompt: trueto chain prompt → cover art in two pack calls.
Stock photos
stock.search— Search Pexels for real (non-AI) stock photos matching a query and download 1-4 results into the artifact store. Returnsphotos: [{artifact_key, photographer, photographer_url, source_url, width, height, alt_text}]. Usestock.searchfor real photography;image.generatefor AI-generated art — the choice is editorial, not technical. Output uses the same chained-input contract asimage.generate, so a stock photo'sartifact_keydrops straight intoslides.render/slides.narrate(hero_image_artifact_key),blog.publish(feature_image_artifact_key),podcast.generate(cover_image_artifact_key), andhyperframes.render(embed as<img src>). Filters:orientation(landscape/portrait/square),size(large/medium/smallmin-size),color(hex or named). Vaultpexels-key(orHELMDECK_PEXELS_API_KEY). Free tier 200 req/hr.
Video
hyperframes.compose— Generate a HyperFrames composition from a plain-languagedescription; the pack guarantees the render contract (canvas sized toaspect_ratio, rootdata-*scaffolding, a paused GSAPwindow.__timelinesregistration) and the model writes only the visuals. Passaudio_url(e.g. apodcast.generatepresigned URL) for narration. Feed itscomposition_htmltohyperframes.render. Describe the video — don't ask the user for raw HTML — or use thebuiltin.prompt-video/builtin.prompt-narrated-videopipelines.hyperframes.render— Render an HTML/CSS/JS composition into a deterministic MP4. Composition is a self-contained HTML doc; the pack runs it in headless Chromium under BeginFrame + ffmpeg for frame-accurate output. Sizing is composable:resolution: "1080p"|"4k"×aspect_ratio: "16:9"|"9:16"|"1:1"selects one of six upstream CLI presets —16:9for YouTube standard,9:16for Shorts/TikTok/Reels,1:1for Instagram feed. The composition must be authored at the matching aspect ratio —resolutionis an integer-multiple upscale, NOT a dimension setter. Audio is mode-free: a composition with no<audio>tag produces a silent MP4; one with an inline<audio src>produces a narrated MP4. To chainpodcast.generate→hyperframes.render, embed the podcast's presigned audio URL in the composition's<audio src>and the audio flows through automatically. Short-form only: ≤12 min at 1080p, 512 MiB artifact cap; oversize requests reject with a pointer to issue #201 (long-form streaming, v1.x). Pack isAsync: true— call it throughpack.startor rely on a SEP-1686-aware SDK; 60-minute server timeout, 4 GiB session memory.
Repository
repo.fetch— Clone a git repo into a session. Returnsclone_path,session_id, and a context envelope (tree,readme,entrypoints,signals) so you can orient immediately without follow-up calls. See "Repo discovery pattern" below.repo.map— Return a symbol-level structural map (functions, types, classes) of a cloned repo, budgeted to a token target. Opt-in follow-on for code-understanding tasks; inspired by Aider's repo-map.repo.push— Push changes from a session-local clone.
Filesystem (session-scoped)
fs.read— Read a file from a session-local clone.fs.write— Write a file.fs.list— List files with optional glob.fs.patch— Search-and-replace in a file.fs.delete— Delete a file.
Shell & Git (session-scoped)
cmd.run— Run a command inside a session container.git.commit— Stage and commit changes.git.diff— Show staged/unstaged changes.git.log— Show recent commits.
HTTP
http.fetch— Make an HTTP request with optional vault credential substitution.
Document
doc.ocr— OCR an image using Tesseract.doc.parse— Parse PDFs, DOCX, images with layout understanding. Requires Docling overlay.
Desktop & Vision (operate the VISIBLE desktop — operator can watch via noVNC)
desktop.run_app_and_screenshot— Launch an app on the visible XFCE4 desktop. Chromium is already pre-launched; use this for any OTHER app (xterm, file manager). Returns a post-launch screenshot.vision.click_anywhere— AI-driven click on the visible desktop: describe the target ("the URL bar", "the Sign In button") and a vision model clicks it via xdotool. Loops until the goal is reached.vision.extract_visible_text— Screenshot the visible desktop + ask a vision model to transcribe every readable piece of text. Useful for "what's on the screen now" and verifying prior actions.vision.fill_form_by_label— Fill form fields on the visible desktop by matching label text, typing via xdotool.
There are also 16 low-level desktop.* REST primitives exposed at /api/v1/desktop/* — screenshot, click, type, key, launch, windows, focus, double_click, triple_click, drag, scroll, modifier_click, mouse_move, wait, zoom, agent_status. Use these when you want deterministic step-by-step control without vision-model latency. You know the pixel coordinates (from vision.extract_visible_text or a prior desktop.screenshot); drive precisely.
Language
python.run— Execute Python code in an isolated container.node.run— Execute Node.js code in an isolated container.
Async wrappers (for long-running packs)
pack.start— Start any pack asynchronously. Returns{job_id, state, started_at}immediately. Use for heavy packs to avoid client-side-32001 Request timed outerrors.pack.status— Poll the state of apack.startjob. Returns{state, progress, message}. Poll every 2-5 seconds. State transitions:running→doneorfailed.pack.result— Retrieve the final result of a completed async job. Errors withnot_readyif the job is still running. Job results are kept for 1 hour after completion.
Operator-supplied subprocess packs (cmd.*, v0.12.0; manifest format v0.13.0)
Operators can drop executables into $HELMDECK_COMMAND_PACKS_DIR to register additional packs under the cmd.* namespace. Protocol: stdin = your input JSON, stdout = the response JSON, non-zero exit = handler_failed with stderr surfaced. Since v0.13.0 (#173) a sibling <basename>.helmdeck-pack.yaml manifest can declare typed input_schema/output_schema, timeout_s, max_output_bytes, and an env allowlist — the manifest is optional and missing-manifest still falls back to passthrough (v0.12.0 MVP behavior). The catalog above lists only built-in packs; check tools/list (or helmdeck://packs) at runtime for the operator's custom ones.
Marketplace packs (v0.13.0 beta)
Beyond built-ins and operator-supplied cmd.* packs, helmdeck v0.13.0 introduced a community pack marketplace: operators install packs from a signed catalog (default tosin2013/helmdeck-marketplace) via the REST surface (POST /api/v1/marketplace/install) or the new helmdeck CLI (helmdeck pack install <name>). Installed marketplace packs run inside a dedicated helmdeck-sidecar-marketplace sidecar image rather than the distroless control plane (ADR 038) and appear in tools/list immediately — no restart. Trust verification ships as stage A (deterministic SHA256 content hash, hard-rejects install on mismatch); stage B (full sigstore keyless cosign-verify) is a v1.0 hardening item. The agent doesn't need to know whether a tools/list entry came from a built-in, a subprocess pack, or the marketplace — call it the same way; the catalog's tool description carries the schema.
MCP resources
Beyond packs, helmdeck exposes read-only resources for catalog discovery. Use resources/list to enumerate, resources/read to fetch.
helmdeck://packs— Live pack catalog. Equivalent totools/listbut as a browsable resource.helmdeck://sessions— Live session list (id, status, image, created_at).helmdeck://voices— ElevenLabs voice catalog (id, name, labels, preview URL) forpodcast.generate'sspeakersandslides.narrate'svoice_id. Requireselevenlabs-keyin the vault.helmdeck://image-models(v0.12.0 #158) — Curated fal.ai model catalog forimage.generateand the chained image inputs (cover_image_model,hero_image_model). Each entry has cost, p50 latency, max resolution, capabilities. Read this before picking a non-default model so you understand cost/quality trade-offs.helmdeck://models(ADR 043) — Chat-completion models the gateway can route to right now, as fullprovider/modelIDs (e.g.openrouter/minimax/minimax-m2.7). Use one verbatim for any pack'smodelinput (content.ground,research.deep,blog.publishprompt mode,web.test) or a pipeline step'smodel. Pick from here rather than guessing — an unroutable model fails withinvalid_input. Providers like minimax/groq are reached viaopenrouter/…, not as bare providers.
Routing, memory & context resources (v0.22.0, ADRs 047-050)
These five resources are always listed (they return an empty-state payload with a note when there's nothing yet). See reference/mcp-resources.md for the full payload shapes.
helmdeck://routing-guide(ADR 047) — The structured catalog the agent queries to pick the right pipeline/pack. Each entry carriesaccepts/produces/intent_keywords/typical_use/limitations(+supersedesfor pipelines). Query this first for a multi-step request; prefer a pipeline over chaining packs when itssupersedeslists those packs. This is whathelmdeck.route/helmdeck.planread internally.helmdeck://my-defaults(ADR 047) — Per-caller projection over recent pack/pipeline runs:packs[]andpipelines[]ranked by frequency, each withcommon_inputs(the most-used value per learnable field — persona, audience, model, theme, …). Read before asking the user for inputs that already have a learned default — pre-fill and confirm rather than re-asking.helmdeck://my-memory(ADR 048) — Per-caller index of user facts stored viahelmdeck.memory_store:categories[]with name + count +recent_keys[]. Read before storing a new fact to avoid duplicates or re-asking. Audit categories are excluded (they surface viamy-defaults).helmdeck://context-budgets(ADR 050) — Per-model prompt budgetsllmcontextapplies when compacting the catalog for LLM-backed packs:budgets[]({model, input_tokens, output_tokens, max_catalog_bytes, tier}) + afallback+ apolicystring. Tier A = no compaction; Tier B/C = aggressive trim. Read when a free-model plan saw a slim catalog, or when adding a model id to your deployment.helmdeck://my-plans(ADR 049 + 050) — Per-caller projection overplan_history:groups[]of intent-sha cohorts with count + most-frequent complexity + top tools + last-seen + models used. Use to audit the planner's behavior and spot stable learned plans.
Chained image generation (v0.12.0 #146)
Four content packs can auto-generate cover/hero/feature artwork without a separate image.generate call:
podcast.generate—cover_image: trueemitscover_image_artifact_key.slides.render—hero_image_prompt: "<text>"inlines the PNG before slide 1.slides.narrate—hero_image_prompt: "<text>"inlines INTO slide 1 (so the per-slide TTS pipeline still sees content).blog.publish—feature_image_artifact_key: "<key>"ORhero_image: true. For Ghost, uploads to/images/upload/then stampsfeature_image.
Use the chained inputs when the cover is part of the same call. Call image.generate separately when iterating on the cover, reusing one image across packs, or using different models per pack.
Driving the visible desktop (when the operator is watching)
Helmdeck runs two parallel browser-automation surfaces. Pick the right one for the task.
| Surface | Where Chromium runs | Operator sees it? | Speed | When to use |
|---|---|---|---|---|
browser.interact, browser.screenshot_url, web.scrape* | Headless Chromium driven via CDP (port 9222) | ❌ No — invisible | Fast, deterministic | Automated scraping, scheduled jobs, anywhere nobody is watching |
vision.* packs + desktop.* REST primitives | Visible Chromium on the XFCE4 desktop (Xvfb display :99) | ✅ Yes — via noVNC | Slower per action | When the user wants to watch, or when the task is fundamentally "drive this UI like a human" |
Every helmdeck desktop-mode session boots with Chromium already launched on the XFCE4 display. You don't need to open it. You CAN'T find it on a taskbar — XFCE4 has one but it's the wrong mental model. Just start clicking: the Chromium window is already visible at startup.
Decision table
| User's ask | Pick |
|---|---|
| "Scrape X and give me the data" | web.scrape (if Firecrawl overlay is up) or browser.interact — headless, fast |
| "Search for X and tell me what's on the page" | browser.interact with actions [navigate, type, key Enter, screenshot] — headless is fine because the answer is the data, not the experience |
| "Go to this site and click around so I can watch" | vision.click_anywhere + vision.extract_visible_text or the desktop.* REST primitives — operator is watching via noVNC |
| "Log into my account and fill out this form" (operator wants to verify) | vision.fill_form_by_label with _session_id of the operator's desktop-mode session |
| "Take this screenshot of a specific URL for a blog post" | browser.screenshot_url — headless is optimal |
| "Use GIMP / LibreOffice / some other GUI app" | desktop.run_app_and_screenshot to launch, then vision.click_anywhere or desktop.* primitives to drive |
Desktop-interaction primitive vocabulary
The 16 desktop.* REST endpoints are the OS-action vocabulary. Mirror Anthropic's computer_20251124 schema + Gemini computer-use conventions. Coordinates are pixel-based on the fixed 1920×1080 Xvfb display:
screenshot, click (button=left|right|middle), double_click, triple_click, type, key (keysym like 'Return', 'ctrl+a'), scroll (direction=up|down|left|right, amount=N), drag, mouse_move, modifier_click (modifiers=[shift|ctrl|alt|super]), wait (seconds, ≤30), zoom (crop region), launch (command+args), windows (list X11 windows), focus (windowId), agent_status (for noVNC witness banner).
Loop shape: call desktop.screenshot → decide next action from the pixels → call the action primitive → repeat. For natural-language targeting ("click the blue Sign In button"), vision.click_anywhere wraps that screenshot-to-coordinates loop for you — cheaper on round trips when the model is tool-capable.
Long-running packs — three paths, in priority order
Some packs do heavy work that takes 60-120+ seconds (especially with open-weight models). Calling them synchronously through MCP TS-SDK clients (which OpenClaw is built on; default 60s per-request JSON-RPC timeout) returns MCP error -32001: Request timed out even though the work is still running fine on the server.
Heavy packs that need special handling:
slides.narrate— wall-clock scales with slide count: roughly 30-60s per slide (ElevenLabs TTS + per-segment 1080p ffmpeg). A 20-slide deck is typically 10-20 minutes end-to-end; a 5-slide teaser is 2-5 minutes. The pack's session timeout is 30 minutes; decks with >40 slides or 4K resolution may need a longer override. Tell the user the ballpark upfront so they know to expect it.research.deepwithlimit > 3— search + scrape + synthesize is 30-90scontent.groundwithrewrite: true— multiple LLM passes can run 60-120s- Any future pack the user describes as "long" or "heavy" (book writing, multi-chapter generation, large batch operations)
These three packs are now marked Async: true server-side, which means a normal tools/call no longer blocks — it returns a SEP-1686 task envelope in milliseconds. The server then runs the pack in a background goroutine. There are three ways to retrieve the result, listed in order of preference:
Path 1 — SEP-1686 tasks/get polling (most clients)
The server's response carries a task ID in _meta.modelcontextprotocol.io/related-task.taskId. SEP-1686-aware MCP SDKs auto-poll tasks/get under the hood and surface the eventual result to the LLM as if it were a normal sync return. You don't have to do anything — just call the pack the normal way; the SDK handles polling. If the SDK doesn't speak SEP-1686 yet, fall through to Path 2.
Path 2 — Manual pack.start / pack.status / pack.result polling (universal fallback)
If the user reports "I called slides.narrate and got -32001," the client SDK isn't doing the polling for you. Manually use the trio:
- Call
pack.startwith{pack: "<name>", input: {...}}. Returns{job_id, state: "working"}. - Loop: call
pack.status({job_id})every 2-5 seconds. State transitions:working→completedorfailed. Surface theprogressandmessagefields to the user. - When
state == "completed", callpack.result({job_id})to retrieve the full pack output. Whenstate == "failed",pack.resultreturns the error.
Path 3 — Webhook push (no polling at all)
If the user has a webhook receiver wired up (commonly: the bundled helmdeck-callback service from examples/webhook-openclaw/), pass webhook_url and webhook_secret in the pack's input arguments:
slides.narrate({
markdown: "---\nmarp: true\n---\n# Hello",
metadata_model: "openrouter/auto",
webhook_url: "http://helmdeck-callback:8080/done",
webhook_secret: "<secret-from-the-user>"
})
The pack returns a SEP-1686 task envelope immediately; when the work completes (minutes to tens of minutes later, depending on the pack — see the wall-clock estimates in the "heavy packs" list above), helmdeck POSTs the result to the webhook URL, which re-injects it into the chat as a fresh system message. You'll see the result arrive as new context on a future turn — don't poll, don't wait, just acknowledge and let the user drive the next action.
The user explicitly opts in by giving you a webhook_url + webhook_secret; never invent these on your own.
Quick decision
| Situation | Path |
|---|---|
Normal tools/call for a heavy pack returns task envelope | Path 1 (the SDK is handling it; do nothing) |
Normal tools/call returned -32001 | Path 2 (use pack.start/status/result manually) |
| User provided a webhook_url | Path 3 (pass it through; don't poll) |
For short packs (browser.screenshot_url, web.scrape, github.*, fs.*) — keep calling them directly. The whole task envelope/webhook story only applies to packs marked Async: true server-side.
Pack composition — you are a creative agent
You are not limited to calling one pack per user request. You can and should compose packs to accomplish complex goals:
- "Create a pitch deck video" → YOU write the Marp markdown with speaker notes → call
slides.narrate→ video + YouTube metadata - "Write a blog post with sources" → YOU write the prose → call
content.groundwithrewrite: true→ grounded blog artifact - "Research a topic and present it" → call
research.deep→ YOU format the synthesis as a Marp deck → callslides.narrate - "Generate code, test it, commit it" → call
repo.fetch→ callfs.write→ callcmd.run→ callgit.commit→ callrepo.push
When composing, YOU generate the creative content (slides, blog text, code) and the packs handle the production work (rendering, narration, grounding, committing). Do not ask the user to provide content you can generate yourself.
Default model selection
Several packs require a model parameter (web.test, research.deep, content.ground, slides.narrate, vision.*). When the user does not specify a model:
- Use
openrouter/autoas the default — it routes to the best available model automatically - Do NOT ask the user "which model?" — just use the default and proceed
- If
openrouter/autofails, tryopenai/gpt-4o-minias a fallback - The user can always override by specifying a model in their prompt
Error handling rules
CRITICAL: Follow these rules when a tool call fails. Do NOT refuse to retry based on previous errors.
General rule: ALWAYS show the error
When ANY tool call fails, you MUST:
- Show the exact error code and message in your response — never say "an error occurred" without the details
- Diagnose it using the rules below
- Offer to file a GitHub issue if it looks like a bug (see "When to create a GitHub issue" section below)
- If you're working with a developer, show the full stderr / error payload so they can debug
HTTP 401 "missing_bearer" or "token expired"
Cause: The JWT used to authenticate with helmdeck has expired (default TTL is 12 hours).
Action: Tell the user to re-mint the JWT and update the MCP server config. For OpenClaw: openclaw-cli mcp set helmdeck '{"url":"...","headers":{"authorization":"Bearer NEW_TOKEN"}}'
"connection refused" on port 8931
Cause: Playwright MCP is still starting inside the sidecar container (takes 2-5 seconds after session creation). Action: Wait 5 seconds and retry. The startup delay is normal. Do not tell the user "the tool is unavailable" — it will be ready momentarily.
"disabled; set HELMDECK_*_ENABLED=true"
Cause: The optional service overlay (Firecrawl, Docling) is not running. Action: Tell the user exactly what the error says — which env var to set and which compose overlay file to bring up. Quote the error message. Do not try alternative tools.
"session_unavailable" or "engine has no session executor"
Cause: The pack needs a browser/desktop session. This is usually automatic. Action: Retry the call. If it persists, tell the user the session runtime may not be configured.
"vault: credential not found" or "vault: NAME not found"
Cause: The pack needs a credential that isn't stored in the vault yet. Action: Tell the user to add the credential via the Management UI:
- Go to
http://localhost:3000→ Credentials panel → Add Credential - Provide the name, type (usually
api_key), host pattern, and the credential value
"egress denied"
Cause: The target URL resolves to a blocked IP range (metadata, RFC 1918, loopback).
Action: Tell the user to add the destination to HELMDECK_EGRESS_ALLOWLIST in their .env.local if the access is intentional.
Non-zero exit codes from internal tools (ffmpeg, marp, xdotool)
Cause: The tool inside the sidecar container failed. Action: Quote the stderr output in your response so the user can debug. Common causes: missing fonts, file not found, invalid input format.
"model returned no choices" or "no parseable JSON"
Cause: The LLM gateway returned an empty or malformed response. Action: This is a model-side issue, not a helmdeck bug. Try a different model or simplify the prompt.
Session chaining contract — READ BEFORE CHAINING fs.* / cmd.run / git.*
The rule, in one sentence: when chaining packs that operate on a shared clone path (
fs.*,cmd.run,git.*,python.run/node.runagainst acwd,content.groundwithclone_path), pass the_session_idreturned byrepo.fetch(or any prior session-creating pack) on every follow-up call. Without it, helmdeck spins a fresh sidecar per pack and the file system is not shared.
The failure mode this prevents
A new agent will call repo.fetch → fs.write → fs.list and get back this
sequence (notice the count: 0 on a list of a directory it just wrote into):
helmdeck__repo-fetch → {"clone_path":"/tmp/helmdeck-clone-Ab12","_session_id":"sess-9f"}
helmdeck__fs-write → {"sha256":"f24745…","size":14} ✓ apparent success
helmdeck__fs-list → {"count":0,"files":[]} ✗ different sidecar; can't see the file
fs.write succeeded against sidecar A. fs.list ran against fresh sidecar B
because _session_id wasn't propagated. The file is real on disk in A's
session — it just isn't visible to B. The pack contract did not fail; the
chaining contract did.
How to chain correctly
- Call
repo.fetch→ response carriesclone_pathand_session_id. - Capture both. Store them locally for the rest of the conversation.
- Pass
_session_id(verbatim) ANDclone_path(verbatim) to every follow-up call. - The follow-up packs that share the session:
fs.read,fs.write,fs.list,fs.patch,fs.delete,cmd.run,git.commit,git.diff,git.log,repo.push,repo.map,content.ground(withclone_pathmode). - Sessions persist for 5 minutes after the last call (watchdog cleanup).
If a session expires, call
repo.fetchagain to create a new one.
Worked example — the Phase 5.5 code-edit loop
// 1. Clone and capture session
{"name":"repo.fetch","arguments":{"url":"https://github.com/tosin2013/helmdeck.git"}}
// → response: {"clone_path":"/tmp/helmdeck-clone-Ab12", "_session_id":"sess-9f"}
// 2-N. EVERY follow-up call passes _session_id AND clone_path
{"name":"fs.list", "arguments":{"_session_id":"sess-9f","clone_path":"/tmp/helmdeck-clone-Ab12","glob":"*.md"}}
{"name":"fs.read", "arguments":{"_session_id":"sess-9f","clone_path":"/tmp/helmdeck-clone-Ab12","path":"README.md"}}
{"name":"fs.patch", "arguments":{"_session_id":"sess-9f","clone_path":"/tmp/helmdeck-clone-Ab12","path":"README.md","search":"old","replace":"new"}}
{"name":"cmd.run", "arguments":{"_session_id":"sess-9f","clone_path":"/tmp/helmdeck-clone-Ab12","command":["go","test","./..."]}}
{"name":"git.commit","arguments":{"_session_id":"sess-9f","clone_path":"/tmp/helmdeck-clone-Ab12","message":"docs: typo"}}
{"name":"repo.push", "arguments":{"_session_id":"sess-9f","clone_path":"/tmp/helmdeck-clone-Ab12"}}
Drop _session_id from any one of these and the rest of the chain breaks
silently — counts return zero, files appear missing, commits hit the wrong
working tree. The error code is rarely session_unavailable; more often the
follow-up pack returns a perfectly-valid empty result.
Self-check before responding to the user
If you just ran repo.fetch and a follow-up fs.*/git.* returned an
unexpectedly empty result, re-read your own tool calls and verify
_session_id is identical across them. This is the single most common
self-inflicted failure mode in chained workflows.
Freshness contract — RE-CALL DON'T RECALL FOR STATEFUL PACKS
The rule, in one sentence: when the user asks about external state (a GitHub repo, a webpage, a clone) on a follow-up turn, re-call the tool. Do not answer from your prior turn's memory of the result. Helmdeck's MCP packs are servers — the cached answer in your context is a snapshot; the live answer is what's actually true.
This is the companion rule to the session-chaining contract above. Where session chaining is about correctness within one chained workflow, the freshness contract is about correctness across turns when state can change between them.
What's "stateful" for this purpose
| Family | Pack | Why it's stateful |
|---|---|---|
| github | github.list_issues, github.list_prs, github.search, github.create_issue, github.post_comment, github.create_release | A new issue, comment, or PR can land between turns — yours or someone else's. |
| repo | repo.fetch, repo.map, repo.push | Upstream commits move; a fresh push from a colleague invalidates everything you remembered. |
| web | web.scrape, web.scrape_spa, web.test, research.deep | Live web content changes constantly; cached scrapes are stale within seconds for news/feeds, hours for docs. |
| http | http.fetch | Same as web — the response was true at fetch time, not now. |
| fs / git (session-scoped) | fs.read, fs.list, git.diff, git.log | Within the same session, prior fs.write / fs.patch / cmd.run calls (yours or the user's) may have changed the working tree. |
Stateless packs — re-calling buys nothing, recall freely:
python.run/node.run(deterministic on the same input)slides.render/slides.narrate(deterministic on the same input)doc.ocr/doc.parseagainst the samesource_b64(input is the bytes)vision.*— these are state-changing, so the question doesn't apply
When recalling is correct
You don't need to re-call every time the user references prior state. Recalling is correct when:
- The user is asking about the prior result (
"what was that comment id?","summarize what you found"). - The follow-up turn is a continuation of the same logical action (
"now post a comment on the issue you just listed"— the listing is fresh enough). - You're in a tight loop where state could not have changed (e.g. you ran
fs.writethenfs.readagainst the same file — the write you just did IS the latest state).
When recalling is wrong
Re-call when:
- The user uses words like "again", "now", "current", "latest", "check if", "is X still", "refresh".
- The user pivots to a different task and then comes back (
"… ok, now back to the issue list — close the open ones"). - More than ~5 minutes have passed in the conversation since the last call to that pack against that target.
- An external actor (a colleague, a webhook, a CI run) might reasonably have changed the state — be conservative; re-call.
Failure mode this prevents
A long conversation where the agent answered github.list_issues early on (count: 1), then 20 turns later when the user says "list the issues again", the agent skips the tool call and replies with the cached count: 1 — even though the user (or someone else) has since opened three more.
The user sees a confidently-wrong answer and only learns it's stale after they go check GitHub directly. The agent's memory was correct at the time; it just wasn't updated.
Self-check before responding to the user
Before answering a question that touches stateful packs, ask yourself:
Could the answer have changed since my last tool call against this target?
If yes, call the tool. The cost of a re-call is a fraction of a cent; the cost of a stale answer is the user's trust.
Repo discovery pattern
When you call repo.fetch, the response carries a context envelope designed to eliminate the "is the repo empty?" question on the first turn. Use it before reaching for fs.list or fs.read:
tree— array of file paths (git ls-filesoutput, sorted, capped at 300). Iftree_truncated: true, narrow withfs.list+ a glob fromdoc_hints.readme— auto-detected top-level README. Matches.md,.adoc,.rst,.txtcase-insensitively. Ifreadme.contentis populated, the repo is NOT empty. Never respond "the repo appears empty" when a README was surfaced.entrypoints— known orientation files (Makefile,package.json,go.mod,CLAUDE.md,AGENTS.md, etc.) with akindclassifier. Read these first when you need to understand how the repo builds or runs.doc_hints— static glob suggestions forfs.list(docs/**/*.md,content/**/*.adoc, etc.). No computation on the server side — just a prompt hint.signals— coarse classifier you can branch on in one check:has_readme— a README was found (its content is inreadme.content).has_docs_dir— any ofdocs/,doc/,content/,site/,book/,guide/,tutorials/,blog-posts/,examples/exists at repo root.has_code— any ofsrc/,cmd/,lib/,internal/,pkg/,app/exists, OR at least one common source file (.go,.py,.js,.ts,.rs,.java,.c,.cpp,.rb) exists.doc_file_count/code_file_count— raw counts of.md/.adoc/.rstdocs vs. common source files.sparse—truewhendoc_file_count + code_file_count < 3. Treat as "this repo looks barely-populated; confirm with user before proceeding."
Branching on signals
Use this decision table after every repo.fetch:
signals shape | What the agent should do next |
|---|---|
has_readme: true | Repo is NOT empty. Read readme.content and proceed with the task. |
has_readme: false, has_docs_dir: true | No top-level README but docs exist in a subdirectory. Use doc_hints with fs.list to find them. |
has_readme: false, has_docs_dir: false, has_code: true | Code-only repo. Call repo.map to get a symbol-level map instead of reading files blindly. |
sparse: true (or all three has_* flags false) | The repo genuinely lacks material. Do NOT say "the repo is empty" and give up. Surface what you observed ("I see N files but no README, docs, or recognizable source tree") and ask the user whether the URL is correct, whether to look at a specific branch/subpath, or what they want extracted. |
When to call repo.map
Use repo.map when the task requires understanding code structure — e.g. "where is FooHandler defined?", "summarize the API surface", "rename this function across the codebase." It takes a token_budget (default 1500) and returns a ranked list of files with their top symbols.
Do NOT call repo.map for docs-heavy tasks (blog posts, presentations, tutorials) — it adds latency for no benefit. The repo.fetch envelope already tells you where the docs live.
{
"pack": "repo.map",
"input": {
"_session_id": "<from repo.fetch>",
"clone_path": "<from repo.fetch>",
"token_budget": 1500,
"include_globs": ["*.go", "*.py"]
}
}
When to create a GitHub issue
You have access to github.create_issue. Use it to report real bugs in helmdeck.
DO create an issue when:
- A pack returns error code
internal(this is a helmdeck bug, not a user error) - A tool call returns malformed JSON that doesn't match the documented output schema
- The same error persists after 3 retries with different inputs
- A pack silently returns empty output when the input was valid
DON'T create an issue when:
- An overlay is disabled (
HELMDECK_*_ENABLEDnot set) — this is a configuration issue - A vault key is missing — this is a setup issue
- The model returns unparseable output — this is an LLM issue, not helmdeck
- The error message already tells the user exactly what to do
Issue format:
Use github.create_issue with:
repo:tosin2013/helmdecktitle:[pack-name] Brief description of the bugbody: Include the pack name, sanitized input (redact credentials), full error message, and steps to reproducelabels:["bug", "area/packs"]
Developer guidance
For developers working on the helmdeck codebase:
Project structure
- Pack implementations:
internal/packs/builtin/— one.gofile per pack - Pack engine:
internal/packs/packs.go— execution pipeline, schema validation - Gateway adapters:
internal/gateway/— Anthropic, OpenAI, Gemini, Ollama, Deepseek - Vision pipeline:
internal/vision/vision.go— Step, StepNative, computer-use dispatch - Desktop REST:
internal/api/desktop.go— xdotool/scrot endpoints - Session runtime:
internal/session/docker/runtime.go— container lifecycle - MCP server:
internal/api/mcp_server.go+mcp_sse.go— tool exposure to clients - Audit:
internal/audit/audit.go— structured event logging
Testing patterns
- Table-driven tests with
fakeRuntime,recordingExecutor,scriptedDispatcherstubs httptest.NewServerfor external API mocks (Firecrawl, ElevenLabs, Playwright MCP)- Pack handlers tested directly via
ExecutionContext(no engine needed for unit tests) - Run:
go test ./...before committing
Validation
scripts/validate-phase-6-5.sh— direct REST pack validationscripts/validate-openclaw.sh— agent round-trip validation via OpenClawdocs/integrations/pack-demo-playbook.md— manual LLM prompt walkthrough
Architecture decisions
- ADR documents:
docs/adrs/— read the relevant ADR before modifying a subsystem - ADR 035 covers the "host, don't rebuild" architecture (Firecrawl, Docling, Playwright MCP)
- ADR 035 §2026 revision covers native computer-use tool routing (T807f)
Contributing
- Create a branch, make changes, run
go test ./..., open a PR - Pack count is tracked in
docs/PACKS.md— update when adding new packs - Milestones tracked in
docs/MILESTONES.md— update task status when completing work
Related ADRs
The MCP and skill-bundle decisions behind helmdeck's integration surface:
- ADR-006 — MCP server registry with multi-transport
- ADR-025 — MCP client integrations
- ADR-026 — A2A agent-card endpoint
- ADR-030 —
helmdeck-mcpbridge packaging and distribution - ADR-035 — MCP server hosting + pack evolution ("host, don't rebuild")
- ADR-038 — Marketplace pack execution via sidecar
- ADR-048 — Memory write surface + OpenClaw memory-corpus bridge