Helmdeck — Built-in Capability Pack Reference

36 packs ship in the control plane binary. All are available as MCP tools (via /api/v1/mcp/sse or /api/v1/mcp/ws) and as REST endpoints (POST /api/v1/packs/<name>).
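
As a minimal sketch of the REST path, the snippet below builds a POST request for a single pack. Only the /api/v1/packs/<name> route comes from this reference. The base URL, the bearer-token header, and the idea that the pack input is sent directly as the JSON request body are assumptions made for illustration, and `build_pack_request` and `call_pack` are hypothetical helper names.

```python
import json
import urllib.request

# Hypothetical base URL: adjust to your deployment.
BASE_URL = "http://localhost:3000"


def build_pack_request(name, payload, token=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/packs/<name>.

    Assumption: the pack input is the JSON request body and auth, if any,
    is a bearer token. Neither detail is confirmed by this reference.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url}/api/v1/packs/{name}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call_pack(name, payload, token=None, base_url=BASE_URL):
    """Send the request and decode the JSON response."""
    req = build_pack_request(name, payload, token, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With those assumptions, `call_pack("browser.screenshot_url", {"url": "https://example.com"})` would return the pack output described in the quick reference.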

Quick reference

| Pack | Session? | Engine | Input (key fields) | Output (key fields) |
|---|---|---|---|---|
| Browser | | | | |
| browser.screenshot_url | | chromedp | {url} | {artifact_key, size} + PNG artifact |
| browser.interact | | chromedp | {url, actions[]} | {steps_completed, screenshots[], extractions{}, assertions_passed} |
| Web | | | | |
| web.scrape_spa | | chromedp | {url, fields{name: {selector, format}}} | {data{}, missing[]} |
| web.scrape | | Firecrawl | {url, formats?, wait_ms?} | {markdown, html?, title?, links?, status}; requires HELMDECK_FIRECRAWL_ENABLED=true |
| web.test | | Playwright MCP + LLM | {url, instruction, model, max_steps?, assertions?} | {completed, steps[], steps_used, final_snapshot, assertions_passed, reason}; needs a session whose playwright_mcp_endpoint is populated (T807a) |
| research.deep | | Firecrawl + LLM | {query, model, limit?, max_tokens?} | {query, sources[], synthesis, model}; requires HELMDECK_FIRECRAWL_ENABLED=true |
| content.ground | | LLM + Firecrawl | {clone_path, path, model, max_claims?, topic?} | {path, claims_considered, claims_grounded, grounding[], skipped[], sha256, file_changed}; requires HELMDECK_FIRECRAWL_ENABLED=true |
| Filesystem | | | | |
| fs.read | | session exec | {clone_path, path} | {content, sha256, size} |
| fs.write | | session exec | {clone_path, path, content} | {sha256, size} |
| fs.list | | session exec | {clone_path, path?, glob?} | {files[], count} |
| fs.patch | | session exec | {clone_path, path, search, replace} | {applied, sha256} |
| fs.delete | | session exec | {clone_path, path} | {deleted, path} |
| Shell | | | | |
| cmd.run | | session exec | {clone_path, command[]} | {stdout, stderr, exit_code} |
| Git | | | | |
| git.commit | | session exec | {clone_path, message, all?} | {commit} |
| git.diff | | session exec | {clone_path, staged?} | {diff, files_changed} |
| git.log | | session exec | {clone_path, count?} | {log, count} |
| Repository | | | | |
| repo.fetch | | session exec + vault | {url, ref?, depth?, credential?} | {clone_path, commit, files, session_id, tree[], tree_total, tree_truncated, readme{path,content,truncated}, entrypoints[], doc_hints[], signals{has_readme,has_docs_dir,has_code,doc_file_count,code_file_count,sparse}}; context envelope (ADR 022 §2026-04-15 revision) so agents orient on the first turn |
| repo.map | | session exec + ctags + python3 | {clone_path, token_budget?, include_globs?} | {map, tokens_estimated, files_covered, files_total}; Aider-style structural symbol map (ADR 036) |
| repo.push | | session exec + vault | {clone_path, remote?, branch?, force?, credential?} | {url, branch, commit} |
| HTTP | | | | |
| http.fetch | | Go HTTP + vault | {url, method?, headers?, body?} | {status, headers, body} |
| GitHub | | | | |
| github.create_issue | | GitHub REST | {repo, title, body?, labels?} | {number, url, html_url} |
| github.list_issues | | GitHub REST | {repo, state?, labels?, assignee?} | {issues[], count} |
| github.list_prs | | GitHub REST | {repo, state?, head?, base?} | {prs[], count} |
| github.post_comment | | GitHub REST | {repo, issue_number, body} | {id, url} |
| github.create_release | | GitHub REST | {repo, tag, name?, body?, draft?} | {id, url, upload_url} |
| github.search | | GitHub REST | {query, type?} | {total_count, items[]} |
| Slides | | | | |
| slides.render | | Marp + Chromium | {markdown, format} | {artifact_key} + PDF/PPTX artifact |
| slides.narrate | | Marp + ElevenLabs + ffmpeg + LLM | {markdown, voice_id?, model_id?, resolution?, fade_ms?, metadata_model?} | {video_artifact_key, video_size, slide_count, total_duration_s, has_narration, voice_used?, metadata_artifact_key?, metadata?}; MP4 video with per-slide TTS narration from <!-- speaker notes --> + YouTube metadata (title, description with timestamps, tags). ElevenLabs API key from vault elevenlabs-key; degrades to silent video when missing. |
| Document | | | | |
| doc.ocr | | Tesseract | {image_path} | {text} |
| doc.parse | | Docling | {source_url OR source_b64+filename, formats?, do_ocr?, ocr_lang?} | {source, markdown, text?, html?, status, processing_time}; requires HELMDECK_DOCLING_ENABLED=true |
| Desktop | | | | |
| desktop.run_app_and_screenshot | | Xvfb + xdotool | {command, args?} | {artifact_key} + PNG artifact |
| (desktop REST primitives) | | xdotool / scrot / ffmpeg | T807f: 15 endpoints under /api/v1/desktop/ (screenshot, click, type, key, launch, windows, focus, double_click, triple_click, drag, scroll, modifier_click, mouse_move, wait, zoom) + agent_status for noVNC witness mode. Used by vision.* native tool-use path. | |
| Vision | | | | |
| vision.click_anywhere | | screenshot + LLM (native tool-use for Anthropic/OpenAI/Gemini; JSON-prompt fallback for Ollama/Deepseek) | {goal, model, max_steps?} | {completed, steps, final_action}; T807f: uses provider-native computer-use tool schema when available, per-step screenshot artifacts uploaded for replay |
| vision.extract_visible_text | | screenshot + LLM | {model} | {text, model} |
| vision.fill_form_by_label | | screenshot + LLM | {model, fields{label: value}, max_steps?} | {completed, fields_filled, steps} |
| Language | | | | |
| python.run | | Python sidecar | {code} | {stdout, stderr, exit_code} |
| node.run | | Node sidecar | {code} | {stdout, stderr, exit_code} |

Session? = requires a sidecar container. Packs marked in that column use _session_id for session pinning across chained calls.

Session pinning

Packs that need a session container can be chained via the _session_id field:

1. repo.fetch → returns {session_id, clone_path}
2. fs.list {clone_path, _session_id: "<from step 1>"}
3. fs.read {clone_path, path: "README", _session_id: "<from step 1>"}
4. fs.patch {clone_path, path: "README", search: "old", replace: "new", _session_id}
5. git.diff {clone_path, _session_id}
6. git.commit {clone_path, message: "fix", all: true, _session_id}
7. repo.push {clone_path, credential: "github-token", _session_id}

repo.fetch sets PreserveSession: true so its session persists for follow-on packs. All other session packs terminate their session on return unless _session_id pins to an existing one. Abandoned sessions are cleaned up by the watchdog after the default 5-minute timeout.
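
The seven-step chain above can be sketched as a small driver that reuses the session_id returned by repo.fetch for every later call. This is an illustration under stated assumptions, not Helmdeck client code: `fix_readme` is a hypothetical name, and `call(name, payload)` stands for any caller-supplied function that invokes a pack and returns its decoded output. The field names are the ones listed in the quick reference.

```python
def fix_readme(call, repo_url, credential="github-token"):
    """Run the chain, pinning every call to the session repo.fetch created.

    `call(name, payload)` is a hypothetical helper supplied by the caller;
    it invokes one pack and returns its output as a dict.
    """
    fetched = call("repo.fetch", {"url": repo_url})
    pin = {
        "clone_path": fetched["clone_path"],
        "_session_id": fetched["session_id"],
    }
    call("fs.list", dict(pin))
    call("fs.read", {**pin, "path": "README"})
    patched = call(
        "fs.patch",
        {**pin, "path": "README", "search": "old", "replace": "new"},
    )
    if not patched.get("applied"):
        return None  # nothing changed, so skip commit and push
    call("git.diff", dict(pin))
    call("git.commit", {**pin, "message": "fix", "all": True})
    return call("repo.push", {**pin, "credential": credential})
```

Passing `call` in as a parameter keeps the chain independent of the transport, so the same function works over REST or MCP and can be exercised with a stub.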

Credential handling

Packs that access external services use vault-stored credentials via the credential field:

  • SSH packs (repo.fetch/repo.push with SSH URLs): auto-resolve from vault by host match
  • HTTPS packs (repo.fetch/repo.push with HTTPS URLs): pass "credential": "github-token" to name a vault entry
  • GitHub packs: default to vault entry github-token if it exists; work without auth for public repo reads
  • HTTP fetch: use ${vault:NAME} placeholder syntax in headers/body — the control plane substitutes before sending
  • ElevenLabs TTS (slides.narrate): reads vault entry elevenlabs-key at handler time. When missing, video renders with silence instead of narration. Add via the Vault panel → Name: elevenlabs-key, Type: api_key, Host: api.elevenlabs.io
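
To make the ${vault:NAME} placeholder concrete, here is a minimal sketch of an http.fetch input that carries a placeholder instead of a secret, followed by an illustration of what substitution amounts to. The substitution itself happens inside the control plane; the `substitute` function below only mimics the described behavior and is not Helmdeck's implementation. The entry name `example-api-key`, the URL, and the set of characters allowed in NAME are assumptions.

```python
import re

# Assumption: vault entry names use letters, digits, '_', '-' and '.'.
PLACEHOLDER = re.compile(r"\$\{vault:([A-Za-z0-9_.-]+)\}")


def substitute(text, secrets):
    """Replace each ${vault:NAME} with secrets[NAME]; raise on unknown names."""
    def lookup(match):
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"no vault entry named {name!r}")
        return secrets[name]
    return PLACEHOLDER.sub(lookup, text)


# What a client sends to http.fetch: the placeholder, never the secret itself.
request_input = {
    "url": "https://api.example.com/v1/items",
    "method": "GET",
    "headers": {"Authorization": "Bearer ${vault:example-api-key}"},
}

# What would go upstream after substitution (illustrative values only).
resolved = {
    key: substitute(value, {"example-api-key": "s3cr3t"})
    for key, value in request_input["headers"].items()
}
```

The point of the pattern is that the agent or MCP client only ever handles the placeholder string, so the secret does not pass through the model's context.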

Artifact handling

Packs that produce files (screenshots, PDFs, OCR source images) upload them to the S3-compatible artifact store (Garage). The response includes:

  • artifact_key — the storage key (e.g. browser.screenshot_url/abc123-screenshot.png)
  • A signed URL for download (expires in 15 min)

The Artifact Explorer panel at /artifacts in the Management UI lists all artifacts with inline image preview and download.

For MCP clients: when the artifact is an image under 1 MB, the MCP response includes a type: "image" content block with base64-encoded bytes (T302b) so vision-capable LLMs can see the screenshot in one round trip.
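
On the client side, pulling the inline image out of such a response could look like the sketch below. It assumes the image block follows the MCP image content shape, with the base64 payload in `data` and the media type in `mimeType`; those two field names come from the MCP specification rather than from this reference, and `extract_images` is a hypothetical helper.

```python
import base64


def extract_images(content_blocks):
    """Return (mime_type, raw_bytes) for every image block in an MCP result.

    Assumes blocks shaped like {"type": "image", "data": <base64 string>,
    "mimeType": <media type>}; non-image blocks are ignored.
    """
    images = []
    for block in content_blocks:
        if block.get("type") != "image":
            continue
        raw = base64.b64decode(block["data"])
        mime = block.get("mimeType", "application/octet-stream")
        images.append((mime, raw))
    return images
```

When the artifact is 1 MB or larger, no image block is present and the function returns an empty list; the client then falls back to the artifact_key and signed URL.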

Upcoming packs

No packs are currently in the upcoming queue — Phase 6.5 is feature-complete. Next phase: v1.0 — Kubernetes & GA (Phase 7), see docs/MILESTONES.md.

Source files

All packs live in internal/packs/builtin/:

| File | Packs |
|---|---|
| browser_interact.go | browser.interact |
| screenshot_url.go | browser.screenshot_url |
| scrape_spa.go | web.scrape_spa |
| web_scrape.go | web.scrape |
| webtest.go | web.test |
| research_deep.go | research.deep |
| content_ground.go | content.ground |
| doc_parse.go | doc.parse |
| fs_packs.go | fs.*, cmd.run, git.* |
| repo_fetch.go | repo.fetch |
| repo_push.go | repo.push |
| http_fetch.go | http.fetch |
| github.go | github.* |
| slides_render.go | slides.render |
| slides_narrate.go | slides.narrate |
| slides_notes.go | (speaker notes parser for slides.narrate) |
| doc_ocr.go | doc.ocr |
| desktop_run_app.go | desktop.run_app_and_screenshot |
| vision_packs.go | vision.* |
| python_run.go | python.run |
| node_run.go | node.run |