Helmdeck — Built-in Capability Pack Reference

57 packs ship in the control plane binary (47 without an AI gateway configured — the 10 gateway-gated packs are the LLM/vision packs). All are available as MCP tools (via /api/v1/mcp/sse or /api/v1/mcp/ws) and as REST endpoints (POST /api/v1/packs/<name>).
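
As a rough illustration of the REST form, the sketch below builds (but does not send) a POST to /api/v1/packs/<name>. The base URL, the bearer-token header, and the helper name `build_pack_request` are assumptions made for illustration; only the path shape and the `{url}` input of `browser.screenshot_url` come from this page.

```python
import json
import urllib.request

# Assumed value: the host and port are placeholders, not documented on this page.
BASE_URL = "http://localhost:8080"


def build_pack_request(pack_name, payload, token=None):
    """Build a POST /api/v1/packs/<name> request with a JSON body (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/packs/{pack_name}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_pack_request("browser.screenshot_url", {"url": "https://example.com"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; the response body is expected to be JSON carrying the output fields listed in the quick reference.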

Quick reference

Pack	Session?	Engine	Input (key fields)	Output (key fields)
Orchestration (meta-packs)
`helmdeck.route`	❌	LLM + catalog metadata + memory	`{user_intent, model, context?, max_tokens?}`	`{recommendation{kind,id,suggested_inputs}, alternatives[], gap_warning?, reasoning, model}` — recommends the best pack/pipeline for an intent; emits `gap_warning` when nothing fits. Needs a gateway.
`helmdeck.plan`	❌	LLM + catalog metadata + llmcontext	`{user_intent, model, context?, max_tokens?}`	`{steps[], rewritten_prompt, complexity, reasoning, compaction?, model}` — decomposes a multi-intent prompt into ordered tool/pipeline calls. Needs a gateway.
`helmdeck.memory_store`	❌	memory store	`{key, value, category?, tags?, ttl_seconds?}`	`{key, category, expires_at}` — persist a durable user fact (default category `user_facts`, 90-day TTL; min 1h / max 365d). Reserved categories `pack_history`/`pipeline_history` reject.
`helmdeck.memory_forget`	❌	memory store	`{scope?}`	`{scope, deleted}` — erase the caller's routing/audit history. `scope` ∈ `all` / `packs` / `pipelines` / `pack:<id>` / `pipeline:<id>` / `key:<exact>`. Never touches pack caches or vault.
Browser
`browser.screenshot_url`	✅	chromedp	`{url}`	`{artifact_key, size}` + PNG artifact
`browser.interact`	✅	chromedp	`{url, actions[]}`	`{steps_completed, screenshots[], extractions{}, assertions_passed}`
Web
`web.scrape_spa`	✅	chromedp	`{url, fields{name: {selector, format}}}`	`{data{}, missing[]}`
`web.scrape`	❌	Firecrawl	`{url, formats?, wait_ms?}`	`{markdown, html?, title?, links?, status}` — requires `HELMDECK_FIRECRAWL_ENABLED=true`
`web.test`	✅	Playwright MCP + LLM	`{url, instruction, model, max_steps?, assertions?}`	`{completed, steps[], steps_used, final_snapshot, assertions_passed, reason}` — needs a session whose `playwright_mcp_endpoint` is populated (T807a)
`research.deep`	❌	Firecrawl + LLM	`{query, model, limit?, max_tokens?}`	`{query, sources[], synthesis, model}` — requires `HELMDECK_FIRECRAWL_ENABLED=true`
`content.ground`	✅	LLM + Firecrawl	`{clone_path, path, model, max_claims?, topic?}`	`{path, claims_considered, claims_grounded, grounding[], skipped[], sha256, file_changed}` — requires `HELMDECK_FIRECRAWL_ENABLED=true`
Filesystem
`fs.read`	✅	session exec	`{clone_path, path}`	`{content, sha256, size}`
`fs.write`	✅	session exec	`{clone_path, path, content}`	`{sha256, size}`
`fs.list`	✅	session exec	`{clone_path, path?, glob?}`	`{files[], count}`
`fs.patch`	✅	session exec	`{clone_path, path, search, replace}`	`{applied, sha256}`
`fs.delete`	✅	session exec	`{clone_path, path}`	`{deleted, path}`
Shell
`cmd.run`	✅	session exec	`{clone_path, command[]}`	`{stdout, stderr, exit_code}`
Git
`git.commit`	✅	session exec	`{clone_path, message, all?}`	`{commit}`
`git.diff`	✅	session exec	`{clone_path, staged?}`	`{diff, files_changed}`
`git.log`	✅	session exec	`{clone_path, count?}`	`{log, count}`
Repository
`repo.fetch`	✅	session exec + vault	`{url, ref?, depth?, credential?}`	`{clone_path, commit, files, session_id, tree[], tree_total, tree_truncated, readme{path,content,truncated}, entrypoints[], doc_hints[], signals{has_readme,has_docs_dir,has_code,doc_file_count,code_file_count,sparse}}` — context envelope (ADR 022 §2026-04-15 revision) so agents orient on the first turn
`repo.map`	✅	session exec + ctags + python3	`{clone_path, token_budget?, include_globs?}`	`{map, tokens_estimated, files_covered, files_total}` — Aider-style structural symbol map (ADR 036)
`repo.push`	✅	session exec + vault	`{clone_path, remote?, branch?, force?, credential?}`	`{url, branch, commit}`
SWE
`swe.solve`	✅	session exec + LLM + git	`{repo_url OR clone_path, task, model, output_mode?, base?, branch?}`	`{output_mode, summary, branch?, commit?, pr_url?, patch?}` — autonomous code-edit agent; `output_mode` ∈ `patch`/`branch`/`pull_request`. Backs the `repo-solve-*` and `issue-to-pr` pipelines.
HTTP
`http.fetch`	❌	Go HTTP + vault	`{url, method?, headers?, body?}`	`{status, headers, body}`
Communication
`email.send`	❌	Resend API + vault	`{to, from?, subject?, html?, cc?, bcc?, reply_to?}`	`{message_id}` — send a transactional email. Vault credential `resend-api-key`.
GitHub
`github.create_issue`	❌	GitHub REST	`{repo, title, body?, labels?}`	`{number, url, html_url}`
`github.list_issues`	❌	GitHub REST	`{repo, state?, labels?, assignee?}`	`{issues[], count}`
`github.get_issue`	❌	GitHub REST (5-min cache)	`{repo, issue_number, credential?}`	`{number, title, body, state, labels[], html_url, user}` — read one issue; pairs with `swe.solve` for issue→PR.
`github.list_prs`	❌	GitHub REST	`{repo, state?, head?, base?}`	`{prs[], count}`
`github.create_pr`	❌	GitHub REST	`{repo, head, base, title, body?, draft?, credential?}`	`{number, url, html_url}` — open a PR; final step of `swe.solve`'s `pull_request` mode.
`github.post_comment`	❌	GitHub REST	`{repo, issue_number, body}`	`{id, url}`
`github.create_release`	❌	GitHub REST	`{repo, tag, name?, body?, draft?}`	`{id, url, upload_url}`
`github.search`	❌	GitHub REST	`{query, type?}`	`{total_count, items[]}`
Slides
`slides.outline`	✅	LLM	`{content, title?, author?, persona?, model}`	`{markdown, persona_used, has_title_slide}` — restate prose as a structured Marp deck (feed this to `slides.render`/`narrate`). Needs a gateway.
`slides.render`	✅	Marp + Chromium + mmdc	`{markdown, format, mermaid?, hero_image_prompt?, hero_image_model?}`	`{artifact_key, hero_image_model_used?}` + PDF/PPTX artifact — `mermaid:true` (default) pre-renders ```mermaid fences to inline SVG; `hero_image_prompt` (v0.12.0 #146) chains `image.generate` and base64-inlines the result before slide 1.
`slides.narrate`	✅	Marp + ElevenLabs + ffmpeg + LLM	`{markdown, voice_id?, model_id?, resolution?, fade_ms?, metadata_model?, hero_image_prompt?, hero_image_model?, captions_sidecar?, captions_burn_in?, validate?}`	`{video_artifact_key, video_size, slide_count, total_duration_s, has_narration, voice_used?, engagement?, engagement_artifact_key?, captions_artifact_key?, captions_burned_in, validation?, validation_artifact_key, hero_image_model_used?}` — MP4 video with per-slide TTS narration from `<!-- speaker notes -->` + YouTube engagement metadata (`engagement` object renamed from `metadata` in v0.26.0). `hero_image_prompt` (v0.12.0 #146) inlines a chained hero image INTO slide 1 (no separator, preserves narration). `captions_sidecar` default-on emits an SRT artifact for YouTube/Vimeo CC auto-import (PR #425); `captions_burn_in:true` renders subtitles into every frame via libass (visible always-on). `validate:true` default-on (PR #432) runs `av.validate` as a post-step and embeds the structured `validation` report in the output. ElevenLabs API key from vault `elevenlabs-key`.
Blog
`blog.rewrite_for_audience`	❌	LLM	`{source_content, audience, model, angle?, title?, persona?, max_tokens?}`	`{markdown, persona_used, model}` — translate a source doc into an original blog post for a stated audience/angle (not a summarizer). Generator at the heart of the `*-rewrite-blog` pipelines. Needs a gateway.
`blog.publish`	❌	Ghost Admin API + goldmark + LLM	`{destination, format, title, body OR (prompt+model), tags?, status?, published_at?, host?, credential?, feature_image_artifact_key?, hero_image?, hero_image_prompt?, hero_image_model?}`	`{destination, format, body_source, model_used?, hero_image_model_used?}` + ghost: `{post_id, url, html_url, status, published_at, feature_image_url?}` OR artifact: `{artifact_key, size, feature_image_artifact_key?}` — publishes to a Ghost blog (live API) or stores rendered markdown/HTML as a helmdeck artifact. Two body modes (agent supplies body OR prompt+model the pack expands via LLM). Feature image is operator-supplied via `feature_image_artifact_key` OR auto-generated via `hero_image:true` (v0.12.0 #146); Ghost-mode uploads via `/images/upload/` then stamps `feature_image`. Ghost vault credential `ghost-admin-key` (id:hexsecret).
Podcast
`podcast.generate`	✅	ElevenLabs TTS + ffmpeg + LLM (engine-pluggable)	`{speakers, script OR (prompt+model) OR (source_url/source_text+model), engine?, model_id?, theme?, duration_target_min?, silence_between_turns_ms?, generate_cover_prompt?, cover_image?, cover_image_model?, metadata_model?, cta_style?, language?, validate?}`	`{engine, audio_artifact_key, audio_size, duration_s, speaker_count, turn_count, script_source, model_used?, voices_used, has_narration, theme, cover_image_prompt?, cover_image_artifact_key?, cover_image_model_used?, engagement?, engagement_artifact_key?, validation?, validation_artifact_key}` — multi-speaker (1..N) podcast MP3. Three input modes: agent-supplied script, prompt+model (LLM generates dialogue), or long-form content (URL/text → LLM converts). Five themes (`interview`/`debate`/`news-roundup`/`deep-dive`/`solo-essay`) bake in podcast best practices. `cover_image:true` (v0.12.0 #146) auto-generates cover artwork via `image.generate`. `metadata_model` default-on (`openrouter/auto`) emits Apple-Podcasts-shaped engagement metadata (title/subtitle/show_notes_md/chapters/hook_30s/cta); pass `""` to disable. `validate:true` default-on (PR #432) runs `av.validate` post-concat and embeds the structured `validation` report. Day 1: ElevenLabs only (vault `elevenlabs-key`); future engines (PlayHT, Hume.ai, Resemble.ai) slot in via `engine` field. Silent-fallback when key missing.
AV utilities
`av.validate`	❌	ffprobe + libavfilter (`silencedetect`/`blackdetect`/`freezedetect`/`ebur128`) + python3	`{video_artifact_key? OR video_path?, audio_artifact_key? OR audio_path?, captions_artifact_key? OR captions_path?, ebur128_target?, skip_checks?, strict?}`	`{validation: {checks[], passed, failed, warnings, all_passed}, validation_artifact_key}` — structured AV-artifact validator (PR #430). 13-check set: faststart, codec pin, bitstream decode, packet contiguity, RMS sweep, LUFS, silence/black/freeze runs, audio↔video duration parity, SRT format. Severity model: `fail` (matches a shipped bug fix) / `warn` (soft heuristic) / `pass`. Default soft-surface — check failures land in the `validation` field and the pack still returns success; pass `strict:true` to surface `fail`-severity failures as a typed `CodeArtifactFailed` (CI publish-gate use case). Default-on as a post-step on `slides.narrate` + `podcast.generate` (PR #432). See ADR 052.
Image / Stock
`image.generate`	❌	fal.ai sync `fal.run` (engine-pluggable)	`{prompt, engine?, model?, image_size?, num_images?, seed?, credential?}`	`{image_artifact_key, image_size, engine, model_used, prompt_used, seed_used?, image_artifact_keys?}` — text → image. Day 1: fal.ai only (vault `fal-key`, `HELMDECK_FAL_KEY`); default model `fal-ai/flux/schnell` (~$0.003/image, 1-3s). 1-4 images per call. `engine` field reserved for Replicate as a community PR. Hard-fails when credential missing.
`stock.search`	❌	Pexels API + vault	`{query, count?, orientation?, size?, color?}`	`{photos[{artifact_key, photographer, photographer_url, source_url, width, height, alt_text}]}` — real (non-AI) stock photos. Same chained-input contract as `image.generate`. Vault `pexels-key` (or `HELMDECK_PEXELS_API_KEY`).
Video (HyperFrames)
`hyperframes.compose`	✅	LLM	`{description, aspect_ratio?, audio_url?, model}`	`{composition_html}` — generate a HyperFrames composition (canvas + GSAP scaffolding) from a plain-language description. Feed `composition_html` to `hyperframes.render`. Needs a gateway.
`hyperframes.render`	✅	headless Chromium + ffmpeg	`{composition_html, resolution?, aspect_ratio?}`	`{video_artifact_key, video_size, duration_s, has_audio}` — render an HTML/CSS/JS composition into a deterministic MP4. Short-form only (≤12 min @ 1080p, 512 MiB cap). `Async: true`.
Document
`doc.ocr`	✅	Tesseract	`{image_path}`	`{text}`
`doc.parse`	❌	Docling	`{source_url OR source_b64+filename, formats?, do_ocr?, ocr_lang?}`	`{source, markdown, text?, html?, status, processing_time}` — requires `HELMDECK_DOCLING_ENABLED=true`
Desktop
`desktop.run_app_and_screenshot`	✅	Xvfb + xdotool	`{command, args?}`	`{artifact_key}` + PNG artifact
(desktop REST primitives)	✅	xdotool / scrot / ffmpeg	T807f: 15 endpoints under `/api/v1/desktop/` — screenshot, click, type, key, launch, windows, focus, double_click, triple_click, drag, scroll, modifier_click, mouse_move, wait, zoom + agent_status for noVNC witness mode. Used by `vision.*` native tool-use path.
Vision
`vision.click_anywhere`	✅	screenshot + LLM (native tool-use for Anthropic/OpenAI/Gemini; JSON-prompt fallback for Ollama/Deepseek)	`{goal, model, max_steps?}`	`{completed, steps, final_action}` — T807f: uses provider-native computer-use tool schema when available, per-step screenshot artifacts uploaded for replay
`vision.extract_visible_text`	✅	screenshot + LLM	`{model}`	`{text, model}`
`vision.fill_form_by_label`	✅	screenshot + LLM	`{model, fields{label: value}, max_steps?}`	`{completed, fields_filled, steps}`
Language
`python.run`	✅	Python sidecar	`{code}`	`{stdout, stderr, exit_code}`
`node.run`	✅	Node sidecar	`{code}`	`{stdout, stderr, exit_code}`

Session? = requires a sidecar container. Packs with ✅ use _session_id for session pinning across chained calls.

Session pinning

Packs that need a session container can be chained via the _session_id field:

repo.fetch → returns {session_id, clone_path}
fs.list   {clone_path, _session_id: "<from step 1>"}
fs.read   {clone_path, path: "README", _session_id: "<from step 1>"}
fs.patch  {clone_path, path: "README", search: "old", replace: "new", _session_id}
git.diff  {clone_path, _session_id}
git.commit {clone_path, message: "fix", all: true, _session_id}
repo.push {clone_path, credential: "github-token", _session_id}

repo.fetch sets PreserveSession: true so its session persists for follow-on packs. Every other session pack terminates its session on return unless _session_id pins the call to an existing session. Abandoned sessions are cleaned up by the watchdog after the default 5-minute timeout.
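
The chain above can be sketched as a small helper that threads the `session_id` returned by repo.fetch into every later call as `_session_id`. The `call_pack(name, payload)` transport is hypothetical (it could wrap REST or MCP), and the stub below records calls instead of contacting a control plane.

```python
def run_chain(call_pack, repo_url):
    """Chain session packs by reusing the session that repo.fetch created.

    call_pack(name, payload) -> dict is a hypothetical transport function.
    """
    fetched = call_pack("repo.fetch", {"url": repo_url})
    pinned = {
        "clone_path": fetched["clone_path"],
        "_session_id": fetched["session_id"],  # pin later calls to the same session
    }
    call_pack("fs.patch", {**pinned, "path": "README", "search": "old", "replace": "new"})
    diff = call_pack("git.diff", dict(pinned))
    call_pack("git.commit", {**pinned, "message": "fix", "all": True})
    return diff


# Stand-in transport for illustration only; return values are made up.
calls = []


def fake_call_pack(name, payload):
    calls.append((name, payload))
    if name == "repo.fetch":
        return {"session_id": "sess-1", "clone_path": "/work/repo"}
    return {"diff": "", "files_changed": 0}


run_chain(fake_call_pack, "https://example.com/repo.git")
print([name for name, _ in calls])
```

Keeping the pinned fields in one dictionary makes it harder to forget `_session_id` on a later step, which would otherwise start and then terminate a fresh session.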

Credential handling

Packs that access external services use vault-stored credentials via the credential field:

SSH packs (repo.fetch/repo.push with SSH URLs): auto-resolve from vault by host match
HTTPS packs (repo.fetch/repo.push with HTTPS URLs): pass "credential": "github-token" to name a vault entry
GitHub packs: default to vault entry github-token if it exists; work without auth for public repo reads
HTTP fetch: use ${vault:NAME} placeholder syntax in headers/body — the control plane substitutes before sending
ElevenLabs TTS (slides.narrate, podcast.generate): reads vault entry elevenlabs-key at handler time. When the key is missing, slides.narrate renders the video with silence instead of narration, and podcast.generate also uses a silent fallback. Add the key via the Vault panel → Name: elevenlabs-key, Type: api_key, Host: api.elevenlabs.io
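
To make the `${vault:NAME}` placeholder idea concrete, here is a minimal sketch of how such a substitution could work. It is not the control plane's implementation: the function name, the characters allowed in NAME, and the error behaviour for unknown names are all assumptions.

```python
import re

# Matches ${vault:NAME}; the character set allowed in NAME is an assumption.
_PLACEHOLDER = re.compile(r"\$\{vault:([A-Za-z0-9_.-]+)\}")


def substitute_vault(text, vault):
    """Replace each ${vault:NAME} with its value; raises KeyError for unknown names."""
    return _PLACEHOLDER.sub(lambda m: vault[m.group(1)], text)


headers = {"Authorization": "Bearer ${vault:github-token}"}
vault = {"github-token": "example-secret"}  # illustrative value only
resolved = {k: substitute_vault(v, vault) for k, v in headers.items()}
print(resolved["Authorization"])
```

Because substitution happens on the control plane before the request is sent, the agent only ever handles the placeholder text, never the secret itself.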

Artifact handling

Packs that produce files (screenshots, PDFs, OCR source images) upload them to the S3-compatible artifact store (Garage). The response includes:

artifact_key — the storage key (e.g. browser.screenshot_url/abc123-screenshot.png)
A signed URL for download (expires in 15 min)

The Artifact Explorer panel at /artifacts in the Management UI lists all artifacts with inline image preview and download.

For MCP clients: when the artifact is an image under 1 MB, the MCP response includes a type: "image" content block with base64-encoded bytes (T302b) so vision-capable LLMs can see the screenshot in one round trip.
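
The inline-image behaviour can be sketched as follows. The image block uses the MCP content shape (`type`, `data`, `mimeType`); the helper name, the accompanying text block, and reading "1 MB" as 10**6 bytes are assumptions rather than documented details.

```python
import base64

MAX_INLINE_BYTES = 1_000_000  # assumption: "1 MB" read as 10**6 bytes


def mcp_content_for_artifact(artifact_key, data, mime_type):
    """Return MCP content blocks: a text block, plus an inline image when small enough."""
    blocks = [{"type": "text", "text": f"artifact_key: {artifact_key}"}]  # assumed text block
    if mime_type.startswith("image/") and len(data) < MAX_INLINE_BYTES:
        blocks.append({
            "type": "image",
            "data": base64.b64encode(data).decode("ascii"),
            "mimeType": mime_type,
        })
    return blocks


# Illustrative keys and payloads, not real artifacts.
small = mcp_content_for_artifact(
    "browser.screenshot_url/abc123-screenshot.png", b"\x89PNG", "image/png"
)
large = mcp_content_for_artifact(
    "browser.screenshot_url/big.png", b"\0" * MAX_INLINE_BYTES, "image/png"
)
print(len(small), len(large))
```

Artifacts at or above the threshold return only the key, so the client would fetch them through the signed URL instead.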

Gateway-gated packs

10 of the 57 packs require an AI gateway (a configured chat-completion provider). Without one, the binary registers 47 packs and these are absent: vision.click_anywhere, vision.extract_visible_text, vision.fill_form_by_label, web.test, research.deep, content.ground, slides.outline, blog.rewrite_for_audience, hyperframes.compose, slides.narrate. The newest pack, av.validate, has no gateway dependency (ffprobe + libavfilter + python3 are baked into the sidecar image).
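
The registration rule can be illustrated with a small filter. The gated names are the ten listed above; the function name and the three-pack sample catalog are hypothetical stand-ins for the real registration code in the control plane.

```python
# The ten gateway-gated packs named in this section.
GATEWAY_GATED = {
    "vision.click_anywhere",
    "vision.extract_visible_text",
    "vision.fill_form_by_label",
    "web.test",
    "research.deep",
    "content.ground",
    "slides.outline",
    "blog.rewrite_for_audience",
    "hyperframes.compose",
    "slides.narrate",
}


def registered_packs(all_packs, gateway_configured):
    """Return the packs that would be registered (hypothetical helper)."""
    if gateway_configured:
        return list(all_packs)
    return [name for name in all_packs if name not in GATEWAY_GATED]


sample = ["fs.read", "web.test", "av.validate"]  # small sample, not the full catalog
print(registered_packs(sample, gateway_configured=False))
```

With the full 57-pack catalog, removing these ten names leaves the 47 packs quoted above.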

Beyond the built-ins, operators can register cmd.* subprocess packs (HELMDECK_COMMAND_PACKS_DIR) and install community packs from the marketplace (helmdeck pack install <name>); both appear in tools/list at runtime.

Source files

All packs live in internal/packs/builtin/. Registration happens in cmd/control-plane/main.go:

File	Packs
`route.go`	`helmdeck.route`
`plan.go`	`helmdeck.plan`
`memory_store.go`	`helmdeck.memory_store`
`memory_forget.go`	`helmdeck.memory_forget`
`browser_interact.go`	`browser.interact`
`screenshot_url.go`	`browser.screenshot_url`
`scrape_spa.go`	`web.scrape_spa`
`web_scrape.go`	`web.scrape`
`webtest.go`	`web.test`
`research_deep.go`	`research.deep`
`content_ground.go`	`content.ground`
`doc_parse.go`	`doc.parse`
`fs_packs.go`	`fs.*`, `cmd.run`, `git.*`
`repo_fetch.go`	`repo.fetch`
`repo_map.go`	`repo.map`
`repo_push.go`	`repo.push`
`swe_solve.go`	`swe.solve`
`http_fetch.go`	`http.fetch`
`email_send.go`	`email.send`
`image_generate.go`	`image.generate`
`stock_search.go`	`stock.search`
`github.go`	`github.*` (incl. `get_issue`, `create_pr`)
`slides_outline.go`	`slides.outline`
`slides_render.go`	`slides.render`
`slides_narrate.go`	`slides.narrate`
`slides_notes.go`	(speaker notes parser for `slides.narrate` — not a pack)
`blog_publish.go`	`blog.publish`
`blog_rewrite_for_audience.go`	`blog.rewrite_for_audience`
`podcast_generate.go`	`podcast.generate`
`av_validate.go`	`av.validate`
`hyperframes_compose.go`	`hyperframes.compose`
`hyperframes_render.go`	`hyperframes.render`
`doc_ocr.go`	`doc.ocr`
`desktop_run_app.go`	`desktop.run_app_and_screenshot`
`vision_packs.go`	`vision.*`
`python_run.go`	`python.run`
`node_run.go`	`node.run`

Related ADRs

Architectural decisions behind helmdeck's pack model and per-pack design:

ADR-003 — Capability packs as the primary product surface
ADR-012 — Wasm executor for pack handlers
ADR-014 / ADR-015 — Slides packs (render + video)
ADR-016 / ADR-017 / ADR-020 / ADR-021 — Web automation packs
ADR-018 — Desktop runtime + screenshot pack
ADR-019 — Document OCR pack
ADR-022 / ADR-023 / ADR-036 — Repository packs
ADR-024 — User-authored pack extensibility
ADR-027 — Dual-mode action API (structured + vision)
ADR-034 — Pack marketplace
ADR-038 — Marketplace pack execution via sidecar
ADR-045 — Pack resource sizing via CPU profiles
