
Helmdeck — Built-in Capability Pack Reference

57 packs ship in the control plane binary (47 without an AI gateway configured — the 10 gateway-gated packs are the LLM/vision packs). All are available as MCP tools (via /api/v1/mcp/sse or /api/v1/mcp/ws) and as REST endpoints (POST /api/v1/packs/<name>).
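As a rough illustration of the two entry points, the sketch below builds (without sending) a REST request for one pack and the equivalent MCP tools/call envelope. The host and port, the bearer-token header, and the idea that the pack's input fields form the top-level JSON body are assumptions made for the example; the source only states the endpoint paths.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_pack_request(base_url: str, pack: str, payload: Dict[str, Any],
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a POST to /api/v1/packs/<name>."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: bearer-token auth. The auth scheme is not described here.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/packs/{pack}",
        # Assumption: the pack's input fields are sent as the JSON body.
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def build_mcp_tool_call(pack: str, arguments: Dict[str, Any],
                        request_id: int = 1) -> Dict[str, Any]:
    """JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": pack, "arguments": arguments},
    }


# Hypothetical local deployment; replace with your control plane address.
req = build_pack_request(
    "http://localhost:8080", "browser.screenshot_url", {"url": "https://example.com"}
)
mcp_call = build_mcp_tool_call("browser.screenshot_url", {"url": "https://example.com"})
```

Passing `req` to `urllib.request.urlopen` would send it; the MCP envelope would travel over whichever transport (SSE or WebSocket) the client has open.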

Quick reference

Pack | Session? | Engine | Input (key fields) | Output (key fields)
Orchestration (meta-packs)
helmdeck.route | | LLM + catalog metadata + memory | {user_intent, model, context?, max_tokens?} | {recommendation{kind,id,suggested_inputs}, alternatives[], gap_warning?, reasoning, model}. Recommends the best pack/pipeline for an intent; emits gap_warning when nothing fits. Needs a gateway.
helmdeck.plan | | LLM + catalog metadata + llmcontext | {user_intent, model, context?, max_tokens?} | {steps[], rewritten_prompt, complexity, reasoning, compaction?, model}. Decomposes a multi-intent prompt into ordered tool/pipeline calls. Needs a gateway.
helmdeck.memory_store | | memory store | {key, value, category?, tags?, ttl_seconds?} | {key, category, expires_at}. Persists a durable user fact (default category user_facts, 90-day TTL; min 1h / max 365d). Reserved categories pack_history/pipeline_history are rejected.
helmdeck.memory_forget | | memory store | {scope?} | {scope, deleted}. Erases the caller's routing/audit history. scope: all / packs / pipelines / pack:<id> / pipeline:<id> / key:<exact>. Never touches pack caches or vault.
Browser
browser.screenshot_url | | chromedp | {url} | {artifact_key, size} + PNG artifact
browser.interact | | chromedp | {url, actions[]} | {steps_completed, screenshots[], extractions{}, assertions_passed}
Web
web.scrape_spa | | chromedp | {url, fields{name: {selector, format}}} | {data{}, missing[]}
web.scrape | | Firecrawl | {url, formats?, wait_ms?} | {markdown, html?, title?, links?, status}. Requires HELMDECK_FIRECRAWL_ENABLED=true
web.test | | Playwright MCP + LLM | {url, instruction, model, max_steps?, assertions?} | {completed, steps[], steps_used, final_snapshot, assertions_passed, reason}. Needs a session whose playwright_mcp_endpoint is populated (T807a)
research.deep | | Firecrawl + LLM | {query, model, limit?, max_tokens?} | {query, sources[], synthesis, model}. Requires HELMDECK_FIRECRAWL_ENABLED=true
content.ground | | LLM + Firecrawl | {clone_path, path, model, max_claims?, topic?} | {path, claims_considered, claims_grounded, grounding[], skipped[], sha256, file_changed}. Requires HELMDECK_FIRECRAWL_ENABLED=true
Filesystem
fs.read | | session exec | {clone_path, path} | {content, sha256, size}
fs.write | | session exec | {clone_path, path, content} | {sha256, size}
fs.list | | session exec | {clone_path, path?, glob?} | {files[], count}
fs.patch | | session exec | {clone_path, path, search, replace} | {applied, sha256}
fs.delete | | session exec | {clone_path, path} | {deleted, path}
Shell
cmd.run | | session exec | {clone_path, command[]} | {stdout, stderr, exit_code}
Git
git.commit | | session exec | {clone_path, message, all?} | {commit}
git.diff | | session exec | {clone_path, staged?} | {diff, files_changed}
git.log | | session exec | {clone_path, count?} | {log, count}
Repository
repo.fetch | | session exec + vault | {url, ref?, depth?, credential?} | {clone_path, commit, files, session_id, tree[], tree_total, tree_truncated, readme{path,content,truncated}, entrypoints[], doc_hints[], signals{has_readme,has_docs_dir,has_code,doc_file_count,code_file_count,sparse}}. Context envelope (ADR 022 §2026-04-15 revision) so agents orient on the first turn
repo.map | | session exec + ctags + python3 | {clone_path, token_budget?, include_globs?} | {map, tokens_estimated, files_covered, files_total}. Aider-style structural symbol map (ADR 036)
repo.push | | session exec + vault | {clone_path, remote?, branch?, force?, credential?} | {url, branch, commit}
SWE
swe.solve | | session exec + LLM + git | {repo_url OR clone_path, task, model, output_mode?, base?, branch?} | {output_mode, summary, branch?, commit?, pr_url?, patch?}. Autonomous code-edit agent; output_mode: patch/branch/pull_request. Backs the repo-solve-* and issue-to-pr pipelines.
HTTP
http.fetch | | Go HTTP + vault | {url, method?, headers?, body?} | {status, headers, body}
Communication
email.send | | Resend API + vault | {to, from?, subject?, html?, cc?, bcc?, reply_to?} | {message_id}. Sends a transactional email. Vault credential resend-api-key.
GitHub
github.create_issue | | GitHub REST | {repo, title, body?, labels?} | {number, url, html_url}
github.list_issues | | GitHub REST | {repo, state?, labels?, assignee?} | {issues[], count}
github.get_issue | | GitHub REST (5-min cache) | {repo, issue_number, credential?} | {number, title, body, state, labels[], html_url, user}. Reads one issue; pairs with swe.solve for issue→PR.
github.list_prs | | GitHub REST | {repo, state?, head?, base?} | {prs[], count}
github.create_pr | | GitHub REST | {repo, head, base, title, body?, draft?, credential?} | {number, url, html_url}. Opens a PR; final step of swe.solve's pull_request mode.
github.post_comment | | GitHub REST | {repo, issue_number, body} | {id, url}
github.create_release | | GitHub REST | {repo, tag, name?, body?, draft?} | {id, url, upload_url}
github.search | | GitHub REST | {query, type?} | {total_count, items[]}
Slides
slides.outline | | LLM | {content, title?, author?, persona?, model} | {markdown, persona_used, has_title_slide}. Restates prose as a structured Marp deck (feed this to slides.render/narrate). Needs a gateway.
slides.render | | Marp + Chromium + mmdc | {markdown, format, mermaid?, hero_image_prompt?, hero_image_model?} | {artifact_key, hero_image_model_used?} + PDF/PPTX artifact. mermaid:true (default) pre-renders ```mermaid fences to inline SVG; hero_image_prompt (v0.12.0 #146) chains image.generate and base64-inlines the result before slide 1.
slides.narrate | | Marp + ElevenLabs + ffmpeg + LLM | {markdown, voice_id?, model_id?, resolution?, fade_ms?, metadata_model?, hero_image_prompt?, hero_image_model?, captions_sidecar?, captions_burn_in?, validate?} | {video_artifact_key, video_size, slide_count, total_duration_s, has_narration, voice_used?, engagement?, engagement_artifact_key?, captions_artifact_key?, captions_burned_in, validation?, validation_artifact_key, hero_image_model_used?}. MP4 video with per-slide TTS narration from <!-- speaker notes --> + YouTube engagement metadata (engagement object renamed from metadata in v0.26.0). hero_image_prompt (v0.12.0 #146) inlines a chained hero image INTO slide 1 (no separator, preserves narration). captions_sidecar default-on emits an SRT artifact for YouTube/Vimeo CC auto-import (PR #425); captions_burn_in:true renders subtitles into every frame via libass (visible always-on). validate:true default-on (PR #432) runs av.validate as a post-step and embeds the structured validation report in the output. ElevenLabs API key from vault elevenlabs-key.
Blog
blog.rewrite_for_audience | | LLM | {source_content, audience, model, angle?, title?, persona?, max_tokens?} | {markdown, persona_used, model}. Translates a source doc into an original blog post for a stated audience/angle (not a summarizer). Generator at the heart of the *-rewrite-blog pipelines. Needs a gateway.
blog.publish | | Ghost Admin API + goldmark + LLM | {destination, format, title, body OR (prompt+model), tags?, status?, published_at?, host?, credential?, feature_image_artifact_key?, hero_image?, hero_image_prompt?, hero_image_model?} | {destination, format, body_source, model_used?, hero_image_model_used?} + ghost: {post_id, url, html_url, status, published_at, feature_image_url?} OR artifact: {artifact_key, size, feature_image_artifact_key?}. Publishes to a Ghost blog (live API) or stores rendered markdown/HTML as a helmdeck artifact. Two body modes (agent supplies body OR prompt+model the pack expands via LLM). Feature image is operator-supplied via feature_image_artifact_key OR auto-generated via hero_image:true (v0.12.0 #146); Ghost-mode uploads via /images/upload/ then stamps feature_image. Ghost vault credential ghost-admin-key (id:hexsecret).
Podcast
podcast.generate | | ElevenLabs TTS + ffmpeg + LLM (engine-pluggable) | {speakers, script OR (prompt+model) OR (source_url/source_text+model), engine?, model_id?, theme?, duration_target_min?, silence_between_turns_ms?, generate_cover_prompt?, cover_image?, cover_image_model?, metadata_model?, cta_style?, language?, validate?} | {engine, audio_artifact_key, audio_size, duration_s, speaker_count, turn_count, script_source, model_used?, voices_used, has_narration, theme, cover_image_prompt?, cover_image_artifact_key?, cover_image_model_used?, engagement?, engagement_artifact_key?, validation?, validation_artifact_key}. Multi-speaker (1..N) podcast MP3. Three input modes: agent-supplied script, prompt+model (LLM generates dialogue), or long-form content (URL/text → LLM converts). Five themes (interview/debate/news-roundup/deep-dive/solo-essay) bake in podcast best practices. cover_image:true (v0.12.0 #146) auto-generates cover artwork via image.generate. metadata_model default-on (openrouter/auto) emits Apple-Podcasts-shaped engagement metadata (title/subtitle/show_notes_md/chapters/hook_30s/cta); pass "" to disable. validate:true default-on (PR #432) runs av.validate post-concat and embeds the structured validation report. Day 1: ElevenLabs only (vault elevenlabs-key); future engines (PlayHT, Hume.ai, Resemble.ai) slot in via engine field. Silent-fallback when key missing.
AV utilities
av.validate | | ffprobe + libavfilter (silencedetect/blackdetect/freezedetect/ebur128) + python3 | {video_artifact_key? OR video_path?, audio_artifact_key? OR audio_path?, captions_artifact_key? OR captions_path?, ebur128_target?, skip_checks?, strict?} | {validation: {checks[], passed, failed, warnings, all_passed}, validation_artifact_key}. Structured AV-artifact validator (PR #430). 13-check set: faststart, codec pin, bitstream decode, packet contiguity, RMS sweep, LUFS, silence/black/freeze runs, audio↔video duration parity, SRT format. Severity model: fail (matches a shipped bug fix) / warn (soft heuristic) / pass. Default soft-surface: failed checks land in the validation field, pack returns success; pass strict:true to surface fail-severity failures as a typed CodeArtifactFailed (CI publish-gate use case). Default-on as a post-step on slides.narrate + podcast.generate (PR #432). See ADR 052.
Image / Stock
image.generate | | fal.ai sync fal.run (engine-pluggable) | {prompt, engine?, model?, image_size?, num_images?, seed?, credential?} | {image_artifact_key, image_size, engine, model_used, prompt_used, seed_used?, image_artifact_keys?}. Text → image. Day 1: fal.ai only (vault fal-key, HELMDECK_FAL_KEY); default model fal-ai/flux/schnell (~$0.003/image, 1-3s). 1-4 images per call. engine field reserved for Replicate as a community PR. Hard-fails when credential missing.
stock.search | | Pexels API + vault | {query, count?, orientation?, size?, color?} | {photos[{artifact_key, photographer, photographer_url, source_url, width, height, alt_text}]}. Real (non-AI) stock photos. Same chained-input contract as image.generate. Vault pexels-key (or HELMDECK_PEXELS_API_KEY).
Video (HyperFrames)
hyperframes.compose | | LLM | {description, aspect_ratio?, audio_url?, model} | {composition_html}. Generates a HyperFrames composition (canvas + GSAP scaffolding) from a plain-language description. Feed composition_html to hyperframes.render. Needs a gateway.
hyperframes.render | | headless Chromium + ffmpeg | {composition_html, resolution?, aspect_ratio?} | {video_artifact_key, video_size, duration_s, has_audio}. Renders an HTML/CSS/JS composition into a deterministic MP4. Short-form only (≤12 min @ 1080p, 512 MiB cap). Async: true.
Document
doc.ocr | | Tesseract | {image_path} | {text}
doc.parse | | Docling | {source_url OR source_b64+filename, formats?, do_ocr?, ocr_lang?} | {source, markdown, text?, html?, status, processing_time}. Requires HELMDECK_DOCLING_ENABLED=true
Desktop
desktop.run_app_and_screenshot | | Xvfb + xdotool | {command, args?} | {artifact_key} + PNG artifact
(desktop REST primitives) | | xdotool / scrot / ffmpeg | T807f: 15 endpoints under /api/v1/desktop/: screenshot, click, type, key, launch, windows, focus, double_click, triple_click, drag, scroll, modifier_click, mouse_move, wait, zoom + agent_status for noVNC witness mode. Used by vision.* native tool-use path.
Vision
vision.click_anywhere | | screenshot + LLM (native tool-use for Anthropic/OpenAI/Gemini; JSON-prompt fallback for Ollama/Deepseek) | {goal, model, max_steps?} | {completed, steps, final_action}. T807f: uses provider-native computer-use tool schema when available; per-step screenshot artifacts uploaded for replay
vision.extract_visible_text | | screenshot + LLM | {model} | {text, model}
vision.fill_form_by_label | | screenshot + LLM | {model, fields{label: value}, max_steps?} | {completed, fields_filled, steps}
Language
python.run | | Python sidecar | {code} | {stdout, stderr, exit_code}
node.run | | Node sidecar | {code} | {stdout, stderr, exit_code}

Session? = requires a sidecar container. Packs marked in that column use _session_id for session pinning across chained calls.
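The helmdeck.memory_store row gives a default TTL of 90 days with bounds of 1 hour and 365 days, and names two reserved categories. A client could check those limits before calling the pack, as in this minimal sketch. The function name and the choice to raise ValueError are hypothetical; how the pack itself reports an out-of-range ttl_seconds is not stated in this reference.

```python
from typing import Any, Dict, Optional

MIN_TTL_SECONDS = 60 * 60                # 1 hour lower bound from the table
MAX_TTL_SECONDS = 365 * 24 * 60 * 60     # 365 day upper bound from the table
DEFAULT_TTL_SECONDS = 90 * 24 * 60 * 60  # 90 day default from the table
RESERVED_CATEGORIES = {"pack_history", "pipeline_history"}


def memory_store_input(key: str, value: str, category: str = "user_facts",
                       ttl_seconds: Optional[int] = None) -> Dict[str, Any]:
    """Client-side pre-check (hypothetical helper) for helmdeck.memory_store input."""
    if category in RESERVED_CATEGORIES:
        raise ValueError(f"category {category!r} is reserved")
    payload: Dict[str, Any] = {"key": key, "value": value, "category": category}
    if ttl_seconds is not None:
        if not MIN_TTL_SECONDS <= ttl_seconds <= MAX_TTL_SECONDS:
            raise ValueError("ttl_seconds must be between 1 hour and 365 days")
        payload["ttl_seconds"] = ttl_seconds
    # When ttl_seconds is omitted, the pack's documented 90-day default applies.
    return payload


example = memory_store_input("timezone", "UTC", ttl_seconds=7200)
```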

Session pinning

Packs that need a session container can be chained via the _session_id field:

1. repo.fetch → returns {session_id, clone_path}
2. fs.list {clone_path, _session_id: "<from step 1>"}
3. fs.read {clone_path, path: "README", _session_id: "<from step 1>"}
4. fs.patch {clone_path, path: "README", search: "old", replace: "new", _session_id}
5. git.diff {clone_path, _session_id}
6. git.commit {clone_path, message: "fix", all: true, _session_id}
7. repo.push {clone_path, credential: "github-token", _session_id}

repo.fetch sets PreserveSession: true so its session persists for follow-on packs. All other session packs terminate their session on return unless _session_id pins to an existing one. Abandoned sessions are cleaned up by the watchdog after the default 5-minute timeout.
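The seven-step chain above can be sketched as a small driver that takes session_id and clone_path from repo.fetch and threads them through every later call. The call_pack callable is a hypothetical transport (REST or MCP) supplied by the caller, and the fake below, with its made-up session ID and clone path, exists only so the sketch runs without a control plane.

```python
from typing import Any, Callable, Dict, List, Tuple

PackCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_pinned_chain(call_pack: PackCaller, repo_url: str) -> Dict[str, Dict[str, Any]]:
    """Run fetch -> list -> read -> patch -> diff -> commit -> push in one session."""
    fetched = call_pack("repo.fetch", {"url": repo_url})
    # Every follow-on pack reuses the session that repo.fetch preserved.
    pinned = {"clone_path": fetched["clone_path"], "_session_id": fetched["session_id"]}
    steps: List[Tuple[str, Dict[str, Any]]] = [
        ("fs.list", {}),
        ("fs.read", {"path": "README"}),
        ("fs.patch", {"path": "README", "search": "old", "replace": "new"}),
        ("git.diff", {}),
        ("git.commit", {"message": "fix", "all": True}),
        ("repo.push", {"credential": "github-token"}),
    ]
    results = {"repo.fetch": fetched}
    for name, extra in steps:
        results[name] = call_pack(name, {**pinned, **extra})
    return results


# Fake transport for illustration; a real one would POST to the control plane.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_call_pack(name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    calls.append((name, payload))
    if name == "repo.fetch":
        return {"session_id": "sess-1", "clone_path": "/tmp/clone"}
    return {}


results = run_pinned_chain(fake_call_pack, "https://github.com/example/repo.git")
```

Because the chain depends on the session staying alive, a long pause between steps risks the watchdog reclaiming it after the default timeout.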

Credential handling

Packs that access external services use vault-stored credentials via the credential field:

  • SSH packs (repo.fetch/repo.push with SSH URLs): auto-resolve from vault by host match
  • HTTPS packs (repo.fetch/repo.push with HTTPS URLs): pass "credential": "github-token" to name a vault entry
  • GitHub packs: default to vault entry github-token if it exists; work without auth for public repo reads
  • HTTP fetch: use ${vault:NAME} placeholder syntax in headers/body — the control plane substitutes before sending
  • ElevenLabs TTS (slides.narrate): reads vault entry elevenlabs-key at handler time. When missing, video renders with silence instead of narration. Add via the Vault panel → Name: elevenlabs-key, Type: api_key, Host: api.elevenlabs.io
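To make the ${vault:NAME} placeholder semantics concrete, here is a minimal sketch of the kind of substitution the control plane is described as performing before an http.fetch request goes out. It is not helmdeck's implementation: the regular expression, the in-memory dict standing in for the vault, the entry name api-token, and the error raised for a missing entry are all assumptions.

```python
import re
from typing import Dict

# Assumption: NAME is any run of characters other than "}".
VAULT_REF = re.compile(r"\$\{vault:([^}]+)\}")


def substitute_vault_refs(text: str, vault: Dict[str, str]) -> str:
    """Replace each ${vault:NAME} in text with the matching vault value."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in vault:
            raise KeyError(f"no vault entry named {name!r}")
        return vault[name]

    return VAULT_REF.sub(lookup, text)


# The agent only ever sees the placeholder, never the secret itself.
agent_headers = {"Authorization": "Bearer ${vault:api-token}"}
fake_vault = {"api-token": "s3cr3t"}  # hypothetical entry for the example
sent_headers = {k: substitute_vault_refs(v, fake_vault) for k, v in agent_headers.items()}
```

The point of the placeholder form is that the secret never appears in the agent's prompt or tool arguments; only the entry name does.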

Artifact handling

Packs that produce files (screenshots, PDFs, OCR source images) upload them to the S3-compatible artifact store (Garage). The response includes:

  • artifact_key — the storage key (e.g. browser.screenshot_url/abc123-screenshot.png)
  • A signed URL for download (expires in 15 min)

The Artifact Explorer panel at /artifacts in the Management UI lists all artifacts with inline image preview and download.

For MCP clients: when the artifact is an image under 1 MB, the MCP response includes a type: "image" content block with base64-encoded bytes (T302b) so vision-capable LLMs can see the screenshot in one round trip.
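The inlining rule for MCP clients can be illustrated with a short sketch that decides whether to attach an image content block. The image block follows the MCP content shape (type, data, mimeType); the accompanying text block, the function name, and reading "1 MB" as 1,000,000 bytes are assumptions rather than details confirmed by this reference.

```python
import base64
from typing import Any, Dict, List

MAX_INLINE_BYTES = 1_000_000  # assumption: "1 MB" interpreted as 10**6 bytes


def artifact_content_blocks(artifact_key: str, data: bytes,
                            mime_type: str) -> List[Dict[str, Any]]:
    """Return MCP content blocks for an artifact, inlining small images."""
    # Assumption: the storage key is always reported as a text block.
    blocks: List[Dict[str, Any]] = [{"type": "text", "text": artifact_key}]
    if mime_type.startswith("image/") and len(data) < MAX_INLINE_BYTES:
        blocks.append({
            "type": "image",
            "data": base64.b64encode(data).decode("ascii"),
            "mimeType": mime_type,
        })
    return blocks


small = artifact_content_blocks(
    "browser.screenshot_url/abc123-screenshot.png", b"\x89PNG fake bytes", "image/png"
)
large = artifact_content_blocks("big.png", b"\x00" * MAX_INLINE_BYTES, "image/png")
```

Artifacts that are too large or are not images would still be reachable through the artifact_key and signed URL described above.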

Gateway-gated packs

10 of the 57 packs require an AI gateway (a configured chat-completion provider). Without one, the binary registers 47 packs and these are absent: vision.click_anywhere, vision.extract_visible_text, vision.fill_form_by_label, web.test, research.deep, content.ground, slides.outline, blog.rewrite_for_audience, hyperframes.compose, slides.narrate. The newest pack, av.validate, has no gateway dependency (ffprobe + libavfilter + python3 are baked into the sidecar image).
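A client that wants to know whether a deployment has a gateway configured could compare the names returned by tools/list against the gated set listed above. The sketch below is one way to do that; the helper names are hypothetical and the tool list in the example is made up.

```python
from typing import Iterable, List

TOTAL_BUILTIN_PACKS = 57

# The ten gateway-gated packs named in this section.
GATEWAY_GATED = frozenset({
    "vision.click_anywhere",
    "vision.extract_visible_text",
    "vision.fill_form_by_label",
    "web.test",
    "research.deep",
    "content.ground",
    "slides.outline",
    "blog.rewrite_for_audience",
    "hyperframes.compose",
    "slides.narrate",
})


def expected_pack_count(gateway_configured: bool) -> int:
    """Built-in pack count with or without an AI gateway."""
    if gateway_configured:
        return TOTAL_BUILTIN_PACKS
    return TOTAL_BUILTIN_PACKS - len(GATEWAY_GATED)


def missing_gated_packs(tool_names: Iterable[str]) -> List[str]:
    """Gated packs absent from a tools/list result, sorted by name."""
    return sorted(GATEWAY_GATED - set(tool_names))


# Hypothetical tools/list result from a deployment without a gateway.
absent = missing_gated_packs(["fs.read", "av.validate", "browser.screenshot_url"])
```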

Beyond the built-ins, operators can register cmd.* subprocess packs (HELMDECK_COMMAND_PACKS_DIR) and install community packs from the marketplace (helmdeck pack install <name>); both appear in tools/list at runtime.

Source files

All packs live in internal/packs/builtin/. Registration happens in cmd/control-plane/main.go:

File | Packs
route.go | helmdeck.route
plan.go | helmdeck.plan
memory_store.go | helmdeck.memory_store
memory_forget.go | helmdeck.memory_forget
browser_interact.go | browser.interact
screenshot_url.go | browser.screenshot_url
scrape_spa.go | web.scrape_spa
web_scrape.go | web.scrape
webtest.go | web.test
research_deep.go | research.deep
content_ground.go | content.ground
doc_parse.go | doc.parse
fs_packs.go | fs.*, cmd.run, git.*
repo_fetch.go | repo.fetch
repo_map.go | repo.map
repo_push.go | repo.push
swe_solve.go | swe.solve
http_fetch.go | http.fetch
email_send.go | email.send
image_generate.go | image.generate
stock_search.go | stock.search
github.go | github.* (incl. get_issue, create_pr)
slides_outline.go | slides.outline
slides_render.go | slides.render
slides_narrate.go | slides.narrate
slides_notes.go | (speaker notes parser for slides.narrate; not a pack)
blog_publish.go | blog.publish
blog_rewrite_for_audience.go | blog.rewrite_for_audience
podcast_generate.go | podcast.generate
av_validate.go | av.validate
hyperframes_compose.go | hyperframes.compose
hyperframes_render.go | hyperframes.render
doc_ocr.go | doc.ocr
desktop_run_app.go | desktop.run_app_and_screenshot
vision_packs.go | vision.*
python_run.go | python.run
node_run.go | node.run

Architectural decisions behind helmdeck's pack model and per-pack design: