Helmdeck — Release Plan ("What Ships When")
Forward-looking changelog. Each release maps 1:1 to a phase milestone (MILESTONES.md) and has hard exit criteria pulled from TASKS.md.
Agent sync checklist — every release
Helmdeck ships its agent instructions as a native OpenClaw Skill at skills/helmdeck/SKILL.md, stamped with the helmdeck commit hash in its frontmatter (metadata.openclaw.helmdeckVersion). The stamp is how operators detect drift between their deployed agent and the latest release.
Every release — required:
- Update the pack count and decision tables in skills/helmdeck/SKILL.md if this release adds/removes packs, changes an error code, or revises a pattern (e.g. the repo.fetch signals table). When a release adds a pack or pipeline, also add its prompt template to docs/reference/prompt-templates/ (packs.md / pipelines.md; copy the shape from _template.md).
- Bump the helmdeckVersion stamp — scripts/configure-openclaw.sh regenerates this automatically from git rev-parse --short HEAD at install time, so you don't edit it by hand. Ensure the release commit lands on main before operators run the configure script; otherwise the stamp reflects a stale pointer.
- Call out new packs in the release notes under "Ships" with their full helmdeck__<name> MCP prefix, so operators (and agents reading the release notes after the fact) know what's new.
- Tell deployed operators to refresh:

      cd /path/to/helmdeck && git pull
      ./scripts/configure-openclaw.sh   # reinstalls the versioned SKILL.md

  The script is idempotent; re-running it without other flags will only touch the skill, the JWT (if expiring), and the model pin.
- Document upstream regressions — if OpenClaw itself ships a breaking change between our tested versions and the current one, add a row to the table in docs/integrations/openclaw-upgrade-runbook.md pointing at the affected version range and the workaround.
- Refresh the README + cost-positioning numbers — README.md opens with time-stamped prose ("Today's helmdeck install ran a full 6-step code-edit loop … for $0.07") and a four-row cost-comparison table. The same numbers live in the long-form docs/explanation/why-helmdeck.md and the cost-positioning blog post at website/blog/2026-05-08-cheap-models-do-frontier-work.md. On a release that meaningfully changes pack performance, the chat model recommendation, or the OpenRouter pricing landscape:
  - Re-run the 5 reproduction workflows from docs/explanation/why-helmdeck.md §"Run the comparison yourself" against the new pack set.
  - Update the comparison table in all three places (README, long-form explanation, blog post) so the numbers don't drift between them.
  - Either revise the time-stamped prose at the top of the README to reflect the new release, or — if numbers haven't moved meaningfully — leave it but add a "Last verified: vX.Y on YYYY-MM-DD" footer line so readers know the cited workflow is fresh enough.

  On a release that does NOT change agent-side performance, the cost numbers are stable enough to skip this step; only update if you'd otherwise be overstating the gap.
- Operator upgrade procedure — every release MUST be cleanly upgradable from the prior tag without operator data loss or extended downtime. Verify before tagging:
  - The procedure in docs/howto/upgrade-helmdeck.md §"In-place Compose-stack upgrade" runs cleanly against a fresh checkout: git checkout v<new>; make sidecars; make install produces a healthy stack
  - internal/store/migrations/ has any new migrations needed for new tables/columns. Auto-applied via store.Open on next startup (no manual migrate up required), but the migration file MUST be additive — no DROP COLUMN or ALTER TABLE … RENAME that would break a prior-tag binary trying to read the same DB
  - If a release introduces a destructive schema change, flag it under "### Breaking" in CHANGELOG.md AND link from the upgrade howto's §7 "Version-specific notes" table
  - Pack-input-schema changes that drop a previously-required field, or change a closed-set value, are also "### Breaking" — agents written against the old schema will error
  - Post-tag, smoke-test the upgrade against a snapshot of the prior tag's helmdeck.db (a manual cross-version run; the automated CI smoke is tracked at the Phase 7 audit issue list under "upgrade smoke-test in CI")
- Re-publish to the official MCP Registry — automated via .github/workflows/mcp-registry.yml. The workflow fires on every v* tag push: it pulls the tag's version into .mcp/server.json, schema-validates the document, authenticates to registry.modelcontextprotocol.io via GitHub OIDC (no PAT needed — the workflow's id-token: write permission is enough), and publishes. Watch the run; the workflow summary prints the live listing URL. Downstream aggregators (mcp.so, Glama, PulseMCP) ingest within 24h.

  If the workflow fails or you need to re-publish without cutting a new tag, two fallback paths:
  - workflow_dispatch — go to the Actions tab → "Publish to MCP Registry" → "Run workflow" with an optional version_override input
  - Local script — scripts/publish-to-mcp-registry.sh builds the publisher locally, runs interactive GitHub OAuth, and publishes from a maintainer shell. Useful if the GitHub Actions OIDC path breaks for any reason.
- Refresh the model tier table (ADR 051 PR #5) — have any new models shipped to OpenRouter or any of the configured providers since the last release? If yes:
  - Scan https://openrouter.ai/api/v1/models for additions in the provider families helmdeck already supports (anthropic/, openai/, google/, deepseek/, moonshotai/, tencent/, meta-llama/, mistralai/, etc.).
  - Calibrate the most relevant ones via scripts/calibrate-model.sh — run each twice and treat single-run results as suggestive only (free-route reliability is noisy).
  - Methodology and tier-selection rules are documented in docs/howto/calibrate-model-tiers.md.
  - Open a docs PR adding the new tier entries to internal/llmcontext/budgets.go with the source-of-classification trailing comment. The PR should be a 10-minute review.

  This step is intentionally manual — there's no helmdeck cron job watching provider catalogs. The tradeoff is: the maintainer who runs the release also notices when their fallback chain has new options worth investigating.
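The additive-only rule for migrations in the checklist above lends itself to a mechanical pre-tag check. The following is a minimal sketch, assuming migrations are plain SQL text; the helper name `find_destructive_statements`, the regex set, and the sample table and column names are illustrative and not part of helmdeck. It only looks for the two statement shapes the checklist names, and it does not understand SQL string literals.

```python
import re

# Statement shapes the checklist forbids in an additive migration.
# Hypothetical helper: helmdeck itself is not known to ship this check.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE),
    re.compile(r"\bALTER\s+TABLE\b[^;]*\bRENAME\b", re.IGNORECASE),
]


def strip_sql_comments(sql: str) -> str:
    """Remove /* */ block comments and -- line comments (not literal-aware)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def find_destructive_statements(sql: str) -> list[str]:
    """Return every ';'-separated statement that matches a forbidden shape."""
    hits = []
    for statement in strip_sql_comments(sql).split(";"):
        text = " ".join(statement.split())  # collapse whitespace
        if text and any(p.search(text) for p in DESTRUCTIVE_PATTERNS):
            hits.append(text)
    return hits


# Illustrative inputs; table and column names are made up.
additive = "ALTER TABLE example_calls ADD COLUMN job_id TEXT;"
destructive = """
-- rename is not additive
ALTER TABLE example_sessions RENAME COLUMN owner TO user_id;
ALTER TABLE example_sessions DROP COLUMN legacy_flag;
"""

print(find_destructive_statements(additive))
print(find_destructive_statements(destructive))
```

A check like this could run over each new file under internal/store/migrations/ and fail the release when the returned list is non-empty, leaving intentional destructive changes to go through the "### Breaking" path described above.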
Related:
- OpenClaw upgrade runbook — the operator-facing sync procedure
- ADR 025 — MCP client integrations — architecture decision record; the §2026-04-18 revision covers CLI vs chat-UI regression policy
- skills/helmdeck/SKILL.md — the canonical agent skill file (source of truth)
v0.1.0 — Core Infrastructure (Week 4)
Theme: "A browser session is one REST call away."
Milestone: v0.1 — Core Infrastructure (Phase 1) · Tasks: Phase 1
Ships
- Go control plane binary (Gin + chromedp + Docker SDK)
- Browser sidecar image with Chromium, Marp, Tesseract, ffmpeg, xdotool, Xvfb, XFCE4, noVNC
- Ephemeral session lifecycle: POST /api/v1/sessions … DELETE /api/v1/sessions/{id}
- CDP REST endpoints: navigate, extract, screenshot, execute, interact
- JWT bearer auth on every endpoint
- Audit log (write-only at this stage)
- Single-node Compose deployment (deploy/compose/compose.yaml)
- make smoke end-to-end harness in CI
Does NOT ship
- AI gateway, packs, MCP, vault, UI, Kubernetes — all later
Audience
Internal only. Tag a pre-release on GitHub.
v0.2.0 — AI Gateway & First Packs (Week 8)
Theme: "Capability Packs are real, and weak models can drive them."
Milestone: v0.2 — AI Gateway & Pack Substrate (Phase 2) · Tasks: Phase 2
Ships
- OpenAI-compatible /v1/chat/completions and /v1/models
- Provider adapters: Anthropic, Gemini, OpenAI, Ollama, DeepSeek
- Encrypted key store with rotation API
- Fallback routing rules (rate-limit / error / timeout triggers)
- Pack Execution Engine with input/output schema validation
- Typed error code enforcement (closed set per pack)
- Pack registry with versioned dispatch
- Three reference packs: browser.screenshot_url, web.scrape_spa, slides.render
- Object store integration + signed-URL artifacts
- A2A Agent Card at /.well-known/agent.json
Hard exit gate
≥90% success rate on browser.screenshot_url and web.scrape_spa against MiniMax-M2.7 and Llama 3.2 7B. This is the defining metric of the platform — without it nothing else matters.
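The gate is a per-(pack, model) success-rate threshold, so it can be evaluated from a flat list of run records. Below is a minimal sketch under stated assumptions: the record shape `(pack, model, ok)`, the helper names, and the sample data are hypothetical, and only the 90% threshold and the pack names come from the text above.

```python
from collections import defaultdict

GATE_THRESHOLD = 0.90  # from the exit gate: at least 90% success


def success_rates(runs):
    """Aggregate (pack, model, ok) records into a rate per (pack, model)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for pack, model, ok in runs:
        totals[(pack, model)][1] += 1
        if ok:
            totals[(pack, model)][0] += 1
    return {key: wins / attempts for key, (wins, attempts) in totals.items()}


def gate_passes(runs, required_pairs, threshold=GATE_THRESHOLD):
    """True only if every required pair has data and meets the threshold."""
    rates = success_rates(runs)
    # A pair with no recorded runs counts as 0.0, so missing data fails the gate.
    return all(rates.get(pair, 0.0) >= threshold for pair in required_pairs)


# Synthetic example: 18 of 20 runs succeed for one pair, 17 of 20 for another.
pair_a = ("browser.screenshot_url", "MiniMax-M2.7")
pair_b = ("web.scrape_spa", "MiniMax-M2.7")
runs = (
    [(*pair_a, True)] * 18 + [(*pair_a, False)] * 2
    + [(*pair_b, True)] * 17 + [(*pair_b, False)] * 3
)

print(success_rates(runs))
print(gate_passes(runs, [pair_a]), gate_passes(runs, [pair_a, pair_b]))
```

Treating a pair with no data as a failure is a design choice in this sketch: it prevents the gate from passing simply because a model was never exercised.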
Audience
Design partners. Public alpha tag.
v0.3.0 — Bridge & Client Integrations (Week 10)
Theme: "Register one MCP server, get every helmdeck pack."
Milestone: v0.3 — MCP Bridge & Client Integrations (Phase 3) · Tasks: Phase 3
Ships
- MCP registry with stdio/SSE/WebSocket transports
- Built-in MCP server auto-derived from the pack catalog
- helmdeck-mcp bridge binary distributed via:
  - Homebrew tap tosin2013/helmdeck
  - Scoop bucket tosin2013/helmdeck
  - npm @helmdeck/mcp-bridge (with npx postinstall)
  - OCI image ghcr.io/tosin2013/helmdeck-mcp
  - GitHub Releases (cosigned)
- CI smoke matrix verifying browser.screenshot_url from Claude Code, Claude Desktop, OpenClaw, Gemini CLI
Audience
Public beta. First "helmdeck works with my agent" demo video.
v0.4.0 — Desktop & Vision (Week 13)
Theme: "Beyond the DOM."
Milestone: v0.4 — Desktop & Vision (Phase 4) · Tasks: Phase 4
Ships
- Desktop Actions REST API (xdotool/scrot)
- desktop.run_app_and_screenshot, doc.ocr
- Vision-mode endpoint POST /api/v1/sessions/{id}/vision/act
- Reference vision packs: vision.click_anywhere, vision.extract_visible_text, vision.fill_form_by_label
- noVNC live viewer endpoint
Audience
Public beta continues.
v0.5.0 — Vault & Repo Packs (Week 16)
Theme: "Agents stop holding secrets."
Milestone: v0.5 — Vault, Repo Packs & Hardening (Phase 5) · also covers v0.5.5 — Code Edit Loop · Tasks: Phase 5, Phase 5.5
Ships
- AES-256-GCM Credential Vault with placeholder-token injection
- Vault types: login, session cookies, API keys, OAuth (with refresh), SSH/git
- CDP cookie injection at session start
- HTTP gateway intercept-and-substitute for outbound agent traffic
- repo.fetch and repo.push (closes the canonical 2026-04-06 git-SSH failure)
- web.login_and_fetch, web.fill_form, slides.video (vault-dependent packs)
- NetworkPolicy egress allowlist + metadata IP / RFC 1918 block
- Sandbox baseline: non-root, drop-all-caps, seccomp
- OpenTelemetry GenAI semantic conventions on every span
- Trivy CRITICAL gate in CI
Audience
Production design partners. Hardening RC.
v0.6.0 — Management UI (Week 20)
Theme: "Operators close the weak-model gap themselves."
Milestone: v0.6 — Management UI (Phase 6) · Tasks: Phase 6
Ships
- React/Tailwind/shadcn UI embedded in Go binary
- All read-only panels: Dashboard, Sessions, AI Providers, MCP Registry, Capability Packs, Security Policies, Credential Vault, Audit Logs, Connect Clients
- Model Success Rates section on the AI Providers panel (per-(provider, model) rollup over a configurable window, backed by the new provider_calls aggregation table written by every gateway dispatch)
- "Connect" panel emitting per-client MCP config snippets for Claude Code, Claude Desktop, OpenClaw, Gemini CLI, and Hermes Agent
Deferred from v0.6.0
- Pack Authoring (T608) — moved to v1.x (Phase 8) and clustered with T801 (WASM Executor). The pack registry is in-process today and has no publish surface; building one requires either landing a sandboxed code runtime first (WASM) or a composite-pack JSON runtime. Neither is on the v0.6.0 critical path. Operators observe and dispatch packs in v0.6.0; they author them in v1.x.
Audience
Public beta — full self-service for everything except authoring custom packs.
v0.8.0 — MCP Server Hosting & Pack Evolution (Phase 6.5) — ✅ Shipped 2026-04-12
Theme: "Host third-party agent infrastructure instead of rebuilding it."
Milestone: v0.8 — MCP Server Hosting & Pack Evolution (Phase 6.5) ✅ · Tasks: Phase 6.5 — see docs/TASKS.md
Ships (36 packs total at the v0.8.0 cutover)
- Playwright MCP bundled in the browser sidecar (T807a) — auto-attached to the running Chromium via CDP; one browser, one cookie jar, shared state with chromedp packs.
- Firecrawl as an optional compose overlay (T807b) — compose.firecrawl.yml; new web.scrape pack returns clean markdown.
- Docling as an optional compose overlay (T807c) — compose.docling.yml; new doc.parse pack supersedes doc.ocr for layout/tables.
- Native computer-use tool routing (T807f, supersedes T807d) — Anthropic / OpenAI / Gemini schemas wired through the gateway; eight new desktop REST primitives; vision.StepNative cross-provider executor; EventComputerUse audit + replay.
- web.test (T807e) — natural-language browser testing via Playwright MCP accessibility tree; egress-guarded mid-test navigations.
- research.deep (T622) — Firecrawl-backed research composite (search + per-source scrape + LLM synthesis with inline citations).
- repo.fetch context envelope + repo.map (T622a) — agents orient on the first turn without chaining fs.list / fs.read; ctags-derived structural symbol map under a token budget.
- content.ground (T623) — link grounding for blog posts; verbatim-substring patching skips hallucinated claims.
- slides.narrate (T406, moved from Phase 4) — narrated MP4 from Marp decks via ElevenLabs TTS + ffmpeg + LLM-generated YouTube metadata.
- Provider-adapter community contributions — Groq (PR #45) and Mistral (PR #47) adapters land alongside (T202a).
Hard exit gate (met)
scripts/validate-phase-6-5.sh passes against a fresh stack including the Firecrawl + Docling overlays; native computer-use round-trip works against at least one frontier provider; 36 packs total.
Audience
Public beta continues. Tag v0.8.0 (shipped 2026-04-12). Sets up Phase 7 (Kubernetes & GA) as the next gate.
v0.9.0 — Polish + plumbing (Phase 6.5+) — ✅ Shipped 2026-05-07
Theme: "Tighten what shipped before adding more."
Milestone: Continuation of v0.8 / Phase 6.5 — no new milestone created. Aggregates 70 commits of post-v0.8.0 hardening.
Ships
No new packs. No API changes. The 36-pack catalog from v0.8.0 stays the surface area. Operationally: a real install fix, public docs site at helmdeck.dev, two community-contributed AI provider adapters (Groq, Mistral), gitleaks secret scanning, the planning-doc cross-references that were documented-but-not-implemented at v0.8.0, and the priority-label taxonomy (priority/P0..P3) on every issue.
See the full per-section breakdown in CHANGELOG.md v0.9.0.
Audience
Existing v0.8.0 operators. A direct upgrade — git pull && make install picks up everything (the install fix is the highest-value change for fresh deploys; existing deploys can ignore it).
v0.10.0 — Content packs (Phase 6.5+) — ✅ Shipped 2026-05-09
Theme: "Two new packs (blog + podcast), the cost story, and an upgrade procedure."
Repurposed slot. The originally-planned v0.10.0 (Pack Authoring + Test Runner) didn't ship this cycle — blog.publish and podcast.generate were ready, plus the v0.9.0 → v0.10.0 doc work earned the version bump on its own. The Pack-Authoring + Test-Runner plan moves to v0.11.0 below.
Ships
- blog.publish (#68) — Ghost Admin API + artifact-store destinations × body/prompt modes × markdown/html formats. Vault credential ghost-admin-key. Closes the personal-content marketplace seed.
- podcast.generate — multi-speaker (1..N) MP3 from script / prompt+model / source_url-or-source_text. Five themed system prompts (interview, debate, news-roundup, deep-dive, solo-essay) bake in podcast best practices. Day 1 ships ElevenLabs behind a podcast.Engine interface in internal/podcast/ so future PRs add PlayHT / Hume.ai / Resemble.ai by adding one file. Vault credential elevenlabs-key (same as slides.narrate); silent-fallback when missing.
- 38 per-pack reference pages at helmdeck.dev/reference/packs — every shipped pack on the agent-first / developer-second template, with live OpenClaw chat-UI transcripts.
- OpenClaw transcript capture pipeline at scripts/oc-capture/ — capture-oc.sh, capture-batch.sh, extract-oc-transcript.py, inject-transcripts.py, plus prompt files for the three pack-doc clusters.
- Cost-positioning blog (/blog/cheap-models-do-frontier-work) + long-form why-helmdeck reference (/explanation/why-helmdeck) with five comparison tables and a reproduction recipe.
- Operator upgrade documentation at /howto/upgrade-helmdeck — pre-flight checklist, in-place Compose path, schema-migration handling, post-upgrade validation, rollback, Helm-path preview. Closes the upgrade-docs gap that was the maintainer's blocker for v1.0 prep.
- SKILLS.md "Freshness contract" + per-client "Load the agent skills" subsections for every integration doc.
- Per-release-checklist additions — step 6 (refresh README + cost numbers), step 7 (operator upgrade procedure smoke).
Fixed (highlights — full list in CHANGELOG.md)
- vision.click_anywhere mechanical loop bug (#102) — per-step screenshots now reflect post-action state. Caveat: model-side completion-detection limitation remains; tracked at #112. Treat both vision packs as experimental for production.
- repo.fetch empty-remote infinite hang (#94)
- fs.patch Anthropic-edit-shape rejection (#90)
- doc.parse formats: "markdown" rejection (#91)
- OpenClaw capture pipeline cross-prompt context bleed — fresh --session-id per call (#97)
Pre-Kubernetes audit issues filed (no v0.10.0 blockers)
- #108 — schema-migration cross-version test (P1, Phase 7)
- #109 — sidecar version pinning (P2, Phase 7)
- #110 — vault master-key rotation (P2, Phase 7)
- #111 — cross-version upgrade smoke in CI (P2, Phase 7)
- #112 — vision.click_anywhere model-side convergence research (P2)
Audience
Production design partners + community.
v0.10.2 — MCP Resources + registry description refinement — ✅ Shipped 2026-05-09
Theme: "Browse helmdeck state as MCP resources, not just tools."
Closes #44. Adds resources/list + resources/read so MCP clients can browse helmdeck://packs and helmdeck://sessions as read-only resources alongside the existing tools/* surface. Strictly additive — no breaking changes.
Ships
- MCP Resources spec implementation — resources/list returns helmdeck://packs (always) and helmdeck://sessions (when a session runtime is wired); resources/read serves both as JSON. The initialize capabilities advert now includes resources: {}.
- Refined registry description — "Self-hosted MCP server: sandboxed browser, desktop, vision, code-edit packs for any agent." (was a 38-pack feature list). Leads with the value prop + self-hosted differentiator.
- Registry submission script + workflow doc fixes — point at the search API URL instead of the broken /servers/<name> web URL (registry is API-only in preview).
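For client builders, the two new methods are ordinary JSON-RPC 2.0 requests as defined by the MCP specification. The sketch below only builds the request envelopes; the helper name `jsonrpc_request` is illustrative, transport (stdio, SSE, WebSocket) is left out, and the shape of helmdeck's response payloads is not assumed here.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discover what the server exposes, then read one resource by URI.
list_request = jsonrpc_request("resources/list")
read_request = jsonrpc_request("resources/read", {"uri": "helmdeck://packs"})

print(json.dumps(list_request))
print(json.dumps(read_request))
```

A client would typically send `resources/list` first, because helmdeck://sessions is only advertised when a session runtime is wired.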
Audience
Same as v0.10.1 — production design partners + community. MCP-client builders who want a browsable resource surface; everyone else can skip.
Out of scope (deferred follow-ups)
- JWT scope filtering on resources (full #44 acceptance criteria item)
- Per-MCP-client integration tests for resource discovery
v0.10.1 — MCP Registry namespace verification — ✅ Shipped 2026-05-09
Theme: "Make the published artifacts pass the MCP Registry's namespace-verification checks."
Functionally identical to v0.10.0. No pack/API/binary behavior changes. This release exists solely to add two pieces of metadata the official MCP Registry's validators need to confirm we own the io.github.tosin2013/helmdeck namespace. Existing v0.10.0 installs do not need to upgrade unless they specifically want the registry-listed install path.
Ships
- mcpName field on the npm package — @helmdeck/mcp-bridge@0.10.1's package.json now declares "mcpName": "io.github.tosin2013/helmdeck". The npm validator reads this to confirm the package belongs to the registered namespace.
- io.modelcontextprotocol.server.name label on the OCI image — ghcr.io/tosin2013/helmdeck-mcp:0.10.1 now carries the label. The OCI validator reads this to confirm namespace ownership.
- .github/workflows/mcp-registry.yml auto-publishes .mcp/server.json to registry.modelcontextprotocol.io on every v* tag push (also supports workflow_dispatch for ad-hoc runs). Authenticates via GitHub OIDC — no PAT required.
Live registry entry
io.github.tosin2013/helmdeck published to the official MCP Registry as of 2026-05-09T17:13Z, status active, both packages (npm + OCI) registered. Verify via the search API:
https://registry.modelcontextprotocol.io/v0/servers?search=io.github.tosin2013%2Fhelmdeck
Downstream aggregators (mcp.so, Glama, PulseMCP) ingest from the official registry on a 1–24h schedule and will appear automatically.
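The verification URL above is the search endpoint with the namespace percent-encoded (the `/` becomes `%2F`). A small sketch of building it programmatically, assuming only the endpoint shown above; the function name `registry_search_url` is illustrative, and the response format is deliberately not modeled.

```python
from urllib.parse import urlencode

REGISTRY_SEARCH = "https://registry.modelcontextprotocol.io/v0/servers"


def registry_search_url(server_name: str) -> str:
    """Return the search URL for a server name, percent-encoding the '/'."""
    return f"{REGISTRY_SEARCH}?{urlencode({'search': server_name})}"


print(registry_search_url("io.github.tosin2013/helmdeck"))
```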
Audience
Same as v0.10.0 — production design partners + community. Skip this release unless you need the registry-listed install path.
v0.11.0 — podcast/slides UX hardening + image generation — ✅ Shipped 2026-05-10
Theme: "The new content packs work — now their first-run UX matches."
Closes #136, #137, #138, #140, #141, #142, #143, #145, and ships the new image.generate pack (#71). Adds the helmdeck://voices MCP resource. Closes #139 + #144 as duplicates. Defers #146 (chained image-gen integrations) to a follow-up release.
A coherent feature release driven by 9 issues filed during a v0.10.2 OpenClaw integration: silent MP3s when the credential name was wrong, hardcoded /root/openclaw paths, blocking Go preflight on the docker-only path, no voice discovery, no cost preview. The vault env-hydrate fix (#142) is the load-bearing piece — it root-causes the silent-fallback class of bug, not just the ElevenLabs instance.
Ships
- image.generate pack (#71) — text → image via fal.ai's synchronous fal.run endpoint. Default model fal-ai/flux/schnell (~$0.003/image, 1-3s). 1-4 images per call. The engine input field is reserved for a follow-up community PR to add Replicate. Vault credential fal-key (with HELMDECK_FAL_KEY env-var fallback, auto-hydrated via #142).
- Vault env-hydrate (#142) — WellKnownEnvCredentials registry auto-imports HELMDECK_*_API_KEY env vars into the vault under their canonical names at startup. New vault.Store.UpsertByName. Wildcard ACL granted on first create; user-managed entries never clobbered. One INFO log per hydration (vault env hydrate ok name=elevenlabs-key).
- podcast.generate + slides.narrate require narration by default (#138) — before this change, missing the ElevenLabs credential silently produced a silence-padded artifact. Now both packs hard-fail with missing_credential + an actionable message. Pass allow_silent_output: true to opt back into the silent path. Shared 4-step credential resolver: explicit input → vault elevenlabs-key → vault elevenlabs-api-key (back-compat alias) → os.Getenv("HELMDECK_ELEVENLABS_API_KEY").
- helmdeck://voices MCP resource (#143) — exposes the operator's ElevenLabs voice catalog with 1h cache keyed on credential fingerprint. New internal/voices/ package with ListVoices(ctx, apiKey) → []Voice.
- min_turn_duration_s per-turn floor (#141) — both packs gain the input (default 5s); short TTS turns get padded with anullsrc so output respects the floor. 0 opts out.
- dry_run + cost preview (#145) — both packs gain dry_run: bool; short-circuits before TTS, returns tts_chars + estimated_cost_usd. Cost block also included in regular responses. Plan rate table covers Free/Starter/Creator/Pro/Scale; override via HELMDECK_ELEVENLABS_RATE_PER_CHAR_USD.
- slides.narrate ffmpeg failure surfaces full stderr (#140) — inline cap raised 512 → 4096 bytes; full stderr persisted to artifact store as ffmpeg-stderr-segment-NNN.txt.
- scripts/install.sh --no-build fix (#136) — Go preflight skipped when --no-build is set; unblocks the docker-only path on hosts with apt-default Go 1.22.
- scripts/configure-openclaw.sh paths + auth (#137) — new OPENCLAW_COMPOSE_FILE env override; OPENCLAW_LOAD_SHELL_ENV=true recognized so the auth-list probe doesn't false-positive.
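The dry_run cost preview listed above amounts to a character count multiplied by a per-character rate. A minimal sketch of that arithmetic, with loud assumptions: the default rate below is a placeholder rather than a value from helmdeck's plan table, counting characters with `len()` is a guess at how tts_chars is derived, and the helper names are hypothetical. Only the two output field names and the override variable come from the release notes.

```python
import os

# Placeholder rate for illustration only; the real per-plan table lives in helmdeck.
ASSUMED_RATE_PER_CHAR_USD = 0.0001


def rate_per_char(env=None):
    """Use the documented env override when set, else the placeholder rate."""
    env = os.environ if env is None else env
    raw = env.get("HELMDECK_ELEVENLABS_RATE_PER_CHAR_USD")
    return float(raw) if raw else ASSUMED_RATE_PER_CHAR_USD


def cost_preview(turns, env=None):
    """Estimate TTS cost for a list of script turns without calling any API."""
    tts_chars = sum(len(turn) for turn in turns)
    return {
        "tts_chars": tts_chars,
        "estimated_cost_usd": round(tts_chars * rate_per_char(env), 6),
    }


turns = ["Hello there.", "Hi!"]
print(cost_preview(turns, env={}))
print(cost_preview(turns, env={"HELMDECK_ELEVENLABS_RATE_PER_CHAR_USD": "0.001"}))
```

Passing `env` explicitly keeps the sketch deterministic; a real caller would let it default to the process environment.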
Audience
Operators integrating helmdeck with OpenClaw or running the content packs (podcast.generate, slides.narrate); anyone wanting image.generate for podcast covers / blog hero images. The credential fail-loud change (#138) is a behavior break — silent-fallback callers must add allow_silent_output: true to keep working. Strictly additive otherwise.
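The fail-loud behavior and its opt-out follow from the 4-step resolver order in #138. The sketch below mirrors that order in Python for illustration; helmdeck's resolver is Go, and the dict-based vault, the function names, and the returned plan shape are all assumptions made here. The credential names, the env variable, the missing_credential code, and the allow_silent_output flag come from the release notes.

```python
class MissingCredential(Exception):
    """Raised when no credential is found and silent output is not allowed."""
    code = "missing_credential"


def resolve_elevenlabs_key(explicit, vault, env):
    """Return the first non-empty credential in the documented order, or None."""
    candidates = (
        explicit,                                # 1. explicit pack input
        vault.get("elevenlabs-key"),             # 2. canonical vault name
        vault.get("elevenlabs-api-key"),         # 3. back-compat alias
        env.get("HELMDECK_ELEVENLABS_API_KEY"),  # 4. environment variable
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def plan_narration(explicit, vault, env, allow_silent_output=False):
    """Decide between TTS, the opt-in silent path, or a hard failure."""
    key = resolve_elevenlabs_key(explicit, vault, env)
    if key:
        return {"mode": "tts", "api_key": key}
    if allow_silent_output:
        return {"mode": "silent"}
    raise MissingCredential(
        "no ElevenLabs credential found; store one as elevenlabs-key "
        "or pass allow_silent_output: true"
    )


print(plan_narration(None, {"elevenlabs-api-key": "legacy"}, {}))
print(plan_narration(None, {}, {}, allow_silent_output=True))
```

Callers that relied on the old silent fallback would take the second path only after adding the flag explicitly; without it, the same input raises.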
Out of scope (deferred follow-ups)
- #146 — chain image.generate into podcast.generate.cover_image / slides.narrate.shield_image + slide_images / blog.publish.hero_image. The pack lands in this release; the integration layer on top of it lands later.
- Voice-id pre-validation in podcast.generate / slides.narrate — currently agents discover voices via helmdeck://voices and pass the IDs verbatim; future work could pre-validate at handler entry and return invalid_voice synchronously.
- speakers: {"alice":"auto"} auto-pick mode for podcast.generate — pick distinct voices automatically with seed for reproducibility.
- Replicate engine for image.generate — flagged as a community-friendly follow-up; the engine input field is in the schema from day 1 so adding it is a new switch arm rather than a schema break.
MCP Registry
The auto-publish workflow (.github/workflows/mcp-registry.yml) republishes the listing on v* tag push. After tagging, verify at https://registry.modelcontextprotocol.io/v0/servers?search=io.github.tosin2013%2Fhelmdeck (expect version: 0.11.0, isLatest: true).
v0.12.1 — Release-image hot-patch + v0.12.0 reliability bugs — ✅ Shipped 2026-05-13
Theme: "Same-day hot-patch for what v0.12.0 missed."
Four bugs landed within hours of v0.12.0 shipping. The dominant one (#180) is a release-image regression — every fresh docker pull user saw a blank UI. The other three are smaller reliability fixes around the firecrawl overlay and the content.ground pack. All four landed as separate small PRs (#186–189) so each is independently revertible if a regression surfaces. Bundled into v0.12.1 the same day. Planning artefact: /root/.claude/plans/i-would-like-to-elegant-kahan.md.
Shipped
- #180 — release workflow now runs npm run build before docker image build. The dominant fix. web/dist/assets/ is gitignored; the CI workflow was building the docker image without ever bundling the Vite output, so the image baked in whatever stale web/dist/index.html was last committed (referencing asset hashes not present in web/dist/assets/). Added Node setup + cd web && npm ci && npm run build step to .github/workflows/release.yml, plus a verify step that fails the release loud if the rebuilt index.html references missing assets — defense in depth so this regression can't ship twice.
- #181 — firecrawl-rabbitmq healthcheck start_period: 15s → 60s. RabbitMQ's Erlang VM + mnesia init takes 30-60s on alpine cold-boot. The shorter window exhausted retries before health was achievable → container reported unhealthy → helmdeck-firecrawl (correctly waiting via depends_on: condition: service_healthy) never started → operator had to docker compose up again. Aligned with firecrawl-searxng's 60s precedent in the same file.
- #179 — content.ground configurable completion-token cap. Hard-coded 1024 was too tight for the structured claim-plan JSON (system prompt + topic + 5-8 claim entries ≈ 750 tokens, leaving ~270 of headroom; weak models or large posts blew through it → CodeHandlerFailed: claim extractor returned unparseable JSON). Default bumped 1024 → 2048; new optional max_completion_tokens input lets operators raise the cap up to 8192. Over-cap rejects with CodeInvalidInput.
- #182 — content.ground fails loud when Firecrawl is unreachable. The per-claim grounding loop was swallowing callFirecrawlSearch transport errors silently, producing empty-success "no sources found" output. Now tracks firecrawlCalls vs firecrawlErrors separately; when 100% of attempted calls hit transport errors → CodeHandlerFailed with a Firecrawl-reachability message. Partial-success runs preserved.
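The two content.ground fixes (#179 and #182) reduce to a bounded input check and a calls-versus-errors comparison. The following sketch restates both rules in Python for illustration; the real pack is Go, the `PackError` class and function names are hypothetical, and the error codes are written as the constant names used in the notes above rather than their wire values.

```python
DEFAULT_MAX_COMPLETION_TOKENS = 2048  # default after #179
MAX_COMPLETION_TOKENS_CAP = 8192      # upper bound operators may request


class PackError(Exception):
    """Illustrative stand-in for a typed pack error."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def resolve_token_cap(requested=None):
    """Apply the default, and reject requests above the documented cap."""
    if requested is None:
        return DEFAULT_MAX_COMPLETION_TOKENS
    if requested > MAX_COMPLETION_TOKENS_CAP:
        raise PackError(
            "CodeInvalidInput",
            f"max_completion_tokens {requested} exceeds {MAX_COMPLETION_TOKENS_CAP}",
        )
    return requested


def check_firecrawl_outcome(calls, errors):
    """Fail loud only when every attempted call hit a transport error."""
    if calls > 0 and errors == calls:
        raise PackError(
            "CodeHandlerFailed",
            f"Firecrawl unreachable: all {calls} search calls failed",
        )
    # Partial success (some errors, some results) is preserved.
    return {"firecrawl_calls": calls, "firecrawl_errors": errors}


print(resolve_token_cap(), resolve_token_cap(4096))
print(check_firecrawl_outcome(calls=6, errors=2))
```

Keeping the two counters separate is what distinguishes "Firecrawl answered and found nothing" from "Firecrawl never answered", which the earlier single empty-result path could not do.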
Hard exit gates (all met)
- ✅ go test ./internal/packs/builtin/... -run ContentGround green (5 new tests)
- ✅ make smoke regression-protect v0.12.0
- ✅ docker pull ghcr.io/tosin2013/helmdeck:0.12.1 shows assets matching index.html
- ✅ MCP-Registry chained publish (PR #177 workflow_run trigger) — validated this release
- ✅ npm @helmdeck/mcp-bridge@0.12.1 published with provenance
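The "assets matching index.html" gate can be checked by scanning the built index.html for asset references and confirming each file exists. This is a sketch of that idea, not the verify step from release.yml: the regex, the function name `missing_assets`, and the sample file names are assumptions, and it only recognizes `src`/`href` attributes pointing under `assets/`.

```python
import re
import tempfile
from pathlib import Path

# Matches src/href values such as "/assets/x.js", "./assets/x.js", "assets/x.js".
ASSET_REF = re.compile(r"""(?:src|href)=["'](?:\./|/)?(assets/[^"']+)["']""")


def missing_assets(dist_dir):
    """Return referenced asset paths that do not exist under dist_dir."""
    dist = Path(dist_dir)
    html = (dist / "index.html").read_text(encoding="utf-8")
    refs = set(ASSET_REF.findall(html))
    return sorted(ref for ref in refs if not (dist / ref).is_file())


# Self-contained demo: one referenced asset exists, one is missing.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "assets").mkdir()
    (dist / "assets" / "index-abc123.js").write_text("// bundle")
    (dist / "index.html").write_text(
        '<script type="module" src="/assets/index-abc123.js"></script>\n'
        '<link rel="stylesheet" href="/assets/index-def456.css">\n'
    )
    missing = missing_assets(dist)

print(missing)
```

A release job could exit non-zero whenever the returned list is non-empty, which is the failure mode #180 describes: a committed index.html pointing at hashes that were never built.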
Not in v0.12.1 (deferred)
- #183 audit-table columns (job_id, finish_reason, raw_content_len) — migration + write-path changes; v0.13.0.
- #173 / #174 — community good-first-issues, kept open for external contributors.
Concurrent docs/SEO change
- #184 — SKILL.md catalog refresh. Pack count 36 → 39 (added blog.publish, podcast.generate, image.generate, which had never been documented in SKILL.md). helmdeckVersion frontmatter from 24bd0c3 → v0.12.0. Shipped as PR #190, not part of the v0.12.1 patch bundle.
- SEO sitemap trim. Dropped /blog/tags/*, /blog/archive, /blog/authors from the Docusaurus sitemap (137 URLs → 122) after Google Search Console reported 61 URLs in "Discovered – currently not indexed" with crawl timestamp 1969-12-31. Shipped as PR #185.
v0.12.0 — Content-pack image chaining + v1.0 install-path unblocker + pack-authoring MVP — ✅ Shipped 2026-05-12
Theme: "Covers come for free, the install path becomes Kubernetes-ready, and pack-authoring grows up."
Bundled release across four threads that lined up after v0.11.0. Originally framed as Pack Authoring + Test Runner alone; re-scoped during the v0.11.0 retrospective to absorb #146 (unblocked by v0.11.0's #71), #158 (sibling), and #134 step 1 (v1.0 prerequisite). Planning artefact: /root/.claude/plans/i-would-like-to-elegant-kahan.md.
Shipped
- #146 — chain image.generate into the three content packs. podcast.generate gains cover_image: bool → emits cover_image_artifact_key. slides.render and slides.narrate gain hero_image_prompt: string → injects inline base64 PNG (before slide 1 for render; INTO slide 1 for narrate to preserve the per-slide TTS pipeline). blog.publish gains feature_image_artifact_key + hero_image: bool — for Ghost, uploads via /ghost/api/admin/images/upload/ first then stamps the URL into feature_image; for artifact-mode, writes a sidecar <slug>-cover.png. All four packs share one RunImageGen entrypoint (extracted from internal/packs/builtin/image_generate.go in PR #165's first commit) so chains don't pay for a registry round-trip.
- #158 — helmdeck://image-models MCP resource. Mirrors helmdeck://voices (v0.11.0). 7-model curated catalog: flux/schnell, flux/dev, flux-pro/v1.1, fast-sdxl, flux-realism, recraft-v3, ideogram/v2. Each entry has cost, p50 latency, seed/image-size support, max resolution, capability tags. New internal/imagemodels package. Also lands the long-overdue fal-key entry in WellKnownEnvCredentials — closes the consistency gap image_generate.go:74 advertised since v0.11.0.
- #134 step 1 — unified install paths (P1 v1.0-rc1 unblocker). deploy/compose/compose.yaml strips build: blocks, pins versioned tags. New deploy/compose/compose.build.yaml overlay re-adds them for source-build. scripts/install.sh --image-mode flag pulls pre-built images, skips Go/Node/make preflight. Hosts with only Docker + openssl + curl can install the full stack. The Helm chart (v1.0-rc1) will reuse the same versioned-tag convention.
- T606a MVP — Pack Test Runner UI. Click a pack row in /packs → modal with JSON textarea + Submit. POSTs to /api/v1/packs/{name}, renders response (duration, cost hint, full JSON). Closes the "no UI today" gap. Schema-derived form ships v0.13.0.
- T811 MVP — subprocess pack type. packs.NewCommandPack(...) constructor + LoadCommandPacks dir-scanner + HELMDECK_COMMAND_PACKS_DIR wire-up. Pack authors can ship in any language (Python, Node, Bash, Rust) without a Go toolchain. Protocol: stdin = JSON input; stdout = JSON output; exit ≠0 → handler_failed with truncated stderr.
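The T811 protocol (JSON on stdin, JSON on stdout, non-zero exit mapped to handler_failed) is small enough to exercise end to end. The sketch below is illustrative only: the runner `run_command_pack`, the inline example pack, the result dict shape, and the 4096-byte stderr limit are assumptions made here, not helmdeck's Go implementation or its manifest format.

```python
import json
import subprocess
import sys

# A tiny example pack, kept inline so the sketch is self-contained.
# It reads JSON from stdin, writes JSON to stdout, and exits non-zero on bad input.
EXAMPLE_PACK = r'''
import json, sys
payload = json.load(sys.stdin)
if "text" not in payload:
    print("missing required field: text", file=sys.stderr)
    sys.exit(2)
json.dump({"upper": payload["text"].upper()}, sys.stdout)
'''

STDERR_LIMIT = 4096  # assumed truncation limit, not taken from helmdeck


def run_command_pack(argv, pack_input, timeout_s=30):
    """Run a subprocess pack: stdin = JSON input, stdout = JSON output."""
    proc = subprocess.run(
        argv,
        input=json.dumps(pack_input),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        # Non-zero exit maps to handler_failed with truncated stderr.
        return {
            "ok": False,
            "error": "handler_failed",
            "message": proc.stderr[:STDERR_LIMIT].strip(),
        }
    # Invalid JSON on stdout would raise here; how helmdeck reports that
    # case is not stated in the release notes, so it is left unhandled.
    return {"ok": True, "output": json.loads(proc.stdout)}


argv = [sys.executable, "-c", EXAMPLE_PACK]
ok_result = run_command_pack(argv, {"text": "hello"})
bad_result = run_command_pack(argv, {})

print(ok_result)
print(bad_result)
```

Because the contract is only streams and an exit code, the same runner would work unchanged for a Bash, Node, or Rust pack by swapping `argv`.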
Hard exit gates (all met)
- Image-mode install works on a clean VM with no Go toolchain. Verified locally; CI smoke leg (compose-lint job) validates both compose layouts on every PR.
- All four content-pack chains produce valid output end-to-end. ~20 new unit tests cover each chain with stubbed fal.ai/ElevenLabs/Ghost.
- helmdeck://image-models lists 7 models. Verified in internal/mcp/resources_test.go.
- T606a UI can run image.generate end-to-end. Manual click-through plus full TypeScript strict-mode build green.
- T811 example pack round-trips through subprocess with audit-log parity to a Go pack. 17 new tests via the self-exec pattern.
Slipped to v0.13.0
- T606a schema-derived form — JSON Schema → React form (replaces the v0.12.0 MVP textarea)
- T811 manifest format — typed schemas via YAML sidecar (#173)
- T811 egress sandbox — confine subprocess pack network access (#174)
- Marketplace UI / install CLI — bundled with v0.13.0's T810
Slipped to v1.0-rc1
- #134 step 2 — the Helm chart itself
- arm64 sidecar image (still blocked on Marp upstream)
Audience
Operators wanting Kubernetes prep; community contributors who want to write packs without Go; existing users who want covers/heroes for free.
MCP Registry
The auto-publish workflow republishes the listing on v* tag push. Watch for the npm-publish race condition documented in release.yml:118-157 — workflow_dispatch the mcp-registry.yml after npm publish completes if the first run fails with "package not found."
v0.13.0 — Marketplace beta — ✅ Shipped 2026-05-15
Theme: "Discover and install community packs."
Shipped
Marketplace track (the headline):
- T810 catalog endpoint (#219) — GET /api/v1/marketplace/catalog + POST /api/v1/marketplace/refresh. Fetches index.yaml from HELMDECK_MARKETPLACE_URL (default https://github.com/tosin2013/helmdeck-marketplace) at boot; failed refresh preserves the previously-cached snapshot. Three URL shapes supported: github.com/<owner>/<repo>, direct raw URLs, file:/// for air-gapped operators. HELMDECK_MARKETPLACE_DISABLE=1 opts out.
- T812 install/uninstall REST (#220) — POST /api/v1/marketplace/{install,uninstall} + GET /api/v1/marketplace/installed. Hot-load: git clone --depth=1 --filter=blob:none the marketplace repo, copy packs/<name>/ to HELMDECK_PACKS_DIR, register with the live packs.Registry — pack appears in tools/list immediately. command-handler packs only in beta; builtin/composite/wasm reject. Lands ADR 038 — marketplace packs route through a dedicated helmdeck-sidecar-marketplace image (bash + jq + curl + python3 + Node 20) rather than the distroless control plane.
- T813 /marketplace UI panel (#221) — React panel with browse-by-category chips, free-text search, pack-detail dialog with schema preview + worked examples + trust badge, install/uninstall buttons with automatic tools/list cache invalidation, unsigned-pack confirmation per ADR 034. New GET /api/v1/marketplace/packs/{name} returns catalog entry + full helmdeck-pack.yaml manifest on demand (catalog endpoint deliberately doesn't pre-load every manifest).
- Marketplace trust verification stage A (#222) — replaces PR #220's structured stub with real deterministic SHA256 content-hash verification. Excludes helmdeck-pack.yaml from the hash (chicken-and-egg). Hard-rejects install on mismatch (removes materialized files). Stage B (full sigstore keyless cosign-verify) deferred to v1.0 hardening.
- helmdeck CLI binary (#223) — operator-facing CLI wrapping the marketplace endpoints: pack list, pack marketplace [--refresh], pack install <name>, pack uninstall <name>, pack installed. Same env-var conventions as helmdeck-mcp (HELMDECK_URL + HELMDECK_TOKEN). --json for shell pipelines. Ships via goreleaser alongside control-plane + helmdeck-mcp. See docs/howto/use-the-helmdeck-cli.md.
- T814 community marketplace repo — tosin2013/helmdeck-marketplace seeded with three packs (cmd.upper, ai.review, gif.make) + maintainer-run scripts/populate-trust-hashes.mjs + CI validate.yml + sign.yml with --check gate.
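The stage A trust check can be pictured with a short sketch. Only three things come from the notes above: the hash is SHA256, it is deterministic, and helmdeck-pack.yaml is excluded because the manifest is where the expected hash lives. Everything else here (the sorted walk, the length-prefixed framing, and the names `content_hash` and `verify_pack`) is an assumption for illustration, not helmdeck's actual algorithm.

```python
import hashlib
import hmac
from pathlib import Path

MANIFEST_NAME = "helmdeck-pack.yaml"  # excluded: it carries the expected hash


def content_hash(pack_dir: Path) -> str:
    """Hash every file under pack_dir except the top-level manifest, in a stable order."""
    entries = []
    for path in pack_dir.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(pack_dir).as_posix()
        if rel == MANIFEST_NAME:
            continue
        entries.append((rel, path))

    digest = hashlib.sha256()
    for rel, path in sorted(entries):
        name = rel.encode("utf-8")
        body = path.read_bytes()
        # Length prefixes keep (name, body) boundaries unambiguous.
        digest.update(len(name).to_bytes(8, "big"))
        digest.update(name)
        digest.update(len(body).to_bytes(8, "big"))
        digest.update(body)
    return digest.hexdigest()


def verify_pack(pack_dir: Path, expected_hex: str) -> bool:
    """Return True only when the materialized files match the expected hash."""
    return hmac.compare_digest(content_hash(pack_dir), expected_hex)
```

An installer following the behaviour described above would delete the materialized pack directory whenever `verify_pack` returns False.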
New built-in packs:
- hyperframes.render (#200) — HTML/CSS/JS composition → deterministic MP4 via Chromium BeginFrame + ffmpeg using upstream hyperframes CLI in the new helmdeck-sidecar-hyperframes image. Composable sizing: resolution (1080p/4k) × aspect_ratio (16:9/9:16/1:1) resolves to one of six upstream presets. Mode-free audio: silent compositions produce silent MP4s; <audio src> produces narrated MP4s — chain podcast.generate → hyperframes.render by embedding the podcast's presigned URL. Short-form only (≤12 min, 512 MiB cap). Pack count 39 → 40.
- stock.search (#218) — Pexels-backed stock photo search; downloads top 1-4 results into the artifact store with per-photo attribution metadata. Same chained-input contract as image.generate — drops straight into slides.render / slides.narrate / blog.publish / podcast.generate / hyperframes.render. Engine-pluggable; unsplash / pixabay reserved for community PRs. Pack count 40 → 41.
Quality + diagnostics:
- slides.render contrast guardrails (#216, closes #202) — three-pronged fix: docs + agent skill teaching WCAG-AA 4.5:1; static contrast lint surfacing section-background-without-nested-overrides + wcag-aa-text-contrast warnings in the response; two curated embedded Marp themes (helmdeck-dark, helmdeck-corporate) declaring WCAG-AA colors for every nested element.
- provider_calls diagnostic columns (#183) — job_id (joins gateway audit to the pack-job that triggered the call), finish_reason, raw_content_len. Migration 0005_provider_calls_diagnostics.sql via ALTER TABLE ADD COLUMN (O(1) metadata-only).
- Subprocess pack manifest format (#173) — operator-supplied command packs declare typed I/O schemas + execution overrides via a sibling <basename>.helmdeck-pack.yaml. Completes the v0.12.0 MVP. New how-to: docs/howto/build-subprocess-pack.md.
- blog.publish artifact-first refactor (#203) — destination is now optional, defaults to "artifact". Ghost-targeted calls also save the body as an artifact by default (also_save_artifact: false to opt out). Ghost failures return a partial-success response (status: "artifact_saved_ghost_failed" + ghost_error + artifact_key) instead of losing the expensive prompt-expanded body.
Architecture decisions captured
- ADR 034 — Pack marketplace — catalog + manifest + trust model + handler types. Written ahead of T810/T812/T813 implementation.
- ADR 037 — Upstream package version management — exact pins + CLI-surface sentinel + Dependabot. Surfaced by the hyperframes-npm-pin incident; now a project-wide discipline.
- ADR 038 — Marketplace pack execution via sidecar — control plane is distroless-static; marketplace packs need bash/jq/python/node; therefore packs route through helmdeck-sidecar-marketplace via ec.Exec rather than in-process exec.CommandContext.
Slipped to v1.x
- Stage B trust verification — full sigstore keyless cosign-verify of the signer identity. Catches a malicious author modifying the manifest, which stage A doesn't.
- hyperframes.render long-form (#201) — multi-GB MP4 streaming via ArtifactStore.PutStream. Defers to the v1.x artifact-streaming track.
- T606a schema-derived test-runner form — JSON Schema → React form rendering. The v0.12.0 MVP textarea ships in v0.13.0 unchanged; schema-derived form lands later.
- Multi-arch helmdeck-sidecar-marketplace — amd64 only at v0.13.0; multi-arch follows the base sidecar's track.
Audience
Operators looking for "an existing pack for X" before writing one. Designed to land before K8s so community surface area precedes enterprise surface area.
MCP Registry
The auto-publish workflow republishes the listing on v* tag push. After tagging, verify at https://registry.modelcontextprotocol.io/v0/servers?search=io.github.tosin2013%2Fhelmdeck (expect version: 0.13.0, isLatest: true).
v0.13.1 — Post-v0.13.0 cleanup — ✅ Shipped 2026-05-18
Theme: Bug-cleanup release. No feature changes.
Ships:
- #229 — deploy/compose/.env.example missing HELMDECK_FAL_KEY and HELMDECK_PEXELS_API_KEY
- #230 — pexels-key vault auto-hydration missing (CHANGELOG advertises it, internal/vault/hydrate.go doesn't register it)
- #231 — compose.firecrawl.yml healthcheck uses wget against an image with neither wget nor curl
- #232 — repo.fetch's clone_path invisible to subsequent fs.* / cmd.run / repo.map calls
Out:
- Anything feature-level. Patch-release discipline — same shape as v0.12.1.
Discipline call: #231 is the first to defer if v0.13.1 needs to ship faster than expected — it affects health UI only, not request serving.
v0.13.2 — Hot-patch for v0.13.1 missing control-plane image — ✅ Shipped 2026-05-23
Theme: v0.13.1 shipped without ghcr.io/tosin2013/helmdeck:0.13.1 because the Publish control-plane image job in the Release workflow failed at cd web && npm run build. Dependabot #247 had landed three breaking majors (Vite 6 → 8, TypeScript 5 → 6, lucide-react 0 → 1) between the release branch cut and the tag push, and the CI workflow never builds web/ — only Release does. Goreleaser binaries, the bridge image, and @helmdeck/mcp-bridge@0.13.1 on npm shipped fine; this release closes the asymmetry.
Ships:
- #250 — Vite 8 / TS 6 / lucide-react 1 web build unblock. manualChunks (Rollup) → codeSplitting.groups (Rolldown); baseUrl dropped, new web/src/vite-env.d.ts; Github icon → GitBranch.
Out:
- Anything else. Strict hot-patch discipline — same shape as v0.12.1's release-image regression patch.
Follow-ups discovered (not in v0.13.2, will file separately):
- CI gap: the CI workflow doesn't build web/. Only the Release workflow does, so any breaking change to the web toolchain ships silently until the next tag. Fix: gate every PR to main on a web build step.
- Dependabot ergonomics: #247 grouped 14 deps including three majors and auto-merged. Three majors should not land in one group — re-bucket web-npm so majors land one-at-a-time.
v0.14.0 — Autonomous code-fix + ADR 037 fully enforced — ✅ Shipped 2026-05-26
Theme: swe.solve headline + close out ADR 037 across every sidecar Dockerfile.
Ships:
- #233 — swe.solve epic: Phase 1 (HelmdeckEnvironment adapter, ✅ shipped #265) + Phase 3 (swe.solve Go pack handler, ✅ shipped #271) + Phase 4 (trajectory artifact in Garage S3, with Phase 3) + Phase 6 (GitHub-issue auto-trigger via ADR 033 — label an issue, get a PR; posts the result back as a comment)
- #253 — post-install/upgrade integration smoke check via OpenClaw round-trip (✅ shipped #263)
- #212–#215 — ADR 037 fully enforced: dependabot, exact pins, CLI-surface sentinels, docs (✅ shipped #240–#243)
- #248 — ADR 037 follow-up cleanups: drop marp --stdin, fix --html format spec, pinned global playwright-mcp bin in the sidecar entrypoint (✅ shipped #264)
- ADR 039 — Universal Memory delivery layer (refines ADR 029): first implementation shipped — pluggable MemoryStore (SQLite default, AES-256-GCM at rest), the ec.Memory engine seam + namespace model, Context() aggregation (#260), and the github.list_issues read-through cache exemplar (#258). Default-OFF and additive: packs without opt-in and deployments without HELMDECK_MEMORY_KEY behave exactly as before. Tracked in epic #254 (#255/#256/#257/#258/#260).
- #259 / ADR 040 — Persistent repos volume + cross-session clone reuse: repo.fetch (and swe.solve) clone into a per-caller path on a shared helmdeck-repos volume and git fetch instead of re-cloning on a repeat, with a persistent per-language dependency cache (.hdcache) and a GC janitor (TTL + size cap). Unblocked by #232. Default-OFF (no volume ⇒ ephemeral /tmp clones); enabled by default in the bundled Compose via HELMDECK_PERSISTENT_REPOS.
Out:
- Universal memory deferred tiers: Redis-backed Episodic and the pgvector/Semantic tier remain out per ADR 039 (the pluggable MemoryStore interface keeps the door open). The community validation middleware (#268) is the next seam consumer.
- swe.solve remaining phases: Phase 5 (OTel spans per agent step), Phase 7 (A2A skill exposure via ADR 026), Phase 8 (procedural-memory pack promotion via ADR 029). Phases 7–8 lean on ADRs currently Status: Proposed — premature to commit. (Phase 4 trajectory storage and Phase 6 GitHub-issue auto-trigger landed in this release — ADR 033 was already Accepted.)
Status: the ADR 037 quad (#212–#215) shipped together as planned, with the #248 cleanups completing the enforcement. swe.solve Phases 1, 3, 4, and 6 are in (adapter, pack, trajectory artifact, GitHub-issue auto-trigger), alongside the universal-memory layer (ADR 039) and persistent repos (ADR 040). #232 is resolved.
Was blocked by: #232 (now resolved) — Phase 3 of swe.solve required repo.fetch → fs.* working in a session.
v0.15.0 — Pipelines as a first-class resource — ✅ Shipped 2026-05-26
Theme: A pipeline — a stored, named, ordered sequence of pack steps — becomes a first-class resource any actor can create, run, and inspect. helmdeck stops being only a tool server and starts owning the workflow.
Ships:
- ADR 041 — Pipelines as a first-class resource (runnable slice): a new internal/pipelines package (SQLite-persisted definitions + run history, a sequential runner reusing Engine.Execute, ${{ steps.X.output.field }} / ${{ inputs.* }} dot-notation templating, automatic _session_id threading), REST CRUD + async run + run-history at /api/v1/pipelines, and helmdeck__pipeline-{list,get,create,run,run-status} MCP tools so any connected agent (OpenClaw, Gemini CLI, Claude Code) can build and run pipelines conversationally.
- ~13 built-in starter pipelines auto-seeded at startup and runnable out of the box — including content.ground → slides.render (grounded deck), content.ground → blog.publish (grounded blog), research.deep → {slides,podcast,blog}, web.scrape → content.ground → blog.publish, and repo.fetch → {slides.narrate, podcast.generate} (clone a repo → media about it). Provider-dependent starters degrade gracefully (stable premade voice + allow_silent_output); a starter whose packs aren't registered is skip-and-logged.
- podcast.generate now surfaces a presigned audio_url in its output (from the artifact store), unlocking a clean podcast.generate → hyperframes.render narrated-video chain (embed the URL in the composition's <audio src>).
- Migration 0007_pipelines.sql (additive: pipelines + pipeline_runs tables, auto-applied).
- Management UI /pipelines panel — pulled forward from v1.2: list built-in + agent-created pipelines, trigger a run with JSON inputs, and watch run status/history poll live (pending → running → succeeded/failed, with per-step status) — operators see what agents build via the MCP tools.
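The dot-notation templating mentioned above (${{ steps.X.output.field }} and ${{ inputs.* }}) can be illustrated with a small resolver. This is a hedged sketch, not the internal/pipelines implementation: the context layout, the `resolve` and `lookup` names, the type-preserving rule for whole-value references, and raising KeyError on a missing reference are all assumptions chosen for the example.

```python
import re

# Matches ${{ path.with.dots }} with optional inner whitespace.
TEMPLATE = re.compile(r"\$\{\{\s*([A-Za-z0-9_.\-]+)\s*\}\}")


def lookup(path, context):
    """Walk a dotted path such as 'steps.ground.output.grounded_text' through nested dicts."""
    node = context
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node


def resolve(value, context):
    """Recursively substitute template references inside a step's input document."""
    if isinstance(value, dict):
        return {key: resolve(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, context) for item in value]
    if not isinstance(value, str):
        return value
    whole = TEMPLATE.fullmatch(value)
    if whole:
        # The whole value is one reference: keep the referenced type (int, list, ...).
        return lookup(whole.group(1), context)
    # Reference embedded in a longer string: substitute its string form.
    return TEMPLATE.sub(lambda m: str(lookup(m.group(1), context)), value)


if __name__ == "__main__":
    context = {
        "inputs": {"topic": "rust", "max_slides": 8},
        "steps": {"ground": {"output": {"grounded_text": "# Deck"}}},
    }
    step_input = {
        "markdown": "${{ steps.ground.output.grounded_text }}",
        "title": "About ${{ inputs.topic }}",
        "max_slides": "${{ inputs.max_slides }}",
    }
    print(resolve(step_input, context))
```

A sequential runner would call something like `resolve` once per step, adding each finished step's output to the context before moving to the next one.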
Out (deferred follow-ups, seams in place):
- Cron + webhook pipeline triggers (the runner is HTTP-decoupled — ADR 033's receiver and a future scheduler call the same StartRun). A2A pipeline-management skill and "promote a successful run from the audit log into a pipeline" follow per ADR 041's sequencing (v1.0→v1.3).
Status: the v0.15.0 slice is the REST + MCP + runner + starters foundation, plus the /pipelines panel pulled forward from v1.2; triggers and audit-promote are explicitly later so the data model lands correct first.
v0.16.0 — Correctness + housekeeping — ✅ Shipped 2026-05-27
Theme: Sharp edges off the pipeline work — grounding stops silently truncating long slide decks, artifacts become deletable on demand, and a new email.send pack lands.
Ships:
- content.ground rewrite no longer truncates long documents — the optional full-document rewrite was hard-capped at 2048 output tokens, silently cutting off any input larger than the test fixtures (a 20–25 slide deck lost its back half when run through builtin.grounded-deck). The rewrite budget now scales with the input (cap 8192), a ceiling-hit rewrite (finish_reason: length) is discarded in favor of the structure-preserving citation-only version, and the prompt is told to keep every --- separator. grounded_text is now always emitted so pipeline steps referencing it never fail. The deck pipelines (builtin.grounded-deck, builtin.research-ground-deck) now ground with rewrite: false. (#290)
- Manual artifact deletion — DELETE /api/v1/artifacts/{key} + a delete button in the Management UI Artifact Explorer; previously only the TTL janitor could delete. (#290)
- email.send pack (helmdeck__email-send) — send a transactional email via Resend (vault resend-api-key); 44 packs in-tree. (#289)
- Prompt-template reference pages at /reference/prompt-templates/ — a copy-and-fill {{VARIABLE}} prompt for every pack and pipeline. (#288)
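The content.ground fix combines two rules: a rewrite budget that grows with the input up to a cap of 8192, and a fallback to the citation-only text when the model stops with finish_reason: length. The sketch below shows that shape only. The 2048 floor is taken from the old hard cap, but the 1.25 headroom factor and both function names are assumptions; the release notes do not state the real scaling formula.

```python
def rewrite_budget(input_tokens, floor=2048, cap=8192, headroom=1.25):
    """Scale the output-token budget with the input size, clamped to [floor, cap].

    floor and headroom are illustrative assumptions; only the 8192 cap is documented.
    """
    return max(floor, min(cap, int(input_tokens * headroom)))


def choose_grounded_text(citation_only, rewrite, finish_reason):
    """Prefer the full rewrite unless it is missing or hit the token ceiling."""
    if rewrite is None or finish_reason == "length":
        # A truncated rewrite would silently drop the end of the document,
        # so fall back to the structure-preserving citation-only version.
        return citation_only
    return rewrite


if __name__ == "__main__":
    for tokens in (1000, 4000, 20000):
        print(tokens, "->", rewrite_budget(tokens))
```

Because `choose_grounded_text` always returns a string, a downstream step that references grounded_text has something to read in every case, which matches the "always emitted" guarantee above.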
Upgrade: no migrations, no breaking changes; grounded_text is an additive output field. Clean in-place Compose upgrade from v0.15.0.
v0.17.0 — Legible, recoverable failures — ✅ Shipped 2026-05-28
Theme: When something goes wrong, an agent (or operator) should be able to tell why and what to do — not re-guess. This release makes model errors and pipeline failures legible and recoverable.
Ships:
- helmdeck://models MCP resource + caller-fixable model errors (ADR 043) — a model the gateway can't route (e.g. minimax/…, reachable only as openrouter/minimax/…) now fails with invalid_input pointing at the new helmdeck://models catalog, instead of an opaque doubled handler_failed. Agents pick a real model up front rather than hallucinating one. (#293)
- Pipeline failure attribution + re-run (ADR 044, slice 1) — a failed run now carries a failure_class (caller_fixable / pack_bug → prefilled GitHub-issue link / transient / state_changed) and a one-line failure_reason, surfaced in run-status, the MCP tool, and the /pipelines UI. One-call re-run via POST …/runs/{runId}/rerun, helmdeck__pipeline-rerun, and a UI button. (#294)
- Pipeline run records list each step's artifacts, and a hermetic end-to-end test runs all 13 built-in pipelines through the runner with stub packs. (#292)
- Docs + blogs: a "When a pipeline fails" how-to, helmdeck://models docs, and two field-report posts. (#295)
Out (ADR 044 slice 2): resume-from-failed-step and auto-retry of transient failures.
Upgrade: no migrations, no breaking changes (new run/run-step fields are additive JSON). Clean in-place Compose upgrade from v0.16.0.
v0.18.0 — Pipelines you can see and trust — ✅ Shipped 2026-05-28
Theme: The deck/narrate pipelines turn prose into a real multi-slide deck (no more a whole README collapsing onto one slide and rendering a degenerate 7-second video), and the Management UI shows which pipelines are running plus a copy-paste agent prompt for each.
Ships:
- slides.outline pack — restates prose/markdown (a README, a research.deep synthesis, content.ground output) as a structured Marp deck (slides separated by ---, with titles, bullets, and <!-- speaker notes -->). Bounded by max_slides + a clamped token budget; guarantees a multi-slide deck or fails invalid_input ("content too thin") rather than emitting a degenerate one-slide deck.
- Pipelines UI: live "running" indicators + per-pipeline "Copy prompt" button — /pipelines polls GET /api/v1/pipeline-runs, shows a pulsing running badge + an "N running" count, and copies a ready-to-paste helmdeck__pipeline-run … prompt with a fill-in line per declared input.
Changed: the deck & narrate pipelines (grounded-deck, research-deck, research-narrate, research-ground-deck, scrape-deck, repo-readme-narrate) now insert a slides.outline step before rendering, so prose with no --- becomes a genuine multi-slide deck (or fails legibly) instead of a ~7-second silent video reported as succeeded.
Upgrade: non-breaking, additive. Clean in-place Compose upgrade from v0.17.x.
v0.19.0 — Repo presentations worth watching — ✅ Shipped 2026-05-28
Theme: builtin.repo-presentation (replacing repo-readme-narrate) builds a narrated deck from a repo's README plus its docs and code structure — not a paraphrase of the front page.
Ships:
- repo.fetch docs output — concatenated markdown/adoc/rst from the repo's doc dirs (docs/, doc/, content/, …) plus top-level design docs (ARCHITECTURE.md, DESIGN.md, …), bounded to 16 KB with a per-file path header (empty when the repo has none). Lets presentation/grounding pipelines ground on a project's real docs.
- builtin.repo-presentation — chains repo.fetch → repo.map → slides.outline → slides.narrate. Same repo_url input; the builtin.repo-readme-narrate id is gone.
Upgrade: non-breaking. The repo-readme-narrate pipeline id was removed — switch to repo-presentation.
v0.19.1 — ✅ Shipped 2026-05-28
Fixed: the Pipelines page "Copy prompt" button now works over plain HTTP — navigator.clipboard only exists in a secure context, so on a LAN host served over plain HTTP the button silently did nothing. It now falls back to a hidden-<textarea> + execCommand('copy') and reflects the real result.
v0.20.0 — A more trustworthy agent surface — ✅ Shipped 2026-05-28
Theme: Pipelines reject unfilled {{PLACEHOLDER}} inputs instead of running with them; built-in pipeline descriptions say what the packs actually do; slides.outline guarantees a title slide and gains audience personas; and a new installable helmdeck-debug skill sweeps every pipeline + pack and drafts GitHub issues for what it finds.
Ships:
- Pipeline runs reject unfilled {{PLACEHOLDER}} inputs with a caller_fixable error that names the input — instead of silently producing a post titled {{TITLE}}.
- helmdeck-debug integration-debugger skill (skills/helmdeck-debug/SKILL.md) — sweeps every pipeline + pack (static checks + a live end-to-end run sweep classified by failure_class) and drafts a ready-to-file GitHub issue per real bug, confirming before filing. Installed by scripts/configure-openclaw.sh and the new scripts/configure-claude.sh.
- slides.outline guarantees a title slide + supports personas + an author byline — deterministic title slide when title is provided (never duplicated), a persona input (general / technical / marketing / executive / educational or freeform), and new outputs has_title_slide + persona_used.
Changed: honest descriptions for the ground/blog built-in pipelines — content.ground cites claims (it does not rewrite into a new voice/structure) and blog.publish saves a markdown artifact by default. Descriptions + prompt-template docs now say so.
Upgrade: non-breaking, additive.
v0.21.0 — Pipelines you can see into, stop, and resize — ✅ Shipped 2026-05-30
Theme: Running runs surface each step's live progress; a Cancel button (+ helmdeck__pipeline-cancel MCP tool + REST) genuinely stops a wedged run by tearing down its session container; the runner auto-cleans runs orphaned by a restart; and CPU-bound packs declare a host-aware compute profile. Plus a new hyperframes.compose pack turns a plain-language description into a HyperFrames composition.
Ships:
- CPU profiles for session packs (ADR 045) — a pack declares session.ProfileIO (default, 1 core) or session.ProfileCompute (host-aware clamp(host_cores-1, 1, 6)), tunable via HELMDECK_IO_CPU_LIMIT / HELMDECK_COMPUTE_CPU_LIMIT. hyperframes.render + slides.narrate migrate to ProfileCompute. See reference/hardware-sizing.md.
- Live per-step progress + Cancel — the /pipelines UI renders each running step's latest progress inline; POST …/runs/{runId}/cancel + helmdeck__pipeline-cancel hard-stop a run and force-remove every session container tagged with the run id (new helmdeck.run_id Docker label). Already-terminal runs return 409 not_cancellable.
- hyperframes.compose pack + describe-a-video pipelines — turns a plain-language description into a HyperFrames composition (guaranteeing the render contract); builtin.prompt-video (compose → render, silent) and builtin.prompt-narrated-video (podcast → compose → render) chain it. podcast.generate now always emits audio_url (empty without a presigned store) so the narrated pipeline degrades to silent instead of failing.
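The host-aware formula for the compute profile, clamp(host_cores-1, 1, 6), is easy to misread at the edges, so a worked example helps. The function name below is hypothetical and the sketch ignores the HELMDECK_COMPUTE_CPU_LIMIT override; it only evaluates the documented clamp.

```python
def compute_profile_cores(host_cores: int) -> int:
    """clamp(host_cores - 1, 1, 6): leave one core for the host, never below 1 or above 6."""
    return max(1, min(host_cores - 1, 6))


if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 64):
        print(f"host={cores:>2} cores -> compute profile limit={compute_profile_cores(cores)}")
```

Under this formula a single-core or dual-core host still gets a 1-core limit, and the limit stops growing once the host has 7 or more cores.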
Fixed: docker pull now retries transient failures (3× linear backoff); in-flight pipeline runs orphaned by a control-plane restart are reaped to failed/transient on boot; pipeline MCP tools are advertised bare so namespacing clients resolve them to helmdeck__pipeline-* (was double-prefixed); built-in podcast pipelines default model to openrouter/auto; the podcast.generate double-registration that clobbered the gateway dispatcher is fixed; the runner no longer threads a non-preserved session into later steps.
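The docker pull fix uses a linear backoff, where the wait grows by a fixed step per attempt rather than doubling. A minimal sketch of that pattern follows. It assumes "3×" means three attempts in total and a one-second step, and `retry_linear` is a hypothetical helper rather than helmdeck code.

```python
import time


def retry_linear(operation, attempts=3, base_delay=1.0, sleep=time.sleep,
                 is_transient=lambda exc: True):
    """Call operation(); on a transient failure wait base_delay * attempt and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(base_delay * attempt)  # 1s, 2s, ... (linear, not exponential)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_pull():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("registry timeout")
        return "pulled"

    # Pass a no-op sleep so the demo finishes instantly.
    print(retry_linear(flaky_pull, sleep=lambda seconds: None))
```

The `is_transient` hook is where a real caller would separate network timeouts from permanent errors such as a missing image tag.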
Upgrade: non-breaking, additive. Clean in-place Compose upgrade from v0.20.0.
v0.22.0 — Agents that work on free models, with memory — ✅ Shipped 2026-06-01
Theme: Close the loop from "helmdeck has 50+ packs" to "an agent on a free model can pick the right ones." Closes four ADRs end-to-end and validates the result against the live free-model failure that motivated the work — the MiniMax M3 launch paste + 3-action ask that empty-completed at 29.5s before any of this work now returns a valid 3-step plan on openrouter/openrouter/free.
Ships:
- ADR 047 — Catalog metadata + memory-driven routing (complete). Self-describing pack/pipeline metadata (accepts / produces / intent_keywords / typical_use / limitations + supersedes on pipelines); helmdeck://routing-guide projection; per-caller audit memory writing one row per pack/pipeline run; helmdeck://my-defaults projection of the caller's most-used tools with common_inputs priors; helmdeck.memory_forget pack; helmdeck.route LLM-backed meta-pack with structured gap_warning for tools the catalog can't serve; Routing Memory management UI page that surfaces and clears the audit history without needing an MCP-aware client.
- ADR 048 — Memory write surface + OpenClaw memory-corpus bridge (complete). Embedding sidecar overlay (compose.embeddings.yml) that runs an Ollama container for OpenClaw's memory_search semantic recall; helmdeck.memory_store pack + POST /api/v1/memory/store REST surface so agents can persist durable user facts; helmdeck://my-memory projection; QMD-compatible MCP endpoint at /api/v1/mcp/qmd/sse that bridges helmdeck's per-caller audit + facts corpus into OpenClaw's memory_search tool via MCPorter.
- ADR 049 PR #1 — helmdeck.plan intent decomposer pack. Multi-intent prompts decompose into ordered steps[] (each {order, tool, args, rationale}) + a derived rewritten_prompt + a complexity classifier (single-action / pipeline-direct / pack-chain). Pipeline-aware: prefers a curated pipeline over re-decomposing its constituent packs. Self-learning seam: every successful plan writes a compact PlanAudit row to plan_history.
- ADR 050 — Retrieval-augmented tool selection (complete, 4 PRs). PR #1 shipped internal/llmcontext with per-model token budgets (Tier A/B/C calibrated by empirical structured-output reliability, not vendor specs) + deterministic CompactCatalog metadata trim. PR #2 wired helmdeck.route to the same cascade, added the helmdeck://context-budgets MCP resource, surfaced the Trim record on plan output as an optional compaction field. PR #3 added the cascading Select() entry point with lexical retrieval + top-N truncation as the third stage when compaction alone can't reach budget, plus helmdeck://my-plans projection over the plan_history audit category. PR #4 added an optional two-pass LLM filter cascade for the worst Tier C cases plus a JSON-decoder tolerance fix (read first complete JSON object, ignore trailing garbage — that was the critical change that unblocked the original motivating prompt).
- Pack + pipeline additions. hyperframes.compose (description → composition); github.get_issue (single-issue fetch with read-through cache); blog.rewrite_for_audience pack + four rewrite-blog pipelines (builtin.brief-rewrite-blog, doc-rewrite-blog, scrape-rewrite-blog, research-rewrite-blog); persona input on blog/slide packs; image_prompts + export_outline on all seven slide pipelines; builtin.grounded-narrate + builtin.grounded-podcast; coding pipelines beta (builtin.issue-to-pr, repo-solve-pr, repo-solve-branch, repo-solve-patch); pipelines page grouped by output format; live per-step progress + pipeline cancel in the UI + via MCP; CPU profiles (ProfileIO / ProfileCompute) for session packs.
- Catalog fixes. doc.parse rejects non-document URLs upfront with a routing hint; doc.parse against current Docling's discriminated sources shape; auto-split slide overflow for code blocks longer than 22 lines and image+bullets slides.
- Pipeline removals. Replaced builtin.grounded-blog, builtin.scrape-ground-blog, builtin.research-blog, builtin.doc-ground-blog with the rewrite-blog matrix; startup reaper deletes orphaned builtin=1 rows on upgrade so operators land on a clean catalog without running SQL.
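The JSON-decoder tolerance fix from ADR 050 PR #4 (read the first complete JSON object, ignore trailing garbage) is a common defence against chatty model output. One way to sketch it in Python is with the standard library's `json.JSONDecoder.raw_decode`, which parses one value and reports where it ended. This is an illustration of the technique, not helmdeck's Go decoder; skipping leading prose before the first brace is an extra assumption beyond what the notes describe.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first complete JSON object in text, ignoring anything after it."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode stops at the end of the first valid value instead of
            # failing on the characters that follow it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # Not a complete object at this brace; try the next candidate.
            start = text.find("{", start + 1)
    raise ValueError("no complete JSON object found")


if __name__ == "__main__":
    reply = '{"steps": [{"order": 1, "tool": "web.scrape"}]}\n\nHope this helps!'
    print(first_json_object(reply))
```

A strict `json.loads` call on the same reply would raise because of the trailing sentence, which is the failure mode the release notes describe.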
Pack count: 41 → 52 (added the four meta-packs plus hyperframes.compose + github.get_issue + the rewrite-blog scaffolding).
MCP resources count: 4 → 6 (added helmdeck://context-budgets, helmdeck://my-plans plus the existing routing-guide / my-defaults / my-memory / packs).
Tests: 1446 passing across all internal packages.
Audience: operators running helmdeck on a mix of paid and free models who want multi-action prompts to work consistently regardless of model tier; agents (OpenClaw, Claude Desktop, Claude Code, Gemini CLI, Hermes) that want to plan and route through helmdeck's catalog without re-implementing tool selection.
Out (deferred to a future release):
- ADR 049 PR #2 (helmdeck://my-plans projection) — consolidated into ADR 050 PR #3 and shipped here, but the broader self-learning loop (priors from history feeding the lexical ranker's score) remains a follow-up.
- ADR 049 PR #3 (frontier-model gap detection via expert_baseline) — speculative; not on the v1.0 critical path.
- Auto-invocation hardening for free models (the agent recognizing it should call helmdeck.plan first without explicit user direction) — SKILL.md tips help on frontier models, but free-model tool-selection is a separate problem.
Upgrade: no migrations, no breaking changes (all new packs, resources, and Budget fields are additive). Clean in-place Compose upgrade from v0.21.0. Re-run scripts/configure-openclaw.sh after upgrading to install the v0.22.0-stamped SKILL.md so your OpenClaw agent sees the four new meta-packs and the two new resources.
v0.23.0 — Reliable narrated decks + shared audio/video helper — ✅ Shipped 2026-06-03
Theme: Close the slides.narrate failure surface end-to-end and consolidate every lesson PRs #379–#405 paid for into a single reusable package. For weeks handler_failed: ffmpeg segment N failed (exit 0) was the most-reported single error from operators running narrated-deck pipelines; this release fixes the root causes (Mermaid blank renders, audio dropouts at concat boundaries, silent ffmpeg failures, misclassified OOMs, sessions reaped mid-encode by an inherited 5-minute watchdog, transport errors masquerading as exit 0) AND moves the patterns into internal/avenc/ with 99.3% test coverage so future audio/video packs start from a battle-tested base instead of re-paying for the same lessons.
Ships:
internal/avenc/shared ffmpeg/ffprobe/TTS-validation helpers (3-PR consolidation arc). New package consolidates every audio/video pattern previously duplicated acrossslides.narrateandinternal/podcast.Concat. 10 exported helpers (GenerateSilence,ProbeAudioDuration,PadAudioToMin,ConcatAudio,ConcatVideoMP4s,EncodeVideoSegmentwith OOM-retry built-in,RequireNonEmptyOutput,LooksLikeMP3,ValidateMP3Body,ValidateMP4Streams) plus a sharedExecutortype alias and tuned byte-size floors. Three sequential PRs: #406 created the package with 80 tests at 99.3% line coverage; #407 migratedslides.narrate(567 LOC deleted, 43 added, 37 orchestration tests pass byte-identically); #408 migratedinternal/podcast.Concat(closed the consolidation arc, picked up two upgrades for free along the way —LC_ALL=Cffprobe locale stability andSilenceTurn's previously-known 0-byte-output hole). External research informed two gap-closers the bug history hadn't reached:LC_ALL=Cprefix on every ffprobe and ffprobe-based MP4 stream-presence validation.slides.narratereliability — 6 distinct production bugs closed. 
#379 OOM-killed ffmpeg now classifies as `transient` instead of `pack_bug` (exit 137 → `CodeResourceExhausted` via new `classifyShellExitCode` shared helper); #388 resolution normalization translates named presets to `WIDTHxHEIGHT` before ffmpeg sees them + video pipelines no longer hardcode aspect_ratio/resolution; #389 closed-set classifier now admits `CodeResourceExhausted` and `CodeCredentialInvalid` (PR #379 + PR #381 had been silently coerced to `CodeInternal`); #390 ffmpeg `-threads 4` default + adaptive OOM retry (`-threads 1 -preset veryfast`); #399 marp-rendered PNG validation before ffmpeg encode (catches Mermaid-broken slides with a `caller_fixable` error instead of `pack_bug`); #400 closed 12-mode silent-failure taxonomy (transport-error honest message, post-encode existence check, PNG magic-byte check, ffprobe NaN/Inf safety, generateSilence post-check, ElevenLabs HTTP-200-wraps-error guard via MP3 sync-word sniff, padSlideAudioToMin coverage); #404 Mermaid pre-rendering via mmdc (parity with slides.render) + concat audio re-encode to eliminate mid-segment AAC frame-boundary dropouts; #405 image_prompt comments no longer spoken aloud by the narrator (allowlist filter on structured-metadata comment prefixes).
- Pipeline + session reliability: 5 cross-cutting fixes. #377 tolerant pipeline-template resolver for missing optional inputs (whole-value miss drops the field; embedded miss substitutes empty string); #380 drops JSON field when whole-value `inputs.*` ref misses (typed-field fix: bool/number/array no longer reject empty-string substitution); #381 paid-API credential precheck + honest `has_narration` for `slides.narrate`; #397 pipeline-run single-flight coalescing: duplicate concurrent `pipeline-run` requests with same `(caller, pipeline_id, inputs)` dedupe onto the in-flight run via `sha256` fingerprint + partial UNIQUE INDEX (migration `0008_pipeline_run_fingerprint.sql`); #401 pinned-session timeout extension on reuse: `Runtime.ExtendTimeout(ctx, id, newTimeout)` so `slides.narrate`'s 30-minute Spec.Timeout is no longer silently overridden by `repo.fetch`'s 5-minute default in shared-session pipelines.
- Docker build + Repo paths: 2 housekeeping fixes. #372 `isSafeClonePath` accepts ADR 040 persistent clone paths (`<PersistentReposPath>/<Caller>/...`), which unblocks the entire `builtin.repo-presentation` chain that had been failing at the first downstream consumer of `repo.fetch` output; #398 control-plane Docker image builds web bundle inside a Node stage, which eliminates the recurring "blank page after rebuild" caused by `web/dist/index.html` referencing stale bundle hashes (multi-stage Dockerfile + `.dockerignore` excludes host-side `web/dist`).
- ADR 051 routing reliability: 5-PR arc. #368 reasoning-token stripping (`<think>`/`<reasoning>`/`[REASONING]`) + consolidated `DecodeStructuredResponse` JSON-parser surface (plan + route + content.ground all converge on one helper) + 14 new tier entries calibrated from research synthesis (Tier A: `openai/o3-mini`, `google/gemini-2.5-pro/flash`, `anthropic/claude-3.7-sonnet`; Tier B: `openrouter/deepseek/deepseek-v4-pro/v3.2/chat`, `openrouter/x-ai/grok-`; Tier C: `openrouter/moonshotai/kimi-k2/kimi-`, `openrouter/tencent/`); #367 model-tier calibration tooling (`scripts/calibrate-model.sh` + `docs/howto/calibrate-model-tiers.md`); #369 cause-typed empty completions (`ErrSafetyFiltered`/`ErrLengthTruncated`/`ErrConstrainedDeadlock`/`ErrLikelyTimeout`) + `Budget` capability flags (`IsHybridReasoning`/`WantsStrictJSON`/`SupportsPrefixCache`/`CachedInputCostUSDPerMTok`); #370 provider-side strict JSON via `response_format` on `gateway.ChatRequest` (OpenAI `response_format`, Gemini `responseMimeType`, Anthropic ignored, Tier C guard); #371 prefix-cache routing for the catalog block in `helmdeck.plan` and `helmdeck.route` (catalog moves into the SYSTEM prompt on `SupportsPrefixCache=true` so two sequential calls share a byte-identical prefix, unlocking the 50-96.7% input-cost discounts the configured providers offer).
- Content packs. #362 `blog.append_cta` pack + opt-in CTA wiring across all four `*-rewrite-blog` pipelines (`brief-rewrite-blog`, `doc-rewrite-blog`, `scrape-rewrite-blog`, `research-rewrite-blog`). New `cta` step between `content.ground` and `blog.publish`: when all link inputs are empty the pack is a strict no-op (no model call); when any link is set it LLM-rewrites a closing CTA in the article's voice using the same persona helper as `blog.rewrite_for_audience`.
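The exit-code reclassification in #379 can be illustrated with a minimal Go sketch. Everything here is an assumption made for illustration: the function name `classifyExit`, the string values of the codes, and the boolean `transient` flag are hypothetical stand-ins, not the real `classifyShellExitCode` helper. The only facts taken from the release notes are that exit 137 maps to `CodeResourceExhausted` and is treated as transient.

```go
package main

import "fmt"

// ErrorCode stands in for helmdeck's typed error codes. The string values
// are illustrative assumptions, not the project's real constants.
type ErrorCode string

const (
	CodeResourceExhausted ErrorCode = "resource_exhausted" // assumed value
	CodeInternal          ErrorCode = "internal"           // assumed value
)

// classifyExit is a hypothetical stand-in for classifyShellExitCode.
// Exit status 137 is 128 + 9 (SIGKILL), the status a shell reports when
// the kernel OOM killer terminates a process, so it is treated as
// transient resource exhaustion rather than a bug in the pack.
func classifyExit(exitCode int) (code ErrorCode, transient bool) {
	if exitCode == 137 {
		return CodeResourceExhausted, true
	}
	return CodeInternal, false
}

func main() {
	for _, ec := range []int{1, 137} {
		code, transient := classifyExit(ec)
		fmt.Printf("exit %d -> %s (transient=%t)\n", ec, code, transient)
	}
}
```

Keeping the mapping in one shared function is what lets a retry policy (such as the adaptive OOM retry in #390) key off the typed code instead of re-parsing exit statuses in every pack.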
Pack count: 52 → 53 (added blog.append_cta).
MCP resources count: 6 (unchanged).
Tests: 1702 passing across all internal packages (was 1446 at v0.22.0, +256 new).
Audience: operators running narrated deck + podcast pipelines (builtin.repo-presentation, builtin.grounded-narrate, builtin.research-narrate, builtin.*-podcast) — every recurring slides.narrate error class is now either eliminated or surfaces with an honest message and a recovery hint instead of pointing at an imaginary pack bug. Operators authoring new audio/video packs — internal/avenc/ is the canonical import.
Out (deferred to a future release):
- `slides.narrate.encodeSegment` migration to `avenc.EncodeVideoSegment`: blocked on adding a shared stderr-tap hook to avenc so the per-failure artifact-store dump (the production debug feature the local `encodeSegment` retains) doesn't regress. Tracked.
- Issue #402: `podcast.generate` explicit `Spec.Timeout` (currently inherits the 5-minute default; uncomfortable margin given its 30s-3min stated workload).
- Issue #403: pipeline-level Spec aggregation for `MemoryLimit`/`CPULimit` (the harder follow-up from PR #401; needs pipeline-runner-level precomputation since Docker freezes those at container creation).
- `hyperframes.render` adopting `internal/avenc/`: the pack uses a `hyperframes` CLI wrapper that opaquely runs ffmpeg, so the avenc shape doesn't fit. Not regressing anything; flagged for a future "expose ffmpeg directly" refactor if the CLI wrapper becomes a bottleneck.
Upgrade: one new migration (internal/store/migrations/0008_pipeline_run_fingerprint.sql) added by PR #397 — auto-applied via store.Open on next startup, additive (new columns with safe defaults + partial unique index), no operator action required. No breaking pack-input-schema changes. Re-run scripts/configure-openclaw.sh after upgrading to install the v0.23.0-stamped SKILL.md so your OpenClaw agent sees the new blog.append_cta pack and the avenc consolidation note.
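To make the fingerprint behind migration `0008_pipeline_run_fingerprint.sql` concrete, here is a rough Go sketch of single-flight coalescing keyed on `(caller, pipeline_id, inputs)`. The function `fingerprint`, the `inFlight` type, and the JSON payload layout are all hypothetical; the real implementation relies on a partial unique index in the store, which this sketch replaces with an in-memory map that is not safe for concurrent use.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// fingerprint hashes (caller, pipeline_id, inputs) with SHA-256.
// encoding/json writes map keys in sorted order, so two requests with the
// same inputs produce the same bytes regardless of construction order.
func fingerprint(caller, pipelineID string, inputs map[string]any) (string, error) {
	payload, err := json.Marshal(struct {
		Caller   string         `json:"caller"`
		Pipeline string         `json:"pipeline_id"`
		Inputs   map[string]any `json:"inputs"`
	}{Caller: caller, Pipeline: pipelineID, Inputs: inputs})
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:]), nil
}

// inFlight is an in-memory stand-in for the partial unique index:
// it maps a fingerprint to the ID of the run already in progress.
type inFlight map[string]string

// start returns the existing run ID when the fingerprint is already
// registered, otherwise it registers and returns newRunID.
func (f inFlight) start(fp, newRunID string) (runID string, coalesced bool) {
	if existing, ok := f[fp]; ok {
		return existing, true
	}
	f[fp] = newRunID
	return newRunID, false
}

func main() {
	a, _ := fingerprint("alice", "builtin.repo-presentation", map[string]any{"repo": "x", "depth": 1})
	b, _ := fingerprint("alice", "builtin.repo-presentation", map[string]any{"depth": 1, "repo": "x"})

	runs := inFlight{}
	first, _ := runs.start(a, "run-1")
	second, coalesced := runs.start(b, "run-2")
	fmt.Println(a == b, first, second, coalesced)
}
```

In a database-backed version the uniqueness check and the insert would need to be a single atomic statement; the map lookup above only shows the dedupe decision itself.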
v0.25.0 — The cheap-model reliability bet, empirically proved — ✅ Shipped 2026-06-04
Theme: Eight PRs (A–H) shipped the v0.24.0 + v0.25.0 reliability arcs as a single release. The architectural claim helmdeck rests on — that weak, cheap models can drive complex workflows iff the surrounding environment is perfectly reliable (typed errors per ADR 008, strict schemas, context compaction per ADR 050/051) — has moved from "we have typed errors and contract tests" to "every layer (handlers, schemas, engine, MCP, S3, model recovery) has a regression-impossible backstop AND we have empirical evidence that a free 120B-class model recovers correctly from helmdeck's typed errors at ≥7/10 across all 5 reliability scenarios." A 100%-covered codebase still wouldn't prove the LLM acts on the typed-error vocabulary correctly; v0.25.0 ships the first piece of data that says it does.
Ships:
- Per-package coverage gate + golangci-lint job in CI (PR A, #410). New `scripts/coverage-gate.sh` parses Go's coverage profile and asserts statement-weighted per-package coverage against documented floors. Different from `go tool cover -func`'s total: tracks specific packages individually so a strong package can't subsidise a weak one. Initial floors: avenc=90, llmcontext=90, gateway=85, packs/builtin=75, api=60 (current baseline rounded down so PR A doesn't fail itself; PRs B–H ratchet). `.golangci.yml` v2 config with errcheck/govet/staticcheck/unused/ineffassign; `only-new-issues: true` ratchets cleanly without forcing pre-existing cleanup. `internal/api/` 62.8% → 80.1% (+17.3pp; 121 new tests across 16 files). All tracked packages still PASS at end-of-arc floors.
- Schema-contract + typed-error contract tests (PR B, #411). Closed two specific drift surfaces. `output_schema_contract_test.go` extended from 2 packs to 7 (helmdeck.plan, helmdeck.route, content.ground, research.deep, swe.solve), which closes the v0.17.1-regression class where unit tests bypass `Engine.Execute` (the only place `OutputSchema.Validate` runs) and a handler can emit output violating its schema. NEW `typed_error_contract_test.go`: table-driven, enumerates 47 builtin packs, asserts every handler returns a `*PackError` with a code in the closed `validCodes` set. A future pack returning `&PackError{Code:"weird"}` fails the contract loudly. Coverage: packs/builtin 76 → 77.
- Close the zero-coverage handler set in `internal/packs/builtin` (PR C, #412). 14 tests for `browser.interact` via `cdpfake.Client` (+ caught and fixed a real bug: nil `screenshots` slice marshals as `null`, violating the OutputSchema `array` type); 8 tests for `github.*` handlers via `httptest.NewServer` (promoted `githubAPIBase` `const` → `var`); 7 tests for the ElevenLabs credential ladder (`resolveElevenLabsKey`'s 4-step explicit → canonical → alias → env fallback). Coverage: packs/builtin 77 → 80.
- Property-based tests + nightly mutation-testing workflow (PR D, #413). Closes v0.24.0 arc. `pgregory.net/rapid v1.3.0` for invariants on `pipelines.Validate`, `BasicSchema.Validate`, `gateway.SplitModel`; each runs ~100 generated cases per check. Caught a real bug: `BasicSchema.Validate` accepted top-level `null` because `json.Unmarshal([]byte("null"), &map[string]json.RawMessage{})` succeeds with the map left as nil. Fixed in `internal/packs/schema.go`. NEW `.github/workflows/mutation.yml`: scheduled 04:00 UTC, `go-mutesting v1.2.0` against decision-dense LLM-facing code (`classify.go`, `gateway/fallback.go`, `internal/avenc/`).
- `internal/packs/s3store.go` wire-tested against a stub S3 endpoint (PR E, #414). Pre-PR-E coverage audit surfaced the bigger latent risk: `internal/packs/s3store.go` was 0% covered in CI. Every operator deploying with MinIO/R2/B2/AWS S3 was running untested code on every artifact upload. 11 new tests via an `httptest.NewServer` that speaks the S3 REST + XML surface: full Put → Get round-trip, BucketExists failure at construction, `*PackError{CodeArtifactFailed}` translation, ListForPack/ListAll/Delete + index cleanup, PublicEndpoint host rewrite, PresignTTL/Region defaults. Two non-trivial stub pieces: AWS chunked-signed PUT decoding (`X-Amz-Content-Sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD`) and persistent error injection (minio-go retries internally). Coverage: `internal/packs` 72.6 → 82.0. Floor set at 80.
- Engine audit + memory machinery covered (PR F, #415). ADR 048 LLM-context machinery. Before PR F: `WritePlanAudit`, `WritePipelineAudit`, `MemoryStore()` accessor, `StoreFact`, `ProjectDefaults`, `CallerFromContext`, `WithProgress`, `ProgressFromContext`, `FactStoreError.Error()` were all at 0%. 23 new tests across 4 files. The most architecturally-important assertion: `ProjectDefaults` excludes failed runs from learned defaults; a caller-fixable failure with `persona="executive"` must NOT pin executive as the caller's preferred default. Coverage: `internal/packs` 82.0 → 87.6. Floor 80 → 87.
- `internal/mcp` ratchet 69.5 → 81.5 (PR G, #416). PR D's reshape deferred mcp (would have failed the gate at 80% infra floor). PR G fixes that; the MCP package is the wire surface every connected agent talks to. 20 tests for the pipeline tool dispatcher (`pipelines.go` 8.4 → 92.2, the biggest single-file jump in the arc), 12 for `buildMyDefaults`/`buildMyMemory`/`buildRoutingGuide`/`formatPipelineAuditChunk`, 7 for helpers including the `extractWebhookFields` security boundary (pack handler MUST NEVER see `webhook_url`/`webhook_secret`), 4 for `defaultAdapterFactory` per-transport routing. Floor: `internal/mcp` newly tracked at 81.
- Model-recovery loop test against `openai/gpt-oss-120b:free` (PR H, #417). Closes the v0.25.0 arc. The headline empirical proof: given a typed-error envelope, a free weak-tier model picks the right next action ≥7/10 across all 5 scenarios. NEW `.github/workflows/model-recovery.yml`: weekly Sunday 06:00 UTC + `workflow_dispatch`. Pins `openai/gpt-oss-120b:free` via three workflow env vars (`RECOVERY_MODEL`, `MODEL_LAST_VERIFIED=2026-06-04`, `MODEL_NEXT_REVIEW_DUE=2026-09-04`). Preflight: asserts the pin is in OpenRouter's catalog; emits a `::warning::` annotation past the review date (loud-but-not-blocking, same cadence as the coverage gate). NEW `internal/reliability/` package, build-tagged `recovery` so a default `go test ./...` is a no-op. Three gates protect the live API: build tag, `HELMDECK_RECOVERY_TESTS=1`, `OPENROUTER_API_KEY` (repository secret). First validated run scored 10/10/9/10/9/10/7/10/10/10; all 5 scenarios PASS at threshold. Requires the `OPENROUTER_API_KEY` repository secret to be added by the operator before the first scheduled weekly run fires. First-run history (kimi-k2.6:free pin → swapped to gpt-oss-120b:free same day due to Moonshot upstream throttling) documented in the workflow file.
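Two of the bugs caught in this arc (PR C's nil slice and PR D's top-level `null`) come from standard `encoding/json` behaviour, which the following sketch reproduces. The `output` type and the `encode` and `decodeObject` helpers are hypothetical and only mirror the shape of the problem; they are not the fixes that landed in `internal/packs/schema.go` or the `browser.interact` handler.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// output is a hypothetical handler result with an array-typed field.
type output struct {
	Screenshots []string `json:"screenshots"`
}

// encode marshals the output; a nil slice becomes JSON null, while an
// empty non-nil slice becomes [].
func encode(o output) string {
	b, _ := json.Marshal(o)
	return string(b)
}

// decodeObject parses raw JSON that must be an object. json.Unmarshal
// accepts the literal null for a map target and leaves the map nil, so
// the nil check is what actually rejects a top-level null.
func decodeObject(raw []byte) (map[string]json.RawMessage, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	if m == nil {
		return nil, errors.New("top-level value must be a JSON object, got null")
	}
	return m, nil
}

func main() {
	fmt.Println(encode(output{}))                        // {"screenshots":null}
	fmt.Println(encode(output{Screenshots: []string{}})) // {"screenshots":[]}

	_, err := decodeObject([]byte("null"))
	fmt.Println("null rejected:", err != nil) // null rejected: true
}
```

Both cases pass a type checker and most hand-written unit tests, which is why a schema contract test or a property-based generator is more likely to surface them.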
Pack count: 53 (unchanged; this was a reliability arc, not a feature release).
MCP resources count: 6 (unchanged).
Tests: ~1,994 passing across all internal packages (was 1,702 at v0.23.0, +292 new).
Coverage at end of arc (gate-tracked packages, all PASS): internal/avenc 99.3% (floor 90), internal/llmcontext 92.1% (90), internal/gateway 88.1% (88; bumped from 85 in PR D), internal/packs 87.6% (87; NEW track in PR E), internal/packs/builtin 80.5% (80), internal/api 80.1% (80), internal/pipelines 84.0% (80; NEW track in PR D), internal/mcp 81.5% (81; NEW track in PR G).
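The per-package comparison the gate performs can be sketched in Go using the end-of-arc figures listed above. This is only an illustration of the floor check: the real gate is the shell script `scripts/coverage-gate.sh`, which also parses the coverage profile, and the names `pkgCoverage`, `belowFloor`, and `endOfArcReport` are invented here.

```go
package main

import (
	"fmt"
	"sort"
)

// pkgCoverage pairs a package's measured statement coverage (percent)
// with its documented floor.
type pkgCoverage struct {
	Pct   float64
	Floor float64
}

// endOfArcReport returns the gate-tracked figures quoted in the release notes.
func endOfArcReport() map[string]pkgCoverage {
	return map[string]pkgCoverage{
		"internal/avenc":         {Pct: 99.3, Floor: 90},
		"internal/llmcontext":    {Pct: 92.1, Floor: 90},
		"internal/gateway":       {Pct: 88.1, Floor: 88},
		"internal/packs":         {Pct: 87.6, Floor: 87},
		"internal/packs/builtin": {Pct: 80.5, Floor: 80},
		"internal/api":           {Pct: 80.1, Floor: 80},
		"internal/pipelines":     {Pct: 84.0, Floor: 80},
		"internal/mcp":           {Pct: 81.5, Floor: 81},
	}
}

// belowFloor returns the packages under their own floor, sorted by name.
// Each package is judged individually, so a high-coverage package cannot
// compensate for a low-coverage one.
func belowFloor(report map[string]pkgCoverage) []string {
	var failing []string
	for pkg, c := range report {
		if c.Pct < c.Floor {
			failing = append(failing, pkg)
		}
	}
	sort.Strings(failing)
	return failing
}

func main() {
	report := endOfArcReport()
	fmt.Println("failing at end of arc:", belowFloor(report))

	// Hypothetical regression: one package drops below its floor.
	report["internal/mcp"] = pkgCoverage{Pct: 79.0, Floor: 81}
	fmt.Println("failing after regression:", belowFloor(report))
}
```

A CI job built on this check would exit non-zero whenever the returned list is non-empty, regardless of how high the total coverage figure is.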
Audience: all helmdeck operators — every layer of the stack now has a regression-impossible backstop. Pack maintainers — schema and typed-error contracts catch the v0.17.1-class regression in unit tests instead of production. Operators considering the cheap-model bet — the first piece of empirical evidence that a free 120B-class model handles helmdeck's typed errors correctly. PR contributors — the property and mutation workflows catch the kind of subtle correctness drift coverage alone can't.
Out (deferred to a future release):
- Multi-model recovery matrix (v0.26.0 candidate). The single-pinned-model proof shipped with PR H; the discovery mechanism (running the same 5 scenarios against `google/gemma-4-31b-it:free`, `nvidia/nemotron-3-ultra-550b-a55b:free`, and `openrouter/free` weekly) is its own arc. It would be combined with an auto-issue mechanism on 0/10 hard failure and an empirically-validated decision rule for whether `openrouter/free` is reliable enough to surface as a helmdeck default routing target.
- `internal/mcp.jobs.go.sweep` (the SEP-1686 async-job janitor): needs timer-clock injection that adds more weight than it removes; integration-tested via the integration suite when it next runs.
- `internal/mcp.stdio.go` reader-side adapter: sub-process spawn semantics; integration territory, not unit-test scope.
- `internal/packs.memoryAdapter.Namespace/List/Delete`: exercised indirectly via `Engine.Execute`; pinning the adapter directly is a follow-up if signals warrant.
- The model-recovery scope in PR D's reshape was bigger than the arc: the model-recovery loop (PR H) covers the per-scenario claim; a long-term trend dashboard rendering recovery scores across scheduled runs is its own product surface (deferred to a docs-side follow-up).
Upgrade: zero new migrations. No breaking pack-input-schema changes. v0.25.0 is purely test infrastructure + workflows + the new build-tagged internal/reliability/ package. Existing operators upgrade cleanly via git pull + make build. One operator action required to enable PR H's weekly recovery test: add OPENROUTER_API_KEY as a GitHub repository secret (Settings → Secrets and variables → Actions). Without the secret the workflow's preflight emits a clear "secret not set" warning and skips cleanly — no failure. Re-run scripts/configure-openclaw.sh after upgrading to install the v0.25.0-stamped SKILL.md.
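The skip-cleanly behaviour and the ≥7/10 threshold can be sketched as follows. This is an assumption-laden illustration, not the contents of `internal/reliability/`: `skipReason`, `scenarioPasses`, and `passThreshold` are hypothetical names, and the compile-time `recovery` build tag (the first of the three gates) is left out because it cannot be shown inside a single runnable file.

```go
package main

import (
	"fmt"
	"os"
)

// passThreshold is the minimum number of successful recoveries out of
// 10 trials, following the ">=7/10" criterion in the release notes.
const passThreshold = 7

// skipReason returns a non-empty reason when the live recovery test
// should be skipped. getenv is injected so the logic can be exercised
// without touching the real process environment.
func skipReason(getenv func(string) string) string {
	if getenv("HELMDECK_RECOVERY_TESTS") != "1" {
		return "HELMDECK_RECOVERY_TESTS is not set to 1"
	}
	if getenv("OPENROUTER_API_KEY") == "" {
		return "OPENROUTER_API_KEY secret not set"
	}
	return ""
}

// scenarioPasses applies the threshold to one scenario's success count.
func scenarioPasses(successes int) bool {
	return successes >= passThreshold
}

func main() {
	if reason := skipReason(os.Getenv); reason != "" {
		fmt.Println("skipping live recovery test:", reason)
		return
	}
	fmt.Println("gates open; a real run would now call the pinned model")
}
```

Skipping with an explicit reason, rather than failing, is what keeps forks and fresh clones green when the secret has not been configured.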
v1.0.0-rc1 — Kubernetes preview (planned)
Theme: "Helm install works; production hardening pending."
Hard prerequisite (must land before any rc1 work)
- #134: unified install paths so `compose.yaml` and the Helm chart reference the same versioned GHCR tags (`ghcr.io/tosin2013/helmdeck:0.X.Y`) instead of the build-time-only `:dev` tag. The Helm chart cannot ship referencing `:dev` (operators have no source tree), so this gates rc1.
Ships (planned)
- T701 `client-go` `SessionRuntime` backend.
- T702 Helm chart `charts/baas-platform/`.
- T703 PostgreSQL StatefulSet sub-chart (Bitnami) with `database.external.enabled` toggle.
- T704 Session pod template (seccomp, restartPolicy: Never, memory-backed `/dev/shm`).
Operators can install on GKE/EKS but production-hardening items (NetworkPolicy, isolation tiers, TLS, audit) are not gates.
v1.0.0 — Kubernetes & GA (Week 22)
Theme: "Production."
Milestone: v1.0 — Kubernetes & GA (Phase 7) · Tasks: Phase 7
Ships
- `client-go` `SessionRuntime` backend
- Helm chart `charts/baas-platform/` with all toggles
- Two-namespace layout (`baas-system`/`baas-sessions`) + scoped RBAC
- Session pod template (seccomp, restartPolicy: Never, memory-backed `/dev/shm`)
- NetworkPolicies (ingress + egress)
- KEDA ScaledObject on `baas_queued_session_requests` + utilization
- `browser-pool-warmup` Deployment for cold-start elimination
- `isolation.level`: standard (Docker) / enhanced (gVisor) / maximum (Firecracker via RuntimeClass)
- cert-manager + Ingress-NGINX TLS termination
- OTel Collector DaemonSet
- External Secrets Operator integration
- Argo CD reference manifest in `deploy/gitops/`
Hard exit gates
- Helm install on a fresh GKE or EKS cluster passes the same smoke matrix as Compose
- Load test: 100 concurrent sessions, 24h soak, ≤150 MB control plane footprint, ≤5 s recovery
- gVisor tier passes the smoke matrix
- External security audit clean
Audience
General availability. Tag v1.0.0. Announce.
v1.x — Post-GA Innovation Tracks
Released as feature-gated minors as they stabilize. No hard sequence.
| Version | Headline feature | ADR |
|---|---|---|
| v1.1 | WASM Executor for sandboxed third-party packs | 012, 024 |
| v1.2 | Four-tier Memory API (Working/Episodic/Semantic/Procedural) | 029 |
| v1.3 | Procedural→Pack promotion UI | 024, 029 |
| v1.4 | WebRTC live session streaming | 028 |
| v1.5 | WebMCP detection and preferential routing | 027 |
| v1.6 | Pre-packaged Chrome DevTools MCP / Playwright MCP entries | 006 |
| v1.7 | Firecracker production hardening (bare-metal node guidance) | 011 |
| v1.x | Lightpanda alternate browser engine | 001 |
| v1.x | NVIDIA OpenShell integration — sidecars in MicroVMs + L7 policy | 011, 036 (planned) |
| v1.x | Long-form artifact streaming (#201) — ArtifactStore.PutStream for multi-GB MP4/audio outputs (unblocks hyperframes.render long-form, podcast videos 30–60 min) | 037 (planned) |
v1.x — Enterprise integration tracks
Post-GA themes that compose with the innovation tracks above but are scoped as community-led integration work rather than core platform features. Each is broken into independently-mergeable phases tracked as separate GitHub issues so contributors can pick up one phase without blocking on the others.
NVIDIA OpenShell integration
Theme: "Helmdeck sidecars inside hardware-isolated, policy-governed sandboxes."
NVIDIA OpenShell is a Rust-based safe runtime for autonomous AI agents — declarative YAML policies, OPA-enforced L7 network rules, libkrun MicroVM compute driver, Landlock filesystem isolation. Helmdeck's pack engine operates at the tool layer; OpenShell operates at the sandbox layer. The integration is non-duplicative — each project covers a layer the other doesn't.
Canonical design doc: docs/integrations/openshell.md.
Four phases (all post-v1.0):
- Shallow integration: run the helmdeck control plane inside an OpenShell sandbox. Docs + example policy only. No helmdeck code changes. Good first issue.
- Agent sandbox integration: run the agent (OpenClaw / Claude Code / Hermes) inside an OpenShell sandbox with egress restricted to helmdeck MCP + `inference.local`. Docs + example policy. Extends `openclaw.md`'s topology section. Good first issue.
- `OpenShellSessionRuntime` backend: third `SessionRuntime` implementation (alongside `DockerSessionRuntime` and v1.0's `KubernetesSessionRuntime`) that routes sidecar lifecycle through the OpenShell Gateway API. Hardware-isolated browser / Python / Node sidecars. Help wanted; multi-week Go work. Lands a new ADR (036).
- Correlated observability: join helmdeck's OTel GenAI traces with OpenShell's OCSF security events on the sandbox ID. End-to-end traces from MCP tool call → policy decision → outbound HTTP. Help wanted; OTel collector + OPA experience.
Why post-v1.0: Phase 3 modifies SessionRuntime, the seam between helmdeck's pack engine and execution backends. Touching it pre-GA forks the v1.0 test matrix; post-GA it's purely additive. Plus OpenShell is alpha — production deployments need a stable OpenShell Gateway API first.
Gating: v1.0 ships first. Phases 1 and 2 can land as docs-only PRs once both projects are GA. Phases 3 and 4 wait on a stable OpenShell Gateway API (no calendar commitment).
Versioning policy
- Pre-1.0: every minor may break compatibility; document in release notes.
- 1.0 onward: SemVer. Breaking pack-schema changes require a new pack version under `/api/v1/packs/{name}/v{n}` (ADR 024); the previous version stays callable for at least one full minor cycle.
- Bridge ↔ control plane: version-pinned. The bridge logs a deprecation warning when older than the platform's minimum recommended (ADR 030).
Distribution channels at GA
| Artifact | Channel |
|---|---|
| Control plane image | ghcr.io/tosin2013/helmdeck:vX.Y.Z |
| Browser sidecar image | ghcr.io/tosin2013/helmdeck-sidecar:vX.Y.Z |
| Helm chart | oci://ghcr.io/tosin2013/charts/baas-platform |
| `helmdeck-mcp` bridge | Homebrew, Scoop, npm, OCI, GH Releases |
| Compose stack | deploy/compose/compose.yaml in repo |
Related ADRs
The release-management and deployment decisions behind helmdeck's distribution model: