Why helmdeck
Frontier-model APIs price a single agentic workflow at $0.20–$0.50. Helmdeck runs the same workflow on a cheap or local model for $0.05–$0.10, with deterministic packs absorbing the ambiguity that the model would otherwise burn tokens rediscovering.
Cheap models, real work
Run agentic browser, code, slides, vision, and desktop workflows on gpt-oss-120b, Gemma 4, or Mistral, including the same Phase 5.5 code-edit loop that needs Sonnet on Cursor.
Deterministic primitives
36 typed capability packs do the work. The LLM only picks which pack to call. Move recurring deterministic work out of the expensive token-priced layer.
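As a rough illustration of "the LLM only picks which pack to call", here is a minimal sketch of a typed pack registry with a dispatcher. Everything in it is hypothetical: the `Pack` class, the `text.word_count` pack, and the `{"pack": ..., "args": ...}` choice shape are assumptions for illustration, not helmdeck's actual pack contract.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Pack:
    """Hypothetical typed pack: a name, an argument schema, and a deterministic function."""
    name: str
    required: Dict[str, type]  # argument name -> expected Python type
    run: Callable[..., Any]


def word_count(text: str) -> int:
    # Deterministic work: no model tokens are spent computing this.
    return len(text.split())


# Illustrative registry with a single made-up pack.
REGISTRY: Dict[str, Pack] = {
    "text.word_count": Pack("text.word_count", {"text": str}, word_count),
}


def dispatch(choice: Dict[str, Any]) -> Any:
    """Validate the model's choice against the pack's schema, then run the pack."""
    pack = REGISTRY.get(choice.get("pack"))
    if pack is None:
        raise KeyError(f"unknown pack: {choice.get('pack')!r}")
    args = choice.get("args", {})
    if set(args) != set(pack.required):
        raise TypeError(f"{pack.name} expects arguments {sorted(pack.required)}")
    for key, expected in pack.required.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{pack.name}: {key!r} must be {expected.__name__}")
    return pack.run(**args)


# The model's only job is to emit a small structured choice like this one.
result = dispatch({"pack": "text.word_count", "args": {"text": "cheap models real work"}})
```

The design point is that a malformed or unknown choice fails at the dispatcher, before any work runs, so the model's output is never trusted beyond selecting a pack and filling its typed arguments.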
Self-hosted, audited
Your data, your keys, your hardware. Per-pack audit log, vault-backed credentials, egress-guarded network. Apache 2.0.
Tutorials
Learning-oriented walkthroughs. Start here if helmdeck is new — go from zero to a working pack-driven agent with explicit steps.
How-to guides
Problem-solving recipes. Wire helmdeck into a specific MCP client, extend a sidecar, ship a webhook integration.
Reference
Information lookup. Pack contracts, SKILLS for LLMs, every Architecture Decision Record, project tracking.
Explanation
Understanding-oriented background. The why behind the security model and architecture choices.
Recently shipped
The latest engineering notes, design rationale, and field reports from the helmdeck project.
Render ≠ preview: what we learned shipping a hyperframes integration
A v0.29.2 pipeline produced 15 seconds of animation followed by 83 seconds of blank canvas. We assumed it was a slot-lifetime bug, filed upstream issues, shipped a fix, and tagged a release — then discovered that even upstream's own decision-tree example doesn't render at all (2 distinct frames over 15 seconds). The actual story: hyperframes has a known, documented 'render ≠ preview' bug class, and the registry's own decision-tree trips over it. Upstream's own `hyperframes lint` was telling us this the whole time. We wrapped it as a helmdeck pack so the next agent catches it before burning the render budget.
When agent-instruction docs drift from upstream spec
I wrote a best-practices guide for helmdeck's HyperFrames integration. A maintainer asked one question — 'where's this sourced from?' — and the answer turned out to be 'I made it up.' Here's what we did about it, and the broader lesson for anyone writing agent reference docs.
HuggingFace isn't just another LLM router — it's a platform helmdeck barely uses
PR #489 added HF Inference Providers as alternative routing. The bigger opportunity is everything else HF offers — datasets, embeddings, Spaces, tokenizers — that helmdeck currently ignores. Epic #490 frames the strategic direction.
Empirical validation: the audit-callback pattern fires (and the profile only gets you partway)
A profile-aware Tier C agent ran the audit-callback pattern end-to-end on openai/gpt-oss-120b:free — real artifacts, real verify_manifest with all_present:true. It also simplified the skill's 9-platform table to 2 variations. The library is a starting point, not a finished product.
Plausibility-shaped output: when Tier C models manifest deposits they never made
A Tier C free model produced a confidently-formatted six-entry deposit manifest, with byte sizes and a policy citation, for artifacts that never existed. One real pack call, six fabricated. The architectural fix is verify-against-ground-truth.
The audit-callback pattern: verify-against-ground-truth as anti-hallucination middleware
For any pack call an LLM might transform in its text response, ship a paired audit pack that reads ground truth. The architecture is the same shape as ADR 052 av-validate — applied at the chat-response layer instead of the artifact layer.
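A minimal sketch of the verify-against-ground-truth idea, under stated assumptions: the name `verify_manifest` and the `all_present` field appear in the post summaries above, but the manifest entry shape (`name`, `size`), the in-memory `store`, and the `missing` / `size_mismatch` fields are hypothetical stand-ins, not helmdeck's actual pack contract.

```python
from typing import Dict, List


def verify_manifest(claimed: List[dict], store: Dict[str, bytes]) -> dict:
    """Check a model-reported manifest against what the store actually holds.

    `claimed` is what the model said it deposited; `store` is ground truth.
    Both shapes are assumptions made for this sketch.
    """
    missing = [e["name"] for e in claimed if e["name"] not in store]
    size_mismatch = [
        e["name"]
        for e in claimed
        if e["name"] in store and e.get("size") != len(store[e["name"]])
    ]
    return {
        "all_present": not missing,
        "missing": missing,
        "size_mismatch": size_mismatch,
    }


# Ground truth: only one artifact was ever deposited.
store = {"report.md": b"hello"}

# The model's text response claims two deposits, one of them fabricated.
claimed = [
    {"name": "report.md", "size": 5},
    {"name": "slides.pdf", "size": 1024},
]

audit = verify_manifest(claimed, store)
```

The audit reads only the store, never the model's prose, so a confidently formatted but fabricated entry surfaces in `missing` instead of being passed along as fact.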
Tier A is structurally better. The deposit-step failure is universal.
We ran the same prompt on Claude Sonnet 4.6 that we ran on gpt-oss-120b:free. Tier A handles parallel tool use, 8-platform fanout, the InfoQ 6-criterion fit check, and the "one clarifying question" rule. It also skips the mandatory artifact.put step the same way Tier C does. The deposit-step failure is tier-invariant.
Recipe-style docs are dramatically underused. Here's the case for them.
We shipped a cookbook of intent → prompt recipes alongside our reference docs. Within 48 hours it had eclipsed the prompt-templates page as the most-linked-to doc in our reference site. The pattern is simple, the per-recipe cost is ~15 minutes, and most projects don't do it.