
Explore with packs, exploit with pipelines: making a workflow a first-class resource

5 min read
Tosin Akinosho
Helmdeck maintainer

A capable agent will happily chain research.deep → content.ground → slides.render to build you a fact-checked deck. Ask for the same thing next week and it does the whole dance again from scratch: re-reasoning the sequence, re-threading each step's output into the next, re-passing the session id by hand. The workflow lives in the agent's prompt, not in the platform — so it can't be scheduled, triggered, shared, or replayed. helmdeck v0.15.0 (ADR 041) fixes that by making a pipeline — a stored, named, ordered sequence of pack steps — a first-class resource that any actor can create, run, and inspect.

Orchestration that lives in the prompt

helmdeck has always been a tool server: an agent calls a pack, gets a result, calls the next. Composition is the agent's job, every time. That's exactly right for exploration — the agent is figuring out what sequence even works. It's wasteful for exploitation — running a known-good sequence the hundredth time. Each ad-hoc run is N tool round-trips, N chances to mis-thread an output or drop a _session_id, and a pile of tokens spent re-deciding a sequence that hasn't changed.

The fix isn't to make the agent smarter at orchestration. It's to let the agent hand the orchestration back to the platform once it's settled. A pipeline is pure data — [{id, pack, input}] with ${{ steps.<id>.output.<field> }} references between steps — so it lives in the database next to credentials and audit entries, addressable through one REST/MCP surface.
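To make "pure data" concrete, here is a minimal Python sketch of the step shape and of the ordering rule that step references imply. It is an illustration, not helmdeck's implementation: the `Step` class, the `referenced_steps` and `check_order` helpers, and the exact reference grammar in the regex are assumptions made for this example.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any

# Matches ${{ steps.<id>.output.<field> }}. The exact grammar is an assumption.
STEP_REF = re.compile(
    r"\$\{\{\s*steps\.([A-Za-z0-9_-]+)\.output\.([A-Za-z0-9_.-]+)\s*\}\}"
)


@dataclass
class Step:
    """One pipeline step: an id, the pack to call, and its input template."""
    id: str
    pack: str
    input: dict[str, Any] = field(default_factory=dict)


def referenced_steps(step: Step) -> set[str]:
    """Return the ids of the steps that this step's input refers to."""
    # Serializing the input lets one regex scan cover nested values too.
    return {m.group(1) for m in STEP_REF.finditer(json.dumps(step.input))}


def check_order(steps: list[Step]) -> None:
    """Raise ValueError if a step references a step that does not precede it."""
    seen: set[str] = set()
    for step in steps:
        missing = referenced_steps(step) - seen
        if missing:
            raise ValueError(
                f"step {step.id!r} references unknown or later steps: {sorted(missing)}"
            )
        seen.add(step.id)
```

Because the definition is only data, a check like this can run once at creation time instead of being rediscovered on every run.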

Explore with packs, exploit with pipelines

The mental model we landed on, and wrote into the agent's skill file, is one line: explore with packs, exploit with pipelines.

While exploring, the agent calls packs directly — because exploration needs the agent in the loop. It inspects the research before deciding how to slide it; it retries with different inputs; it branches on an intermediate result; it pauses to ask the user. Pipelines are deliberately linear and fail-fast — no branching, no loops, no human-in-the-middle — so anything needing control flow stays a direct pack call. That constraint is a feature: it keeps pipelines simple enough to be reliable and reproducible.
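The "linear and fail-fast" constraint can be sketched in a few lines of Python. This is a hedged illustration rather than the platform's runner: `run_linear` and the `call_pack` callback are hypothetical names, the `failed` status is an assumption, and input templating is left out to keep the control flow visible.

```python
from typing import Any, Callable

# Hypothetical callback: invokes a pack by name with an input dict.
PackCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_linear(steps: list[dict[str, Any]], call_pack: PackCaller) -> dict[str, Any]:
    """Run steps strictly in order and stop at the first failure."""
    statuses = {step["id"]: "pending" for step in steps}
    outputs: dict[str, Any] = {}
    for step in steps:
        step_id = step["id"]
        statuses[step_id] = "running"
        try:
            outputs[step_id] = call_pack(step["pack"], step["input"])
        except Exception as exc:
            # Fail fast: later steps are never attempted and stay "pending".
            statuses[step_id] = "failed"
            return {
                "status": "failed",
                "steps": statuses,
                "outputs": outputs,
                "error": f"{step_id}: {exc}",
            }
        statuses[step_id] = "succeeded"
    return {"status": "succeeded", "steps": statuses, "outputs": outputs}
```

With no branches or loops there is exactly one path through the run, which is what makes a stored sequence easy to audit and replay.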

Once the sequence is settled, the agent codifies it with one MCP call:

// helmdeck__pipeline-create
{
  "name": "weekly-k8s-brief",
  "steps": [
    { "id": "research", "pack": "research.deep",
      "input": { "query": "${{ inputs.topic }}", "model": "openrouter/auto" } },
    { "id": "ground", "pack": "content.ground",
      "input": { "text": "${{ steps.research.output.synthesis }}", "rewrite": true } },
    { "id": "deck", "pack": "slides.render",
      "input": { "markdown": "${{ steps.ground.output.grounded_text }}", "format": "pdf" } }
  ]
}

From then on the workflow is one call returning a run_id — the agent polls helmdeck__pipeline-run-status instead of babysitting three round-trips. The templating and session-threading happen server-side; the whole thing is audited as a unit and replayable. And because a pipeline is just a resource, any actor can run it: the user from the UI, a different agent over MCP, and — landing next — a cron schedule or a GitHub webhook, all calling the same stored definition.
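On the caller's side, "poll instead of babysit" reduces to a small loop. The sketch below assumes a `get_status` callable that wraps whatever transport you use to reach helmdeck__pipeline-run-status and returns a dict with a `status` field. That response shape, the `failed` terminal state, and the `wait_for_run` helper itself are assumptions for illustration.

```python
import time
from typing import Any, Callable

# "succeeded" appears in the UI description; "failed" is an assumed counterpart.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_run(
    get_status: Callable[[str], dict[str, Any]],
    run_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll a run until it reaches a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status(run_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {status.get('status')!r} after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.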

The discipline that makes this safe is the same one we apply everywhere: the output-templating resolver works on the decoded JSON tree, resolves in a single pass (so a resolved value is never re-scanned for references), and re-marshals through the JSON encoder, so a resolved value can neither break out of its position nor trigger a second-order injection. An unresolved reference is a loud failure, never a silent empty value.
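Those three properties (tree-based, single pass, re-encoded) can be illustrated with a short Python sketch. It is not helmdeck's resolver: the reference grammar, the `UnresolvedReference` error, and the rule that a string consisting of exactly one reference keeps the referenced value's type are all assumptions made for this example.

```python
import json
import re
from typing import Any

# Matches ${{ dotted.path }}. The exact grammar is an assumption.
REF = re.compile(r"\$\{\{\s*([A-Za-z0-9_.-]+)\s*\}\}")


class UnresolvedReference(LookupError):
    """Raised when a reference does not resolve, instead of yielding an empty value."""


def lookup(path: str, context: dict[str, Any]) -> Any:
    node: Any = context
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise UnresolvedReference(path)
        node = node[part]
    return node


def to_text(value: Any) -> str:
    return value if isinstance(value, str) else json.dumps(value)


def resolve(node: Any, context: dict[str, Any]) -> Any:
    """Walk the decoded tree once; substituted values are never walked again."""
    if isinstance(node, dict):
        return {key: resolve(value, context) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, context) for value in node]
    if isinstance(node, str):
        whole = REF.fullmatch(node)
        if whole:
            return lookup(whole.group(1), context)  # keep the referenced type
        # re.sub does not re-scan the text it inserts, so this stays single pass.
        return REF.sub(lambda m: to_text(lookup(m.group(1), context)), node)
    return node


def render_input(template_json: str, context: dict[str, Any]) -> str:
    """Decode, resolve on the tree, then re-encode so values are escaped by the encoder."""
    return json.dumps(resolve(json.loads(template_json), context))
```

Because substitution happens on decoded values and the encoder does the escaping, a research synthesis containing quotes or a literal `${{ ... }}` ends up as ordinary string content in the next step's input.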

We shipped ~13 built-in starters so the feature is useful on day one without anyone writing YAML: grounded deck, grounded blog, research→{deck,podcast,blog}, scrape→ground→blog, and "clone a repo → narrated deck / podcast about it." helmdeck__pipeline-list surfaces them, so the agent's first move on a familiar request is to check whether a pipeline already exists rather than re-deriving it. And the new /pipelines panel in the management UI lets an operator watch a run advance — pending → running → succeeded, per step — which is how you see what your agents have been building.

The signal to watch for

If you're building agent infrastructure, watch for the moment your agent starts doing the same multi-step thing repeatedly. That's the signal that orchestration has escaped the platform and is now living — fragile, un-schedulable, un-auditable — inside a prompt. The instinct is to make the agent better at the dance. The better move is to give it a way to stop dancing: a place to save the sequence as data, parameterize it, and run it by name.

The split that makes it work is explore vs. exploit. Keep the open-ended, judgment-in-the-loop work as direct tool calls — that's what agents are for. But the instant a sequence is known-good and repeatable, the agent's most valuable act is to codify it, because that turns a per-run cost (tokens, latency, mis-threading risk) into a one-time write. The loop closes inside the platform: agents create pipelines, pipelines run packs, packs produce artifacts, artifacts feed agents — every step audited, every credential vaulted, every run reproducible.

See also