Skip to main content

The audit-callback pattern: verify-against-ground-truth as anti-hallucination middleware

· 6 min read
Tosin Akinosho
Helmdeck maintainer

Hook

Three architectural fixes from a single morning closed three different Tier C failure modes. A fourth — the agent producing a confidently-formatted manifest of fictitious deposits — survived all three. The structural answer isn't another fix at the producer side. It's a typed audit pack that reads ground truth after the fact, with the skill forced to surface the gap.

Context

Helmdeck's been on a Tier C reliability arc for a week. Three patterns kept recurring:

PatternExampleFix shape
Skill prose ignored"Save to artifacts" → markdown returned inlineTurn the advisory into a typed pack call (PR #450)
Required arg omittedcontent.ground rejects when model missingResolve a default at the pack layer (PR #453)
Mechanism vs. persona mixedTier C overwhelmed by 17 KB monolithic SKILL.mdSplit per OpenClaw's canonical agent-workspace modelissue #457 and follow-ups

We shipped all three, plus the layered workspace refactor, and retested on openai/gpt-oss-120b:free. The first three fixes worked — the agent loaded the layered files correctly, applied the decision rules from AGENTS.md, picked the right publishing mode, and made one successful blog.rewrite_for_audience call without specifying model. Then it produced a six-entry deposit manifest table for artifacts that didn't exist. The skill was in context. The pack was reachable. The model invented the calls as text.

That class of failure can't be fixed at the producer side — the producer was never called. It needs a verifier at the consumer side.

Finding

The shape that worked

artifact.verify_manifest:

{
"tool": "helmdeck__artifact-verify-manifest",
"arguments": {
"expected": [
{ "artifact_key": "blog.publish/abc-mcp-adr-canonical.md" },
{ "artifact_key": "blog.publish/def-mcp-adr-linkedin.md" }
]
}
}

Returns:

{
"verified": [
{ "artifact_key": "blog.publish/abc-mcp-adr-canonical.md",
"filename": "mcp-adr-canonical.md",
"namespace": "blog.publish",
"size": 7421,
"content_type": "text/markdown" }
],
"missing": [
{ "artifact_key": "blog.publish/def-mcp-adr-linkedin.md", "reason": "artifact not found" }
],
"all_present": false,
"summary": "1 of 2 claimed artifacts verified; 1 missing"
}

Handler: pure passthrough to ArtifactStore.Get per claimed key, dedup before lookup, accumulate found vs. not-found. ~150 LOC, 100% per-function coverage on 15 tests.

The skill update is two paragraphs:

### 4b. Verify deposit — MANDATORY, NOT ADVISORY

After producing the deposit-manifest table in §4, you MUST call
helmdeck__artifact-verify-manifest with every artifact_key from
the table. This is an anti-hallucination audit.

If `all_present: false` — DO NOT claim the deposit succeeded.
Report the missing[] entries explicitly and propose retrying the
deposit step for those specifically.

That's it. The audit pack is a tool name, not advisory prose — Tier C invokes it ~most of the time because it's a concrete tool call, not a "remember to" reminder. When it does invoke it, the returned missing[] is in the LLM's context window for the next response turn, making "all six deposited" implausible to assert.

Why this is the same shape as ADR 052

ADR 052 (av-output-validation-post-step) made av.validate a default-on post-step on slides.narrate and podcast.generate. The token-savings claim was concrete: every "the video has issues" diagnostic burns ~3,000 tokens of bash output and analysis; reading the validation field from the run record collapses that to ~200 tokens. The architecture: turn an implicit trust in the artifact ("looks fine, ship it") into a typed pack output the agent reads in O(200) tokens.

artifact.verify_manifest is the same shape at a different layer:

LayerWhat's verifiedTrust replaced
ADR 052 (artifact layer)The artifact's structural integrity (codec, faststart, packet contiguity, RMS)"the encoder produced a usable file" → typed validation.checks[]
artifact.verify_manifest (chat-response layer)The agent's claims about what's in the store"the agent said it deposited" → typed verified[] / missing[]

Both move from implicit trust to explicit verification, both surface findings in O(200) tokens, both pin the failure mode at a place where it can't drift back.

Phase 2 — generalize

The pattern fits a lot of helmdeck packs. Anywhere the LLM might transform a producer's output in its text response, you can pair the producer with an audit pack that re-reads authoritative state:

ProducerAuditor (planned)Verifies
artifact.putartifact.verify_manifest (shipped)Keys exist in store
repo.fetchrepo.verify-cloneClaimed clone_path exists, commit SHA matches
blog.publishblog.verify-publishedPublished URL is reachable, content matches
pack.start (async)pack.verify-completedjob_id is completed, not working
slides.narrateslides.verify-renderedMP4 exists + passes av.validate
content.groundcontent.verify-groundedclaims_grounded_count matches grounded[] length
pipeline-runpipeline.verify-completionClaimed step outputs match run record

Each follows the same shape: input is the agent's claim, output is {verified[], missing[], summary}. Handler reads authoritative state and reports the gap. Tracking in #461.

Phase 3 — engine-level hook (deferred)

The skill-prose dependency in Phase 1 ("after the deposit step, you MUST call verify-manifest") is itself a Tier C failure surface — small chance the model ignores it. The next architectural step is an engine-level post-call hook: when a producer pack completes, the engine auto-invokes the registered auditor, attaches the result to the same response envelope, and the LLM sees both without skill-prose dependency.

That's its own ADR. Not shipping it until Phase 1 + 2 prove the pattern is generally useful. Premature middleware is a way to build a complicated system you can't justify.

Why this matters to you

If you're building an agent on weak models, the producer-audit pair is a more durable shape than trying to make the model infallible.

Three principles that fall out of the work:

  1. Trust the producer; verify the consumer. Packs are reliable when they're called. The unreliability is the agent's claims about what it called. Verifying the consumer side closes that gap regardless of model tier.
  2. Make the audit a typed tool, not prose. "Remember to verify" is a Tier C failure mode. "Call helmdeck__artifact-verify-manifest" is a tool dispatch. The tool's existence in the catalog AND the skill's mandatory-step prose together raise the floor.
  3. The audit response has to be in context when the agent writes its final text. If verification runs out-of-band and the result lands in a log, the agent never sees it and continues asserting compliance. The audit must be a tool call whose result the LLM reads before its next text turn.

The pattern transfers to any MCP-tooling system, not just helmdeck. The MCP spec's tool-call envelope is exactly the surface this pattern uses. If your agent produces structured claims about world state (deposits, sends, publishes, mutations), pair each producer with an auditor and require the auditor in your skill template.

See also