How to build a Concept Animator agent on `openai/gpt-oss-120b:free`

This recipe shows how to set up an OpenClaw agent running openai/gpt-oss-120b:free that turns a single-sentence concept into a narrated animated video (16:9, 9:16, or 1:1), with an explicit max-length option that unlocks the full 12-minute hyperframes.compose cap. It closes part of issue #496 — the video-agents reference recipes for gpt-oss-120b:free.

The recipe is model-family-specific. The workflow mechanics are the same ones PR #470 validated for the iterative-blog use case (single objective, explicit constraints, machine-checkable invalidation rules), but the artifact is video instead of markdown and the chain is 5 pack calls (podcast.generate → hyperframes.compose → hyperframes.render → av.validate → artifact.verify_manifest). A five-call chain edges into the model profile's medium-to-long chain band; framing the invalidation rules as success criteria, plus the explicit av.validate post-step and the audit-callback at the end, keeps it on rails.

When to use this recipe

Use it when you want a Tier C concept-animator agent that reliably:

- Calls podcast.generate to produce the narration audio (av.validate runs automatically inside the pack per ADR-052 Phase 3)
- Calls hyperframes.compose with the returned audio_url, passing duration_seconds set to the podcast's actual length (per issue #498 the pack rejects audio_url without an explicit duration_seconds; in older pack versions, omitting it left the timeline at an 8-second default that silently truncated longer narration)
- Calls hyperframes.render with the resulting composition_html, producing an MP4 under 512 MiB
- Calls av.validate explicitly on the rendered MP4 (it does NOT run automatically after render, only inside podcast.generate and slides.narrate)
- Closes the chain with an audit-callback (artifact.verify_manifest) so the operator gets a machine-checkable confirmation that the MP4 actually exists, not just a text claim of success
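The ordering requirement above is easy to check mechanically once you have the list of tool names the agent actually invoked. A minimal Python sketch, assuming tool names follow the `helmdeck__<pack>-<action>` pattern from the pre-flight list (so the first hyphen maps back to the dot in the pack name); `normalize` and `missing_steps` are hypothetical helpers, not part of Helmdeck or OpenClaw:

```python
# Expected order of the from-scratch chain described in this recipe.
EXPECTED_CHAIN = [
    "podcast.generate",
    "hyperframes.compose",
    "hyperframes.render",
    "av.validate",
    "artifact.verify_manifest",
]


def normalize(tool_name: str) -> str:
    """Map an agent-side tool name such as 'helmdeck__podcast-generate'
    to its pack name 'podcast.generate' (assumed naming convention)."""
    bare = tool_name.removeprefix("helmdeck__")
    return bare.replace("-", ".", 1)


def missing_steps(called_tools: list[str]) -> list[str]:
    """Return the expected packs that do not appear, in order, in the
    recorded tool calls. Extra or repeated calls are ignored."""
    called = [normalize(name) for name in called_tools]
    missing = []
    pos = 0
    for step in EXPECTED_CHAIN:
        try:
            # Search only after the previously matched step, so order matters.
            pos = called.index(step, pos) + 1
        except ValueError:
            missing.append(step)
    return missing


if __name__ == "__main__":
    run = [
        "helmdeck__podcast-generate",
        "helmdeck__hyperframes-compose",
        "helmdeck__hyperframes-render",
        "helmdeck__artifact-verify_manifest",
    ]
    print(missing_steps(run))  # -> ['av.validate']
```

A missing step does not advance the search position, so one skipped call does not cascade into every later step being reported as missing.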

It does NOT replace a hand-authored animation skill — it's the small, opinionated worked example of getting a Tier C model to drive a 5-call video chain without hallucinating intermediate output.

Worked example — Maya, security researcher

This recipe uses Maya, a hypothetical security researcher who publishes short explainers about kernel observability and memory-corruption mitigations on Mastodon and YouTube, as the worked persona. Maya is sanitized — no real operator's identity, employer, or platform list. Adapt the persona to your own context.

Pre-flight

- OpenRouter API key set; openai/gpt-oss-120b:free confirmed reachable
- Helmdeck packs available: helmdeck__podcast-generate, helmdeck__hyperframes-compose, helmdeck__hyperframes-render, helmdeck__av-validate, helmdeck__artifact-verify_manifest, helmdeck__artifact-get
- ElevenLabs API key configured (otherwise podcast.generate must be called with allow_silent_output: true)
- Per-model profile YAML reviewed: models/openai-gpt-oss-120b-free.yaml. Internalize these sections before writing AGENTS.md: prompting_style, anti_patterns, chain_call_reliability.

Step 1 — Create the workspace

In OpenClaw, create a new agent workspace (e.g., ~/.openclaw/workspace-maya-animator/). Add the canonical OpenClaw files: SOUL.md, IDENTITY.md, USER.md, AGENTS.md. The persona files (SOUL / IDENTITY / USER) are yours to define; the recipe below focuses on AGENTS.md, which is the load-bearing file for gpt-oss-120b's prompting fit.
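If you prefer to lay out the workspace from a script rather than by hand, a minimal Python sketch follows. It assumes OpenClaw picks the files up from the directory as-is and that a placeholder heading is acceptable until you write the persona files; it is not an OpenClaw command, and the placeholder content is purely illustrative.

```python
from pathlib import Path

# The four canonical OpenClaw workspace files named in this step.
CANONICAL_FILES = ("SOUL.md", "IDENTITY.md", "USER.md", "AGENTS.md")


def scaffold_workspace(root: Path) -> list[Path]:
    """Create `root` and any missing canonical files with a placeholder heading.

    Files that already exist are left untouched, so re-running the script
    never overwrites persona text you have already written.
    """
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in CANONICAL_FILES:
        path = root / name
        if not path.exists():
            # Placeholder only: write SOUL/IDENTITY/USER yourself and paste
            # the Step 3 template into AGENTS.md.
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
            created.append(path)
    return created


if __name__ == "__main__":
    workspace = Path.home() / ".openclaw" / "workspace-maya-animator"
    for created_path in scaffold_workspace(workspace):
        print("created", created_path)
```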

Step 2 — Configure the model route

In OpenClaw's per-agent model config, set:

```yaml
provider: openrouter
model: openai/gpt-oss-120b:free
sampling:
  temperature: 0.7
  top_p: 0.95
reasoning_effort: medium
```

Why these values: gpt-oss-120b exposes a graded reasoning-effort knob (low / medium / high). The concept-animator chain involves duration math (audio length → composition seconds) plus pack-selection plus invalidation-rule self-check — that's medium work. Bumping to high is unnecessary and slow; dropping to low raises the risk of the "plausibility-shaped output" failure mode documented in the profile YAML where the model claims a tool call as text instead of executing it.

Step 3 — AGENTS.md template

Copy the template below to ~/.openclaw/workspace-maya-animator/AGENTS.md. The template uses gpt-oss-120b's preferred style — single OBJECTIVE, explicit CONSTRAINTS, machine-checkable SUCCESS CRITERIA framed as INVALIDATION RULES (per the model profile's prompting_style: objectives_constraints_success_criteria setting):

# AGENTS.md — Maya's concept animator on openai/gpt-oss-120b:free

This workspace produces short narrated animated videos on a Tier C agent
running gpt-oss-120b. The AGENTS.md prose is tuned to gpt-oss's
Objectives + Constraints + Success-Criteria style per the helmdeck profile
models/openai-gpt-oss-120b-free.yaml — NOT a numbered step-by-step
procedure. The chain is 5 pack calls (podcast.generate →
hyperframes.compose → hyperframes.render → av.validate →
artifact.verify_manifest). Per the profile's chain_call_reliability table,
this edges into medium-long; framing pack calls as part of success
criteria (not separable steps) plus the explicit av.validate post-step
and the audit-callback at the end are the critical levers.

# OBJECTIVE

Convert the operator's concept into a hosted MP4 animated video with
narration. Default narration target is 60 seconds (social-first). When
the operator requests "max length", scale to the 12-minute cap.

# SOURCE PRIORITY

1. The operator's most recent message (concept + optional max-length flag).
2. Prior turns in this conversation (for follow-up edits to the same concept).
3. General knowledge (only for animation conventions, e.g., aspect-ratio
   norms for vertical / horizontal output).

# CONSTRAINTS

- Do not micromanage rendering details. The packs handle their own internals.
- If the operator requests "max length", pass `duration_target_min: 12` to
  `podcast.generate` (and the matching `duration_seconds` to
  `hyperframes.compose`, per the duration data-flow constraint below).
  Otherwise default to a 60-second narration target on `podcast.generate`.
- Pass `podcast.generate`'s `audio_url` field — the presigned URL, NOT
  `audio_artifact_key` (the sidecar key) — to `hyperframes.compose` as the
  `audio_url` input. Two related fields exist on the response; the
  presigned URL is the one `hyperframes.compose` consumes.
- When `ELEVENLABS_API_KEY` is unavailable, call `podcast.generate` with
  `allow_silent_output: true` so the chain still produces a silence-padded
  MP3 the composer can frame against.
- **Always pass `speakers` to `podcast.generate`** — it is a REQUIRED field
  in the pack schema; calls without it fail validation immediately (the
  failure was empirically observed during the 2026-06-13 first session
  against this recipe: the model retried podcast.generate 9 times without
  ever progressing). For a single-narrator concept animation, use:
    `"speakers": {"Narrator": "21m00Tcm4TlvDq8ikWAM"}`
  (`21m00Tcm4TlvDq8ikWAM` is ElevenLabs' "Rachel" — the canonical default
  voice. For multi-speaker dialogue, add more `{name: voice_id}` entries.)
- Use Mode B for `podcast.generate`: pass `prompt` (the operator's concept
  expanded into a narration brief sized to the duration target: 60 seconds
  by default, 12 minutes for max length) AND
  `model: "openrouter/openai/gpt-oss-120b:free"` so the pack's gateway
  LLM writes the script using the SAME free model the agent itself runs
  on. Do NOT try to author the `script` array yourself.
- Also pass `metadata_model: "openrouter/openai/gpt-oss-120b:free"` on
  `podcast.generate`. The default is `openrouter/auto`, which routes to a
  PAID model for engagement metadata. Overriding keeps the whole chain on
  free tier.
- When calling `hyperframes.compose`, pass
  `model: "openrouter/openai/gpt-oss-120b:free"`. This field is REQUIRED
  — it's the LLM that generates the creative HTML/CSS/JS composition
  from the description. Pinning to the free model keeps the entire
  chain off the paid tier.
- Also pass `metadata_model: "openrouter/openai/gpt-oss-120b:free"` on
  `hyperframes.compose`. The default is `openrouter/auto` (PAID).
  Setting it pins the engagement-metadata-generation step (per [PR
  #500](https://github.com/tosin2013/helmdeck/pull/500)) to free tier
  and unlocks the duration-band-aware engagement payload:
    <60s   → short_form  (title / hook / hashtags / caption / thumbnail_prompt)
    60-180s → mid_form   (above + social_blurb)
    ≥180s  → long_form   (above + description / chapters / tags / hook_30s / category)
  The compose pack picks the shape from `duration_seconds` automatically.
- **ALWAYS pass `duration_seconds` to `hyperframes.compose`, set to the
  `duration_s` value returned by `podcast.generate`** (round up to the
  nearest whole second). The pack rejects `audio_url` without an explicit
  `duration_seconds` per issue [#498](https://github.com/tosin2013/helmdeck/issues/498)
  — without it (in older pack versions), the composition timeline would
  default to 8s and silently truncate longer narration. Example data flow:
    podcast.generate returns `duration_s: 88.581` →
    hyperframes.compose call: `duration_seconds: 89, audio_url: ...`
- Pass `hyperframes.compose`'s returned `composition_html` to
  `hyperframes.render` verbatim. Do not modify the HTML.
- After `hyperframes.render` returns a `video_artifact_key`, call
  `helmdeck__av-validate` with that key as `video_artifact_key`. This is
  a load-bearing post-render quality gate — it reports faststart, codec
  pin, packet contiguity, RMS sweep, loudness LUFS, A/V duration parity,
  and black-run detection.
- If `av.validate` returns `all_passed: false`, surface the failed + warn
  checks to the operator in the final report. Do NOT silently drop them.

# SUCCESS CRITERIA (Invalidation Rules — applied strictly)

The response is INVALID and must NOT be reported as success when:

- `helmdeck__podcast-generate` was not called.
- `helmdeck__podcast-generate` was called WITHOUT a `speakers` map, or
  WITHOUT a `model` field paired with `prompt` (Mode B requires both),
  or WITH a `model` other than `openrouter/openai/gpt-oss-120b:free`
  (operator-cost discipline: the chain stays on free tier).
- `helmdeck__podcast-generate` was called without
  `metadata_model: "openrouter/openai/gpt-oss-120b:free"`. The default
  routes to PAID; explicit override keeps engagement metadata free too.
- `helmdeck__hyperframes-compose` was not called with the `audio_url`
  returned by `podcast.generate`.
- `helmdeck__hyperframes-compose` was called WITHOUT a `model` field OR
  with a `model` other than `openrouter/openai/gpt-oss-120b:free`.
- `helmdeck__hyperframes-compose` was called without
  `metadata_model: "openrouter/openai/gpt-oss-120b:free"`. The default
  routes engagement metadata to PAID `openrouter/auto`; explicit
  override keeps the chain end-to-end free AND unlocks the
  duration-band-aware engagement payload.
- `helmdeck__hyperframes-compose` was called WITHOUT a `duration_seconds`
  field, OR with a `duration_seconds` value not matching (within ±1s)
  the `duration_s` returned by `podcast.generate`. Mismatch causes
  silent audio truncation in the final video.
- `helmdeck__hyperframes-render` was not called with the `composition_html`
  returned by `hyperframes.compose`.
- `helmdeck__av-validate` was not called with the rendered MP4's
  `video_artifact_key`.
- `helmdeck__artifact-verify_manifest` was not called with the rendered
  MP4's `video_artifact_key`, OR the response field `all_present` is not
  `true`.
- The final report omits the `av.validate` summary (which checks passed
  / failed / warned). If `all_passed: false`, the operator MUST see the
  failed + warn checks listed, not just an "OK" summary.
- Any pack result is paraphrased or invented as text instead of cited
  from the actual tool return.

# NOTE ON av.validate

- `av.validate` runs automatically inside `podcast.generate` (per ADR-052
  Phase 3 default-on integration). The audio is validated; no explicit
  call needed on the audio leg.
- `av.validate` does NOT run automatically after `hyperframes.render`,
  so the chain MUST call it explicitly with the rendered MP4's
  `video_artifact_key` (see CONSTRAINTS + SUCCESS CRITERIA above). Real
  finding from a 2026-06-13 test session: the rendered MP4 hit a black
  run + low loudness that would have shipped silently without this post-step.

# OUTPUT FORMAT

When the chain succeeds, report:

- The concept (one line).
- Audio duration (seconds) from `podcast.generate`.
- Composed duration (seconds) from `hyperframes.compose`.
- Rendered MP4 `video_artifact_key`.
- `av.validate` `all_passed` + summary of any failed/warn checks.
- `verify_manifest` `all_present` result.
- **Engagement metadata** from `hyperframes.compose.engagement` (the
  duration-band-aware payload added in [PR #500](https://github.com/tosin2013/helmdeck/pull/500)).
  At minimum surface:
    - `engagement.format` (short_form / mid_form / long_form)
    - `engagement.title` (proposed video title)
    - `engagement.hashtags` (the relevance-validated list)
    - `engagement.thumbnail_prompt` (so the operator can chain
      `image.generate` for hero artwork if wanted)
    - For `long_form` only: also surface `engagement.description`,
      `engagement.chapters`, and `engagement.hook_30s` (the YouTube
      publishing pack).
- `engagement_artifact_key` so the operator can fetch the full JSON
  sidecar later via `artifact.get` if they want all the fields.
- A short note on aspect ratio chosen if the operator left it unspecified.

Do not include any URL the operator did not see in a tool result.
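Outside the template, the duration hand-off in CONSTRAINTS reduces to a little arithmetic: round `duration_s` up to a whole second, check that it stays within ±1 s, and read the engagement band off the result. A Python sketch, assuming `podcast.generate` returns `audio_url` and `duration_s` at the top level of its result and that `hyperframes.compose` takes a `description` field (both inferred from this page, not from the pack schemas). The template lists both "60-180s" and "≥180s"; this sketch treats exactly 180 s as `long_form`, which is an assumption.

```python
import math

FREE_MODEL = "openrouter/openai/gpt-oss-120b:free"


def compose_duration_seconds(duration_s: float) -> int:
    """Round podcast.generate's duration_s up to the next whole second."""
    return math.ceil(duration_s)


def duration_matches(duration_seconds: float, duration_s: float, tolerance: float = 1.0) -> bool:
    """The ±1 s agreement the SUCCESS CRITERIA require."""
    return abs(duration_seconds - duration_s) <= tolerance


def engagement_band(duration_seconds: float) -> str:
    """Engagement format band from the template's duration table.

    Assumption: exactly 180 s is treated as long_form.
    """
    if duration_seconds < 60:
        return "short_form"
    if duration_seconds < 180:
        return "mid_form"
    return "long_form"


def compose_args(podcast_result: dict, description: str) -> dict:
    """Hypothetical helper: build hyperframes.compose input from a podcast.generate result."""
    return {
        "description": description,
        "audio_url": podcast_result["audio_url"],  # presigned URL, not audio_artifact_key
        "duration_seconds": compose_duration_seconds(podcast_result["duration_s"]),
        "model": FREE_MODEL,
        "metadata_model": FREE_MODEL,  # keeps engagement metadata off openrouter/auto
    }


if __name__ == "__main__":
    args = compose_args(
        {"audio_url": "https://example.invalid/narration.mp3", "duration_s": 88.581},
        "eBPF tracepoint observability",
    )
    print(args["duration_seconds"], engagement_band(args["duration_seconds"]))  # -> 89 mid_form
```

With the template's own example, `duration_s: 88.581` becomes `duration_seconds: 89`, which falls in the `mid_form` band even though the request used the 60-second default target.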

Variant — bring-your-own audio (v0.29.5+)

When Maya has an existing audio file — a recorded interview, a stitched-together podcast clip, an ElevenLabs render she did out-of-band — the recipe above is the wrong shape: podcast.generate re-generates audio from a prompt rather than re-using the audio she already trusts. Two changes that landed in v0.29.4 and v0.29.5 make a BYO-audio variant tractable:

- v0.29.4's builtin.byo-audio-narrated-video pipeline takes an audio_artifact_key instead of generating audio, and runs the pre-render validation suite (hyperframes.lint → inspect → validate) before render so blank-canvas or silent-audio failures abort cheaply.
- v0.29.5's new POST /api/v1/artifacts/upload endpoint and the drag-drop card on the Management UI's Artifacts page close the "how do I get my MP3 into the artifact store?" UX gap. Maya opens the UI, drags her file in, copies the returned audio_artifact_key, and hands it to the agent.

Operator step (out-of-band, not in chat)

1. Open Management UI → Artifacts
2. Drag the audio file onto the upload card (100 MiB cap; covers long-form audio and large pre-rendered media)
3. Click Copy on the resulting artifact_key (it will look like operator-uploads/abc123-narration.mp3)
4. Note the audio's duration in seconds with `ffprobe -v error -show_entries format=duration -of csv=p=0 file.mp3`. The cap is 720 s; longer audio is rejected at pack-input validation as CodeInvalidInput.
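The duration check in step 4 can be scripted so out-of-range audio is caught before the pack rejects it. A Python sketch that parses the ffprobe output above and applies the 5 to 720 s window quoted in this recipe; whether the pipeline wants the value rounded up or accepts fractional seconds is not stated here, so rounding up (mirroring the from-scratch constraint) is an assumption.

```python
import math
import subprocess

MIN_SECONDS = 5    # shortest audio the BYO pipeline accepts, per this recipe
MAX_SECONDS = 720  # the 12-minute cap


def byo_duration_seconds(ffprobe_output: str) -> int:
    """Convert `ffprobe ... -of csv=p=0` output (e.g. '233.717000') into a
    whole-second duration_seconds value, rounding up.

    Raises ValueError if the audio is outside the accepted range, so you find
    out before the pack rejects it with CodeInvalidInput.
    """
    seconds = float(ffprobe_output.strip())
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"audio is {seconds:.1f}s; the pipeline accepts {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return math.ceil(seconds)


def probe(path: str) -> int:
    """Run the ffprobe command from step 4 (requires ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return byo_duration_seconds(out)
```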

AGENTS.md addendum for the BYO variant

The base AGENTS.md template stays the same; for runs where the operator passes an artifact key, add these CONSTRAINTS and SUCCESS CRITERIA overrides:

# CONSTRAINTS — BYO audio variant

- The operator will provide:
  - `audio_artifact_key` (starts with `operator-uploads/`)
  - `description` (topic context for composition authoring)
  - `duration_seconds` (the audio's actual length in seconds)
- Call ONE pack: `helmdeck__pipeline-run` with pipeline_id=`builtin.byo-audio-narrated-video`.
- Pass through every required input verbatim — do NOT regenerate the audio, do NOT call `podcast.generate`.
- The pipeline runs lint → inspect → validate → render internally with strict mode on; any validation failure surfaces as a typed `CodeArtifactFailed` error. **DO NOT** swallow it and retry the same composition — surface the finding to the operator with the code (e.g. `media_missing_id`, `text_box_overflow`) and the suggested fix from the pack output. The validation suite is the publish gate; bypassing it ships broken video.

# SUCCESS CRITERIA (Invalidation Rules — applied strictly)

- INVALID if `helmdeck__pipeline-run` is not the only pack called for this request.
- INVALID if `audio_artifact_key`, `description`, or `duration_seconds` is missing from the pipeline input.
- INVALID if the agent regenerates audio via `podcast.generate` when an `audio_artifact_key` was supplied.
- INVALID if the agent claims the render succeeded without surfacing the pipeline's returned `video_artifact_key` (or `mp4_artifact_key`) verbatim.

Test prompt for the BYO variant

@Maya I uploaded a 4-minute MP3 about Antigravity CLI game-building.
The artifact key is operator-uploads/abc123-antigravity.mp3 and the
duration is 234 seconds. Generate a narrated 16:9 animated video
explaining the build process to match the audio.

Expected single pack call:

```json
{
  "tool": "helmdeck__pipeline-run",
  "input": {
    "pipeline_id": "builtin.byo-audio-narrated-video",
    "inputs": {
      "audio_artifact_key": "operator-uploads/abc123-antigravity.mp3",
      "description": "Antigravity CLI game-building",
      "duration_seconds": 234,
      "aspect_ratio": "16:9"
    }
  }
}
```
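If you drive the pipeline from a script (for example, to replay the test prompt without the agent), the same payload can be assembled and sanity-checked in a few lines. A Python sketch; `byo_pipeline_call` is a hypothetical helper, and it only enforces the required inputs and key prefix named in the addendum, not the pipeline's full input schema.

```python
import json

PIPELINE_ID = "builtin.byo-audio-narrated-video"
REQUIRED_INPUTS = ("audio_artifact_key", "description", "duration_seconds")


def byo_pipeline_call(inputs: dict) -> dict:
    """Build the single helmdeck__pipeline-run call, enforcing the addendum's
    required inputs before anything is sent."""
    missing = [key for key in REQUIRED_INPUTS if key not in inputs]
    if missing:
        raise ValueError(f"missing pipeline inputs: {missing}")
    if not str(inputs["audio_artifact_key"]).startswith("operator-uploads/"):
        raise ValueError("audio_artifact_key should come from the Artifacts upload card")
    return {
        "tool": "helmdeck__pipeline-run",
        "input": {"pipeline_id": PIPELINE_ID, "inputs": dict(inputs)},
    }


if __name__ == "__main__":
    call = byo_pipeline_call({
        "audio_artifact_key": "operator-uploads/abc123-antigravity.mp3",
        "description": "Antigravity CLI game-building",
        "duration_seconds": 234,
        "aspect_ratio": "16:9",
    })
    print(json.dumps(call, indent=2))
```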

Why this is shorter than the from-scratch chain

The from-scratch path is 5 tool calls (podcast.generate → hyperframes.compose → hyperframes.render → av.validate → artifact.verify_manifest). The BYO variant is 1 tool call because the pipeline composes the chain internally AND inlines the pre-render validation gates that close the most common silent-failure modes. Fewer tool calls means a Tier C model has fewer opportunities to drift. The pipeline IS the audit-callback; we don't need to assemble it call-by-call.

When NOT to use the BYO variant

- The operator wants a fresh narration generated from a prompt (use the from-scratch path)
- The audio is shorter than 5 seconds or longer than 720 seconds (pack rejects; check duration upfront)
- The operator has only a description and no audio file at all (use the from-scratch path)

Step 4 — Test prompt

After bootstrapping the agent, run this prompt to verify the workflow fires end-to-end:

Animate: eBPF tracepoint observability lets you watch kernel module
loads without writing a kernel module yourself. Show the trace flow.

(no max-length flag — defaults to 60s)

And a max-length variant:

Animate: How modern Linux kernels detect rootkits via tracepoint
attestation and signed module measurement — explain in depth.

(max length)

Expected behavior: five pack calls (the four production calls plus the audit-callback), with each call consuming the prior call's typed output. The 60s prompt should produce podcast.generate with duration_target_min ≈ 1, and hyperframes.compose should return the engagement.format that matches the actual narration length: short_form under 60 s, while a 60-second target that runs long (such as the 88.581 s example in the template) lands in mid_form. The max-length prompt should produce podcast.generate with duration_target_min: 12, hyperframes.compose with a duration_seconds that matches the returned duration_s (at most the 720 s cap), and engagement.format: "long_form" (with chapters and description). The final verify_manifest must report all_present: true, and the operator's report must surface the engagement metadata, not just the MP4 key.

If the model:

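- skips a pack call,
- paraphrases a tool result instead of citing the actual response,
- claims all_present: true without showing the verify-manifest call,
- or sets a duration target other than the 60-second default or the 12-minute max-length, or a duration_seconds that does not match podcast.generate's duration_s within ±1 s,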

that's a gpt-oss-120b-specific finding worth capturing in the profile YAML's community_traces[] — see docs/howto/add-free-models.md §7 for the contribution path.
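The failure list above, together with the SUCCESS CRITERIA in the template, can be checked against a recorded run instead of by eye. A Python sketch that covers most of the invalidation rules (it cannot judge paraphrasing or the contents of the final report); the `tool` / `args` / `result` record shape is an assumption, not the OpenClaw session format.

```python
FREE = "openrouter/openai/gpt-oss-120b:free"
CHAIN = (
    "helmdeck__podcast-generate",
    "helmdeck__hyperframes-compose",
    "helmdeck__hyperframes-render",
    "helmdeck__av-validate",
    "helmdeck__artifact-verify_manifest",
)


def violations(calls: list[dict]) -> list[str]:
    """Check a recorded run against a subset of the invalidation rules.

    Each call is a dict with hypothetical keys 'tool', 'args', and 'result'.
    When a tool was retried, only its last call is judged.
    """
    last = {}
    for call in calls:
        last[call["tool"]] = call
    problems = [f"{tool} was not called" for tool in CHAIN if tool not in last]

    pod = last.get("helmdeck__podcast-generate")
    comp = last.get("helmdeck__hyperframes-compose")
    render = last.get("helmdeck__hyperframes-render")
    av = last.get("helmdeck__av-validate")
    manifest = last.get("helmdeck__artifact-verify_manifest")

    if pod:
        args = pod["args"]
        if "speakers" not in args:
            problems.append("podcast.generate: missing speakers")
        if "prompt" not in args or args.get("model") != FREE:
            problems.append("podcast.generate: Mode B needs prompt plus the free model")
        if args.get("metadata_model") != FREE:
            problems.append("podcast.generate: metadata_model not pinned to free tier")
    if comp:
        args = comp["args"]
        for field in ("model", "metadata_model"):
            if args.get(field) != FREE:
                problems.append(f"hyperframes.compose: {field} not pinned to free tier")
        if pod:
            if args.get("audio_url") != pod["result"].get("audio_url"):
                problems.append("hyperframes.compose: audio_url not taken from podcast.generate")
            wanted = pod["result"].get("duration_s")
            got = args.get("duration_seconds")
            if got is None or wanted is None or abs(got - wanted) > 1:
                problems.append("hyperframes.compose: duration_seconds missing or >1s off duration_s")
    if comp and render and render["args"].get("composition_html") != comp["result"].get("composition_html"):
        problems.append("hyperframes.render: composition_html not passed verbatim")
    if render and av and av["args"].get("video_artifact_key") != render["result"].get("video_artifact_key"):
        problems.append("av.validate: not called with the rendered video_artifact_key")
    if manifest and manifest["result"].get("all_present") is not True:
        problems.append("artifact.verify_manifest: all_present is not true")
    return problems
```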

Capture an empirical trace

After running both prompts (default + max-length) against the agent, extract a community trace via the helmdeck-trace CLI:

```shell
./scripts/helmdeck-trace/helmdeck-trace extract \
  --session ~/.openclaw/agents/<workspace-name>/sessions/<session-id>.jsonl \
  --use-case concept-animator \
  --contributor <your-github-handle> \
  --decision <profile-works|profile-helps-partially|profile-not-enough> \
  --url 'https://github.com/tosin2013/helmdeck/issues/496' \
  --output trace-concept-animator.yaml
```

The CLI walks the session JSONL, pairs toolCall / toolResult events FIFO, tallies real pack invocations (not text claims), and emits a schema-compliant community_traces[] entry ready to paste into models/openai-gpt-oss-120b-free.yaml. Open a follow-on PR with the appended entry.
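The FIFO pairing step can be sketched in a few lines of Python. The event shape here (`type`, `name`, `content` keys) is a hypothetical stand-in for the OpenClaw session JSONL, and this illustrates the pairing idea only; it is not the helmdeck-trace implementation.

```python
import json
from collections import deque


def pair_tool_events(jsonl_lines):
    """Pair toolCall events with toolResult events in first-in, first-out order.

    Returns (pairs, unmatched_calls). Assistant text that merely claims a tool
    ran never appears as a toolCall event, so it is never counted. A result
    with no pending call is ignored.
    """
    pending = deque()
    pairs = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "toolCall":
            pending.append(event)
        elif kind == "toolResult" and pending:
            pairs.append((pending.popleft(), event))
    return pairs, list(pending)


def real_pack_calls(pairs):
    """Count paired calls whose tool name is a helmdeck pack."""
    return sum(1 for call, _ in pairs if call.get("name", "").startswith("helmdeck__"))
```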

What to capture for the empirical trace

For the YAML's community_traces[] entry:

| Metric | Notes |
|---|---|
| `real_pack_calls` | Total real pack invocations across the chain. Expected: 5 (`podcast.generate`, `hyperframes.compose`, `hyperframes.render`, `av.validate`, `artifact.verify_manifest`) |
| `av_validate_called` | Boolean: did the explicit post-render call fire? |
| `av_validate_all_passed` | Boolean from `av.validate`'s response. Surface fail + warn checks in the report regardless |
| `verify_manifest_called` | Boolean: did the audit-callback fire? |
| `all_present` | Boolean from the `verify_manifest` response. The chain is valid only when `true` |
| `hallucination_count` | Fake or paraphrased pack-result claims; count them |
| `simplification_observed` | Boolean: did the model take a shortcut? E.g., claiming `video_artifact_key` without rendering |
| `duration_handling` | "default 60s" / "max 720s" / "drift to other value" (qualitative) |
| `cost_discipline_observed` | Boolean: did the agent pin all FOUR model fields (`podcast.generate` `model` + `metadata_model` + `hyperframes.compose` `model` + `metadata_model`) to the free tier? |
| `engagement_payload_surfaced` | Boolean: did the agent's final report surface the `engagement` payload from `hyperframes.compose` (title / hashtags / thumbnail_prompt at minimum, plus chapters/description for long_form)? Per PR #500. |
| `engagement_format_correct` | "short_form" / "mid_form" / "long_form": confirm the band matched the actual `duration_seconds` |

Aim for decision: profile-works when the strict invalidation rules drove the model through all 5 calls, all_present: true came back honestly, and av.validate's findings (pass/fail/warn) were surfaced in the final report.
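Several of these fields are mechanical and can be derived from paired tool calls; others (`hallucination_count`, `simplification_observed`, `engagement_payload_surfaced`) need someone to read the final report. A Python sketch of the mechanical subset, with hypothetical call/result shapes; it is not the helmdeck-trace CLI's logic.

```python
FREE = "openrouter/openai/gpt-oss-120b:free"


def trace_metrics(pairs: list[tuple[dict, dict]]) -> dict:
    """Derive the machine-checkable trace fields from (call, result) pairs.

    Assumed shapes: call = {"name": ..., "arguments": {...}},
    result = {"content": {...}}. Fields that need human judgment are left out.
    """
    def last(name):
        # Judge the final attempt when a tool was retried.
        for call, result in reversed(pairs):
            if call["name"] == name:
                return call.get("arguments", {}), result.get("content", {})
        return None

    pod = last("helmdeck__podcast-generate")
    comp = last("helmdeck__hyperframes-compose")
    av = last("helmdeck__av-validate")
    manifest = last("helmdeck__artifact-verify_manifest")

    # All four model fields must be pinned; a missing call counts as unpinned.
    model_fields = []
    for step in (pod, comp):
        args = step[0] if step else {}
        model_fields += [args.get("model"), args.get("metadata_model")]

    return {
        "real_pack_calls": sum(call["name"].startswith("helmdeck__") for call, _ in pairs),
        "av_validate_called": av is not None,
        "av_validate_all_passed": bool(av and av[1].get("all_passed") is True),
        "verify_manifest_called": manifest is not None,
        "all_present": bool(manifest and manifest[1].get("all_present") is True),
        "cost_discipline_observed": all(f == FREE for f in model_fields),
        "engagement_format": comp[1].get("engagement", {}).get("format") if comp else None,
    }
```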

Why this shape

The Tier C reliability evidence so far (the model profile YAML, PR #470, and PRs #481/#484) is consistent: explicit invalidation rules plus an audit-callback close the simplification gap on medium-to-long chains where reasoning-only "remember to call X" framing fails. Framing each pack call as part of the success criteria, rather than as a numbered step the model can skip, is what makes the 5-call chain actually fire.

The final two calls (av.validate + artifact.verify_manifest) are the load-bearing quality + audit gates. av.validate runs faststart / codec-pin / packet-contiguity / loudness / black-run / A/V parity checks against the rendered MP4 (it does NOT run automatically after hyperframes.render — only inside podcast.generate and slides.narrate). verify_manifest then gives the operator a yes/no machine-checkable confirmation that the MP4 actually exists, instead of a model-paraphrased text claim. This is exactly the pattern that closed the gpt-oss baseline failure mode in the original 2026-06-09 trace — extended here with the explicit AV quality gate after the empirical 2026-06-13 finding that the rendered MP4 had a black-run + loudness-out-of-range issue that would have shipped silently otherwise.
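The division of labour between the two gates can be written down as a small report function: av.validate findings are always surfaced, but only verify_manifest's `all_present` decides whether the run may be reported as a success. A Python sketch; the `checks` list with `name` and `status` keys is an assumed response shape (only `all_passed` and `all_present` are named in this recipe), and the check names in the usage are illustrative.

```python
def gate_report(av_result: dict, manifest_result: dict) -> tuple[bool, list[str]]:
    """Summarize the two closing gates for the operator's final report.

    Returns (chain_valid, report_lines). av.validate failures never block the
    report, but every fail/warn check is listed instead of a bare "OK".
    """
    all_passed = av_result.get("all_passed") is True
    lines = [f"av.validate all_passed: {str(all_passed).lower()}"]
    for check in av_result.get("checks", []):
        if check.get("status") in ("fail", "warn"):
            lines.append(f"  {check['status'].upper()}: {check.get('name', '?')}")
    all_present = manifest_result.get("all_present") is True
    lines.append(f"verify_manifest all_present: {str(all_present).lower()}")
    return all_present, lines


if __name__ == "__main__":
    ok, report = gate_report(
        {"all_passed": False, "checks": [
            {"name": "black_run", "status": "fail"},
            {"name": "loudness_lufs", "status": "warn"},
            {"name": "faststart", "status": "pass"},
        ]},
        {"all_present": True},
    )
    print("\n".join(report))
```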

Related

- Per-model profile: models/openai-gpt-oss-120b-free.yaml
- Companion recipe: gpt-oss-120b-slide-narrator.md (same model, single pipeline call instead of a multi-pack chain)
- BYO-variant prerequisites: Management UI Artifacts upload, builtin.byo-audio-narrated-video, pre-render validation suite (lint / inspect / validate)
- Render-deterministic authoring guidance (load this skill in the agent's context for any composition-authoring agent): skills/helmdeck-hyperframes-authoring/SKILL.md
- Tracking issue: #496
- Pack references: hyperframes.compose, hyperframes.render, podcast.generate
- ADR-052 (av.validate Phase 3 default-on integration): docs/adrs/052-av-output-validation-post-step.md
- Audit-callback lineage: issues #461 / #471 / #472
- Free-model recipe: docs/howto/add-free-models.md
