`slides.narrate`

The "deck-to-narrated-video" pack. The caller hands in a Marp deck in which each slide carries `<!-- speaker:notes -->` HTML comments. The pipeline runs entirely server-side:

1. Marp render: each slide becomes a 1920×1080 PNG.
2. ElevenLabs TTS: each slide's speaker notes become an MP3, using a vault-stored ElevenLabs key and a chosen voice.
3. ffmpeg encode: per-slide PNG + per-slide MP3 → per-slide MP4 segment, with an optional cross-slide fade.
4. ffmpeg concat: all segments are stitched into one final MP4.
5. (Optional) LLM metadata synthesis: if `metadata_model` is set, a frozen system prompt asks the model to generate a YouTube title, a description with timestamps, tags, a category, and a language code, which are written as a separate JSON artifact.
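
The encode and concat steps above can be sketched as plain ffmpeg command lines. This is a minimal illustration, not the pack's actual invocation: the flag choices (`libx264`, `-tune stillimage`, the concat demuxer) and the helper names `segment_cmd`, `concat_list`, and `concat_cmd` are assumptions, and the optional fade is left out.

```python
from pathlib import Path


def segment_cmd(png: Path, mp3: Path, out: Path) -> list[str]:
    """Build an ffmpeg argv that turns one still image plus one audio file into an MP4."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(png),   # repeat the still frame as an endless video input
        "-i", str(mp3),                  # narration audio for this slide
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",           # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",                     # stop when the audio ends (the looped image never does)
        str(out),
    ]


def concat_list(segments: list[Path]) -> str:
    """Render the text file consumed by ffmpeg's concat demuxer, one segment per line."""
    return "".join(f"file '{p.as_posix()}'\n" for p in segments)


def concat_cmd(list_file: Path, out: Path) -> list[str]:
    """Build an ffmpeg argv that stitches the listed segments without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_file),
        "-c", "copy",
        str(out),
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Stream copy in the concat step only works when every segment shares the same codec settings, which holds here because all segments come from the same `segment_cmd`.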

The pack is async by default: calling `tools/call` returns a SEP-1686 task envelope immediately, and the work runs in the background. SDK clients that speak SEP-1686 surface the eventual result transparently. Other clients can poll with `pack.start` / `pack.status` / `pack.result`, or pass `webhook_url` + `webhook_secret` to receive a callback.
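
For clients that poll, the loop might look like the sketch below. The status payload shape is not documented here, so the `state` field and its `completed` / `failed` values are hypothetical, and `get_status` / `get_result` are stand-ins for whatever transport issues the `pack.status` and `pack.result` calls.

```python
import time
from typing import Any, Callable


def wait_for_result(
    get_status: Callable[[], dict[str, Any]],
    get_result: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the task reaches a terminal state, then fetch the result."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()          # stands in for a pack.status call
        state = status.get("state")    # hypothetical field name
        if state == "completed":       # hypothetical terminal value
            return get_result()        # stands in for a pack.result call
        if state == "failed":          # hypothetical terminal value
            raise RuntimeError(f"task failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable without real delays.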

Setup prerequisite

The pack runs without an ElevenLabs key (it degrades to a silent video with `has_narration: false`), but most callers want narration. Add the key via the Vault panel:

Field	Value
Name	`elevenlabs-key` (exact string)
Type	`api_key`
Host pattern	`api.elevenlabs.io`
Value	Your ElevenLabs API key (`sk_…`)

Get a key from https://elevenlabs.io/app/settings/api-keys. Free tier is 10,000 chars/month — plenty to validate a few decks end-to-end.
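
To gauge how much of that monthly allowance a deck would use, one option is to sum the length of its speaker notes before submitting. The sketch below is an approximation under stated assumptions: it splits slides on bare `---` lines and reads `<!-- speaker:notes ... -->` comments, which may not match how the pack parses the deck or how ElevenLabs counts billable characters. All function names are illustrative.

```python
import re

# Matches one speaker-notes comment and captures its text.
NOTES_RE = re.compile(r"<!--\s*speaker:notes\s*(.*?)\s*-->", re.DOTALL)


def split_slides(markdown: str) -> list[str]:
    """Drop the leading frontmatter block, then split the rest on `---` lines."""
    body = re.sub(r"\A---\n.*?\n---\n", "", markdown, count=1, flags=re.DOTALL)
    return [s.strip() for s in re.split(r"^---[ \t]*$", body, flags=re.MULTILINE)]


def notes_per_slide(markdown: str) -> list[str]:
    """Return the speaker-notes text for each slide ('' when a slide has none)."""
    return [" ".join(NOTES_RE.findall(slide)) for slide in split_slides(markdown)]


def total_tts_chars(markdown: str) -> int:
    """Rough count of the characters that would be sent to TTS."""
    return sum(len(notes) for notes in notes_per_slide(markdown))
```

Comparing `total_tts_chars(deck)` against 10,000 gives a quick sanity check before spending quota.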

Inputs

Field	Type	Required	Default	Notes
`markdown`	`string`	yes	—	Marp deck. Must preserve `---` slide delimiters and `<!-- speaker:notes -->` HTML comments exactly — agent prompts that escape or reformat the markdown will produce broken output. The frontmatter must start `---\nmarp: true\n---`. Custom design (themes, CSS) goes in the markdown's frontmatter — see `slides.render` §"Custom design" for the syntax; the same Marp render is used internally here.
`voice_id`	`string`	no	random from top 5 popular voices	ElevenLabs voice ID. When unset, the pack queries `/v1/voices` and picks one at random from the five most popular voices; it falls back to `EXAVITQu4vr4xnSDxMaL` (Rachel) if the listing fails.
`model_id`	`string`	no	`"eleven_multilingual_v2"`	ElevenLabs model. `eleven_turbo_v2_5` is faster/cheaper; `eleven_multilingual_v2` handles non-English.
`resolution`	`string`	no	`"1920x1080"`	Video resolution. Smaller = lower memory (try `1280x720` if you OOM at 4K).
`fade_ms`	`number`	no	`0`	Cross-slide fade duration in ms. `300`–`500` looks polished.
`default_slide_duration`	`number`	no	`5.0`	Seconds of silence for slides without speaker notes.
`metadata_model`	`string`	no	—	Provider/model for YouTube metadata (e.g., `openrouter/openai/gpt-4o-mini`). When unset, no `metadata_artifact_key` is returned.
`webhook_url`	`string`	no	—	Push the result to this URL on completion (an asynchronous, push-based alternative to polling).
`webhook_secret`	`string`	no	—	HMAC signature secret for the webhook callback.
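
On the receiving side, a webhook handler would typically recompute the HMAC over the raw request body and compare it with the signature sent alongside the callback. The table does not specify the hash algorithm, the encoding, or the header that carries the signature, so the sketch below assumes HMAC-SHA256 with a hex digest purely for illustration; adjust it to whatever the deployment actually sends.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over the raw body (algorithm and encoding are assumptions)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid leaking timing information."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

Verification should run on the exact bytes received, before any JSON parsing or re-serialization, because even a whitespace change alters the digest.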

Outputs

Field	Type	Notes
`video_artifact_key`	`string`	`slides.narrate/<rand>-deck.mp4`. Resolve via `/api/v1/artifacts/<key>`.
`video_size`	`number`	Bytes. Capped at 256 MiB.
`slide_count`	`number`	Number of slides rendered.
`total_duration_s`	`number`	Cumulative video length, post-TTS — the authoritative timing after ElevenLabs has actually synthesized.
`has_narration`	`boolean`	`true` if TTS succeeded; `false` if the ElevenLabs key was missing or the API errored on every slide.
`voice_used`	`string`	Voice ID that narrated. Empty when `has_narration: false`.
`metadata_artifact_key`	`string`	Present only when `metadata_model` was set. JSON file with the YouTube metadata.
`metadata`	`object`	Same content as `metadata_artifact_key`'s JSON, inline for convenience: `{title, description, tags, category, language}`.
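
A caller consuming these fields might resolve the video URL and print a short summary as sketched below. The base URL is deployment-specific (`helmdeck.example` in the usage note is a placeholder), and percent-encoding the key while keeping `/` intact is an assumption about how the artifacts endpoint expects keys.

```python
from urllib.parse import quote


def artifact_url(base_url: str, key: str) -> str:
    """Join a deployment base URL with the documented /api/v1/artifacts/<key> path."""
    # Keep "/" unescaped so a key like "slides.narrate/<rand>-deck.mp4" stays a nested path.
    return f"{base_url.rstrip('/')}/api/v1/artifacts/{quote(key, safe='/')}"


def summarize(result: dict) -> str:
    """Build a one-line summary using only documented output fields."""
    mode = "narrated" if result["has_narration"] else "silent"
    mib = result["video_size"] / (1024 * 1024)
    return (
        f"{result['slide_count']} slides, "
        f"{result['total_duration_s']:.1f}s, {mode}, {mib:.2f} MiB"
    )
```

For example, `artifact_url("https://helmdeck.example", result["video_artifact_key"])` yields the download location for the final MP4.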

Vault credentials needed

elevenlabs-key — type api_key, host pattern api.elevenlabs.io. Optional — without it the pack still ships an MP4, just silent.

Use it from your agent (OpenClaw chat-UI worked example)

📌 The transcript below shows the narrated path (has_narration: true) — the elevenlabs-key is in the vault, ElevenLabs synthesized 2 slides of speech, and ffmpeg encoded them into a 199 KB MP4. The same prompt without the key in the vault produces a silent 47 KB MP4 (has_narration: false); the silent-fallback transcript was the original capture for this page. The transcript is also a clean reference for the async polling pattern (pack.start → pack.status × N → pack.result).

Prompt (sent in OpenClaw chat UI / openclaw-cli agent):

Use helmdeck__slides-narrate with this 2-slide deck: "---\nmarp: true\n---\n# Helmdeck\n<!-- speaker:notes Welcome to a quick demo of the slides.narrate pack. -->\n\n---\n\n# Thanks\n<!-- speaker:notes See you next time. -->" and model_id=eleven_turbo_v2_5. Tell me the video_artifact_key, slide_count, total_duration_s, and whether has_narration is true.

Tool call (26 calls, no failures):

{
  "name": "helmdeck__slides-narrate",
  "arguments": {
    "markdown": "---\nmarp: true\n---\n# Helmdeck\n<!-- speaker:notes Welcome to a quick demo of the slides.narrate pack. -->\n\n---\n\n# Thanks\n<!-- speaker:notes See you next time. -->",
    "model_id": "eleven_turbo_v2_5"
  }
}
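
When building the same call from code rather than from a chat prompt, letting a JSON serializer handle the escaping is a simple way to keep the `---` delimiters and speaker-notes comments intact. The sketch below is illustrative: `build_call` is a hypothetical helper, and the frontmatter check only mirrors the documented requirement that the deck start with `---\nmarp: true\n---`.

```python
import json

REQUIRED_PREFIX = "---\nmarp: true\n---"


def build_call(markdown: str, **options) -> str:
    """Serialize a tool call, refusing decks that lack the required Marp frontmatter."""
    if not markdown.startswith(REQUIRED_PREFIX):
        raise ValueError("deck must start with the Marp frontmatter")
    payload = {
        "name": "helmdeck__slides-narrate",
        "arguments": {"markdown": markdown, **options},
    }
    # json.dumps escapes newlines and quotes; the deck text itself is left unchanged.
    return json.dumps(payload, indent=2)
```

Parsing the output with `json.loads` returns the original markdown byte for byte, which is the property the pack depends on.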