Configure a non-OpenRouter LLM provider for helmdeck

A walkthrough for operators wanting to run helmdeck-driven OpenClaw agents against providers OTHER than OpenRouter — HuggingFace Inference Providers, Together AI direct, Groq, Cerebras, SambaNova, or self-hosted vLLM / SGLang / TGI.

This unblocks the #482 community contribution track: submit model profiles + traces for non-OpenRouter routes. See docs/reference/model-profiles-schema.md for the YAML schema and docs/howto/add-free-models.md for the contribution workflow.

Why bypass OpenRouter

The OpenRouter :free pool is congested. Three of five Phase 1 models were unreachable today (2026-06-10) purely because of upstream rate limits:

google/gemma-4-26b-a4b-it:free — 429 from Google AI Studio (per error metadata); affects all google/*:free slugs simultaneously
meta-llama/llama-3.3-70b-instruct:free — 429; error metadata cited "Venice" provider
qwen/qwen3-coder:free — 429; same "Venice" provider attribution

OpenRouter's own rate-limit docs explicitly recommend cross-slug fallback when one free model returns 429. Operators running sustained empirical work hit the wall fast. The alternative routing layers below offer:

Transparent provider selection — HF Inference Providers lets you pick :fastest / :cheapest / :preferred and see which upstream is serving each request
Independent rate-limit pools — Together AI / Groq / Cerebras direct each have their own quotas; bypass shared-pool congestion
Operator-controlled SLA — self-hosted vLLM / SGLang / TGI runs on your infrastructure; you set the limits

Empirical evidence for these patterns is in PRs #481 and #484: a Nemotron baseline-vs-hardened A/B that showed per-use-case AGENTS.md hardening to be the workflow-shape lever. Switching providers does not change that lesson.

HuggingFace Inference Providers (primary path)

HF Inference Providers at router.huggingface.co/v1 is the OpenAI-compatible router that fronts multiple partner providers (Cerebras, Fireworks, Groq, SambaNova, Together, Novita, Hyperbolic, DeepInfra, Nscale, HF Inference itself). Same chat/completions API surface OpenRouter uses; transparent provider-selection policies.

1. Get an HF API key

Sign up at https://huggingface.co (free tier exists; PRO and Team plans have higher quotas)
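Generate an API key (HF calls it a User Access Token) at https://huggingface.co/settings/tokens; a fine-grained token needs the "Make calls to Inference Providers" permission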
Save the token — you'll paste it into OpenClaw

2. Configure OpenClaw with the HF endpoint

OpenClaw needs the base URL + API key for the HF router. In OpenClaw's UI:

1. Open the Models panel → Add Provider
2. Set:
   - Provider type: openai-compatible
   - Base URL: https://router.huggingface.co/v1
   - API Key: the HF token from step 1
3. Save the profile

For CLI-based OpenClaw setup, edit ~/.openclaw/openclaw.json:

{
  "agents": {
    "defaults": {
      "models": {
        "huggingface/openai/gpt-oss-120b": {
          "alias": "HF gpt-oss-120b",
          "baseUrl": "https://router.huggingface.co/v1",
          "apiKey": "<your-HF-token>"
        }
      }
    }
  }
}
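The same edit can be scripted so the token never has to be pasted by hand. The sketch below assumes the config layout shown above (agents → defaults → models) and nothing else about OpenClaw's schema; `add_model_entry` and `update_config_file` are hypothetical helper names, not part of OpenClaw.

```python
import json
from pathlib import Path


def add_model_entry(config: dict, model_key: str, alias: str,
                    base_url: str, api_key: str) -> dict:
    """Insert or overwrite one entry under agents.defaults.models, in place."""
    models = (config.setdefault("agents", {})
                    .setdefault("defaults", {})
                    .setdefault("models", {}))
    models[model_key] = {"alias": alias, "baseUrl": base_url, "apiKey": api_key}
    return config


def update_config_file(path: Path, api_key: str) -> None:
    """Load the config if it exists, add the HF entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    add_model_entry(config, "huggingface/openai/gpt-oss-120b", "HF gpt-oss-120b",
                    "https://router.huggingface.co/v1", api_key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".openclaw" / "openclaw.json", token)` with a token read from an environment variable keeps existing entries intact, because the helper only touches the one model key.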

3. Provider-selection policies

HF Inference Providers exposes three routing policies. Choose by appending the policy to the model ID (or set in OpenClaw's per-agent config):

Policy	Behavior	When to use
`:fastest`	Routes to the partner with the highest throughput currently available	Interactive sessions, dev/test loops
`:cheapest`	Routes to the partner with the lowest price per output token	Batch jobs, long-running pipelines
`:preferred`	Follows the provider preference order set in your HF account settings	Sustained work with quota planning

Pinning a specific partner is also supported — use the model ID format <repo-id>:<partner> (e.g., openai/gpt-oss-120b:cerebras).
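Both forms, policy suffix and pinned partner, are the same string operation on the model ID. A minimal sketch, assuming only the `<repo-id>:<selector>` format described above; `routed_model_id` and `is_policy` are hypothetical helper names.

```python
from typing import Optional

# The three routing policies named in the table above.
POLICIES = frozenset({"fastest", "cheapest", "preferred"})


def routed_model_id(repo_id: str, selector: Optional[str] = None) -> str:
    """Append a routing policy or a pinned partner name to a model ID."""
    if not selector:
        return repo_id
    if ":" in repo_id:
        raise ValueError(f"{repo_id!r} already carries a selector")
    return f"{repo_id}:{selector.lstrip(':')}"


def is_policy(selector: str) -> bool:
    """True for a routing policy, False for anything else (e.g. a partner name)."""
    return selector.lstrip(":") in POLICIES
```

For example, `routed_model_id("openai/gpt-oss-120b", "cerebras")` yields the pinned ID from the paragraph above, while passing `"fastest"` yields the policy-routed form.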

4. Free-tier credit ceiling

The HF Inference Providers free tier is small (writeups quote ~$0.10/month in inference credits — call it ~25k tokens of gpt-oss-120b). PRO and Team plans have substantially larger quotas. For sustained empirical work, PRO is the realistic minimum.
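The two figures quoted above imply a blended price of roughly $4 per million tokens. That number is derived purely from this page's own estimate, not from any published price list, so treat it as a placeholder and substitute the actual per-token price of the partner serving your requests. A small sketch of the arithmetic, with hypothetical function names:

```python
def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """How many tokens a credit budget buys at a given blended price."""
    return round(budget_usd / usd_per_million_tokens * 1_000_000)


def implied_price_per_million(budget_usd: float, tokens: int) -> float:
    """Blended USD price per million tokens implied by a budget and a token count."""
    return budget_usd / tokens * 1_000_000
```

Running the same function with a partner's real price shows quickly whether a planned three-turn session fits inside the free tier or needs a paid plan.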

5. Worked example — switch the trace-test agent to HF

The trace-test agent pattern works identically on HF. To run the same three-turn iterative blog-drafter prompt that PR #480 validated on the OpenRouter route:

1. Set the trace-test agent's model to openai/gpt-oss-120b via the HF provider (configured above)
2. Run the standard BLOG DRAFT trigger prompt (same one the OpenRouter route used)
3. Walk the three turns
4. Run helmdeck-trace extract --session ~/.openclaw/agents/trace-test/sessions/<id>.jsonl --use-case blog-drafter-hf-test --contributor <your-handle> --decision profile-works --url 'https://github.com/tosin2013/helmdeck/issues/482'
5. Open a PR adding the community_traces[] entry to models/huggingface-openai-gpt-oss-120b.yaml

That's a complete cross-provider A/B contribution: same model, same prompt, different routing layer. The empirical data tells operators whether HF route reliability matches OpenRouter for gpt-oss.

Together AI / Groq / Cerebras / SambaNova direct

All four expose OpenAI-compatible chat-completions endpoints. The OpenClaw setup is identical to the HF pattern above with different base URLs + API keys.

Provider	Base URL	Auth docs
Together AI	`https://api.together.xyz/v1`	https://docs.together.ai/docs/quickstart
Groq	`https://api.groq.com/openai/v1`	https://console.groq.com/docs/quickstart
Cerebras	`https://api.cerebras.ai/v1`	https://inference-docs.cerebras.ai/quickstart
SambaNova	`https://api.sambanova.ai/v1`	https://docs.sambanova.ai/cloud/docs/get-started/overview
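When scripting against several of these providers, the table reduces to a lookup. The sketch below copies the base URLs from the table; the short keys (`together`, `groq`, and so on) and the function name are illustrative assumptions, not identifiers used by OpenClaw or helmdeck.

```python
# Base URLs copied from the table above; the dictionary keys are hypothetical labels.
BASE_URLS = {
    "together": "https://api.together.xyz/v1",
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
    "sambanova": "https://api.sambanova.ai/v1",
}


def chat_completions_url(provider: str) -> str:
    """Return the full chat-completions URL for a provider label."""
    try:
        base = BASE_URLS[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return base.rstrip("/") + "/chat/completions"
```

Note that Groq's base URL carries an extra `/openai` path segment, which is the detail most often lost when copying a config from another provider.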

Each has its own free-tier policy:

Together AI: new-account credits reported variously as $5 or $25 depending on the program. Per the rate-limit docs, dynamic rate limits adjust with usage. Good for prototyping; check x-ratelimit-reset headers for sustained work.
Groq: free developer tier with daily token limits per model. Hardware-tuned for gpt-oss-120b.
Cerebras: free developer tier; also tuned for gpt-oss-120b on their CS-3 hardware.
SambaNova: free tier with model-specific limits.

When you contribute a profile for one of these providers, name the YAML accordingly:

models/together-meta-llama-llama-3.3-70b-instruct.yaml
models/groq-openai-gpt-oss-120b.yaml
models/cerebras-openai-gpt-oss-120b.yaml
models/sambanova-openai-gpt-oss-120b.yaml

Reuse the existing OpenRouter sibling profile's prompting guidance (model behavior is provider-agnostic); the deltas are provider:, model:, context_window_notes:, and the empirical sections.
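The four filenames above appear to follow one rule: provider name, a hyphen, then the model ID with its slash turned into a hyphen. The sketch below encodes that inferred rule; it is an assumption read off the examples, so defer to docs/reference/model-profiles-schema.md if it says otherwise. `profile_path` is a hypothetical helper.

```python
import re


def profile_path(provider: str, model: str) -> str:
    """Derive a profile filename from a provider label and a model ID."""
    # Lower-case everything and collapse '/', ':' and whitespace into single hyphens.
    slug = re.sub(r"[/:\s]+", "-", f"{provider}-{model}".lower()).strip("-")
    return f"models/{slug}.yaml"
```

Generating the name rather than typing it keeps sibling profiles for the same model sorted next to their provider prefix and avoids near-duplicate files.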

Self-hosted (vLLM / SGLang / TGI / Ollama)

For operators running their own inference server, use provider: custom in the YAML and set endpoint_base_url to your OpenAI-compatible endpoint.

vLLM example

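# Start vLLM with its OpenAI-compatible API.
# Automatic tool calling needs --enable-auto-tool-choice alongside --tool-call-parser.
# The parser is model-specific: qwen3_coder is the one the Nemotron profile cites,
# so confirm the right parser for your model in the vLLM tool-calling docs.
vllm serve openai/gpt-oss-120b \
  --host 0.0.0.0 --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder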

In OpenClaw config:

{
  "models": {
    "custom/openai/gpt-oss-120b": {
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": "not-required"
    }
  }
}

In the YAML profile:

provider: custom
model: openai/gpt-oss-120b
endpoint_base_url: http://localhost:8000/v1
tool_parser: qwen3_coder

Tool-parser configuration

Some models require a specific tool-call parser to translate the model's tool-call format into the OpenAI deltas the OpenClaw harness expects. Per the models/nvidia-nemotron-3-super-120b-a12b-free.yaml profile, Nemotron-3 Super uses the qwen3_coder parser across vLLM / SGLang / TRT-LLM. The Nvidia developer forum has the definitive thread on this: the symptom is the model emitting plain-text <tool_call> XML instead of proper toolCall deltas, and the resolution per the thread is "Native fixed it" (native client-side parsing, not vLLM's).

Set the tool_parser: field in the YAML to the parser your inference engine uses so future operators don't guess.
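The symptom described above can be spotted mechanically. The sketch below assumes an OpenAI-style assistant message dict with a `content` string and an optional `tool_calls` list; the field names OpenClaw writes into its session files may differ, so adapt the keys before relying on it. `leaked_tool_call` is a hypothetical helper.

```python
import re

# Matches the opening of a <tool_call> tag that leaked into plain text.
TOOL_CALL_TAG = re.compile(r"<tool_call\b", re.IGNORECASE)


def leaked_tool_call(message: dict) -> bool:
    """True when tool-call markup appears as text and no structured call exists."""
    content = message.get("content") or ""
    has_structured = bool(message.get("tool_calls"))
    return (isinstance(content, str)
            and bool(TOOL_CALL_TAG.search(content))
            and not has_structured)
```

A message that trips this check is a strong hint that the server-side parser is missing or mismatched for the model being served.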

What community traces look like across providers

The community_traces[] schema is identical regardless of provider:. The helmdeck-trace CLI extracts the same metric_summary structure from any OpenClaw session jsonl. No provider-specific tooling needed — the CLI just reads the session file structure that OpenClaw writes.

The anonymization rule is unchanged: agent and workspace names are redacted to sanitized labels ("Tier C agent on openai/gpt-oss-120b, three-turn iterative workflow"); the contributor's GitHub handle is fine in the contributor: field.

Submission methodology

Same shape as docs/howto/add-free-models.md § 7:

Set up the agent on your preferred routing layer (HF / Together / self-hosted) following the section above
Optional: copy + adapt an existing per-model AGENTS.md recipe from docs/howto/per-model-agents/ for the prompting shape that matches your model
Run the workflow — the standard three-turn iterative blog-drafter pattern OR your own use case
Capture the session jsonl at ~/.openclaw/agents/<your-agent>/sessions/<id>.jsonl

Extract metrics:

./scripts/helmdeck-trace/helmdeck-trace extract \
  --session ~/.openclaw/agents/<agent>/sessions/<id>.jsonl \
  --use-case <label> \
  --contributor <gh-handle> \
  --decision <profile-works | profile-helps-partially | profile-not-enough | no-profile-needed> \
  --url <PR-or-issue-url>
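For batch extraction over several sessions, the command above can be assembled programmatically. This sketch uses only the flags and decision values shown in the command; the wrapper name `build_extract_argv` is hypothetical, and it assumes the CLI is invoked from the repository root.

```python
import os
from typing import List

# The four decision values listed in the command above.
DECISIONS = ("profile-works", "profile-helps-partially",
             "profile-not-enough", "no-profile-needed")


def build_extract_argv(session: str, use_case: str, contributor: str,
                       decision: str, url: str,
                       cli: str = "./scripts/helmdeck-trace/helmdeck-trace") -> List[str]:
    """Build the argument list for one helmdeck-trace extract run."""
    if decision not in DECISIONS:
        raise ValueError(f"decision must be one of {DECISIONS}, got {decision!r}")
    return [
        cli, "extract",
        # Expand "~" here because no shell does it when argv is passed directly.
        "--session", os.path.expanduser(session),
        "--use-case", use_case,
        "--contributor", contributor,
        "--decision", decision,
        "--url", url,
    ]
```

Passing the returned list to `subprocess.run(argv, check=True)` runs the extraction without a shell, which sidesteps quoting problems in URLs and use-case labels.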

Submit a PR:
- If a profile for your (model × provider) combo already exists, add the trace to its community_traces[] array
- If no profile exists yet, create one following docs/reference/model-profiles-schema.md and seed it with your trace

Verification

After setup, verify the routing works:

# Test the OpenAI-compatible endpoint directly
curl -X POST <base-url>/chat/completions \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id>",
    "messages": [{"role": "user", "content": "Reply with the single word: ok"}]
  }'

Expected response includes {"choices": [...]} with the model's reply. If you get 401, the API key is wrong; if you get 404, the base URL or model ID is wrong; if you get 429, the provider is rate-limiting you. A CORS error can only appear when the request is made from browser JavaScript (curl never produces one); in that case, switch to a server-side call.
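The same check can be run from Python with the standard library only. The sketch below builds the request that the curl command sends and maps the status codes discussed above to a diagnosis; `build_check_request` and `explain_status` are hypothetical helper names.

```python
import json
import urllib.request


def build_check_request(base_url: str, api_key: str,
                        model_id: str) -> urllib.request.Request:
    """Build the POST request for a one-message chat-completions check."""
    body = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def explain_status(status: int) -> str:
    """Translate an HTTP status code into the likely configuration problem."""
    reasons = {
        200: "ok",
        401: "the API key is wrong",
        404: "the base URL or model ID is wrong",
        429: "the provider is rate-limiting this key or model",
    }
    return reasons.get(status, f"unexpected status {status}")
```

To send it, pass the request to `urllib.request.urlopen`; for 4xx and 5xx responses it raises `urllib.error.HTTPError`, whose `code` attribute can be handed to `explain_status`.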

If the curl works but OpenClaw still routes through OpenRouter, the per-agent model config in ~/.openclaw/openclaw.json needs updating to point at the new provider profile (see step 2 above).

Related

Schema reference: docs/reference/model-profiles-schema.md
Contribution workflow: docs/howto/add-free-models.md § 7
Trace extraction CLI: scripts/helmdeck-trace/README.md
Per-model recipe pattern: docs/howto/per-model-agents/gemma-4-iterative-workflow.md
Tier classification: docs/reference/models.md
First HF template: models/huggingface-openai-gpt-oss-120b.yaml
Parent issue: #464 (per-model profile library)
HF community track: #482
Empirical motivation: PR #481 + PR #484 (Nemotron baseline-vs-hardened A/B)
