Configure LLM providers
Helmdeck's /v1/chat/completions is OpenAI-compatible, but the real configuration surface lives at /api/v1/providers/keys. This page covers registering keys, routing model IDs to providers, and reading the success-rate panel that tells you when a provider is flapping.
For the architectural shape (request flow, fallback semantics), see Architecture overview §2.b and ADR 005.
Prerequisites
- A running helmdeck stack with HELMDECK_KEYSTORE_KEY set (32 hex bytes; distinct from HELMDECK_VAULT_KEY — separate encryption keys for separate purposes per ADR 007)
- A JWT with the providers:* scope
- An API key from at least one upstream provider
JWT="<your helmdeck JWT>"
HELMDECK_URL="${HELMDECK_URL:-http://localhost:3000}"
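If you still need to mint a keystore key, 32 random bytes hex-encoded is the expected shape; a sketch, assuming openssl is on hand:
# 32 random bytes, hex-encoded: the shape HELMDECK_KEYSTORE_KEY expects
export HELMDECK_KEYSTORE_KEY="$(openssl rand -hex 32)"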
Supported providers
Each provider has a stable identifier — that's the prefix you'll use in model IDs (<provider>/<model>):
| Provider id | Upstream | Auth shape |
|---|---|---|
| anthropic | api.anthropic.com | x-api-key header |
| openai | api.openai.com | Authorization: Bearer |
| gemini | generativelanguage.googleapis.com | ?key=… query param |
| openrouter | openrouter.ai | Authorization: Bearer |
| ollama | (your Ollama URL) | None — local, network-reachable |
| deepseek | api.deepseek.com | Authorization: Bearer |
| groq | api.groq.com (community adapter, v0.8.1+) | Authorization: Bearer |
| mistral | api.mistral.ai (community adapter, v0.8.1+) | Authorization: Bearer |
A provider must be registered (key present) and have at least one model exposed before it appears in GET /v1/models.
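To check what's routable right now, list the models and read the provider prefix off each id. A sketch, assuming the OpenAI-compatible list shape ({"data":[{"id":…},…]}):
# List every routable model id, sorted so provider prefixes group together
curl -fsS -H "Authorization: Bearer $JWT" "$HELMDECK_URL/v1/models" \
  | jq -r '.data[].id' | sort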
Register a provider key
curl -fsS -X POST "$HELMDECK_URL/api/v1/providers/keys" \
-H "Authorization: Bearer $JWT" \
-H 'Content-Type: application/json' \
-d '{
"provider": "openrouter",
"label": "primary",
"key": "sk-or-v1-…"
}' | jq .
Response: a record with id (UUID), provider, label, created_at. The key itself is encrypted with HELMDECK_KEYSTORE_KEY and never echoes back.
The gateway hot-reloads after the create — the new key is immediately usable on /v1/chat/completions without restarting the control plane.
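The list/get/rotate/test/delete calls below reference the key by its id; you can capture it at create time. A sketch, assuming jq is available:
# Register a key and keep its id for the rotate/test/delete calls below
ID=$(curl -fsS -X POST "$HELMDECK_URL/api/v1/providers/keys" \
  -H "Authorization: Bearer $JWT" \
  -H 'Content-Type: application/json' \
  -d '{"provider":"openrouter","label":"primary","key":"sk-or-v1-…"}' \
  | jq -r .id)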
Worked examples for the common providers
# Anthropic
-d '{"provider":"anthropic","label":"primary","key":"sk-ant-…"}'
# OpenAI
-d '{"provider":"openai","label":"primary","key":"sk-…"}'
# Gemini (use the API key from aistudio.google.com)
-d '{"provider":"gemini","label":"primary","key":"AIza…"}'
# OpenRouter (recommended default — exposes 200+ models behind one key)
-d '{"provider":"openrouter","label":"primary","key":"sk-or-v1-…"}'
# Ollama (local, no key needed but the registration is still required for routing)
-d '{"provider":"ollama","label":"local","key":"http://host.docker.internal:11434"}'
# ^ for Ollama, the "key" field is the base URL of your Ollama server
For Groq and Mistral (community adapters), the same shape:
# Groq (community, v0.8.1+)
-d '{"provider":"groq","label":"primary","key":"gsk_…"}'
# Mistral (community, v0.8.1+)
-d '{"provider":"mistral","label":"primary","key":"…"}'
List, get, rotate, test, delete
# List all registered keys (provider + label + last-used; never the secret)
curl -fsS -H "Authorization: Bearer $JWT" "$HELMDECK_URL/api/v1/providers/keys" | jq .
# Get one
curl -fsS -H "Authorization: Bearer $JWT" "$HELMDECK_URL/api/v1/providers/keys/$ID" | jq .
# Rotate — replaces the secret in place; the key id stays stable so any
# config referencing it (label-based fallback rules, etc.) keeps working
curl -fsS -X POST -H "Authorization: Bearer $JWT" \
-H 'Content-Type: application/json' \
-d '{"key":"<new-secret>"}' \
"$HELMDECK_URL/api/v1/providers/keys/$ID/rotate" | jq .
# Test — fires a tiny no-op chat completion against the upstream to
# verify the key works. Returns success/failure + provider-side error.
curl -fsS -X POST -H "Authorization: Bearer $JWT" \
"$HELMDECK_URL/api/v1/providers/keys/$ID/test" | jq .
# Delete
curl -fsS -X DELETE -H "Authorization: Bearer $JWT" \
"$HELMDECK_URL/api/v1/providers/keys/$ID"
Use the gateway
Any OpenAI SDK or curl works:
curl -fsS -X POST "$HELMDECK_URL/v1/chat/completions" \
-H "Authorization: Bearer $JWT" \
-H 'Content-Type: application/json' \
-d '{
"model": "openrouter/anthropic/claude-haiku-4.5",
"messages": [{"role":"user","content":"What is 2+2?"}]
}' | jq .
Model IDs follow provider/model:
- openrouter/anthropic/claude-haiku-4.5 → routed to openrouter, forwarded as anthropic/claude-haiku-4.5
- anthropic/claude-sonnet-4.6 → routed directly to anthropic
- ollama/llama3.1:8b → routed to your local Ollama
- gemini/gemini-3-flash → routed to Gemini
GET /v1/models lists everything currently routable across all registered providers — combine its output with your registered keys to build a model picker UI.
Streaming works end-to-end ("stream": true in the body, SSE chunks back).
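A sketch of the streaming variant; curl's -N flag disables output buffering so the SSE chunks print as they arrive:
curl -NsS -X POST "$HELMDECK_URL/v1/chat/completions" \
  -H "Authorization: Bearer $JWT" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openrouter/anthropic/claude-haiku-4.5",
    "stream": true,
    "messages": [{"role":"user","content":"Count to five."}]
  }'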
Configure fallback rules
Today's fallback layer is pluggable but not REST-configurable — the chain logic lives in internal/gateway/fallback.go, and the gateway loads chains once at startup. To add or change a chain, edit the config file the control plane reads (typically deploy/compose/gateway-fallback.json) or set the chains inline via HELMDECK_GATEWAY_FALLBACK_JSON.
The chain shape:
[
{
"primary": "openrouter/anthropic/claude-haiku-4.5",
"fallbacks": [
"openrouter/openai/gpt-4o-mini",
"anthropic/claude-haiku-4.5"
],
"triggers": ["rate_limit", "timeout", "error"]
}
]
Three trigger types from the closed set:
| Trigger | Fires on |
|---|---|
| rate_limit | HTTP 429 from the upstream |
| timeout | Request-context deadline hit before a response |
| error | Any other non-timeout, non-429 failure (5xx, network, auth, decode) |
An empty triggers slice means advance on any failure. Fallbacks are tried in order; the first that succeeds wins; the chain bails after the last fallback fails.
Each attempt (primary + every fallback hop) writes a row to the provider_calls table with fallback_used: 1 for every non-primary attempt — so you can query how often a primary is degrading enough to trip the chain (see Inspect audit logs).
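A sketch of that query, reusing the column names described here and in the rollup query below:
# How often did each primary degrade enough to trip its chain in the last day?
docker compose -f deploy/compose/compose.yaml exec control-plane \
  sqlite3 /var/lib/helmdeck/helmdeck.db \
  "SELECT provider, model, COUNT(*) AS attempts,
          SUM(fallback_used) AS fallback_attempts
   FROM provider_calls
   WHERE ts >= datetime('now','-1 day')
   GROUP BY provider, model
   ORDER BY fallback_attempts DESC;"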
A REST surface for fallback rules is on the roadmap — until then, edit the JSON config and restart the control plane.
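A sketch of the inline variant, assuming your compose file forwards the variable to the control plane:
# Single-line JSON chain; empty triggers = advance on any failure
export HELMDECK_GATEWAY_FALLBACK_JSON='[{"primary":"openrouter/anthropic/claude-haiku-4.5","fallbacks":["anthropic/claude-haiku-4.5"],"triggers":[]}]'
docker compose -f deploy/compose/compose.yaml up -d control-plane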
Read the success-rate panel
The Management UI's AI Providers → Model Success Rates tab shows a rolled-up view of provider_calls per (provider, model) over a configurable time window — success count, fail count, average latency, p95 latency. This is T607's deliverable.
The same data is available over REST:
curl -fsS -H "Authorization: Bearer $JWT" \
"$HELMDECK_URL/api/v1/providers/stats?since=24h" | jq .
Returns one row per (provider, model) with success/fail counts and latency rollups. Useful for ops dashboards or external monitoring.
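For example, to flag models dipping below 95% success over the window (a sketch; the exact success/fail key names in the JSON are an assumption, adjust to match your build):
# Flag (provider, model) pairs under 95% success in the last 24h
curl -fsS -H "Authorization: Bearer $JWT" \
  "$HELMDECK_URL/api/v1/providers/stats?since=24h" \
  | jq '.[] | select((.success + .fail) > 0 and (.success / (.success + .fail)) < 0.95)'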
For raw access (custom queries, exports for compliance), drop to SQLite:
docker compose -f deploy/compose/compose.yaml exec control-plane \
sqlite3 /var/lib/helmdeck/helmdeck.db \
"SELECT provider, model, COUNT(*) AS n,
SUM(CASE WHEN status='success' THEN 1 ELSE 0 END) AS ok,
AVG(latency_ms) AS avg_ms
FROM provider_calls
WHERE ts >= datetime('now','-1 day')
GROUP BY provider, model
ORDER BY n DESC;"
Helmdeck-as-LLM-gateway for your agents
Three integration patterns:
- Hermes Agent — point Hermes' base_url at http://localhost:3000/v1. Helmdeck observes every chat completion + every MCP tool call. See Hermes integration.
- Custom OpenAI-SDK code — set OPENAI_BASE_URL=http://localhost:3000/v1 and OPENAI_API_KEY=<helmdeck JWT> (sketched after this list). Works with any OpenAI Python/Node SDK.
- Claude Code / Claude Desktop / OpenClaw / Gemini CLI — these clients hard-wire to their respective upstreams (Anthropic, Google) and don't expose an OpenAI-compatible base-URL escape hatch. Helmdeck only sees their MCP tool calls, not their chat completions.
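For the second pattern, recent official OpenAI SDKs read both variables straight from the environment, so pointing existing code at helmdeck usually needs no code change; a minimal sketch:
# Point any OpenAI SDK at helmdeck instead of api.openai.com
export OPENAI_BASE_URL="http://localhost:3000/v1"
export OPENAI_API_KEY="$JWT"   # your helmdeck JWT, not an OpenAI key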
Common pitfalls
- Wrong key shape — pasting a Gemini key into an OpenAI registration silently registers it; the failure surfaces only when a chat call returns 401 from the upstream. Use POST /api/v1/providers/keys/{id}/test after registering to verify.
- Model ID typo — claude-haiku-4 vs claude-haiku-4.5 is a 400 from the upstream, not a 404 from helmdeck. Helmdeck doesn't validate model IDs against the upstream catalog (some providers add models faster than we could whitelist them).
- Provider key for an unregistered provider — POST /api/v1/providers/keys with provider: "groq" on a build that doesn't include the Groq adapter returns 400. Confirm the adapter is in the build with GET /v1/models (Groq models will appear if the adapter is present).
- Forgot HELMDECK_KEYSTORE_KEY — dev mode falls back to an auto-generated key, but a fresh one is generated on every restart, so registered keys become unrecoverable across restarts. Set HELMDECK_KEYSTORE_KEY explicitly for any sustained install.
Known limitations
- Fallback rules are not REST-configurable — JSON config + restart only. REST CRUD for fallback chains is on the v1.x roadmap.
- No multi-key load balancing — registering two keys for the same provider (label "primary" + "secondary") doesn't load-balance; the first match wins and the second sits in reserve.
- No quota / rate-limit awareness inside helmdeck — the gateway forwards verbatim and only fails over on the upstream's response. Helmdeck doesn't pre-emptively throttle to stay under a quota.
- Master-key rotation requires manual re-registration — rotating HELMDECK_KEYSTORE_KEY invalidates all stored keys. Rotation tooling is part of the same track as #110 (vault key rotation), but for the keystore.
Related
- Architecture overview §2.b — LLM gateway flow — the request-flow diagram for chat completions
- ADR 005 — OpenAI-compatible multi-provider AI gateway — design rationale, fallback semantics, provider adapter shape
- Inspect audit logs — query patterns for provider_calls (the table behind the success-rate panel)
- Manage credentials in the vault — for non-LLM credentials (GitHub tokens, Ghost keys, ElevenLabs, etc.), which use the separate credential vault