Skip to main content

Why a $0.10 model can do work that needs a $3 model

· 6 min read
Tosin Akinosho
Helmdeck maintainer

⚠️ These are my findings, not a vendor benchmark. I ran them on one helmdeck install, with a specific set of prompts, against a few specific competitor stacks. Your numbers will probably differ. The recipe to reproduce is at the bottom — if your numbers disagree, please share and I'll update this page.

Today's helmdeck install ran a 6-step Phase 5.5 code-edit loop on gpt-oss-120b for $0.07 total — clone a repo, read a file, apply a one-line patch, run tests, commit, push. The same loop on Cursor / Claude Code direct via Sonnet would have run $0.30+. Same outcome; ~5× cost gap.

That's not unusual. Here's what I see across five common workflows: