glamworksThe best open models now beat the frontier at the center of the distribution — which is most work. The fight has moved to who owns the context that makes any model useful. glamfire is the open harness for that fight: own your context, route your intelligence, never rent your company's brain back.
Same shape of tool as Claude Code or opencode: it plans, calls real tools (read/write/edit files, search code, read git, run allowed commands), observes the results, iterates, and stops when the work is done — or when its budget is hit. glamfire authored its own CHANGELOG.md this way — the PR merged with human review as the gate — and a faithful re-run of that task costs about two cents. The difference is everything wrapped around that loop:
Intelligence got roughly 98% cheaper. Open models caught the frontier on the broad middle of everyday work. But a model is a brain in a jar — what your company actually runs is the harness: the context it sees, the routing that picks it, the tool-calls shaped to its grammar, the surfaces where work happens. Switching models means rebuilding all of that, so most teams sign a frontier contract instead.
And now the frontier labs are making the next move: putting their assistant inside your team chat, where it quietly accumulates the messy, uncodified context that is your actual edge. Once a vendor's model is that close to your context, it doesn't matter how cheap the open models get — you can't rip it out. That's the failure mode of this decade: renting your company's brain back from a frontier lab.
There's a second, newer reason to own the harness: continuity. 2026 has already shown that the model you build on can be forced offline for weeks, restricted to approved partners, or repriced overnight. The teams that shrugged were the ones that never tied their work to a single model — they owned their harness, routed somewhere else, and kept moving. No single model's outage, ban, or price hike should ever stall your work.
GLM 5.2 (MIT license, 1M-token context) is the #1-ranked open-weight model on the Artificial Analysis Intelligence Index, and beats frontier flagships on real-work coding benchmarks like SWE-bench Pro — at roughly a fifth to a sixth of frontier cost. That's not "good enough": it's the best model in the world at center-of-distribution work — which by definition is most of your work. Frontier models still win the messy, novel edge; they should earn that escalation, task by task, not collect rent on your whole workload.
| Model (open weights) | Served by | $/1M in · out | Good at |
|---|---|---|---|
| DeepSeek V4 Flash | Fireworks (FP8) | 0.14 · 0.28 | cheapest capable 1M-context tier |
| MiniMax M3 | Fireworks (FP8) | 0.30 · 1.20 | cheap agentic + multimodal work |
| Kimi K2.7 Code | Fireworks (FP8) | 0.95 · 4.00 | long autonomous coding sessions |
| GLM 5.2 — default workhorse | Fireworks (FP8) | 1.40 · 4.40 | agentic coding, design, long-horizon tasks |
| DeepSeek V4 Pro | Fireworks (FP8) | 1.74 · 3.48 | open escalation tier, frontier-class reasoning |
Prices from provider pricing pages, accessed 2026-07-03; open-model prices are decaying in weeks, not quarters (GLM 5.2 was undercut within three weeks of launch). Quantization matters: FP4 routes are cheaper but measurably worse for coding — glamfire's catalog records quant per endpoint and defaults coding to FP8. Sources and the full cited brief (research/25) live in the repo's research base.
The point isn't any one model — the winners change monthly. Frontier-class open weights plus cheap, respected on-demand inference means intelligence is now a commodity market — and what nobody hands you is the buyer's side of it: picking, switching, and escalating between models must not be your job. Companies aren't switching models; they're routing across several. That routing layer — plus the owned context underneath it — is the harness. Model choice must never become work.
glamfire is opinionated about the split: your context lives on your disk; your inference is rented on demand from trusted clouds — the fire in the name is Fireworks-class serverless GPUs, not your laptop. Most teams don't own AI inference hardware and shouldn't need to: frontier-class open models are 400B–1.6T-parameter MoEs, and renting them FP8 costs cents. Self-hosting via vLLM is a supported escape hatch, not the default. Local-first describes your data, not your GPUs.
A model-agnostic, agent-agnostic harness that closes the last mile — so switching models is a config change, not a rewrite of your work system.
Your context, local-first and portable — SQLite on your disk, exportable to human-readable JSONL and back, bit-exact. Owned, never uploaded, never rented back.
Scores each task center ↔ edge and sends it to the cheapest capable model; frontier gets the task only when it earns it. Shows you the decision and the $ saved.
A conformance-tested harness per model family (GLM 5.2/Fireworks first). The per-model tuning that makes migration a rewrite — deleted.
The agent loop: plan → act → observe, real tool dispatch, least-privilege permissions, sandboxing, hard cost budgets that actually stop mid-task.
Portable capability packs that travel across models unchanged.
Self-hosted Slack/Discord/HTTP surface — the open answer to renting your team's context to a lab. The knowledge stays in your store.
Trust is mechanical here, not vibes: least-privilege permissions (read → ask → deny), budget ceilings that genuinely halt a run, and verification the way a human would do it. Don't trust an agent because it sounds confident — trust what you can inspect.
Model-agnosticism is table stakes in 2026. What nobody ships as one product: automatic center/edge cost-routing with earned escalation, an owned, portable context layer guaranteed by test, and conformance-tested adapters that make model migration a config change. Here's how that lands next to what's on your machine today.
| You use | It is | glamfire, next to it |
|---|---|---|
| Claude Code | the best frontier coding agent | Keep it for the hard edge. glamfire routes the routine center of your workload to open models at a fifth to a thirtieth of the price, with frontier as an earned escalation — plus a hard per-run budget stop no frontier-lab agent ships, and a spend ledger in a file you own. |
| opencode & other OSS agents | agent CLIs with per-agent models | There you assign models to agents and switch by hand. glamfire decides per task, automatically — price × capability × confidence — and switching families is conformance-tested, not vibes. |
| Ollama / vLLM | run open weights yourself | A model server is not a work system. glamfire is the loop + routing + ledger on top — rent the same weights FP8 serverless when your laptop can't hold a 753B MoE (a local-endpoint adapter is specified and in build). |
| OpenRouter | hosted gateway: one key, 400+ models, auto-router | One key and an auto-router per prompt — but a hosted middleman: every request and your spend metadata transit their gateway. glamfire goes direct to providers you choose, with the loop, owned local context, per-run hard budget stops, and the ledger on your disk. |
| A single open model (Hermes, GLM, DeepSeek…) | a frontier-class brain, free | A brain in a jar. glamfire is the jar-opener: the harness that turns raw weights into a working, budgeted, tool-using agent — and lets you swap the brain later. |
| Goose | config-driven multi-model OSS agent | Closest cousin, honestly — it ships lead/worker multi-model by config. glamfire's wedge: automatic per-task routing with earned escalation, a portable context layer guaranteed by test, and per-model conformance gates. |
Honesty moat: we publish current reality with every release and show how each claim was verified. What we say works, works.
npm, Homebrew, and Scoop are live and publish on every release; single-file binaries for macOS / Windows / Linux (arm64 + x64, checksummed, sigstore-signed) are on GitHub Releases. winget is submitted and pending Microsoft's community review. glam route works offline with no API key; glam run needs a Fireworks key (glam doctor will tell you).
We state plainly what is real — this section mirrors the repo's honesty contract and updates with every release.
If a capability is partial, we say so. A feature is DONE only when a real human can use it, verified the way a human would. We do not market vaporware.
Every company on earth needs this layer; almost none can hire for it. If you can reason about routing, context, tool-calls, or model adapters, your work here compounds for everyone who refuses to rent their brain back. The bar: real full-stack mini-features, no shims, verified the way a human would.