glamworks
The intelligence wars are over

The context wars have begun.

The best open models now beat the frontier at the center of the distribution — which is most work. The fight has moved to who owns the context that makes any model useful. glamfire is the open harness for that fight: own your context, route your intelligence, never rent your company's brain back.

# one command, real CLI, real routing
$ npm install -g glamfire
$ glam route "refactor this module and add tests"
# → routes to the cheapest capable model, shows $ saved vs always-frontier
what is this, concretely?

A command-line agent you point at real work.

$ glam run "read this repo and write a CHANGELOG.md from the git history" --max-usd 0.05

Same shape of tool as Claude Code or opencode: it plans, calls real tools (read/write/edit files, search code, read git, run allowed commands), observes the results, iterates, and stops when the work is done — or when its budget is hit. glamfire authored its own CHANGELOG.md this way — the PR merged with human review as the gate — and a faithful re-run of that task costs about two cents. The difference is everything wrapped around that loop:

  1. It picks the model per task, not per subscription. The router scores each task center-vs-edge and sends it to the cheapest model that can actually do it — open weights for the routine 80%, frontier only when confidence says the cheap model can't hold it. glam route "<task>" shows you the decision, offline, before any money moves.
  2. It bills like a meter, not a faith commitment. --max-usd is a hard ceiling that genuinely stops a run mid-task — checked every turn, honest partial cost on interrupt. Every run lands in a local ledger; glam usage shows spend by day, model, and provider, with monthly budget warnings.
  3. Your context is a file on your disk. The brain is SQLite you own — exportable to human-readable JSONL and back, bit-exact, tested. The memory that makes an agent good at your work stays yours when the model underneath changes.
  4. Models are swappable parts. Each model family gets a conformance-tested adapter: the per-model tuning that normally makes migration a rewrite is done once, in the open, gated by tests. Adding DeepSeek V4 to your routing takes one TOML line, not a migration.
  5. It watches the market so you don't. glam models is a live catalog of top open-weight models across respected US hosts — real prices with as-of dates; --refresh pulls current prices and calls out drops.
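The hard ceiling in item 2 can be pictured as a pre-flight check before every model call. Below is a minimal TypeScript sketch, not glamfire's engine: it assumes the next request's input tokens are known before sending and that each reply is capped by a per-turn `maxOutputTokens`, so the meter can refuse any turn whose worst case would cross `--max-usd`. The `Turn`, `Price`, and `runWithBudget` names are hypothetical.

```typescript
// Hypothetical shapes for illustration; not glamfire's real types.
interface Turn {
  inputTokens: number; // known before the request is sent
  outputTokens: number; // known only after the model replies
}

interface Price {
  inUsdPer1M: number;
  outUsdPer1M: number;
}

interface RunResult {
  turnsCompleted: number;
  spentUsd: number;
  stoppedByBudget: boolean;
}

// Cost of one turn from per-1M-token prices.
function turnCost(inputTokens: number, outputTokens: number, p: Price): number {
  return (inputTokens * p.inUsdPer1M + outputTokens * p.outUsdPer1M) / 1_000_000;
}

// Before each turn, assume the reply uses the full output cap. If even that
// worst case would cross maxUsd, stop instead of sending the request, so the
// reported spend never exceeds the ceiling.
function runWithBudget(turns: Turn[], p: Price, maxUsd: number, maxOutputTokens: number): RunResult {
  let spentUsd = 0;
  let turnsCompleted = 0;
  for (const t of turns) {
    const worstCase = turnCost(t.inputTokens, maxOutputTokens, p);
    if (spentUsd + worstCase > maxUsd) {
      return { turnsCompleted, spentUsd, stoppedByBudget: true };
    }
    // Charge what the turn actually used; the provider enforces the output
    // cap, mirrored here by Math.min.
    spentUsd += turnCost(t.inputTokens, Math.min(t.outputTokens, maxOutputTokens), p);
    turnsCompleted++;
  }
  return { turnsCompleted, spentUsd, stoppedByBudget: false };
}

// Three identical turns at the GLM 5.2 prices listed in the landscape section.
const glm52: Price = { inUsdPer1M: 1.4, outUsdPer1M: 4.4 };
const turns: Turn[] = [1, 2, 3].map(() => ({ inputTokens: 20_000, outputTokens: 2_000 }));
console.log(runWithBudget(turns, glm52, 0.1, 4_000));
```

With a $0.10 cap and a 4,000-token output cap, the third turn's worst case would cross the ceiling, so the sketch stops after two turns with about $0.074 spent.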
from the 98% problem to the context wars

Intelligence got cheap. Your context is what they want now.

Intelligence got roughly 98% cheaper. Open models caught the frontier on the broad middle of everyday work. But a model is a brain in a jar — what your company actually runs is the harness: the context it sees, the routing that picks it, the tool-calls shaped to its grammar, the surfaces where work happens. Switching models means rebuilding all of that, so most teams sign a frontier contract instead.

And now the frontier labs are making the next move: putting their assistant inside your team chat, where it quietly accumulates the messy, uncodified context that is your actual edge. Once a vendor's model is that close to your context, it doesn't matter how cheap the open models get — you can't rip it out. That's the failure mode of this decade: renting your company's brain back from a frontier lab.

There's a second, newer reason to own the harness: continuity. 2026 has already shown that the model you build on can be forced offline for weeks, restricted to approved partners, or repriced overnight. The teams that shrugged were the ones that never tied their work to a single model — they owned their harness, routed somewhere else, and kept moving. No single model's outage, ban, or price hike should ever stall your work.

the landscape · july 2026

Open models won the center. Frontier must earn the edge.

GLM 5.2 (MIT license, 1M-token context) is the #1-ranked open-weight model on the Artificial Analysis Intelligence Index, and beats frontier flagships on real-work coding benchmarks like SWE-bench Pro — at roughly a fifth to a sixth of frontier cost. That's not "good enough": it's the best model in the world at center-of-distribution work — which by definition is most of your work. Frontier models still win the messy, novel edge; they should earn that escalation, task by task, not collect rent on your whole workload.

| Model (open weights) | Served by | $ per 1M tokens (input · output) | Good at |
|---|---|---|---|
| DeepSeek V4 Flash | Fireworks (FP8) | 0.14 · 0.28 | cheapest capable 1M-context tier |
| MiniMax M3 | Fireworks (FP8) | 0.30 · 1.20 | cheap agentic + multimodal work |
| Kimi K2.7 Code | Fireworks (FP8) | 0.95 · 4.00 | long autonomous coding sessions |
| GLM 5.2 (default workhorse) | Fireworks (FP8) | 1.40 · 4.40 | agentic coding, design, long-horizon tasks |
| DeepSeek V4 Pro | Fireworks (FP8) | 1.74 · 3.48 | open escalation tier, frontier-class reasoning |

Prices from provider pricing pages, accessed 2026-07-03; open-model prices are decaying in weeks, not quarters (GLM 5.2 was undercut within three weeks of launch). Quantization matters: FP4 routes are cheaper but measurably worse for coding — glamfire's catalog records quant per endpoint and defaults coding to FP8. Sources and the full cited brief (research/25) live in the repo's research base.
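The per-1M prices in the table turn into a per-run cost with one line of arithmetic, and the FP8-for-coding default is a filter on the catalog's quant field. A minimal sketch using the table's Fireworks rows; the `Endpoint` shape and the function names are hypothetical, not glamfire's catalog schema, and the sketch deliberately ignores capability (that is the router's job).

```typescript
type Quant = "FP8" | "FP4";

// Hypothetical catalog entry; glamfire records quant per endpoint, but its
// real schema is not shown here.
interface Endpoint {
  model: string;
  host: string;
  quant: Quant;
  inUsdPer1M: number;
  outUsdPer1M: number;
}

// Rows from the landscape table (provider pricing pages, accessed 2026-07-03).
const catalog: Endpoint[] = [
  { model: "DeepSeek V4 Flash", host: "Fireworks", quant: "FP8", inUsdPer1M: 0.14, outUsdPer1M: 0.28 },
  { model: "MiniMax M3", host: "Fireworks", quant: "FP8", inUsdPer1M: 0.3, outUsdPer1M: 1.2 },
  { model: "Kimi K2.7 Code", host: "Fireworks", quant: "FP8", inUsdPer1M: 0.95, outUsdPer1M: 4.0 },
  { model: "GLM 5.2", host: "Fireworks", quant: "FP8", inUsdPer1M: 1.4, outUsdPer1M: 4.4 },
  { model: "DeepSeek V4 Pro", host: "Fireworks", quant: "FP8", inUsdPer1M: 1.74, outUsdPer1M: 3.48 },
];

// USD cost of one run: (tokens / 1M) × price, for input and output separately.
function runCostUsd(e: Endpoint, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * e.inUsdPer1M + (outputTokens / 1_000_000) * e.outUsdPer1M;
}

// Coding defaults to FP8: drop lower-precision routes before comparing price.
function cheapestForCoding(endpoints: Endpoint[], inputTokens: number, outputTokens: number): Endpoint | undefined {
  return endpoints
    .filter((e) => e.quant === "FP8")
    .sort((a, b) => runCostUsd(a, inputTokens, outputTokens) - runCostUsd(b, inputTokens, outputTokens))[0];
}

for (const e of catalog) {
  console.log(`${e.model}: $${runCostUsd(e, 100_000, 10_000).toFixed(4)}`);
}
```

For a run with 100k input and 10k output tokens this prints about $0.0168 for DeepSeek V4 Flash and $0.1840 for GLM 5.2.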

The point isn't any one model; the winners change monthly. Frontier-class open weights plus cheap, respected on-demand inference turn intelligence into a commodity market, and what nobody hands you is the buyer's side of that market: picking, switching, and escalating between models. That must not be your job. Companies aren't switching models; they're routing across several. That routing layer, plus the owned context underneath it, is the harness. Model choice must never become work.

glamfire is opinionated about the split: your context lives on your disk; your inference is rented on demand from trusted clouds — the fire in the name is Fireworks-class serverless GPUs, not your laptop. Most teams don't own AI inference hardware and shouldn't need to: frontier-class open models are 400B–1.6T-parameter MoEs, and renting them FP8 costs cents. Self-hosting via vLLM is a supported escape hatch, not the default. Local-first describes your data, not your GPUs.
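A back-of-envelope check on the laptop claim, counting only the weights (KV cache and activations come on top). FP8 stores one byte per parameter, and in a typical serving setup a mixture-of-experts model keeps all of its experts resident even though only a few run per token, so the total parameter count is what sets memory:

```latex
M_{\text{weights}} \approx N_{\text{params}} \times 1\,\text{byte}
\quad\Rightarrow\quad
\begin{aligned}
400 \times 10^{9}\ \text{params} &\approx 400\ \text{GB},\\
1.6 \times 10^{12}\ \text{params} &\approx 1.6\ \text{TB}.
\end{aligned}
```

Either end of that range is far beyond the memory of a typical laptop, which is why renting FP8 serverless is the default here.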

the product · glamfire

One open harness. Every model. Your context.

A model-agnostic, agent-agnostic harness that closes the last mile — so switching models is a config change, not a rewrite of your work system.

open brain

Your context, local-first and portable — SQLite on your disk, exportable to human-readable JSONL and back, bit-exact. Owned, never uploaded, never rented back.
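The ownership invariant can be stated as a test: export, import, export again, and require identical bytes. A minimal TypeScript sketch that stands in for the SQLite store with an in-memory array; the `MemoryRecord` fields and the function names are hypothetical. Two choices make the bytes deterministic: records are sorted by id, and each line is written with a fixed key order.

```typescript
// Hypothetical shape of one memory record; not glamfire's real schema.
interface MemoryRecord {
  id: string;
  kind: string;
  text: string;
  createdAt: string; // ISO-8601 timestamp
}

// One JSON object per line. Sorting by id and writing keys in a fixed order
// means the same records always produce the same bytes.
function exportJsonl(records: MemoryRecord[]): string {
  return [...records]
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((r) => JSON.stringify({ id: r.id, kind: r.kind, text: r.text, createdAt: r.createdAt }) + "\n")
    .join("");
}

// JSON.stringify escapes newlines inside strings, so splitting on "\n" is safe.
function importJsonl(jsonl: string): MemoryRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as MemoryRecord);
}

// The ownership invariant: export → import → export reproduces identical bytes.
function roundTripsExactly(records: MemoryRecord[]): boolean {
  const first = exportJsonl(records);
  return exportJsonl(importJsonl(first)) === first;
}

const store: MemoryRecord[] = [
  { id: "m2", kind: "decision", text: "Prefer FP8 routes for coding.\nRevisit monthly.", createdAt: "2026-07-03T09:00:00Z" },
  { id: "m1", kind: "fact", text: "Changelog lives in CHANGELOG.md ✓", createdAt: "2026-07-01T12:00:00Z" },
];
console.log(roundTripsExactly(store)); // true
```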

router

Scores each task center ↔ edge and sends it to the cheapest capable model; frontier gets the task only when it earns it. Shows you the decision and the $ saved.
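One way to read "cheapest capable model": give each model a ceiling on how far toward the edge it is trusted, keep the models whose ceiling covers the task's score, and take the cheapest. A minimal sketch; how glamfire actually computes the center/edge score is not shown here, so `edgeScore` arrives as an input, and the `ceiling` field and type names are hypothetical.

```typescript
// Hypothetical model description for routing.
interface ModelOption {
  name: string;
  ceiling: number; // highest edge score (0 = routine center, 1 = novel edge) this model is trusted with
  usdPerRun: number; // estimated cost of this task on this model
}

interface RouteDecision {
  chosen: string;
  usd: number;
  savedVsFrontierUsd: number;
}

// Cheapest model whose ceiling covers the task; frontier only when nothing cheaper can hold it.
function route(edgeScore: number, options: ModelOption[], frontier: ModelOption): RouteDecision {
  const capable = options.filter((m) => m.ceiling >= edgeScore);
  const pick =
    capable.length > 0
      ? capable.reduce((best, m) => (m.usdPerRun < best.usdPerRun ? m : best))
      : frontier; // the task earned the escalation
  return {
    chosen: pick.name,
    usd: pick.usdPerRun,
    savedVsFrontierUsd: frontier.usdPerRun - pick.usdPerRun,
  };
}
```

Returning `savedVsFrontierUsd` with every decision is what lets a dry run report savings before any money moves.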

adapters

A conformance-tested harness per model family (GLM 5.2/Fireworks first). The per-model tuning that makes migration a rewrite — deleted.
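The adapter idea can be sketched as one contract per model family plus a single battery that every adapter must pass. In this TypeScript sketch the two tool-call formats (a JSON "function" object and an XML-style tag) are illustrative stand-ins, not the exact grammars of GLM or any provider, and `ModelAdapter`, `battery`, `conforms`, and the tool names are hypothetical.

```typescript
// Provider-neutral tool call that the engine works with.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical contract: one adapter per model family.
interface ModelAdapter {
  family: string;
  encodeToolCall(call: ToolCall): string; // neutral → the family's wire format
  decodeToolCalls(raw: string): ToolCall[]; // the family's wire format → neutral
}

// Illustrative adapter for a JSON "function" object with stringified arguments.
const jsonAdapter: ModelAdapter = {
  family: "json-function",
  encodeToolCall: (c) => JSON.stringify({ function: { name: c.name, arguments: JSON.stringify(c.args) } }),
  decodeToolCalls: (raw) => {
    const parsed = JSON.parse(raw) as { function: { name: string; arguments: string } };
    return [{ name: parsed.function.name, args: JSON.parse(parsed.function.arguments) }];
  },
};

// Illustrative adapter for an XML-style tag wrapping JSON arguments.
const tagAdapter: ModelAdapter = {
  family: "xml-tag",
  encodeToolCall: (c) => `<tool name="${c.name}">${JSON.stringify(c.args)}</tool>`,
  decodeToolCalls: (raw) =>
    [...raw.matchAll(/<tool name="([^"]+)">([\s\S]*?)<\/tool>/g)].map((m) => ({
      name: m[1],
      args: JSON.parse(m[2]),
    })),
};

// One shared battery: every adapter must round-trip the same cases unchanged.
const battery: ToolCall[] = [
  { name: "read_file", args: { path: "src/index.ts" } },
  { name: "search_code", args: { query: "TODO", maxResults: 20 } },
];

function conforms(adapter: ModelAdapter): boolean {
  return battery.every((call) => {
    const decoded = adapter.decodeToolCalls(adapter.encodeToolCall(call));
    return decoded.length === 1 && JSON.stringify(decoded[0]) === JSON.stringify(call);
  });
}

console.log(conforms(jsonAdapter), conforms(tagAdapter)); // true true
```

Because the engine only ever sees `ToolCall`, swapping families means swapping the adapter, and the shared battery is the gate that says the swap is safe.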

open engine

The agent loop: plan → act → observe, real tool dispatch, least-privilege permissions, sandboxing, hard cost budgets that actually stop mid-task.
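The plan → act → observe loop fits in a few lines once the model is abstracted away. In this sketch a hypothetical `decide` function stands in for the model and a tiny in-memory tool table stands in for real file, git, and command tools; permissions, sandboxing, and budgets are left out to keep the loop itself visible.

```typescript
// One model step: either call a tool or finish.
type Step =
  | { kind: "tool"; name: string; args: Record<string, string> }
  | { kind: "done"; answer: string };

type Tool = (args: Record<string, string>) => string;

// Stand-in for the model: reads the transcript so far and plans the next step.
type Decide = (transcript: string[]) => Step;

function runLoop(
  decide: Decide,
  tools: Record<string, Tool>,
  maxTurns: number,
): { answer: string | null; transcript: string[] } {
  const transcript: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = decide(transcript); // plan
    if (step.kind === "done") return { answer: step.answer, transcript };
    const tool: Tool | undefined = tools[step.name];
    // act: an unknown tool becomes an observation the model can react to, not a crash
    const observation = tool ? tool(step.args) : `error: unknown tool ${step.name}`;
    transcript.push(`${step.name} -> ${observation}`); // observe
  }
  return { answer: null, transcript }; // turn limit hit before the task finished
}

// Usage: a scripted "model" that reads one file, then answers with what it saw.
const files: Record<string, string> = { "README.md": "# hello" };
const tools: Record<string, Tool> = {
  read_file: (args) => files[args.path] ?? "error: no such file",
};
const decide: Decide = (t) =>
  t.length === 0
    ? { kind: "tool", name: "read_file", args: { path: "README.md" } }
    : { kind: "done", answer: t[0] };
console.log(runLoop(decide, tools, 5).answer); // read_file -> # hello
```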

open skills

Portable capability packs that travel across models unchanged.

team harness

Self-hosted Slack/Discord/HTTP surface — the open answer to renting your team's context to a lab. The knowledge stays in your store.

Trust is mechanical here, not vibes: least-privilege permissions (read → ask → deny), budget ceilings that genuinely halt a run, and verification the way a human would do it. Don't trust an agent because it sounds confident — trust what you can inspect.
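The read → ask → deny ladder reduces to a lookup with a safe default. A minimal sketch, assuming read-only tools run without prompting, mutating tools require a human yes, and any tool the policy does not name is denied; the tool names and the `policy` table are hypothetical, not glamfire's configuration.

```typescript
type Permission = "allow" | "ask" | "deny";

// Hypothetical policy: reads run freely, mutations ask first, anything unlisted is denied.
const policy: Record<string, Permission> = {
  read_file: "allow",
  search_code: "allow",
  git_log: "allow",
  write_file: "ask",
  edit_file: "ask",
  run_command: "ask",
};

// `confirm` stands in for the interactive prompt shown to the human.
function permits(tool: string, confirm: (tool: string) => boolean): boolean {
  const level: Permission = policy[tool] ?? "deny"; // least privilege: unknown means no
  if (level === "allow") return true;
  if (level === "ask") return confirm(tool);
  return false;
}

const alwaysNo = () => false;
console.log(permits("read_file", alwaysNo), permits("write_file", alwaysNo), permits("delete_repo", alwaysNo)); // true false false
```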

where it fits

Next to the tools you already use.

Model-agnosticism is table stakes in 2026. What nobody ships as one product: automatic center/edge cost-routing with earned escalation, an owned, portable context layer guaranteed by test, and conformance-tested adapters that make model migration a config change. Here's how that lands next to what's on your machine today.

| You use | It is | glamfire, next to it |
|---|---|---|
| Claude Code | the best frontier coding agent | Keep it for the hard edge. glamfire routes the routine center of your workload to open models at a fifth to a thirtieth of the price, with frontier as an earned escalation, plus a hard per-run budget stop no frontier-lab agent ships and a spend ledger in a file you own. |
| opencode & other OSS agents | agent CLIs with per-agent models | There you assign models to agents and switch by hand. glamfire decides per task, automatically (price × capability × confidence), and switching families is conformance-tested, not vibes. |
| Ollama / vLLM | run open weights yourself | A model server is not a work system. glamfire is the loop + routing + ledger on top; rent the same weights FP8 serverless when your laptop can't hold a 753B MoE (a local-endpoint adapter is specified and in build). |
| OpenRouter | hosted gateway: one key, 400+ models, auto-router | One key and an auto-router per prompt, but a hosted middleman: every request and your spend metadata transit their gateway. glamfire goes direct to providers you choose, with the loop, owned local context, per-run hard budget stops, and the ledger on your disk. |
| A single open model (Hermes, GLM, DeepSeek…) | a frontier-class brain, free | A brain in a jar. glamfire is the jar-opener: the harness that turns raw weights into a working, budgeted, tool-using agent, and lets you swap the brain later. |
| Goose | config-driven multi-model OSS agent | Closest cousin, honestly: it ships lead/worker multi-model by config. glamfire's wedge: automatic per-task routing with earned escalation, a portable context layer guaranteed by test, and per-model conformance gates. |

Honesty moat: we publish current reality with every release and show how each claim was verified. What we say works, works.

install

One command. Any OS.

# npm (Node ≥ 22): provides the `glam` command
$ npm install -g glamfire
# macOS / Linux: Homebrew
$ brew install glamworks/tap/glamfire
# Windows: Scoop
$ scoop bucket add glamworks https://github.com/glamworks/scoop-bucket
$ scoop install glamfire
# then
$ glam --version
$ glam doctor

npm, Homebrew, and Scoop are live and publish on every release; single-file binaries for macOS / Windows / Linux (arm64 + x64, checksummed, sigstore-signed) are on GitHub Releases. winget is submitted and pending Microsoft's community review. glam route works offline with no API key; glam run needs a Fireworks key (glam doctor will tell you).

five things to do with it this week

Real commands, real ceilings, real receipts.

  1. Halve your coding-agent bill without firing Claude. Send the routine work — changelogs, dep bumps, repo explanations, first-pass docs — through glam run at open-model prices; keep your frontier subscription for the tasks that deserve it. The ledger shows what you actually saved.
    $ glam run "explain this repo to a new contributor, write docs/ONBOARDING.md" --max-usd 0.05
  2. Put a real ceiling on an agent. The meter stops the run when it hits the cap — not a warning, a stop. Ctrl-C aborts the in-flight request and prints the honest partial cost.
    $ glam run "audit the TODOs and file them as issues-draft.md" --max-usd 0.10
  3. Meter a team. Every run is a line in ~/.glam/usage.jsonl — spend broken down by day, model, and provider. Set [usage] monthlyBudgetUsd in glam.toml and get warned at 80%.
    $ glam usage
  4. Read the market in one command. The current open-weight landscape with real prices and as-of dates; --refresh diffs live provider prices and flags drops.
    $ glam models --sort price
    $ glam models --refresh
  5. Fire-drill your continuity. Add a routing rule that prefers DeepSeek V4, and prove to yourself the same task completes when your primary provider is down. 2026 already showed frontier access can vanish for weeks — the teams that shrugged owned their routing.
    # glam.toml: matches every task; first match wins
    [[routing.rules]]
    candidates = ["accounts/fireworks/models/deepseek-v4-pro"]
    # then watch the router pick it (offline, no key)
    $ glam route "refactor this module and add tests"
    # → chosen model: accounts/fireworks/models/deepseek-v4-pro
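The ledger in item 3 is lines of JSON, so the breakdowns are group-and-sum and the budget warning is a month-to-date ratio. A minimal sketch over an in-memory string rather than the real `~/.glam/usage.jsonl`; the `LedgerLine` fields (`ts`, `model`, `provider`, `usd`) and the model ids are assumptions, not glamfire's documented schema, and days are bucketed by the date as written in each timestamp.

```typescript
// Hypothetical ledger line; glamfire's real usage.jsonl schema may differ.
interface LedgerLine {
  ts: string; // ISO-8601 timestamp, e.g. "2026-07-01T10:00:00Z"
  model: string;
  provider: string;
  usd: number;
}

function parseLedger(jsonl: string): LedgerLine[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LedgerLine);
}

// Sum spend per group; the key function picks the breakdown.
function spendBy(lines: LedgerLine[], key: (l: LedgerLine) => string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const l of lines) {
    const k = key(l);
    totals.set(k, (totals.get(k) ?? 0) + l.usd);
  }
  return totals;
}

const byDay = (l: LedgerLine) => l.ts.slice(0, 10); // "YYYY-MM-DD" as written in the timestamp
const byModel = (l: LedgerLine) => l.model;
const byProvider = (l: LedgerLine) => l.provider;

// Warn once month-to-date spend reaches 80% of the monthly budget.
function budgetWarning(lines: LedgerLine[], month: string, monthlyBudgetUsd: number): string | null {
  const spent = lines.filter((l) => l.ts.startsWith(month)).reduce((sum, l) => sum + l.usd, 0);
  const share = spent / monthlyBudgetUsd;
  return share >= 0.8 ? `warning: ${Math.round(share * 100)}% of monthly budget used` : null;
}

const ledger = parseLedger(
  [
    '{"ts":"2026-06-30T23:00:00Z","model":"glm-5.2","provider":"fireworks","usd":1}',
    '{"ts":"2026-07-01T10:00:00Z","model":"glm-5.2","provider":"fireworks","usd":0.02}',
    '{"ts":"2026-07-01T11:00:00Z","model":"deepseek-v4-flash","provider":"fireworks","usd":0.01}',
    '{"ts":"2026-07-02T09:00:00Z","model":"glm-5.2","provider":"fireworks","usd":0.05}',
  ].join("\n"),
);
console.log(spendBy(ledger, byModel), budgetWarning(ledger, "2026-07", 0.09));
```

Keeping the ledger append-only and line-oriented is what makes these reports a fold over a file you own rather than a query against someone else's dashboard.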
current reality

What runs today

We state plainly what is real — this section mirrors the repo's honesty contract and updates with every release.

glam run — the real agent loop, live-verified against GLM 5.2 on Fireworks: streaming, real tool dispatch (file read/write/edit, code search, read-only git, sandboxed commands), permission gate, and a hard cost budget that stops mid-task.
glam route — center/edge cost routing with a declarative policy engine and verified escalation cascade; offline dry-run with a distribution report ($ saved vs always-frontier), no key needed.
open brain — owned context store (SQLite + vectors + full-text), hybrid retrieval, and a tested export→import ownership invariant: your store round-trips to human-readable JSONL, bit-exact.
Adapters + conformance suite — fireworks-glm live-verified; Anthropic and Together (GLM 5.2 + Qwen3-Coder) green on the same conformance battery, live calls pending keys. Cross-provider cheap→frontier escalation is wired and cost-compared.
Installable everywhere — npm + Homebrew + Scoop live, signed binaries for 5 OS/arch targets, SBOM; CI green on macOS, Windows, Linux.
Built with itself — glamfire authored and merged its own pull request to its own repo, driven by GLM 5.2, with human review as the gate. A self-hosting CI job runs glamfire-on-glamfire on every push.
In active build (lock-step, no shims): team harness · SDK · server/daemon mode · Docker image · task receipts & distribution profiling.
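One plausible shape for the cheap→frontier escalation cascade: try the cheapest tier, check the result, and escalate only on failure, keeping every attempt on the bill so the cost comparison stays honest. The `Tier` shape, the `verify` callback, and the names below are hypothetical; glamfire's actual verification step is not described here.

```typescript
// Hypothetical tier: runs the task on one model and reports what it cost.
interface Tier {
  name: string;
  run: (task: string) => { output: string; usd: number };
}

interface CascadeResult {
  output: string | null;
  answeredBy: string | null;
  totalUsd: number; // includes attempts that failed verification
  attempts: string[];
}

// Try tiers cheapest-first and accept the first output that passes `verify`.
function cascade(task: string, tiers: Tier[], verify: (output: string) => boolean): CascadeResult {
  let totalUsd = 0;
  const attempts: string[] = [];
  for (const tier of tiers) {
    const { output, usd } = tier.run(task);
    totalUsd += usd; // a rejected cheap attempt still cost money
    attempts.push(tier.name);
    if (verify(output)) return { output, answeredBy: tier.name, totalUsd, attempts };
  }
  return { output: null, answeredBy: null, totalUsd, attempts };
}
```

A cascade pays off when the cheap tier usually passes; when it rarely does, the failed attempts add cost on top of the frontier call, which is why the comparison has to use the full total.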

If a capability is partial, we say so. A feature is DONE only when a real human can use it, verified the way a human would. We do not market vaporware.

build with us

Harness talent is the scarcest resource in AI. Build the open one.

Every company on earth needs this layer; almost none can hire for it. If you can reason about routing, context, tool-calls, or model adapters, your work here compounds for everyone who refuses to rent their brain back. The bar: real full-stack mini-features, no shims, verified the way a human would.

  1. Read the spec and CONTRIBUTING.
  2. Grab a good first issue.
  3. Branch, build the real thing, verify it, open a PR (DCO sign-off).
  4. Bigger ideas → open an RFC issue.
