Cost estimation, model routing, and compilation across every major LLM provider. All deterministic, all offline.
Every LLM API call costs money — and the currency is tokens.
When you send a prompt to an AI model, it doesn't read words — it reads tokens. A token is roughly ¾ of a word. The sentence "Write a blog post about AI trends" is 7 words but ~9 tokens.
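To make the ¾-of-a-word rule concrete, here is a minimal sketch of a word-count estimator. The function name `estimate_tokens` and the `words_per_token` parameter are illustrative assumptions, and real tokenizers will give somewhat different counts.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (a rule of thumb, not a tokenizer)."""
    word_count = len(text.split())
    return round(word_count / words_per_token)


# 7 words / 0.75 is about 9.3, which rounds to the ~9 tokens quoted above.
print(estimate_tokens("Write a blog post about AI trends"))
```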
You pay for input tokens (what you send) and output tokens (what the model replies). Output tokens typically cost 3–5× more than input. That's why a long, rambling prompt that generates a long response costs significantly more than a focused one.
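As a rough illustration of how the two prices combine, the sketch below computes the cost of a single call. The function name `call_cost` and the token counts are hypothetical; the rates are Claude Sonnet's listed prices of $3.00 input and $15.00 output per 1M tokens.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price: float, output_price: float) -> float:
    """Dollar cost of one call; prices are in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical call: 2,000 tokens sent, 1,000 tokens returned.
cost = call_cost(2_000, 1_000, input_price=3.00, output_price=15.00)
print(f"${cost:.4f}")
```

In this example the 1,000 output tokens account for $0.015 of the $0.021 total, even though they are only a third of the tokens.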
A developer asks an AI to review 200 lines of code. Here's the prompt, followed by how efficiently each model handles it.
"Review this authentication module for security issues. Here's the code…"
+ 200 lines of code pasted below
| Model | Tier | Relative efficiency |
|---|---|---|
| Claude Opus | Top | 1× (baseline) |
| Claude Sonnet | Mid | 5× more efficient |
| Claude Haiku | Small | 19× more efficient |
| Gemini Flash | Small | 79× more efficient |
Same prompt, same quality output — the difference is which model runs it
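The Claude rows can be reproduced from the listed prices. This sketch assumes "relative efficiency" means the price ratio against Claude Opus, which this page does not state explicitly; the helper name `relative_efficiency` is illustrative.

```python
# USD per 1M input tokens, from the Claude pricing table on this page.
INPUT_PRICE = {"Claude Opus": 15.00, "Claude Sonnet": 3.00, "Claude Haiku": 0.80}


def relative_efficiency(model: str, baseline: str = "Claude Opus") -> float:
    """How many times cheaper `model` is than `baseline` per input token."""
    return INPUT_PRICE[baseline] / INPUT_PRICE[model]


for model in INPUT_PRICE:
    print(f"{model}: {relative_efficiency(model):.2f}x")
```

Under that assumption, Haiku works out to 15.00 / 0.80 = 18.75, which rounds to the 19× shown. The output prices give the same ratios (75 / 15 = 5 and 75 / 4 = 18.75).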
The insight: Most teams send every prompt to their strongest model — even simple tasks that a smaller, faster model handles equally well. The router analyzes each prompt's complexity and risk, then picks the right-sized model automatically. You get the same quality output with dramatically better token efficiency.
Two levers: use fewer tokens per call, and route each call to the right-sized model.
The multi-stage compression pipeline intelligently strips boilerplate, collapses redundancy, and removes irrelevant content — while protecting code blocks, tables, and important structure. Typical reduction: 20–40% fewer tokens per call.
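To show the general idea of compressing prose while protecting code, here is a deliberately simple sketch. It is not the actual pipeline: the two rules (collapse whitespace, drop repeated lines) and the names `compress` and `compress_prose` are assumptions chosen for illustration.

```python
import re

# Fenced code blocks (three backticks ... three backticks) are captured and kept verbatim.
CODE_BLOCK = re.compile(r"(`{3}.*?`{3})", re.DOTALL)


def compress_prose(prose: str) -> str:
    """Collapse whitespace runs and drop blank or repeated lines in non-code text."""
    seen = set()
    kept = []
    for line in prose.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if not line or line.lower() in seen:
            continue  # skip blank lines and exact repeats
        seen.add(line.lower())
        kept.append(line)
    return "\n".join(kept)


def compress(prompt: str) -> str:
    """Compress prose segments while leaving fenced code blocks untouched."""
    parts = CODE_BLOCK.split(prompt)
    out = []
    for i, part in enumerate(parts):
        # With one capture group, odd indices hold the matched code blocks.
        out.append(part if i % 2 == 1 else compress_prose(part))
    return "\n".join(p for p in out if p)


fence = "`" * 3
prompt = (
    "Please   review this function.\n"
    "Please review this function.\n\n"
    + fence + "python\ndef add(a, b):\n    return a  +  b\n" + fence
    + "\n\nThanks!\n"
)
compressed = compress(prompt)
print(f"{len(prompt)} -> {len(compressed)} characters")
```

The duplicated sentence and extra spacing in the prose are removed, while the spacing inside the code block is left exactly as written.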
Not every prompt needs the most powerful model. The router classifies your task's complexity and risk, then picks the right-sized model for the job. A simple code review doesn't need Opus-level reasoning — Haiku handles it just as well, 19× more efficiently.
The router picks the right model for your task — here's what each costs
Claude family — best for structured reasoning and XML compilation
| Model | Input | Output | Tier |
|---|---|---|---|
| Claude Haiku | $0.80 | $4.00 | Small |
| Claude Sonnet | $3.00 | $15.00 | Mid |
| Claude Opus | $15.00 | $75.00 | Top |
per 1M tokens
GPT family — broadest ecosystem and integration support
| Model | Input | Output | Tier |
|---|---|---|---|
| GPT-4o Mini | $0.15 | $0.60 | Small |
| GPT-4o | $2.50 | $10.00 | Mid |
| o1 | $15.00 | $60.00 | Top |
per 1M tokens
Gemini family — competitive pricing with strong multimodal capabilities
| Model | Input | Output | Tier |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | Small |
| Gemini 2.0 Pro | $1.25 | $5.00 | Mid/Top |
per 1M tokens
Search-grounded models — auto-selected for research-intent prompts
| Model | Input | Output | Tier |
|---|---|---|---|
| Sonar | $1.00 | $1.00 | Small |
| Sonar Pro | $3.00 | $15.00 | Mid/Top |
per 1M tokens
A 2-step deterministic pipeline picks the right model for every prompt. Zero LLM calls — pure rules.
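A rules-only router of this shape might look like the sketch below: step 1 scores complexity and risk, step 2 maps the scores to a tier and a model. The keyword lists, thresholds, and function names are all hypothetical and are not the router's real rules.

```python
# Hypothetical keyword rules, for illustration only.
HIGH_RISK_TERMS = ("security", "authentication", "payment", "legal", "medical")
COMPLEX_TERMS = ("architecture", "refactor", "prove", "design", "migrate")

TIER_TO_MODEL = {"small": "Claude Haiku", "mid": "Claude Sonnet", "top": "Claude Opus"}


def classify(prompt: str) -> tuple[int, int]:
    """Step 1: score complexity and risk with plain keyword and length rules."""
    text = prompt.lower()
    complexity = sum(term in text for term in COMPLEX_TERMS)
    if len(text.split()) > 300:
        complexity += 1  # very long prompts count as more complex
    risk = sum(term in text for term in HIGH_RISK_TERMS)
    return complexity, risk


def select_model(complexity: int, risk: int) -> str:
    """Step 2: map the combined score to a tier, then to a model."""
    score = complexity + risk
    if score >= 3:
        tier = "top"
    elif score >= 1:
        tier = "mid"
    else:
        tier = "small"
    return TIER_TO_MODEL[tier]


def route(prompt: str) -> str:
    return select_model(*classify(prompt))


print(route("Summarize this paragraph in one sentence."))
print(route("Refactor the payment service architecture and review its security model."))
```

Because every step is a pure function of the prompt text, the same prompt always routes to the same model, with no model call needed to decide.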
Routing is shaped by 5 optimization profiles: frozen presets that bundle budget, latency, and quality preferences.
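One way to model such a preset, reading "frozen" as immutable, is a frozen dataclass. The class name, field names, and values below are assumptions for illustration; the real profile names and settings are not listed here.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Profile:
    """A preset bundling budget, latency, and quality preferences (illustrative fields)."""
    name: str
    max_cost_per_call_usd: float
    max_latency_ms: int
    min_quality_tier: str  # "small", "mid", or "top"


# Hypothetical preset, not one of the actual five profiles.
EXAMPLE = Profile("example", max_cost_per_call_usd=0.01,
                  max_latency_ms=2_000, min_quality_tier="small")

try:
    EXAMPLE.max_latency_ms = 500  # frozen dataclasses reject assignment
except FrozenInstanceError:
    print("profiles cannot be mutated after creation")
```

Freezing the preset means routing decisions cannot drift because some code path changed a setting at runtime.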
One prompt, three compilation targets. Pick the format that matches your LLM.
    <role>Senior engineer</role>
    <goal>Refactor auth module</goal>
    <constraints>
      No breaking changes
      Keep under 200 lines
    </constraints>

    [SYSTEM] You are a senior engineer. Your goal is to refactor the auth module.
    [USER] Refactor the auth module. No breaking changes. Keep under 200 lines.

    ## Role
    Senior engineer
    ## Goal
    Refactor auth module
    ## Constraints
    - No breaking changes
    - Keep under 200 lines
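The sketch below shows how one structured prompt could be rendered into three formats like these. `PromptSpec` and the three `to_*` functions are hypothetical names, and the system/user wording is simplified compared with the example shown here.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class PromptSpec:
    role: str
    goal: str
    constraints: tuple


def to_xml(spec: PromptSpec) -> str:
    """Render as XML-style tags, escaping special characters in the values."""
    lines = [
        f"<role>{escape(spec.role)}</role>",
        f"<goal>{escape(spec.goal)}</goal>",
        "<constraints>",
        *(escape(c) for c in spec.constraints),
        "</constraints>",
    ]
    return "\n".join(lines)


def to_chat(spec: PromptSpec) -> str:
    """Render as a system/user pair (simplified wording)."""
    rules = " ".join(f"{c}." for c in spec.constraints)
    return (
        f"[SYSTEM] Role: {spec.role}. Goal: {spec.goal}.\n"
        f"[USER] {spec.goal}. {rules}"
    )


def to_markdown(spec: PromptSpec) -> str:
    """Render as Markdown headings with a bulleted constraint list."""
    bullets = "\n".join(f"- {c}" for c in spec.constraints)
    return f"## Role\n{spec.role}\n\n## Goal\n{spec.goal}\n\n## Constraints\n{bullets}"


spec = PromptSpec("Senior engineer", "Refactor auth module",
                  ("No breaking changes", "Keep under 200 lines"))
for compile_to in (to_xml, to_chat, to_markdown):
    print(compile_to(spec), end="\n\n")
```

Keeping the prompt as structured data and rendering it last means the content stays identical across targets; only the wrapper changes.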
All pricing data reflects current provider rates as of February 2026. Every model’s cost estimate is verified against real pricing and tested for consistency.
Savings are calculated against a standard baseline so you can see exactly how much cheaper a smaller model would be — before you commit.
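A savings figure of this kind can be sketched as follows. Treating Claude Opus as the baseline is an assumption, as are the function names and the token counts; the prices come from the Claude pricing table on this page.

```python
# USD per 1M tokens as (input, output), from the Claude pricing table on this page.
PRICES = {
    "Claude Opus": (15.00, 75.00),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Haiku": (0.80, 4.00),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_vs_baseline(model: str, input_tokens: int, output_tokens: int,
                        baseline: str = "Claude Opus") -> tuple[float, float]:
    """Return (dollars saved, percent saved) relative to the baseline model."""
    base = cost(baseline, input_tokens, output_tokens)
    saved = base - cost(model, input_tokens, output_tokens)
    return saved, 100 * saved / base


# Hypothetical call: 3,000 input tokens and 800 output tokens.
saved, percent = savings_vs_baseline("Claude Haiku", 3_000, 800)
print(f"saves ${saved:.4f} per call ({percent:.1f}% cheaper)")
```

For this call the baseline costs $0.105 and Haiku costs $0.0056, so the estimate shows the gap before any request is sent.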
Model routing and cost estimation are always free (not metered). No API keys required.
Route your next prompt to the right model at the right price. Free to use, no account required.