Multi-Provider Support

11 models across 4 providers

Cost estimation, model routing, and compilation across every major LLM provider. All deterministic, all offline.

4
Providers
11
Models
3
Output Formats

What are tokens?

Every LLM API call costs money — and the currency is tokens.

When you send a prompt to an AI model, it doesn't read words — it reads tokens. A token is roughly ¾ of a word. The sentence "Write a blog post about AI trends" is 7 words but 8 tokens.

Your prompt (input tokens)
Write¹ a¹ blog¹ post¹ about¹ AI¹ trends²
= 8 tokens input

You pay for input tokens (what you send) and output tokens (what the model replies). Output tokens typically cost 3–5× more than input. That's why a long, rambling prompt that generates a long response costs significantly more than a focused one.
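The arithmetic behind this is simple enough to sketch. The helper below is a minimal illustration, not the product's actual API; the price table uses the per-1M-token rates listed further down this page.

```python
# Illustrative sketch of per-call cost estimation.
# Prices are USD per 1M tokens, taken from the pricing tables on this page.
PRICES = {
    "claude-opus":  {"input": 15.00, "output": 75.00},
    "claude-haiku": {"input": 0.80,  "output": 4.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Blend input and output token counts into a dollar cost."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 1,500-token prompt with a 750-token reply on Opus:
# 1,500 × $15/1M + 750 × $75/1M = $0.07875
cost = estimate_cost("claude-opus", 1500, 750)
```

Note how the output side dominates: at 5× the per-token rate, the 750 reply tokens cost more than twice as much as the 1,500 prompt tokens.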

See how tokens add up

A developer asks an AI to review 200 lines of code. Here's the token breakdown.

📝

The prompt

"Review this authentication module for security issues. Here's the code…" + 200 lines of code pasted below
~1,500 input tokens
~750 output tokens

Same tokens, different efficiency

| Model | Tier | Relative efficiency |
| --- | --- | --- |
| Claude Opus | Top | 1× (baseline) |
| Claude Sonnet | Mid | 5× more efficient |
| Claude Haiku | Small | 19× more efficient |
| Gemini Flash | Small | 79× more efficient |

Same prompt, same quality output — the difference is which model runs it

The insight: Most teams send every prompt to their strongest model — even simple tasks that a smaller, faster model handles equally well. The router analyzes each prompt's complexity and risk, then picks the right-sized model automatically. You get the same quality output with dramatically better token efficiency.
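The Anthropic multiples above fall straight out of the published per-1M-token rates. A quick check, using the same 1,500-in / 750-out code-review call:

```python
# Reproduce the efficiency multiples for the 1,500-in / 750-out code review.
# Prices in USD per 1M tokens, from the pricing tables on this page.
def call_cost(in_price, out_price, in_tok=1500, out_tok=750):
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

opus   = call_cost(15.00, 75.00)   # $0.07875
sonnet = call_cost(3.00, 15.00)    # $0.01575
haiku  = call_cost(0.80, 4.00)     # $0.00420

print(round(opus / sonnet))  # 5  → "5× more efficient"
print(round(opus / haiku))   # 19 → "19× more efficient"
```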

More value from every token

Two levers: use fewer tokens per call, and route each call to the right-sized model.

🗜️

Smart compression

The multi-stage compression pipeline intelligently strips boilerplate, collapses redundancy, and removes irrelevant content — while protecting code blocks, tables, and important structure. Typical reduction: 20–40% fewer tokens per call.
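One stage of such a pipeline might look like the toy sketch below — the filler phrases and the fence-splitting trick are illustrative assumptions, not the optimizer's real rule set:

```python
import re

def compress(prompt: str) -> str:
    """Toy sketch of one compression stage: strip filler phrases and
    collapse runs of blank lines, leaving fenced code blocks untouched."""
    FILLER = re.compile(r"\b(please note that|as you can see|basically)\s*", re.I)
    # Splitting on a capturing group keeps the code fences in the result,
    # at the odd indices.
    parts = re.split(r"(```.*?```)", prompt, flags=re.S)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:          # protected code block — pass through verbatim
            out.append(part)
        else:
            part = FILLER.sub("", part)
            part = re.sub(r"\n{3,}", "\n\n", part)  # collapse blank-line runs
            out.append(part)
    return "".join(out)
```

A real pipeline would chain several such stages (boilerplate, redundancy, relevance), but the protect-then-transform shape stays the same.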

🧭

Intelligent routing

Not every prompt needs the most powerful model. The router classifies your task's complexity and risk, then picks the right-sized model for the job. A simple code review doesn't need Opus-level reasoning — Haiku handles it just as well, 19× more efficiently.

Before vs After — same code review task

Without optimizer

Model: Claude Opus (manual pick)
Input: 1,500 tokens (full context)
Output: 750 tokens
Tokens used: 2,250 on a top-tier model

With optimizer

Model: Claude Haiku (auto-routed)
Input: 1,050 tokens (30% compressed)
Output: 750 tokens
Tokens used: 1,800 on the right-sized model (20% fewer)
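The "20% fewer" figure is just the token totals from the two cards:

```python
before = 1500 + 750          # Opus, full context
after  = 1050 + 750          # Haiku, input compressed 30% (1,500 × 0.7)
savings = 1 - after / before
print(f"{savings:.0%}")      # 20%
```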

Providers & Models

The router picks the right model for your task — here's what each costs

Anthropic

Claude family — best for structured reasoning and XML compilation

| Model | Input | Output | Tier |
| --- | --- | --- | --- |
| Claude Haiku | $0.80 | $4.00 | Small |
| Claude Sonnet | $3.00 | $15.00 | Mid |
| Claude Opus | $15.00 | $75.00 | Top |

per 1M tokens

OpenAI

GPT family — broadest ecosystem and integration support

| Model | Input | Output | Tier |
| --- | --- | --- | --- |
| GPT-4o Mini | $0.15 | $0.60 | Small |
| GPT-4o | $2.50 | $10.00 | Mid |
| o1 | $15.00 | $60.00 | Top |

per 1M tokens

Google

Gemini family — competitive pricing with strong multimodal capabilities

| Model | Input | Output | Tier |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.10 | $0.40 | Small |
| Gemini 2.0 Pro | $1.25 | $5.00 | Mid/Top |

per 1M tokens

Perplexity

Search-grounded models — auto-selected for research-intent prompts

| Model | Input | Output | Tier |
| --- | --- | --- | --- |
| Sonar | $1.00 | $1.00 | Small |
| Sonar Pro | $3.00 | $15.00 | Mid/Top |

per 1M tokens

How Routing Works

A 4-step deterministic pipeline picks the right model for every prompt. Zero LLM calls — pure rules.

  1. Complexity + risk analysis determines a default tier. Simple factual tasks route to small, analytical tasks to mid, and multi-step reasoning or high-risk prompts to top.
  2. Budget and latency overrides shift the tier up or down. High budget sensitivity pushes toward cheaper models; low latency sensitivity allows larger models.
  3. Target preference selects the provider. Setting target to claude picks Anthropic, openai picks OpenAI, and generic picks the best value across all providers.
  4. Research intent detection auto-routes to Perplexity. A strict word-boundary regex identifies research, search, and citation prompts and recommends Perplexity's search-grounded models.
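The four steps above can be sketched as a single pure function. Everything here — the field names, the tier labels, the research keywords — is illustrative, not the router's actual implementation:

```python
import re

# Step 4's word-boundary regex (keyword list is an illustrative assumption).
RESEARCH = re.compile(r"\b(research|search|cite|citation|sources)\b", re.I)

TIERS = ["small", "mid", "top"]

def route(prompt: str, complexity: str, high_risk: bool,
          budget_sensitive: bool, latency_sensitive: bool,
          target: str = "generic") -> tuple[str, str]:
    # Step 1: complexity + risk picks a default tier.
    tier = {"simple": "small", "analytical": "mid", "multi_step": "top"}[complexity]
    if high_risk:
        tier = "top"
    # Step 2: budget and latency overrides shift the tier up or down.
    i = TIERS.index(tier)
    if budget_sensitive:
        i = max(i - 1, 0)          # push toward cheaper models
    if not latency_sensitive:
        i = min(i + 1, 2)          # slack latency allows larger models
    tier = TIERS[i]
    # Step 4 (checked first, since it overrides the target): research intent.
    if RESEARCH.search(prompt):
        return "perplexity", tier
    # Step 3: target preference selects the provider.
    provider = {"claude": "anthropic", "openai": "openai"}.get(target, "best_value")
    return provider, tier
```

Because it is a pure function of the prompt and its classified attributes, the same inputs always produce the same routing decision — which is what makes the pipeline deterministic and testable offline.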

Routing is shaped by 5 optimization profiles — frozen presets that bundle budget, latency, and quality preferences:

`cost_minimizer` · `balanced` · `quality_first` · `creative` · `enterprise_safe`
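"Frozen presets" suggests something like immutable records of the three preferences. A sketch under that assumption — the field names and values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: presets cannot be mutated after creation
class Profile:
    budget_sensitivity: str   # "low" | "medium" | "high"
    latency_sensitivity: str  # "low" | "medium" | "high"
    quality_floor: str        # minimum acceptable model tier

PROFILES = {
    "cost_minimizer":  Profile("high", "high", "small"),
    "balanced":        Profile("medium", "medium", "small"),
    "quality_first":   Profile("low", "low", "mid"),
    "creative":        Profile("low", "medium", "mid"),
    "enterprise_safe": Profile("low", "low", "top"),
}
```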

Output Formats

One prompt, three compilation targets. Pick the format that matches your LLM.

Claude XML

<role>Senior engineer</role>
<goal>Refactor auth module</goal>
<constraints>
  No breaking changes
  Keep under 200 lines
</constraints>

OpenAI

[SYSTEM]
You are a senior engineer.
Your goal is to refactor the
auth module.

[USER]
Refactor the auth module.
No breaking changes.
Keep under 200 lines.

Generic Markdown

## Role
Senior engineer

## Goal
Refactor auth module

## Constraints
- No breaking changes
- Keep under 200 lines
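A compiler for the three targets reduces to one function over a shared prompt spec. This is a hypothetical sketch mirroring the examples above, not the product's compiler:

```python
def compile_prompt(spec: dict, target: str) -> str:
    """Render one prompt spec as Claude XML, OpenAI messages, or Markdown."""
    role, goal, constraints = spec["role"], spec["goal"], spec["constraints"]
    if target == "claude":
        lines = [f"<role>{role}</role>", f"<goal>{goal}</goal>", "<constraints>"]
        lines += [f"  {c}" for c in constraints] + ["</constraints>"]
        return "\n".join(lines)
    if target == "openai":
        user = "\n".join([goal + "."] + [c + "." for c in constraints])
        return f"[SYSTEM]\nYou are a {role.lower()}.\n\n[USER]\n{user}"
    # generic markdown
    bullets = "\n".join(f"- {c}" for c in constraints)
    return f"## Role\n{role}\n\n## Goal\n{goal}\n\n## Constraints\n{bullets}"

spec = {"role": "Senior engineer", "goal": "Refactor auth module",
        "constraints": ["No breaking changes", "Keep under 200 lines"]}
```

The point of the shared spec is that switching providers is a one-argument change — the prompt's content never has to be rewritten by hand.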

Pricing You Can Trust

All pricing data reflects current provider rates as of February 2026. Every model’s cost estimate is verified against real pricing and tested for consistency.

Savings are calculated against a standard baseline so you can see exactly how much cheaper a smaller model would be — before you commit.

Model routing and cost estimation are always free (not metered). No API keys required.

Try the model router

Route your next prompt to the right model at the right price. Free to use, no account required.

Get started