Stop Using One Model for Everything
There is a default behavior in the AI space that costs people significant money and delivers worse results: using the same model for every task. Someone discovers Claude Opus or GPT-4o, it works well, and they route everything through it. Research tasks, formatting jobs, code review, data extraction, simple Q&A. All of it goes to the most expensive, most capable model available.
This is like hiring a senior engineer to update README files. It works, but it is a waste of capacity and budget. The better approach is model routing: matching each task to the cheapest model that can handle it reliably.
Why People Default to the Biggest Model
The reasons are understandable:
- It is simpler. One API key, one model name, one set of quirks to learn. Adding routing logic means more code and more decisions.
- Quality anxiety. People worry that a smaller model will produce bad output, so they avoid the risk by always using the best.
- Lack of benchmarks. Without testing models against your specific tasks, you cannot know where the quality threshold actually is. So you default to overkill.
These are reasonable instincts, but they do not hold up under examination. Most agent workloads are not uniformly hard. They are a mix of trivial tasks and complex ones, and the ratio is heavily skewed toward trivial.
The 75/20/5 Rule
Analysis of hundreds of agent task logs shows a consistent pattern. For a typical production agent workload:
- 75% of tasks can be handled by a fast, cheap model (Haiku-class)
- 20% of tasks need a mid-tier model (Sonnet-class)
- 5% of tasks require a frontier model (Opus-class)
If you are sending 100% of tasks to an Opus-class model, you are paying roughly 60x Haiku-class prices for the three-quarters of your workload that does not need frontier capability. As the worked example below shows, that adds up to roughly a 10x overcharge on your total bill.
What Each Tier Is Good At
Haiku-Class (Fast, Cheap)
These models are optimized for speed and cost. They handle structured, well-defined tasks with high reliability:
- Extracting specific fields from text (names, dates, amounts)
- Classifying inputs into predefined categories
- Reformatting data (JSON to CSV, markdown to HTML)
- Simple text generation from templates
- Yes/no decisions with clear criteria
- Summarizing short documents (under 2,000 words)
Cost: roughly $0.25 per million input tokens, $1.25 per million output tokens.
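To make this concrete, here is a minimal sketch of a field-extraction call using the Anthropic Python SDK. The function name is illustrative, and "claude-haiku" is a placeholder matching the tier labels used in the routing example later in this piece; substitute the actual Haiku-class model ID you run.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def extract_invoice_fields(text: str) -> str:
    # A structured, well-defined extraction task: typical Haiku-class work.
    response = client.messages.create(
        model="claude-haiku",  # placeholder; use your actual Haiku-class model ID
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": "Extract the vendor name, invoice date, and total amount "
                       "from this invoice as JSON:\n\n" + text,
        }],
    )
    return response.content[0].text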
Sonnet-Class (Balanced)
The middle tier handles tasks that require more reasoning but are not pushing the frontier of model capability:
- Code generation for well-defined functions
- Multi-step analysis with 2-3 reasoning steps
- Writing that needs to match a specific style or tone
- Comparing multiple options and making recommendations
- Debugging straightforward issues
- Summarizing long or complex documents
Cost: roughly $3 per million input tokens, $15 per million output tokens.
Opus-Class (Maximum Capability)
Reserve the frontier model for tasks where quality genuinely matters and where smaller models demonstrably fall short:
- Complex architectural decisions involving multiple tradeoffs
- Novel code design where there is no obvious pattern to follow
- Multi-file refactoring that requires understanding system-wide implications
- Nuanced writing that requires deep understanding of context
- Tasks where a wrong answer has significant consequences
Cost: roughly $15 per million input tokens, $75 per million output tokens.
Concrete Cost Savings
Let us work through a concrete example. Suppose your agent processes 1,000 tasks per day, averaging 2,000 input tokens and 500 output tokens per task.
All-Opus approach:
    Input:  1000 * 2000 * $15/1M = $30.00/day
    Output: 1000 * 500 * $75/1M = $37.50/day
    Total:  $67.50/day = $2,025/month
Routed approach (75/20/5 split):
    Haiku:  750 tasks → $0.38 + $0.47 = $0.85/day
    Sonnet: 200 tasks → $1.20 + $1.50 = $2.70/day
    Opus:   50 tasks → $1.50 + $1.88 = $3.38/day
    Total:  $6.93/day = $207.90/month
That is a 90% cost reduction with the same output quality, because 95% of your tasks never needed Opus in the first place.
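If you want to check the arithmetic or plug in your own volumes, a short script like the following reproduces the numbers above, using the per-token prices and the 75/20/5 split stated earlier and assuming 30-day months. Small differences from the figures above are rounding.

# Per-million-token prices (input, output) for each tier, as quoted above.
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

TASKS_PER_DAY = 1000
INPUT_TOKENS, OUTPUT_TOKENS = 2000, 500   # average per task
SPLIT = {"haiku": 0.75, "sonnet": 0.20, "opus": 0.05}

def daily_cost(tier: str, tasks: float) -> float:
    # Cost in dollars for `tasks` requests routed to the given tier.
    in_price, out_price = PRICES[tier]
    return tasks * (INPUT_TOKENS * in_price + OUTPUT_TOKENS * out_price) / 1_000_000

all_opus = daily_cost("opus", TASKS_PER_DAY)
routed = sum(daily_cost(tier, TASKS_PER_DAY * share) for tier, share in SPLIT.items())

print(f"All-Opus: ${all_opus:.2f}/day, ${all_opus * 30:.2f}/month")
print(f"Routed:   ${routed:.2f}/day, ${routed * 30:.2f}/month")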
How to Implement Routing Logic
Routing does not need to be complicated. Start with a simple rule-based system:
from dataclasses import dataclass

@dataclass
class Task:
    # Minimal task representation for illustration; only the field routing needs.
    type: str

def select_model(task: Task) -> str:
    # Tier 1: simple, structured tasks
    if task.type in ["extract", "classify", "format", "template"]:
        return "claude-haiku"
    # Tier 2: moderate reasoning tasks
    if task.type in ["code_gen", "analyze", "compare", "debug"]:
        return "claude-sonnet"
    # Tier 3: complex reasoning tasks
    if task.type in ["architect", "refactor", "novel_design"]:
        return "claude-opus"
    # Default to the middle tier for unknown task types
    return "claude-sonnet"
Start coarse and refine over time. Log the model used, the task type, and the output quality for each request. After a week of data, you will see clearly where you can downgrade models without affecting quality, and where you need to upgrade.
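One lightweight way to collect that data is to append a record per request, as in the sketch below. The CSV file name is an arbitrary choice, and the quality value is whatever your heuristic or human rating produces; adapt both to your setup.

import csv
import time
from pathlib import Path

LOG_PATH = Path("routing_log.csv")  # illustrative location

def log_request(task_type: str, model: str, quality: float) -> None:
    # Append one row per request; after a week, group by (task_type, model)
    # to see where you can downgrade models and where you need to upgrade.
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "task_type", "model", "quality"])
        writer.writerow([time.time(), task_type, model, quality])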
The Fallback Pattern
A more robust approach is to start with the cheapest model and escalate on failure:
- Send the task to Haiku
- Check the output against a quality heuristic (length, format, confidence score)
- If it passes, use it. If not, retry with Sonnet
- If Sonnet fails the check, escalate to Opus
This guarantees you always use the cheapest model that produces acceptable output. The overhead of occasional retries is far less than the cost of always using the most expensive model.
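A minimal sketch of that escalation loop is below, assuming the Anthropic Python SDK and the placeholder model names from the routing example. The quality check shown is a trivial stand-in; replace it with whatever length, format, or confidence heuristic fits your tasks.

import anthropic

client = anthropic.Anthropic()
ESCALATION_ORDER = ["claude-haiku", "claude-sonnet", "claude-opus"]  # placeholder model IDs

def passes_quality_check(output: str) -> bool:
    # Stand-in heuristic: non-empty and not suspiciously short.
    return len(output.strip()) > 50

def run_with_fallback(prompt: str) -> tuple[str, str]:
    # Try the cheapest tier first; escalate only when the output fails the check.
    output = ""
    for model in ESCALATION_ORDER:
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        output = response.content[0].text
        if passes_quality_check(output):
            return model, output
    # Even the top tier failed the heuristic; return its output and flag for review.
    return ESCALATION_ORDER[-1], output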
The question is not "which model is best?" It is "what is the cheapest model that is good enough for this specific task?" That reframing changes everything about how you build agent systems.
Master model selection with real benchmarks and routing rules.
Get the Model Selection Guide →