March 7, 2026 · 8 min read

Stop Using One Model for Everything

There is a default behavior in the AI space that costs people significant money and delivers worse results: using the same model for every task. Someone discovers Claude Opus or GPT-4o, it works well, and they route everything through it. Research tasks, formatting jobs, code review, data extraction, simple Q&A. All of it goes to the most expensive, most capable model available.

This is like hiring a senior engineer to update README files. It works, but it is a waste of capacity and budget. The better approach is model routing: matching each task to the cheapest model that can handle it reliably.

Why People Default to the Biggest Model

The reasons are understandable:

- The biggest model feels safest: if output quality matters, why risk a cheaper one?
- A single model means a single integration, a single prompt style, and a single set of quirks to learn.
- Per-request token costs look small, so the waste is easy to ignore.

These are reasonable instincts, but they do not hold up under examination. Most agent workloads are not uniformly hard. They are a mix of trivial tasks and complex ones, and the ratio is heavily skewed toward trivial.

The 75/20/5 Rule

After analyzing hundreds of agent task logs, a consistent pattern emerges. For a typical production agent workload:

- Roughly 75% of tasks are simple and structured: extraction, classification, formatting, template filling.
- Roughly 20% require moderate reasoning: code generation, analysis, comparison, debugging.
- Roughly 5% are genuinely hard: architecture decisions, large refactors, novel design work.

If you are sending 100% of tasks to an Opus-class model, you are overpaying by roughly 10x overall, and by up to 60x on the three-quarters of your workload that a Haiku-class model could handle.

What Each Tier Is Good At

Haiku-Class (Fast, Cheap)

These models are optimized for speed and cost. They handle structured, well-defined tasks with high reliability:

- Data extraction against a known schema
- Classification and labeling
- Reformatting and template filling
- Simple, factual Q&A

Cost: roughly $0.25 per million input tokens, $1.25 per million output tokens.

Sonnet-Class (Balanced)

The middle tier handles tasks that require more reasoning but are not pushing the frontier of model capability:

- Routine code generation and debugging
- Code review and comparing options
- Analysis of moderately complex data

Cost: roughly $3 per million input tokens, $15 per million output tokens.

Opus-Class (Maximum Capability)

Reserve the frontier model for tasks where quality genuinely matters and where smaller models demonstrably fall short:

- Architecture decisions with long-term consequences
- Large, cross-cutting refactors
- Novel design work with no established pattern to follow

Cost: roughly $15 per million input tokens, $75 per million output tokens.

Concrete Cost Savings

Let us work through a real example. Suppose your agent processes 1,000 tasks per day, averaging 2,000 input tokens and 500 output tokens per task.

All-Opus approach:
  Input:  1000 * 2000 * $15/1M  = $30.00/day
  Output: 1000 * 500  * $75/1M  = $37.50/day
  Total: $67.50/day = $2,025/month

Routed approach (75/20/5 split):
  Haiku:  750 tasks  → $0.38 + $0.47  = $0.85/day
  Sonnet: 200 tasks  → $1.20 + $1.50  = $2.70/day
  Opus:    50 tasks  → $1.50 + $1.88  = $3.38/day
  Total: $6.93/day = $207.90/month

That is a 90% cost reduction with the same output quality, because 95% of your tasks never needed Opus in the first place.
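The arithmetic above is easy to sanity-check in a few lines. This sketch assumes the per-million-token prices and the 75/20/5 split quoted in this post; real pricing varies by provider and over time.

```python
# Per-million-token prices as quoted above: (input $/M, output $/M).
PRICES = {
    "haiku":  (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus":   (15.00, 75.00),
}

def daily_cost(tasks, model, in_tokens=2000, out_tokens=500):
    """Dollar cost for `tasks` requests at the given average token counts."""
    in_rate, out_rate = PRICES[model]
    return tasks * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

all_opus = daily_cost(1000, "opus")
routed = daily_cost(750, "haiku") + daily_cost(200, "sonnet") + daily_cost(50, "opus")

# The table above rounds each line item up, so its total is a cent higher.
print(f"All-Opus: ${all_opus:.2f}/day   Routed: ${routed:.2f}/day")
print(f"Savings: {1 - routed / all_opus:.0%}")
```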

How to Implement Routing Logic

Routing does not need to be complicated. Start with a simple rule-based system:

def select_model(task):
    # Tier 1: Simple, structured tasks
    if task.type in ["extract", "classify", "format", "template"]:
        return "claude-haiku"

    # Tier 2: Moderate reasoning tasks
    if task.type in ["code_gen", "analyze", "compare", "debug"]:
        return "claude-sonnet"

    # Tier 3: Complex reasoning tasks
    if task.type in ["architect", "refactor", "novel_design"]:
        return "claude-opus"

    # Default to middle tier for unknown tasks
    return "claude-sonnet"

Start coarse and refine over time. Log the model used, the task type, and the output quality for each request. After a week of data, you will see clearly where you can downgrade models without affecting quality, and where you need to upgrade.
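One minimal way to do that logging, sketched in Python. The CSV file and field names here are illustrative, not a prescribed schema; `downgrade_candidates` flags task types on the pricier tiers whose pass rate suggests a cheaper model could take over.

```python
import csv
import time

LOG_PATH = "routing_log.csv"  # illustrative location

def log_request(model, task_type, passed_quality_check):
    """Append one routing decision and its quality outcome to the log."""
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), model, task_type, int(passed_quality_check)])

def downgrade_candidates(path=LOG_PATH, min_pass_rate=0.98):
    """(model, task_type) pairs above the cheapest tier that almost always pass."""
    stats = {}
    with open(path, newline="") as f:
        for _, model, task_type, passed in csv.reader(f):
            total, ok = stats.get((model, task_type), (0, 0))
            stats[(model, task_type)] = (total + 1, ok + int(passed))
    return [key for key, (total, ok) in stats.items()
            if key[0] != "claude-haiku" and ok / total >= min_pass_rate]
```

After a week of traffic, anything this returns is a candidate for routing one tier down.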

The Fallback Pattern

A more robust approach is to start with the cheapest model and escalate on failure:

  1. Send the task to Haiku
  2. Check the output against a quality heuristic (length, format, confidence score)
  3. If it passes, use it. If not, retry with Sonnet
  4. If Sonnet fails the check, escalate to Opus

This way you always end up on the cheapest model that passes your quality check. The overhead of occasional retries is far less than the cost of always using the most expensive model.
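The escalation loop above can be sketched in a few lines. Here `call_model` and `passes_check` are hypothetical stand-ins for your API client and quality heuristic, not real library functions:

```python
# Tiers in cost order, cheapest first.
TIERS = ["claude-haiku", "claude-sonnet", "claude-opus"]

def run_with_fallback(task, call_model, passes_check):
    """Try each tier in cost order; return the first output that passes the check."""
    for model in TIERS:
        output = call_model(model, task)
        if passes_check(output):
            return model, output
    # Even the top tier failed the heuristic: return its output anyway
    # and let the caller decide (e.g. flag for human review).
    return TIERS[-1], output
```

A `passes_check` heuristic can be as simple as "non-empty, parses as JSON, and under the length cap"; the steps listed above leave its definition to you.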

The question is not "which model is best?" It is "what is the cheapest model that is good enough for this specific task?" That reframing changes everything about how you build agent systems.

Master model selection with real benchmarks and routing rules.

Get the Model Selection Guide →