March 22, 2026 · 7 min read

Agents Don't Need to Be Smart

The default instinct when building an AI agent is to reach for the most powerful model available. If Claude Opus exists, why would you use anything less? This instinct is understandable, but it is also the single most expensive mistake in agent design.

The truth is that the vast majority of agent tasks do not require frontier intelligence. They require reliable execution of well-defined operations. And for that, small models are not just adequate. They are better.

The "Bigger Is Better" Fallacy

Large models are extraordinary at complex reasoning, nuanced writing, and multi-step problem solving. But most agent tasks are none of those things. Most agent tasks look like this:

  - Classify this email as urgent or routine
  - Extract the order number and total from this document
  - Convert this API response into a one-line summary
  - Decide which of a handful of tools to call next

These are pattern matching and formatting tasks. A small, fast model handles them with the same accuracy as a large model, at a fraction of the cost and latency. Using Opus for email classification is like hiring an architect to hang a picture frame.

What Small Models Excel At

Small and mid-tier models (Haiku-class, GPT-4o-mini, Gemini Flash) are genuinely good at a wide range of practical tasks:

  - Classification and routing (triage, intent detection, relevance filtering)
  - Structured extraction (pulling fields out of documents into JSON)
  - Summarization of individual documents
  - Format conversion (HTML to Markdown, freeform text to structured data)
  - Tool selection when the tool set is small and well defined

For these tasks, small models run 5-10x faster and cost 10-30x less per token than frontier models. When your agent runs these operations hundreds of times per day, the savings compound significantly.
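To see how those savings compound, here is a back-of-the-envelope sketch. The call volume and the per-million-token prices are illustrative assumptions, not quoted rates:

```shell
# monthly_cost PRICE_PER_MTOK CALLS_PER_DAY TOKENS_PER_CALL
# Prints the approximate monthly spend in dollars (30-day month).
monthly_cost() {
  awk -v p="$1" -v c="$2" -v t="$3" \
    'BEGIN { printf "%.2f\n", c * t * 30 * p / 1e6 }'
}

# Assumed prices: $1/Mtok for a small model, $20/Mtok for a large one.
# An agent making 500 calls/day at ~2,000 tokens per call:
monthly_cost 1 500 2000    # → 30.00  (small model)
monthly_cost 20 500 2000   # → 600.00 (large model)
```

Swap in real prices and your own traffic numbers; the ratio, not the absolute figures, is the point.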

What Actually Requires Large Models

Reserve your most capable model for tasks that genuinely demand it:

  - Synthesizing multiple sources into a coherent narrative
  - Open-ended planning and multi-step reasoning
  - Nuanced writing where tone and judgment matter
  - Ambiguous decisions with no clear-cut right answer

In a typical agent workflow, these tasks represent about 10% of total operations.

The 90% Rule

90% of agent tasks can run on the cheapest model available. The remaining 10% justify the cost of a frontier model. The mistake is running everything on the expensive one.

This is not a theoretical estimate. Look at any agent workflow and categorize each step by the intelligence it actually requires. A research agent, for example, might have this breakdown:

  1. Fetch and parse web pages - no LLM needed, just HTTP and HTML parsing
  2. Extract key facts from each page - small model, extraction task
  3. Classify relevance to the research topic - small model, classification task
  4. Summarize each relevant source - small model, summarization task
  5. Synthesize findings into a coherent briefing - large model, complex synthesis

Only step 5 needs the frontier model. Steps 2-4 run perfectly well on a model that costs a fraction as much.
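The breakdown above can be sketched as a simple routing map. The step names and tier labels are illustrative, not a real API:

```shell
# model_for_step: maps a pipeline step to the cheapest adequate model tier.
model_for_step() {
  case "$1" in
    fetch)                      echo "none"  ;;  # plain HTTP + parsing, no LLM
    extract|classify|summarize) echo "small" ;;  # pattern-matching tasks
    synthesize)                 echo "large" ;;  # complex synthesis only
    *)                          echo "small" ;;  # default to cheapest viable
  esac
}

for step in fetch extract classify summarize synthesize; do
  printf '%-10s -> %s\n' "$step" "$(model_for_step "$step")"
done
```

Even this trivial dispatch captures the core idea: the expensive model is the exception, not the default.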

How to Test If a Smaller Model Works

Before committing to a model for an agent task, run a simple comparison:

  1. Prepare 20 representative inputs for the task your agent will perform
  2. Run them through both the large and small model with identical prompts
  3. Compare outputs side by side. For most structured tasks, the outputs will be functionally identical
  4. Check edge cases. If the small model fails on unusual inputs, you can route just those cases to the larger model

# Quick comparison script
for input in test_inputs/*.txt; do
  echo "=== $(basename $input) ==="
  echo "--- Haiku ---"
  cat "$input" | llm -m haiku "$PROMPT"
  echo "--- Opus ---"
  cat "$input" | llm -m opus "$PROMPT"
  echo ""
done

If the small model produces acceptable output on 18 of 20 test cases, it is the right choice. Handle the edge cases with a fallback to the larger model, or accept the occasional imperfection for the massive cost savings.
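One way to handle that fallback is to validate the small model's output and rerun only the failures on the large model. This is a sketch: the validator and the stand-in commands are placeholders you would replace with real model calls (e.g. the llm CLI used above):

```shell
# is_valid: example validator for a classification task that must
# return one of two known labels. Adapt to your task's output format.
is_valid() {
  case "$1" in urgent|routine) return 0 ;; *) return 1 ;; esac
}

# with_fallback CHEAP_CMD EXPENSIVE_CMD INPUT
# Runs the cheap command first; if its output fails validation,
# reruns the same input through the expensive command.
with_fallback() {
  cheap_cmd=$1; expensive_cmd=$2; input=$3
  out=$(printf '%s' "$input" | $cheap_cmd)
  if is_valid "$out"; then
    printf '%s\n' "$out"
  else
    printf '%s' "$input" | $expensive_cmd
  fi
}

# Hypothetical usage with real models:
#   with_fallback "llm -m haiku" "llm -m opus" "$email_text"
```

Most inputs never touch the expensive path, so the average cost per run stays close to the small model's price.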

Practical Model-Task Matching

Here is a reference table for common agent tasks and the model tier they actually need:

  Task                             Model tier
  -------------------------------  ----------
  Classification and routing       Small
  Structured data extraction       Small
  Single-document summarization    Small
  Format conversion                Small
  Multi-source synthesis           Large
  Open-ended planning              Large
  Nuanced or creative writing      Large

The pattern is clear: the more a task resembles pattern matching or format conversion, the less intelligence it needs. The more it resembles judgment, reasoning, or creative synthesis, the more it benefits from a larger model.
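That pattern can be turned directly into a dispatcher that picks the model alias for each task. The category names and the haiku/opus aliases follow the llm CLI convention used earlier and are illustrative:

```shell
# route_task: pick a model alias for a task category.
route_task() {
  case "$1" in
    classify|extract|summarize|format) echo "haiku" ;;  # pattern matching
    synthesize|plan|write)             echo "opus"  ;;  # judgment and synthesis
    *)                                 echo "haiku" ;;  # default small
  esac
}

# Hypothetical usage:
#   llm -m "$(route_task classify)" "$PROMPT" < input.txt
```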

Build your agents with model routing from the start. Default to the smallest viable model, and escalate only when the task demands it. Your cost per agent run will drop dramatically, your latency will improve, and the quality of output on the tasks that matter will stay exactly where it is.
