Model Selection Guide

Which AI model for which task

📘 Guide · Engineering, Strategy · v1.0

About This Guide

You're burning money on the wrong models. Every team running multi-model AI operations makes the same mistake: they default to the most capable model for everything, then wonder why their API bill is $300/month when it should be $40. This guide fixes that.

Claude vs GPT vs Gemini vs local — the real answer isn't "pick one." It's knowing which task goes where. Creative synthesis and deep reasoning go to Opus. Code generation and structured extraction go to Sonnet. Mechanical tasks like classification, formatting, and templating go to Haiku. The wrong assignment on a high-volume task costs you $50/day. The right one costs $2.
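The split described above can be written down as a small lookup table. This is a minimal sketch, not the guide's own config: the task-type keys and the `route` helper are hypothetical names, the tier labels are plain strings rather than real API model identifiers, and the fallback to the middle tier for unmapped tasks is an assumption.

```python
# Routing table taken from the paragraph above. Tier labels are plain
# strings for illustration, not real API model identifiers.
ROUTES = {
    "creative_synthesis": "opus",
    "deep_reasoning": "opus",
    "code_generation": "sonnet",
    "structured_extraction": "sonnet",
    "classification": "haiku",
    "formatting": "haiku",
    "templating": "haiku",
}

# Assumption: task types that are not in the table fall back to the middle tier.
DEFAULT_TIER = "sonnet"


def route(task_type: str) -> str:
    """Return the tier for a task type, or the default tier if unmapped."""
    return ROUTES.get(task_type.strip().lower(), DEFAULT_TIER)


if __name__ == "__main__":
    for task in ("classification", "deep_reasoning", "translation"):
        print(task, "->", route(task))
```

Keeping the mapping in data rather than in branching code means a tier change is a one-line edit.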

The guide covers cost comparison with real production numbers from running 8 agents daily. You'll see what a 1,000-call research briefing actually costs on each Claude tier, what GPT-4o vs GPT-4o-mini looks like at scale, and when Gemini Flash is the answer nobody's using. The numbers will surprise you.

One of the most overlooked optimizations: flat-rate CLI subscriptions vs per-token API billing. For certain workloads — long iterative coding sessions, multi-step research chains — a $20/month Claude Pro subscription routed through an MCP server can replace $150/month in API tokens. This guide shows you the calculation. It's not always obvious, but it's always worth knowing.
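The comparison comes down to simple arithmetic, sketched here under stated assumptions. The function names are hypothetical, the per-token prices passed to `monthly_api_cost` are whatever your provider charges (none are hard-coded), and the sketch ignores any usage limits a subscription plan may carry.

```python
def monthly_api_cost(calls_per_month: int,
                     input_tokens: int,
                     output_tokens: int,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Estimated per-token bill for one month, in dollars.

    Prices are dollars per million tokens; token counts are per call.
    """
    per_call = (input_tokens * input_price_per_mtok
                + output_tokens * output_price_per_mtok) / 1_000_000
    return calls_per_month * per_call


def flat_rate_savings(api_cost: float, flat_fee: float) -> float:
    """Monthly savings from the flat-rate plan; negative means the API is cheaper."""
    return api_cost - flat_fee


if __name__ == "__main__":
    # Figures from the paragraph above: $150/month in API tokens
    # versus a $20/month subscription.
    print(flat_rate_savings(150.0, 20.0))  # 130.0
```

A negative result is the signal to stay on per-token billing for that workload.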

Model routing by task type is the core framework. We've mapped 25 common agent task types to their optimal model — with cost per 1,000 calls and quality score for each. The decision tree for Opus vs Sonnet vs Haiku alone has saved production operations hundreds of dollars per month. Local models via Ollama get a full section: when latency matters more than quality, when you need air-gapped processing, and which models punch above their weight at the 7B and 13B parameter levels.
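One way to read a table of cost and quality per task type is "pick the cheapest model that clears a quality floor." That rule is an assumption made for this sketch, not necessarily how the guide defines "optimal," and every number below is a placeholder rather than measured data. `Option` and `pick_model` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Option(NamedTuple):
    model: str
    cost_per_1k_calls: float  # dollars, placeholder value
    quality: float            # 0-10 score, placeholder value


def pick_model(options: list[Option], min_quality: float) -> Optional[Option]:
    """Return the cheapest option whose quality meets the floor, or None."""
    eligible = [o for o in options if o.quality >= min_quality]
    return min(eligible, key=lambda o: o.cost_per_1k_calls, default=None)


# Placeholder numbers for one hypothetical task type, not measured data.
summarize = [
    Option("opus", 60.0, 9.5),
    Option("sonnet", 12.0, 9.0),
    Option("haiku", 2.0, 7.5),
]

if __name__ == "__main__":
    choice = pick_model(summarize, min_quality=8.5)
    print(choice.model if choice else "no model meets the floor")  # sonnet
```

Lowering the floor to 7.0 in this example would select the cheapest tier instead, which shows how much the result depends on the quality threshold chosen for each task.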

The included cost calculator is an actual spreadsheet with real formulas: plug in your task volumes and it outputs monthly cost across every major model option. No more guessing. If the guide saves you even a fraction of $50/day in wasted tokens, it pays for itself quickly.
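The same volume-to-cost calculation can be sketched in a few lines of code. This is an illustration of the idea, not a reproduction of the spreadsheet: the model names and the prices per 1,000 calls are placeholders to replace with your own figures, and a 30-day month is assumed.

```python
# Placeholder prices in dollars per 1,000 calls; replace with your own figures.
PRICE_PER_1K_CALLS = {"model_a": 40.0, "model_b": 8.0, "model_c": 1.0}


def monthly_cost(calls_per_day: dict[str, int],
                 price_per_1k: float,
                 days: int = 30) -> float:
    """Monthly cost in dollars if every task ran on a single model."""
    total_calls = sum(calls_per_day.values()) * days
    return total_calls / 1000 * price_per_1k


def cost_table(calls_per_day: dict[str, int],
               prices: dict[str, float] = PRICE_PER_1K_CALLS,
               days: int = 30) -> dict[str, float]:
    """Monthly cost per model for the given daily task volumes."""
    return {model: round(monthly_cost(calls_per_day, price, days), 2)
            for model, price in prices.items()}


if __name__ == "__main__":
    # Hypothetical daily volumes per task type.
    volumes = {"classification": 500, "summaries": 100}
    for model, cost in cost_table(volumes).items():
        print(f"{model}: ${cost:.2f}/month")
```

Pricing per 1,000 calls keeps the sketch simple; a per-token version would need average input and output sizes for each task type.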

🎁 What's Included

Model comparison matrix (Claude, GPT, Gemini, local)
Task-to-model routing guide (25 task types mapped)
Cost calculator spreadsheet with real formulas
Opus vs Sonnet vs Haiku decision tree
Local model guide (Ollama setup + model recommendations)
Flat-rate vs API cost comparison framework
Real cost examples from production operations
Model switching config templates (OpenClaw + n8n)
Performance benchmarks per task type

What Makes This Different

Built from 8-agent daily production data, not synthetic benchmarks or theory
Includes actual dollar amounts per task type, not just relative comparisons
Covers the flat-rate vs token math that most guides skip entirely
The local model section is practical: it names which Ollama models actually work for agent tasks
Config templates are drop-in ready — not pseudocode that you still have to implement

Core Capabilities

Model cost comparison matrix
Task-type routing framework
Opus decision criteria
Sonnet use cases defined
Haiku volume task patterns
GPT-4o vs GPT-4o-mini guide
Gemini Flash integration
Ollama local model setup
Cost calculator (spreadsheet)
Flat-rate vs API analysis
Model switching configs
Per-task benchmark data
Token budget strategies
Multi-model routing templates

Version History

v1.0

March 2026