The model cheatsheet
Which AI is best for which job?
There's no single "best" model — only the best one for the task in front of you. Here's what each is actually good at, ranked from real benchmark, human-preference, and price data. Not opinions.
The cheatsheet
Best models for each job
Pick the job, get the shortlist. Each list is ranked by the metric that actually matters for that task — and refreshes as new models land.
Reasoning & hard problems
Deep multi-step thinking, math, analysis
- 1 GPT-5.5 (xhigh) 54.8
- 2 Gemini 3.5 Flash (high) 50.2
- 3 MiniMax-M3 44.4
- 4 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) 43.7
- 5 Kimi K2.6 42.8
Ranked by Intelligence Index
Writing & shipping code
Generating, refactoring, and fixing code
- 1 GPT-5.5 (xhigh) 74.9
- 2 Gemini 3.5 Flash (high) 70.1
- 3 MiMo-V2.5-Pro 60.2
- 4 MiniMax-M3 58.6
- 5 Kimi K2.6 56.0
Ranked by Coding Index
Agents & tool use
Autonomous workflows that call tools
- 1 GPT-5.5 (xhigh) 44.9
- 2 Gemini 3.5 Flash (high) 37.4
- 3 MiniMax-M3 35.4
- 4 Kimi K2.6 30.3
- 5 GLM-5.1 (Reasoning) 29.9
Ranked by Agentic Index
General chat & writing
Everyday assistant, drafting, Q&A
- 1 claude-opus-4-6-thinking 1,501
- 2 gpt-5.4-mini-high 1,499
- 3 gemini-3.1-pro-preview 1,481
- 4 qwen3.7-max-preview 1,474
- 5 muse-spark 1,472
Ranked by LMArena (human votes)
Real-time & low latency
Voice, autocomplete, anything live
- 1 LFM2.5-1.2B-Instruct 488 tok/s
- 2 Granite 3.3 8B (Non-reasoning) 339 tok/s
- 3 Nova Micro 329 tok/s
- 4 Gemini 3.1 Flash-Lite 296 tok/s
- 5 Grok 4.20 0309 (Non-reasoning) 225 tok/s
Ranked by Output tokens/sec
Huge documents & long context
Whole codebases, books, long transcripts
- 1 Llama 4 Scout 17b 128e Instruct Maas 10M
- 2 Gemini Exp 1206 2.1M
- 3 Grok 4 Fast Reasoning 2M
- 4 GPT 5.5 1.1M
- 5 Databricks Gemini 2.5 Flash 1M
Ranked by Context window
Best bang for the buck
The most intelligence per dollar
- 1 Qwen3.5 4B (Non-reasoning) 266.7 pts/$
- 2 MiMo-V2.5 229.1 pts/$
- 3 Step 3.5 Flash 2603 173.3 pts/$
- 4 gpt-oss-20B (high) 170.3 pts/$
- 5 GLM-4.7-Flash (Reasoning) 150.2 pts/$
Ranked by Intelligence per $/Mtok
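The value score reads as index points divided by price. Here is a minimal sketch of that arithmetic, assuming the score is the Intelligence Index divided by the blended price in dollars per million tokens. The function name and the sample numbers are hypothetical and are not taken from the rankings above.

```python
def intelligence_per_dollar(index_score: float, blended_price_per_mtok: float) -> float:
    """Return index points per dollar of blended price (USD per million tokens).

    Assumption: value = index score / blended price. The page names the
    metric but does not spell out the exact formula.
    """
    if blended_price_per_mtok <= 0:
        raise ValueError("price must be positive")
    return index_score / blended_price_per_mtok


# Hypothetical model: 25 index points at $0.50 per million tokens.
print(intelligence_per_dollar(25.0, 0.5))  # 50.0
```

Under this reading, a small model with a modest score can outrank a frontier model simply because its price is a tiny fraction of the larger model's.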
High-volume on a budget
Cheap, good-enough, at scale
- 1 Llama 3.1 8b $0.035/Mtok
- 2 Meta Llama 3.2 1B Instruct $0.05/Mtok
- 3 Mistral Small Latest $0.09/Mtok
- 4 GLM 4 32b 0414 128k $0.1/Mtok
- 5 Command R7b 12 2024 $0.12/Mtok
Ranked by Lowest blended $/Mtok
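The page does not define how input and output prices are blended into one figure. One common convention is a weighted average that counts input tokens more heavily than output tokens. The sketch below assumes a 3:1 input-to-output weighting. That ratio, the function name, and the sample prices are assumptions for illustration, not the catalog's documented method.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in USD per million tokens.

    Assumption: a 3:1 input:output weighting. The real catalog may blend
    prices differently.
    """
    total_weight = input_weight + output_weight
    weighted_sum = input_per_mtok * input_weight + output_per_mtok * output_weight
    return weighted_sum / total_weight


# Hypothetical prices: $1.00 per Mtok input, $3.00 per Mtok output.
print(blended_price(1.0, 3.0))  # 1.5
```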
Ranked from live data · updated 3 hours ago. Model & provider names are trademarks of their owners, shown here only to report public benchmark and price data.
No favorites
How the picks are made
Benchmarks, not vibes
Reasoning, coding, agentic, speed, and value rankings come from independent Artificial Analysis indices. Chat rankings come from the LMArena leaderboard, which is built on millions of blind human votes. Context and budget rankings come from the live price catalog.
Self-updating
Nothing is hand-picked. When a new model tops a benchmark or a price changes, the cheatsheet re-ranks itself on the next daily sync. No stale "best of 2024" lists.
One axis at a time
A model can win one job and lose another. We rank each category by the single metric that matters for it, so the shortlist is honest about trade-offs.
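One-metric-per-category ranking can be sketched as a lookup from category to a single field plus a sort direction, since price ranks lowest-first while every other axis ranks highest-first. This is a minimal illustration, not the site's actual pipeline. The category keys, field names, and sample models are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each category maps to (metric field, higher_is_better).
# Field names are hypothetical placeholders.
CATEGORY_METRICS: Dict[str, Tuple[str, bool]] = {
    "reasoning": ("intelligence_index", True),
    "low_latency": ("output_tokens_per_sec", True),
    "budget": ("blended_price_per_mtok", False),  # cheaper is better
}


def shortlist(models: List[Dict[str, Optional[float]]], category: str,
              top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank models on the single metric tied to a category."""
    metric, higher_is_better = CATEGORY_METRICS[category]
    # Skip models that have no measurement for this metric.
    scored = [m for m in models if m.get(metric) is not None]
    scored.sort(key=lambda m: m[metric], reverse=higher_is_better)
    return [(m["name"], m[metric]) for m in scored[:top_n]]


# Made-up sample data, for illustration only.
models = [
    {"name": "model-a", "intelligence_index": 50.0,
     "output_tokens_per_sec": 80.0, "blended_price_per_mtok": 4.0},
    {"name": "model-b", "intelligence_index": 30.0,
     "output_tokens_per_sec": 300.0, "blended_price_per_mtok": 0.1},
    {"name": "model-c", "intelligence_index": 40.0,
     "output_tokens_per_sec": None, "blended_price_per_mtok": 1.0},
]

print(shortlist(models, "reasoning", top_n=2))  # [('model-a', 50.0), ('model-c', 40.0)]
print(shortlist(models, "budget", top_n=2))     # [('model-b', 0.1), ('model-c', 1.0)]
```

In this sample, model-a tops the reasoning list but does not make the budget top two, which is the trade-off the per-category shortlists are meant to expose.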
Quality & speed from Artificial Analysis; human preference from LMArena; prices from the MyTokenTracker catalog. See the full methodology.
Citation
Use this in your work
Open data, free to cite. Pair it with the price-vs-cost breakdown and the full State of AI.
Copy a citation
Free to use and cite under CC BY 4.0. See how this is measured.
Champlin Enterprises. (2026). Which AI for which job — the model cheatsheet (MyTokenTracker) [Data set]. MyTokenTracker. Retrieved June 20, 2026, from https://mytokentracker.io/which-ai
@misc{mytokentracker-which-ai,
title = {Which AI for which job — the model cheatsheet (MyTokenTracker)},
author = {{Champlin Enterprises}},
year = {2026},
howpublished = {MyTokenTracker, \url{https://mytokentracker.io/which-ai}},
note = {Accessed June 20, 2026. Licensed CC BY 4.0.},
url = {https://mytokentracker.io/which-ai}
}
Need a fixed point in time? Every day’s data is permanently archived in the open-data repository, so you can cite a specific date by linking that day’s committed file.