The model cheatsheet

Which AI is best for which job?

There's no single "best" model, only the best one for the task in front of you. Below, models are ranked per task from real benchmark, human-preference, and price data, not opinions.

Top three per job (provider in parentheses):

Reasoning: GPT-5.5 (xhigh) (OpenAI) 54.8 · Gemini 3.5 Flash (high) (Google) 50.2 · MiniMax-M3 (MiniMax) 44.4
Coding: GPT-5.5 (xhigh) (OpenAI) 74.9 · Gemini 3.5 Flash (high) (Google) 70.1 · MiMo-V2.5-Pro (Xiaomi) 60.2
Agents: GPT-5.5 (xhigh) (OpenAI) 44.9 · Gemini 3.5 Flash (high) (Google) 37.4 · MiniMax-M3 (MiniMax) 35.4
Chat & writing: claude-opus-4-6-thinking (Anthropic) 1,501 · gpt-5.4-mini-high (OpenAI) 1,499 · gemini-3.1-pro-preview (Google) 1,481
Speed: LFM2.5-1.2B-Instruct (Liquid AI) 488 tok/s · Granite 3.3 8B (Non-reasoning) (IBM) 339 tok/s · Nova Micro (Amazon) 329 tok/s
Long context: Llama 4 Scout 17b 128e Instruct Maas (Meta) 10M · Gemini Exp 1206 (Google) 2.1M · Grok 4 Fast Reasoning (xAI) 2M
Best value: Qwen3.5 4B (Non-reasoning) (Qwen) 266.7 pts/$ · MiMo-V2.5 (Xiaomi) 229.1 pts/$ · Step 3.5 Flash 2603 (StepFun) 173.3 pts/$
Budget: Llama 3.1 8b (Meta) $0.035/Mtok · Meta Llama 3.2 1B Instruct (Amazon) $0.05/Mtok · Mistral Small Latest (Mistral) $0.09/Mtok

The cheatsheet

Best models for each job

Pick the job, get the shortlist. Each list is ranked by the metric that actually matters for that task — and refreshes as new models land.

🧠

Reasoning & hard problems

Deep multi-step thinking, math, analysis

  1. GPT-5.5 (xhigh) 54.8
  2. Gemini 3.5 Flash (high) 50.2
  3. MiniMax-M3 44.4
  4. Claude Opus 4.6 (Adaptive Reasoning, Max Effort) 43.7
  5. Kimi K2.6 42.8

Ranked by Intelligence Index

💻

Writing & shipping code

Generating, refactoring, and fixing code

  1. GPT-5.5 (xhigh) 74.9
  2. Gemini 3.5 Flash (high) 70.1
  3. MiMo-V2.5-Pro 60.2
  4. MiniMax-M3 58.6
  5. Kimi K2.6 56.0

Ranked by Coding Index

🤖

Agents & tool use

Autonomous workflows that call tools

  1. GPT-5.5 (xhigh) 44.9
  2. Gemini 3.5 Flash (high) 37.4
  3. MiniMax-M3 35.4
  4. Kimi K2.6 30.3
  5. GLM-5.1 (Reasoning) 29.9

Ranked by Agentic Index

💬

General chat & writing

Everyday assistant, drafting, Q&A

  1. claude-opus-4-6-thinking 1,501
  2. gpt-5.4-mini-high 1,499
  3. gemini-3.1-pro-preview 1,481
  4. qwen3.7-max-preview 1,474
  5. muse-spark 1,472

Ranked by LMArena (human votes)

Real-time & low latency

Voice, autocomplete, anything live

  1. LFM2.5-1.2B-Instruct 488 tok/s
  2. Granite 3.3 8B (Non-reasoning) 339 tok/s
  3. Nova Micro 329 tok/s
  4. Gemini 3.1 Flash-Lite 296 tok/s
  5. Grok 4.20 0309 (Non-reasoning) 225 tok/s

Ranked by Output tokens/sec

📚

Huge documents & long context

Whole codebases, books, long transcripts

  1. Llama 4 Scout 17b 128e Instruct Maas 10M
  2. Gemini Exp 1206 2.1M
  3. Grok 4 Fast Reasoning 2M
  4. GPT 5.5 1.1M
  5. Databricks Gemini 2 5 Flash 1M

Ranked by Context window

💎

Best bang for the buck

The most intelligence per dollar

  1. Qwen3.5 4B (Non-reasoning) 266.7 pts/$
  2. MiMo-V2.5 229.1 pts/$
  3. Step 3.5 Flash 2603 173.3 pts/$
  4. gpt-oss-20B (high) 170.3 pts/$
  5. GLM-4.7-Flash (Reasoning) 150.2 pts/$

Ranked by Intelligence per $/Mtok
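As a rough illustration of this metric, here is a minimal sketch that assumes "pts/$" means the Intelligence Index score divided by the blended price in dollars per million tokens. The page does not spell out the formula, so treat that as an assumption. The model names and numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float      # index score in points (hypothetical values below)
    price_per_mtok: float    # blended price in $ per million tokens


def points_per_dollar(model: Model) -> float:
    """Assumed formula: index points divided by blended $/Mtok."""
    if model.price_per_mtok <= 0:
        raise ValueError(f"{model.name}: price must be positive")
    return model.intelligence / model.price_per_mtok


def rank_by_value(models: list[Model]) -> list[tuple[str, float]]:
    """Return (name, pts/$) pairs, best value first."""
    scored = [(m.name, round(points_per_dollar(m), 1)) for m in models]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog, not real benchmark or price data.
catalog = [
    Model("big-model", intelligence=50.0, price_per_mtok=5.00),
    Model("small-model", intelligence=20.0, price_per_mtok=0.10),
    Model("mid-model", intelligence=35.0, price_per_mtok=0.50),
]

value_ranking = rank_by_value(catalog)
for name, value in value_ranking:
    print(f"{name}: {value} pts/$")
```

Under this assumed formula, a small cheap model can outrank a much stronger one, which is consistent with the small models at the top of the list above.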

🪙

High-volume on a budget

Cheap, good-enough, at scale

  1. Llama 3.1 8b $0.035/Mtok
  2. Meta Llama 3.2 1B Instruct $0.05/Mtok
  3. Mistral Small Latest $0.09/Mtok
  4. GLM 4 32b 0414 128k $0.1/Mtok
  5. Command R7b 12 2024 $0.12/Mtok

Ranked by Lowest blended $/Mtok
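A "blended" price usually means a weighted average of the input and output token prices. The sketch below assumes a 3:1 input-to-output weighting. The page does not say which ratio the catalog uses, so the default weights and the example prices are assumptions.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in $ per million tokens.

    The 3:1 default weighting is an assumption, not the catalog's documented ratio.
    """
    total = input_weight + output_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_per_mtok + output_weight * output_per_mtok) / total


# Hypothetical prices: $0.02/Mtok for input, $0.08/Mtok for output.
example = blended_price(0.02, 0.08)
print(f"${example:.3f}/Mtok")
```

If your workload is output-heavy, passing different weights (for example `input_weight=1.0, output_weight=1.0`) gives a blend closer to what you would actually pay.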

Ranked from live data · updated 3 hours ago. Model & provider names are trademarks of their owners, shown here only to report public benchmark and price data.

No favorites

How the picks are made

📊

Benchmarks, not vibes

Reasoning, coding, agentic, speed and value come from independent Artificial Analysis indices. Chat is the LMArena leaderboard — millions of blind human votes. Context and budget come from the live price catalog.

🔁

Self-updating

Nothing is hand-picked. When a new model tops a benchmark or a price changes, the cheatsheet re-ranks itself on the next daily sync. No stale "best of 2024" lists.

🎯

One axis at a time

A model can win one job and lose another. We rank each category by the single metric that matters for it, so the shortlist is honest about trade-offs.
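The one-metric-per-category idea can be sketched as a lookup from category to a single field plus a sort direction. The field names, categories, and sample rows here are hypothetical, not the site's actual schema or data.

```python
# Each category ranks on exactly one field; True means higher is better.
CATEGORY_METRIC = {
    "reasoning": ("intelligence_index", True),
    "speed": ("tokens_per_sec", True),
    "budget": ("price_per_mtok", False),
}


def shortlist(models: list[dict], category: str, top_n: int = 5) -> list[str]:
    """Return up to top_n model names ranked by the category's single metric."""
    field, higher_is_better = CATEGORY_METRIC[category]
    # Skip models that have no value for this metric rather than guessing one.
    scored = [m for m in models if m.get(field) is not None]
    scored.sort(key=lambda m: m[field], reverse=higher_is_better)
    return [m["name"] for m in scored[:top_n]]


# Hypothetical rows, not real benchmark or price data.
models = [
    {"name": "A", "intelligence_index": 50.0, "tokens_per_sec": 80.0, "price_per_mtok": 4.00},
    {"name": "B", "intelligence_index": 30.0, "tokens_per_sec": 400.0, "price_per_mtok": 0.05},
    {"name": "C", "intelligence_index": 40.0, "tokens_per_sec": None, "price_per_mtok": 0.50},
]

for category in CATEGORY_METRIC:
    print(category, shortlist(models, category))
```

In this toy data, model A leads the reasoning list and comes last on budget, which is the kind of trade-off a per-category ranking is meant to expose.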

Quality & speed from Artificial Analysis; human preference from LMArena; prices from the MyTokenTracker catalog. See the full methodology.

Citation

Use this in your work

Open data, free to cite. Pair it with the price-vs-cost breakdown and the full State of AI.

Copy a citation

Free to use and cite under CC BY 4.0. See how this is measured.

APA

Champlin Enterprises. (2026). Which AI for which job — the model cheatsheet (MyTokenTracker) [Data set]. MyTokenTracker. Retrieved June 20, 2026, from https://mytokentracker.io/which-ai

BibTeX
@misc{mytokentracker-which-ai,
  title        = {Which AI for which job — the model cheatsheet (MyTokenTracker)},
  author       = {{Champlin Enterprises}},
  year         = {2026},
  howpublished = {MyTokenTracker, \url{https://mytokentracker.io/which-ai}},
  note         = {Accessed June 20, 2026. Licensed CC BY 4.0.},
  url          = {https://mytokentracker.io/which-ai}
}

Need a fixed point in time? Every day’s data is permanently archived in the open-data repository, so you can cite a specific date by linking that day’s committed file.
