What Are AI Tokens? A Deep Dive Into How Claude, ChatGPT, and Gemini Measure Usage

Every Word You Type Has a Price Tag

Every time you ask Claude to write a function, every time ChatGPT summarizes a document, every time Gemini answers a question — the AI isn't reading your words. It's reading tokens.

Tokens are the fundamental unit of AI. They determine how much you can say, how much the model can respond, how much context it can hold, and — if you're on a paid plan — how much it all costs. Yet most developers have only a vague sense of what tokens actually are.

This guide changes that. We're going deep on tokens: what they are at a technical level, how different AI providers tokenize text, what they cost across Claude, ChatGPT, and Gemini, and why tracking your token usage is one of the smartest things you can do as a developer in 2026.

What Is a Token, Exactly?

A token is a chunk of text that a language model processes as a single unit. It's not a word. It's not a character. It's somewhere in between — and the exact boundaries depend on the model's tokenizer.

Here's the simplest way to think about it:

Common short words like "the", "is", "and" are usually one token each
Longer words get split into pieces: "understanding" might become "under" + "standing" (2 tokens)
Uncommon or technical words get split further: "tokenization" might become "token" + "ization" (2 tokens), or "tok" + "en" + "ization" (3 tokens)
Spaces and punctuation count as tokens too (often attached to the following word)
Code has its own patterns: function is 1 token, but getElementById might be 3-4 tokens

The common rule of thumb: 1 token is roughly 4 characters or 0.75 words in English. So 1,000 tokens is approximately 750 words. A million tokens, the context window listed later in this guide for Claude Opus 4.6 and Sonnet 4.6, is roughly 750,000 words, or about 10 full-length novels at 75,000 words each.
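
As a quick illustration, the rule of thumb can be turned into a back-of-the-envelope estimator. This is only a sketch: the function name `estimate_tokens` is made up for this example, and the 4-characters and 0.75-words ratios are the rough English-language averages quoted above, not the output of any real tokenizer.

```python
def estimate_tokens(text: str) -> dict:
    """Rough token estimates for English text using the two rules of thumb."""
    chars = len(text)
    words = len(text.split())
    return {
        "by_chars": round(chars / 4),     # ~4 characters per token
        "by_words": round(words / 0.75),  # ~0.75 words per token
    }


sample = "Tokens are the fundamental unit of AI."
print(estimate_tokens(sample))
```

The two estimates will usually disagree a little, and both can be well off for code or non-English text. Treat them as a sanity check, not a bill.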

How Tokenization Works Under the Hood

Most modern language models use Byte Pair Encoding (BPE) or a variation of it. Here's a simplified version of how it works:

Start with individual characters. The tokenizer begins by treating every character as its own token.
Find the most common pairs. It scans a massive text corpus and finds which two adjacent tokens appear together most frequently. Maybe "t" + "h" appears billions of times.
Merge them. That pair becomes a new token: "th".
Repeat thousands of times. The process continues — "th" + "e" becomes "the", "in" + "g" becomes "ing", and so on — until the vocabulary reaches a target size (typically 50,000-100,000+ tokens).

The result is a vocabulary where common words and word fragments are single tokens, while rare words get broken into smaller pieces. This is why "the" is one token but "cryptocurrency" might be three.
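
The merge loop described above can be sketched in a few lines. This is a toy version under loud assumptions: it works on whole characters rather than bytes, trains on a hand-picked list of words instead of a massive corpus, and the names `train_bpe`, `merge_pair`, and `tokenize` are invented for this example. Production tokenizers add pre-tokenization, byte fallback, and far larger vocabularies.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Fuse every adjacent occurrence of `pair` inside one word."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merges from a tiny word list."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of symbols across the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        # Rewrite the corpus with the winning pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in training order, to a new word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["the"] * 5 + ["this"] * 2 + ["cat"]
merges = train_bpe(corpus, num_merges=2)
print(merges)
print(tokenize("the", merges), tokenize("this", merges), tokenize("cat", merges))
```

On this tiny corpus the first two merges are "t" + "h" and then "th" + "e", so "the" ends up as a single token while the rarer "cat" stays split into three characters. That is the same effect, in miniature, as common words becoming single tokens in a real vocabulary.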

Why This Matters for Developers

Code doesn't tokenize the same way as prose. Consider this Python snippet:

def calculate_total_price(items, tax_rate):
    return sum(item.price for item in items) * (1 + tax_rate)

To you, that's a two-line function. To a tokenizer, it might be 25-35 tokens depending on the model. Variable names with underscores are often split at or around the underscores. Symbols like parentheses, colons, and operators consume tokens as well, sometimes one each and sometimes merged with a neighbor. Indentation whitespace costs tokens too.

This is why token usage in coding assistants like Claude Code can be surprisingly high. You're not just paying for the code — you're paying for every syntactic detail the model needs to understand and generate.

Input Tokens vs. Output Tokens

Every AI interaction involves two types of tokens, and they're priced differently:

Input Tokens (What You Send)

Input tokens include everything the model reads before generating a response:

Your prompt or question
System instructions
Conversation history (every previous message in the thread)
Uploaded files or documents
Tool definitions (for function calling / tool use)
The model's own previous responses (when included in context)

This is a critical point that surprises many developers: conversation history is re-sent with every message. If your conversation is 20 messages deep, the next request includes all 20 previous messages as input tokens. Each request grows linearly with the length of the thread, so the total input across a conversation grows roughly quadratically: doubling the number of messages roughly quadruples the total input tokens.
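
A small simulation makes the effect of re-sending history concrete. It is a sketch under a simplifying assumption: every turn (your message plus the model's reply) adds the same number of tokens to the history, which real conversations will not do. The function name and the 1,000-tokens-per-turn figure are illustrative only.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> list[int]:
    """Input tokens sent on each turn when the full history is re-sent."""
    sent_per_turn = []
    history = 0
    for _ in range(turns):
        history += tokens_per_turn     # history now includes this turn
        sent_per_turn.append(history)  # the whole history is the input
    return sent_per_turn


per_turn = cumulative_input_tokens(turns=20, tokens_per_turn=1_000)
print(per_turn[-1])   # input size of the 20th request
print(sum(per_turn))  # total input tokens across all 20 requests
```

With these numbers the 20th request carries 20,000 input tokens, but the conversation as a whole has consumed 210,000. Running the same function with 40 turns gives 820,000, nearly four times as much for twice the messages.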

Output Tokens (What the AI Generates)

Output tokens are the model's response: the text it writes back to you. Output tokens are almost always more expensive than input tokens, from 4x to roughly 8x the input price for the models in the pricing tables in this guide. This makes sense: the model can process an entire prompt in parallel, but it has to generate its response one token at a time.

The Hidden Third Type: Thinking Tokens

Claude's extended thinking feature (and similar reasoning modes in other models) introduces a third category. When Claude "thinks" through a complex problem step by step, those internal reasoning tokens count toward your usage. On Claude's API, thinking tokens are billed at output token rates since the model is generating them — even though you might only see the final answer.

Context Windows: How Much the Model Can "Remember"

A model's context window is the maximum number of tokens it can process in a single conversation. Think of it as the model's working memory. Here's how the major providers compare in 2026:

Model	Context Window	Approx. Words	Max Output
Claude Opus 4.6	1,000,000 tokens	~750,000	128K tokens
Claude Sonnet 4.6	1,000,000 tokens	~750,000	64K tokens
Claude Haiku 4.5	200,000 tokens	~150,000	64K tokens
GPT-4.1	1,047,576 tokens	~785,000	32K tokens
GPT-4o	128,000 tokens	~96,000	16K tokens
Gemini 2.5 Pro	1,048,576 tokens	~786,000	64K tokens
Gemini 2.5 Flash	1,048,576 tokens	~786,000	64K tokens

Context windows have exploded in size over the past two years. Claude Opus 4.6's 1 million token context means it can hold an entire codebase — hundreds of files — in a single conversation. This is why tools like Claude Code can understand your full project structure, not just the file you're editing.

But bigger context windows mean more tokens consumed per message. If Claude is holding 500K tokens of context and you ask a question, those 500K tokens are all input tokens you're "spending" on that single exchange.

Token Pricing: What Every Provider Charges in 2026

Token pricing varies dramatically across providers and models. Here's a comprehensive comparison of what the major AI providers charge per million tokens (MTok) on their APIs:

Claude (Anthropic) — API Pricing

Model	Input (per MTok)	Output (per MTok)	Context Window
Claude Opus 4.6	$5.00	$25.00	1M tokens
Claude Sonnet 4.6	$3.00	$15.00	1M tokens
Claude Haiku 4.5	$1.00	$5.00	200K tokens

Claude also offers prompt caching that can cut costs significantly. Cache hits cost just 10% of standard input price ($0.50/MTok for Opus, $0.30/MTok for Sonnet). The Batch API gives a flat 50% discount on everything — Opus drops to $2.50/$12.50 per MTok for input/output.
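
To see how much these two discounts move the bill, here is a minimal sketch. It models only what the paragraph above states: cache hits at 10% of the input price and a flat 50% batch discount. It deliberately ignores anything else a provider might charge (for example, any fee for writing to the cache), and the function names and the 80% cache-hit share are assumptions made up for the example.

```python
def input_cost_with_cache(input_tokens: int, cached_fraction: float,
                          price_per_mtok: float,
                          cache_hit_multiplier: float = 0.10) -> float:
    """Input cost in USD when a fraction of the tokens are cache hits."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    return (fresh * price_per_mtok
            + cached * price_per_mtok * cache_hit_multiplier) / 1_000_000


def batch_cost(input_tokens: int, output_tokens: int,
               in_price: float, out_price: float,
               discount: float = 0.50) -> float:
    """Total cost in USD with a flat batch discount applied to everything."""
    standard = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return standard * (1 - discount)


# Hypothetical request on Opus ($5 input / $25 output per MTok):
# 150K input tokens, of which 80% are assumed to be cache hits.
uncached = input_cost_with_cache(150_000, 0.0, 5.00)
cached = input_cost_with_cache(150_000, 0.8, 5.00)
print(f"input cost without cache: ${uncached:.2f}, with cache: ${cached:.2f}")
print(f"batch total: ${batch_cost(150_000, 30_000, 5.00, 25.00):.2f}")
```

In this example the cached request pays full price on 30K tokens and one tenth on the other 120K, which is why caching helps most when a large, stable prefix (system prompt, tool definitions, files) is repeated across requests.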

ChatGPT / OpenAI — API Pricing

Model	Input (per MTok)	Output (per MTok)	Context Window
GPT-4.1	$2.00	$8.00	1M tokens
GPT-4.1 mini	$0.40	$1.60	1M tokens
GPT-4.1 nano	$0.10	$0.40	1M tokens
GPT-4o	$2.50	$10.00	128K tokens
o3	$2.00	$8.00	200K tokens
o4-mini	$1.10	$4.40	200K tokens

Gemini (Google) — API Pricing

Model	Input (per MTok)	Output (per MTok)	Context Window
Gemini 2.5 Pro	$1.25	$10.00	1M tokens
Gemini 2.5 Flash	$0.30	$2.50	1M tokens
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M tokens

Google also offers free tiers for many Gemini models with rate-limited access, making them attractive for experimentation.
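
The three tables above can be folded into a single lookup to compare what one workload would cost on each model. This is a sketch, not a pricing tool: the prices are copied from the tables in this guide and will go stale, the helper names are invented, and the 150K-input / 30K-output workload is an arbitrary example. It also assumes flat per-token pricing with no caching, batching, or tiered rates.

```python
# USD per million tokens as (input, output), copied from the tables above.
PRICES = {
    "Claude Opus 4.6": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 mini": (0.40, 1.60),
    "GPT-4.1 nano": (0.10, 0.40),
    "GPT-4o": (2.50, 10.00),
    "o3": (2.00, 8.00),
    "o4-mini": (1.10, 4.40),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one workload on one model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_models(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """All models sorted from cheapest to most expensive for this workload."""
    costs = {m: request_cost(m, input_tokens, output_tokens) for m in PRICES}
    return sorted(costs.items(), key=lambda item: item[1])


for model, cost in rank_models(150_000, 30_000):
    print(f"{model:<22} ${cost:.3f}")
```

Price alone says nothing about quality, so a ranking like this is only a starting point for deciding which model a task deserves.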

Putting Costs in Perspective

Let's make these numbers real. Say you're a developer using Claude Code with Opus 4.6 for a 30-minute coding session. A typical session might involve:

Input tokens: ~150,000 (your prompts + growing conversation context + file contents)
Output tokens: ~30,000 (Claude's code, explanations, and tool calls)

At API rates: (150K x $5/MTok) + (30K x $25/MTok) = $0.75 + $0.75 = $1.50 per session.

Do 10 sessions a day, 5 days a week? That's $75/week, or about $300 over a four-week month, at API rates. This is why Anthropic's subscription plans (Pro at $20/month, Max at $100-$200/month) can be such a good deal for individual developers: within their rate limits, they can cover far more usage than the same money would buy at API rates.
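
The arithmetic above is easy to reproduce, and the same formula gives a rough break-even point against a subscription. Treat this as a sketch: the session size is the illustrative 150K/30K figure from this section, the month is assumed to be four working weeks, and the break-even ignores the fact that subscriptions are rate-limited rather than unlimited. The function names are invented for the example.

```python
import math


def session_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 5.00, out_price: float = 25.00) -> float:
    """Cost in USD of one session at per-million-token API prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def break_even_sessions(subscription_price: float, cost_per_session: float) -> int:
    """Smallest number of sessions whose API cost reaches the subscription price."""
    return math.ceil(subscription_price / cost_per_session)


per_session = session_cost(150_000, 30_000)
weekly = per_session * 10 * 5  # 10 sessions a day, 5 days a week
monthly = weekly * 4           # assumes a four-week month

print(f"per session: ${per_session:.2f}, weekly: ${weekly:.2f}, monthly: ${monthly:.2f}")
for plan_price in (20, 100, 200):
    sessions = break_even_sessions(plan_price, per_session)
    print(f"${plan_price}/month plan breaks even after {sessions} sessions")
```

At $1.50 per session, a $20 plan is matched by 14 sessions in a month, well under two days of the usage pattern described above.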

Subscription Plans vs. API Tokens: Two Different Worlds

There's an important distinction that confuses many developers:

API Usage (Pay Per Token)

If you're building an application that calls the Claude, OpenAI, or Gemini API, you pay per token — exactly the prices listed above. Every input token and output token is metered and billed. This is transparent but can get expensive fast.
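
Because every API response reports its own token counts, keeping a running total takes very little code. The sketch below assumes each parsed response is a dictionary with a `usage` object containing `input_tokens` and `output_tokens`, which is the shape used by Anthropic's Messages API; other providers may name these fields differently. The `UsageTotals` class and the sample responses are hand-written for illustration, and no network call is made.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals of metered tokens across API responses."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, response: dict) -> None:
        # Missing fields count as zero so a malformed response cannot crash the tally.
        usage = response.get("usage", {})
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    def cost(self, in_price_per_mtok: float, out_price_per_mtok: float) -> float:
        """Total cost in USD at the given per-million-token prices."""
        return (self.input_tokens * in_price_per_mtok
                + self.output_tokens * out_price_per_mtok) / 1_000_000


# Hand-written stand-ins for parsed API responses.
responses = [
    {"usage": {"input_tokens": 1_200, "output_tokens": 300}},
    {"usage": {"input_tokens": 1_900, "output_tokens": 450}},
]

totals = UsageTotals()
for response in responses:
    totals.add(response)

print(totals)
print(f"${totals.cost(3.00, 15.00):.5f} at Sonnet 4.6 rates")
```

In a real application the same `add` call would sit right after each API request, so the totals always reflect what has actually been billed.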

Subscription Plans (Rate-Limited Access)

If you're using Claude Code, ChatGPT, or Gemini as a personal coding assistant through their consumer products, you pay a flat monthly subscription. Instead of per-token billing, you get rate limits — a budget of how much you can use within a time window before getting throttled.

Here's the catch: subscription providers don't tell you exactly how many tokens you're consuming. Anthropic shows a percentage bar. OpenAI shows a similar vague indicator. Google gives you a message count. None of them give you per-session token counts, per-project breakdowns, or cost estimates.

This is the core problem. You're paying $20-$200/month but flying blind on actual usage.

Why Tokens Add Up Faster Than You Think

There are several non-obvious reasons your token usage might be higher than expected:

1. Conversation Context Grows With Every Message

Remember: every message includes the full conversation history. Message 1 sends 500 tokens. Message 2 sends those 500 + your new question + the AI's first response — maybe 2,000 tokens total. By message 10, you might be sending 20,000+ tokens of context with every single message. This is the single biggest driver of token usage in long coding sessions.

2. System Prompts Are Invisible But Expensive

Claude Code, GitHub Copilot, and other AI coding tools include large system prompts that you never see. These prompts define the tool's behavior, capabilities, and constraints. Claude Code's system prompt alone can be thousands of tokens — and it's sent with every message.

3. Tool Definitions Add Up

When an AI model has access to tools (file reading, web search, code execution), each tool's definition (its name, description, and parameter schema) is included as input tokens. Claude Code exposes dozens of tools, and their definitions can add hundreds or even thousands of tokens to every request.

4. File Contents Are Tokens

When Claude Code reads a file in your project, the entire file content becomes input tokens. Read 10 files in a session? That could be 50,000+ tokens just from file contents. This is why coding assistants with large context windows burn through tokens so fast — they can hold more, so they read more.

5. Thinking Tokens Are Hidden Output

Extended thinking in Claude (and reasoning modes in other models) generates tokens you don't see in the final response. A complex debugging task might generate 5,000 tokens of visible response but 15,000 tokens of internal reasoning. You're billed for all 20,000.
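
The debugging example above works out as follows. This sketch assumes thinking tokens are billed at the same rate as visible output, as described earlier for Claude's API, and uses the Opus 4.6 output price of $25/MTok from the pricing table; the function name is made up for the example.

```python
def billed_output(visible_tokens: int, thinking_tokens: int,
                  out_price_per_mtok: float) -> tuple[int, float]:
    """Billed output tokens and their cost in USD, counting hidden reasoning."""
    billed = visible_tokens + thinking_tokens
    return billed, billed * out_price_per_mtok / 1_000_000


tokens, cost = billed_output(visible_tokens=5_000, thinking_tokens=15_000,
                             out_price_per_mtok=25.00)
visible_only_cost = 5_000 * 25.00 / 1_000_000

print(f"billed: {tokens} tokens, ${cost:.3f}")
print(f"visible response alone would be ${visible_only_cost:.3f}")
```

Here the hidden reasoning accounts for three quarters of the output bill, which is why a short final answer can still be an expensive one.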

Token Optimization Strategies

Once you understand how tokens work, you can be smarter about how you use them:

Start New Conversations for New Tasks

Don't use a single 50-message thread for everything. When you switch tasks, start a fresh conversation. This resets the context, so you stop re-sending (and paying for) history that has nothing to do with the new task.

Choose the Right Model for the Task

Not every task needs the most powerful model. Use Haiku or Sonnet for quick questions, code formatting, and simple generation. Save Opus for complex architecture decisions, multi-file refactoring, and deep debugging. The cost difference is 5x between Haiku and Opus.

Be Specific in Your Prompts

Vague prompts lead to verbose responses (more output tokens) and often require follow-up messages (more input tokens from the growing context). A specific, well-structured prompt gets you a better answer in fewer tokens.

Use Context Windows Wisely

Just because a model can hold 1M tokens doesn't mean you should fill it. Point Claude Code at specific files rather than letting it scan your entire project. Where your tool supports ignore rules or read permissions, use them to exclude directories the model doesn't need (build artifacts, node_modules, etc.).
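
To decide which parts of a project are worth keeping out of context, a rough per-file estimate helps. The sketch below walks a directory tree, skips a set of excluded directory names, and applies the 4-characters-per-token rule of thumb. Everything here is an assumption for illustration: the excluded names, the function name, and the throwaway demo project built in a temporary directory. It estimates sizes only and does not configure any tool.

```python
import os
import tempfile

# Hypothetical defaults; adjust to whatever your project generates.
EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build"}


def estimate_project_tokens(root: str, excluded=EXCLUDED_DIRS,
                            chars_per_token: int = 4) -> dict[str, int]:
    """Map each readable file (relative path) to a rough token estimate."""
    estimates = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    chars = len(handle.read())
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            estimates[os.path.relpath(path, root)] = chars // chars_per_token
    return estimates


# Demo on a throwaway project: one source file and one dependency file.
with tempfile.TemporaryDirectory() as project:
    os.makedirs(os.path.join(project, "src"))
    os.makedirs(os.path.join(project, "node_modules"))
    with open(os.path.join(project, "src", "app.py"), "w", encoding="utf-8") as f:
        f.write("x" * 400)
    with open(os.path.join(project, "node_modules", "lib.js"), "w", encoding="utf-8") as f:
        f.write("y" * 4000)
    estimates = estimate_project_tokens(project)

print(estimates)
print("total:", sum(estimates.values()))
```

In the demo only the source file is counted, because the dependency directory is pruned before it is ever read. Sorting the result by size on a real project quickly shows which files would dominate a session's input tokens.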

How to Actually Track Your Token Usage

If you're using the API, token counts are returned in every response. But if you're on a subscription plan — which most Claude Code users are — you're stuck with Anthropic's percentage bar.

That's why we built MyTokenTracker.

MyTokenTracker installs a lightweight hook into your Claude Code configuration that captures token usage metadata after every session. No code is collected. No conversation content is captured. Just the numbers that matter:

Input and output token counts per session
Model used (Opus, Sonnet, or Haiku)
Project context — which project directory the session was in
Estimated cost — what your session would cost at API rates
Historical trends — daily, weekly, and monthly usage patterns

Your dashboard shows you exactly where your tokens go. Which projects are expensive. Which models you lean on. Whether you're pacing to hit your rate limit or coasting with headroom.

Real-World Token Usage Examples

To give you a sense of scale, here's what typical Claude Code activities look like in tokens:

Activity	Typical Input Tokens	Typical Output Tokens	Est. API Cost
Quick bug fix (5 min)	~20,000	~5,000	$0.23
Feature implementation (30 min)	~150,000	~30,000	$1.50
Large refactor (1 hour)	~400,000	~80,000	$4.00
Architecture planning session	~250,000	~50,000	$2.50
Code review (medium PR)	~100,000	~15,000	$0.88
Full-day coding (8 hours)	~2,000,000	~400,000	$20.00

Estimates based on Claude Opus 4.6 API rates. Subscription plan costs vary.

Notice that a full day of intense Claude Code usage would cost $20 at API rates. That's your entire Pro subscription fee — in a single day. This is why the subscription model is a bargain for heavy users, and why Max plan users who code all day are getting incredible value.

The Future of Token Pricing

Token costs have dropped dramatically since GPT-3's launch. What cost $60/MTok in 2020 now costs $0.10/MTok with the cheapest models. The trend is clear: prices will keep falling as hardware improves and competition intensifies.

But usage is growing even faster. Larger context windows mean more tokens per conversation. Agentic workflows that chain multiple AI calls multiply token consumption. Multi-modal features (images, audio, video) introduce new token types with their own pricing.

The developers who thrive in this environment won't be the ones who spend the most — they'll be the ones who understand what they're spending and why.

Start Tracking Your Tokens Today

Tokens are the currency of AI development. You wouldn't run a business without looking at your bank statements. Don't run your AI workflow without looking at your token statements.

MyTokenTracker is free to sign up — just connect your GitHub account at mytokentracker.io and install the hook in under 60 seconds.

Use promo code EARLY-ACCESS to unlock a free lifetime PRO membership. Unlimited history, advanced analytics, per-project breakdowns, smart alerts, and self-learning cost algorithms — all free, forever. No credit card required.

Your tokens are adding up whether you track them or not. The difference is whether you know where they're going.