API Tokens Explained: How They Work and Why They Cost What They Do
A beginner-friendly guide to AI API tokens — what they are, how input vs output tokens affect your bill, and practical ways to reduce token usage.
What are API tokens?
Tokens are the fundamental unit of billing for AI APIs. All the text you send to an AI model (the input) and all the text it returns (the output) is broken down into tokens, and you're charged based on the total count.
A token is roughly 3–4 characters of English text, or about 0.75 words. The sentence "Hello, how are you today?" is about 7 tokens. A typical 500-word blog post is roughly 670 tokens.
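The "3–4 characters per token" rule of thumb is easy to turn into a quick estimator. This is only a back-of-envelope heuristic — real tokenizers (such as OpenAI's `tiktoken` library) produce exact counts that vary by model and language:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, how are you today?"))  # 6 — the real tokenizer count is about 7
```

For billing-accurate numbers, always use the token counts your provider returns in the API response rather than an estimate.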
Input tokens vs output tokens
AI providers charge separately for input and output tokens, and output tokens are nearly always more expensive:
- Input tokens — Everything you send: system prompt, user message, context, examples. You control this directly.
- Output tokens — Everything the model generates. You can control this indirectly with `max_tokens` and prompt instructions like "be concise."
For most providers, output tokens cost 2–5x more than input tokens. With Claude 3 Opus, output tokens are 5x the input price ($75 vs $15 per 1M tokens). This means a chatty model response costs significantly more than the prompt that generated it.
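The asymmetry is easy to see by pricing the same token count in each direction. The per-1M rates below are the Claude 3 Opus figures quoted above — check your provider's current price sheet before relying on them:

```python
INPUT_PRICE = 15.00   # $ per 1M input tokens (Claude 3 Opus, as quoted above)
OUTPUT_PRICE = 75.00  # $ per 1M output tokens

tokens = 1_000
print(f"as input:  ${tokens / 1e6 * INPUT_PRICE:.4f}")   # as input:  $0.0150
print(f"as output: ${tokens / 1e6 * OUTPUT_PRICE:.4f}")  # as output: $0.0750
```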
How tokens affect your bill
Let's do the math for a typical scenario. Say you're running a customer support chatbot using GPT-4o:
- System prompt: 500 tokens (sent with every request)
- User message: ~100 tokens average
- Model response: ~300 tokens average
- Total per request: 600 input + 300 output tokens
At GPT-4o pricing ($2.50/1M input, $10.00/1M output), each request costs about $0.0045. Multiply by 10,000 requests per day, and you're spending $45/day or $1,350/month. The output tokens ($30/day) cost more than the input tokens ($15/day) despite being fewer in number.
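The arithmetic above is worth wiring into a small script so you can plug in your own numbers. This reproduces the chatbot example using the quoted GPT-4o rates and the example request volumes:

```python
INPUT_PRICE_PER_M = 2.50    # $ per 1M input tokens (GPT-4o, as quoted above)
OUTPUT_PRICE_PER_M = 10.00  # $ per 1M output tokens

input_tokens = 500 + 100    # system prompt + average user message
output_tokens = 300         # average model response
requests_per_day = 10_000

cost_per_request = (input_tokens * INPUT_PRICE_PER_M
                    + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
daily = cost_per_request * requests_per_day
print(f"per request: ${cost_per_request:.4f}")  # per request: $0.0045
print(f"per day:     ${daily:.2f}")             # per day:     $45.00
print(f"per month:   ${daily * 30:.2f}")        # per month:   $1350.00
```

Swap in your own prompt sizes and traffic to see how sensitive the monthly bill is to each variable.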
Hidden token costs to watch for
Several factors inflate your token count beyond what's obvious:
- System prompts — These are sent with every single request. A 2,000-token system prompt across 10,000 daily requests is 20M tokens/day in input alone.
- Conversation history — Chat applications that include previous messages grow token counts linearly. A 10-turn conversation sends all previous turns as input.
- Function calling / tool use — Tool definitions count as input tokens. If you define 20 tools with detailed descriptions, that could be 1,000+ tokens per request.
- JSON mode — Structured output with JSON mode tends to produce more tokens than plain text responses.
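The conversation-history effect compounds faster than people expect, because every turn re-sends all earlier turns as input. A quick sketch, using assumed average message sizes (not measured values):

```python
USER_MSG = 100  # assumed tokens per user message
REPLY = 300     # assumed tokens per model reply

total_input = 0
for turn in range(1, 11):
    history = (turn - 1) * (USER_MSG + REPLY)  # all previous turns, re-sent
    total_input += history + USER_MSG          # plus the new user message
print(total_input)  # 19000 — vs just 1000 if history were dropped entirely
```

Nineteen times the input tokens over ten turns, before the system prompt is even counted — which is why truncating history (below) is one of the highest-leverage fixes.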
Practical ways to reduce token usage
Now that you understand how tokens translate to costs, here are direct actions you can take:
- Trim your system prompt. Remove examples the model doesn't need. Test with shorter versions.
- Set `max_tokens` on every request. A hard cap prevents runaway responses.
- Truncate conversation history. Keep the last 5–10 messages, not the full history.
- Use prompt caching to avoid re-paying for static prefixes.
- Route simple tasks to cheaper models that use the same tokens but at a lower price.
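History truncation is simple enough to sketch in a few lines. This assumes the common `{"role", "content"}` chat-message convention; the function name and keep-count are illustrative, so adapt them to your SDK:

```python
def truncate_history(messages, keep_last=8):
    """Keep the system prompt (if first) plus the most recent messages."""
    if messages and messages[0].get("role") == "system":
        return [messages[0]] + messages[1:][-keep_last:]
    return messages[-keep_last:]

history = [{"role": "system", "content": "You are a support bot."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(20)]

trimmed = truncate_history(history)
print(len(trimmed))  # 9: system prompt + last 8 messages
```

Preserving the system prompt while dropping old turns keeps behavior consistent as the conversation grows.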
Track your token usage
Understanding tokens is the first step. The second is tracking them. MeterFox shows you daily input and output token counts by model, so you can see exactly where your tokens — and dollars — are going. Monitor your token trends weekly and you'll catch cost creep before it becomes a problem.
Start monitoring your API costs for free
Track spending across 15+ providers in one dashboard. No credit card required.
Get Started Free