What are AI tokens?
Token limits, context windows, and why AI charges you by the token. Everything you need to know about the currency of AI.
4 min read
"GPT-4 has a 128k context window." "Claude can handle 200k tokens." "That'll cost you $0.01 per 1,000 tokens."
What even is a token?
Tokens are chunks of text
When you send text to an AI, it doesn't read words like you do. It breaks your text into tokens: small chunks that might be words, parts of words, or even single characters.
For English:
- "Hello" = 1 token
- "artificial" = 1 token
- "counterintuitive" = 3 tokens (counter + intuit + ive)
- "🎉" = 1 token (sometimes more)
Rule of thumb: 1 token ≈ 4 characters, or about 0.75 words.
So 1,000 tokens ≈ 750 words, roughly a page and a half of text.
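That rule of thumb is easy to turn into a quick estimator. This is only a ballpark for English text; real counts require the model's own tokenizer, so treat the function below as a sketch:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.
    Real models count differently; this is only a ballpark for English."""
    return max(1, round(len(text) / 4))

page = "word " * 750            # roughly 750 words of filler text
print(estimate_tokens(page))    # lands near the article's 1,000-token ballpark
```

Useful for sanity-checking a prompt's size before you send it, nothing more.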
Why tokens instead of words?
Words are messy. Different languages, compound words, slang, typos. The AI needs a consistent unit to work with.
Tokenization solves this by creating a vocabulary of common text chunks. The model learns which chunks appear together and what they mean.
English got lucky: common words are usually single tokens. But other languages might need more tokens per word. Code uses lots of symbols, so it's often token-expensive too.
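The idea of a "vocabulary of common chunks" can be shown with a toy tokenizer. The vocabulary below is made up, and the greedy longest-match loop is a simplification (real models use byte-pair encoding with roughly 100k learned chunks), but the mechanics are the same:

```python
# Toy vocabulary of "common chunks" (real tokenizers learn ~100k of these)
VOCAB = {"counter", "intuit", "ive", "hello", "art", "ificial", " "}

def toy_tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens, i = [], 0
    text = text.lower()
    while i < len(text):
        # Try the longest chunk starting at position i that is in the vocab
        for length in range(len(text) - i, 0, -1):
            chunk = text[i:i + length]
            if chunk in VOCAB:
                tokens.append(chunk)
                i += length
                break
        else:
            # Unknown character: fall back to a single-character token
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("counterintuitive"))  # ['counter', 'intuit', 'ive']
```

Words the vocabulary knows well come out as one or two tokens; rare words shatter into many, which is exactly why some languages and most code cost more tokens.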
What's the context window?
The context window (or context limit) is how much text the AI can "see" at once. It includes:
- Your message
- The AI's response
- Any previous conversation
- System instructions
If GPT-4 has a 128k token context window, that's roughly 100,000 words. Sounds huge, but it fills up fast in a long conversation.
When you hit the limit, something has to go. Usually the oldest messages get dropped.
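"Oldest messages get dropped" is simple enough to sketch. The function below is illustrative, not how any particular API does it, and it reuses the ~4 chars/token estimate rather than a real tokenizer:

```python
def trim_to_window(messages: list[str], limit: int) -> list[str]:
    """Drop the oldest messages until the estimated token total fits.
    Uses the ~4 chars/token rule of thumb; real APIs count exactly."""
    def est(msg: str) -> int:
        return max(1, len(msg) // 4)

    while messages and sum(est(m) for m in messages) > limit:
        messages = messages[1:]   # the oldest message is forgotten first
    return messages

history = ["hi there", "x" * 400, "latest question?"]
print(trim_to_window(history, limit=105))  # the long message pushed "hi there" out
```

This is why a chatbot can "forget" how the conversation started: the opening messages literally fell out of the window.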
Why does context size matter?
Bigger context = the AI remembers more.
Small context (4k-8k tokens):
- Forgets earlier conversation
- Can't handle long documents
- Loses track of complex tasks
Large context (100k-200k tokens):
- Can read entire books
- Remembers full conversation history
- Handles complex, multi-step work
But bigger isn't always better. More context = slower responses and higher costs.
How token pricing works
AI companies charge per token. You pay for:
- Input tokens: what you send (your messages, documents, context)
- Output tokens: what the AI generates (usually priced higher)
Example pricing (varies by model):
- Input: $0.01 per 1,000 tokens
- Output: $0.03 per 1,000 tokens
A typical chat message might be 50-100 tokens. A detailed response might be 500-1,000 tokens. At these prices, casual use costs pennies. Heavy use adds up.
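The arithmetic is worth seeing once. Using the example rates above (real prices vary by model and change often):

```python
def chat_cost(input_tokens: int, output_tokens: int,
              in_rate: float = 0.01, out_rate: float = 0.03) -> float:
    """Cost in dollars at the example rates above (per 1,000 tokens)."""
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# A 100-token message that gets a 1,000-token response:
print(f"${chat_cost(100, 1000):.3f}")  # $0.031
```

Notice the output side dominates: the response cost 30x more than the question. That asymmetry is why asking for concise answers saves real money at scale.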
Managing your tokens
Cut the fluff. "Please kindly help me with the following request if you don't mind" uses way more tokens than "Help me with this."
Summarize context. Instead of keeping entire conversation history, periodically summarize what's been discussed.
Be specific. Vague prompts get long responses. Specific prompts get focused answers.
Use system messages wisely. They're included in every request. Keep them concise.
Choose the right model. Need quick answers? Use a smaller, cheaper model. Complex reasoning? Pay for the big one.
Token limits vs rate limits
These are different:
- Token limit: how much text fits in a single request (the context window)
- Rate limit: how many requests you can make per minute or hour
Hit a token limit? Your message is too long. Hit a rate limit? You're sending requests too fast.
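A client-side sketch makes the difference concrete. The limits below are invented for illustration, and real APIs enforce both checks server-side:

```python
from collections import deque

MAX_TOKENS_PER_REQUEST = 8_000    # token limit: per request (context window)
MAX_REQUESTS_PER_MINUTE = 60      # rate limit: per client, per minute

recent_requests: deque[float] = deque()  # timestamps of recent requests

def check_limits(prompt_tokens: int, now: float) -> str:
    """Return which limit (if any) a request would hit. Illustrative only."""
    if prompt_tokens > MAX_TOKENS_PER_REQUEST:
        return "token limit: shorten or summarize your prompt"
    while recent_requests and now - recent_requests[0] > 60:
        recent_requests.popleft()          # forget requests older than a minute
    if len(recent_requests) >= MAX_REQUESTS_PER_MINUTE:
        return "rate limit: wait and retry"
    recent_requests.append(now)
    return "ok"
```

One limit is about the size of a single request; the other is about how often you knock on the door.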
The hidden cost: thinking tokens
Some models (like Claude with "extended thinking") use extra tokens for reasoning before responding. You pay for these thinking tokens too, even though you might not see them.
It's like paying for the AI's scratch paper.
Why tokens feel limiting
Tokens are a compromise. A model processes the whole context at once, comparing every token against every other token (not word by word like humans read). More tokens = more computation = more cost.
The context window is essentially the AI's working memory. Like RAM in a computer, it's finite and expensive to expand.
Companies are racing to increase context windows. But until compute gets cheaper, tokens will remain the currency of AI.
Tokens are how AI sees text. Understanding them helps you use AI more effectively and economically. Next: Why can't you run ChatGPT on your laptop?