LLM Token Counter Calculator
Count tokens for GPT-4, Claude, Llama, and other LLMs before sending requests. Estimate API costs and stay within context window limits.
Tokens are the currency of large language models — and if you're building anything with LLM APIs, you're paying per token whether you realize it or not. Understanding token counts helps you write more cost-efficient prompts, avoid context window overflows, and budget your API spend accurately.
Tokens Are Not Words
This trips up a lot of people. A token is roughly 0.75 words in English, but that ratio shifts depending on content type:
| Content Type | Chars per Token | Example |
|---|---|---|
| Plain English prose | ~4 chars | "the quick brown fox" ≈ 4–5 tokens |
| Code (Python/JS) | ~3.5 chars | Identifiers and symbols split more |
| JSON/structured data | ~3 chars | {"key": "value"} ≈ 6–7 tokens |
| Non-Latin scripts (Chinese, Arabic) | ~1–2 chars | A single character is often 1–3 tokens |
| Whitespace-heavy text | ~5 chars | Runs of spaces often collapse into single tokens |
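The ratios above can be turned into a quick offline estimate. A minimal sketch — the chars-per-token values are the rough figures from the table, not exact tokenizer output:

```python
# Rough token estimate from character count, using the approximate
# chars-per-token ratios above. For exact counts, use the model's tokenizer.
CHARS_PER_TOKEN = {
    "prose": 4.0,
    "code": 3.5,
    "json": 3.0,
    "non_latin": 1.5,
}

def estimate_tokens(text: str, content_type: str = "prose") -> int:
    ratio = CHARS_PER_TOKEN.get(content_type, 4.0)
    return max(1, round(len(text) / ratio))
```

This is only a planning heuristic; actual counts vary by tokenizer, which is why the calculator runs the real tokenizer for your selected model.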
How to Use It
Paste your text — system prompt, user message, conversation history, whatever — into the input box. Select your target model. The calculator returns:
- Exact token count using that model's tokenizer
- Estimated cost at current API rates (input and output separately)
- A color-coded bar showing how much of the context window you're using
- Per-sentence breakdown if you want to find verbose sections
- Chat-template overhead for the selected model ([INST] tags, role markers, and other scaffolding that eat tokens before your actual content)
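Template overhead is easy to underestimate. A sketch of a Llama-2-style wrapper (the template string here is illustrative — the exact format varies by model) shows how much scaffolding is added before any content:

```python
def wrap_llama2_chat(system: str, user: str) -> str:
    # Llama-2-style chat template (illustrative; check your model's real
    # template, e.g. via the HF tokenizer's apply_chat_template method).
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = wrap_llama2_chat("Be brief.", "What is a token?")
# Characters the template adds on top of your actual content:
overhead = len(prompt) - len("Be brief.") - len("What is a token?")
```

That overhead is paid on every single request, which is why the calculator accounts for it per model.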
Real Scenario: A RAG Prompt Getting Expensive
A retrieval-augmented generation setup might look like:
- System prompt: 500 tokens
- Retrieved context chunks (3 × 800 words): ~3,200 tokens
- User question: 50 tokens
- Total input: ~3,750 tokens
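At hypothetical rates — say $3 per million input tokens and $15 per million output tokens; substitute your provider's actual pricing — the per-request arithmetic is straightforward:

```python
# Per-request cost estimate. The rates are illustrative placeholders,
# not any provider's actual pricing.
INPUT_RATE_PER_M = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 15.00  # USD per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# The RAG prompt above: ~3,750 input tokens plus, say, a 500-token answer
cost = request_cost(3_750, 500)  # ≈ $0.019 per request
```

Two cents per request sounds trivial until you multiply by thousands of queries a day — which is exactly why compressing the retrieved context (below) pays off.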
Tips for Reducing Token Usage
Be concise in system prompts. "You are a helpful assistant that answers questions accurately and briefly." is about 15 tokens. An elaborate 500-word persona description is roughly 650 tokens and often doesn't improve quality.

Compress retrieved context. Instead of pasting raw document chunks, summarize or extract only the relevant sentences. This can cut retrieval token cost by 60–80%.

Watch JSON overhead. If you're having the model return structured output, verbose JSON keys multiply fast: {"response": "yes"} costs roughly twice the tokens of {"r": 1}. Consider compressed schemas in high-volume applications.
Conversation history grows. In a chatbot, each turn includes all previous messages. After 10 exchanges, you might be sending 4,000 tokens of history with every new question. Implement rolling window truncation or periodic summarization.
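A rolling window is simple to sketch: pin the system prompt and keep only the most recent turns that fit a token budget. This assumes messages carry a precomputed "tokens" field (from any counter) alongside "role":

```python
def truncate_history(messages, budget: int):
    """Keep the system message plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(m["tokens"] for m in system)
    kept = []
    for m in reversed(turns):           # walk newest-first
        if used + m["tokens"] > budget:
            break                       # oldest turns fall off the window
        kept.append(m)
        used += m["tokens"]
    return system + list(reversed(kept))
```

Periodic summarization is the other option: replace the dropped turns with a one-message summary instead of discarding them outright.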
Does token count affect response quality?
Not directly, but context window limits matter. If your prompt is 120,000 tokens on a 128,000-token model, you have very little room for the response. Models also tend to underweight content buried in the middle of a long context — the "lost in the middle" problem.
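The remaining room for a response is just the window minus the prompt, but it's worth checking before every call. A trivial helper (the 128k window is the example figure above, not a universal constant):

```python
def response_budget(prompt_tokens: int, context_window: int = 128_000) -> int:
    """Tokens left for the model's response after the prompt is sent."""
    return max(0, context_window - prompt_tokens)

remaining = response_budget(120_000)  # 8,000 tokens left for the answer
```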
How do I count tokens without calling the API?
Use the tiktoken library for OpenAI models (pip install tiktoken). For Llama-based models, use the HuggingFace tokenizer directly. The CalcHub calculator runs these tokenizers in the browser using WebAssembly, so your text never leaves your device.
Why does my code snippet use more tokens than expected?
Code tends to produce more tokens per character than prose because operators, brackets, and short variable names each become individual tokens. x+=1 can be 3–4 tokens despite being only 4 characters. The token counter's per-word breakdown helps identify token-heavy segments.
Related Calculators
- AI API Cost Calculator — full monthly cost projections for LLM usage
- Inference Cost Calculator — self-hosted vs API cost comparison
- Embedding Dimension Calculator — plan vector storage for RAG