Token Calculator

Estimate AI token count and cost for popular language models.

What is an AI token?

A token is the basic unit of text that AI language models process. Rather than working character-by-character or word-by-word, models use a tokenizer to split text into variable-length chunks that balance vocabulary coverage with computational efficiency. Common English words are often a single token; rarer words, prefixes, and suffixes may be split across multiple tokens. The rule of thumb for English is approximately 1 token per 4 characters, or about 0.75 words per token (roughly 1.3 tokens per word).

Rough token estimates:
  1 word  ≈ 1.3 tokens  (common English words)
  1 char  ≈ 0.25 tokens
  1 sentence (15 words) ≈ 20 tokens
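
The ratios above can be turned into a quick estimator. The following is a minimal sketch, not the calculator's actual code: the function names are made up for illustration, words are counted by splitting on whitespace, and results are rounded up so the estimate errs on the high side.

```python
import math

# Rule-of-thumb ratios taken from the estimates above (English text only).
CHARS_PER_TOKEN = 4.0
TOKENS_PER_WORD = 1.3


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens as character count / 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate tokens as whitespace-separated word count * 1.3, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count * TOKENS_PER_WORD)


if __name__ == "__main__":
    sample = "Tokens are the basic unit of text that language models process."
    print(estimate_tokens_from_chars(sample))
    print(estimate_tokens_from_words(sample))
```

The two functions will usually disagree a little on the same text, since average word length varies; either one is only a ballpark figure.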

Common content sizes:
  Tweet (280 chars)         ≈  70 tokens
  Paragraph (100 words)     ≈ 130 tokens
  Blog post (1,000 words)   ≈ 1,300 tokens
  Short novel (80,000 words) ≈ 104,000 tokens

Why token limits matter

Every AI model has a context window — the maximum number of tokens it can process in a single request (both input and output combined). Sending more tokens than the context window allows results in an error or truncation. For long documents, code files, or multi-turn conversations, tracking token count helps you stay within limits and plan how to structure your inputs.

Model	Context window	Approx. word equivalent
GPT-4o	128,000 tokens	~96,000 words (~192 pages)
Claude Sonnet 4	200,000 tokens	~150,000 words (~300 pages)
Gemini 1.5 Pro	1,000,000 tokens	~750,000 words (~1,500 pages)
Llama 3.1 70B	128,000 tokens	~96,000 words (~192 pages)
Mistral Large	128,000 tokens	~96,000 words (~192 pages)
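
A context check along these lines could be sketched as follows. The limits are copied from the table above; the function name, the returned fields, and the idea of reserving part of the window for the response are illustrative assumptions, and real limits should be confirmed against each provider's current documentation.

```python
# Context window sizes in tokens, as listed in the table above.
CONTEXT_LIMITS = {
    "GPT-4o": 128_000,
    "Claude Sonnet 4": 200_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Llama 3.1 70B": 128_000,
    "Mistral Large": 128_000,
}


def context_usage(input_tokens: int, model: str, reserved_output_tokens: int = 0) -> dict:
    """Report how much of a model's context window a request would use.

    reserved_output_tokens is a hypothetical budget for the response, since
    input and output share the same window.
    """
    limit = CONTEXT_LIMITS[model]
    total = input_tokens + reserved_output_tokens
    return {
        "limit": limit,
        "used_fraction": total / limit,
        "fits": total <= limit,
    }


if __name__ == "__main__":
    # A short novel (~104,000 tokens) plus a 30,000-token response budget.
    print(context_usage(104_000, "GPT-4o", reserved_output_tokens=30_000))
    print(context_usage(104_000, "Claude Sonnet 4", reserved_output_tokens=30_000))
```

In this sketch the short novel fits in GPT-4o's window on its own, but not once 30,000 tokens are set aside for the reply.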

Non-English text and special content

Token counts vary significantly by language and content type. Non-Latin scripts (Chinese, Japanese, Korean, Arabic, Hindi) tend to use more tokens per character than English because most tokenizers were trained primarily on English text. Code is also tokenized differently — typically 1 token per 3–4 characters for common languages like Python and JavaScript.

Token efficiency by content type (approximate):
  English prose:        1 token / 4 chars
  Code (Python, JS):    1 token / 3.5 chars
  JSON/XML:             1 token / 3 chars (verbose structure)
  Chinese/Japanese:     1 token / 1.5–2 chars
  Arabic/Hebrew:        1 token / 2–3 chars
  Emoji:                1–2 tokens each
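
To show how these ratios change an estimate, here is a small sketch that picks a characters-per-token ratio by content type. The content type keys are invented for this example, and where the list gives a range the midpoint is used, which is an assumption rather than a measured value. Emoji are left out because they are counted per symbol, not per character.

```python
import math

# Approximate characters per token, from the list above.
# Midpoints are assumed where the list gives a range.
CHARS_PER_TOKEN_BY_TYPE = {
    "english_prose": 4.0,
    "code": 3.5,
    "json_xml": 3.0,
    "chinese_japanese": 1.75,  # assumed midpoint of 1.5-2
    "arabic_hebrew": 2.5,      # assumed midpoint of 2-3
}


def estimate_tokens(text: str, content_type: str = "english_prose") -> int:
    """Estimate tokens from character count using a per-type ratio, rounded up."""
    ratio = CHARS_PER_TOKEN_BY_TYPE[content_type]
    return math.ceil(len(text) / ratio)


if __name__ == "__main__":
    payload = '{"user": "ada", "active": true}'
    print(estimate_tokens(payload, "english_prose"))
    print(estimate_tokens(payload, "json_xml"))
```

The same string yields a higher estimate when treated as JSON than as prose, which is the practical effect of the lower characters-per-token ratio.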

Frequently asked questions

What is a token in AI language models?

A token is a chunk of text — often a word or part of a word — that a language model processes. On average, one token is about 4 characters or 0.75 words of English.

Why do tokens matter for cost?

AI APIs charge per token for both input (your prompt) and output (the response). Estimating token counts lets you predict and control costs before sending requests.
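
As a rough illustration of the arithmetic, the sketch below multiplies token counts by a price per million tokens. The function name is made up, and the prices in the usage example are placeholders chosen for the example, not any provider's actual rates; check the current price list for the model you use.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated request cost in dollars.

    Prices are expressed in dollars per one million tokens, with input and
    output priced separately.
    """
    input_cost = input_tokens * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: $2.50 per million input tokens, $10.00 per million output tokens.
    cost = estimate_cost(1_300, 500, 2.50, 10.00)
    print(f"${cost:.5f}")
```

Keeping the two prices as separate parameters matters because output tokens are often priced differently from input tokens.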

How many tokens are in a word?

Roughly 1.3 tokens per English word on average, though long or unusual words split into more tokens. 1,000 tokens is about 750 words.

Does this count tokens exactly like the model?

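No. It gives an estimate based on average ratios, not a real tokenizer. Exact counts depend on the specific tokenizer a model uses, so the estimate is closest for ordinary English prose and can be further off for code, JSON, and non-English text. For most budgeting purposes it is accurate enough.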

Formula / How it works

Rough estimation: 1 token ≈ 4 characters (English text), so estimated tokens ≈ character count ÷ 4. This matches OpenAI’s rule of thumb. Actual token counts vary by model and tokenizer, and more noticeably for code and non-English text.