String Length Calculator: Characters, Bytes, and Words Explained
Count characters, bytes, and words in any text. Understand why byte count differs from character count under UTF-8 encoding.
Most people assume "string length" just means counting letters. Type hello and you get 5 — simple. But the moment you paste in Hindi text, an emoji, or a Chinese phrase, your assumptions fall apart fast. That 5-character Hindi word might be 15 bytes. That one emoji? Could be 4 bytes all by itself.
The String Length Calculator on CalcHub handles all of this correctly — giving you character count, byte count, word count, and line count in one go.
Why Characters ≠ Bytes
Every character on your screen is encoded as one or more bytes in memory. ASCII characters (standard English letters, digits, punctuation) are 1 byte each; that's the easy case. But Unicode covers more than 150 scripts and over 140,000 characters, and UTF-8 (the dominant encoding on the web) handles them all with a variable-width scheme that uses 1 to 4 bytes per character.
So "hello" = 5 characters = 5 bytes. But "नमस्ते" = 6 characters = 18 bytes. That gap matters every time you're storing text in a database, sending it over an API with byte limits, or validating form fields.
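If you want to check this yourself, Python 3 makes the distinction easy to see: `len()` on a `str` counts code points, and encoding to UTF-8 first gives you the byte count. A minimal sketch (the function name is just for illustration):

```python
def char_and_byte_count(text: str) -> tuple[int, int]:
    """Return (Unicode code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

for sample in ["hello", "नमस्ते", "😀"]:
    chars, size = char_and_byte_count(sample)
    print(f"{sample!r}: {chars} code points, {size} bytes")
```

Run it and you get 5 and 5 for "hello", 6 and 18 for "नमस्ते", and 1 and 4 for the emoji.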
Bytes per Script — Quick Reference
| Script / Category | Bytes per Character |
|---|---|
| ASCII (a–z, 0–9, punctuation) | 1 byte |
| Accented Latin (é, ü, ñ) | 2 bytes |
| Greek, Cyrillic, Arabic, Hebrew | 2 bytes |
| Hindi / Devanagari | 3 bytes |
| Bengali, Tamil, Telugu, Kannada | 3 bytes |
| Chinese, Japanese, Korean (CJK) | 3 bytes |
| Emoji (most standard ones) | 4 bytes |
| Complex emoji (skin tones, ZWJ sequences) | 8 bytes and up (a four-person family emoji is 25) |
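The table above falls straight out of UTF-8's design: the number of bytes depends only on which range a character's code point falls in. Here's a small Python sketch that computes the width from the code point (the helper names are hypothetical; in real code you'd just use `len(text.encode("utf-8"))`):

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 uses for one code point, chosen by range."""
    cp = ord(ch)
    if cp < 0x80:       # U+0000..U+007F: ASCII
        return 1
    if cp < 0x800:      # U+0080..U+07FF: accented Latin, Greek, Cyrillic, Hebrew, Arabic
        return 2
    if cp < 0x10000:    # U+0800..U+FFFF: Devanagari, Tamil, most CJK, the ZWJ
        return 3
    return 4            # U+10000..U+10FFFF: most emoji, rare CJK

def utf8_size(text: str) -> int:
    """Total UTF-8 size of a string, summed code point by code point."""
    return sum(utf8_width(ch) for ch in text)

for ch in ["a", "é", "Ж", "न", "中", "😀"]:
    # Cross-check the range rule against Python's own encoder.
    print(f"U+{ord(ch):04X} {ch}: {utf8_width(ch)} bytes "
          f"(encoder says {len(ch.encode('utf-8'))})")

# Man, woman, girl, boy joined by three zero-width joiners (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
print(len(family), utf8_size(family))  # 7 25
```

The family emoji shows why the last row has no fixed size: it's four emoji glued together with three zero-width joiners, so 7 code points and 25 bytes for one visible symbol.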
A Practical Example
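Say you're building a feature that lets users set a "bio" with a maximum of 255 characters. You validate on the front end and everything looks fine. Then a user writes their bio in Tamil, hits submit, and your database throws an error. Why? Because the column limit is measured in bytes, not characters: Oracle's VARCHAR2(255), for example, uses byte length semantics by default, and MySQL's TINYTEXT holds at most 255 bytes. (MySQL's VARCHAR(255), by contrast, counts characters.) That 100-character Tamil bio is already 300 bytes.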
Knowing byte count upfront would have caught this before it reached production.
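One way to avoid the surprise is to validate both limits before saving. This is a hypothetical Python sketch: `validate_bio`, `MAX_CHARS`, and `MAX_BYTES` are made-up names, and the 255-byte cap assumes a byte-limited column like the one in the example.

```python
MAX_CHARS = 255  # what the user is promised (code points)
MAX_BYTES = 255  # what the storage column accepts (UTF-8 bytes), assumed

def validate_bio(bio: str) -> list[str]:
    """Return a list of problems; an empty list means the bio is OK."""
    errors = []
    if len(bio) > MAX_CHARS:
        errors.append(f"too long: {len(bio)} characters (max {MAX_CHARS})")
    size = len(bio.encode("utf-8"))
    if size > MAX_BYTES:
        errors.append(f"too large: {size} bytes (max {MAX_BYTES})")
    return errors

tamil_bio = "\u0ba4" * 100  # 100 copies of Tamil letter TA, 3 bytes each
print(validate_bio(tamil_bio))  # ['too large: 300 bytes (max 255)']
```

The Tamil string passes the character check but fails the byte check, which is exactly the case a character-only validator misses.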
How CalcHub Counts
The calculator gives you:
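- Character count: Unicode code points. This is close to what you see for English text, but Hindi vowel signs, skin-tone emoji, and letters built from combining marks each add extra code points
- Byte count: UTF-8 encoded size, the way storage systems and byte-limited APIs count
- Word count: split on whitespace, which works for space-separated scripts but not for Chinese, Japanese, or Thai, which don't put spaces between words
- Line count: useful for multi-line text, code, or CSVs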
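For reference, here's roughly how those four numbers can be computed in Python. This is a sketch under the assumptions in the list (code points for characters, UTF-8 for bytes, whitespace for words), not CalcHub's actual implementation:

```python
def text_stats(text: str) -> dict[str, int]:
    """Character, byte, word, and line counts for a string."""
    return {
        "characters": len(text),             # Unicode code points
        "bytes": len(text.encode("utf-8")),  # UTF-8 encoded size
        "words": len(text.split()),          # runs of non-whitespace
        # splitlines() handles \n, \r\n, \r and some Unicode line separators;
        # an empty string yields no lines.
        "lines": len(text.splitlines()),
    }

print(text_stats("hello world\nनमस्ते"))
# {'characters': 18, 'bytes': 30, 'words': 3, 'lines': 2}
print(text_stats("我喜欢猫")["words"])  # 1
```

The last call shows the whitespace limitation: a four-character Chinese sentence ("I like cats") comes back as a single word.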
Tips for Developers
- Twitter's (now X's) 280-character limit is a weighted count of code points, not bytes: most Latin-script characters count as 1, while CJK characters and emoji count as 2. That's why a tweet in Chinese, Japanese, or Korean fits only about 140 characters.
- SMS messages hold 160 characters in GSM-7 encoding, but only 70 if the text contains any non-GSM character (like a curly quote, or an accented letter outside the GSM set such as á).
- MySQL TEXT columns are capped at 65,535 bytes, not characters. If you're using the utf8mb4 charset (which you should be, for emoji support), each character can take up to 4 bytes, so a TEXT column may hold as few as 16,383 characters.
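If you want to catch MySQL problems before the INSERT, two checks cover the tips above. A TEXT column holds at most 65,535 bytes, and the legacy `utf8` (utf8mb3) charset stores at most 3 bytes per character, so anything above U+FFFF (most emoji) needs utf8mb4. A hypothetical Python sketch; the function names are made up for illustration:

```python
TEXT_MAX_BYTES = 65_535  # MySQL TEXT limit, in bytes

def needs_utf8mb4(text: str) -> bool:
    """True if any code point needs 4 bytes in UTF-8 (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

def fits_text_column(text: str) -> bool:
    """True if the UTF-8 encoding fits within a TEXT column's byte limit."""
    return len(text.encode("utf-8")) <= TEXT_MAX_BYTES

print(needs_utf8mb4("café"))           # False
print(needs_utf8mb4("nice 👍"))         # True
print(fits_text_column("a" * 65_535))  # True
print(fits_text_column("😀" * 16_384))  # False: 65,536 bytes
```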
Does character count include spaces?
Yes, the character count includes spaces. Some contexts count spaces (Twitter does), others don't. CalcHub counts them; if you need a no-spaces figure, subtract the spaces yourself.
What encoding does the calculator use?
UTF-8. It's the standard for the web and matches what most databases, APIs, and file systems expect unless you've specifically configured otherwise.
Why does my emoji show as 1 character but take 4 bytes?
Because in Unicode, an emoji is a single code point (one character), but UTF-8 needs 4 bytes to represent code points above U+FFFF. Your screen shows one symbol; your memory holds 4 bytes.
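You can see those 4 bytes directly in Python. The lead byte `0xF0` (binary `11110xxx`) marks the start of a 4-byte UTF-8 sequence, and the three bytes after it are continuation bytes (`10xxxxxx`):

```python
emoji = "\U0001F600"  # 😀 GRINNING FACE, a single code point above U+FFFF
encoded = emoji.encode("utf-8")

print(len(emoji))           # 1 (one code point)
print(f"U+{ord(emoji):X}")  # U+1F600
print(encoded.hex(" "))     # f0 9f 98 80
print(len(encoded))         # 4 (bytes)
```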