March 28, 2026 · 4 min read

String Length Calculator: Characters, Bytes, and Words Explained

Count characters, bytes, and words in any text, and understand why byte count differs from character count under UTF-8 encoding.


Most people assume "string length" just means counting letters. Type hello and you get 5 — simple. But the moment you paste in Hindi text, an emoji, or a Chinese phrase, your assumptions fall apart fast. That 5-character Hindi word might be 15 bytes. That one emoji? Could be 4 bytes all by itself.

The String Length Calculator on CalcHub handles all of this correctly — giving you character count, byte count, word count, and line count in one go.

Why Characters ≠ Bytes

Every character on your screen is encoded as one or more bytes in memory. ASCII characters (standard English letters, digits, punctuation) are 1 byte each — that's the easy case. But the world has thousands of scripts, and UTF-8 (the dominant encoding on the web) uses variable-width encoding to handle them all.

So "hello" = 5 characters = 5 bytes. But "नमस्ते" = 6 characters = 18 bytes. That gap matters every time you're storing text in a database, sending it over an API with byte limits, or validating form fields.
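The gap is easy to see in code. Here is a minimal sketch in Python, where `len()` on a string counts code points and encoding to UTF-8 gives the byte count. The helper name `char_and_byte_counts` is made up for this illustration.

```python
def char_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


print(char_and_byte_counts("hello"))   # (5, 5)
print(char_and_byte_counts("नमस्ते"))  # (6, 18)
```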

Bytes per Script — Quick Reference

Script / Category	Bytes per Character
ASCII (a–z, 0–9, punctuation)	1 byte
Latin Extended (é, ü, ñ)	2 bytes
Greek, Cyrillic, Arabic, Hebrew	2 bytes
Hindi / Devanagari	3 bytes
Bengali, Tamil, Telugu, Kannada	3 bytes
Chinese, Japanese, Korean (CJK)	3 bytes
Emoji (most standard ones)	4 bytes
Complex emoji (skin tones, ZWJ sequences)	8–28 bytes

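This table applies to UTF-8 specifically. UTF-16 and UTF-32 use different widths, which is why string length in Java can surprise you: Java strings are UTF-16 internally and String.length() counts UTF-16 code units, so a single emoji reports a length of 2.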
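To compare encodings side by side, a rough sketch like the following can help. It measures the same text as code points, UTF-8 bytes, UTF-16 code units, and UTF-32 bytes. The function name `encoded_sizes` is hypothetical; the "-le" codec variants are used so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Measure one string under several Unicode encodings."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each; this is what Java's length() counts.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf32_bytes": len(text.encode("utf-32-le")),
    }


# One sample character from several rows of the table above.
for sample in ["a", "é", "न", "中", "\U0001F600"]:
    print(repr(sample), encoded_sizes(sample))
```

The emoji on the last line is one code point, 4 bytes in UTF-8, and 2 code units in UTF-16.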

A Practical Example

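Say you're building a feature that lets users set a "bio" with a maximum of 255 characters. You validate on the front end and everything looks fine. Then a user writes their bio in Tamil, hits submit, and your database throws an error. Why? Because the limit that matters in storage is often measured in bytes, not characters. Oracle's VARCHAR2(255), for example, uses byte semantics by default, and even MySQL, whose VARCHAR(255) counts characters, caps index keys and row size in bytes. That 100-character Tamil bio is already 300 bytes.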

Knowing byte count upfront would have caught this before it reached production.
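A check like this could run before the text ever reaches the database. This is only a sketch: the 255-byte limit comes from the example above, and the helper names `fits_byte_limit` and `truncate_to_bytes` are invented for illustration.

```python
def fits_byte_limit(text: str, max_bytes: int = 255) -> bool:
    """True if the UTF-8 encoding of text fits in max_bytes."""
    return len(text.encode("utf-8")) <= max_bytes


def truncate_to_bytes(text: str, max_bytes: int = 255) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    clipped = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return clipped.decode("utf-8", errors="ignore")


tamil_bio = "அ" * 100                      # 100 characters, 3 bytes each
print(fits_byte_limit(tamil_bio))          # False
print(len(truncate_to_bytes(tamil_bio)))   # 85
```

Truncation is shown only to illustrate the arithmetic; rejecting the input with a clear message is usually kinder to the user than cutting their text silently.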

How CalcHub Counts

The calculator gives you:

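Character count: Unicode code points, which is close to how people count (a combined emoji or a letter with attached vowel signs can span several code points)
Byte count: UTF-8 encoded size, the way storage systems count
Word count: split on whitespace, which works for any script that puts spaces between words (Chinese, Japanese, and Thai usually do not)
Line count: useful for multi-line text, code, or CSVs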

Paste your text, get all four numbers instantly.
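The four numbers can be approximated in a few lines of Python. This is a sketch of the definitions listed above, not CalcHub's actual implementation, and the function name `text_stats` is an assumption made for this example.

```python
def text_stats(text: str) -> dict[str, int]:
    """Count code points, UTF-8 bytes, whitespace-separated words, and lines."""
    return {
        "characters": len(text),              # Unicode code points
        "bytes": len(text.encode("utf-8")),   # UTF-8 encoded size
        "words": len(text.split()),           # split on any run of whitespace
        "lines": len(text.splitlines()),      # an empty string has 0 lines here
    }


print(text_stats("hello world\nनमस्ते"))
# {'characters': 18, 'bytes': 30, 'words': 3, 'lines': 2}
```

Note that `str.splitlines()` also treats a few less common characters (form feed, for example) as line breaks, so a real tool may define "line" more narrowly.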

Tips for Developers

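Twitter's 280-character limit is neither a byte count nor a plain character count. Its counting is weighted: most Latin-script characters count as 1, while CJK characters and emoji count as 2, so a tweet written entirely in CJK tops out at 140 characters.
SMS messages hold 160 characters in GSM-7 encoding, but only 70 once any character outside the GSM-7 alphabet appears (a smart quote or an emoji, for example), because the whole message falls back to UCS-2. A few accented letters such as é and ñ are in GSM-7; many others are not.
MySQL TEXT columns are capped in bytes (65,535 for TEXT). If you're using the utf8mb4 charset (which you should be, for emoji support), each character can take up to 4 bytes, so the guaranteed character capacity is roughly a quarter of the byte limit.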
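The SMS rule is the least intuitive of these, so here is a rough sketch of it. It assumes the GSM 03.38 default alphabet plus its basic extension table, where extension characters such as € cost two units, and it ignores multi-part messages and national language shift tables. The function name `sms_units` is hypothetical.

```python
# GSM 03.38 default alphabet (the ESC code itself is omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these is sent as ESC + character, so it costs 2 units.
GSM7_EXTENDED = set("^{}\\[~]|€")


def sms_units(text: str) -> tuple[str, int, int]:
    """Return (encoding, units used, single-message limit) for text."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        return "GSM-7", units, 160
    # UCS-2 fallback counts 16-bit units, so an emoji takes two of them.
    return "UCS-2", len(text.encode("utf-16-be")) // 2, 70


print(sms_units("café"))   # ('GSM-7', 4, 160)
print(sms_units("it’s"))   # ('UCS-2', 4, 70)
```

A single curly apostrophe is enough to switch the whole message to the 70-unit limit, which is why text pasted from a word processor often splits into more messages than expected.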

Does character count include spaces?

Yes — by default, character count includes spaces. Some contexts (like Twitter) count spaces, others don't. CalcHub counts them, but you can mentally subtract if needed.

What encoding does the calculator use?

UTF-8. It's the standard for the web and matches what most databases, APIs, and file systems expect unless you've specifically configured otherwise.

Why does my emoji show as 1 character but take 4 bytes?

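Because in Unicode, a basic emoji is a single code point (one character), but UTF-8 needs 4 bytes to represent code points above U+FFFF, which is where most emoji live. Your screen shows one symbol; your memory holds 4 bytes. Emoji with skin tones or ZWJ joins are several code points shown as one symbol, so they take even more.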
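The byte width follows directly from the code point's numeric value. The sketch below spells out the UTF-8 range rule and checks it against Python's own encoder. The function name `utf8_width` is invented for this example.

```python
def utf8_width(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a single code point."""
    if code_point < 0x80:
        return 1   # ASCII
    if code_point < 0x800:
        return 2   # Latin Extended, Greek, Cyrillic, Arabic, Hebrew
    if code_point < 0x10000:
        return 3   # Devanagari, Tamil, most CJK
    return 4       # above U+FFFF, including most emoji


for ch in ["a", "é", "न", "\U0001F600"]:
    # The rule should agree with the real encoder for each sample.
    print(f"U+{ord(ch):04X}", utf8_width(ord(ch)), len(ch.encode("utf-8")))

# A thumbs-up with a skin tone modifier is two code points, 4 bytes each.
thumbs_medium = "\U0001F44D\U0001F3FD"
print(len(thumbs_medium), len(thumbs_medium.encode("utf-8")))   # 2 8
```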
