Design a Chat System Like WhatsApp: System Design Deep Dive
System design walkthrough for a real-time chat system — WebSocket connections, message delivery, group chat fan-out, offline handling, and end-to-end encryption.
Designing a chat system is one of the most common and most revealing system design interview questions. It touches on real-time communication, delivery guarantees, presence tracking, group messaging, and scaling persistent connections — all in one problem. It's the interview question that separates people who've thought about distributed systems from people who've just read about them.
Let's walk through it like a real interview conversation.
Requirements
Functional:
- 1-on-1 messaging
- Group chat (up to 500 members)
- Online/offline/last-seen status
- Message delivery status (sent, delivered, read)
- Push notifications for offline users
- Media messages (images, videos, files)
- Message history and search
Non-functional:
- Real-time delivery (sub-second latency for online users)
- Message ordering within a conversation
- At-least-once delivery (no lost messages)
- High availability (chat should work even during partial outages)
Back-of-envelope estimates:
- 500M daily active users, average 40 messages per user per day = 20B messages/day
- Messages per second: 20B / 86,400 ≈ 230K messages/sec
- Average message size: 200 bytes (text) → ~4 TB/day for text
- Media: assume 10% of messages have media, average 200KB → ~400 TB/day
- Concurrent WebSocket connections: ~100M at peak
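A quick sanity check of these numbers, using the constants stated above:

```python
# Back-of-envelope capacity math from the estimates above.
DAU = 500_000_000
MSGS_PER_USER = 40
SECONDS_PER_DAY = 86_400
TEXT_BYTES = 200          # average text message size
MEDIA_BYTES = 200_000     # average media attachment (~200 KB)

msgs_per_day = DAU * MSGS_PER_USER                 # 20B messages/day
msgs_per_sec = msgs_per_day / SECONDS_PER_DAY      # ~231K/sec
text_bytes_per_day = msgs_per_day * TEXT_BYTES     # 4 TB/day of text
media_msgs = msgs_per_day // 10                    # 10% of messages carry media
media_bytes_per_day = media_msgs * MEDIA_BYTES     # 400 TB/day of media

print(f"{msgs_per_sec:,.0f} msgs/sec")             # 231,481 msgs/sec
print(f"{text_bytes_per_day / 10**12:.0f} TB/day text, "
      f"{media_bytes_per_day / 10**12:.0f} TB/day media")
```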
Connection Protocol
Here's the thing about chat: HTTP was designed for request-response. Client asks, server answers. But chat needs the server to push messages to clients without being asked. You have three options:
Polling — client asks "any new messages?" every few seconds. Simple but wasteful: thousands of empty responses burning bandwidth and server resources.

Long polling — client sends a request, and the server holds it open until there's a new message (or a timeout). Better than polling, but it still has overhead from constantly re-establishing connections.

WebSocket — persistent, bidirectional TCP connection. Client and server can send messages at any time. This is the right answer for chat. After an initial HTTP handshake, the connection upgrades to WebSocket and stays open.

Client ←→ WebSocket Connection ←→ Chat Server
          (persistent, bidirectional)
Use WebSocket as the primary protocol, with long polling as a fallback for environments where WebSocket doesn't work (some corporate proxies block it).
High-Level Architecture
┌──────────┐ WebSocket ┌─────────────────┐
│ Client A ├────────────────────┤ Chat Server 1 │
└──────────┘ └────────┬────────┘
│
┌──────────┐ WebSocket ┌────────▼────────┐
│ Client B ├────────────────────┤ Chat Server 2 │
└──────────┘ └────────┬────────┘
│
┌────────▼────────┐
│ Message Service │
└──┬──────┬───┬───┘
│ │ │
┌────────▼┐ ┌──▼───▼────┐
│ Kafka │ │ Presence │
│ (queue) │ │ Service │
└────┬────┘ │ (Redis) │
│ └────────────┘
┌────▼────┐
│Database │
│(Cassandra)│
└─────────┘
Core components:
- Chat servers — maintain WebSocket connections with clients. Each server handles tens of thousands of concurrent connections.
- Message service — routes messages between users, handles group fan-out, ensures delivery.
- Presence service — tracks who's online and which chat server they're connected to. Backed by Redis for fast lookups.
- Message queue (Kafka) — decouples message production from storage and delivery. Guarantees messages aren't lost.
- Database (Cassandra) — stores message history. Optimized for the access pattern we need.
Message Flow: 1-on-1 Chat
Let's trace what happens when Alice sends "hey" to Bob:
- Alice's client sends the message over WebSocket to Chat Server 1
- Chat Server 1 forwards the message to the Message Service
- Message Service does three things: publishes the message to Kafka for persistence, looks up Bob in the Presence Service, and routes the message toward him
- Presence Service responds: "Bob is connected to Chat Server 2"
- Message Service forwards the message to Chat Server 2
- Chat Server 2 pushes the message to Bob over WebSocket
- Bob's client sends a "delivered" acknowledgment back
- Meanwhile, a Kafka consumer persists the message to Cassandra
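The routing logic above can be sketched with in-memory stand-ins for Kafka, Redis, and the chat servers — all names here are illustrative, not a real API:

```python
# Minimal sketch of the 1-on-1 send path. Plain dicts and lists stand in
# for Kafka, the Redis presence store, and per-server push queues.

kafka_log = []                        # stands in for the Kafka topic
presence = {"bob": "chat-server-2"}   # user_id -> chat server (Redis in reality)
server_queues = {"chat-server-1": [], "chat-server-2": []}

def send_message(sender, recipient, content):
    msg = {"from": sender, "to": recipient, "content": content}
    kafka_log.append(msg)              # persisted to Cassandra asynchronously
    server = presence.get(recipient)   # presence lookup
    if server is None:
        return "offline"               # fall back to push notification + sync
    server_queues[server].append(msg)  # route to the recipient's chat server
    return "routed"

print(send_message("alice", "bob", "hey"))     # routed
print(send_message("alice", "carol", "hey"))   # offline (no presence entry)
```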
Message Storage
The access pattern for chat messages is very specific: "Give me the messages for conversation X, ordered by time, paginated." This is a sequential read pattern.
Why Cassandra works well here:

Partition key: conversation_id
Clustering key: message_id (a TIMEUUID, so rows sort by time, descending)
CREATE TABLE messages (
conversation_id UUID,
message_id TIMEUUID,
sender_id UUID,
content TEXT,
media_url TEXT,
message_type TEXT, -- 'text', 'image', 'video'
created_at TIMESTAMP,
PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
All messages for a conversation live on the same partition, sorted by time. Fetching the latest 50 messages is a single sequential read — extremely fast.
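The read path against this schema is a single-partition query — for example, fetching the latest page of a conversation (CQL, matching the table definition above):

```sql
SELECT message_id, sender_id, content, media_url, created_at
FROM messages
WHERE conversation_id = ?
LIMIT 50;
```

Because the clustering order is already `DESC`, no sort is needed — the newest 50 rows come straight off the partition.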
Why not SQL? It would work at smaller scale. But at 230K messages/sec, the write throughput requirements push us toward a database designed for high-volume writes. Cassandra handles this natively with its log-structured storage engine.

Group Chat: The Fan-Out Decision
Group messaging introduces a key design decision: when Alice sends a message to a group of 200 people, how do you deliver it?
Fan-out-on-write: When a message arrives, create a copy for each group member. Each member has their own message queue/inbox.

Alice sends "hello" to Group (200 members)
→ Write 200 copies, one per member's inbox
→ Each member reads from their own inbox
- Pros: Reading is fast — each user just reads their own inbox. Simple client logic.
- Cons: 1 message becomes 200 writes. For large groups, this is expensive. If a group has 500 members and gets 100 messages/hour, that's 50,000 writes/hour just for one group.
Fan-out-on-read: When a message arrives, store one copy in the group's message store; members pull it when they read the chat.

Alice sends "hello" to Group
→ Write 1 copy to group's message store
→ Each member queries the group's messages when they open the chat
- Pros: 1 write regardless of group size. Storage efficient.
- Cons: Reading requires querying the group store every time. For active groups this is fine, but for a user in 50 groups, opening the app means 50 queries.

In practice, a hybrid wins: fan-out-on-write for small groups (the common case), fan-out-on-read for very large groups and broadcast channels where write amplification dominates.
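The write-amplification tradeoff is easy to quantify (illustrative numbers matching the example above):

```python
def writes_per_hour(members, msgs_per_hour, fan_out_on_write):
    """Storage writes generated by one group under each strategy."""
    if fan_out_on_write:
        return members * msgs_per_hour  # one copy per member's inbox
    return msgs_per_hour                # one copy in the group store

# 500-member group receiving 100 messages/hour:
print(writes_per_hour(500, 100, fan_out_on_write=True))   # 50000
print(writes_per_hour(500, 100, fan_out_on_write=False))  # 100
```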
Delivery Semantics: Sent, Delivered, Read
Three levels of message acknowledgment:
- Sent (single check ✓) — server received the message from the sender
- Delivered (double check ✓✓) — recipient's device received the message
- Read (double blue checks ✓✓) — recipient opened the conversation
Sent: Server → sender's client: "message_id stored"
Delivered: Recipient's client → server: "message_id received"
Server → sender's client: "message_id delivered"
Read: Recipient opens chat → server: "messages up to X read"
Server → sender's client: "message_id read"
For groups, track delivery and read status per member. This creates a lot of status updates, so batch them — don't send individual acks for every message in a group of 200.
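The receipt lifecycle is a small forward-only state machine — a sketch (names illustrative):

```python
# Delivery-status state machine: sent -> delivered -> read.
# Transitions only move forward; a late "delivered" ack arriving after
# "read" (acks can be reordered in transit) must not regress the status.
ORDER = {"sent": 0, "delivered": 1, "read": 2}

def advance(current, event):
    """Return the new status, never moving backward."""
    if ORDER[event] > ORDER[current]:
        return event
    return current

status = "sent"
status = advance(status, "delivered")  # -> delivered
status = advance(status, "read")       # -> read
status = advance(status, "delivered")  # out-of-order ack: stays read
print(status)                          # read
```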
Handling Offline Users
When a user is offline:
- Messages are stored in the database (they would be anyway)
- A push notification is sent via FCM/APNs with a preview
- When the user reconnects, the client sends its last received message ID
- Server returns all messages after that ID
Client reconnects:
→ "My last message_id is abc123"
← Server sends all messages after abc123, per conversation
This is a pull model on reconnect. The client drives synchronization, which is simpler and more reliable than the server trying to push a backlog.
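The reconnect sync can be sketched as a pure function — message ids here are monotonically increasing ints for illustration (in production a TIMEUUID plays this role):

```python
# Pull-based sync on reconnect: the client presents its last-seen message
# id per conversation; the server returns everything newer.

def sync(store, last_seen):
    """store: conversation_id -> time-ordered list of (msg_id, content).
    last_seen: conversation_id -> last msg_id the client already has."""
    backlog = {}
    for conv, messages in store.items():
        cursor = last_seen.get(conv, 0)
        missed = [m for m in messages if m[0] > cursor]
        if missed:
            backlog[conv] = missed
    return backlog

store = {"conv1": [(1, "hi"), (2, "hey"), (3, "you there?")]}
print(sync(store, {"conv1": 1}))   # {'conv1': [(2, 'hey'), (3, 'you there?')]}
```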
Media Messages
You never send actual media through the chat pipeline. That would choke everything.
1. Client uploads the image to S3/CDN → gets a URL
2. Client sends a message with type: "image", media_url: "https://cdn.../img.jpg"
3. Recipient receives the message, and the client downloads the image from the CDN
The chat message itself is just a small JSON payload with a URL. The heavy lifting of media transfer is offloaded to object storage and CDN infrastructure.
For thumbnails, generate them server-side on upload and include a thumbnail URL in the message. The recipient sees the thumbnail immediately while the full image loads.
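A hypothetical payload shape (field names illustrative, not WhatsApp's actual wire format) makes the point: the chat pipeline carries a few hundred bytes of JSON, while the CDN carries the actual bytes.

```python
import json

# Hypothetical media-message payload: URL references only, no media bytes.
message = {
    "type": "image",
    "media_url": "https://cdn.example.com/img/abc123.jpg",
    "thumbnail_url": "https://cdn.example.com/thumb/abc123.jpg",
    "width": 1920,
    "height": 1080,
    "size_bytes": 204_800,
}
payload = json.dumps(message)
print(len(payload), "bytes through the chat pipeline")
```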
End-to-End Encryption
Let's be honest — you probably won't need to implement E2EE in a 45-minute interview. But mentioning it shows awareness.
The concept (Signal Protocol, used by WhatsApp): each user has a public/private key pair. Messages are encrypted with the recipient's public key on the sender's device and decrypted with the recipient's private key on their device. The server never sees plaintext.
The implication for system design: the server can't index or search encrypted message content. Features like server-side search don't work with E2EE. This is a real tradeoff that WhatsApp accepts (and Telegram doesn't, since Telegram's regular chats aren't E2E encrypted).
Scaling Persistent Connections
With 100M concurrent WebSocket connections, you need a lot of chat servers. If each server handles 50K concurrent connections, you need 2,000 chat servers.
User-to-server mapping: Use consistent hashing to assign users to chat servers. The Presence Service (Redis) stores the mapping: user_id → chat_server_id.
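A minimal consistent-hash ring for the user→server assignment — a sketch; production rings add virtual nodes so load stays even when servers join or leave:

```python
import bisect
import hashlib

def _hash(key):
    # Stable hash (unlike Python's salted hash()) so placement is repeatable.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring mapping user_id -> chat server."""
    def __init__(self, servers):
        self.ring = sorted((_hash(s), s) for s in servers)
        self.keys = [h for h, _ in self.ring]

    def server_for(self, user_id):
        # First server clockwise from the user's hash; wrap around at the end.
        i = bisect.bisect(self.keys, _hash(user_id)) % len(self.ring)
        return self.ring[i][1]

ring = Ring([f"chat-server-{n}" for n in range(4)])
print(ring.server_for("alice"))  # stable assignment until the ring changes
```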
Server failure: If a chat server goes down, all connected users reconnect. The load balancer routes them to a new server, and the Presence Service updates. Clients should have reconnection logic with exponential backoff.
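Reconnection backoff is typically exponential with a cap and jitter, so 50K clients from a dead server don't all hammer the load balancer at once — a sketch (parameters illustrative):

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0):
    """Exponential backoff with full jitter: delay n is uniform in
    [0, min(cap, base * 2**n)] seconds."""
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        delays.append(random.uniform(0, ceiling))
    return delays

# Ceilings grow 1, 2, 4, 8, ... seconds, clamped at 60.
print(backoff_delays(6))
```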
Cross-server message delivery: When Alice (on Server 1) messages Bob (on Server 2), the Message Service looks up Bob's server in the Presence Service and routes accordingly. Chat servers never talk to each other directly — all cross-server traffic flows through the message routing layer.
What Makes This Answer Strong
A strong answer to "design a chat system" covers:
- The WebSocket decision (and why, not just what)
- Message flow traced through the system
- Storage model (Cassandra partition design for sequential reads)
- Group chat fan-out tradeoff (and the hybrid approach)
- Offline message handling
- Media as URL references, not inline content
- Delivery receipts as a status machine
Practice building out these distributed system designs interactively at CodeUp — working through the message flow step by step builds the intuition that reading alone can't.