System Design Interviews: How to Think Through Problems You've Never Seen
A structured framework for system design interviews covering estimation, architecture patterns, and common designs like URL shorteners and chat systems.
System design interviews are terrifying because there's no single correct answer. The interviewer hands you a vague prompt — "design Twitter" or "design a URL shortener" — and you have 45 minutes to demonstrate that you can think about large-scale systems without freezing up.
The good news: there's a repeatable framework. You don't need to memorize architectures. You need to learn how to think through problems you've never seen before, ask the right questions, and make reasonable tradeoffs. That's what this guide is about.
The Four-Phase Framework
Every system design interview follows roughly the same structure, whether you're designing a chat app or a video streaming platform. Internalize these four phases and you'll never stare blankly at the whiteboard again.
Phase 1: Requirements Clarification (3-5 minutes)
This is where most candidates mess up. They jump straight into drawing boxes and arrows without understanding what they're building. The interviewer is testing whether you can handle ambiguity — and the first test is whether you ask clarifying questions.
Functional requirements — what does the system do?
- Who are the users? How many?
- What are the core features? (Not everything — the top 3-5.)
- What does the read/write ratio look like?
- Are there any special constraints? (Real-time? Consistency? Ordering?)
Non-functional requirements — how well must it perform?
- Availability vs. consistency (pick one to prioritize)
- Latency expectations (sub-100ms? Sub-second?)
- Data durability requirements
- Scale expectations (DAU, requests per second)
Don't skip this. Spending 3 minutes here saves you from designing the wrong system.
Phase 2: Back-of-the-Envelope Estimation (3-5 minutes)
This isn't about getting exact numbers. It's about demonstrating that you think about scale before you design.
Key things to estimate:
- QPS (queries per second): If you have 1B requests per day, that's roughly 1B / 86400 ~= 12K QPS average, maybe 24-36K at peak (2-3x average).
- Storage: If each URL entry is ~500 bytes and you store 100M per day, that's 50GB per day, ~18TB per year.
- Bandwidth: 12K QPS * 500 bytes ~= 6 MB/s. Not a bottleneck.
- Memory for caching: If 20% of URLs get 80% of traffic, caching the top 20% of daily URLs = 0.2 × 100M × 500 bytes = 10GB. Fits in a single machine's RAM.
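To make the arithmetic concrete, the same estimation can be written as a quick script. The inputs are the assumed figures from the example above (1B requests/day, 500-byte entries, 100M new URLs/day, a 3x peak factor); none of them are real measurements.

```python
# Back-of-the-envelope estimation, using the assumed inputs from the text.
SECONDS_PER_DAY = 86_400

requests_per_day = 1_000_000_000
avg_qps = requests_per_day / SECONDS_PER_DAY           # ~11.6K average
peak_qps = avg_qps * 3                                 # assume 3x peak factor

entry_bytes = 500
urls_per_day = 100_000_000
storage_per_day_gb = urls_per_day * entry_bytes / 1e9  # 50 GB/day
storage_per_year_tb = storage_per_day_gb * 365 / 1e3   # ~18 TB/year

cache_fraction = 0.2                                   # cache the hot 20%
cache_gb = urls_per_day * cache_fraction * entry_bytes / 1e9  # ~10 GB

print(f"avg QPS: {avg_qps:,.0f}, peak QPS: {peak_qps:,.0f}")
print(f"storage: {storage_per_day_gb:.0f} GB/day, {storage_per_year_tb:.2f} TB/year")
print(f"hot-set cache: {cache_gb:.0f} GB")
```

In an interview you would do this on the whiteboard, but the habit is the same: round aggressively, keep units visible, and sanity-check each result against hardware limits.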
Phase 3: High-Level Design (10-15 minutes)
Now you draw the architecture. Start with the simplest thing that works, then add complexity as needed.
The basic template for most web systems:
Client -> Load Balancer -> Application Servers -> Database
                                               -> Cache
Then extend based on requirements:
- Read-heavy? Add a cache layer (Redis/Memcached).
- Write-heavy? Add a message queue for async processing.
- Global users? Add a CDN and consider geo-distributed databases.
- Real-time? Add WebSocket servers or a pub/sub system.
- Search needed? Add a search index (Elasticsearch).
- Large files? Add object storage (S3) and a CDN.
Phase 4: Deep Dive (15-20 minutes)
The interviewer will pick one or two areas to dig into. This is where you show depth. Common deep dives:
- Database schema and indexing strategy
- How cache invalidation works
- How you'd handle a specific failure mode
- Data partitioning strategy
- Consistency model and conflict resolution
The Building Blocks
You don't need to know every technology, but you should understand these fundamental components and when to use each.
Load Balancers
A load balancer distributes incoming traffic across multiple servers. It sits between clients and your application servers.
Key algorithms:
- Round robin — simple, works when servers are identical
- Least connections — sends traffic to the server handling the fewest requests, good when request processing times vary
- Consistent hashing — routes based on a hash of the request (e.g., user ID), good when you need session affinity or want to maximize cache hit rates
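Toy versions of the first two algorithms make the difference obvious. The server names and connection counts here are made up for illustration; a real balancer tracks live connection state.

```python
# Round robin vs. least connections, sketched with stand-in data.
from itertools import cycle

servers = ["app-1", "app-2", "app-3"]

# Round robin: rotate through identical servers in order.
rr = cycle(servers)
assignments = [next(rr) for _ in range(5)]
# -> ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']

# Least connections: pick the server currently handling the fewest
# requests; useful when request processing times vary widely.
active = {"app-1": 12, "app-2": 3, "app-3": 7}
target = min(active, key=active.get)   # 'app-2'
```

Round robin ignores load entirely, which is fine when every request costs about the same; least connections adapts when some requests are slow.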
Caching
Caching stores frequently accessed data in a fast layer (usually in-memory) to reduce load on the database and improve latency.
Cache strategies:
- Cache-aside (lazy loading) — application checks cache first, reads from DB on miss, populates cache. Most common pattern.
- Write-through — write to cache and DB simultaneously. Consistent but slower writes.
- Write-behind — write to cache, asynchronously sync to DB. Fast writes, risk of data loss.
Eviction policies:
- LRU (Least Recently Used) — evict the item not accessed for the longest time. Default choice.
- TTL (Time-to-Live) — items expire after a set duration. Good for data that changes on a schedule.
Databases
- SQL (relational) — MySQL, PostgreSQL. Strong consistency, ACID transactions, structured data with relationships. Use when data integrity matters and your schema is well-defined.
- NoSQL (document) — MongoDB, DynamoDB. Flexible schema, horizontal scaling, eventual consistency. Use when your data model is denormalized, or you need extreme write throughput.
- NoSQL (key-value) — Redis, DynamoDB. Sub-millisecond lookups by key. Use for session stores, caching, leaderboards.
- NoSQL (wide-column) — Cassandra, HBase. Good for time-series data, write-heavy workloads, large-scale analytics.
Partitioning strategies:
- Horizontal partitioning (sharding) — split data across machines by some key (user ID, geo region)
- Consistent hashing — distribute data across shards while minimizing redistribution when shards are added/removed
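A toy hash ring shows why consistent hashing matters: adding a shard remaps only a fraction of keys, where naive `hash(key) % n` would remap most of them. The shard names, virtual-node count, and choice of MD5 for placement are all illustrative.

```python
# A toy consistent-hash ring with virtual nodes.
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, shards, vnodes=100):
        # Each shard gets `vnodes` points on the ring to even out the load.
        self.ring = sorted((h(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def shard_for(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring_a = Ring(["shard-1", "shard-2", "shard-3"])
ring_b = Ring(["shard-1", "shard-2", "shard-3", "shard-4"])

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(ring_a.shard_for(k) != ring_b.shard_for(k) for k in keys)
# Roughly a quarter of keys move when going from 3 to 4 shards,
# versus roughly three quarters with modulo-based sharding.
```

This is the property that makes resharding survivable in production: only keys adjacent to the new shard's ring points migrate.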
Message Queues
Kafka, RabbitMQ, SQS. Decouple producers from consumers, smooth out traffic spikes, enable async processing.
When to use:
- Processing that doesn't need to happen in the request path (sending emails, generating thumbnails)
- Traffic spikes — queue absorbs the burst, workers process at a steady rate
- Communication between microservices
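The decoupling can be sketched in-process, with Python's `queue.Queue` standing in for a real broker like Kafka or SQS: the request path enqueues and returns immediately, while a background worker drains the queue at its own pace.

```python
# Producer/worker decoupling via a queue; an in-process stand-in for a broker.
import queue
import threading

jobs: queue.Queue = queue.Queue()
processed = []

def worker():
    while True:
        job = jobs.get()
        if job is None:                      # sentinel: shut the worker down
            break
        processed.append(f"emailed {job}")   # e.g. send email, make thumbnail
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

for user in ["ada", "grace", "linus"]:  # request path: enqueue and move on
    jobs.put(user)

jobs.join()      # wait for the worker to drain the burst
jobs.put(None)   # stop the worker
t.join()
```

A burst of three jobs arrives instantly, but the worker processes them one at a time; with a durable broker, the queue also survives worker crashes.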
CDN (Content Delivery Network)
Caches static content (images, JS, CSS) at edge locations close to users. Reduces latency and offloads traffic from your origin servers.
When to use: Any system serving static content to geographically distributed users.
Common Designs: Worked Examples
URL Shortener
Requirements: Generate short URLs, redirect to original URL, analytics (optional), high availability.
High-level design:
Client -> Load Balancer -> App Server -> Cache (Redis)
                                      -> Database (key-value)
Key decisions:
- URL generation: Use a counter-based approach (base62 encode an auto-incrementing ID) or a hash-based approach (MD5/SHA256 of the long URL, take first 7 characters). Counter gives shorter URLs and no collisions. Hash is simpler but needs collision handling.
- Database: A key-value store is ideal — you're doing simple lookups by short URL key. DynamoDB or even Redis with persistence.
- Caching: Cache popular URLs in Redis. Most URL shorteners follow a power-law distribution — a small percentage of URLs get the vast majority of clicks.
- Read path: Check cache -> check DB -> return 301/302 redirect.
- Scaling: Partition by short URL hash. Each partition handles a subset of the keyspace.
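The counter-based approach from the key decisions above is a few lines of code: base62-encode an auto-incrementing ID. The alphabet ordering is a free choice; digits-then-lowercase-then-uppercase is one common convention.

```python
# Base62-encode an auto-incrementing ID into a short URL key.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)   # peel off base-62 digits, least significant first
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

print(encode_base62(125))        # "21"
# A 7-character key space holds 62**7 ≈ 3.5 trillion URLs.
```

Because each ID is unique, collisions are impossible by construction; the operational question becomes how to generate IDs across multiple app servers (a ticket server, or per-server ID ranges).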
Chat System
Requirements: 1-on-1 messaging, group chat, online/offline status, message history, real-time delivery.
High-level design:
Client <-> WebSocket Server <-> Message Service -> Message Queue -> Database
                            -> Presence Service -> Redis
                            -> Group Service -> Database
Key decisions:
- Real-time delivery: WebSocket connections for persistent bidirectional communication. Each user maintains a connection to a WebSocket server.
- Message routing: When User A sends a message to User B, the message service needs to know which WebSocket server User B is connected to. Use a presence service backed by Redis that maps user IDs to WebSocket server IDs.
- Message storage: Write messages to a message queue (Kafka) for reliability, then persist to a database. Use a database that's optimized for sequential reads by conversation (Cassandra with partition key = conversation_id, clustering key = timestamp).
- Group messages: Fan out on write (store a copy for each group member) or fan out on read (store once, query for each member). Fan out on read is simpler and uses less storage, but reads are more expensive. For small groups, fan out on write is fine. For large groups (1000+ members), fan out on read avoids write amplification.
- Offline delivery: When a user comes online, pull undelivered messages from the database.
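The routing lookup at the heart of this design is simple: a presence map (Redis in the real system, a dict here) from user ID to the WebSocket server holding that user's connection. Server names and return values below are illustrative.

```python
# Presence-based message routing: user_id -> websocket server id.
presence: dict[str, str] = {}   # would live in Redis in the real design

def connect(user_id: str, ws_server: str) -> None:
    presence[user_id] = ws_server       # set on WebSocket connect

def disconnect(user_id: str) -> None:
    presence.pop(user_id, None)         # clear on disconnect

def route_message(sender: str, recipient: str, text: str) -> str:
    server = presence.get(recipient)
    if server is None:
        return "stored-for-offline-delivery"  # recipient offline: persist only
    return f"forward to {server}"             # push over recipient's socket

connect("alice", "ws-3")
print(route_message("bob", "alice", "hi"))   # forward to ws-3
print(route_message("bob", "carol", "hi"))   # stored-for-offline-delivery
```

The subtlety in production is keeping this map correct when WebSocket servers crash without a clean disconnect, which is why presence entries usually carry a heartbeat-refreshed TTL.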
News Feed / Timeline
Requirements: Users follow other users, see a feed of their posts, real-time updates, ranked feed (optional).
Key decisions:
- Fan-out on write: When a user posts, push the post to all followers' feeds (stored in cache/database). Fast reads, expensive writes. Works well when most users have a moderate number of followers.
- Fan-out on read: When a user opens their feed, pull posts from all users they follow and merge. Cheap writes, expensive reads. Better for users with millions of followers (celebrities).
- Hybrid: Fan out on write for normal users, fan out on read for celebrities. This is what Twitter actually does.
- Feed storage: Pre-computed feeds stored in Redis (list of post IDs per user). When a user scrolls, paginate through the list.
- Ranking: Apply a ranking algorithm (relevance, recency, engagement) before displaying. This is a separate service that re-orders the raw feed.
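The hybrid strategy can be sketched in a few lines. The follower threshold, data shapes, and names here are illustrative, and this sketch skips timestamp ordering: normal users' posts are pushed to followers' precomputed feeds, while "celebrity" posts are merged in at read time.

```python
# Hybrid fan-out: push for normal users, pull for celebrities.
CELEBRITY_THRESHOLD = 10_000

followers = {"alice": ["bob", "carol"],
             "celeb": [f"fan{i}" for i in range(20_000)]}
feeds: dict[str, list[str]] = {}            # user -> post ids (a Redis list)
celebrity_posts: dict[str, list[str]] = {}  # author -> post ids

def publish(author: str, post_id: str) -> None:
    if len(followers.get(author, [])) >= CELEBRITY_THRESHOLD:
        celebrity_posts.setdefault(author, []).append(post_id)  # fan-out on read
    else:
        for f in followers.get(author, []):                     # fan-out on write
            feeds.setdefault(f, []).insert(0, post_id)

def read_feed(user: str, following: list[str]) -> list[str]:
    merged = list(feeds.get(user, []))
    for author in following:
        merged.extend(celebrity_posts.get(author, []))  # merge celebs at read
    return merged

publish("alice", "p1")    # 2 followers: written into each follower's feed
publish("celeb", "p2")    # 20K followers: stored once, pulled at read time
print(read_feed("bob", ["alice", "celeb"]))   # ['p1', 'p2']
```

Publishing for "celeb" touches one list instead of 20,000, which is exactly the write amplification the hybrid avoids; the cost moves to the read path, where a real system would also sort the merged feed by timestamp or rank.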
What Interviewers Actually Look For
Having watched dozens of system design interviews from both sides of the table, I can tell you what separates strong candidates:
Structured thinking. Follow the framework. Don't jump into details without establishing requirements and scale. Interviewers want to see that you can organize ambiguous problems.
Tradeoff awareness. Every design decision has tradeoffs. Don't present your choices as the only option — explain what you considered and why you picked this approach. "I'm choosing eventual consistency here because availability is more important for this use case, and a few seconds of stale data is acceptable."
Knowing your numbers. You don't need to memorize exact specs, but you should know rough orders of magnitude. A single machine can handle ~1K-10K QPS depending on the workload. Redis can handle ~100K operations per second. An SSD can do ~100K random reads per second. These rough numbers help you estimate whether your design works.
Driving the conversation. Don't wait for the interviewer to tell you what to do next. Move through the phases proactively. If you've covered the high-level design, say "I'd like to dive deeper into the database sharding strategy — is that a good area to explore, or would you prefer I focus on something else?"
Honesty about gaps. "I'm not sure about the exact implementation of X, but here's how I'd approach figuring it out" is much better than making something up. Interviewers respect intellectual honesty.
Common Mistakes
Overengineering. Don't add Kafka, Elasticsearch, a graph database, and a machine learning pipeline for a URL shortener. Start simple and add complexity only when the requirements demand it.
Ignoring failure modes. What happens when the database goes down? What if a server crashes mid-request? What if the cache gets corrupted? Mentioning failure handling shows maturity.
Not discussing the data model. At some point, talk about what your data actually looks like. What tables do you need? What are the primary keys? What indexes? This grounds your design in reality.
Treating it as a memorization exercise. Interviewers can tell when you're reciting a memorized design vs. reasoning through it. If they change the requirements mid-interview, you need to adapt — and you can't adapt a memorized answer.
Not communicating enough. This is a collaborative exercise. Think out loud. Draw as you talk. Check in with the interviewer. A silent candidate who draws a perfect architecture is less impressive than a communicative candidate who reasons through a good-enough design.
How to Practice
Start with classic designs: URL shortener, chat system, news feed, web crawler, notification system. These cover the most common patterns.
Time yourself. Give yourself 45 minutes per design. If you can't finish, that's useful feedback — figure out where you spent too long.
Study real architectures. Read engineering blogs from companies like Netflix, Uber, Airbnb, and Discord. Understanding how real systems evolved is more valuable than memorizing idealized designs.
Practice with a partner. System design is a conversation. Practicing alone teaches you the content, but practicing with someone teaches you the communication.
Build something small. Even a toy implementation of a URL shortener or chat system gives you intuition that no amount of reading can replace. When you've actually configured a load balancer or set up Redis caching, you speak about it differently in interviews.
The framework works because system design problems are more similar than they are different. Every system needs to handle reads and writes, deal with scale, manage failures, and make tradeoffs between consistency, availability, and partition tolerance. Learn the patterns, practice the framework, and you'll handle any design question they throw at you.
Explore interactive system design exercises and coding challenges on CodeUp to build both the breadth and depth you need for these interviews.