March 27, 2026 · 9 min read

System Design Fundamentals: Everything You Need Before the Interview

The complete building blocks for system design interviews — load balancers, caching, databases, message queues, CDNs, API design, and CAP theorem explained practically.

system-design interviews architecture backend fundamentals

Here's the thing about system design interviews: they're not testing whether you've memorized how Netflix works. They're testing whether you understand the building blocks well enough to assemble them into something reasonable under pressure. And that means you need to actually understand each piece — not just know its name.

This guide covers every fundamental component you'll need. Not surface-level definitions, but the practical knowledge that lets you reason about tradeoffs in real time. Let's get into it.

Load Balancers

A load balancer sits between clients and your servers, distributing traffic so no single server gets crushed. Simple concept, but the details matter in interviews.

L4 vs L7 load balancing:
  • Layer 4 (Transport) — routes based on IP address and TCP port. Fast because it doesn't inspect the request content. Good for raw throughput. Think of it as a traffic cop that just counts cars.
  • Layer 7 (Application) — routes based on HTTP headers, URL paths, cookies, etc. Slower but smarter. You can route /api/ to one set of servers and /static/ to another.
Routing algorithms:
  • Round-robin — requests go to servers in order: 1, 2, 3, 1, 2, 3. Simple and works when all servers are identical and requests take roughly the same time.
  • Least connections — sends the next request to whichever server is handling the fewest active requests. Better when processing times vary (some requests are heavier than others).
  • Weighted round-robin — like round-robin but some servers get more traffic. Useful when your servers have different specs.
  • IP hash / consistent hashing — routes based on client IP or request key. Same client always hits the same server.
Sticky sessions — when you need a user to always hit the same server (e.g., because of in-memory session data). The load balancer uses a cookie or IP hash to maintain affinity. The downside: uneven load distribution and harder failover. The better solution is usually to externalize session state to Redis, but interviewers like hearing you discuss the tradeoff.
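The consistent-hashing idea above can be sketched in a few lines. This is an illustrative toy (the class name, server names, and virtual-node count are invented for the example), not any real load balancer's implementation — but it shows the key property: the same client key always maps to the same server, and adding or removing a server only remaps keys adjacent to it on the ring.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash of a string key onto the ring's integer space
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets many virtual nodes so load spreads evenly
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    def route(self, client_key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect(self._hashes, _hash(client_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["web-1", "web-2", "web-3"])
# Same client IP, same server — affinity without server-side session state
assert ring.route("203.0.113.7") == ring.route("203.0.113.7")
```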

Caching

Caching is the single most impactful performance optimization in most systems. It exists at multiple levels, and understanding where to cache is as important as knowing how.

Cache levels (from client to database):
Client Cache → CDN → API Gateway Cache → Application Cache (Redis) → Database Cache (query cache)
  • Client-side — browser cache, HTTP cache headers (Cache-Control, ETag). Zero server load for cache hits.
  • CDN — caches static content at edge locations worldwide. Cloudflare, CloudFront, Akamai.
  • Application-level — Redis or Memcached sitting between your app servers and database. This is the one you'll discuss most in interviews.
  • Database-level — MySQL's query cache (removed in MySQL 8.0), PostgreSQL shared_buffers. Usually transparent to your application.
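The client-side layer is worth understanding mechanically: a server hands out an ETag, and on revalidation a matching `If-None-Match` lets it answer 304 with no body. Here's a minimal sketch of that handshake — the function names and the `max-age` value are made up for illustration; real servers implement this inside the HTTP framework.

```python
import hashlib

def etag_for(body: bytes) -> str:
    # A strong ETag derived from the response body
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body: bytes, if_none_match):
    """Return (status, headers, payload). A 304 carries no body,
    so a revalidated cache hit costs almost no bandwidth."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag, "Cache-Control": "max-age=3600"}, body

body = b"<html>hello</html>"
status1, headers, _ = handle_get(body, None)       # first request
status2, _, _ = handle_get(body, headers["ETag"])  # revalidation
```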
Redis vs Memcached:

| Feature         | Redis                                     | Memcached         |
| --------------- | ----------------------------------------- | ----------------- |
| Data structures | Strings, lists, sets, sorted sets, hashes | Strings only      |
| Persistence     | Optional (RDB snapshots, AOF)             | None (pure cache) |
| Replication     | Built-in master-replica                   | None              |
| Use case        | Cache + data store                        | Pure caching      |
| Threading       | Single-threaded (mostly)                  | Multi-threaded    |

Use Redis unless you have a specific reason not to. It does everything Memcached does and more.

Cache invalidation strategies:

Let's be honest — cache invalidation is one of the two hard problems in computer science (along with naming things and off-by-one errors). Here are the main approaches:

  • TTL (Time-to-Live) — cache entries expire after a set duration. Simple but you'll serve stale data until expiry. Good enough for many use cases.
  • Write-through — every write goes to both cache and database. Cache is always consistent. Slower writes since you're writing to two places.
  • Write-behind (write-back) — write to cache immediately, asynchronously flush to database. Fast writes but risk of data loss if the cache crashes before flushing.
  • Cache-aside (lazy loading) — application reads from cache first. On miss, reads from DB and populates cache. Most common pattern. Stale data possible but manageable with TTL.
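Since cache-aside is the pattern you'll most likely draw on the whiteboard, here is a minimal sketch of it with TTL-based expiry. The dicts stand in for Redis and the database, and the key and function names are invented for the example; the shape of the logic is what matters.

```python
import time

db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache = {}                         # key -> (expires_at, value); stand-in for Redis
TTL_SECONDS = 60

def get_user(key: str) -> dict:
    """Cache-aside read: try the cache, fall back to the DB, populate on miss."""
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                           # cache hit
    value = db[key]                               # cache miss: read from DB
    cache[key] = (time.time() + TTL_SECONDS, value)
    return value

def update_user(key: str, value: dict) -> None:
    """On write, update the DB and invalidate the cached copy.
    The next read repopulates — simpler and safer than updating both."""
    db[key] = value
    cache.pop(key, None)
```

Invalidate-on-write (rather than write-through to the cache) keeps the cache from holding a value that lost a race with a concurrent DB write.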

Databases

The SQL vs NoSQL decision comes up in every single system design interview. Here's the honest framework — it's not about which is "better."

When to use SQL (PostgreSQL, MySQL):
  • You need ACID transactions (banking, e-commerce orders)
  • Your data has complex relationships (joins across tables)
  • Your schema is well-defined and unlikely to change drastically
  • You need strong consistency guarantees
When to use NoSQL:
  • Document stores (MongoDB, DynamoDB) — flexible schema, denormalized data, when your access patterns are known upfront
  • Key-value stores (Redis, DynamoDB) — simple lookups by key, session storage, caching
  • Wide-column stores (Cassandra, HBase) — time-series data, write-heavy workloads, massive scale
  • Graph databases (Neo4j) — social networks, recommendation engines, anything with complex relationship traversals
Scaling relational databases:
  • Read replicas — write to the primary, read from replicas. Easy win for read-heavy workloads. Introduces replication lag (eventual consistency on reads).
  • Sharding — split data across multiple databases by some key. Horizontal partitioning. Hard to do well — cross-shard queries are painful, and resharding is expensive.
  • Vertical partitioning — split different tables into different databases. User data on one cluster, product data on another.
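Hash-based sharding, the most common starting point, can be sketched like this (shard count and names are illustrative). It also makes the resharding pain concrete: the modulo depends on `NUM_SHARDS`, so changing the shard count remaps almost every key — which is exactly why consistent hashing or directory-based sharding shows up in real systems.

```python
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # stand-ins for 4 databases

def shard_for(user_id: str) -> int:
    # Hash the shard key so users spread evenly across shards
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def save(user_id: str, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row

def load(user_id: str) -> dict:
    return shards[shard_for(user_id)][user_id]
```

Note that a query like "all users who signed up this week" now has to fan out to every shard — the cross-shard pain mentioned above.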

Message Queues

When you need to decouple services, smooth out traffic spikes, or do work asynchronously, you reach for a message queue.

Kafka vs RabbitMQ vs SQS:

| Feature    | Kafka                              | RabbitMQ                         | SQS                          |
| ---------- | ---------------------------------- | -------------------------------- | ---------------------------- |
| Model      | Distributed log                    | Message broker                   | Managed queue                |
| Ordering   | Per partition                      | Per queue                        | Best effort (FIFO available) |
| Throughput | Extremely high (millions/sec)      | Moderate (tens of thousands/sec) | Moderate                     |
| Retention  | Configurable (days/weeks)          | Until consumed                   | Up to 14 days                |
| Best for   | Event streaming, data pipelines    | Task queues, RPC                 | Simple async tasks on AWS    |
| Complexity | High (ZooKeeper/KRaft, partitions) | Moderate                         | Low (fully managed)          |
When to use each:
  • Kafka — when you need event replay, multiple consumers reading the same stream, or extremely high throughput. Analytics pipelines, activity feeds, CDC (Change Data Capture).
  • RabbitMQ — when you need flexible routing, priority queues, or request-reply patterns. Background job processing, microservice communication.
  • SQS — when you're on AWS and want zero operational overhead. Simple async task processing.
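The decoupling all three provide boils down to the same producer/consumer shape. Here's a toy sketch using an in-process queue and a worker thread — the job payloads and names are invented, and a real broker adds durability and delivery guarantees on top, but the buffering behavior is the same: producers return immediately while consumers drain at their own pace.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()  # stand-in for the broker
sent = []

def producer():
    # The web tier enqueues work and returns to the client immediately
    for i in range(5):
        jobs.put({"email_to": f"user{i}@example.com"})

def worker():
    # A background consumer drains the queue at its own pace;
    # a traffic spike buffers in the queue instead of crushing workers
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down cleanly
            break
        sent.append(job["email_to"])
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
producer()
jobs.put(None)
t.join()
```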

CDNs

A CDN caches your content at edge locations globally. Two deployment models:

  • Push CDN — you explicitly upload content to the CDN. Full control over what's cached. Better when content changes infrequently. More work to manage.
  • Pull CDN — CDN fetches content from your origin on the first request, then caches it. Simpler. Better when you have a lot of content and don't want to manage uploads. Most common.
In interviews, mention CDNs whenever the system serves static content (images, videos, CSS/JS) to geographically distributed users.

API Design

  • REST — resource-oriented, HTTP verbs (GET, POST, PUT, DELETE), stateless. The default choice. Easy to understand, well-tooled, works for most CRUD applications.
  • GraphQL — the client specifies exactly what data it needs. Eliminates over-fetching and under-fetching. Good for complex UIs with varied data requirements (mobile apps needing different data than web). Overhead: query parsing, potential N+1 problems, harder caching.
  • gRPC — binary protocol (Protocol Buffers), strongly typed, bidirectional streaming. Used for internal service-to-service communication where performance matters. Not great for browser clients (needs a proxy).
When to pick what:
  • Public API consumed by third parties → REST
  • Mobile app with complex, varied data needs → GraphQL
  • Internal microservice communication → gRPC
  • Simple CRUD application → REST

CAP Theorem — Explained Honestly

CAP theorem says a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance.

Here's where most explanations go wrong: they say "pick 2 of 3." That's misleading. In any distributed system, network partitions will happen. You don't get to opt out of P. So the real choice is:

When a network partition occurs, do you prioritize:
  • Consistency (CP) — reject requests rather than return stale data. The system is unavailable during the partition but never returns wrong answers. Example: ZooKeeper, HBase.
  • Availability (AP) — keep serving requests even during a partition, accepting that some responses might be stale. Example: Cassandra, DynamoDB.
When there's no partition (normal operation), you can have both C and A. The tradeoff only kicks in during failures.

Consistency models worth knowing:
  • Strong consistency — every read sees the most recent write. Expensive but simple to reason about.
  • Eventual consistency — given enough time with no new writes, all replicas converge. The default for most distributed systems. Good enough for most use cases.
  • Causal consistency — if operation A caused operation B, everyone sees A before B. Preserves cause-and-effect ordering without the cost of strong consistency.
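One mechanism behind these tradeoffs is quorum reads and writes: with N replicas, a write quorum W, and a read quorum R, choosing R + W > N forces every read quorum to overlap every write quorum, so a read always sees at least one replica with the latest write. A toy sketch of that arithmetic (the replica layout and version scheme are invented for illustration):

```python
N, W, R = 3, 2, 2   # replicas, write quorum, read quorum: R + W > N
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    # Succeeds once W replicas acknowledge; the last replica
    # misses this write, simulating a lagging node
    for replica in replicas[:W]:
        replica.update(version=version, value=value)

def read():
    # Query the LAST R replicas — deliberately including the stale one.
    # Because R + W > N, the quorums must overlap, so the highest
    # version among the responses is the latest write.
    responses = replicas[N - R:]
    return max(responses, key=lambda r: r["version"])["value"]

write("v1", version=1)
assert read() == "v1"   # fresh value wins despite the stale replica
```

Drop R to 1 and reads get faster but can return stale data — which is exactly the Cassandra/DynamoDB-style AP lever in table form.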

The Cheat Sheet

| Component                 | When to Use                          | Popular Choices                |
| ------------------------- | ------------------------------------ | ------------------------------ |
| Load Balancer             | Multiple servers, horizontal scaling | NGINX, AWS ALB/NLB, HAProxy    |
| Cache (application)       | Read-heavy, expensive queries        | Redis, Memcached               |
| CDN                       | Static content, global users         | CloudFront, Cloudflare, Akamai |
| SQL Database              | ACID, complex joins, structured data | PostgreSQL, MySQL              |
| Document Store            | Flexible schema, denormalized data   | MongoDB, DynamoDB              |
| Key-Value Store           | Simple lookups, sessions, cache      | Redis, DynamoDB                |
| Wide-Column Store         | Time-series, high write throughput   | Cassandra, HBase               |
| Message Queue (streaming) | Event replay, high throughput        | Kafka                          |
| Message Queue (task)      | Background jobs, async processing    | RabbitMQ, SQS                  |
| Search Engine             | Full-text search, filtering          | Elasticsearch, OpenSearch      |
| Object Storage            | Large files, media                   | S3, GCS, Azure Blob            |
| API Gateway               | Rate limiting, auth, routing         | Kong, AWS API Gateway          |

Putting It Together

The point isn't to memorize this table. It's to understand why each component exists and when to reach for it. In an interview, you'll assemble these pieces based on the specific requirements you've clarified.

Start with the simplest architecture that meets the requirements. Add components only when you can articulate why they're needed. "I'm adding Redis here because our read-to-write ratio is 50:1 and the same posts are viewed thousands of times" is infinitely better than "I'm adding Redis because all systems need caching."

That's the difference between someone who's memorized building blocks and someone who understands them.

Practice assembling these fundamentals into complete designs on CodeUp — start with classic problems and work up to novel ones. The building blocks don't change; your speed at combining them does.
