Database Design for Developers — Tables, Relations, and Normalization
Practical database design: schema planning, normalization forms, one-to-many and many-to-many relations, indexing strategy, and common anti-patterns.
Most developers learn SQL syntax — SELECT, INSERT, JOIN — but skip database design. Then they end up with a users table that has 47 columns, a data column storing JSON blobs, and queries that take 30 seconds because there are no indexes.
Good database design isn't about memorizing normal forms. It's about understanding how data relates, how it'll be queried, and what happens when requirements change. This guide covers the practical decisions you'll face when designing schemas for real applications.
Start With the Entities, Not the Tables
Before writing any CREATE TABLE statement, list your entities and their relationships. For an e-commerce app:
- User — has many orders, has one cart, has many addresses
- Product — belongs to many categories, has many reviews
- Order — belongs to one user, has many order items
- OrderItem — belongs to one order, references one product
- Category — has many products
- Review — belongs to one user and one product
- Address — belongs to one user
The Three Relationship Types
Every relationship between entities is one of three types:
One-to-Many (Most Common)
A user has many orders. An order belongs to one user. The "many" side gets the foreign key:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
total DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
The foreign key user_id on orders enforces the relationship. The database guarantees that every order points to a valid user.
Many-to-Many
A product can belong to many categories. A category can contain many products. This requires a junction table (also called join table or bridge table):
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10, 2) NOT NULL,
description TEXT
);
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
slug VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE product_categories (
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
category_id INTEGER REFERENCES categories(id) ON DELETE CASCADE,
PRIMARY KEY (product_id, category_id)
);
The junction table has two foreign keys forming a composite primary key. This prevents duplicate entries (a product can't be in the same category twice) and automatically cleans up when either side is deleted.
One-to-One
A user has one profile. This is less common and usually means the data could be in the same table, but you've split it for performance or organizational reasons:
CREATE TABLE user_profiles (
user_id INTEGER PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
bio TEXT,
avatar VARCHAR(500),
website VARCHAR(500),
location VARCHAR(100)
);
The user_id is both the primary key and the foreign key. This guarantees one profile per user.
Normalization — What It Actually Means
Normalization is about eliminating data duplication. There are formal "normal forms," but in practice, you need to understand three:
First Normal Form (1NF): No Repeated Groups
Each column contains a single value. No arrays, no comma-separated lists:
-- BAD: Violates 1NF
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
phone_numbers VARCHAR(500) -- "555-1234, 555-5678, 555-9012"
);
-- GOOD: Separate table for repeating data
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE phone_numbers (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
number VARCHAR(20) NOT NULL,
type VARCHAR(10) DEFAULT 'mobile' -- mobile, home, work
);
Second Normal Form (2NF): No Partial Dependencies
Every non-key column depends on the entire primary key, not just part of it. This mainly applies to tables with composite primary keys:
-- BAD: product_name depends only on product_id, not the full key
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER,
product_name VARCHAR(200), -- This depends only on product_id
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
-- GOOD: Product name lives in the products table
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL, -- Snapshot of price at time of order
PRIMARY KEY (order_id, product_id)
);
Notice unit_price in the order items table. This is intentional duplication — you want to record what the customer paid, not the current product price. If the product price changes later, the order history should still be accurate.
Third Normal Form (3NF): No Transitive Dependencies
Every non-key column depends directly on the primary key, not on another non-key column:
-- BAD: city and state depend on zip_code, not on the user
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
zip_code VARCHAR(10),
city VARCHAR(100), -- Determined by zip_code
state VARCHAR(2) -- Determined by zip_code
);
-- GOOD: Separate locations table (if you need it)
-- OR: Just keep city/state on users (pragmatic denormalization)
Here's where theory and practice diverge. Strict 3NF would have you create a zip_codes table. In practice, nobody does this for a web app. A little denormalization is fine when it simplifies queries and the data rarely changes.
Practical Schema Design Patterns
Soft Deletes
Instead of actually deleting rows, mark them as deleted:
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
content TEXT NOT NULL,
author_id INTEGER REFERENCES users(id),
deleted_at TIMESTAMP NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Query only active posts
SELECT * FROM posts WHERE deleted_at IS NULL;
Enums vs. Lookup Tables
For a small, fixed set of values (order status, user role), you have two options:
-- Option A: PostgreSQL ENUM
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'delivered', 'cancelled');
-- Option B: Lookup table
CREATE TABLE order_statuses (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL
);
Use ENUMs for values that almost never change. Use lookup tables when you need metadata on the status (display name, color, sort order) or when the values change often.
Audit Trail / History Tables
Track who changed what and when:
CREATE TABLE product_history (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id),
field_name VARCHAR(100) NOT NULL,
old_value TEXT,
new_value TEXT,
changed_by INTEGER REFERENCES users(id),
changed_at TIMESTAMP DEFAULT NOW()
);
Polymorphic Associations
When different entity types can have the same kind of child (comments on posts, comments on products, comments on orders):
-- Option A: Separate foreign keys (simple, strict)
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
body TEXT NOT NULL,
author_id INTEGER REFERENCES users(id),
post_id INTEGER REFERENCES posts(id),
product_id INTEGER REFERENCES products(id),
-- Only one of post_id/product_id should be set
CHECK (
(post_id IS NOT NULL)::int +
(product_id IS NOT NULL)::int = 1
)
);
-- Option B: Polymorphic columns (flexible, no FK enforcement)
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
body TEXT NOT NULL,
author_id INTEGER REFERENCES users(id),
commentable_type VARCHAR(50) NOT NULL, -- 'post', 'product', 'order'
commentable_id INTEGER NOT NULL
);
Option A is safer (foreign key constraints work). Option B is more flexible but loses referential integrity. Pick based on how many types you'll have.
Indexing Strategy
Indexes make reads faster and writes slower. You don't need to index everything — just the columns you query frequently.
What to Index
| Always index | Sometimes index | Don't bother |
|---|---|---|
| Foreign keys | Columns in WHERE clauses | Columns only in SELECT |
| Columns in UNIQUE constraints | Columns in ORDER BY | Boolean columns with 50/50 distribution |
| Primary keys (automatic) | Columns in JOIN conditions | Tables with fewer than 1,000 rows |
-- Foreign key indexes (PostgreSQL doesn't auto-create these)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
-- Composite index for common query patterns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index — only index what you query
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
Checking If Your Indexes Work
Use EXPLAIN ANALYZE:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;
If you see Seq Scan on a large table, you're missing an index. You want to see Index Scan or Index Only Scan.
Common Anti-Patterns
The God Table. One table with 40+ columns trying to store everything. Split entities into separate tables with foreign key relationships. JSON Columns for Structured Data. PostgreSQL JSON columns are great for truly unstructured data. But if you're querying specific fields inside the JSON, those fields should be regular columns with indexes. No Foreign Keys. "We enforce relationships in application code." Until a bug creates orphaned records, and now you have orders pointing to deleted users. Use foreign keys. The database is the last line of defense. VARCHAR(255) For Everything. 255 isn't a magic number. Think about what the column stores. Email addresses cap at 254 characters (RFC 5321). Country codes are 2-3 characters. URLs can be 2,083 characters. Choose appropriate limits. Not Using TIMESTAMP WITH TIME ZONE. Always useTIMESTAMPTZ in PostgreSQL. Plain TIMESTAMP doesn't store timezone info, which leads to subtle bugs when your users or servers are in different timezones.
Database design is the foundation everything else builds on. A bad schema makes everything harder — queries, performance, maintenance, feature additions. Spend the time up front to get it right. You can practice schema design patterns through interactive exercises on CodeUp.