March 26, 20268 min read

Database Design for Developers — Tables, Relations, and Normalization

Practical database design: schema planning, normalization forms, one-to-many and many-to-many relations, indexing strategy, and common anti-patterns.

database design normalization sql schema

Most developers learn SQL syntax — SELECT, INSERT, JOIN — but skip database design. Then they end up with a users table that has 47 columns, a data column storing JSON blobs, and queries that take 30 seconds because there are no indexes.

Good database design isn't about memorizing normal forms. It's about understanding how data relates, how it'll be queried, and what happens when requirements change. This guide covers the practical decisions you'll face when designing schemas for real applications.

Start With the Entities, Not the Tables

Before writing any CREATE TABLE statement, list your entities and their relationships. For an e-commerce app:

User — has many orders, has one cart, has many addresses
Product — belongs to many categories, has many reviews
Order — belongs to one user, has many order items
OrderItem — belongs to one order, references one product
Category — has many products
Review — belongs to one user and one product
Address — belongs to one user

Write this down in plain English before touching SQL. If you can't describe the relationships in words, you can't model them in tables.

The Three Relationship Types

Every relationship between entities is one of three types:

One-to-Many (Most Common)

A user has many orders. An order belongs to one user. The "many" side gets the foreign key:

CREATE TABLE users (
  id         SERIAL PRIMARY KEY,
  email      VARCHAR(255) UNIQUE NOT NULL,
  name       VARCHAR(100) NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INTEGER NOT NULL REFERENCES users(id),
  total      DECIMAL(10, 2) NOT NULL,
  status     VARCHAR(20) DEFAULT 'pending',
  created_at TIMESTAMP DEFAULT NOW()
);

The foreign key user_id on orders enforces the relationship. The database guarantees that every order points to a valid user.

Many-to-Many

A product can belong to many categories. A category can contain many products. This requires a junction table (also called join table or bridge table):

CREATE TABLE products (
  id          SERIAL PRIMARY KEY,
  name        VARCHAR(200) NOT NULL,
  price       DECIMAL(10, 2) NOT NULL,
  description TEXT
);

CREATE TABLE categories (
  id   SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  slug VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE product_categories (
  product_id  INTEGER REFERENCES products(id) ON DELETE CASCADE,
  category_id INTEGER REFERENCES categories(id) ON DELETE CASCADE,
  PRIMARY KEY (product_id, category_id)
);

The junction table has two foreign keys forming a composite primary key. This prevents duplicate entries (a product can't be in the same category twice) and automatically cleans up when either side is deleted.

One-to-One

A user has one profile. This is less common and usually means the data could be in the same table, but you've split it for performance or organizational reasons:

CREATE TABLE user_profiles (
  user_id  INTEGER PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  bio      TEXT,
  avatar   VARCHAR(500),
  website  VARCHAR(500),
  location VARCHAR(100)
);

The user_id is both the primary key and the foreign key. This guarantees one profile per user.

Normalization — What It Actually Means

Normalization is about eliminating data duplication. There are formal "normal forms," but in practice, you need to understand three:

First Normal Form (1NF): No Repeated Groups

Each column contains a single value. No arrays, no comma-separated lists:

-- BAD: Violates 1NF
CREATE TABLE users (
  id    SERIAL PRIMARY KEY,
  name  VARCHAR(100),
  phone_numbers VARCHAR(500)  -- "555-1234, 555-5678, 555-9012"
);

-- GOOD: Separate table for repeating data
CREATE TABLE users (
  id   SERIAL PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE phone_numbers (
  id      SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  number  VARCHAR(20) NOT NULL,
  type    VARCHAR(10) DEFAULT 'mobile'  -- mobile, home, work
);

Second Normal Form (2NF): No Partial Dependencies

Every non-key column depends on the entire primary key, not just part of it. This mainly applies to tables with composite primary keys:

-- BAD: product_name depends only on product_id, not the full key
CREATE TABLE order_items (
  order_id     INTEGER,
  product_id   INTEGER,
  product_name VARCHAR(200),  -- This depends only on product_id
  quantity     INTEGER,
  PRIMARY KEY (order_id, product_id)
);

-- GOOD: Product name lives in the products table
CREATE TABLE order_items (
  order_id   INTEGER REFERENCES orders(id),
  product_id INTEGER REFERENCES products(id),
  quantity   INTEGER NOT NULL,
  unit_price DECIMAL(10, 2) NOT NULL,  -- Snapshot of price at time of order
  PRIMARY KEY (order_id, product_id)
);

Notice unit_price in the order items table. This is intentional duplication — you want to record what the customer paid, not the current product price. If the product price changes later, the order history should still be accurate.

Third Normal Form (3NF): No Transitive Dependencies

Every non-key column depends directly on the primary key, not on another non-key column:

-- BAD: city and state depend on zip_code, not on the user
CREATE TABLE users (
  id        SERIAL PRIMARY KEY,
  name      VARCHAR(100),
  zip_code  VARCHAR(10),
  city      VARCHAR(100),   -- Determined by zip_code
  state     VARCHAR(2)      -- Determined by zip_code
);

-- GOOD: Separate locations table (if you need it)
-- OR: Just keep city/state on users (pragmatic denormalization)

Here's where theory and practice diverge. Strict 3NF would have you create a zip_codes table. In practice, nobody does this for a web app. A little denormalization is fine when it simplifies queries and the data rarely changes.

Practical Schema Design Patterns

Soft Deletes

Instead of actually deleting rows, mark them as deleted:

CREATE TABLE posts (
  id         SERIAL PRIMARY KEY,
  title      VARCHAR(200) NOT NULL,
  content    TEXT NOT NULL,
  author_id  INTEGER REFERENCES users(id),
  deleted_at TIMESTAMP NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Query only active posts
SELECT * FROM posts WHERE deleted_at IS NULL;

Enums vs. Lookup Tables

For a small, fixed set of values (order status, user role), you have two options:

-- Option A: PostgreSQL ENUM
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'delivered', 'cancelled');

-- Option B: Lookup table
CREATE TABLE order_statuses (
  id   SERIAL PRIMARY KEY,
  name VARCHAR(50) UNIQUE NOT NULL
);

Use ENUMs for values that almost never change. Use lookup tables when you need metadata on the status (display name, color, sort order) or when the values change often.

Audit Trail / History Tables

Track who changed what and when:

CREATE TABLE product_history (
  id          SERIAL PRIMARY KEY,
  product_id  INTEGER REFERENCES products(id),
  field_name  VARCHAR(100) NOT NULL,
  old_value   TEXT,
  new_value   TEXT,
  changed_by  INTEGER REFERENCES users(id),
  changed_at  TIMESTAMP DEFAULT NOW()
);

Polymorphic Associations

When different entity types can have the same kind of child (comments on posts, comments on products, comments on orders):

-- Option A: Separate foreign keys (simple, strict)
CREATE TABLE comments (
  id         SERIAL PRIMARY KEY,
  body       TEXT NOT NULL,
  author_id  INTEGER REFERENCES users(id),
  post_id    INTEGER REFERENCES posts(id),
  product_id INTEGER REFERENCES products(id),
  -- Only one of post_id/product_id should be set
  CHECK (
    (post_id IS NOT NULL)::int +
    (product_id IS NOT NULL)::int = 1
  )
);

-- Option B: Polymorphic columns (flexible, no FK enforcement)
CREATE TABLE comments (
  id              SERIAL PRIMARY KEY,
  body            TEXT NOT NULL,
  author_id       INTEGER REFERENCES users(id),
  commentable_type VARCHAR(50) NOT NULL,  -- 'post', 'product', 'order'
  commentable_id   INTEGER NOT NULL
);

Option A is safer (foreign key constraints work). Option B is more flexible but loses referential integrity. Pick based on how many types you'll have.

Indexing Strategy

Indexes make reads faster and writes slower. You don't need to index everything — just the columns you query frequently.

What to Index

Always index	Sometimes index	Don't bother
Foreign keys	Columns in WHERE clauses	Columns only in SELECT
Columns in UNIQUE constraints	Columns in ORDER BY	Boolean columns with 50/50 distribution
Primary keys (automatic)	Columns in JOIN conditions	Tables with fewer than 1,000 rows

-- Foreign key indexes (PostgreSQL doesn't auto-create these)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Composite index for common query patterns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Partial index — only index what you query
CREATE INDEX idx_orders_pending ON orders(created_at)
  WHERE status = 'pending';

Checking If Your Indexes Work

Use EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;

If you see Seq Scan on a large table, you're missing an index. You want to see Index Scan or Index Only Scan.

Common Anti-Patterns

The God Table. One table with 40+ columns trying to store everything. Split entities into separate tables with foreign key relationships. JSON Columns for Structured Data. PostgreSQL JSON columns are great for truly unstructured data. But if you're querying specific fields inside the JSON, those fields should be regular columns with indexes. No Foreign Keys. "We enforce relationships in application code." Until a bug creates orphaned records, and now you have orders pointing to deleted users. Use foreign keys. The database is the last line of defense. VARCHAR(255) For Everything. 255 isn't a magic number. Think about what the column stores. Email addresses cap at 254 characters (RFC 5321). Country codes are 2-3 characters. URLs can be 2,083 characters. Choose appropriate limits. Not Using TIMESTAMP WITH TIME ZONE. Always use TIMESTAMPTZ in PostgreSQL. Plain TIMESTAMP doesn't store timezone info, which leads to subtle bugs when your users or servers are in different timezones.

Database design is the foundation everything else builds on. A bad schema makes everything harder — queries, performance, maintenance, feature additions. Spend the time up front to get it right. You can practice schema design patterns through interactive exercises on CodeUp.