SQLAlchemy: Python's Database Toolkit That Scales with You
Learn SQLAlchemy from scratch -- Core vs ORM, models, CRUD, relationships, queries, Alembic migrations, and async patterns for production Python apps.
Most Python developers start with raw SQL strings. You write cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,)), it works, and life is good. Then the project grows. You need joins across five tables, migrations when schemas change, and suddenly those string queries are a maintenance nightmare.
SQLAlchemy fixes this. It's not just an ORM -- it's a complete database toolkit that gives you exactly as much abstraction as you want. You can write raw SQL through it, use the expression language for composable queries, or go full ORM with mapped classes. Most serious Python projects land on SQLAlchemy eventually, and for good reason.
Core vs ORM: Two Libraries in One
SQLAlchemy has two distinct layers, and understanding this split saves you a lot of confusion:
SQLAlchemy Core is a SQL expression language. You write Python code that generates SQL. No classes mapping to tables, no identity maps, no sessions. Just a Pythonic way to construct and execute SQL statements.
SQLAlchemy ORM builds on top of Core. It maps Python classes to database tables, tracks changes to objects, handles relationships, and manages a "unit of work" that batches your changes into efficient SQL.
You don't have to pick one. Many projects use the ORM for most operations and drop down to Core for complex reporting queries or bulk operations where ORM overhead isn't worth it.
Setting Up: Engine and Connection
Everything starts with an Engine. The engine manages the connection pool and serves as the source for all database interactions.
from sqlalchemy import create_engine
# SQLite (great for development)
engine = create_engine("sqlite:///app.db", echo=True)
# PostgreSQL
engine = create_engine(
"postgresql://user:password@localhost:5432/mydb",
pool_size=10,
max_overflow=20,
)
# MySQL
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
The echo=True flag logs all generated SQL to stdout -- invaluable during development, terrible in production. The connection URL follows the format dialect+driver://user:password@host:port/database.
Notice you never open or close connections manually. The engine handles connection pooling automatically. When a query finishes, the connection goes back to the pool instead of being destroyed.
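You can watch the pool at work by borrowing a connection explicitly. This sketch runs a raw SQL statement through text() against an in-memory SQLite database; when the with block exits, the connection goes back to the pool rather than being closed:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

# Borrow a connection from the pool, run raw SQL, return it on exit
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1 + 1 AS total"))
    total = result.scalar_one()
    print(total)  # 2
```

This is also the escape hatch for one-off raw SQL: text() lets you run hand-written statements through the same engine and pool the rest of your code uses.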
Defining Models with the ORM
The modern way to define models uses the DeclarativeBase and Mapped types (SQLAlchemy 2.0 style):
from datetime import datetime
from sqlalchemy import String, Text, ForeignKey, func
from sqlalchemy.orm import (
DeclarativeBase,
Mapped,
mapped_column,
relationship,
)
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str] = mapped_column(String(50), unique=True)
email: Mapped[str] = mapped_column(String(120), unique=True)
bio: Mapped[str | None] = mapped_column(Text, default=None)
created_at: Mapped[datetime] = mapped_column(
server_default=func.now()
)
# Relationship to posts
posts: Mapped[list["Post"]] = relationship(back_populates="author")
def __repr__(self) -> str:
return f"<User(id={self.id}, username='{self.username}')>"
class Post(Base):
__tablename__ = "posts"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(String(200))
body: Mapped[str] = mapped_column(Text)
author_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
author: Mapped["User"] = relationship(back_populates="posts")
def __repr__(self) -> str:
return f"<Post(id={self.id}, title='{self.title}')>"
A few things to notice:
- Mapped[str] means the column is NOT NULL. Mapped[str | None] makes it nullable.
- mapped_column() replaces the old Column() syntax. It integrates with type checkers.
- relationship() defines the Python-side link between objects. back_populates keeps both sides in sync.
- ForeignKey("users.id") is the actual database constraint.
Base.metadata.create_all(engine)
This checks which tables already exist and issues CREATE TABLE for every missing one; models that extend Base and already have a table are left untouched. Fine for development. For production, you want Alembic (covered later).
Sessions and CRUD Operations
The Session is where all ORM magic happens. It tracks which objects you've added, modified, or deleted, and flushes those changes to the database when you commit.
from sqlalchemy.orm import Session
# Create a session
with Session(engine) as session:
# CREATE
user = User(username="alice", email="alice@example.com")
session.add(user)
session.commit()
# The user now has an id assigned by the database
print(user.id) # 1
Using with ensures the session closes properly even if an exception occurs. Here's the full CRUD:
with Session(engine) as session:
# CREATE -- add multiple
users = [
User(username="bob", email="bob@example.com"),
User(username="charlie", email="charlie@example.com"),
]
session.add_all(users)
session.commit()
# READ -- get by primary key
user = session.get(User, 1) # SELECT * FROM users WHERE id = 1
print(user.username) # "alice"
# UPDATE
user.email = "newalice@example.com"
session.commit() # SQLAlchemy tracks the change automatically
# DELETE
session.delete(user)
session.commit()
The session keeps every loaded object in an "identity map" (one Python object per database row) and instruments mapped attributes to track changes. When you modify user.email, SQLAlchemy knows the object is dirty. On commit(), it generates the appropriate UPDATE statement. You never write the SQL yourself.
Querying and Filtering
SQLAlchemy 2.0 uses select() statements instead of the old session.query() pattern:
from sqlalchemy import select
with Session(engine) as session:
# Get all users
stmt = select(User)
users = session.scalars(stmt).all()
# Filter with where()
stmt = select(User).where(User.username == "alice")
alice = session.scalars(stmt).first()
# Multiple conditions (AND)
stmt = select(User).where(
User.email.like("%@example.com"),
User.created_at >= datetime(2026, 1, 1),
)
results = session.scalars(stmt).all()
# OR conditions
from sqlalchemy import or_
stmt = select(User).where(
or_(User.username == "alice", User.username == "bob")
)
# Ordering and limiting
stmt = (
select(User)
.order_by(User.created_at.desc())
.limit(10)
.offset(20)
)
# Count
from sqlalchemy import func
stmt = select(func.count()).select_from(User)
count = session.scalar(stmt)
The key difference from raw SQL: these queries are composable. You can build a base query and add filters conditionally:
def get_users(
session: Session,
username: str | None = None,
email_domain: str | None = None,
limit: int = 50,
) -> list[User]:
stmt = select(User)
if username:
stmt = stmt.where(User.username.ilike(f"%{username}%"))
if email_domain:
stmt = stmt.where(User.email.like(f"%@{email_domain}"))
stmt = stmt.limit(limit)
return session.scalars(stmt).all()
Try doing that cleanly with string concatenation.
Relationships: One-to-Many and Many-to-Many
We already saw one-to-many (User has many Posts). Let's add many-to-many for tags:
from sqlalchemy import Table, Column, Integer, ForeignKey
# Association table -- no mapped class needed
post_tags = Table(
"post_tags",
Base.metadata,
Column("post_id", Integer, ForeignKey("posts.id"), primary_key=True),
Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
)
class Tag(Base):
__tablename__ = "tags"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(50), unique=True)
posts: Mapped[list["Post"]] = relationship(
secondary=post_tags, back_populates="tags"
)
Then update the Post model:
class Post(Base):
__tablename__ = "posts"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(String(200))
body: Mapped[str] = mapped_column(Text)
author_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
author: Mapped["User"] = relationship(back_populates="posts")
tags: Mapped[list["Tag"]] = relationship(
secondary=post_tags, back_populates="posts"
)
Using it:
with Session(engine) as session:
python_tag = Tag(name="python")
tutorial_tag = Tag(name="tutorial")
post = Post(
title="SQLAlchemy Guide",
body="...",
author_id=1,
tags=[python_tag, tutorial_tag],
)
session.add(post)
session.commit()
# Access tags through the relationship
for tag in post.tags:
print(tag.name)
# Find all posts with a specific tag
stmt = select(Post).where(Post.tags.any(Tag.name == "python"))
python_posts = session.scalars(stmt).all()
The secondary parameter tells SQLAlchemy about the association table. It handles the INSERT and DELETE on post_tags automatically when you modify the tags list.
Eager Loading: Avoiding the N+1 Problem
This is the most common performance trap with any ORM:
# BAD -- N+1 queries
with Session(engine) as session:
users = session.scalars(select(User)).all()
for user in users:
print(user.posts) # Each access fires a separate SELECT
If you have 100 users, that's 101 queries (1 for users + 100 for posts). Fix it with eager loading:
from sqlalchemy.orm import joinedload, selectinload
# joinedload -- single query with JOIN
stmt = select(User).options(joinedload(User.posts))
users = session.scalars(stmt).unique().all()
# selectinload -- two queries (usually better for collections)
stmt = select(User).options(selectinload(User.posts))
users = session.scalars(stmt).all()
joinedload does a SQL JOIN, which can produce duplicate rows for collections (hence .unique()). selectinload fires a second query with WHERE id IN (...), which is usually more efficient for one-to-many relationships.
Migrations with Alembic
create_all() is fine for prototyping, but it can't modify existing tables. Alembic is the migration tool built for SQLAlchemy.
pip install alembic
alembic init alembic
This creates an alembic/ directory and alembic.ini. Configure the database URL in alembic.ini:
sqlalchemy.url = postgresql://user:password@localhost/mydb
Point Alembic at your models in alembic/env.py:
from myapp.models import Base
target_metadata = Base.metadata
Now generate and run migrations:
# Auto-generate a migration by comparing models to the database
alembic revision --autogenerate -m "add bio column to users"
# Apply the migration
alembic upgrade head
# Rollback one step
alembic downgrade -1
# See current version
alembic current
The auto-generated migration looks like this:
def upgrade() -> None:
op.add_column("users", sa.Column("bio", sa.Text(), nullable=True))
def downgrade() -> None:
op.drop_column("users", "bio")
Always review auto-generated migrations. Alembic can't detect everything (renamed columns look like a drop + add, for example).
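For example, if you rename users.username to users.handle (an illustrative rename, not from this article's schema), autogenerate proposes dropping one column and adding another -- which destroys the data in it. Hand-edit the revision into an in-place rename with op.alter_column:

```python
# Inside an Alembic revision file -- `op` is provided by the migration context
def upgrade() -> None:
    # Autogenerate proposed drop_column + add_column; replace with a rename
    op.alter_column("users", "username", new_column_name="handle")

def downgrade() -> None:
    op.alter_column("users", "handle", new_column_name="username")
```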
Async SQLAlchemy
For async frameworks like FastAPI, SQLAlchemy supports async sessions:
from sqlalchemy.ext.asyncio import (
create_async_engine,
async_sessionmaker,
AsyncSession,
)
# Note the async driver (asyncpg for PostgreSQL)
engine = create_async_engine(
"postgresql+asyncpg://user:password@localhost/mydb"
)
async_session = async_sessionmaker(engine, class_=AsyncSession)
async def get_user(user_id: int) -> User | None:
async with async_session() as session:
stmt = select(User).where(User.id == user_id)
result = await session.execute(stmt)
return result.scalar_one_or_none()
async def create_user(username: str, email: str) -> User:
async with async_session() as session:
user = User(username=username, email=email)
session.add(user)
await session.commit()
await session.refresh(user)
return user
The API is almost identical to sync SQLAlchemy. The main differences:
- Use create_async_engine instead of create_engine
- Use AsyncSession instead of Session
- Await session.execute(), session.commit(), session.refresh()
- Relationship lazy loading doesn't work in async (you must use eager loading)
user.posts without eager loading raises an error instead of silently firing a query. This is actually a good thing -- it forces you to be explicit about your data loading.
Common Patterns for Production
Session dependency injection (FastAPI example):
from fastapi import Depends, HTTPException
async def get_session():
async with async_session() as session:
yield session
@app.get("/users/{user_id}")
async def read_user(
user_id: int,
session: AsyncSession = Depends(get_session),
):
user = await session.get(User, user_id)
if not user:
raise HTTPException(status_code=404)
return user
Soft deletes:
class SoftDeleteMixin:
deleted_at: Mapped[datetime | None] = mapped_column(default=None)
@property
def is_deleted(self) -> bool:
return self.deleted_at is not None
class User(SoftDeleteMixin, Base):
__tablename__ = "users"
# ...
Timestamps mixin:
class TimestampMixin:
created_at: Mapped[datetime] = mapped_column(
server_default=func.now()
)
updated_at: Mapped[datetime] = mapped_column(
server_default=func.now(),
onupdate=func.now(),
)
Bulk operations (bypass ORM overhead):
from sqlalchemy import insert, update
# Bulk insert -- much faster than add() in a loop
with Session(engine) as session:
session.execute(
insert(User),
[
{"username": f"user_{i}", "email": f"user_{i}@example.com"}
for i in range(10000)
],
)
session.commit()
# Bulk update
with Session(engine) as session:
session.execute(
update(User)
.where(User.created_at < datetime(2025, 1, 1))
.values(bio="Legacy user")
)
session.commit()
Common Mistakes
Forgetting to commit. The session batches changes. If you add() without commit(), nothing is saved. If an exception occurs, changes are rolled back when the session closes.
Not handling IntegrityError. Unique constraints, foreign keys -- violations throw IntegrityError. Catch it:
from sqlalchemy.exc import IntegrityError
try:
session.add(User(username="alice", email="alice@example.com"))
session.commit()
except IntegrityError:
session.rollback()
# Handle duplicate
Using the session after an error without rollback. After an exception, the session is in a broken state. You must call session.rollback() before using it again.
Lazy loading in async. As mentioned, it doesn't work. Use selectinload or joinedload in every async query that touches relationships.
Not indexing foreign keys. Most databases don't index foreign key columns automatically (MySQL's InnoDB does; PostgreSQL and SQLite don't), and SQLAlchemy won't add one for you either. Add index=True:
author_id: Mapped[int] = mapped_column(
ForeignKey("users.id"), index=True
)
What's Next
SQLAlchemy has depth. Once you're comfortable with the basics, explore:
- Hybrid properties for computed attributes that work both in Python and SQL
- Events for hooking into the ORM lifecycle (before insert, after update, etc.)
- Custom types for mapping database columns to complex Python objects
- Partitioning strategies for multi-tenant applications
- Connection pooling tuning for high-traffic services
If you're building Python projects and want more practical guides like this, check out CodeUp for tutorials that skip the fluff and get to the code.