SQL Subqueries and CTEs: When One Query Isn't Enough
Scalar subqueries, correlated subqueries, CTEs with WITH, recursive CTEs for hierarchies, and when to use which. Readable SQL beats clever SQL.
Most real SQL questions can't be answered in a single pass. "Show me employees who earn more than their department average" requires computing the average and comparing each row to it. You need to compose queries together, and SQL gives you three ways to do it: subqueries, CTEs, and temp tables.
Subqueries
A subquery is a query nested inside another query. They come in a few flavors.
Scalar Subquery (Returns One Value)
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
The inner query returns a single number. The outer query uses it as a filter. Simple enough. This works because the subquery produces exactly one row, one column.
Column Subquery (Returns a List)
SELECT name, department_id
FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE location = 'Mumbai'
);
The inner query returns a list of IDs. IN checks membership. You could rewrite this as a JOIN, and sometimes that's faster -- but the subquery version often reads more clearly for simple lookups.
Table Subquery (Returns Rows and Columns)
SELECT d.name, stats.avg_salary
FROM departments d
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) stats ON d.id = stats.department_id;
The subquery in the FROM clause acts like a temporary table (sometimes called a "derived table"). You aggregate first, then join. This is how you avoid the classic mistake of aggregating after a join and getting inflated numbers.
Correlated Subquery (The Tricky One)
A correlated subquery references the outer query's row. It re-executes for every row in the outer query.
SELECT e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id
);
This finds employees who earn above their own department's average. The inner query runs once per employee, using that employee's department_id. It works, but it's slow on large tables because of the repeated execution.
SELECT d.name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e
WHERE e.department_id = d.id AND e.salary > 100000
);
"Departments that have at least one high earner." EXISTS short-circuits -- it stops as soon as it finds one matching row, so it's often faster than IN for large datasets.
CTEs: Readable SQL
CTEs (Common Table Expressions) use the WITH clause to name a subquery and reference it like a table.
WITH dept_avg AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, da.avg_salary
FROM employees e
JOIN dept_avg da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
This does the same thing as the correlated subquery above, but it's much easier to read. The CTE computes department averages once, gives the result a name, and the main query joins against it.
You can chain multiple CTEs:
WITH
active_users AS (
SELECT * FROM users WHERE last_login > '2026-01-01'
),
user_orders AS (
SELECT user_id, COUNT(*) AS order_count, SUM(total) AS total_spent
FROM orders
WHERE user_id IN (SELECT id FROM active_users)
GROUP BY user_id
)
SELECT au.name, au.email, uo.order_count, uo.total_spent
FROM active_users au
JOIN user_orders uo ON au.id = uo.user_id
ORDER BY uo.total_spent DESC;
Each CTE builds on the previous one. The logic flows top to bottom. Try writing this as nested subqueries and you'll appreciate what CTEs do for readability.
Recursive CTEs: Walking Hierarchies
This is where CTEs get genuinely powerful. Recursive CTEs can traverse tree structures -- org charts, category trees, threaded comments.
WITH RECURSIVE org_tree AS (
-- Base case: the CEO (no manager)
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees who report to someone in the tree
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT name, depth
FROM org_tree
ORDER BY depth, name;
| name | depth |
|--------------|-------|
| Ravi Kumar | 0 |
| Anita Desai | 1 |
| Mark Torres | 1 |
| Priya Shah | 2 |
| Sam Lee | 2 |
| Wei Chen | 2 |
The base case seeds the result with the root node(s). The recursive case joins back to the CTE's own output to find the next level. It repeats until no new rows are produced.
Generating a date sequence:WITH RECURSIVE dates AS (
SELECT DATE '2026-01-01' AS d
UNION ALL
SELECT d + INTERVAL '1 day'
FROM dates
WHERE d < DATE '2026-03-31'
)
SELECT d FROM dates;
Handy for reports that need a row for every date, even dates with no data. Left join your actual data onto this sequence and NULLs show you the gaps.
CTE vs Subquery vs Temp Table
Use a CTE when:- Readability is the priority (it usually is)
- You reference the result once or twice
- You need recursion
- It's a simple one-liner that doesn't hurt readability
- Scalar or
EXISTSchecks where a CTE feels like overkill
- The intermediate result is large and referenced many times
- You need to index the intermediate result
- You're in a stored procedure or multi-step script
MATERIALIZED. MySQL and SQL Server have always inlined CTEs. Point being: check your database version, but in modern databases the performance difference is usually negligible.
Readability Matters More Than Cleverness
You'll encounter SQL written as a single massive query with four levels of nested subqueries, and technically it works. But the person who has to debug it at 2 AM (probably future you) will curse the day it was written.
Break complex queries into named steps with CTEs. Each CTE should do one thing. Give them descriptive names. Future-you will be grateful.
-- This tells a story you can follow:
WITH overdue_invoices AS (...),
customer_contacts AS (...),
escalation_candidates AS (...)
SELECT ... FROM escalation_candidates ...
-- This does not:
SELECT FROM (SELECT FROM (SELECT * FROM invoices WHERE ...) x JOIN (SELECT ...) y ON ...) z WHERE ...
Practice on Real Problems
SQL composition -- knowing when to use a subquery vs a CTE, when to aggregate before joining, how to structure a recursive query -- is a skill that comes from repetition. CodeUp has SQL challenges that progressively build from basic subqueries to recursive CTEs. Write the queries, check the results, and refactor until they read cleanly. That's the whole process.