SQL Subquery

In the world of database management, a subquery (also known as an inner query or nested query) is essentially a query trapped inside another SQL statement. Think of it as a way to perform a multi-step calculation: first, you find a specific piece of information, and then you use that information to filter or refine your main result set.

Subqueries are incredibly versatile. They allow you to write dynamic queries where the filtering criteria aren't hard-coded but are instead determined by the data itself at runtime. You will most commonly find them tucked inside SELECT, INSERT, UPDATE, or DELETE statements.

Developer Tip: Think of a subquery as a "pre-query." The database executes the inner query first and then passes its result to the outer query so it can finish its job.

 

Types of Subqueries

Depending on what they return and how they interact with the main query, subqueries fall into four primary categories:

  1. Single-row Subquery: This returns exactly one value (one column of one row). It is used with standard comparison operators like =, >, <, or !=.
  2. Multiple-row Subquery: This returns a list of values. Since there is more than one result, you must use set-based operators like IN, ANY, or ALL.
  3. Correlated Subquery: This is the most complex type. It references columns from the outer query, meaning the inner query runs once for every single row processed by the outer query.
  4. Non-correlated Subquery: This is a standalone query. It doesn't rely on the outer query and could be run by itself to produce a result.
Best Practice: Use non-correlated subqueries whenever possible. They are generally faster because the database only has to run them once, rather than repeating the logic for every row.

 

Syntax of Subquery

The basic structure involves wrapping the inner query in parentheses. This tells the SQL engine to prioritize the execution of the code inside.

SELECT column1, column2, ...
FROM table_name
WHERE column_name operator (SELECT column_name FROM table_name WHERE condition);
  • operator: A comparison operator (like = or >) or a set operator (like IN).
  • condition: The logic used to isolate the specific data point you need for the outer query.
Common Mistake: Forgetting to wrap the subquery in parentheses. Without them, the SQL parser will throw a syntax error because it can't distinguish where the outer query ends and the inner one begins.

 

Example of a Subquery

To see subqueries in action, let's look at a typical HR database scenario. Imagine we have two tables representing our staff and their respective departments.

Employees Table:

employee_id name department_id salary
1 John 101 50000
2 Jane 102 60000
3 Michael 101 45000
4 Sarah 102 70000

Departments Table:

department_id department_name
101 HR
102 IT

 

Subquery Example

Suppose you need a report of everyone who earns more than the average salary specifically within the IT department (ID 102). You don't know the average IT salary off the top of your head, so you let a subquery find it for you:

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = 102);

Explanation:

  • The Inner Query: (SELECT AVG(salary) FROM employees WHERE department_id = 102) calculates the average salary for IT employees (which is 65,000).
  • The Outer Query: Now looks for any employee whose salary is > 65000.

Output:

name salary
Sarah 70000

 

Types of Subqueries

  1. Single-Row Subquery: This is used when you are certain the inner query will return exactly one value. If it returns more than one, the query will fail.

Example: Finding the employee(s) earning the maximum salary in the company.

SELECT name
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);
  1. Multiple-Row Subquery: Use this when the inner query might return a list of values. The IN operator is the most common choice here.

Example: Finding names of employees who belong to the 'IT' department by looking up the ID first.

SELECT name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE department_name = 'IT');
Watch Out: If you use a single-row operator (like =) with a subquery that returns multiple rows, SQL will throw an error: "Subquery returned more than 1 value." Always use IN if there is any chance of multiple results.
  1. Correlated Subquery: This acts like a nested loop in programming. The inner query depends on the current row of the outer query.

Example: Find employees who earn more than the average salary of their own department.

SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);

Explanation:

  • For John (Dept 101), the subquery calculates the average for Dept 101.
  • For Jane (Dept 102), the subquery calculates the average for Dept 102.
  • The inner query uses e.department_id to "talk" to the outer query.
Common Mistake: Using correlated subqueries on tables with millions of rows without proper indexing. Because the subquery runs for every row, it can lead to massive performance degradation.

 

Key Points

  • Versatility: Subqueries work in SELECT (as a column), FROM (as a temporary table), WHERE (as a filter), and HAVING clauses.
  • Comparison: While Joins are often more performant for combining tables, subqueries are sometimes more readable for complex filtering logic.
  • Nesting: You can nest subqueries inside other subqueries, though doing this too deeply makes your code hard to maintain.
  • Null Values: If a subquery returns NULL, the comparison in the outer query may result in "Unknown," potentially returning no rows.

 

Summary

Subqueries are a fundamental tool in an SQL developer's toolkit. They allow you to build sophisticated, data-driven logic by nesting one query inside another. Whether you're performing a simple lookup with a Single-row subquery, handling lists with Multiple-row subqueries, or performing row-by-row analysis with Correlated subqueries, mastering this concept is key to writing professional-grade SQL.