SQL UNION

The UNION operator in SQL is a powerful tool used to combine the results of two or more SELECT statements into a single, unified result set. Think of it as "stacking" the results of one query on top of another vertically.

By default, UNION performs a distinct operation, meaning it automatically removes duplicate rows from the final output. For a UNION to work, all SELECT statements must follow a strict set of rules regarding column count and data types.

Developer Tip: While JOINS combine columns from different tables horizontally, UNION combines rows from different queries vertically. Use UNION when you have similar data stored in different places that needs to be viewed as one list.

 

Syntax of SQL UNION

SELECT column1, column2, ...
FROM table1
WHERE condition
UNION
SELECT column1, column2, ...
FROM table2
WHERE condition;
  • SELECT statements: These are the individual queries you want to merge. You can union as many SELECT statements as you need, not just two.
  • UNION: The operator that tells the database to merge the result sets and filter out duplicates.
  • Columns: Every SELECT statement must have the same number of columns in the same order.
Watch Out: If your first query selects three columns and your second query selects two, the database will throw an error. The "shape" of the result sets must match perfectly.

 

Key Points

  • Duplicate Removal: The standard UNION operator removes duplicate records, which requires the database to perform extra work (sorting and filtering).
  • Union All: If you want to keep every single row, including duplicates, use UNION ALL. This is significantly faster because the database doesn't have to scan for duplicates.
  • Data Compatibility: Corresponding columns must have compatible data types. For example, you cannot union a column of strings with a column of integers in many SQL dialects without explicit casting.
  • Column Names: The column names in the final result set are usually taken from the column headers of the first SELECT statement.
Best Practice: Use UNION ALL by default if you know for a fact that your datasets won't overlap or if you don't care about duplicates. It is much more performant than a standard UNION.

 

Example of SQL UNION

Imagine you are managing a company that keeps employee records for different departments in separate tables. You need a master list of all staff names and their departments for a company-wide directory.

Employees Table 1 (Department 1):

employee_id name department
1 John HR
2 Alice HR
3 Bob HR

Employees Table 2 (Department 2):

employee_id name department
4 Sarah IT
5 Michael IT
6 David IT

To generate a single list of all unique employees across both departments, run this query:

SELECT name, department
FROM employees_1
UNION
SELECT name, department
FROM employees_2;

Output:

name department
John HR
Alice HR
Bob HR
Sarah IT
Michael IT
David IT

Explanation:

  • The SQL engine executes the first query, then the second.
  • It merges the two lists and looks for any identical rows.
  • Since all employees in our example are unique, they all appear in the final output.
Common Mistake: Beginners often try to use ORDER BY inside each individual SELECT statement. You can only use ORDER BY once, at the very end of the final SELECT statement, to sort the entire combined result set.

 

SQL UNION with Different Conditions

You can apply different WHERE clauses to each part of the UNION. This is particularly useful when you need to pull specific, different subsets of data into one report. For instance, if you only want HR staff from the first table and specific IT staff from the second:

SELECT name, department
FROM employees_1
WHERE department = 'HR'
UNION
SELECT name, department
FROM employees_2
WHERE department = 'IT' AND name LIKE 'S%';

Output:

name department
John HR
Alice HR
Sarah IT

 

Using UNION ALL

The UNION ALL operator works exactly like UNION, but it skips the duplicate removal step. If "John" appeared in both tables, UNION ALL would show him twice.

-- This will show every row from both tables, duplicates included.
SELECT name, department
FROM employees_1
UNION ALL
SELECT name, department
FROM employees_2;
Developer Tip: In real-world data engineering, UNION ALL is frequently used in Views to present multiple historical tables as one single table to the end user without the performance hit of a distinct check.

 

Summary

The UNION operator is an essential tool for merging data from multiple sources into a single result set. Use UNION when you need a clean list of unique values, and use UNION ALL when performance is a priority or when duplicates are expected and necessary. Always ensure that your column counts and data types align across all queries to prevent runtime errors.