Relational databases are highly valued for their ability to organize large datasets into smaller, more manageable tables through relationships based on matching records. SQL, as one of the most essential tools in a data scientist’s toolkit, plays a critical role in querying and extracting meaningful insights from these datasets.
An Inner Join is one of the most commonly used types of joins in SQL. It retrieves records that have matching values in both tables and ignores the ones that have no matching ones, for example, in the following two tables: employees
and departments
.
employee_id | name | department_id |
---|---|---|
1 | Alice | 10 |
2 | Bob | 20 |
3 | Charlie | 30 |
4 | Dave | NULL |
department_id | department_name |
---|---|
10 | HR |
20 | IT |
30 | Marketing |
40 | Sales |
If an inner join query is performed:
SELECT employees.name, departments.department_name
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id;
The result would be:
name | department_name |
---|---|
Alice | HR |
Bob | IT |
Charlie | Marketing |
The Inner Join operation conveniently excluded Dave, who didn’t have a matching department, which in this case is NULL
, in the departments
table.
But, what if we need to generate a report with all the rows, both matching and not, to be produced? Comes Outer Join to the rescue.
In simpler terms, Outer Join is essentially the opposite operation to Inner join, returning rows from one or both tables even if there is no matching row in the other table.
Referring to the same departments
and employees
tables, if we perform a LEFT OUTER JOIN (discussed later) query:
SELECT employees.name, departments.department_name
FROM employees
LEFT OUTER JOIN departments
ON employees.department_id = departments.department_id;
The result would be:
name | department_name |
---|---|
Alice | HR |
Bob | IT |
Charlie | Marketing |
Dave | NULL |
The LEFT OUTER JOIN returns all rows from the employees
table, including those that do not have a matching department_id
in the departments
table.
For Dave, who has no matching department (NULL
), the result still includes the row with NULL
in the department_name
column because the Outer Join includes unmatched rows from the left table.
There are three main types of Outer Joins in SQL:
A LEFT OUTER JOIN returns all rows from the left table and the matching rows from the right table. If there is no match, the result will show NULL
for columns from the right table. Please refer to the previous example to see how LEFT OUTER JOIN returns all rows from the employees
table and displays NULL
for Dave.
A RIGHT OUTER JOIN returns all rows from the right table and the matching rows from the left table. If there is no match, the result will show NULL
for columns from the left table. For example, if we consider the same departments and employees table mentioned above:
SELECT employees.name, departments.department_name
FROM employees
RIGHT OUTER JOIN departments
ON employees.department_id = departments.department_id;
name | department_name |
---|---|
Alice | HR |
Bob | IT |
Charlie | Marketing |
NULL | Sales |
In contrast to the LEFT OUTER JOIN, all rows from the departments
table are returned in RIGHT OUTER JOIN. As Sales is a department with no employees, so NULL
appears in the name
column for that row.
A FULL OUTER JOIN returns all rows from both tables, including rows where there is no match. If a row has no match in one table, NULL
will appear in the columns of the table without a matching row. Again, taking the same two tables as example:
SELECT employees.name, departments.department_name
FROM employees
FULL OUTER JOIN departments
ON employees.department_id = departments.department_id;
name | department_name |
---|---|
Alice | HR |
Bob | IT |
Charlie | Marketing |
Dave | NULL |
NULL | Sales |
As you may noticed, all rows from both the employees and departments tables are included. For Dave, there is no matching department, so NULL
appears in the department_name
column. And, for the Sales department, there is no matching employee, so NULL
appears in the name
column.
The FULL OUTER JOIN combines the behavior of both the LEFT and RIGHT joins, ensuring all records from both tables are returned.
Here’s a list of real-world scenarios, as seen in practice, where Outer Joins can be applied, though not limited to:
Imagine we have a company track employees and their assigned departments. However, not every employee has been assigned to a department yet. We would use an Outer Join to get a list of all employees, including those who haven’t been assigned to any department.
Suppose you have a list of products in your store and a record of which products were sold during a given month. However, there are products in your inventory that were not sold during that month. You would likely use a Right Outer Join to list all the products that were part of the sales records, including the ones that didn’t sell. This helps you track which products were not sold and assess inventory needs. It’s particularly helpful for understanding which products are underperforming.
In Interview Query, we track our candidates and the Learning Paths they are enrolled in. Not every candidate is enrolled in a Learning Path, and some LPs may not have any candidates enrolled at all.
We would use a Full Outer Join to get a complete list of all candidates and all LPs, including those who aren’t enrolled in any course and courses with no one enrolled.
Imagine you’re managing an online store and need to track customer orders. However, not all customers place an order every month. You want to know which customers made a purchase and which customers didn’t.
You would likely use a Left Outer Join to retrieve all customers and their orders, including those who didn’t make any orders during a given period. This helps you identify inactive customers who haven’t made any purchases, allowing you to send targeted promotions or follow-up emails to re-engage them.
Understanding SQL Outer Joins is essential for working with relational databases, especially when you need to retrieve all records from one or both tables, even when there is no match. Unlike the Inner Join, which only returns matching rows, Outer Joins ensure that no data is excluded, giving you a more comprehensive view of your datasets. By leveraging Left, Right, and Full Outer Joins, you can handle incomplete or mismatched data more effectively, ensuring you have the full picture for analysis and reporting. All the best!
The best way to get better at data manipulation in SQL is practice. Interview Query offers a variety of SQL learning resources to help you practice and improve your SQL skills, including: