SQL JOINs
SQL was designed to work with large databases, which is why it stores information in a special way:
Our data is divided into different tables so that we don’t have to go through all of it each time we need to query specific data. For example, if we only want information about employees, we’d like to be able to look at just one table, rather than all the other data we hold (such as sales, products, projects, etc).
Our tables are “normalized”, which means that data should not be duplicated across tables. This reduces unnecessary use of memory and limits the risk of data inconsistencies. If we have the same information in two different places, we’d have to update them together. If we ever forgot, we’d have an inconsistency.
Primary and foreign keys
Let’s assume we have two tables, called sales
and employees
. Let’s say the employees
table has the name and salary of each employee, and the sales
table shows the amount of each sale and the employee who facilitated it.
If we want to find both the salary and total amount sold for each employee, we would have to look at both tables to figure it out.
However, there is a problem: what if two employees share the same name? There would be no way of distinguishing between sales from one and the other.
To solve this issue, all our rows have a primary key, which is a unique identifier for each row. In the case of our employees, the primary key is a unique id
that we assign to each of them:
employees
:
id | name | salary |
---|---|---|
1 | John Smith | 50000 |
2 | Jane Doe | 60000 |
3 | Bob Johnson | 45000 |
Now, whenever another table references an employee, it will use their id
.
The sales
table also has a column called id
which uniquely identifies each sale. However, each sale must reference an employee. To avoid ambiguity, it references the employees’ id
. The column that references to the primary key of another table is called a foreign key.
sales
:
id | employee_id | amount |
---|---|---|
1 | 1 | 2500 |
2 | 1 | 1500 |
3 | 2 | 3000 |
4 | 3 | 4500 |
5 | 3 | 3500 |
JOINs
Let’s say we now want to know the name of the employee who carried out each sale. The tables are normalized, so each sale only holds the employees’ id
.
To do this, we would have to look at the employee_id of each sale, and then search for it in the employees
table. We could think of this process as joining the two tables together.
SQL has a JOIN
operation that matches rows of two different tables and appends them. We must specify the primary and foreign keys with an ON
statement, like this:
SELECT * FROM employees
JOIN sales
ON employees.id = sales.employee_id
This query tells us that the employees
and sales
tables should be joined by looking at the column id
from the employees
table and find a matching employee_id
in the sales
table. Note that we need to clarify the table each column comes from with the syntax {table_name}.{column_name}
. We need to do this because columns might have the same names across different tables.
For each row in one table, it appends its matching row on the other table beside it. The result would be this table:
id | name | salary | id | employee_id | amount |
---|---|---|---|---|---|
1 | John Smith | 50000 | 1 | 1 | 2500 |
1 | John Smith | 50000 | 2 | 1 | 1500 |
2 | Jane Doe | 60000 | 3 | 2 | 3000 |
3 | Bob Johnson | 45000 | 4 | 3 | 4500 |
3 | Bob Johnson | 45000 | 5 | 3 | 3500 |
Note that the first three columns came from the employees
table and the last three columns came from the sales
table.
After the JOIN
, we can select the columns we want. We just need to replace the wildcard (*)
for the columns we want to select in our query.
But now, there are two columns with the same name id
, so SELECT id
would be an ambiguous statement.
If we want to SELECT
a column with a name that repeats, we must specify the table it came from before the join. In this case, employees.id would refer to the first column and sales.id would refer to the fourth.
Finally, we can also use the keyword AS
to rename tables.
As an example, if we just want the name of the employee who carried out each sale, we could do:
SELECT s.id, e.name
FROM employees AS e
JOIN sales AS s
ON s.id = e.id
id | name |
---|---|
1 | John Smith |
2 | John Smith |
3 | Jane Doe |
4 | Bob Johnson |
5 | Bob Johnson |
Types of JOINs
In our last example, every sale had a matching employee and vice versa. However, it’s not always the case that a row in a table has a matching row in another. There are four types of JOINs, which all differ in how to handle non-matching values within the tables.
In this section, we’ll use two tables as examples, called prices
and names
.
prices
itemID | price |
---|---|
1 | 10.99 |
2 | 5.99 |
3 | 7.50 |
names
itemID | name |
---|---|
1 | Pencil |
2 | Pen |
4 | Paper |
INNER JOIN
An inner join returns all rows that have a matching value in both tables. In this case,
SELECT * FROM prices
INNER JOIN names
ON prices.itemID = names.itemID
would return:
itemID | price | name |
---|---|---|
1 | 10.99 | Pencil |
2 | 5.99 | Pen |
This is because both names
and prices
have rows with itemID values 1 and 2, so they are kept in the table returned by the Join.
However, names
doesn’t have the itemID value 4, and prices
doesn’t have the itemID value 3, so both rows are discarded in the table returned.
LEFT JOIN
A left join keeps all rows that appear in the first table. If there is no matching value in the second table, the join will append empty cells. For example:
SELECT * FROM prices
LEFT JOIN names
ON prices.itemID = names.itemID
The first and second rows of the prices
table have a matching value in the names
table, so the join works as usual for those rows.
The third row of the prices
table has no matching values in the names
table, so the join will have nothing to append to it. However, the left join keeps all rows that were present in the first table.
As a result, the join will complete the row that has no matching values with empty cells, which are symbolized by the keyword NULL
.
The third row of the names
table has no matching values in the prices
table, so the left join will discard it.
itemID | price | name |
---|---|---|
1 | 10.99 | Pencil |
2 | 5.99 | Pen |
3 | 7.50 | NULL |
RIGHT JOIN
The right join works exactly like the left join, but the other way around. It keeps all the values in the second table. If they have no matching values in the first table, the join will append empty cells instead.
For example:
SELECT * FROM prices
RIGHT JOIN names
ON prices.itemID = names.itemID
will return
itemID | price | name |
---|---|---|
1 | 10.99 | Pencil |
2 | 5.99 | Pen |
4 | NULL | Paper |
OUTER JOIN
Finally, the outer join keeps all the rows that appeared on the first and second table. If any of them have no matching values in the other table, it completes the join with empty cells.
For example,
SELECT * FROM prices
OUTER JOIN names
ON prices.itemID = names.itemID
would return us the following table:
itemID | price | name |
---|---|---|
1 | 10.99 | Pencil |
2 | 5.99 | Pen |
3 | 7.50 | NULL |
4 | NULL | Paper |
What we learned so far:
Example query
SELECT first_name, salary, sales.employeeID
FROM employees JOIN sales
ON employee.id = sales.employeeID
WHERE salary > 7500 AND last_name != "Williams"
Execution order
FROM
-> Gets input tablesJOIN … ON
-> Merges input tablesWHERE
-> Filters rows according to conditionSELECT
-> Selects the columns we need.- Aggregate functions and numerical operations performed on output.
35%
CompletedYou have 166 sections remaining on this learning path.