SQL JOIN on Multiple Columns: A Full Guide (2024)

SQL JOIN on Multiple Columns: A Full Guide (2024)

Introduction

Structured Query Language (SQL) is a skill that every data analytics professional must have in their arsenal, as it has a wide range of applications in manipulating complex databases. Among other commonly tested SQL functions in interviews, JOINs are one of the most important SQL clauses to learn.

In this article, we will explore an important application of SQL JOINs, namely: SQL JOIN on multiple columns. We’ll briefly review the syntax, real-world use cases, and alternatives to joining on multiple columns. We shall also cover some relevant examples to help you appreciate the concept better.

Quick Overview of the SQL JOIN

SQL JOIN clauses are used to combine rows from two or more tables based on a related column(s) between them. The commonly used JOINs are INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, CROSS JOIN, and SELF JOIN.

Although JOINs are commonly made on the basis of a single column (called primary key or foreign key based on the table we are speaking in the context of), there are instances when the multiple columns need to be taken into consideration to get the desired result.

Here is the SQL JOIN syntax:

SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;

The syntax will differ based on the type of join you wish to perform; for instance, you’ll need to change the phrasing to INNER JOIN if that’s the kind of join needed.

Note: if you need a refresher on SQL syntax, keep our ultimate SQL cheat sheet handy.

image

What is SQL JOIN on Multiple Columns?

Joins become more powerful when the business problem necessitates more complex relationships — this is where SQL joins on multiple columns comes into play. Rather than joining tables based on a single common column, you can join them based on two or more columns, which allows for a more nuanced and precise retrieval of related data.

Such joins are essential in situations where the integrity and context of the data must be maintained across dimensions. There are several ways we can achieve this, as we will go over in the next section.

For example, in a database where an employee’s identification involves both a department code and an employee number, you would need a join on both of these columns to accurately match employee records with departmental data. Here, a join on multiple columns ensures the integrity and specificity of the data relationship:

SELECT *
FROM Employee
JOIN Department ON Employee.DeptCode = Department.DeptCode
AND Employee.EmpNumber = Department.EmpNumber;

Use Cases for SQL Join in Multiple Columns

Multi-column joins in SQL are particularly useful when you have complex relationships that cannot be adequately represented by a single-column join. Here are a few real-world scenarios where this can come up.

  1. Employees by location: For global MNCs with offices across countries, employees might have unique IDs within their local office, but not globally. To accurately connect employees to their local office records, a join on both employee ID and officeID/countryID is needed.
  2. Order details: An e-commerce company might need to join an orders table with a shipping table using both the order ID and a customer ID to ensure the shipping details match the correct customer order.
  3. Healthcare records: In healthcare databases, patient records might be joined with appointment details using both a patient ID and a specific date to view all treatments received by the patient on that date.
  4. Multi-factor authentication: A system that uses multi-factor authentication might join user login attempts with a security table using both the user ID and the device ID to validate the attempt.

image

SQL JOINs on Multiple Columns: Practice Problems With Code

1. Extract the correct student grade.

Consider a database for a university where you have two tables:

  1. Enrollments table with records of student enrollments, including StudentID, CourseID, and Semester.
  2. Grades table with the students’ grades, including StudentID, CourseID, Semester, and Grade.

Find out what grade each student received in each course for the Fall 2023 semester.

Here are the tables provided:

Enrollments Table

EnrollmentID StudentID CourseID Semester EnrollmentDate
1 101 201 Fall 2023 2023-08-15
2 102 202 Fall 2023 2023-08-16
3 103 203 Fall 2023 2023-08-17
4 104 204 Fall 2023 2023-08-18
5 105 205 Fall 2023 2023-08-19
6 101 206 Spring 2023 2023-01-10
7 102 207 Spring 2023 2023-01-11

Grades Table

StudentID CourseID Semester Grade
1 101 Fall 2023 A
1 102 Spring 2023 B
2 101 Fall 2023 A
3 103 Fall 2023 C
2 104 Spring 2023 B
3 102 Spring 2023 B
4 101 Fall 2023 B
5 103 Fall 2023 A

Solution

To find out what grade each student received in each course for the “Fall 2023” semester, you would join the two tables on three columns: StudentID, CourseID, and Semester.

Here’s how the SQL query might look:


SELECT
    e.StudentID,
    e.CourseID,
    e.Semester,
    g.Grade
FROM
    Enrollments e
JOIN
    Grades g
    ON e.StudentID = g.StudentID
    AND e.CourseID = g.CourseID
    AND e.Semester = g.Semester
WHERE
    e.Semester = 'Fall 2023';

This multi-column join is absolutely essential because if you joined only on StudentID or CourseID, you would end up with incorrect matches where a student’s grade from one semester could be incorrectly matched with an enrollment from another semester.

The expected output looks like this:

StudentID CourseID Semester Grade
101 201 Fall 2023 A
102 202 Fall 2023 B
103 203 Fall 2023 A
104 204 Fall 2023 C
105 205 Fall 2023 B

Here is a similar interview query you can practice on our built-in SQL compiler: Audio Chat Success

2. Employees Before Managers

In SQL, multi-column joins typically refer to matching rows based on the equality of multiple column pairs. However, you can have multi-condition joins where you’re joining on the equality of one set of columns and a comparative condition on another. We’ll demonstrate this application in this example.

You’re given two tables: employees and managers. Find the names of all employees who joined before their manager.

Solution

Here’s the expected code:

SELECT CONCAT(e.first_name, ' ', e.last_name) AS employee_name

FROM employees e

INNER JOIN managers m ON e.manager_id = m.id AND e.join_date < m.join_date

Note the logical condition e.join_date < m.join_date which implies that you are looking for employees who were joined the company before their managers. This is an instance of the logical operator < being used in a SQL multiple-column join condition instead of the more commonly used AND operator.

3. Liked and Commented

The below problem statement demonstrates the converse of the above example, namely that even if you have an AND operator in your join condition, it may not technically be a SQL multiple join.

You’re given two tables, users and events. The events table holds values of all of the user events in the action column (‘like’, ‘comment’, or ‘post’).

Write a query to get the percentage of users that have never liked or commented. Round to two decimal places.

Here is a sample query to get the needed output:

SELECT 
    ROUND(
        COUNT(DISTINCT CASE WHEN action IS NULL THEN u.id END)/ COUNT(DISTINCT u.id),
        2
    ) AS percent_never 
FROM 
    users u
LEFT JOIN events e
    ON u.id = e.user_id
    AND e.action IN ('like', 'comment');

Important note: While the given join clause is joining the users and events tables based on the user_id, the additional condition action IN ('like', 'comment') is a filter on the rows returned by the join, not an additional join condition on another column. The AND in the ON clause does not combine two columns from each table but rather restricts the join to certain rows based on the action column’s content.

image

4. Calculating User Engagement

Now let’s look at an example where multiple columns are used to join multiple tables. Let’s consider a simplified example of a database that might be used for a multi-factor authentication (MFA) system. In an MFA system, users might authenticate through multiple methods, like a password, a particular device type, and biometric data like their date of birth. Write a query to verify that a user has completed all three authentication factors.

Here are the input tables.

Users Table

UserID Username LastPasswordUpdate LastDeviceRegistration LastBiometricUpdate
1 john_doe_92 1010 2010 3010
2 jane.smith88 1020 2020 3020
3 mikebrown123 1030 2030 3030
4 lisaray234 1040 2040 3040
5 dave.clark45 1050 2050 3050

Passwords Table

AuthID UserID PasswordHash UpdateDate
1010 1 $2b$12$7q 2023-01-15 08:35:00
1020 2 $2b$12$9I 2023-02-20 09:20:00
1030 3 $2b$12$iX 2023-03-05 10:05:00
1040 4 $2b$12$4t 2023-04-10 11:15:00
1050 5 $2b$12$oH 2023-05-25 12:45:00

Devices Table

AuthID UserID DeviceID RegistrationDate
2010 1 iPhone12 2023-01-25 14:30:00
2020 2 Pixel5 2023-02-15 15:40:00
2030 3 GalaxyS21 2023-03-18 16:50:00
2040 4 iPad8 2023-04-08 17:05:00
2050 5 OnePlus9 2023-05-20 18:20:00

Biometrics Table

AuthID UserID BiometricID LastUpdate
3010 1 FingerprintA 2023-01-30 19:00:00
3020 2 FaceID_X2 2023-02-28 20:10:00
3030 3 IrisPatternC 2023-03-22 21:30:00
3040 4 VoicePrintD 2023-04-12 22:00:00
3050 5 RetinaScanE 2023-05-27 23:45:00

Output Table

UserID Username PasswordHash DeviceID BiometricID
1 john_doe_92 $2b$12$7q iPhone12 FingerprintA
2 jane.smith88 $2b$12$9l Pixel5 FaceID_X2
3 mikebrown123 $2b$12$iX GalaxyS21 IrisPatternC
4 lisaray234 $2b$12$4t iPad8 VoicePrintD
5 dave.clark45 $2b$12$oH OnePlus9 RetinaScanE

The output can be used to vet if any users have not completed their MFA, and send them reminders if required.

Here’s the SQL join query to produce the above output:

SELECT
    u.UserID,
    u.Username,
    p.Password,
    d.DeviceID,
    b.BiometricID
FROM
    Users u
JOIN
    Passwords p ON u.UserID = p.UserID AND u.LastPasswordUpdate = p.AuthID
JOIN
    Devices d ON u.UserID = d.UserID AND u.LastDeviceRegistration = d.AuthID
JOIN
    Biometrics b ON u.UserID = b.UserID AND u.LastBiometricUpdate = b.AuthID;

To practice more SQL JOIN interview questions, you can go to our database of popular questions and filter by the tag ‘JOINs’.

image

SQL JOINs on Multiple Columns: Alternative Techniques

The most common way to join tables on multiple columns is to use the AND operator within your ON clause. However, there are a few variations that can be used, depending on the problem you want to solve.

  • Joining with a composite key:

A composite key is a key composed of two or more columns. You can join tables based on this key as follows:

SELECT *
FROM table1 AS t1
JOIN table2 AS t2
ON (t1.columnA, t1.columnB) = (t2.columnA, t2.columnB);

Composite key joins are used when a combination of columns is designed to be a unique key in the database schema. Multi-column joins, on the other hand, are more general joins used for other purposes as well, such as general filtering and matching of records.

  • Joining on concatenated columns: If the data you need to join is split across multiple columns, you can use a concatenation function to join on a composite column.

While this method may be effective in rare cases such as legacy systems, since databases are generally optimized to use indexes on joins, concatenating will mean that the database can no longer use the individual indexes for each column. This can result in a full table scan, which is much slower. This method is usually discouraged.

SELECT *
FROM employees AS e
JOIN salaries AS s
ON CONCAT(e.first_name, ' ', e.last_name) = s.employee_name;
  • Joining using multiple conditions:

Apart from equality, join conditions can include a range of operators (<, >, <=, >=, <>). You can use these operators to perform a join based on a comparative condition.

SELECT *
FROM orders AS o
JOIN shipment AS s
ON o.customer_id = s.customer_id
AND o.order_date <= s.shipment_date;
  • Using natural join:

Natural joins automatically join all columns with the same names. This method is rarely employed because the join is implicit and can produce unexpected results if the tables have more columns in common than anticipated.

SELECT *
FROM customers
NATURAL JOIN orders;

Conclusion

This article has explored the SQL JOIN on multiple columns, with various ways to tackle this common business problem scenario. You can head over to our blog to read our comprehensive SQL guides or practice questions from our interview database.

image