The PARTITION BY clause in SQL is a window function that allows you to split rows into different groups and perform a function separately on each group. It works similarly to the GROUP BY clause but, as we’ll see, the two have unique differences.
This article explains how you can use the SQL PARTITION BY clause to separate records and perform operations on them. We’ll show which functions can be paired with this clause and have included examples to help you understand how this clause is applied.
SQL PARTITION BY splits records into groups sharing a common feature and allows you to perform a function separately on each group. In this regard, it is similar to the GROUP BY clause.
For example, if you have records of students’ grades in an exam, instead of just calculating the average score of all the students, PARTITION BY allows you to separate the students into males and females and calculate the average scores of the males and females separately.
The GROUP BY clause also allows you to separate records and perform functions separately on each group. However, it only returns as many records as there are groups. For instance, you’d only get two rows in the results table in the above case.
Gender | Average Scores (%) |
---|---|
Male | 57.43 |
Female | 59.43 |
On the other hand, PARTITION BY returns as many records as there were originally. This allows you to display columns that capture individual traits, such as names. In this example, if there were 10 students in the class, the function would return 10 records separated into males and females. Each row would display the average score for the group along with any other specified feature(column) of that individual.
First Name | Gender | Average Scores (%) |
---|---|---|
John | Male | 57.43 |
William | Male | 57.43 |
Ezra | Male | 57.43 |
Peter | Male | 57.43 |
Mary | Female | 59.43 |
Sarah | Female | 59.43 |
Alice | Female | 59.43 |
Teresa | Female | 59.43 |
Elsa | Female | 59.43 |
Olivia | Female | 59.43 |
PARTITION BY allows you to separate records into groups without losing sight of individual records. This allows you to easily compare individual records to some calculated baseline for the group, e.g., the average, maximum, or minimum. This also allows you to order or rank the records separately within their groups.
When using the PARTITION BY clause, the syntax usually looks as follows:
SELECT column1, column2,...,
WINDOW FUNCTION() OVER(PARTITION BY *grouping_feature*)
FROM table1
The grouping_feature is the column that defines how the records will be split. In the example above, this was the gender column.
The OVER clause is used with any SQL window function, including PARTITION BY. It defines the specific set of records over which a window function is performed, which is then split into smaller sections by the PARTITION BY clause.
PARTITION BY is often used together with SQL aggregate functions such as MAX, AVG, MIN, COUNT, and SUM. The first three functions are used to get the maximum, average, and minimum values, respectively. COUNT is used to count the number of records and return the total for each group, while SUM adds up values and returns the total for each group.
When using PARTITION BY with aggregate functions, the syntax used is:
SELECT column1, column2,...,
AGGREGATE FUNCTION(column_to_aggregate) OVER(PARTITION BY
*grouping_feature*)
FROM table1
The PARTITION BY clause can also be used with functions that are not used for aggregating. Common examples are:
It’s now time to see how the PARTITION BY clause is used in practice. In the examples below, you’ll see how this function is used with aggregate functions and other clauses to get insights from data.
You have been given a table containing different animals and features, such as their scientific family. The objective is to group the animals in their respective families and count the number of species in each family.
Solution
This is a fairly simple problem. All you need to do is use PARTITION BY to separate the animals into their respective families and use the COUNT function to add up the number of species in each.
Step 1: Create the database and table containing the details of the animals.
DROP DATABASE IF EXISTS `Biodiversity`;
CREATE DATABASE `Biodiversity`;
USE `Biodiversity`;
CREATE TABLE animal_basics (
animal_id INT NOT NULL,
common_name VARCHAR(50),
scientific_name VARCHAR(50),
family VARCHAR(50),
PRIMARY KEY (animal_id)
);
Step 2: Populate the table.
INSERT INTO animal_basics (animal_id, common_name, scientific_name, family)
VALUES
(1, 'Domestic Cat', 'Felis catus', 'Felidae'),
(2, 'Domestic Dog', 'Canis lupus familiaris', 'Canidae'),
(3, 'Grizzly Bear', 'Ursus arctos horribilis', 'Ursidae'),
(4, 'Cougar', 'Puma concolor', 'Felidae'),
(5, 'Jaguar', 'Panthera onca', 'Felidae'),
(6, 'Elk', 'Cervus canadensis', 'Deer'),
(7, 'Bison', 'Bison bison', 'Bovidae'),
(8, 'Leopard', 'Panthera pardus', 'Felidae'),
(9, 'Cape Buffalo', ' Syncerus caffer caffer', 'Bovidae'),
(10, 'Grey Wolf', 'Canis lupus', 'Canidae'),
(11, 'Horse', 'Equus ferus caballus', 'Equidae'),
(12, 'Indian Cobra', 'Naja naja', 'Elapidae');
Step 3: Use SQL PARTITION BY and the SUM function to group the animals and count them.
SELECT common_name, scientific_name, family,
COUNT(common_name) OVER(PARTITION BY family) AS family_total
FROM animal_basics
ORDER BY family_total DESC;
This will return the table below.
common_name | scientific_name | family | family_total |
---|---|---|---|
Domestic Cat | Felis catus | Felidae | 4 |
Cougar | Puma concolor | Felidae | 4 |
Jaguar | Panthera onca | Felidae | 4 |
Leopard | Panthera pardus | Felidae | 4 |
Bison | Bison bison | Bovidae | 2 |
Cape Buffalo | Syncerus caffer caffer | Bovidae | 2 |
Domestic Dog | Canis lupus familiaris | Canidae | 2 |
Grey Wolf | Canis lupus | Canidae | 2 |
Elk | Cervus canadensis | Deer | 1 |
Indian Cobra | Naja naja | Elapidae | 1 |
Horse | Equus ferus caballus | Equidae | 1 |
Grizzly Bear | Ursus arctos horribilis | Ursidae | 1 |
As seen above, using PARTITION BY preserves the granularity of the data, allowing us to see that the leopard, for example, is in a family of four while the elk, cobra, horse, and bear are the only individuals in their respective families.
In this problem, you have been given two tables, one containing basic employee information and another containing their salary and department details. The goal is to identify the most expensive department while keeping track of each employee’s earnings.
Solution
This problem is more challenging because you’ll need to join two tables in addition to using PARTITION BY. It also requires using the SUM function to add each department’s salaries.
Step 1: Create the database and tables.
DROP DATABASE IF EXISTS `Olive_Inc`;
CREATE DATABASE `Olive_Inc`;
USE `Olive_Inc`;
CREATE TABLE employee_basics (
employee_id INT NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
age INT,
gender VARCHAR(10),
birth_date DATE,
PRIMARY KEY (employee_id)
);
CREATE TABLE employee_salary (
employee_id INT NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
occupation VARCHAR(50),
salary INT,
dept_id INT
);
Step 2: Populate the tables.
INSERT INTO employee_basics (employee_id, first_name, last_name, age, gender, birth_date)
VALUES
(1,'George', 'Bishop', 51, 'Male','1973-03-25'),
(2,'Tom', 'Ludley', 36, 'Male', '1988-01-04'),
(3,'Rebecca', 'McDouglas', 41, 'Female', '1982-06-04'),
(4, 'Jerry', 'Astley', 29, 'Male', '1994-06-30'),
(5, 'Angela', 'Warren', 33, 'Female', '1990-10-13'),
(6, 'Donald', 'Duckwing', 40, 'Male', '1983-08-30'),
(7, 'Sandy', 'Perkins', 45, 'Female', '1978-12-15'),
(8, 'Rick', 'Jones', 43, 'Male', '1981-03-22'),
(9, 'Wyatt', 'Ridge', 23, 'Male', '2001-04-26'),
(10, 'Dwight', 'Langley', 39, 'Male', '1985-03-02'),
(11, 'Mark', 'Austine', 40, 'Male', '1984-06-30'),
(12, 'Page', 'Brooks', 26, 'Female', '1998-03-27');
INSERT INTO employee_salary (employee_id, first_name, last_name, occupation, salary, dept_id)
VALUES
(1, 'George', 'Bishop', 'Senior Software Engineer', 145000, 1),
(2, 'Tom', 'Ludley', 'Software Engineer', 105000, 1),
(3, 'Rebecca', 'McDouglas', 'Software Engineer', 101000, 1),
(4, 'Jerry', 'Astley', 'Office Secretary', 50000, 2),
(5, 'Angela', 'Warren', 'Human Resource Manager', 65000, 3),
(6, 'Donald', 'Duckwing', 'Data Engineer', 70000, 4),
(7, 'Sandy', 'Perkins', 'Data Scientist', 100000, 4),
(8, 'Rick', 'Jones', 'Assistant Human Resource Manager', 45000, 3),
(9, 'Wyatt', 'Ridge', 'Office Administrator', 45000, 2),
(10, 'Dwight', 'Langley', 'Receptionist', 40000, 2),
(11, 'Mark', 'Austine', 'Senior Data Analyst', 100000, 4),
(12, 'Page', 'Brooks', 'Data Analyst', 60000, 4)
Step 3: Use the SQL PARTITION BY and SUM functions to calculate departmental totals.
SELECT sal.first_name, sal.last_name, occupation, dept_id, salary,
SUM(salary) OVER(PARTITION BY dept_id) AS dept_total
FROM employee_salary AS sal
INNER JOIN employee_basics AS dem
ON sal.employee_id = dem.employee_id
;
This should return the table below.
first_name | occupation | dept_id | salary | dept_total |
---|---|---|---|---|
George | Senior Software Engineer | 1 | 145000 | 351000 |
Tom | Software Engineer | 1 | 105000 | 351000 |
Rebecca | Software Engineer | 1 | 101000 | 351000 |
Jerry | Office Secretary | 2 | 50000 | 135000 |
Wyatt | Office Adminstrator | 2 | 45000 | 135000 |
Dwight | Receptionist | 2 | 40000 | 135000 |
Angela | Human Resource Manager | 3 | 65000 | 110000 |
Rick | Assistant Human Resource Manager | 3 | 45000 | 110000 |
Donald | Data Engineer | 4 | 70000 | 330000 |
Sandy | Data Scientist | 4 | 100000 | 330000 |
Mark | Senior Data Analyst | 4 | 100000 | 330000 |
Page | Data Analyst | 4 | 60000 | 330000 |
Thanks to the PARTITION BY function, we are not only able to see the most expensive department, but we can also compare the total to each individual’s salary.
In this third question, you’ll have to combine the table in the first question with another table showing the number of fatalities caused by each animal to identify the most deadly family.
Solution
This question also requires joining two tables and calculating a sum. However, this question will show why the SQL PARTITION BY function is so useful in data analysis.
Step 1: Create the second table and populate it.
CREATE TABLE fatalities (
animal_id INT NOT NULL,
common_name VARCHAR(50),
kills INT
);
INSERT INTO fatalities (animal_id, common_name, kills)
VALUES
(1, 'Domestic Cat', 0),
(2, 'Domestic Dog', 30000),
(3, 'Grizzly Bear', 1),
(4, 'Cougar', 0),
(5, 'Jaguar', 4),
(6, 'Elk', 7),
(7, 'Bison', 0),
(8, 'Leopard', 55),
(9, 'Cape Buffalo', 200),
(10, 'Grey Wolf', 3),
(11, 'Horse', 6),
(12, 'Indian Cobra', 58000);
Step 2: Sum the number of fatalities caused by each animal family.
SELECT basics.common_name, family, kills,
SUM(kills) OVER(PARTITION BY family) AS family_kills,
COUNT(basics.common_name) OVER(PARTITION BY family) AS family_total
FROM animal_basics AS basics
INNER JOIN fatalities AS deaths
ON basics.animal_id = deaths.animal_id
ORDER BY family_kills DESC
This code will produce the table below.
common_name | family | kills | family_kills | family_total |
---|---|---|---|---|
Indian Cobra | Elapidae | 58000 | 58000 | 1 |
Domestic Dog | Canidae | 30000 | 30003 | 2 |
Grey Wolf | Canidae | 3 | 30003 | 2 |
Bison | Bovidae | 0 | 200 | 2 |
Cape Buffalo | Bovidae | 200 | 200 | 2 |
Domestic Cat | Felidae | 0 | 59 | 4 |
Cougar | Felidae | 0 | 59 | 4 |
Jaguar | Felidae | 4 | 59 | 4 |
Leopard | Felidae | 55 | 59 | 4 |
Elk | Deer | 7 | 7 | 1 |
Horse | Equidae | 6 | 6 | 1 |
Grizzly Bear | Ursidae | 1 | 1 | 1 |
An interesting feature seen in this table is that although the wolf is in a family responsible for 30,003 fatal attacks, it’s only responsible for 3 of these, revealing that the domestic dog is far more dangerous. If the GROUP BY function was used, it would only return the family names and the number of fatalities for each. As a result, the key detail seen above would be lost in the summary.
PARTITION BY is a window function that allows users to split records based on one feature and perform a process separately on each group. It is similar to the GROUP BY function. However, it does not return a summary of the records. It outputs the results of the operation alongside individual records, preserving their granularity and making data analysis easier. SQL PARTITION BY is commonly used with aggregate functions such as AVG, MAX, and MIN but can also be used with RANK, DENSE_RANK, ORDER_BY, and other non-aggregate functions.
If you would like to gain a better understanding of important SQL functions, Interview Query is here to assist you. We have articles explaining different SQL functions, including GROUP BY, JOINs, CASE WHEN, and Cumulative Sum. If you’re interested in a structured approach to improving your SQL skills, you can also check out our SQL learning path, which is designed to help you prepare for SQL interview questions. You can also find more SQL interview questions in our question bank. Finally, if you’ve got an interview coming up, feel free to try the other resources we offer, including our AI interviewer and company interview guides.
SQL PARTITION BY can help you level up your analytics skills when using SQL, and we hope this guide has helped you understand when and how to use it.