Using SQL PARTITION BY (Updated in 2024)

Using SQL PARTITION BY (Updated in 2024)

Overview

The PARTITION BY clause in SQL is a window function that allows you to split rows into different groups and perform a function separately on each group. It works similarly to the GROUP BY clause but, as we’ll see, the two have unique differences.

This article explains how you can use the SQL PARTITION BY clause to separate records and perform operations on them. We’ll show which functions can be paired with this clause and have included examples to help you understand how this clause is applied.

Understanding PARTITION BY in SQL

SQL PARTITION BY splits records into groups sharing a common feature and allows you to perform a function separately on each group. In this regard, it is similar to the GROUP BY clause.

For example, if you have records of students’ grades in an exam, instead of just calculating the average score of all the students, PARTITION BY allows you to separate the students into males and females and calculate the average scores of the males and females separately.

SQL PARTITION BY vs GROUP BY

The GROUP BY clause also allows you to separate records and perform functions separately on each group. However, it only returns as many records as there are groups. For instance, you’d only get two rows in the results table in the above case.

Gender Average Scores (%)
Male 57.43
Female 59.43

On the other hand, PARTITION BY returns as many records as there were originally. This allows you to display columns that capture individual traits, such as names. In this example, if there were 10 students in the class, the function would return 10 records separated into males and females. Each row would display the average score for the group along with any other specified feature(column) of that individual.

First Name Gender Average Scores (%)
John Male 57.43
William Male 57.43
Ezra Male 57.43
Peter Male 57.43
Mary Female 59.43
Sarah Female 59.43
Alice Female 59.43
Teresa Female 59.43
Elsa Female 59.43
Olivia Female 59.43

Why Use SQL PARTITION BY?

PARTITION BY allows you to separate records into groups without losing sight of individual records. This allows you to easily compare individual records to some calculated baseline for the group, e.g., the average, maximum, or minimum. This also allows you to order or rank the records separately within their groups.

SQL PARTITION BY Syntax

When using the PARTITION BY clause, the syntax usually looks as follows:

 SELECT column1, column2,...,

 WINDOW FUNCTION() OVER(PARTITION BY *grouping_feature*)

 FROM table1

The grouping_feature is the column that defines how the records will be split. In the example above, this was the gender column.

The OVER clause is used with any SQL window function, including PARTITION BY. It defines the specific set of records over which a window function is performed, which is then split into smaller sections by the PARTITION BY clause.

Using SQL PARTITION BY with Aggregate Functions

PARTITION BY is often used together with SQL aggregate functions such as MAX, AVG, MIN, COUNT, and SUM. The first three functions are used to get the maximum, average, and minimum values, respectively. COUNT is used to count the number of records and return the total for each group, while SUM adds up values and returns the total for each group.

When using PARTITION BY with aggregate functions, the syntax used is:

 SELECT column1, column2,...,
 AGGREGATE FUNCTION(column_to_aggregate) OVER(PARTITION BY 
 *grouping_feature*)
 FROM table1

SQL PARTITION BY and Other Clauses

The PARTITION BY clause can also be used with functions that are not used for aggregating. Common examples are:

  • ORDER BY - This function arranges the rows in each partition in ascending or descending order. When paired with the SUM function, it returns the cumulative sum from the first to the last entry in each group.
  • RANK - This function assigns a rank to records based on a quantifiable criterion. If two rows have the same rank, one number is skipped when assigning the next rank, i.e., 1,2,3,3,5,6,…
  • DENSE_RANK — This function is similar to the RANK function except that if two records have the same rank, the next record is assigned the next value without skipping one, i.e., 1,2,3,3,4,5,…

SQL PARTITION BY Practice Questions

It’s now time to see how the PARTITION BY clause is used in practice. In the examples below, you’ll see how this function is used with aggregate functions and other clauses to get insights from data.

Practice Question 1: Count the Number of Animals in Each Family

You have been given a table containing different animals and features, such as their scientific family. The objective is to group the animals in their respective families and count the number of species in each family.

Solution

This is a fairly simple problem. All you need to do is use PARTITION BY to separate the animals into their respective families and use the COUNT function to add up the number of species in each.

Step 1: Create the database and table containing the details of the animals.

DROP DATABASE IF EXISTS `Biodiversity`;
CREATE DATABASE `Biodiversity`;
USE `Biodiversity`;

CREATE TABLE animal_basics (
  animal_id INT NOT NULL,
  common_name VARCHAR(50),
  scientific_name VARCHAR(50),
  family VARCHAR(50),
  PRIMARY KEY (animal_id)
);

Step 2: Populate the table.

INSERT INTO animal_basics (animal_id, common_name, scientific_name, family)
VALUES
(1, 'Domestic Cat', 'Felis catus', 'Felidae'),
(2, 'Domestic Dog', 'Canis lupus familiaris', 'Canidae'),
(3, 'Grizzly Bear', 'Ursus arctos horribilis', 'Ursidae'),
(4, 'Cougar', 'Puma concolor', 'Felidae'),
(5, 'Jaguar', 'Panthera onca', 'Felidae'),
(6, 'Elk', 'Cervus canadensis', 'Deer'),
(7, 'Bison', 'Bison bison', 'Bovidae'),
(8, 'Leopard', 'Panthera pardus', 'Felidae'),
(9, 'Cape Buffalo', ' Syncerus caffer caffer', 'Bovidae'),
(10, 'Grey Wolf', 'Canis lupus', 'Canidae'),
(11, 'Horse', 'Equus ferus caballus', 'Equidae'),
(12, 'Indian Cobra', 'Naja naja', 'Elapidae');

Step 3: Use SQL PARTITION BY and the SUM function to group the animals and count them.

SELECT common_name, scientific_name, family,
COUNT(common_name) OVER(PARTITION BY family) AS family_total
FROM animal_basics
ORDER BY family_total DESC;

This will return the table below.

common_name scientific_name family family_total
Domestic Cat Felis catus Felidae 4
Cougar Puma concolor Felidae 4
Jaguar Panthera onca Felidae 4
Leopard Panthera pardus Felidae 4
Bison Bison bison Bovidae 2
Cape Buffalo Syncerus caffer caffer Bovidae 2
Domestic Dog Canis lupus familiaris Canidae 2
Grey Wolf Canis lupus Canidae 2
Elk Cervus canadensis Deer 1
Indian Cobra Naja naja Elapidae 1
Horse Equus ferus caballus Equidae 1
Grizzly Bear Ursus arctos horribilis Ursidae 1

As seen above, using PARTITION BY preserves the granularity of the data, allowing us to see that the leopard, for example, is in a family of four while the elk, cobra, horse, and bear are the only individuals in their respective families.

Practice Question 2: Most Expensive Department

In this problem, you have been given two tables, one containing basic employee information and another containing their salary and department details. The goal is to identify the most expensive department while keeping track of each employee’s earnings.

Solution

This problem is more challenging because you’ll need to join two tables in addition to using PARTITION BY. It also requires using the SUM function to add each department’s salaries.

Step 1: Create the database and tables.

DROP DATABASE IF EXISTS `Olive_Inc`;
CREATE DATABASE `Olive_Inc`;
USE `Olive_Inc`;

CREATE TABLE employee_basics (
  employee_id INT NOT NULL,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  age INT,
  gender VARCHAR(10),
  birth_date DATE,
  PRIMARY KEY (employee_id)
);

CREATE TABLE employee_salary (
  employee_id INT NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  occupation VARCHAR(50),
  salary INT,
  dept_id INT
);

Step 2: Populate the tables.

INSERT INTO employee_basics (employee_id, first_name, last_name, age, gender, birth_date)
VALUES
(1,'George', 'Bishop', 51, 'Male','1973-03-25'),
(2,'Tom', 'Ludley', 36, 'Male', '1988-01-04'),
(3,'Rebecca', 'McDouglas', 41, 'Female', '1982-06-04'),
(4, 'Jerry', 'Astley', 29, 'Male', '1994-06-30'),
(5, 'Angela', 'Warren', 33, 'Female', '1990-10-13'),
(6, 'Donald', 'Duckwing', 40, 'Male', '1983-08-30'),
(7, 'Sandy', 'Perkins', 45, 'Female', '1978-12-15'),
(8, 'Rick', 'Jones', 43, 'Male', '1981-03-22'),
(9, 'Wyatt', 'Ridge', 23, 'Male', '2001-04-26'),
(10, 'Dwight', 'Langley', 39, 'Male', '1985-03-02'),
(11, 'Mark', 'Austine', 40, 'Male', '1984-06-30'),
(12, 'Page', 'Brooks', 26, 'Female', '1998-03-27');

INSERT INTO employee_salary (employee_id, first_name, last_name, occupation, salary, dept_id)
VALUES
(1, 'George', 'Bishop', 'Senior Software Engineer', 145000, 1),
(2, 'Tom', 'Ludley', 'Software Engineer', 105000, 1),
(3, 'Rebecca', 'McDouglas', 'Software Engineer', 101000, 1),
(4, 'Jerry', 'Astley', 'Office Secretary', 50000, 2),
(5, 'Angela', 'Warren', 'Human Resource Manager', 65000, 3),
(6, 'Donald', 'Duckwing', 'Data Engineer', 70000, 4),
(7, 'Sandy', 'Perkins', 'Data Scientist', 100000, 4),
(8, 'Rick', 'Jones', 'Assistant Human Resource Manager', 45000, 3),
(9, 'Wyatt', 'Ridge', 'Office Administrator', 45000, 2),
(10, 'Dwight', 'Langley', 'Receptionist', 40000, 2),
(11, 'Mark', 'Austine', 'Senior Data Analyst', 100000, 4),
(12, 'Page', 'Brooks', 'Data Analyst', 60000, 4)

Step 3: Use the SQL PARTITION BY and SUM functions to calculate departmental totals.

SELECT sal.first_name, sal.last_name, occupation, dept_id, salary,  
SUM(salary) OVER(PARTITION BY dept_id) AS dept_total
FROM employee_salary AS sal
INNER JOIN employee_basics AS dem
	ON sal.employee_id = dem.employee_id
;

This should return the table below.

first_name occupation dept_id salary dept_total
George Senior Software Engineer 1 145000 351000
Tom Software Engineer 1 105000 351000
Rebecca Software Engineer 1 101000 351000
Jerry Office Secretary 2 50000 135000
Wyatt Office Adminstrator 2 45000 135000
Dwight Receptionist 2 40000 135000
Angela Human Resource Manager 3 65000 110000
Rick Assistant Human Resource Manager 3 45000 110000
Donald Data Engineer 4 70000 330000
Sandy Data Scientist 4 100000 330000
Mark Senior Data Analyst 4 100000 330000
Page Data Analyst 4 60000 330000

Thanks to the PARTITION BY function, we are not only able to see the most expensive department, but we can also compare the total to each individual’s salary.

Practice Question 3: Most Dangerous Animal Family

In this third question, you’ll have to combine the table in the first question with another table showing the number of fatalities caused by each animal to identify the most deadly family.

Solution

This question also requires joining two tables and calculating a sum. However, this question will show why the SQL PARTITION BY function is so useful in data analysis.

Step 1: Create the second table and populate it.

CREATE TABLE fatalities (
  animal_id INT NOT NULL,
  common_name VARCHAR(50),
  kills INT
);

INSERT INTO fatalities (animal_id, common_name, kills)
VALUES
(1, 'Domestic Cat', 0),
(2, 'Domestic Dog', 30000),
(3, 'Grizzly Bear', 1),
(4, 'Cougar', 0),
(5, 'Jaguar', 4),
(6, 'Elk', 7),
(7, 'Bison', 0),
(8, 'Leopard', 55),
(9, 'Cape Buffalo', 200),
(10, 'Grey Wolf', 3),
(11, 'Horse', 6),
(12, 'Indian Cobra', 58000);

Step 2: Sum the number of fatalities caused by each animal family.

SELECT basics.common_name, family, kills,
SUM(kills) OVER(PARTITION BY family) AS family_kills,
COUNT(basics.common_name) OVER(PARTITION BY family) AS family_total
FROM animal_basics AS basics
INNER JOIN fatalities AS deaths
	ON basics.animal_id = deaths.animal_id
ORDER BY family_kills DESC

This code will produce the table below.

common_name family kills family_kills family_total
Indian Cobra Elapidae 58000 58000 1
Domestic Dog Canidae 30000 30003 2
Grey Wolf Canidae 3 30003 2
Bison Bovidae 0 200 2
Cape Buffalo Bovidae 200 200 2
Domestic Cat Felidae 0 59 4
Cougar Felidae 0 59 4
Jaguar Felidae 4 59 4
Leopard Felidae 55 59 4
Elk Deer 7 7 1
Horse Equidae 6 6 1
Grizzly Bear Ursidae 1 1 1

An interesting feature seen in this table is that although the wolf is in a family responsible for 30,003 fatal attacks, it’s only responsible for 3 of these, revealing that the domestic dog is far more dangerous. If the GROUP BY function was used, it would only return the family names and the number of fatalities for each. As a result, the key detail seen above would be lost in the summary.

Conclusion

PARTITION BY is a window function that allows users to split records based on one feature and perform a process separately on each group. It is similar to the GROUP BY function. However, it does not return a summary of the records. It outputs the results of the operation alongside individual records, preserving their granularity and making data analysis easier. SQL PARTITION BY is commonly used with aggregate functions such as AVG, MAX, and MIN but can also be used with RANK, DENSE_RANK, ORDER_BY, and other non-aggregate functions.

If you would like to gain a better understanding of important SQL functions, Interview Query is here to assist you. We have articles explaining different SQL functions, including GROUP BY, JOINs, CASE WHEN, and Cumulative Sum. If you’re interested in a structured approach to improving your SQL skills, you can also check out our SQL learning path, which is designed to help you prepare for SQL interview questions. You can also find more SQL interview questions in our question bank. Finally, if you’ve got an interview coming up, feel free to try the other resources we offer, including our AI interviewer and company interview guides.

SQL PARTITION BY can help you level up your analytics skills when using SQL, and we hope this guide has helped you understand when and how to use it.