Using SQL PARTITION BY (Updated in 2024)

Written by IQ Team

IQ Team

Reviewed by IQ Team

IQ Team

Published December 11, 2024

Estimated reading time: 12 minutes

Table of contents

Overview

Understanding PARTITION BY in SQL

SQL PARTITION BY Syntax

Using SQL PARTITION BY with Aggregate Functions

SQL PARTITION BY and Other Clauses

SQL PARTITION BY Practice Questions

Conclusion

Overview

The PARTITION BY clause in SQL is a window function that allows you to split rows into different groups and perform a function separately on each group. It works similarly to the GROUP BY clause but, as we’ll see, the two have unique differences.

This article explains how you can use the SQL PARTITION BY clause to separate records and perform operations on them. We’ll show which functions can be paired with this clause and have included examples to help you understand how this clause is applied.

Understanding PARTITION BY in SQL

SQL PARTITION BY splits records into groups sharing a common feature and allows you to perform a function separately on each group. In this regard, it is similar to the GROUP BY clause.

For example, if you have records of students’ grades in an exam, instead of just calculating the average score of all the students, PARTITION BY allows you to separate the students into males and females and calculate the average scores of the males and females separately.

SQL PARTITION BY vs GROUP BY

The GROUP BY clause also allows you to separate records and perform functions separately on each group. However, it only returns as many records as there are groups. For instance, you’d only get two rows in the results table in the above case.

Gender	Average Scores (%)
Male	57.43
Female	59.43

On the other hand, PARTITION BY returns as many records as there were originally. This allows you to display columns that capture individual traits, such as names. In this example, if there were 10 students in the class, the function would return 10 records separated into males and females. Each row would display the average score for the group along with any other specified feature(column) of that individual.

First Name	Gender	Average Scores (%)
John	Male	57.43
William	Male	57.43
Ezra	Male	57.43
Peter	Male	57.43
Mary	Female	59.43
Sarah	Female	59.43
Alice	Female	59.43
Teresa	Female	59.43
Elsa	Female	59.43
Olivia	Female	59.43

Why Use SQL PARTITION BY?

PARTITION BY allows you to separate records into groups without losing sight of individual records. This allows you to easily compare individual records to some calculated baseline for the group, e.g., the average, maximum, or minimum. This also allows you to order or rank the records separately within their groups.

SQL PARTITION BY Syntax

When using the PARTITION BY clause, the syntax usually looks as follows:

 SELECT column1, column2,...,

 WINDOW FUNCTION() OVER(PARTITION BY *grouping_feature*)

 FROM table1

The grouping_feature is the column that defines how the records will be split. In the example above, this was the gender column.

The OVER clause is used with any SQL window function, including PARTITION BY. It defines the specific set of records over which a window function is performed, which is then split into smaller sections by the PARTITION BY clause.

Using SQL PARTITION BY with Aggregate Functions

PARTITION BY is often used together with SQL aggregate functions such as MAX, AVG, MIN, COUNT, and SUM. The first three functions are used to get the maximum, average, and minimum values, respectively. COUNT is used to count the number of records and return the total for each group, while SUM adds up values and returns the total for each group.

When using PARTITION BY with aggregate functions, the syntax used is:

 SELECT column1, column2,...,
 AGGREGATE FUNCTION(column_to_aggregate) OVER(PARTITION BY 
 *grouping_feature*)
 FROM table1

SQL PARTITION BY and Other Clauses

The PARTITION BY clause can also be used with functions that are not used for aggregating. Common examples are:

ORDER BY - This function arranges the rows in each partition in ascending or descending order. When paired with the SUM function, it returns the cumulative sum from the first to the last entry in each group.
RANK - This function assigns a rank to records based on a quantifiable criterion. If two rows have the same rank, one number is skipped when assigning the next rank, i.e., 1,2,3,3,5,6,…
DENSE_RANK — This function is similar to the RANK function except that if two records have the same rank, the next record is assigned the next value without skipping one, i.e., 1,2,3,3,4,5,…

SQL PARTITION BY Practice Questions

It’s now time to see how the PARTITION BY clause is used in practice. In the examples below, you’ll see how this function is used with aggregate functions and other clauses to get insights from data.

Practice Question 1: Count the Number of Animals in Each Family

You have been given a table containing different animals and features, such as their scientific family. The objective is to group the animals in their respective families and count the number of species in each family.

Solution

This is a fairly simple problem. All you need to do is use PARTITION BY to separate the animals into their respective families and use the COUNT function to add up the number of species in each.

Step 1: Create the database and table containing the details of the animals.

DROP DATABASE IF EXISTS `Biodiversity`;
CREATE DATABASE `Biodiversity`;
USE `Biodiversity`;

CREATE TABLE animal_basics (
  animal_id INT NOT NULL,
  common_name VARCHAR(50),
  scientific_name VARCHAR(50),
  family VARCHAR(50),
  PRIMARY KEY (animal_id)
);

Step 2: Populate the table.

INSERT INTO animal_basics (animal_id, common_name, scientific_name, family)
VALUES
(1, 'Domestic Cat', 'Felis catus', 'Felidae'),
(2, 'Domestic Dog', 'Canis lupus familiaris', 'Canidae'),
(3, 'Grizzly Bear', 'Ursus arctos horribilis', 'Ursidae'),
(4, 'Cougar', 'Puma concolor', 'Felidae'),
(5, 'Jaguar', 'Panthera onca', 'Felidae'),
(6, 'Elk', 'Cervus canadensis', 'Deer'),
(7, 'Bison', 'Bison bison', 'Bovidae'),
(8, 'Leopard', 'Panthera pardus', 'Felidae'),
(9, 'Cape Buffalo', ' Syncerus caffer caffer', 'Bovidae'),
(10, 'Grey Wolf', 'Canis lupus', 'Canidae'),
(11, 'Horse', 'Equus ferus caballus', 'Equidae'),
(12, 'Indian Cobra', 'Naja naja', 'Elapidae');

Step 3: Use SQL PARTITION BY and the SUM function to group the animals and count them.

SELECT common_name, scientific_name, family,
COUNT(common_name) OVER(PARTITION BY family) AS family_total
FROM animal_basics
ORDER BY family_total DESC;

This will return the table below.

common_name	scientific_name	family	family_total
Domestic Cat	Felis catus	Felidae	4
Cougar	Puma concolor	Felidae	4
Jaguar	Panthera onca	Felidae	4
Leopard	Panthera pardus	Felidae	4
Bison	Bison bison	Bovidae	2
Cape Buffalo	Syncerus caffer caffer	Bovidae	2
Domestic Dog	Canis lupus familiaris	Canidae	2
Grey Wolf	Canis lupus	Canidae	2
Elk	Cervus canadensis	Deer	1
Indian Cobra	Naja naja	Elapidae	1
Horse	Equus ferus caballus	Equidae	1
Grizzly Bear	Ursus arctos horribilis	Ursidae	1

As seen above, using PARTITION BY preserves the granularity of the data, allowing us to see that the leopard, for example, is in a family of four while the elk, cobra, horse, and bear are the only individuals in their respective families.

Practice Question 2: Most Expensive Department

In this problem, you have been given two tables, one containing basic employee information and another containing their salary and department details. The goal is to identify the most expensive department while keeping track of each employee’s earnings.

Solution

This problem is more challenging because you’ll need to join two tables in addition to using PARTITION BY. It also requires using the SUM function to add each department’s salaries.

Step 1: Create the database and tables.

DROP DATABASE IF EXISTS `Olive_Inc`;
CREATE DATABASE `Olive_Inc`;
USE `Olive_Inc`;

CREATE TABLE employee_basics (
  employee_id INT NOT NULL,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  age INT,
  gender VARCHAR(10),
  birth_date DATE,
  PRIMARY KEY (employee_id)
);

CREATE TABLE employee_salary (
  employee_id INT NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  occupation VARCHAR(50),
  salary INT,
  dept_id INT
);

Step 2: Populate the tables.

INSERT INTO employee_basics (employee_id, first_name, last_name, age, gender, birth_date)
VALUES
(1,'George', 'Bishop', 51, 'Male','1973-03-25'),
(2,'Tom', 'Ludley', 36, 'Male', '1988-01-04'),
(3,'Rebecca', 'McDouglas', 41, 'Female', '1982-06-04'),
(4, 'Jerry', 'Astley', 29, 'Male', '1994-06-30'),
(5, 'Angela', 'Warren', 33, 'Female', '1990-10-13'),
(6, 'Donald', 'Duckwing', 40, 'Male', '1983-08-30'),
(7, 'Sandy', 'Perkins', 45, 'Female', '1978-12-15'),
(8, 'Rick', 'Jones', 43, 'Male', '1981-03-22'),
(9, 'Wyatt', 'Ridge', 23, 'Male', '2001-04-26'),
(10, 'Dwight', 'Langley', 39, 'Male', '1985-03-02'),
(11, 'Mark', 'Austine', 40, 'Male', '1984-06-30'),
(12, 'Page', 'Brooks', 26, 'Female', '1998-03-27');

INSERT INTO employee_salary (employee_id, first_name, last_name, occupation, salary, dept_id)
VALUES
(1, 'George', 'Bishop', 'Senior Software Engineer', 145000, 1),
(2, 'Tom', 'Ludley', 'Software Engineer', 105000, 1),
(3, 'Rebecca', 'McDouglas', 'Software Engineer', 101000, 1),
(4, 'Jerry', 'Astley', 'Office Secretary', 50000, 2),
(5, 'Angela', 'Warren', 'Human Resource Manager', 65000, 3),
(6, 'Donald', 'Duckwing', 'Data Engineer', 70000, 4),
(7, 'Sandy', 'Perkins', 'Data Scientist', 100000, 4),
(8, 'Rick', 'Jones', 'Assistant Human Resource Manager', 45000, 3),
(9, 'Wyatt', 'Ridge', 'Office Administrator', 45000, 2),
(10, 'Dwight', 'Langley', 'Receptionist', 40000, 2),
(11, 'Mark', 'Austine', 'Senior Data Analyst', 100000, 4),
(12, 'Page', 'Brooks', 'Data Analyst', 60000, 4)

Step 3: Use the SQL PARTITION BY and SUM functions to calculate departmental totals.

SELECT sal.first_name, sal.last_name, occupation, dept_id, salary,  
SUM(salary) OVER(PARTITION BY dept_id) AS dept_total
FROM employee_salary AS sal
INNER JOIN employee_basics AS dem
	ON sal.employee_id = dem.employee_id
;

This should return the table below.

first_name	occupation	dept_id	salary	dept_total
George	Senior Software Engineer	1	145000	351000
Tom	Software Engineer	1	105000	351000
Rebecca	Software Engineer	1	101000	351000
Jerry	Office Secretary	2	50000	135000
Wyatt	Office Adminstrator	2	45000	135000
Dwight	Receptionist	2	40000	135000
Angela	Human Resource Manager	3	65000	110000
Rick	Assistant Human Resource Manager	3	45000	110000
Donald	Data Engineer	4	70000	330000
Sandy	Data Scientist	4	100000	330000
Mark	Senior Data Analyst	4	100000	330000
Page	Data Analyst	4	60000	330000

Thanks to the PARTITION BY function, we are not only able to see the most expensive department, but we can also compare the total to each individual’s salary.

Practice Question 3: Most Dangerous Animal Family

In this third question, you’ll have to combine the table in the first question with another table showing the number of fatalities caused by each animal to identify the most deadly family.

Solution

This question also requires joining two tables and calculating a sum. However, this question will show why the SQL PARTITION BY function is so useful in data analysis.

Step 1: Create the second table and populate it.

CREATE TABLE fatalities (
  animal_id INT NOT NULL,
  common_name VARCHAR(50),
  kills INT
);

INSERT INTO fatalities (animal_id, common_name, kills)
VALUES
(1, 'Domestic Cat', 0),
(2, 'Domestic Dog', 30000),
(3, 'Grizzly Bear', 1),
(4, 'Cougar', 0),
(5, 'Jaguar', 4),
(6, 'Elk', 7),
(7, 'Bison', 0),
(8, 'Leopard', 55),
(9, 'Cape Buffalo', 200),
(10, 'Grey Wolf', 3),
(11, 'Horse', 6),
(12, 'Indian Cobra', 58000);

Step 2: Sum the number of fatalities caused by each animal family.

SELECT basics.common_name, family, kills,
SUM(kills) OVER(PARTITION BY family) AS family_kills,
COUNT(basics.common_name) OVER(PARTITION BY family) AS family_total
FROM animal_basics AS basics
INNER JOIN fatalities AS deaths
	ON basics.animal_id = deaths.animal_id
ORDER BY family_kills DESC

This code will produce the table below.

common_name	family	kills	family_kills	family_total
Indian Cobra	Elapidae	58000	58000	1
Domestic Dog	Canidae	30000	30003	2
Grey Wolf	Canidae	3	30003	2
Bison	Bovidae	0	200	2
Cape Buffalo	Bovidae	200	200	2
Domestic Cat	Felidae	0	59	4
Cougar	Felidae	0	59	4
Jaguar	Felidae	4	59	4
Leopard	Felidae	55	59	4
Elk	Deer	7	7	1
Horse	Equidae	6	6	1
Grizzly Bear	Ursidae	1	1	1

An interesting feature seen in this table is that although the wolf is in a family responsible for 30,003 fatal attacks, it’s only responsible for 3 of these, revealing that the domestic dog is far more dangerous. If the GROUP BY function was used, it would only return the family names and the number of fatalities for each. As a result, the key detail seen above would be lost in the summary.

Conclusion

PARTITION BY is a window function that allows users to split records based on one feature and perform a process separately on each group. It is similar to the GROUP BY function. However, it does not return a summary of the records. It outputs the results of the operation alongside individual records, preserving their granularity and making data analysis easier. SQL PARTITION BY is commonly used with aggregate functions such as AVG, MAX, and MIN but can also be used with RANK, DENSE_RANK, ORDER_BY, and other non-aggregate functions.

If you would like to gain a better understanding of important SQL functions, Interview Query is here to assist you. We have articles explaining different SQL functions, including GROUP BY, JOINs, CASE WHEN, and Cumulative Sum. If you’re interested in a structured approach to improving your SQL skills, you can also check out our SQL learning path, which is designed to help you prepare for SQL interview questions. You can also find more SQL interview questions in our question bank. Finally, if you’ve got an interview coming up, feel free to try the other resources we offer, including our AI interviewer and company interview guides.

SQL PARTITION BY can help you level up your analytics skills when using SQL, and we hope this guide has helped you understand when and how to use it.

How to Gain SQL Experience: A Step-by-Step Guide for Beginners Top Sklearn Datasets for Machine Learning Projects Common SQL Interview Mistakes and How to Avoid Them Python vs SQL: Which One Should You Use and Why?Data Engineer Jobs & Career Paths: Roles, Salaries, Expectations