Aggregating data is one of the most common actions in SQL. These operations include calculating averages, minimums, and maximums. There are different ways of aggregating data in SQL, but using the PARTITION BY clause is one of the most useful.
The GROUP BY function may be simpler, but as you’ll see below, its implementation is limited when you need to compare many records without losing sight of individual ones. This makes it harder to compare individual entries against aggregates.
In this article, we’ll explain the PARTITION BY clause and how you can use it when querying databases.
The PARTITION BY() clause in SQL is a window function used to define how records should be separated, i.e., partitioned, before an aggregation is performed. It must be used inside the OVER() clause.
For example, in a table with the names, genders, and ages of employees, PARTITION BY can be used to separate the records based on gender before the average or maximum ages for each gender are calculated.
Going by the above example, PARTITION BY seems no different from GROUP BY. However, GROUP BY is intended for quick aggregations and reporting. The output is, therefore, a summary of the results from aggregating groups of records. Individual rows of records are condensed into a few rows representing each group, as shown below.
The equivalent process using the PARTITION BY clause results in an aggregation that is shown alongside individual records, as shown below.
This gives you the ability to perform operations without losing sight of individual records. Even details such as user names and the age of each person can be displayed, making it easier to compare an individual’s data to the aggregate.
The PARTITION BY clause can only be used inside the OVER clause. The OVER clause is what defines the window within which window functions are performed. It’s also what preserves the row-wise data and can be used without the PARTITION BY clause, as shown below in the average age column.
The PARTITION BY clause can only be used inside the OVER clause. The OVER clause is what defines the window within which window functions are performed. It’s also what preserves the row-wise data and can be used without the PARTITION BY clause, as shown below in the average age column.
Using PARTITION BY requires understanding the syntax and the different operations that can benefit from having a partitioned dataset.
The syntax for using PARTITION BY is as follows:
SELECT column1, column2,…, WINDOW FUNCTION(column_name1) OVER(PARTITION BY(column_name2)
Multiple columns in the original data set can be selected alongside the partitioned column. Column_name1 represents the column that will have an aggregation or other function performed on it, while column_name2 is the column that will be used to group rows of data before the window function is performed on each separate group.
Aggregate functions are commonly paired with PARTITION BY. Some calculations performed by aggregate functions include:
In addition to aggregate functions, there are other functions used with PARTITION BY for other purposes. Some examples include:
You have been given a table with columns for employee IDs, first names, last names, job roles, salaries, and department IDs. Compare the salaries of employees with the average, maximum, and minimum salaries in their respective departments.
Create the table containing employee salary information.
CREATE TABLE employee_salary (
employee_id INT NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
occupation VARCHAR(50),
salary INT,
dept_id INT
);
Insert values into the table above.
INSERT INTO employee_salary (employee_id, first_name, last_name, occupation, salary, dept_id)
VALUES
(1,'George', 'Bishop', 'Senior Software Engineer', 145000, 1),
(2,'Tom', 'Ludley', 'Software Engineer', 105000, 1),
(3,'Rebecca', 'McDouglas', 'Software Engineer', 101000, 1),
(4, 'Jerry', 'Astley', 'Office Secretary', 50000, 2),
(5, 'Angela', 'Warren', 'Human Resource Manager', 65000, 3),
(6, 'Donald', 'Duckwing', 'Data Engineer', 70000, 4),
(7, 'Sandy', 'Perkins', 'Data Scientist', 100000, 4),
(8, 'Rick', 'Jones', 'Assistant Human Resource Manager', 45000, 3),
(9, 'Wyatt', 'Ridge', 'Office Adminstrator', 45000, 2),
(10, 'Dwight', 'Langley', 'Receptionist', 40000, 2),
(11, 'Mark', 'Austine', 'Senior Data Analyst', 100000, 4),
(12, 'Page', 'Brooks', 'Data Analyst', 60000, 4)
Use the PARTITION BY clause to return a table with first names, last names, and salaries. The table will also contain average, maximum, and minimum salaries per department.
SELECT first_name, last_name, dept_id, salary,
AVG(salary) OVER(PARTITION BY dept_id) AS AVG_SAL,
MAX(salary) OVER(PARTITION BY dept_id) AS MAX_SAL,
MIN(salary) OVER(PARTITION BY dept_id) AS MIN_SAL
FROM employee_salary;
This resulting table is shown below.
In addition to the table in example 1, you have been provided with a second table containing demographic information, including age and gender. Write a query that groups the male and female employees separately and ranks them from the highest to the lowest earners.
Create the table for employee demographics and add data. Note that the employee ID in this table MUST be the same as the employee ID in the salaries table.
CREATE TABLE employee_basics (
employee_id INT NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
age INT,
gender VARCHAR(10),
birth_date DATE,
PRIMARY KEY (employee_id)
);
INSERT INTO employee_basics (employee_id, first_name, last_name, age, gender, birth_date)
VALUES
(1,'George', 'Bishop', 51, 'Male','1973-03-25'),
(2,'Tom', 'Ludley', 36, 'Male', '1988-01-04'),
(3,'Rebecca', 'McDouglas', 41, 'Female', '1982-06-04'),
(4, 'Jerry', 'Astley', 29, 'Male', '1994-06-30'),
(5, 'Angela', 'Warren', 33, 'Female', '1990-10-13'),
(6, 'Donald', 'Duckwing', 40, 'Male', '1983-08-30'),
(7, 'Sandy', 'Perkins', 45, 'Female', '1978-12-15'),
(8, 'Rick', 'Jones', 43, 'Male', '1981-03-22'),
(9, 'Wyatt', 'Ridge', 23, 'Male', '2001-04-26'),
(10, 'Dwight', 'Langley', 39, 'Male', '1985-03-02'),
(11, 'Mark', 'Austine', 40, 'Male', '1984-06-30'),
(12, 'Page', 'Brooks', 26, 'Female', '1998-03-27');
Write the code to join the two tables and rank male and female employees based on salaries.
SELECT sal.first_name, sal.last_name, occupation, salary, gender,
RANK() OVER(PARTITION BY gender ORDER BY salary DESC) AS Pay_Rank
FROM employee_salary AS sal
INNER JOIN employee_basics AS dem
ON sal.employee_id = dem.employee_id
;
The resulting table should appear as shown below.
Since the RANK clause was used and not the DENSE_RANK, the pay rank for male employees goes from 6 to 8.
This is a window function that gives each row a number. The number for each row in a partition is unique, but numbers restart at 1 in the next partition.
Unlike the RANK() function, the row numbers are not assigned based on a quantitative measure and simply label the rows from 1 to the number corresponding to the last value in each partition.
The PARTITION BY clause has many advantages. It organizes records into meaningful groups while producing useful aggregates. There is less data to be processed at any time since the calculations for each partition are handled separately. The flexibility of this clause makes it easier to handle complicated calculations inside SQL.
The results from the DENSE_RANK function often look identical to that of the ROW_NUMBER function. However, similar to the RANK function, the results from DENSE_RANK come from a comparison of data in a specific column, while those assigned by ROW_NUMBER are not the result of a comparison.
The SQL PARTITION BY clause is used to divide the result set into partitions and perform calculations across each partition separately, while GROUP BY aggregates results into groups. Use PARTITION BY when you need to perform calculations that require visibility into individual rows within each partition, such as calculating running totals or ranking data. GROUP BY is useful for summarizing data into aggregated results.
The SQL PARTITION BY clause enhances the performance of analytic queries by allowing you to perform calculations over partitions of your data rather than the entire dataset. This means you can compute aggregates like running totals or averages within subsets of your data, preserving individual record details. Unlike using aggregate functions alone, which group all records together and lose granular data, PARTITION BY maintains the context of each record within its partition, enabling more precise and efficient data analysis.
SQL PARTITION BY Clause is incredibly useful when you’re working with window functions. It makes it easy to calculate different metrics while retaining the original rows of data making comparison easy. The syntax is also relatively easy to understand and use.
Interview Query offers a wide range of resources to prepare you for your next tech interview. Our question bank contains SQL questions for different experience levels, and we also offer interview guides for specific tech positions in leading companies. You can also visit our blog for other useful articles on SQL and other topics.
If you’d like to use SQL for data aggregation but find the GROUP BY clause insufficient, the PARTITION BY clause will supercharge your efforts.