How to Use GROUP BY in SQL With Examples (Updated for 2025)

Written by IQ Team

IQ Team

Reviewed by IQ Team

IQ Team

Published December 18, 2024

Estimated reading time: 15 minutes

Table of contents

Overview

What Does SQL GROUP BY Clause Do?

When and How to Use SQL GROUP BY and Advanced Techniques

SQL GROUP BY Interview Questions

The Bottom Line

Overview

Owing to its high integrability and easy-to-learn syntax, SQL is heavily used in data analysis and software development. While various relational database management systems exist, Oracle was the first provider to offer a commercial system.

The GROUP BY Clause is among the most used statements in SQL, as it allows the user to group rows that have the same values in the specified columns. It’s often used in conjunction with functions like COUNT, SUM, AVG, MIN, MAX, WHERE, ORDER BY, HAVING, etc., to perform calculations on grouped data.

In this article, we’ll go through how to use GROUP BY in SQL and address some real-world questions you might encounter in upcoming interviews.

What Does SQL GROUP BY Clause Do?

As mentioned, the SQL GROUP BY clause gathers rows with the same values in specified columns according to user preference. It also allows the aggregate functions to be applied to the grouped data.

For example,

Suppose we have a table called orders that contains the following columns: order_id, customer_id, order_date, order_amount.

Table: orders

order_id	customer_id	order_date	order_amount
1	1	2023-01-10	100
2	2	2023-01-12	200
3	1	2023-01-15	150
4	3	2023-01-17	250
5	2	2023-01-20	100

We want to find out the total amount of orders placed by each customer.

SQL Query Using GROUP BY

SELECT customer_id, SUM(order_amount) AS total_order_amount
FROM orders
GROUP BY customer_id;

Result

customer_id	total_order_amount
1	250
2	300
3	250

Explanation

The SELECT statement in the query selects the customer_id and the sum of order_amount. The SUM aggregate function is used to calculate the total amount for each customer. The GROUP BY statement groups the rows in the orders by customer_id.

For customer_id = 1, the total order amount is 100 + 150 = 250.
For customer_id = 2, the total order amount is 200 + 100 = 300.
For customer_id = 3, the total order amount is 250.

This example demonstrates the basic functionality of the GROUP BY clause. We’ll delve into more advanced use cases throughout the article.

When and How to Use SQL GROUP BY and Advanced Techniques

Some of the practical use cases of GROUP BY in SQL include:

Summarizing sales data by region, product, and salesperson
Analyzing survey results by demographics like age group, location, etc.
Summarizing expenses by department or project
Aggregating stock levels by product category

Let’s explore the GROUP BY techniques that your interviewers may expect you to know:

Aggregating Group Data

When you need to calculate summary statistics such as sums, averages, counts, etc., for groups of rows.

Example

To calculate the total sales amount for each product from a sales table:

Table: sales

sale_id	product_id	product_name	sale_amount
1	101	Widget A	100
2	102	Gadget B	200
3	101	Widget A	150
4	103	Doohickey C	250
5	102	Gadget B	100

Query: Total sales amount for each product

SELECT product_id, product_name, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY product_id, product_name;

Result

product_id	product_name	total_sales
101	Widget A	250
102	Gadget B	300
103	Doohickey C	250

Counting Occurrences

When you need to count the number of occurrences of specific values in a column.

Example To count the number of orders placed by each customer from an orders table:

Table: orders

order_id	customer_id	customer_name
1	1	John Doe
2	2	Jane Smith
3	1	John Doe
4	3	Mary Brown
5	2	Jane Smith

Query: Number of orders placed by each customer

SELECT customer_id, customer_name, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id, customer_name;

Result

customer_id	customer_name	order_count
1	John Doe	2
2	Jane Smith	2
3	Mary Brown	1

Finding Maximum and Minimum Values

When you need to find the maximum or minimum values within groups.

Example

To find the highest sale amount for each product from a sales table:

Table: sales

sale_id	product_id	product_name	sale_amount
1	101	Widget A	100
2	102	Gadget B	200
3	101	Widget A	150
4	103	Doohickey C	250
5	102	Gadget B	100

Query: Highest sale amount for each product

SELECT product_id, product_name, MAX(sale_amount) AS highest_sale
FROM sales
GROUP BY product_id, product_name;

Result

product_id	product_name	highest_sale
101	Widget A	150
102	Gadget B	200
103	Doohickey C	250

Grouping By Multiple Columns

When you need to perform grouping based on multiple columns to get more granular summary statistics.

Example

To calculate the total sales amount for each product in each region from a sales table:

Table: sales

sale_id	product_id	product_name	region	sale_amount
1	101	Widget A	North	100
2	102	Gadget B	South	200
3	101	Widget A	North	150
4	103	Doohickey C	East	250
5	102	Gadget B	South	100

Query: Total sales amount for each product in each region

SELECT product_id, product_name, region, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY product_id, product_name, region;

Result

product_id	product_name	region	total_sales
101	Widget A	North	250
102	Gadget B	South	300
103	Doohickey C	East	250

Analyzing Categorical Data

When you need to analyze data that is categorized by specific columns.

Example

To find the average rating for each category of products from a product_reviews table:

Table: product_reviews

review_id	product_id	category	rating
1	101	Electronics	4.5
2	102	Home Appliances	4.0
3	103	Tools	3.8
4	104	Automotive	4.2
5	105	Kitchen	4.7
6	106	Outdoor	3.9
7	107	Electronics	4.6

Query: Average rating for each category of products

SELECT category, AVG(rating) AS average_rating
FROM product_reviews
GROUP BY category;

Result

category	average_rating
Electronics	4.55
Home Appliances	4.00
Tools	3.80
Automotive	4.20
Kitchen	4.70
Outdoor	3.90

Filtering Grouped Data with the HAVING Clause

When you need to filter grouped data using the HAVING clause, which allows you to specify conditions on aggregate functions.

Example

To find customers who have placed more than 5 orders from an orders table:

Table: orders

order_id	customer_id	customer_name
1	1	John Doe
2	2	Jane Smith
3	1	John Doe
4	3	Mary Brown
5	2	Jane Smith
6	1	John Doe
7	4	Paul Green

Query: Customers who have placed more than 2 orders

SELECT customer_id, customer_name, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id, customer_name
HAVING COUNT(order_id) > 2;

Result

customer_id	customer_name	order_count
1	John Doe	3

Combining with WHERE and ORDER BY Clause

When you need to use the GROUP BY clause in conjunction with other SQL clauses like WHERE, ORDER BY, etc.

Example

To calculate the average salary by department for employees who have been with the company for more than 5 years from an employees table:

Table: Employees

employee_id	employee_name	department	salary	years_with_company
1	Alice Smith	Sales	50000	6
2	Bob Johnson	HR	60000	7
3	Carol White	IT	70000	5
4	Dave Black	Marketing	55000	4
5	Emma Blue	Finance	62000	8

Query: Average salary by department for employees who have been with the company for more than 5 years

SELECT department, AVG(salary) AS average_salary
FROM employees
WHERE years_with_company > 5
GROUP BY department
ORDER BY average_salary DESC;

Result

department	average_salary
HR	60000
Finance	62000
Sales	50000

Common Pitfalls of Using GROUP BY in SQL

Even the most experienced data analysts may encounter the common pitfalls of using GROUP BY in SQL. When responding to interview questions, consider reviewing your query for these common mistakes to strengthen your candidacy.

Confusing WHERE and HAVING Clause

The HAVING clause is used to filter groups after they have been aggregated, whereas the WHERE clause is used to filter rows before aggregation.

An incorrect query might look like:

SELECT product_id, SUM(sale_amount) 
FROM sales 
WHERE SUM(sale_amount) > 100 
GROUP BY product_id;

And, the correct query would be:

SELECT product_id, SUM(sale_amount) 
FROM sales 
GROUP BY product_id 
HAVING SUM(sale_amount) > 100;

Performance Issues with Large Datasets

Using GROUP BY on large datasets can be slow if not optimized properly. Without indexes or other optimizations, the database may need to perform a full table scan. Furthermore, indexing also helps optimize multiple Join statements within a query.

Incorrect Results with NULL Values

NULL values in columns used for grouping can lead to groups that are difficult to interpret. SQL treats NULL as a separate group and needs to be replaced with something more interpretable.

Example

Consider the sales table with some NULL values in the region column.

sale_id	product_id	product_name	region	sale_amount
1	101	Widget A	North	100
2	102	Gadget B	South	200
3	101	Widget A	NULL	150
4	103	Doohickey C	East	250
5	102	Gadget B	NULL	100

An incorrect query that may lead to confusion:

SELECT region, COUNT(*)
FROM sales
GROUP BY region;

This query will cluster NULL as well, which might be unexpected.

Handling NULL values:

SELECT COALESCE(region, 'Unknown') AS region, COUNT(*)
FROM sales
GROUP BY COALESCE(region, 'Unknown');

Using COALESCE, NULL values are replaced with ‘Unknown’, making the results clearer.

Grouping on Too Many Columns

Except for specific use cases, grouping on too many columns can lead to overly detailed results, making it hard to draw meaningful conclusions. Consider applying thoughtful aggregations and more meaningful queries to reach better conclusions.

Misinterpreting the Results

Misinterpreting grouped data, especially when using multiple aggregate functions, can lead to incorrect conclusions.

Example

Consider the sales table with an additional quantity column.

sale_id	product_id	product_name	sale_amount	quantity
1	101	Widget A	100	2
2	102	Gadget B	200	1
3	101	Widget A	150	3
4	103	Doohickey C	250	1
5	102	Gadget B	100	2

Misleading query:

SELECT product_id, SUM(sale_amount), AVG(quantity)
FROM sales
GROUP BY product_id;

The result shows the total sales and the average quantity per order, which may be misinterpreted.

Clearer query:

SELECT product_id, SUM(sale_amount) AS total_sales, AVG(quantity) AS avg_quantity
FROM sales
GROUP BY product_id;

Renaming the columns in the result makes their meaning clearer.

SQL GROUP BY Interview Questions

Question

Topics

Difficulty

Ask Chance

Lasso vs Ridge

Machine Learning

Easy

Very High

Experiment Validity

A/B Testing

Medium

Very High

Random Bucketing

A/B Testing

Medium

Very High

Oujehzp Wmsimhrm

SQL

Medium

Ctws Aghdj Recdp

Machine Learning

Hard

High

Ekulj Zwwpyuod

SQL

Easy

Very High

Eqvfze Qbkn Oelxu Afviphdw Kkxyok

Analytics

Easy

Medium

Qnup Nvzsnth Gjlw Pejujk Mztusq

SQL

Medium

High

Lazab Vwrl

SQL

Medium

High

Zrdo Kdmb Qgnu

Machine Learning

Easy

High

Kkkajjz Shlkxtah Wxijeg Umlbqqa Ktioea

SQL

Easy

Low

Jrxvtv Wpcj Iulpdxsk Ftcedbot

Machine Learning

Hard

Medium

Tyvfng Hbcdskdv Ftnexcs

Machine Learning

Medium

Zqypvxfh Mdpxzcp Okvhxi Nyje Svnevgn

Analytics

Easy

High

Pcnrfhdo Ulbpbqzk

Analytics

Hard

Medium

Qddfs Lgpf

Machine Learning

Medium

Very High

Mpxlipm Pgyfu

Machine Learning

Medium

Vnzbltl Xtssvl Jlanozdk Vgsh

Analytics

Easy

High

Cmiiayr Ccijt

Analytics

Medium

Very High

Sohrpg Xqur

Machine Learning

Hard

High

Loading pricing options

View all questions

By learning effective SQL and practicing, you can interpret tables accurately and write queries that generate optimal solutions. Here are a few SQL interview questions requiring the GROUP BY clause to solve. You may encounter these in your upcoming interview:

To finish a class, students must pass four exams (exam ids: 1,2,3, and 4). Given a table exam_scores containing the data about all of the exams that students took, form a new table to track the scores for each student.

Note: Students took each exam only once.

Example:

For the given input:

student_id	student_name	exam_id	score
100	Anna	1	71
100	Anna	2	72
100	Anna	3	73
100	Anna	4	74
101	Brian	1	65

The expected output should be:

student_name	exam_1	exam_2	exam_3	exam_4
Anna	71	72	73	74
Brian	65	NULL	NULL	NULL

Input:

exam_scores table

Column	Type
student_id	INTEGER
student_name	VARCHAR
exam_id	INTEGER
score	INTEGER

Output:

Column	Type
student_name	VARCHAR
exam_1	INT
exam_2	INT
exam_3	INT
exam_4	INT

Let’s say you’re working in HR at a major tech company, and they ask you to find employees with a high probability of leaving the company. They tell you these employees perform well but have the lowest pay.

Given two tables, employees and projects, find the five lowest-paid employees who have completed at least three projects.

Note: We consider projects to be completed when they have an end date, which is the same as saying their End_dt is not NULL.

Example:

Input:

employees table

id	salary
INTEGER	FLOAT

projects table

employee_id	project_id	Start_dt	End_dt
INTEGER	INTEGER	DATETIME	DATETIME

Output:

employee_id
INTEGER

Let’s say you work in air traffic control. You are given the table below containing information on flights between two cities. Write a query to find out how much time, in minutes (rounded down), each plane spent in the air each day.

Note: Both cities are in the same time zone, so you do not need to worry about converting time zones.

flights table

Example:

Input:

Columns	Type
id	INTEGER
destination_location	VARCHAR
source_location	VARCHAR
plane_id	INTEGER
flight_start	DATETIME
flight_end	DATETIME

Output:

Columns	Type
plane_id	INTEGER
calendar_day	VARCHAR
time_in_min	INTEGER

Please give calendar_day in the format YYYY-MM-DD.

The Bottom Line

Hopefully, you’ll now be able to utilize the inherent grouping capabilities of SQL’s GROUP BY clause to count concurrencies, aggregate group data, filter data, and combine other clauses. To delve deeper into SQL functions and prepare for interview questions, explore our SQL Learning Path and Company Interview Guides. Best of luck with your SQL journey!

SQL Compare Dates Is Data Science a Good Major in 2025?What Is a Business Analyst? Career Path, Salary & Key Skills in 2025 February Data Science Job Market Report (2025)Cohort Study vs. Case Control