
Data Analytics Interview
0 of 84 Completed
Introduction to Easy SQL questions
SQL JOINs
Download Facts
GROUP BY and HAVING
Total Spent On Products
Largest Salary By Department
ORDER BY and LIMIT
Average Quantity
Miscellaneous Functions and Clauses
Manager Team Sizes
Session Difference
Top 5 Turnover Risk
GROUP BY and HAVING
GROUP BY
In the beginner level course, we saw how aggregate functions can be used to summarize metrics from a table. However, their real utility comes when we combine them with the GROUP BY
clause.
Let’s say we have a sales
table with the following information:
employeeID | date | amount |
---|---|---|
1 | 2021-01-01 | 100.00 |
2 | 2021-01-01 | 200.00 |
1 | 2021-01-02 | 150.00 |
3 | 2021-01-03 | 300.00 |
2 | 2021-01-03 | 150.00 |
4 | 2021-01-04 | 400.00 |
If we wanted to know the total amount sold by each employee with the tools we’ve learned so far, we would need a separate query for each of them.
In this case, this would mean that we would find the total amount sold by each employee separately. For example, to find the amount sold by the first employee, we could use the following query:
SELECT employeeID, SUM(amount) AS total_sold
FROM employees
WHERE employeeID = 1
The WHERE
clause filters the sales done by the first employee:
employeeID | date | amount |
---|---|---|
1 | 2021-01-01 | 100.00 |
1 | 2021-01-02 | 150.00 |
And then, the aggregation function finds the total amount sold for this smaller table. The output of the query would be:
employeeID | total_sold |
---|---|
1 | 250.00 |
If the table is too large, this approach is inconvenient, as it would need to find the total amount sold for each employee separately.
SQL allows us to solve this problem with a single query, using the GROUP BY
clause. The GROUP BY
clause lets us:
- Divide our table into smaller tables according to a single column.
- Perform an aggregate function over each of them.
- Integrate the results into a single table again.
In this case, the query we would use is:
SELECT employeeID, SUM(amount) AS total_sol
FROM employees
GROUP BY employeeID
Our output would be:
employeeID | total_amount_sold |
---|---|
1 | 250.00 |
2 | 350.00 |
3 | 300.00 |
4 | 400.00 |
The logic of it works in the following way:
- The
GROUP BY
clause divides our original table into four smaller tables according to theiremployeeID
.
Table_1
:
employeeID | date | amount |
---|---|---|
1 | 2021-01-01 | 100.00 |
1 | 2021-01-02 | 150.00 |
Table_2
:
employeeID | date | amount |
---|---|---|
2 | 2021-01-01 | 200.00 |
2 | 2021-01-03 | 150.00 |
Table_3
:
employeeID | date | amount |
---|---|---|
3 | 2021-01-03 | 300.00 |
Table_4
:
employeeID | date | amount |
---|---|---|
4 | 2021-01-04 | 400.00 |
- Then, we select
employeeID
andSUM(amount) AS total_amount
from each of the tables:
Table 1:
employeeID | total_amount_sold |
---|---|
1 | 250.00 |
Table 2:
employeeID | total_sold |
---|---|
2 | 350.00 |
Table 3:
employeeID | total_sold |
---|---|
3 | 300.00 |
Table 4:
employeeID | total_sold |
---|---|
4 | 400.00 |
- Finally, the results are integrated into a single table:
employeeID | total_sold |
---|---|
1 | 250.00 |
2 | 350.00 |
3 | 300.00 |
4 | 400.00 |
In reality, the GROUP BY
clause does all those steps at once, and is more efficient than carrying them all out separately.
We can use the GROUP BY
clause with any aggregate function we need. We can specify more than one argument in the GROUP BY
clause by using a separating comma. The original tables will be divided into smaller groups for which all the columns in the GROUP BY
clause have the same values.
For example, GROUP BY location_id, employee_id
would divide a table into smaller groups that share the same location_id
and employee_id
.
HAVING
The HAVING
clause is a filter for the groups made by the GROUP BY
. It works similarly to the WHERE
clause, but it’s applied to groups instead of single rows.
For example, if we wanted to find the total amount sold for employees who performed more than one sale, we could do:
SELECT employeeID,
SUM(amount) AS total_sold
FROM employees
GROUP BY employeeID
HAVING COUNT(*) > 1
Here, the GROUP BY
clause would divide our table into groups according to their employeeID
and then filter each of the groups according to our condition. In this case, it would discard the smaller tables 3 and 4, as they only have one row.
After filtering the groups, it continues just as the GROUP BY
clause does. It performs the aggregate functions in the SELECT
on each of the groups and integrates the results into a single table.
So far:
Example query
SELECT first_name, SUM(salary)
FROM employees
JOIN sales
ON employee.id = sales.employeeID
WHERE salary > 7500 AND last_name != "Williams"
GROUP BY employee.id
HAVING COUNT(*) > 1
Execution order
FROM
-> Gets input tablesJOIN … ON
-> Merges input tablesWHERE
-> Filters rows according to conditionGROUP BY
-> Separates into smaller tablesHAVING
-> Filters smaller tables according to conditionSELECT
-> Selects the columns we need (and integrates smaller tables)SUM()
and aggregate functions or numerical operations performed on output.
0%
CompletedYou have 84 sections remaining on this learning path.