SQL (Structured Query Language) is one of the most important querying languages in use today to access and transform relational databases. A common SQL problem statement is to perform a database manipulation based on conditional logic.
The CASE WHEN
statement provides flexibility in expressing conditional logic as it can handle multiple conditions. To perform conditional aggregation, the SUM
function combined with CASE WHEN
in SQL will achieve optimized results.
In this article, we’ll briefly touch upon the SUM() and CASE() functions, some common use cases, and how the two functions can be nested. We will also explore some real-life examples for practice and provide a few useful tips to ensure error-free, optimized code.
The CASE
function will help you transform data in SQL with conditional logic, much like an IF-ELSE statement.
The CASE
expression goes through several conditions and returns a value when the first condition is met. So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE
clause.
Let’s say you’ve been asked to categorize employees based on their yearly pay.
EmployeeID | EmployeeName | Salary |
---|---|---|
1 | John Doe | 55000 |
2 | Jane Smith | 60000 |
3 | Alice Johnson | 120000 |
4 | Bob Williams | 45000 |
5 | Eva Davis | 120000 |
Here is the query to achieve the given operation.
SELECT
EmployeeName,
Salary,
CASE
WHEN salary >= 50000 AND salary < 70000 THEN 'Grade A'
WHEN salary >= 70000 AND salary < 90000 THEN 'Grade B'
WHEN salary >= 90000 THEN 'Grade C'
ELSE 'Grade D'
END AS Grade
FROM
employees;
Here is the desired result:
EmployeeID | EmployeeName | Salary | Grade |
---|---|---|---|
1 | John Doe | 55000 | Grade A |
2 | Jane Smith | 60000 | Grade A |
3 | Alice Johnson | 120000 | Grade B |
4 | Bob Williams | 45000 | Grade D |
5 | Eva Davis | 120000 | Grade C |
In this example, the CASE
statement handles multiple conditions to categorize employees into different pay grades. Each WHEN
condition checks a specific range of salaries and assigns the corresponding pay grade. The ELSE
statement handles all other conditions that are not included in the WHEN
conditions.
The SUM
function in SQL is an aggregate function used for calculating the total sum of values in a specific column. It is most often used with the GROUP BY
clause to calculate totals within groups. It can also be used along with DISTINCT
to calculate the sum of distinct values in a column or with other arithmetic operations such as COUNT
to perform complex calculations.
Note that SUM
ignores NULL
values. If a row contains a NULL
value in the column being summed, it is not included in the calculation. This feature can help when handling missing data.
Using SUM
with CASE WHEN
allows you to perform conditional summation in SQL. It’s particularly useful when you want to aggregate values based on specific conditions. Consider it the equivalent of “pivoting” data - combining multiple rows by category and an aggregate function.
SUM(CASE WHEN condition THEN value ELSE alternative END)
In this syntax:
condition
: The criteria that must be met for the THEN
clause to execute.value
: The value to sum if the condition is met.THEN
: Specifies the result if the condition evaluates to true.ELSE alternative
: If the condition is not met, this value is used. Often set to 0 so it doesn’t affect the sum.END
: Concludes the CASE
expression.SELECT
SUM(CASE WHEN column_name > 100 THEN column_name ELSE 0 END) AS conditional_sum
FROM
table_name;
In this example, we’re summing all the values in column_name
where each value is greater than 100. If a value is 100 or less, it does not contribute to the sum.
Now for the above illustration, the same result can be achieved more efficiently by excluding the rows with values 100 or less with the WHERE
clause. However, combining SUM with CASE-WHEN is necessary when:
CASE WHEN
inside SUM
allows you to differentiate the sums based on varying conditions. It enables you to avoid multiple queries or subqueries and allows for easier readability and maintenance of reporting.WHERE
clause will change the count of the rows returned. With SUM
and CASE WHEN
, you can maintain the integrity of the row count, which is particularly useful in reports where you want to show both the total and the conditional sums.GROUP BY
or window functions, SUM
, and CASE WHEN
can produce aggregates over partitions of the data, which isn’t possible with WHERE
alone when multiple clauses need to be satisfied.Here are some real-life scenarios where these two functions should be nested:
SUM
and CASE
statements.Let us explore some real-world examples on this topic with written code.
SUM
with CASE WHEN
to solve the problem.Let’s say your sales table looks like this:
item_id | category | amount |
---|---|---|
1 | Electronics | 200.00 |
2 | Electronics | 450.00 |
3 | Clothing | 50.00 |
4 | Clothing | 75.00 |
5 | Books | 20.00 |
6 | Groceries | 30.00 |
This is the code to pivot the data by sales category:
SELECT
SUM(CASE WHEN category = 'Electronics' THEN amount ELSE 0 END) AS ElectronicsSales,
SUM(CASE WHEN category = 'Clothing' THEN amount ELSE 0 END) AS ClothingSales,
SUM(CASE WHEN category NOT IN ('Electronics', 'Clothing') THEN amount ELSE 0 END) AS OtherSales
FROM
sales;
Here is the expected output:
ElectronicsSales | ClothingSales | OtherSales |
---|---|---|
650.00 | 125.00 | 50.00 |
You can practice a similar interview problem here: Monthly Product Sales
Let us assume that you are supplied with the following transactions table.
transaction_id | department | amount | transaction_date |
---|---|---|---|
1 | IT | 1000.00 | 2023-01-15 |
2 | IT | 1500.00 | 2023-02-17 |
3 | HR | 750.00 | 2023-04-25 |
4 | Marketing | 1250.00 | 2023-05-11 |
5 | Marketing | 1350.00 | 2023-07-19 |
6 | Finance | 2000.00 | 2023-08-21 |
7 | HR | 800.00 | 2023-10-05 |
8 | IT | 1100.00 | 2023-11-12 |
We will group by the quarters of the transaction dates. The quarters are:
Here is the code to achieve this transformation.
SELECT
CASE
WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 1 AND 3 THEN 'Q1'
WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 4 AND 6 THEN 'Q2'
WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 7 AND 9 THEN 'Q3'
WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 10 AND 12 THEN 'Q4'
END AS Quarter,
SUM(CASE WHEN department = 'IT' THEN amount ELSE 0 END) AS ITSales,
SUM(CASE WHEN department = 'HR' THEN amount ELSE 0 END) AS HRSales,
SUM(CASE WHEN department = 'Marketing' THEN amount ELSE 0 END) AS MarketingSales,
SUM(CASE WHEN department NOT IN ('IT', 'HR', 'Marketing') THEN amount ELSE 0 END) AS OtherSales
FROM
transactions
GROUP BY
Quarter;
The above query uses the CASE WHEN
statement in two ways; one, simply to categorize the transaction dates by quarters, and the other way is by nesting the sum and case statement to aggregate the spend by department. The output from the above query would look like this:
Quarter | ITSales | HRSales | MarketingSales | OtherSales |
---|---|---|---|---|
Q1 | 2500.00 | 0.00 | 0.00 | 0.00 |
Q2 | 0.00 | 750.00 | 1250.00 | 0.00 |
Q3 | 0.00 | 0.00 | 1350.00 | 2000.00 |
Q4 | 1100.00 | 800.00 | 0.00 | 0.00 |
This table shows the total sales amount for each department, grouped by the fiscal quarter. The ‘OtherSales’ column includes sales from departments not listed specifically (like ‘Finance’ in the transactions table).
SUM
with CASE WHEN
in addition to an arithmetic operation. Let’s consider a scenario where we have an employee table, and we want to calculate the total salary expenditure for different job titles while also considering overtime. We want to sum up the regular salaries and the overtime payments separately and also include a total sum for each job title.The following employees table is provided to you.
employee_id | job_title | salary | overtime_hours | overtime_rate |
---|---|---|---|---|
1 | Software Developer | 7000.00 | 5 | 50.00 |
2 | Software Developer | 7200.00 | 3 | 50.00 |
3 | Graphic Designer | 5000.00 | 10 | 30.00 |
4 | Sales Associate | 4500.00 | 0 | 0.00 |
5 | Sales Associate | 4600.00 | 2 | 45.00 |
6 | Graphic Designer | 5100.00 | 8 | 30.00 |
7 | Human Resources | 4500.00 | 0 | 0.00 |
While calculating the total compensation per job role, we’ll include the CASE WHEN
statement to apply conditional logic to distinguish between job types and compute remuneration accordingly. Here is the code to get the desired result:
SELECT
job_title,
SUM(salary) AS TotalSalaries,
SUM(overtime_hours * overtime_rate) AS TotalOvertimePayments,
SUM(
salary +
(overtime_hours * overtime_rate) +
(CASE
WHEN job_title = 'Software Developer' THEN 500
WHEN job_title = 'Graphic Designer' THEN 300
ELSE 200
END)
) AS TotalCompensation
FROM
employees
GROUP BY
job_title;
The output from the above query will now include the bonus in the total compensation:
job_title | TotalSalaries | TotalOvertimePayments | TotalCompensation |
---|---|---|---|
Software Developer | 14200.00 | 400.00 | 15100.00 |
Graphic Designer | 10100.00 | 540.00 | 10940.00 |
Sales Associate | 9100.00 | 90.00 | 9390.00 |
Human Resources | 4500.00 | 0.00 | 4700.00 |
The CASE WHEN
statement in the query adds a conditional bonus to total compensation by job type based on their job title.
SUM
with CASE WHEN
statement when used with a logical operator. An electronics retail company tracks its sales and wants to run a special performance analysis.They aim to categorize sales into different bonus eligibility groups based on the amount of sales and the region where the sale occurred. The conditions for categorization are:
The company wants a report that sums these amounts by region to help with financial planning and bonus allocations.
Here is the sales table:
sale_id | employee_id | product_id | sale_amount | sale_date | region |
---|---|---|---|---|---|
1 | 101 | 30 | 1200.00 | 2023-06-15 | East |
2 | 102 | 22 | 800.00 | 2023-06-17 | West |
3 | 103 | 11 | 2300.00 | 2023-06-25 | East |
4 | 101 | 45 | 400.00 | 2023-07-03 | North |
5 | 102 | 30 | 1800.00 | 2023-07-19 | East |
6 | 104 | 25 | 1500.00 | 2023-07-21 | South |
7 | 103 | 22 | 900.00 | 2023-08-05 | West |
8 | 101 | 11 | 1600.00 | 2023-08-12 | East |
Here is the code to get the required output.
SELECT
region,
SUM(sale_amount) AS TotalSales,
SUM(
CASE
WHEN sale_amount > 2000 OR region = 'East' THEN sale_amount
ELSE 0
END
) AS PremiumSales,
SUM(
CASE
WHEN sale_amount BETWEEN 1000 AND 2000 AND region != 'East' THEN sale_amount
ELSE 0
END
) AS StandardSales,
SUM(
CASE
WHEN region = 'West' OR (sale_date BETWEEN '2023-06-01' AND '2023-07-31') THEN sale_amount
ELSE 0
END
) AS PromotionalSales
FROM
sales
GROUP BY
region;
The output would group sales according to the company’s specifications:
region | TotalSales | PremiumSales | StandardSales | PromotionalSales |
---|---|---|---|---|
East | 6000.00 | 6000.00 | 0.00 | 2000.00 |
West | 1700.00 | 0.00 | 0.00 | 1700.00 |
North | 400.00 | 0.00 | 0.00 | 400.00 |
South | 1500.00 | 1500.00 | 0.00 | 0.00 |
Likewise, you can combine the SUM and CASE statements with other logical operators such as BETWEEN and AND in a similar way to incorporate conditional logic in similar scenarios.
You’re given two tables, payments and users. The payments table holds all payments between users with the payment_state
column consisting of either "success"
or "failed"
.
Note: The sender_id
and recipient_id
both represent the user_id
.
payments
table
Column | Type |
---|---|
payment_id |
INTEGER |
sender_id |
INTEGER |
recipient_id |
INTEGER |
created_at |
DATETIME |
payment_state |
VARCHAR |
amount_cents |
INTEGER |
users
table
Column | Type |
---|---|
id |
INTEGER |
created_at |
DATETIME |
Output:
Column | Type |
---|---|
num_customers |
INTEGER |
The SUM
with CASE WHEN
construct is quite standard in SQL and is typically supported across most SQL databases with no variation in syntax.
However, in MySQL, you have the option to use the IF() function. It works similarly to programming languages’ IF-ELSE statements and can only be used in SELECT statements, WHERE clauses, and other places where you would use a MySQL expression.
Here’s how you might nest an IF()
function with SUM()
to perform conditional summation.
SELECT
SUM(IF(category IN ('A', 'B'), amount, 0)) AS TotalSales
FROM sales;
While the IF() function has an easier syntax to follow, it is important to remember CASE
is part of the SQL standard and is supported by virtually all SQL-compliant RDBMS, whereas IF()
is specific to MySQL. If you want to ensure that your skills are transferable, you should prefer CASE
. Additionally, CASE
can handle multiple conditions in a sequence making it more versatile. So, it is generally recommended to use CASE-WHEN
over IF-ELSE
even when querying in MySQL.
It is important to remember the following caveats to make your code error-free and efficient:
CASE
expression is significant, as the conditions are evaluated in the order they are written. Once a condition is met, the CASE
expression will return the corresponding result and not evaluate any subsequent conditions. This downside is important to bear in mind to ensure that your query produces the correct result.CASE
expression within an aggregate function like SUM
must be carefully written to handle NULL
values appropriately, as SUM disregards rows with NULLs.WHERE
clause to limit the rows whenever possible before implementing CASE
logic. This can help the query planner to use an index to filter rows first, thus reducing the number of rows over which the CASE
needs to be evaluated.We hope that this article has helped you understand the different ways to combine the SUM and CASE-WHEN
functions to perform powerful transformations in SQL.
For an end-to-end overview of SQL, including complex joins, advanced reporting, and creating multi-table databases, you can explore our specialized learning path.