The SQL COUNT (CASE WHEN...)
statement is a powerful tool in SQL programming that allows you to perform counts based on specified conditions. By leveraging conditional counting, you can efficiently retrieve, analyze, and aggregate data with a single query. In this article, we will explore syntax, usage, real-world examples, and advanced techniques of SQL’s COUNT (CASE WHEN...)
statement and help you start using it in your queries.
Before we get started with leveraging the power of COUNT (CASE WHEN...)
, let’s first review COUNT
and CASE WHEN
separately.
In SQL, COUNT(COLUMN)
is an aggregation function used to calculate all non-null) instances within a specified column. A more inclusive form called COUNT(*)
can be used to count all the rows in a table, including null values.
In SQL, the CASE WHEN
statement is a conditional expression that allows you to perform logical evaluations based on specified conditions and return relevant values. This is useful in conducting conditional transformations or aggregations of data.
COUNT (CASE WHEN...)
is one such “conditional aggregation” we can perform with those simpler functions. Let’s explore how we are able to use it in SQL queries.
COUNT (CASE WHEN...)
is known as a conditional count. This allows us to selectively count elements based on their values, unrestricted by their null values. By using the CASE WHEN
statement within the COUNT
function, we can define specific conditions and adjust the count only when those conditions are met. This criteria allows us to narrowly target the data in our analysis.
To understand the usage of SQL’s COUNT (CASE WHEN...)
in practical scenarios, let’s take a look at the statement in action. Consider the following code snippet:
SELECT
COUNT (
CASE WHEN order_type = 'purchase' then 1
ELSE 0
END
) FROM ORDERS;
In this query, we utilize the COUNT (CASE WHEN...)
statement to count the number of purchases within the “orders” table. From how we coded the query, we only count a row if it has the order_type of ‘purchase’. This can be expressed by tagging the result as 1. On the other hand, if it is not of type purchase, we can tag it as 0.
Let’s consider the following queries. Which of the following is logically equivalent to the query above? (Focus specifically on: COUNT(CASE WHEN order_type = 'purchase' then 1 END)
).
COUNT(CASE WHEN order_type = 'purchase' then 0 END)
COUNT(CASE WHEN order_type = 'purchase' then 2 END)
COUNT(CASE WHEN order_type = 'purchase' then NULL END)
COUNT(CASE WHEN order_type = 'purchase' then 1 ELSE 0 END)
In SQL, the COUNT() function only considers non-null values. The case statement inside the COUNT() function generates a value for each row in the dataset. The COUNT() function then counts the number of rows that do not have a NULL value generated by the case statement.
Given this understanding, the logically equivalent options to COUNT(CASE WHEN order_type = 'purchase' then 1 END)
are:
COUNT(CASE WHEN order_type = 'purchase' then 0 END)
COUNT(CASE WHEN order_type = 'purchase' then 2 END)
These queries are equivalent because they also produce non-null values for each row where order_type = 'purchase'
, and hence will be counted by the COUNT() function.
The options that are not logically equivalent are:
C: COUNT(CASE WHEN order_type = 'purchase' then NULL END)
This query is not equivalent because it produces NULL when order_type = 'purchase'
, and hence it will not be counted by the COUNT() function.
D: COUNT(CASE WHEN order_type = 'purchase' then 1 ELSE 0 END)
This query is not equivalent because it produces a non-null value for every row in the dataset, not just those where order_type = 'purchase'
, and hence all rows will be counted by the COUNT() function, changing the result from the original query.
Real-world examples are invaluable in helping us understand and apply concepts in practical ways. You may take different approaches with the COUNT (CASE WHEN...)
when designing for unique business solutions and business logic, so familiarizing yourself now with patterns of use can make your work go by much quicker in the future.
In your role on the logistics team at Happy Farms Inc., you are responsible for ensuring smooth operations and meeting customer demands. One of our esteemed buyers has expressed interest in purchasing a significant quantity of sheep products. You must ensure that we have sufficient stock to fulfill their order.
Let’s explore different ways to reach the solution:
Simple Classification-Based Counting
One of the most basic and commonly used problem scenarios when using COUNT (CASE WHEN...)
is to count the number of instances of a classification.
To determine the availability of sheep products, you can swiftly retrieve the information from our comprehensive database through a simple query:
SELECT
COUNT(CASE WHEN product = 'sheep' THEN 1)
FROM
products;
The use of COUNT (CASE WHEN...)
as a part of our query logic might not be needed to perform such a simple task, however. It might instead be more efficient, and easier to code, to use the WHERE
clause instead. Take a look at the following query:
SELECT
COUNT(*) AS sheep_product_stock
FROM
products
WHERE
product = 'sheep';
As things become more complex, there are instances where the use of COUNT (CASE WHEN...)
is not only beneficial but also necessary. Let’s see an iterated version of our sheep storage question for these use cases.
When you need to count based on a single classification, the WHERE
approach can be effective. However, real-world business challenges often necessitate more intricate queries, where counting based on multiple classifications becomes necessary. In such cases, relying solely on the WHERE
clause can become cumbersome and sub-optimal. Instead, using COUNT (CASE WHEN...)
can provide the more efficient solution.
Happy Farms has received a new request. One of our esteemed buyers wishes to purchase various products, not just sheep. It is crucial to verify our stock availability to fulfill their order promptly. We will use COUNT (CASE WHEN...)
to find the status of all categories of livestock.
SELECT
COUNT(CASE WHEN product = 'sheep' AND status = 'pending' THEN 1 END)
AS sheep_pending,
COUNT(CASE WHEN product = 'sheep' AND status = 'shipped' THEN 1 END)
AS sheep_shipped,
COUNT(CASE WHEN product = 'sheep' AND status = 'cancelled' THEN 1 END)
AS sheep_cancelled,
COUNT(CASE WHEN product = 'cow' AND status = 'pending' THEN 1 END)
AS cow_pending,
COUNT(CASE WHEN product = 'cow' AND status = 'shipped' THEN 1 END)
AS cow_shipped,
COUNT(CASE WHEN product = 'cow' AND status = 'cancelled' THEN 1 END)
AS cow_cancelled,
COUNT(CASE WHEN product = 'goat' AND status = 'pending' THEN 1 END)
AS goat_pending,
COUNT(CASE WHEN product = 'goat' AND status = 'shipped' THEN 1 END)
AS goat_shipped,
COUNT(CASE WHEN product = 'goat' AND status = 'cancelled' THEN 1 END)
AS goat_cancelled
FROM orders;
This query looks ugly, and there are certainly ways to make it more efficient by utilizing the GROUP BY clause. Let’s take a look at that now.
By applying the GROUP BY clause, we can streamline the query and enhance its performance. The grouping operation allows us to aggregate the counts based on common attributes, resulting in a cleaner and more concise query structure.
Let’s take a look at an improved version of the query using the GROUP BY clause:
SELECT
product,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS count_pending,
COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS count_shipped,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS count_cancelled
FROM orders
GROUP BY product;
That’s a significant improvement! Because COUNT is an aggregate function we can use GROUP BY to generate more detailed and informative result sets.
The GROUP BY clause allows us to organize the data based on specific attributes, such as ‘product’ in this case, creating distinct groups within the dataset. Then, with COUNT, we can perform calculations within each group, such as counting the occurrences of different status categories like ‘pending,’ ‘shipped,’ and ‘cancelled.’
This combination of functions enables us to obtain a more granular view of our data, revealing the distribution and quantities of orders across different product categories.
The COUNT(CASE WHEN...)
statement may appear simple at first glance, but it offers immense power when used strategically. By employing advanced techniques, you can harness its full potential, especially when tackling interview questions. In this section, we will delve into effective strategies for utilizing COUNT(CASE WHEN...)
efficiently and learn how to provide optimal answers to related queries.
Consider a different approach to counting values using the CASE WHEN statement. Instead of counting non-null values, we can count instances of NULL
. This technique can be useful in certain scenarios. Let’s explore it through the following interview question:
Question
The support team at your company is trying to analyze ticketing data to improve response times. They have noticed that a significant number of tickets have no assigned agent, making it difficult to track the progress of these tickets.
Write a SQL query to count the total number of tickets, and the number of tickets that are either assigned or unassigned to agents.
Note: Tickets that are not assigned to any agent have NULL
in their agent_id
field.
Example:
Input:
tickets
table
tickets
table
Column | Type |
---|---|
id | INTEGER |
issue | VARCHAR |
agent_id | INTEGER |
created_at | TIMESTAMP |
updated_at | TIMESTAMP |
Output:
Column | Type |
---|---|
total_tickets | INTEGER |
tickets_with_agent | INTEGER |
tickets_without_agent | INTEGER |
Solution
SELECT
COUNT(*) AS total_tickets,
COUNT(agent_id) AS tickets_with_agent,
COUNT(CASE WHEN agent_id IS NULL 1) AS tickets_without_agent
FROM
tickets;
Answering this question is relatively easy, and there are several ways to approach it. Nonetheless, one of the simplest techniques to tackle this question is by utilizing the COUNT(CASE WHEN...)
method. By employing count CASE WHEN, we can reverse the usual behavior of the count function and instead count the number of NULL values using the conditional CASE WHEN agent_id IS NULL 1
.
In some scenarios, you may need to nest CASE WHEN
statements within COUNT
to perform more complex data analysis. This technique is particularly useful when you need to count based on multiple conditions.
Let’s explore this through an example:
Question
You are a data analyst at a videogame production company and you’ve been asked to analyze the player’s behaviors. Specifically, they want to know the count of players who have played a number of games that is less than 10 but more than 5, as well as the count of players who have played 10 or more games.
The players
table is as follows:
Column | Type |
---|---|
id | INTEGER |
name | VARCHAR |
games_played | INTEGER |
registered_at | TIMESTAMP |
Write an SQL query to obtain the needed information.
Output:
Solution
SELECT
COUNT(CASE WHEN games_played > 5 AND games_played < 10 THEN 1 END) AS players_more_than_5_to_10_games,
COUNT(CASE WHEN games_played >= 10 THEN 1 END) AS players_10_plus_games
FROM
players;
In this query, we use nested CASE WHEN
conditions inside the COUNT
function. When the conditions are met, it returns 1 in the new column, and the COUNT
function then reveals how many players there are total in that category.
Having learned the basics of COUNT (CASE WHEN...)
it’s now time to test yourself with an interview question. The following question has many approaches, but try to solve it with COUNT (CASE WHEN...)
Given an employees
and departments
table, select the top 3 departments with at least 10 employees and rank them according to the percentage of their employees making over $100,000 in salary.
Example:
Input:
employees
table
Columns | Type |
---|---|
id | INTEGER |
first_name | VARCHAR |
last_name | VARCHAR |
salary | INTEGER |
department_id | INTEGER |
departments
table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
Output:
Column | Type |
---|---|
percentage_over_100k | FLOAT |
department_name | VARCHAR |
number_of_employees | INTEGER |
Let’s say we have an annual_payments
table. Answer the following three questions via SQL queries and output them as a table with the answers to each question.
"paid"
have an amount greater or equal to 100?"paid"
status)Example:
Input:
annual_payments
table
Column | Type |
---|---|
id |
INTEGER |
amount |
FLOAT |
created_at |
DATETIME |
status |
VARCHAR |
user_id |
INTEGER |
amount_refunded |
FLOAT |
product |
VARCHAR |
last_updated |
DATETIME |
Output:
Column | Type |
---|---|
question_id |
INTEGER |
answer |
TEXT |