Business analytics is one of the fastest-growing job sectors in the United States. According to the Bureau of Labor Statistics, hiring for business analysts is expected to grow by 14% over the next decade.
Typically, a Business Analyst SQL interview questions are designed to quickly assess your ability to pull metrics and process data with SQL. In general, you might have to whiteboard SQL queries or produce SQL code in a code editor.
SQL, which stands for a structured query language, is a programming language used to manipulate data in databases. SQL is useful to professionals when used in databases with sizable amounts of data. Business analysts often work with databases, and they can use SQL to retrieve data for reports and analysis.
As such, SQL skills can be valuable if you’re searching for a business analyst job. Once you get an interview, you may need to know how to answer SQL interview questions.
The core responsibility of a business analyst is to gather, clean, and analyze business data. This can include data such as revenue, sales, or customer engagement metrics. Therefore, business analysts should be able to interpret data, share findings, and make recommendations.
This requires in-depth knowledge of how to use SQL and how to perform statistical analysis. Business analysts should be able to generate reports, answer business problems, and answer questions like what would you do if you noticed a decline in revenue?
The answer to this question is subjective and depends on the need. Although a strong knowledge of SQL is helpful to have and would give you an advantage over the competition, it is not always mandatory.
According to this business analyst job listing analysis on Kaggle based on Glassdoor data, only 27% of business analyst job listings include SQL as a requirement. However, the need for SQL is dependent on company size, career experience, and a company’s ability to provide on-job SQL training.
In a nutshell, basic knowledge of SQL is required and advanced knowledge is usually only required at some companies and isn’t deemed an absolute necessity.
Business analyst SQL interview questions usually fall into three categories:
Business analyst interviews and technical screens generally start with beginner SQL questions. There are two main types of easy SQL questions:
Here are some easy business analyst SQL interview questions:
Although they are both used to delete data, a key difference is that DELETE
is a database manipulation language command, whereas TRUNCATE
is a data definition language command.
Therefore, DELETE
removes specific data from a table but TRUNCATE
removes all the rows of a table without maintaining the table’s structure. Another difference between the two is that DELETE
can be used with the WHERE
clause but TRUNCATE
cannot. In this case, the DELETE
TABLE would remove all the data from within the table while maintaining the structure. In contrast, TRUNCATE
TABLE would delete the table in its entirety.
For this question, assume the table is called Employees and the last name column is LastName
. The query should look like this:
SELECT * FROM Employees WHERE LastName BETWEEN 'Bailey' AND 'Frederick'
EXTRACT
allows you to pull temporal data types like date, time, timestamp, and interval from the date and time values. If you wanted to find the year from 2022-03-22, you would write EXTRACT
( FROM), as shown below:
SELECT EXTRACT(YEAR FROM DATE '2022-03-22') AS year;
To answer this question, you need the name of the department to be associated with each employee in the employees table in order to understand which department each employee is a part of.
The department_id
field in the employees table is associated with the “id” field in the departments table. You can call the department_id
a foreign key because it is a column that references the primary key of another table, which in this case is the “id” field in the departments table.
Based on this shared field, you can join both tables using INNER JOIN
to associate the department name to their employees as exemplified below:
SELECT
salaryFROM
employeesINNER JOIN
departmentsON
employees.department_id = departments.idWith the department name in place, you can now look at the employees of the engineering team and sort by salary to find the second highest paid.
A JOIN
is a clause in SQL that’s used to join rows from two or more tables based on a common column between the tables. This clause is used for merging tables and retrieving data. The most common types of JOIN commands include:
INNER JOIN
is the most common type of JOIN
command and is used to return all rows from two or more tables when the JOIN
condition is met.JOIN
command returns rows from the left table when a matching row from the right meets the JOIN
condition.JOIN
command is similar to a LEFT JOIN
, but rows are returned instead from the right table when the JOIN
condition on the left is met.FULL JOIN
returns all rows from the left and the right when there is a match in any of the tables.PRIMARY KEY is a constraint that uniquely identifies each record. Notably, the constraint cannot have NULL values, and all values must be UNIQUE. A table can have only one PRIMARY KEY, but the PRIMARY KEY can consist of single or multiple columns.
Constraints in SQL are rules that can be applied to the type of data in a table. They are used to limit the type of data that can be stored in a particular column within a table. Some common constraints are:
users
table
Columns | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
neighborhood_id |
INTEGER |
created_at |
DATETIME |
neighborhoods
table
Columns | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
city_id |
INTEGER |
Output:
Columns | Type |
---|---|
name |
VARCHAR |
Hint: The predicament here is finding all the neighborhoods without users. In a sense, you need all the neighborhoods that do not have a singular user living in them. This means you have to introduce a concept of existence of a column in one table but not in the other.
To answer this question, start by joining the two tables using an INNER JOIN since we only need accounts that had downloads during the day. This type of join will exclude accounts with no download records, ensuring accurate calculations. Then, calculate the average downloads using the AVG() function, grouping results by the columns you want to differentiate: download_date and paying_customer. Finally, use the ROUND() function to format the average to two decimal places.
For this problem, note that we are going to assume that the question states average order value for all users that have ordered at least once. Therefore, we can apply an INNER JOIN between users and transactions.
For mid-level business analyst roles, expect intermediate SQL questions to challenge your knowledge and skill. Medium SQL questions fall into two categories:
Here are some intermediate business analyst SQL interview questions:
For more context, you are given two tables. One is named projects and the other maps employees to the projects they are working on. Exclude projects with zero employees. Assume each employee works on only one project.
To start, think about how to calculate the combined budget for each project and then determine what the budget per employee attached to the project equals. After that, you can think about how to rank these project values from most to least expensive to determine which are the most expensive.
Here’s a hint for this question to get you started: if you are sampling from this table and you want to specifically sample every fourth value, you will probably have to use a window function.
A general rule of thumb to follow is when a question states or asks for some Nth value (like the third purchase of each customer or the tenth notification sent), then a window function is the best option. Window functions allow you to use the RANK() or ROW_NUMBER() function to provide a numerical index based on a certain order.
For this problem, you are given a table of product purchases. Each row in the table represents an individual product purchase.
Note: If the customer purchased two things on the same day, that does not count as an upsell because they were purchased within a similar time frame. You are looking for a customer returning on a different date to purchase a product.
This question is slightly tricky because you have to note the dates that each user purchased products. You can’t just group by the user_id
to find where the number of products purchased is greater than one because of the upsell condition.
You have to group by both the date field and the user_id
to obtain each transaction broken out by day and user:
SELECT
user_id
, DATE(created_at) AS date
FROM transactions
GROUP BY 1,2
The query above will now provide a user_id
and date field for each row. If there is a duplicate user_id
, then you know that the user purchased on multiple days, which satisfies the upsell condition. What comes next?
This question provides a table that represents search results on Facebook, including a query, a position, and a human rating.
In this problem, you are given two tables: an employees table
and a departments table
.
Example:
Input:
employees
table
Columns | Type |
---|---|
id |
INTEGER |
first_name |
VARCHAR |
last_name |
VARCHAR |
salary |
INTEGER |
department_id |
INTEGER |
departments
table
Columns | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
Output:
Column | Type |
---|---|
percentage_over_100k |
FLOAT |
department_name |
VARCHAR |
number_of_employees |
INTEGER |
First, break down the question to understand what it’s asking. Specifically, break the question down into three clauses of conditions:
Then, think about how you would associate employees with their department, calculate and display the percentage of employees making over $100,000 a year, and order those results to provide an answer to the original question.
Advanced business analyst SQL interview questions are common for mid and senior-level roles, and they require you to write advanced SQL queries or work through complex logic-based case studies. The two main types of hard SQL questions are:
To solve this cumulative distribution practice problem, you are given two tables, a users table, and a comments table.
frequency |
cumulative |
---|---|
0 | 10 |
1 | 25 |
2 | 27 |
Input:
users
table
Columns | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
created_at |
DATETIME |
Output:
DATE | INTEGER |
---|---|
2020-01-01 | 5 |
2020-01-02 | 12 |
… | … |
2020-02-01 | 8 |
2020-02-02 | 17 |
2020-02-03 | 23 |
Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date or that maybe it’s just a regular cumulative distribution function. But it is important to notice that you are actually grouping by a specific interval of month and date, and when the next month comes around, you want to reset the count of the number of users.
Input:
subscriptions
table
Column | Type |
---|---|
user_id |
INTEGER |
start_date |
DATETIME |
end_date |
DATETIME |
user_id |
start_date |
end_date |
---|---|---|
1 | 2019-01-01 | 2019-01-31 |
2 | 2019-01-15 | 2019-01-17 |
3 | 2019-01-29 | 2019-02-04 |
4 | 2019-02-05 | 2019-02-10 |
Output:
user_id |
overlap |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 0 |
Hint: Take a look at each of the conditions first and see how they could be triggered. Given two date ranges, what determines if the subscriptions would overlap?
You are trying to find paired products that are often purchased together by the same user, such as wine and bottle openers, chips and beer, etc.
Write a query to find the top five paired products and their names.
Note: For the purposes of satisfying the test case, P1 should be the item that comes first in the alphabet.
Input:
transactions
table
Column | Type |
---|---|
id |
INTEGER |
user_id |
INTEGER |
created_at |
DATETIME |
product_id |
INTEGER |
quantity |
INTEGER |
products
table
Column | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
price |
FLOAT |
Output:
Column | Type |
---|---|
p1 |
VARCHAR |
p2 |
VARCHAR |
qty |
INTEGER |
Hint: To solve this question, you need to break it into several steps. First, you should find a way to select all the instances in which a user purchased two or more products at the same time. How can you use user_id
and created_at
to accomplish this?
The schema below is for a retail online shopping company consisting of two tables, attribution and user_sessions
. Here are some details of the two tables:
The attribution table logs a session visit for each row. If the conversion is true, then the user converted to a purchase on that session.
The channel column represents which advertising platform the user was attributed to for that specific session.
Lastly, the user_sessions
table maps session visits back to one user from a single visit all the way up to several on the same day.
First-touch attribution is defined as the channel to which the converted user was associated when they first discovered the website. It is helpful to sketch out the attribution model for converting users:
How do you figure out the beginning path of the Facebook ad and connect it to the end purchasing user?
You need to take two actions:
You can do that by creating a subquery that only provides the distinct users that have actually converted.
In Google Data Analyst interviews, SQL questions make up a sizable part of the interview. Your preparation should include a solid study of the basics and SQL definitions, but you should also be comfortable with more complex queries and sub-queries.
One tip: ask the recruiter the types of questions that will come up in the interview. This will give you an idea of where to focus your study.
SQL is a common topic in business analyst interviews regardless of the job level. It is best to be familiar with the topic from basic syntax to complex queries and sub-queries. Excelling in this portion of the interview could make the difference between getting the job or being asked to look elsewhere.
This course will help you brush up on your SQL skills and learn basic to advanced techniques.