Business intelligence case studies are generally scenario-based questions that ask you to work through a solution to a proposed business problem.
For example, in a business intelligence case interview, you might be asked: How would you de-duplicate product listings that don’t have the same title, SKU, or description?
Your job is to ask the interviewer for more information, make assumptions about the case, propose a solution, and finally, consider the trade-offs of your solution. For business intelligence engineering roles, business case studies tend to fall into two broad categories:
Analytics - Analytics questions test your understanding of metrics and how they relate to business goals. Your job is to ask the interviewer for more information, make assumptions about the case, propose a solution, and finally, consider the trade-offs of your solution.
Database Design - Database/ technology questions ask you to design or discuss a tech solution to a given business problem.
Generally, business intelligence case studies are the most difficult part of business intelligence interviews, but using frameworks and understanding how they are graded can help you to prepare for your next BI case study interview.
Business intelligence (BI) engineers are technology specialists who ensure that analysts and data scientists have access to the right data and technologies. A key responsibility is ensuring that the company’s data is organized and accessible. BI engineer case interviews mirror the type of work that candidates will perform on the job.
Specific tasks business intelligence engineers do include:
Creating reports, developing dashboards, and implementing analytics applications such as DataMiner or Tableau Desktop
Designing, developing, and maintaining data warehouses to store large volumes of structured, semi-structured, and unstructured data
Selecting hardware, software, and database management systems for data warehousing projects in line with organizational goals
Training and onboarding users to use business intelligence software
Case studies are a common business intelligence interview question that present the interviewee with a specific business problem. The interviewee must then talk the interviewer through a potential solution for that problem.
Most business intelligence case studies cover designing dashboards or creating databases to function for business needs. Therefore, most problems are general business case studies or technical SQL case studies, and the interviewee must solve a problem relating to how data is being presented or stored.
A typical framework for solving business intelligence case questions includes four steps:
1. Clarify - Your first step should be to gather more information from the interviewer. Case studies tend to be vague and lack information. You’re responsible for digging in and finding out exactly what the question is asking.
2. Make Assumptions - Start forming a hypothesis and talk through your reasoning. Your goal should be to land on one hypothesis/solution for the problem, which you will analyze further.
3. Propose a Solution - State your solution, and talk the interviewer through your processes for building the solution.
4. Conduct Further Analysis - For analytics case studies, you’ll want to narrow your investigation to one key metric and support your hypothesis with data. For database design case studies, you’ll walk the interviewer through the schema for a database.
There is not a set grading rubric for business intelligence case studies, as it’s often at the discretion of the interviewer. However, there are some areas you should focus on that will make your response stronger:
Curiosity- Clarifying questions helps you narrow your response. An Amazon business intelligence engineer told us: “If you don’t ask questions, the interviewer could fail you, because they wanted to give you some information to steer the discussion down a particular path.”
Ability to take direction - Our source said: “The interviewer decides where the candidate needs to end up in their solution.” Therefore, it’s important to take hints and coaching during the interview.
Thoroughness - Case questions assess the depth of your problem-solving approach. You can show this by asking clarifying questions, providing multiple data points for analysis, and making assumptions (and checking that those assumptions are correct).
Ability to adapt - Inevitably, something unexpected will come up in a case study question, like your preferred method isn’t feasible, and you will have to adapt. Take the cues the interviewer provides, and always be willing to change courses if needed.
Communication - BI cases assess your ability to summarize your solution and clearly explain your thought processes and assumptions. One tip: Ask the interviewer if they have any questions throughout your response. This can help you clarify your answer at the moment.
There’s no right or wrong answer to case study questions. Rather, candidates are graded on the quality of their responses. Using a framework will help you structure your response more clearly.
Let’s take a look at an in-depth mock interview solution to a business intelligence case question asked at Amazon:
More context. Products are listed under different seller names. So for the same product, we might see many variations, e.g., iPhone X and Apple iPhone 10. However, let’s say this example shows up for a lot of different products in various categories.
See a full mock interview solution to this question on YouTube.
Here’s an edited solution from the mock interview:
Interviewee: “If it’s an established e-commerce company, I would assume that they would have some kind of an ID for every product in their inventory. So something like an SKU or an ID. And if it’s Amazon, then that’s pretty unique, and you know that even if the description is different under different sellers, I would assume that they would have the same SKU.”
“So if you just look at the list of all the SKUs and different sellers and then do a distinct GROUP BY on SKU across all sellers, you’ll find out which SKUs are replicated. And then, once you have that, you can go to the business team saying what you want to do with them.”
Follow-Up Question 2: Let’s say you don’t have an SKU. People create their product titles, along with an image and descriptions.
How would we then do the mapping to the SKU, or would you think of a different approach towards solving the problem?
Identifying Similar Images
“If we have images for these products that we think may be duplicated, we could try to use an algorithm to identify similar images. Then once you have that list of similar images, you look at the descriptions and build a string similarity algorithm that outputs which descriptions sound similar or are close to each other. Now you would have at least two data points that you know these two products are similar. Then it’s probably going to be a little bit of manual intervention to identify if they really are similar or not.”
Similar Product Reviews
“The other thing that I can think of is maybe reviews on different products. So imagine that there are two different products just named differently, but both of them are the Apple iPhone 10. You would assume that the reviews are pretty much talking about a phone and that it’s manufactured by Apple. They probably have the same kinds of experiences and reviews, so you could see if the reviews are very similar to each other, and that would give a good indication that the product is probably the same.”
Follow-Up Question 3: We’re looking at similarities across images, descriptions, and reviews, and we’re getting this score for each one of them. Now how do we go about deciding if we can de-duplicate them or not?
Would we have a human review every single one? Do we do some sort of scaling process? Because let’s say we have to do this for thousands and thousands of products, right? What’s the next step?
“Well, from the beginning, we don’t really know which products are the same or not, so we can’t do a supervised learning method. It needs to be an unsupervised technique that first tries to identify what products are similar to each other. I probably would do a clustering technique based on just descriptions and reviews.”
“We’ll definitely need to do some cleaning and tokenization for the text data to bring it to a structured format. Then we can run a TF IDF on different descriptions and reviews to find out which documents are similar. We’ll get some scores depending on how many documents end up in a particular cluster, and we will definitely have to do a manual step to see if they’re actually the same or not.”
“I’m unaware of a clustering technique that works on images, but we would probably have to build out features from the image, bring it to a structured format, and then do clustering on top. So we might identify ten different clusters if there are ten items that are duplicated and then look at the clusters’ descriptive statistics to see if the customer in reviews is really talking about a phone, a tablet, or a computer. And then try to go about in a manual investigation from that point.”
Follow-Up Question 4: Let’s say we do that. We’re going through these clusters, and we find that the algorithm clustered just phones together instead of doing a specific enough cluster for the same product. Or maybe we’re getting thousands of different clusters that may or may not be duplicated.
Is there any way that we can optimize our manual intervention or scale this problem out so that we use the least amount of manual oversight while also figuring out a way to deduplicate efficiently?
“I guess it would depend on the features that we actually extract, as the more granular the features in our dataset, the better the clusters could be. If we are creating clusters just on the type of device, then you’re right. I think all phones and all computers will just end up together.”
“But if we are given that these are also duplicate listings, we would definitely want to look at more information in the listing itself; like the price of the product, the different types of colors that are available, and then the features in iPhones and Androids that are similar to each other. The features need to be as close to the product itself so that our clusters are more identifiable amongst each other and not as generic as phones and computers.”
“Finally, we could look at customers to see purchasing behavior. iPhones typically tend to sell out as soon as they are launched, so we can try to use the information around when a particular product was launched and then look at the purchase pattern during that time and then try to integrate these features into the dataset.”
The example response provided some solid jumping-off points to solve the problem. However, there were missing factors that could have made the response stronger:
Consider the Scope
The response focused primarily on the example provided in the problem statement, e.g., iPhone X vs. Apple iPhone 10. However, in business case studies, it’s important to consider the broader context and incorporate that into your answer. In this case, considering a wider variety of products would have made the response stronger.
Having Multiple Data Points
In the example, there was just one type of product proposed. Having more data points to explain these concepts would have made the response more thorough and would have provided more examples to illustrate the proposed solution.
Considering Limitations
This particular response would have benefited from considering trade-offs to the proposed solution. In particular, the response didn’t address limitations like threshold error rates and automation. How accurate can we get with an automated word-matching solution, and would we be satisfied with the threshold?
Ultimately, the response could have benefited from a dialogue about implementation and business impact, as well as the technical details.
Practice for the business intelligence interview with these sample database design and analytics case study questions:
The hypothesis you want to test is: Does a free trial result in cheaper engagement and acquisition costs, compared to using a CS manager?
Since this is a business intelligence role, you’ll want to frame the question in terms of metrics. Some of the metrics to consider include:
Each one of these metrics can be segmented additionally into new vs existing users. And if we apply weighting to each of these metrics we can ultimately come up with an equation that can maximize our goals.
An easy BI case study question like this assesses your data sense and the depth of your analytics knowledge. You might start with a clarifying question: Are we interested in revenue statistics (e.g. WhatsApp for Business) or more general user metrics?
For more general user engagement metrics, you could propose something like:
As you propose metrics, be sure to tie them back to the business. Answer this question: Why does this metric matter to WhatsApp?
First, define some of the metrics. Demand would be the number of ride requests, while supply would be the number of available drivers. How would you further analyze ride requests to measure demand?
See a full mock interview solution for this question on YouTube:
Follow-up question: What data would you look into and how would you find out who is actually married and who is not?
With this question, you’d want to start with some clarifying questions like:
One first step would be to look at what went wrong. You could look at UPDATE and INSERT queries to identify what might have caused the problem initially.
Next, you might look for an easy solution. Are there dimensions or columns related to marriage? If there was a column spouse.name, for example, this would provide insights into whether a client is married or not. You could also look to see if reverting the data would show the correct marriage status before the bug existed.
A more complex approach might be to GROUP clients by the last name and then see if entries with the same last name share an address, insurance plan ID, etc.
Broadly speaking, sending a mass email blast to a list of customers is generally not a good idea, especially when the objective of the email is to increase sales.
A better solution is to segment the audience and personalize the messaging by the audience. For example, if a customer was about to reach their licensing limit, we could send a personalized offer to add more licenses, while a win-back campaign could be used for recently churned users.
More context: We know that the product costs $100, an average 10% monthly churn, and the average customer sticks around for 3.5 months.
Average lifetime value is defined by the prediction of the net revenue attributed to the entire future relationship with all customers averaged. Given that we don’t know the future net revenue, we can estimate it by taking the total amount of revenue generated divided by the total number of customers acquired over the same period of time.
However, there’s a catch: it’s only a 14-month-old company. As a result, the average customer length is biased, because the company hasn’t existed long enough to correctly measure a sample average that is indicative of the mean.
How would you calculate this? Try to find the expected value of the customer at each month as a multiplier of retention times the product cost.
More context: If the customer purchased two things on the same day, that does not count as an upsell, as they were purchased within a similar timeframe.
This question tests a candidate’s ability to analyze and interpret transaction data to uncover valuable business insights. It requires proficiency in SQL to group and filter data accurately, ensuring that the upsell condition is correctly identified.
To find the number of customers who were upsold, we analyze transaction data to identify users with multiple purchase dates. Initially, we group transactions by user ID and date to find unique purchase events. If a user has purchases on multiple dates, they are considered upsold. The optimized query uses a HAVING clause to filter users with more than one distinct purchase date, providing an efficient way to determine upsold customers.
This question tests your ability to write queries that extract and manipulate data, particularly in hierarchical structures like employees and managers. The task demonstrates your attention to detail in comparing dates and ensuring accurate results.
To solve the problem of finding employees who joined before their manager, we need to compare the join dates from two related tables: employees and managers. By linking these tables using a LEFT JOIN on the manager_id, we can access both the employee’s and manager’s join dates. We then filter the results to include only those employees whose join date is earlier than their manager’s.
Prepare with these business intelligence interview questions, which are 29 commonly asked BI questions in areas like SQL, generic scenario-based cases, Python, and database design. Also, see our guide to Amazon business intelligence interviews and Google business intelligence interviews for more BI interview prep help.