At Amazon, one of the most well-known, highly sought-after FAANG companies, you have the chance to use your skills to help improve one of the most successful businesses in the world. Acing the Amazon interview is no easy feat.
In this article, we’ll be focusing on getting you through Amazon SQL interview questions.
Amazon’s interviews are not unlike other tech companies. The overall process is as follows:
The initial screen is a phone interview where your resume and the position will be discussed. This is a lower stakes interview, more of a ‘getting to know you’ than an actual test of your abilities.
During the technical interview, you should expect at least a few questions centered around the specifics of the job. For example, a data science technical screen consists of coding, algorithms, and machine learning. For most data science positions, at least one of these questions will be SQL-based.
Finally, the onsite interview will consist of several steps with interviewers. Here, you will be tested on more thorough and difficult concepts and coding questions. In addition, the interviewers will be looking for your understanding of Amazon’s 14 leadership principles.
A business analyst is primarily responsible for helping a business achieve its goals by analyzing data to generate business-related insights. A business analyst will run analyses, maintain reports, and build dashboards. The following infographic compares the skills which are tested in the Amazon interview with skills tested in a typical business analyst interview.
As you can see, SQL is tested almost twice as heavily in the Amazon interview versus the average interview.
According to an Amazon business analyst from Glassdoor, the Amazon interview consists of a technical phone screen, a behavioral phone screen, and an exhaustive third round of several interviews throughout the day. Expect to encounter SQL-related questions during the first and third rounds.
In particular, during the phone screen, one should expect basic SQL questions on DATE, GROUP BY, and JOIN. These typically start simple in nature and are followed by basic queries. The final round of interviews includes advanced SQL questions using CASE, JOIN, sub-queries, and complex queries.
You should be able to verbally explain the different kinds of joins as well as the difference between the WHERE statement and HAVING statement. Further, you should be familiar with deriving insights from given tables. This includes exploring the data by finding various metrics, such as totals and metrics over a given time. As far as particular SQL questions go, Amazon really likes to ask their business analysts about joins.
At Amazon, BI engineers work in teams to form business decisions based on collected data with the intention of improving the customer experience. BI engineers work to improve Prime, Alexa, Twitch, and more. Here is a breakdown of skills tested in an Amazon BI engineer interview.
Compared to all other domains, SQL is by far the most thoroughly emphasized skill in an Amazon BI engineer interview.
The interview process involves two phone screenings and several onsite interviews. SQL queries have been known to come up during the first phone screen and the first two onsite interviews. People have reported seeing SQL questions centered around JOIN, LEAD, LAG, pivoting, WHERE, HAVING, GROUP BY, and aggregate functions.
Typically, the phone-based SQL questions describe aggregate functions and find the approximate number of rows in an output table after a JOIN. You shouldn’t expect to write more advanced queries until the onsite interview.
An Amazon data analyst performs important functions such as data interpretation, the building and maintaining of dashboards, system design for data collection and compiling, and more.
The specifics of the role can vary depending on what team you are working with. Data analysts who work with Twitch data may perform vastly different functions than those working with Alexa data. The graph below shows how Amazon tests data analyst skills like SQL compared to other companies.
The interview process for a data analyst position involves three levels: behavioral, analytical, and technical interviews. The behavioral and technical interviews are conducted over the phone.
For the technical interview, you must also share your screen with the interviewer as you answer some questions. To start, you have your typical conceptual SQL questions, such as differentiating between WHERE and HAVING clauses. Later in the process, you can expect more query writing questions.
Much like the other positions, data scientists at Amazon perform different roles depending on which branch of the company their work is focused on. A data scientist works with data to produce models and solutions in machine learning and natural language applications in order to make predictions and provide forecasting insights. Here is the comparison graph of Amazon data scientists’ tested skills versus average data scientists’ tested skills.
Since data science as a profession is conceptually deep and diverse, more topics are tested overall. This leaves less room for SQL, although it is still a major skill in data science. Amazon tests you in SQL slightly more than the average employer.
Hopefully, you see a pattern with these interview questions. The interviewers always start out with basic SQL syntax. Then they hit you with a few solvable queries.
Here is an example of one:
Given a table with three columns (id, category, value), with each id having three or fewer categories (price, size, color), find all ids for which two or more category values match.
You can expect an Amazon SQL question on the technical screen, and one or two of the on-site interviews will focus heavily on SQL and data analysis. In general, Amazon SQL questions tend to focus on customer metrics and e-commerce cases.
Explain the difference between the types of joins.
Hint: Try thinking of the question as a series of Venn diagrams.
Consider the following two tables:
customer_id | purchase_date | product_id | unit_price | unit_purchased |
---|---|---|---|---|
10001 | 2021-02-01 | 35525 | 25.00 | 4 |
10002 | 2021-02-02 | 30321 | 10.00 | 8 |
10003 | 2021-02-14 | 35525 | 25.00 | 3 |
10004 | 2021-02-05 | 34826 | 300.00 | 1 |
10005 | 2021-02-23 | 30321 | 10.00 | 5 |
….. |
customer_id | registration_date |
---|---|
10001 | 2020-10-10 |
10002 | 2021-01-31 |
10003 | 2021-02-10 |
10004 | 2021-01-15 |
10005 | 2021-02-15 |
….. |
a. Write a query to retrieve data with unique customer ids that made over $100 in purchases during the first week of February.
b. Write a query to retrieve data with unique customer ids that made over $100 in purchases within ten days of registering.
Consider the following tables:
Write a SQL query to create a histogram of the number of comments per user in the month of January 2020. Assume bin buckets class intervals of one.
users
table
Columns | Type |
---|---|
id | INTEGER |
name | STRING |
created_at | DATETIME |
neighborhood_id | INTEGER |
STRING |
comments
table
Columns | Type |
---|---|
user_id |
INTEGER |
body |
TEXT |
created_at |
DATETIME |
Here’s a hint:
What does a histogram represent? In this case, we’re interested in using a histogram to represent the distribution of comments each user made in January 2020.
A histogram with bin buckets of size one means that we can avoid the logical overhead of grouping frequencies into specific intervals.
For example, if we wanted a histogram of size five, we would have to run a SELECT statement like so:
SELECT CASE WHEN frequency BETWEEN 0 AND 5 THEN 5
WHEN frequency BETWEEN 5 AND 10 THEN 10
etc.
Test your abilities on this question on Interview Query.
Write a query to create a new table, named flight routes, that displays unique pairs of two locations.
Example:
Duplicate pairs from the flights table, such as Dallas to Seattle and Seattle to Dallas, should have one entry in the flight routes table.
flights
table
Column | Type |
---|---|
id |
INTEGER |
source_location |
STRING |
destination_location |
STRING |
Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.
users
table
Columns | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
created_at |
DATETIME |
Here’s a hint:
This question first seems like it could be solved by just running a COUNT(*) and grouping by date. Or maybe it’s just a regular cumulative distribution function?
But we have to notice that we are actually grouping by a specific interval of month and date. And that when the next month comes around, we want to reset the count of the number of users.
Tangentially aside - the practical benefit of a query like this is that we can get a retention graph that compares the cumulative number of users from one month to another. If we have a goal to acquire 10% more users each month, how do we know if we’re on track for this goal on February 15th without having the same number to compare it to for January 15th?
Therefore how can we make sure that the total amount of users on January 31st rolls over back to 0 on February 1st?
With this question, you’re provided a table that contains data about products that a user purchased. Products are divided into categories. The column id is the primary key of table products and represents the order in which the products are purchased.
In this question, you’re given a table that represents the total number of messages sent between two users by date on Messenger.
What are some insights that could be derived from this table? What do you think the distribution of the number of conversations created by each user per day looks like?
See a video solution for this question:
Write a SQL query to select the 2nd highest salary in the engineering department. If more than one person shares the highest salary, the query should select the next highest salary.
Here’s a hint: First, we need the name of the department to be associated with each employee in the employees table, to understand which department each employee is a part of.4
Write a query to identify the manager with the biggest team size. You may assume there is only one manager with the largest team size.
Example:
Input:
employees
table
Column | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
manager_id |
INTEGER |
managers
table
Column | Type |
---|---|
id |
INTEGER |
name |
VARCHAR |
team |
VARCHAR |
Output:
Column | Type |
---|---|
manager |
VARCHAR |
team_size |
INTEGER |
Here’s a hint: We need data from both tables. Therefore we need to join them somehow.
Let’s say that you’re a data scientist working on a distribution team at Amazon.
Create a report displaying which shipments were delivered to customers during their membership period.
If the transaction is shipped during the membership period, column is_member
should have the value Y. If not, the column should have the value N.
Here’s a hint: You may assume that customer_id
is unique and that shipments are made only to registered customers.
Let’s say that you’re working at a global trading company and you’re analyzing the price of a particular asset over time. This dataset is super noisy, volatile, and you’re not guaranteed completely accurate data.
How would you analyze this data to ensure there aren’t any discrepancies?
Write a query to identify customers who placed more than three transactions each in both 2019 and 2020.
Given a table of product subscriptions with a subscription start date and end date for each user, write a query that returns true or false whether or not each user has a subscription date range that overlaps with any other completed subscription.
Completed subscriptions have end_date recorded.
Regardless of the position, SQL is a common feature of the Amazon interview. It’s best to be familiar with a basic syntax all the way through complex queries and sub-queries. Doing well in this portion of the interview could make the difference between getting the job or being asked to look elsewhere.
If you want to check out more SQL questions, check out our in-depth guide to approaching the different types of SQL interview questions you’ll see on the interview.
The interview process can feel long and intimidating for many. But it doesn’t need to be like that! We offer the guidance and tips that are needed to ensure you ace your interview with little stress and get started with your dream job.