SQL questions pop up in Google data science interviews all the time. And if you’ve got an interview coming up, you might be wondering: How can I prepare for SQL interviews at Google?
First, you’ll want to be ready to answer any definition-based SQL questions, like “what is a join?” Basic questions test your competency level with SQL, and they’re often asked early in the interview process. More importantly, though, in Google SQL interviews, you’ll likely be asked business-focused SQL questions, such as writing queries around customer data.
Today, we’re looking specifically at the SQL portion of Google interviews. We cover what you need to know, how SQL questions align with various Google roles, and SQL practice problems you can use to prep.
Google has an extremely rigorous interview process. Each stage is as important as the next. But you likely won’t be asked SQL interview questions until the technical screen and on-site interview. Here’s what it looks like:
Initial Phone Screen - The initial call typically covers behavioral questions and questions about your experience. It’s usually conducted by a recruiter, and it’s designed to see if you’re the right fit for Google.
Technical Screen - During the technical screen, you should expect basic SQL questions, including definitions and basics like DATE, GROUP BY, and JOIN. You might be asked to perform basic queries as well.
Onsite Interview - The final round of interviews includes intermediate to advanced SQL questions using CASE, JOIN, sub-queries and complex queries.
Like all FAANG companies, Google relies heavily on data. And SQL is a go-to tool for processing and analyzing that data. Google SQL interviews don’t just ask basics. These interviews tend to ask case study SQL questions. In other words, you’ll be presented with more practical problems and real data, and be asked to write queries for that dataset. The most common questions to get asked include:
Basic SQL interview questions - These are typically definition-based questions, and they come up on the technical screen. One tip: Develop your ability to explain these basic SQL concepts in layman’s terms.
SQL query questions - These types of questions test your knowledge in writing queries and statements in SQL. You’ll be presented with a dataset and asked to write SQL code to return a specific value.
Advanced SQL questions - Finally, you may be asked scenario-based SQL interview questions that will require you to write advanced queries. With this type of question, you’ll be asked to write queries that address a specific case and use a range of SQL clauses, from basics like SELECT and FROM to more advanced ones like HAVING.
SQL is used in many different roles at Google. It will most commonly come up in interviews for:
Business analysts at Google use SQL to generate insights, maintain reports and run analyses. In business analyst interviews, you can expect the basics, like DATE, GROUP BY and JOIN, and in the final round of interviews, more complex queries.
BI engineers at Google are at the “intersection of product, data, and business strategy.” In other words, you’ll be tasked with using data to make business decisions that improve customer experience. SQL is one of the most important skills tested in BI engineer interviews at Google.
Google data analysts perform a range of key tasks. Google analysts tend to derive business insights from data, and provide that information to key stakeholders. These roles vary by the team you work with; for example, a data analyst on the Google Ads team will have a much different role than one working on Google Drive.
At Google, data scientists perform a variety of roles, and they’re tested on many different tools and skills. Although SQL questions get asked, these interviews tend to focus more on statistics, algorithms and machine learning topics. But be prepared for basic SQL syntax questions, as well as queries you’ll be asked to write.
These are examples of SQL questions you might expect in a Google interview:
A JOIN is a clause in SQL that’s used to join rows from two or more tables, based on a common column between the tables. It is used for merging tables, as well as retrieving data. The most common types of joins are INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN.
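As a small illustration of two of these join types, here is a sketch that runs an INNER JOIN and a LEFT JOIN through Python’s built-in sqlite3 module. The tables and rows are made up for the example, and SQLite stands in for whatever database an interviewer might use.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        department_id INTEGER
    );
    INSERT INTO departments VALUES (1, 'engineering'), (2, 'marketing');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', 2), (3, 'Cy', NULL);
""")

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.first_name, d.name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN keeps every employee; missing departments come back as NULL (None).
left_rows = conn.execute("""
    SELECT e.first_name, d.name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The only difference between the two results is the employee without a department, which is usually the quickest way to explain the distinction out loud.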
Primary keys are constraints that uniquely identify each record. One thing to note: Primary keys cannot have NULL values, and all values must be UNIQUE. A table can have only one primary key, but the primary key can consist of single or multiple columns.
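To make the multi-column case concrete, the sketch below defines a composite primary key in SQLite and shows a duplicate key being rejected. The enrollments table is hypothetical and only exists for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the primary key spans two columns.
conn.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.execute("INSERT INTO enrollments VALUES (1, 10)")
conn.execute("INSERT INTO enrollments VALUES (1, 11)")  # same student, new course: allowed

try:
    conn.execute("INSERT INTO enrollments VALUES (1, 10)")  # exact duplicate of the key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither column is unique on its own here; it is the combination that has to be unique.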
Constraints in SQL are rules that can be applied to the type of data in a table. They are used to limit the type of data that can be stored in a particular column within a table. Some common types used in SQL are NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK and DEFAULT.
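A quick way to see constraints at work is to try breaking them. The sketch below uses SQLite through Python’s sqlite3 module with a hypothetical accounts table, and records which inserts the database refuses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table combining several common constraints.
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        balance INTEGER NOT NULL DEFAULT 0 CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "NOT NULL": violates("INSERT INTO accounts (email) VALUES (NULL)"),
    "UNIQUE": violates("INSERT INTO accounts (email) VALUES ('a@example.com')"),
    "CHECK": violates(
        "INSERT INTO accounts (email, balance) VALUES ('b@example.com', -5)"
    ),
}

# DEFAULT filled in the balance for the first row, which omitted it.
default_balance = conn.execute(
    "SELECT balance FROM accounts WHERE email = 'a@example.com'"
).fetchone()[0]

print(results, default_balance)
```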
DELETE is used to remove specific data from a table. This statement is a DML command, and it’s slower than TRUNCATE. One key difference: You can roll back data after using DELETE. TRUNCATE, on the other hand, is a DDL command, and it is used to delete all the rows from a table.
Inefficient SQL queries can drain a database, and lead to slow performance and loss of service. Optimization is especially critical when working with production databases. Query optimization is the process of making SQL queries more efficient. More efficient queries provide outputs faster, and minimize the impact on the database.
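The rollback behavior of DELETE can be sketched with Python’s sqlite3 module. Note the assumptions: SQLite has no TRUNCATE statement, so only the DELETE half is shown, and the events table is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [("click",), ("view",), ("click",)],
)
conn.commit()  # make the three rows permanent

# DELETE removes only the rows matching the WHERE clause,
# and it runs inside a transaction that is still open afterwards.
conn.execute("DELETE FROM events WHERE kind = 'click'")
after_delete = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# Rolling back the open transaction restores the deleted rows.
conn.rollback()
after_rollback = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

print(after_delete, after_rollback)
```

Whether TRUNCATE can be rolled back depends on the database engine, so it is worth naming the engine you have in mind when you answer this in an interview.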
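One common optimization step is adding an index on a column you filter by. The sketch below, using SQLite’s EXPLAIN QUERY PLAN through Python’s sqlite3 module, shows the plan for the same query before and after creating an index. The index name is hypothetical, and the exact plan wording varies between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        salary INTEGER,
        department_id INTEGER
    )
""")

query = "SELECT id, salary FROM employees WHERE department_id = 3"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no usable index: the table is scanned

# Hypothetical index on the filtered column.
conn.execute(
    "CREATE INDEX idx_employees_department ON employees (department_id)"
)
plan_after = plan(query)  # the planner can now search via the index

print(plan_before)
print(plan_after)
```

Reading the plan before and after a change is a reasonable habit to mention in an interview, since it shows you verify an optimization instead of assuming it.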
employees table
Columns | Type |
---|---|
id | INTEGER |
first_name | VARCHAR |
last_name | VARCHAR |
salary | INTEGER |
department_id | INTEGER |
departments table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
Output:
percentage_over_100K | department name | number of employees |
---|---|---|
.9 | engineering | 25 |
.5 | marketing | 50 |
.12 | sales | 12 |
Hint: What’s the question really asking? Breaking it down, we can split it into separate conditions, each handled by its own clause.
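Going only by the output columns shown above, one reading is that each department should be reported with the share of its employees earning over 100K and its headcount. The sketch below covers just that reading, in SQLite via Python’s sqlite3 module. The sample rows are invented, and any extra conditions from the full prompt (which isn’t reproduced here) are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name VARCHAR);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        first_name VARCHAR,
        last_name VARCHAR,
        salary INTEGER,
        department_id INTEGER
    );
    -- Invented sample data.
    INSERT INTO departments VALUES (1, 'engineering'), (2, 'marketing');
    INSERT INTO employees VALUES
        (1, 'A', 'One',   150000, 1),
        (2, 'B', 'Two',   120000, 1),
        (3, 'C', 'Three', 110000, 1),
        (4, 'D', 'Four',   90000, 1),
        (5, 'E', 'Five',  105000, 2),
        (6, 'F', 'Six',    60000, 2);
""")

rows = conn.execute("""
    SELECT
        -- Averaging a 1/0 flag gives the fraction of rows where it is 1.
        AVG(CASE WHEN e.salary > 100000 THEN 1.0 ELSE 0.0 END)
            AS percentage_over_100K,
        d.name   AS department_name,
        COUNT(*) AS number_of_employees
    FROM departments AS d
    JOIN employees AS e ON e.department_id = d.id
    GROUP BY d.id, d.name
    ORDER BY percentage_over_100K DESC
""").fetchall()

print(rows)
```

A minimum-headcount rule would go in a HAVING clause and a top-N rule in LIMIT, which is the kind of clause-by-clause breakdown the hint is pointing at.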
users table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
created_at | DATETIME |
Output:
Date | Monthly Cumulative |
---|---|
2020-01-01 | 5 |
2020-01-02 | 12 |
... | ... |
2020-02-01 | 8 |
2020-02-02 | 17 |
2020-02-03 | 23 |
Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date. Or maybe it’s just a regular cumulative distribution function? But we have to notice that we are actually grouping by a specific interval of month and date. And when the next month comes around, we want to reset the count of the number of users.
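One way to get a running total that resets each month is a window function partitioned by month. The sketch below runs in SQLite through Python’s sqlite3 module and assumes SQLite 3.25 or newer for window function support. The sample users are invented, and a self-join would be an alternative on engines without window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR, created_at DATETIME);
    -- Invented sample data: sign-ups across two months.
    INSERT INTO users VALUES
        (1, 'a', '2020-01-01 09:00:00'),
        (2, 'b', '2020-01-01 17:30:00'),
        (3, 'c', '2020-01-02 08:15:00'),
        (4, 'd', '2020-02-01 10:00:00'),
        (5, 'e', '2020-02-03 11:00:00'),
        (6, 'f', '2020-02-03 12:00:00');
""")

rows = conn.execute("""
    WITH daily AS (
        -- Step 1: new users per calendar day.
        SELECT DATE(created_at) AS day, COUNT(*) AS new_users
        FROM users
        GROUP BY DATE(created_at)
    )
    SELECT
        day,
        -- Step 2: running sum that restarts for every year-month.
        SUM(new_users) OVER (
            PARTITION BY strftime('%Y-%m', day)
            ORDER BY day
        ) AS monthly_cumulative
    FROM daily
    ORDER BY day
""").fetchall()

print(rows)
```

The PARTITION BY is what makes the count start over in February; without it, the same query would produce an all-time cumulative total.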
subscriptions table
Column | Type |
---|---|
user_id | INTEGER |
start_date | DATE |
end_date | DATE |
Example:
user_id | start_date | end_date |
---|---|---|
1 | 2019-01-01 | 2019-01-31 |
2 | 2019-01-15 | 2019-01-17 |
3 | 2019-01-29 | 2019-02-04 |
4 | 2019-02-05 | 2019-02-10 |
Output:
user_id | overlap |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 0 |
Hint: Let’s take a look at each of the conditions first and see how they could be triggered. Given two date ranges, what determines if the subscriptions would overlap?
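Two date ranges overlap when each one starts before the other ends. The sketch below applies that test with a self-join on the example rows above, using SQLite through Python’s sqlite3 module. Treating ranges that merely touch on the same day as overlapping is an assumption here; the example data gives the same result either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (user_id INTEGER, start_date DATE, end_date DATE);
    INSERT INTO subscriptions VALUES
        (1, '2019-01-01', '2019-01-31'),
        (2, '2019-01-15', '2019-01-17'),
        (3, '2019-01-29', '2019-02-04'),
        (4, '2019-02-05', '2019-02-10');
""")

rows = conn.execute("""
    SELECT
        a.user_id,
        -- 1 if at least one other subscription matched, else 0.
        MAX(CASE WHEN b.user_id IS NOT NULL THEN 1 ELSE 0 END) AS overlap
    FROM subscriptions AS a
    LEFT JOIN subscriptions AS b
        ON  a.user_id <> b.user_id          -- never compare a row to itself
        AND a.start_date <= b.end_date      -- a starts before b ends
        AND b.start_date <= a.end_date      -- b starts before a ends
    GROUP BY a.user_id
    ORDER BY a.user_id
""").fetchall()

print(rows)
```

The LEFT JOIN matters: user 4 has no overlapping partner, and an inner join would drop that row instead of reporting a 0.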
If there are multiple students with the same minimum score difference, select the student name combination that is higher in the alphabet.
scores table
Column | Type |
---|---|
id | INTEGER |
student | VARCHAR |
score | INTEGER |
Input:
id | student | score |
---|---|---|
1 | Jack | 1700 |
2 | Alice | 2010 |
3 | Miles | 2200 |
4 | Scott | 2100 |
Output:
one_student | other_student | score_diff |
---|---|---|
Alice | Scott | 90 |
Hint: Since the problem statement references only one table, we have to join the table to itself. It’s helpful to think about this problem in the form of two different tables with the same values.
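The self-join idea can be sketched as follows, using the input rows above in SQLite through Python’s sqlite3 module. This version assumes student names are unique and reads the tie-break as “earliest pair alphabetically”, both of which are interpretations, not something the prompt spells out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (id INTEGER PRIMARY KEY, student VARCHAR, score INTEGER);
    INSERT INTO scores VALUES
        (1, 'Jack', 1700),
        (2, 'Alice', 2010),
        (3, 'Miles', 2200),
        (4, 'Scott', 2100);
""")

row = conn.execute("""
    SELECT
        a.student AS one_student,
        b.student AS other_student,
        ABS(a.score - b.score) AS score_diff
    FROM scores AS a
    -- Pair each student with every student later in the alphabet,
    -- so each pair appears once and is already in alphabetical order.
    JOIN scores AS b ON a.student < b.student
    ORDER BY score_diff, one_student, other_student
    LIMIT 1
""").fetchone()

print(row)
```

Joining on `a.student < b.student` avoids both self-pairs and mirrored duplicates such as (Scott, Alice).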
users table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
neighborhood_id | INTEGER |
created_at | DATETIME |
neighborhoods table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
city_id | INTEGER |
Output:
Columns | Type |
---|---|
neighborhood_name | VARCHAR |
Hint: Our task is to find all the neighborhoods without users. In a sense, we need all the neighborhoods that do not have a single user living in them. This means we have to introduce the concept of a value existing in one table, but not in the other.
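A common way to express “exists in one table but not the other” is a LEFT JOIN followed by a NULL check. The sketch below does that in SQLite through Python’s sqlite3 module; the neighborhood and user rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE neighborhoods (id INTEGER PRIMARY KEY, name VARCHAR, city_id INTEGER);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name VARCHAR,
        neighborhood_id INTEGER,
        created_at DATETIME
    );
    -- Invented sample data: nobody lives in 'Sunset'.
    INSERT INTO neighborhoods VALUES (1, 'Mission', 1), (2, 'Sunset', 1), (3, 'Castro', 1);
    INSERT INTO users VALUES
        (1, 'Ana', 1, '2020-01-01 00:00:00'),
        (2, 'Ben', 1, '2020-01-02 00:00:00'),
        (3, 'Cy',  3, '2020-01-03 00:00:00');
""")

rows = conn.execute("""
    SELECT n.name AS neighborhood_name
    FROM neighborhoods AS n
    LEFT JOIN users AS u ON u.neighborhood_id = n.id
    -- Neighborhoods with no matching user have NULL in every users column.
    WHERE u.id IS NULL
    ORDER BY n.name
""").fetchall()

print(rows)
```

A NOT EXISTS subquery is an equally valid way to write this, and mentioning both tends to go over well.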
transactions table
Column | Type |
---|---|
id | INTEGER |
user_id | INTEGER |
created_at | DATETIME |
product_id | INTEGER |
quantity | INTEGER |
products table
Column | Type |
---|---|
id | INTEGER |
name | STRING |
price | FLOAT |
Hint: We first need to find the average price of all transactions. The total price of a transaction is price * quantity, so we write a sub-query to find the average over all transactions.
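The sub-query step from the hint can be sketched like this, in SQLite through Python’s sqlite3 module. The sample rows are invented, and the outer query that filters on the average is purely illustrative, since the full prompt isn’t reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        created_at DATETIME,
        product_id INTEGER,
        quantity INTEGER
    );
    -- Invented sample data: transaction totals are 10, 10 and 40.
    INSERT INTO products VALUES (1, 'pen', 2.0), (2, 'book', 10.0);
    INSERT INTO transactions VALUES
        (1, 1, '2020-01-01 10:00:00', 1, 5),
        (2, 2, '2020-01-02 10:00:00', 2, 1),
        (3, 3, '2020-01-03 10:00:00', 2, 4);
""")

# The sub-query on its own: average of price * quantity over all transactions.
avg_sql = """
    SELECT AVG(p.price * t.quantity)
    FROM transactions AS t
    JOIN products AS p ON p.id = t.product_id
"""
avg_total = conn.execute(avg_sql).fetchone()[0]

# Illustrative use as a scalar sub-query: transactions above the average total.
above_avg = conn.execute(f"""
    SELECT t.id
    FROM transactions AS t
    JOIN products AS p ON p.id = t.product_id
    WHERE p.price * t.quantity > ({avg_sql})
    ORDER BY t.id
""").fetchall()

print(avg_total, above_avg)
```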
Let’s say we have two tables, transactions and products. Hypothetically, the transactions table consists of over a billion rows of purchases bought by users. We are trying to find paired products that are often purchased together by the same user, such as wine and bottle openers, chips and beer, etc.
Write a query to find the top five paired products and their names.
Note: For the purposes of satisfying the test case, P1 should be the item that comes first in the alphabet.
transactions table:
Column | Type |
---|---|
id | INTEGER |
user_id | INTEGER |
created_at | DATETIME |
product_id | INTEGER |
quantity | INTEGER |
products table:
Column | Type |
---|---|
id | INTEGER |
name | STRING |
price | FLOAT |
Example output:
Column | Type |
---|---|
P1 | STRING |
P2 | STRING |
count | INTEGER |
Hint: To solve this, we need to break this into several steps. First, we should find a way to select all the instances a user purchased 2 or more products at the same time. How can we use user_id and created_at to accomplish this?
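One way to follow the hint is to self-join transactions on user_id and created_at, treating rows that share both as a single purchase. The sketch below does this in SQLite through Python’s sqlite3 module. The sample rows are invented, the count column is named pair_count to avoid clashing with the COUNT function, and nothing here addresses performance on a billion-row table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        created_at DATETIME,
        product_id INTEGER,
        quantity INTEGER
    );
    -- Invented sample data.
    INSERT INTO products VALUES
        (1, 'beer', 5.0), (2, 'chips', 3.0),
        (3, 'wine', 12.0), (4, 'bottle opener', 4.0);
    INSERT INTO transactions VALUES
        (1, 1, '2020-01-01 10:00:00', 1, 1),
        (2, 1, '2020-01-01 10:00:00', 2, 1),
        (3, 2, '2020-01-02 11:00:00', 1, 2),
        (4, 2, '2020-01-02 11:00:00', 2, 1),
        (5, 3, '2020-01-03 12:00:00', 3, 1),
        (6, 3, '2020-01-03 12:00:00', 4, 1),
        (7, 1, '2020-01-05 09:00:00', 3, 1);
""")

rows = conn.execute("""
    SELECT p1.name AS P1, p2.name AS P2, COUNT(*) AS pair_count
    FROM transactions AS t1
    -- Same user and same timestamp: treated as one purchase.
    JOIN transactions AS t2
        ON  t1.user_id = t2.user_id
        AND t1.created_at = t2.created_at
    JOIN products AS p1 ON p1.id = t1.product_id
    JOIN products AS p2 ON p2.id = t2.product_id
    -- Keep each pair once, with the alphabetically first name as P1.
    WHERE p1.name < p2.name
    GROUP BY p1.name, p2.name
    ORDER BY pair_count DESC, P1, P2
    LIMIT 5
""").fetchall()

print(rows)
```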
We have a table called song_plays that tracks each time a user plays a song.
Write a query to return the number of songs played on each date for each user.
Note: If a user plays the same song twice during the day, the count should be two.
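Since the schema for song_plays isn’t shown, the sketch below assumes hypothetical columns (id, created_at, user_id, song_id) and invented rows. Under those assumptions, counting rows per user per date, without DISTINCT, handles the repeat-play rule. It runs in SQLite through Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column names are assumptions; the real schema isn't given.
    CREATE TABLE song_plays (
        id INTEGER PRIMARY KEY,
        created_at DATETIME,
        user_id INTEGER,
        song_id INTEGER
    );
    INSERT INTO song_plays VALUES
        (1, '2021-01-01 08:00:00', 1, 10),
        (2, '2021-01-01 09:00:00', 1, 10),  -- same song again: still counts
        (3, '2021-01-01 10:00:00', 1, 11),
        (4, '2021-01-01 11:00:00', 2, 10),
        (5, '2021-01-02 08:00:00', 1, 12);
""")

rows = conn.execute("""
    SELECT
        user_id,
        DATE(created_at) AS play_date,
        COUNT(*) AS songs_played      -- every play is one row, so no DISTINCT
    FROM song_plays
    GROUP BY user_id, DATE(created_at)
    ORDER BY play_date, user_id
""").fetchall()

print(rows)
```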
Design a database for a stand-alone fast food restaurant.
Based on the above database schema, write a SQL query to find the top three highest revenue items sold yesterday.
Note: We will only record the customer if they either sign up for our rewards program or order by delivery. Customers will be recorded in the orders and customers tables only if they match one of those criteria. Deliveries will be recorded in the orders and deliveries tables only if they were delivered.
Let’s say you work at a file-hosting website. You have information on users’ daily downloads in the download_facts table.
Use the window function RANK to display the top three users by downloads each day. Order your data by date, and then by daily_rank.
Input:
download_facts table
Column | Type |
---|---|
user_id | INTEGER |
date | DATE |
downloads | INTEGER |
Output:
Column | Type |
---|---|
daily_rank | INTEGER |
user_id | INTEGER |
date | DATE |
downloads | INTEGER |
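A sketch of the RANK approach, run in SQLite through Python’s sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The download rows are invented, and user_id is added as a final sort key only to make the order of tied rows predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE download_facts (user_id INTEGER, date DATE, downloads INTEGER);
    -- Invented sample data; the second day has a tie for first place.
    INSERT INTO download_facts VALUES
        (1, '2021-01-01', 50), (2, '2021-01-01', 40),
        (3, '2021-01-01', 30), (4, '2021-01-01', 20),
        (1, '2021-01-02', 10), (2, '2021-01-02', 10),
        (3, '2021-01-02', 5),  (4, '2021-01-02', 1);
""")

rows = conn.execute("""
    SELECT daily_rank, user_id, date, downloads
    FROM (
        SELECT
            user_id,
            date,
            downloads,
            -- Rank restarts for each date; ties share a rank and leave a gap.
            RANK() OVER (PARTITION BY date ORDER BY downloads DESC) AS daily_rank
        FROM download_facts
    )
    WHERE daily_rank <= 3
    ORDER BY date, daily_rank, user_id
""").fetchall()

print(rows)
```

The window function has to sit in a sub-query because its result can’t be filtered in the same query’s WHERE clause. On the second day the two tied users both get rank 1 and the next user gets rank 3, which is how RANK differs from DENSE_RANK.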
In interviews for many Google positions, especially analyst roles, SQL questions make up a sizeable part of the interview. Your prep should include a solid study of the basics and SQL definitions, yet you should also be comfortable with more complex queries and sub-queries. One tip: Ask the recruiter the types of questions that will come up in the interview. This will give you an idea of where to focus your study.
If you need additional help, be sure to check out the SQL module in our Data Science Course. It offers a solid review of basic-to-advanced concepts. You might also want to see our Top 25+ Data Science SQL Interview Questions.