SQL questions pop up in Google data science interviews all the time. And if you’ve got an interview, you might wonder: How can I prepare for SQL interviews at Google?
First, you’ll want to be ready to answer any definition-based SQL questions, like “What is a join?” Basic questions test your competency level with SQL, and they’re often asked early in the interview process. More importantly, though, you’ll likely be asked business-focused SQL questions in Google SQL interviews, e.g., writing SQL queries around customer data.
Today, we’re looking specifically at the SQL portion of Google interviews. We cover what you need to know about Google SQL interview questions, how they align with various Google roles, and SQL practice problems to help you prep.
Google has an extremely rigorous interview process. Each stage is as important as the next. But you likely won’t be asked SQL interview questions until the technical screen and on-site interview. Here’s what it looks like:
Initial Phone Screen - The initial call is typically conducted by a recruiter and covers behavioral questions and questions about your experience. It’s designed to see if you’re the right fit for Google.
Technical Screen - During the technical screen, you should expect basic SQL questions, including definitions of core concepts such as DATE, GROUP BY, and JOIN. You might be asked to write basic queries as well.
Onsite Interview - The final round of interviews includes intermediate to advanced SQL questions using CASE, JOIN, sub-queries, and complex queries.
Like all FAANG companies, Google relies heavily on data. And SQL is a go-to tool for processing and analyzing that data. Google SQL interview questions don’t just ask basics. These interviews tend to ask case study SQL questions. In other words, you’ll be presented with more practical problems and real data and be asked to write queries for that dataset. The most common questions to get asked include:
Basic Google SQL interview questions - These are typically definition-based questions that come up on the technical screen. One tip: Develop your ability to explain these basic SQL concepts in layman’s terms.
SQL query questions - These types of questions test your knowledge in writing queries and statements in SQL. You’ll be presented with a dataset and asked to write SQL code to return a specific value.
Advanced SQL questions - Finally, you may be asked SQL scenario-based interview questions that will require you to write advanced queries. With this type of question, you’ll be asked to write queries that address a specific case and use a range of SQL clauses, from basics like SELECT and FROM to advanced ones like HAVING.
SQL is used in many different roles at Google. It will most commonly come up in interviews for:
Business analysts at Google use SQL to generate insights, maintain reports, and run analyses. In business analyst interviews, you can expect the basics, like DATE, GROUP BY, and JOIN, as well as more complex queries in the final round of interviews.
BI engineers at Google are at the “intersection of product, data, and business strategy.” In other words, you’ll be tasked with using data to make business decisions that improve customer experience. SQL is one of the most important skills tested in BI engineer interviews at Google.
Google data analysts perform a range of key tasks. Google analysts tend to derive business insights from data and provide that information to key stakeholders. These roles vary by the team you work with; for example, a data analyst on the Google Ads team will have a much different role than one working on Google Drive.
At Google, data scientists perform a variety of roles, and they’re tested on many different tools and skills. Although SQL questions are asked, these interviews tend to focus more on statistics, algorithms, and machine learning. But be prepared for basic SQL syntax questions and straightforward query-writing problems.
These are examples of SQL questions you might expect in a Google interview:
A JOIN is a clause in SQL that’s used to combine rows from two or more tables based on a common column between the tables. It is used for merging tables, as well as retrieving data. The most common types of joins are INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
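To make the JOIN definition concrete, here is a minimal sketch that runs the SQL through Python's built-in `sqlite3` module. The sample rows (Ana, Ben, Cy) are made up for illustration; it contrasts an INNER JOIN with a LEFT JOIN on the same two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'engineering'), (2, 'marketing');
    -- Cy has no department, so Cy matches nothing in departments
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', NULL);
""")

# INNER JOIN keeps only the rows that have a match in both tables.
inner_rows = conn.execute("""
    SELECT e.first_name, d.name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN keeps every row of the left table; unmatched columns come back as NULL (None).
left_rows = conn.execute("""
    SELECT e.first_name, d.name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The only difference between the two results is the unmatched employee, which is usually the quickest way to explain the distinction in an interview.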
Primary keys are constraints that uniquely identify each record. One thing to note: Primary keys cannot have NULL values; all values must be UNIQUE. A table can have only one primary key, but the primary key can consist of single or multiple columns.
Constraints in SQL are rules applied to the columns of a table. They limit the values that can be stored in a particular column. Some common constraints are NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT.
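As a small illustration of how constraints behave, the sketch below (Python's `sqlite3`, with a made-up `employees` table and made-up rows) declares a few constraints and then tries to insert rows that break them. Each bad insert is rejected by the database rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        salary INTEGER CHECK (salary >= 0)
    )
""")
conn.execute("INSERT INTO employees VALUES (1, 'ana@example.com', 120000)")

bad_rows = [
    (2, None, 90000),               # violates NOT NULL on email
    (3, "ana@example.com", 90000),  # violates UNIQUE on email
    (4, "ben@example.com", -5),     # violates CHECK on salary
    (1, "cy@example.com", 90000),   # violates PRIMARY KEY (duplicate id)
]

rejected = 0
for row in bad_rows:
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        rejected += 1
        print("rejected:", err)

# Only the first, valid row is stored.
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(row_count)
```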
DELETE is used to remove specific rows from a table, usually selected with a WHERE clause. It is a DML command, and it’s typically slower than TRUNCATE. One key difference: You can roll back a DELETE. TRUNCATE, on the other hand, is a DDL command that removes all the rows from a table at once, and in many databases (MySQL and Oracle, for example) it cannot be rolled back.
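The rollback behavior of DELETE can be sketched with Python's `sqlite3`. Note that SQLite has no TRUNCATE statement, so this example only demonstrates the DELETE side; the table and rows are made up.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so we control the transaction with explicit BEGIN / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy')")

conn.execute("BEGIN")
conn.execute("DELETE FROM users WHERE id = 2")  # DELETE can target specific rows
during = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.execute("ROLLBACK")                        # undo the DELETE
after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(during, after)
```

Inside the transaction the row is gone; after the rollback it is back.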
Inefficient SQL queries can strain a database and lead to slow performance and loss of service. Optimization is especially critical when working with production databases. Query optimization is the process of making SQL queries more efficient, so that they return results faster and put less load on the database.
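One common optimization is adding an index on a column used for filtering. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` to compare the plan before and after creating an index. The table contents and the index name `idx_employees_department` are made up, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (salary, department_id) VALUES (?, ?)",
    [(50000 + i, i % 20) for i in range(1000)],
)

query = "SELECT id, salary FROM employees WHERE department_id = 7"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # full table scan: every row is examined
conn.execute("CREATE INDEX idx_employees_department ON employees (department_id)")
plan_after = plan(query)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

Other engines expose the same idea through their own `EXPLAIN` output, so the habit of checking the plan carries over even though the syntax differs.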
employees table
Columns | Type |
---|---|
id | INTEGER |
first_name | VARCHAR |
last_name | VARCHAR |
salary | INTEGER |
department_id | INTEGER |
departments table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
Output:
percentage_over_100K | department name | number of employees |
---|---|---|
.9 | engineering | 25 |
.5 | marketing | 50 |
.12 | sales | 12 |
Hint: What’s the question really asking? Breaking it down, we can split the problem into separate conditions and handle them one clause at a time.
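The question text is not shown here, so the following is only a sketch under an assumption: that the goal is to report, for each department, the share of employees earning over 100K along with the headcount, as the output columns suggest. The sample salaries and the minimum department size in HAVING are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name VARCHAR);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, first_name VARCHAR, last_name VARCHAR,
        salary INTEGER, department_id INTEGER
    );
    INSERT INTO departments VALUES (1, 'engineering'), (2, 'marketing');
""")
# Made-up sample: 4 engineers (3 over 100K) and 2 marketers (1 over 100K).
salaries = [(150000, 1), (120000, 1), (110000, 1), (90000, 1), (130000, 2), (60000, 2)]
conn.executemany(
    "INSERT INTO employees (first_name, last_name, salary, department_id) "
    "VALUES ('first', 'last', ?, ?)",
    salaries,
)

rows = conn.execute("""
    SELECT
        -- averaging a 1/0 flag gives the fraction of rows where the flag is 1
        AVG(CASE WHEN e.salary > 100000 THEN 1.0 ELSE 0.0 END) AS percentage_over_100K,
        d.name AS department_name,
        COUNT(*) AS number_of_employees
    FROM employees AS e
    JOIN departments AS d ON e.department_id = d.id
    GROUP BY d.id, d.name
    HAVING COUNT(*) >= 2          -- hypothetical minimum department size
    ORDER BY percentage_over_100K DESC
""").fetchall()

print(rows)
```

Each condition lands in its own clause: the salary threshold in the CASE expression, the department size in HAVING, and the ranking in ORDER BY.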
users table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
created_at | DATETIME |
Output:
Date | Monthly Cumulative |
---|---|
2020-01-01 | 5 |
2020-01-02 | 12 |
... | ... |
2020-02-01 | 8 |
2020-02-02 | 17 |
2020-02-03 | 23 |
Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date. Or maybe it’s just a regular cumulative distribution function? But we must notice that we are grouping by a specific interval of month and date. And when the next month comes around, we want to reset the count of the number of users.
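One way to get a running total that resets each month is a window function partitioned by month. The sketch below is one possible approach, not necessarily the intended solution; it uses Python's `sqlite3` (window functions need SQLite 3.25 or newer) with made-up signup timestamps, and `strftime` is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR, created_at DATETIME)")
signups = [
    "2020-01-01 09:00:00", "2020-01-01 17:30:00",
    "2020-01-02 08:15:00",
    "2020-02-01 10:00:00", "2020-02-01 11:00:00", "2020-02-01 12:00:00",
    "2020-02-02 13:45:00",
]
conn.executemany(
    "INSERT INTO users (name, created_at) VALUES ('user', ?)",
    [(s,) for s in signups],
)

rows = conn.execute("""
    WITH daily AS (
        -- step 1: number of new users per calendar day
        SELECT DATE(created_at) AS day, COUNT(*) AS new_users
        FROM users
        GROUP BY DATE(created_at)
    )
    SELECT
        day,
        -- step 2: running total that restarts for every year-month partition
        SUM(new_users) OVER (
            PARTITION BY strftime('%Y-%m', day)
            ORDER BY day
        ) AS monthly_cumulative
    FROM daily
    ORDER BY day
""").fetchall()

print(rows)
```

The PARTITION BY clause is what resets the count: the running sum never carries over from January into February.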
subscriptions table
Column | Type |
---|---|
user_id | INTEGER |
start_date | DATE |
end_date | DATE |
Example:
user_id | start_date | end_date |
---|---|---|
1 | 2019-01-01 | 2019-01-31 |
2 | 2019-01-15 | 2019-01-17 |
3 | 2019-01-29 | 2019-02-04 |
4 | 2019-02-05 | 2019-02-10 |
Output:
user_id | overlap |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 0 |
Hint: Let’s take a look at each of the conditions first and see how they could be triggered. Given two date ranges, what determines if the subscriptions would overlap?
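Two date ranges overlap when each one starts on or before the other one ends. A possible sketch of that condition, run with Python's `sqlite3` on the example rows from the table above (whether touching endpoints count as an overlap is an assumption here; the example data does not exercise that case):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (user_id INTEGER, start_date DATE, end_date DATE);
    INSERT INTO subscriptions VALUES
        (1, '2019-01-01', '2019-01-31'),
        (2, '2019-01-15', '2019-01-17'),
        (3, '2019-01-29', '2019-02-04'),
        (4, '2019-02-05', '2019-02-10');
""")

rows = conn.execute("""
    SELECT
        s1.user_id,
        -- 1 if at least one other user's range matched, otherwise 0
        MAX(CASE WHEN s2.user_id IS NOT NULL THEN 1 ELSE 0 END) AS overlap
    FROM subscriptions AS s1
    LEFT JOIN subscriptions AS s2
        ON s1.user_id <> s2.user_id          -- never compare a user with themselves
       AND s1.start_date <= s2.end_date      -- s1 starts before s2 ends
       AND s1.end_date >= s2.start_date      -- s1 ends after s2 starts
    GROUP BY s1.user_id
    ORDER BY s1.user_id
""").fetchall()

print(rows)
```

The LEFT JOIN matters: user 4 matches nobody, and an INNER JOIN would drop that row instead of reporting a 0.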
If multiple students have the same minimum score difference, select the student name combination that is higher in the alphabet.
scores table
Column | Type |
---|---|
id | INTEGER |
student | VARCHAR |
score | INTEGER |
Input:
id | student | score |
---|---|---|
1 | Jack | 1700 |
2 | Alice | 2010 |
3 | Miles | 2200 |
4 | Scott | 2100 |
Output:
one_student | other_student | score_diff |
---|---|---|
Alice | Scott | 90 |
Hint: Given that the problem statement references a single table, we have to join the table to itself. It’s helpful to think about this problem as if there were two different tables with the same values.
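The self-join idea can be sketched as follows, using Python's `sqlite3` and the input rows shown above. It assumes student names are unique; with duplicate names, joining on `id` would be safer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (id INTEGER PRIMARY KEY, student VARCHAR, score INTEGER);
    INSERT INTO scores VALUES
        (1, 'Jack', 1700), (2, 'Alice', 2010), (3, 'Miles', 2200), (4, 'Scott', 2100);
""")

closest = conn.execute("""
    SELECT
        a.student AS one_student,
        b.student AS other_student,
        ABS(a.score - b.score) AS score_diff
    FROM scores AS a
    -- a.student < b.student keeps each pair once, alphabetically ordered,
    -- and prevents a student from being paired with themselves
    JOIN scores AS b ON a.student < b.student
    -- smallest difference first; names break ties alphabetically
    ORDER BY score_diff ASC, one_student ASC, other_student ASC
    LIMIT 1
""").fetchall()

print(closest)
```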
users table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
neighborhood_id | INTEGER |
created_at | DATETIME |
neighborhoods table
Columns | Type |
---|---|
id | INTEGER |
name | VARCHAR |
city_id | INTEGER |
Output:
Columns | Type |
---|---|
neighborhood_name | VARCHAR |
Hint: Our task is to find all the neighborhoods without users. In a sense, we need all the neighborhoods that do not have a single user living in them. This means we have to introduce the concept of a value existing in one table but not in the other.
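A common way to express "exists in one table but not in the other" is a LEFT JOIN followed by an IS NULL filter. Here is a minimal sketch with Python's `sqlite3`; the neighborhood and user rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE neighborhoods (id INTEGER PRIMARY KEY, name VARCHAR, city_id INTEGER);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, name VARCHAR,
        neighborhood_id INTEGER, created_at DATETIME
    );
    INSERT INTO neighborhoods VALUES (1, 'Downtown', 1), (2, 'Harbor', 1), (3, 'Uptown', 1);
    INSERT INTO users VALUES
        (1, 'Ana', 1, '2020-01-01 00:00:00'),
        (2, 'Ben', 1, '2020-01-02 00:00:00'),
        (3, 'Cy', 3, '2020-01-03 00:00:00');
""")

empty = conn.execute("""
    SELECT n.name AS neighborhood_name
    FROM neighborhoods AS n
    LEFT JOIN users AS u ON u.neighborhood_id = n.id
    -- after a LEFT JOIN, neighborhoods with no users have NULL in every users column
    WHERE u.id IS NULL
    ORDER BY n.name
""").fetchall()

print(empty)
```

A `NOT EXISTS` sub-query would return the same rows; the LEFT JOIN form is shown because it follows the hint most directly.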
transactions table
Column | Type |
---|---|
id | INTEGER |
user_id | INTEGER |
created_at | DATETIME |
product_id | INTEGER |
quantity | INTEGER |
products table
Column | Type |
---|---|
id | INTEGER |
name | STRING |
price | FLOAT |
Hint: We first need to find the average price of all transactions. The total price of a transaction is price*quantity, so we write a sub-query to find the average of all transactions.
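The sub-query from the hint can be sketched like this with Python's `sqlite3`. The full question is not shown here, so the outer query (products priced above the average transaction total) is an assumption for illustration, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price FLOAT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, user_id INTEGER, created_at DATETIME,
        product_id INTEGER, quantity INTEGER
    );
    INSERT INTO products VALUES (1, 'pen', 2.0), (2, 'lamp', 30.0), (3, 'desk', 200.0);
    INSERT INTO transactions (user_id, created_at, product_id, quantity) VALUES
        (1, '2024-01-01 10:00:00', 1, 5),   -- total 10
        (2, '2024-01-01 11:00:00', 2, 1),   -- total 30
        (3, '2024-01-02 12:00:00', 3, 1);   -- total 200
""")

# Step from the hint: the total of a transaction is price * quantity,
# so the average transaction total is AVG over that product.
avg_total_sql = """
    SELECT AVG(t.quantity * p.price)
    FROM transactions AS t
    JOIN products AS p ON t.product_id = p.id
"""
avg_total = conn.execute(avg_total_sql).fetchone()[0]

# Assumed outer query: reuse the average as a scalar sub-query in a filter.
pricey = conn.execute(f"""
    SELECT id, name, price
    FROM products
    WHERE price > ({avg_total_sql})
    ORDER BY id
""").fetchall()

print(avg_total, pricey)
```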
Consider two tables, transactions and products. Hypothetically, the transactions table consists of over a billion rows of purchases bought by users. We are trying to find paired products often purchased by the same user, such as wine and bottle openers, chips and beer, etc.
Write a query to find the top five paired products and their names.
Note: P1 should be the item that comes first in the alphabet to satisfy the test case.
transactions table:
Column | Type |
---|---|
id | INTEGER |
user_id | INTEGER |
created_at | DATETIME |
product_id | INTEGER |
quantity | INTEGER |
products table:
Column | Type |
---|---|
id | INTEGER |
name | STRING |
price | FLOAT |
example output:
Column | Type |
---|---|
P1 | STRING |
P2 | STRING |
count | INTEGER |
Hint: We need to break this into several steps to solve this. First, we should find a way to select all the instances in which a user purchased 2 or more products simultaneously. How can we use user_id and created_at to accomplish this?
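Following the hint, one possible sketch joins transactions to itself on user_id and created_at to find products bought together, then counts each alphabetically ordered pair. It runs on Python's `sqlite3` with made-up rows; on a billion-row table you would also need to think about indexing and pre-filtering, which this sketch ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price FLOAT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, user_id INTEGER, created_at DATETIME,
        product_id INTEGER, quantity INTEGER
    );
    INSERT INTO products VALUES
        (1, 'wine', 12.0), (2, 'bottle opener', 5.0), (3, 'chips', 3.0), (4, 'beer', 8.0);
    INSERT INTO transactions VALUES
        (1, 1, '2024-01-01 10:00:00', 1, 1),
        (2, 1, '2024-01-01 10:00:00', 2, 1),
        (3, 2, '2024-01-02 18:30:00', 1, 2),
        (4, 2, '2024-01-02 18:30:00', 2, 1),
        (5, 3, '2024-01-03 20:00:00', 3, 1),
        (6, 3, '2024-01-03 20:00:00', 4, 6),
        (7, 1, '2024-01-04 09:00:00', 3, 1);
""")

pairs = conn.execute("""
    SELECT p1.name AS P1, p2.name AS P2, COUNT(*) AS "count"
    FROM transactions AS t1
    JOIN transactions AS t2
        ON t1.user_id = t2.user_id           -- same user
       AND t1.created_at = t2.created_at     -- same purchase moment
    JOIN products AS p1 ON t1.product_id = p1.id
    JOIN products AS p2 ON t2.product_id = p2.id
    -- P1 comes first alphabetically; this also drops self-pairs and mirrored duplicates
    WHERE p1.name < p2.name
    GROUP BY p1.name, p2.name
    ORDER BY COUNT(*) DESC, p1.name, p2.name
    LIMIT 5
""").fetchall()

print(pairs)
```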
We have a table called song_plays that tracks each time a user plays a song.
Write a query to return the number of songs played on each date for each user.
Note: If a user plays the same song twice during the day, the count should be two.
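The schema of song_plays is not given here, so the sketch below assumes hypothetical columns `user_id`, `song_id`, and `played_at`. It uses Python's `sqlite3` and made-up rows, and counts plays rather than distinct songs, as the note requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- column names are assumptions; the article does not show this table's schema
    CREATE TABLE song_plays (
        id INTEGER PRIMARY KEY, user_id INTEGER, song_id INTEGER, played_at DATETIME
    );
    INSERT INTO song_plays (user_id, song_id, played_at) VALUES
        (1, 10, '2024-01-01 08:00:00'),
        (1, 10, '2024-01-01 21:00:00'),   -- same song twice in one day counts twice
        (1, 11, '2024-01-02 09:00:00'),
        (2, 10, '2024-01-01 12:00:00');
""")

plays = conn.execute("""
    SELECT user_id, DATE(played_at) AS play_date, COUNT(*) AS songs_played
    FROM song_plays
    GROUP BY user_id, DATE(played_at)
    ORDER BY user_id, play_date
""").fetchall()

print(plays)
```

Using `COUNT(DISTINCT song_id)` instead would collapse repeat plays, which is exactly what the note rules out.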
Design a database for a stand-alone fast food restaurant.
Based on the above database schema, write an SQL query to find the top three items with the highest revenue sold yesterday.
Note: We will only record the customer if they either sign up for our rewards program or order by delivery. Customers will be recorded in the orders and customers tables only if they match one of those criteria. Deliveries will be recorded in the orders and deliveries tables only if they were delivered.
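There are many valid designs for this problem. The sketch below is one hypothetical schema (the `items`, `customers`, `orders`, `order_items`, and `deliveries` tables and all their columns are assumptions, not a reference answer), followed by a revenue query for yesterday. It runs on Python's `sqlite3`, and "yesterday" is computed once in Python so the sample data and the query agree.

```python
import sqlite3
from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()
two_days_ago = (date.today() - timedelta(days=2)).isoformat()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (id),  -- NULL for unrecorded customers
        order_date DATE
    );
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders (id),
        item_id INTEGER REFERENCES items (id),
        quantity INTEGER
    );
    CREATE TABLE deliveries (order_id INTEGER REFERENCES orders (id), address TEXT);
    INSERT INTO items VALUES
        (1, 'burger', 5.0), (2, 'fries', 2.0), (3, 'soda', 1.5), (4, 'salad', 4.0);
""")
conn.executemany(
    "INSERT INTO orders (id, order_date) VALUES (?, ?)",
    [(1, yesterday), (2, yesterday), (3, two_days_ago)],
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 2, 1), (2, 3, 3), (2, 4, 1), (3, 4, 10)],
)

top_items = conn.execute("""
    SELECT i.name, SUM(oi.quantity * i.price) AS revenue
    FROM order_items AS oi
    JOIN orders AS o ON oi.order_id = o.id
    JOIN items AS i ON oi.item_id = i.id
    WHERE o.order_date = ?            -- only yesterday's orders
    GROUP BY i.id, i.name
    ORDER BY revenue DESC
    LIMIT 3
""", (yesterday,)).fetchall()

print(top_items)
```

Keeping line items in their own table (`order_items`) is what makes the revenue query a simple join and aggregate; the large salad order from two days ago is excluded by the date filter.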
Let’s say you work at a file-hosting website. You have information on users’ daily downloads in the download_facts table.
Use the window function RANK to display the top three users by downloads each day. Order your data by date and then by daily_rank.
Input:
download_facts table
Column | Type |
---|---|
user_id | INTEGER |
date | DATE |
downloads | INTEGER |
Output:
Column | Type |
---|---|
daily_rank | INTEGER |
user_id | INTEGER |
date | DATE |
downloads | INTEGER |
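A possible sketch of the RANK approach, run with Python's `sqlite3` (window functions need SQLite 3.25 or newer) on made-up download counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE download_facts (user_id INTEGER, date DATE, downloads INTEGER);
    INSERT INTO download_facts VALUES
        (1, '2024-01-01', 50), (2, '2024-01-01', 40),
        (3, '2024-01-01', 30), (4, '2024-01-01', 20),
        (1, '2024-01-02', 5),  (2, '2024-01-02', 9);
""")

top_three = conn.execute("""
    SELECT daily_rank, user_id, date, downloads
    FROM (
        SELECT
            user_id, date, downloads,
            -- rank restarts for each date; the most downloads gets rank 1
            RANK() OVER (PARTITION BY date ORDER BY downloads DESC) AS daily_rank
        FROM download_facts
    ) AS ranked
    -- a window function cannot be filtered in the same SELECT, hence the sub-query
    WHERE daily_rank <= 3
    ORDER BY date, daily_rank
""").fetchall()

print(top_three)
```

Because RANK gives tied users the same rank, a day with ties at the cutoff can return more than three rows; DENSE_RANK or ROW_NUMBER would behave differently there.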
SQL questions are a sizeable part of the interview for many Google positions, especially analyst roles. Your prep should include a solid study of the basics and SQL definitions, yet you should also be comfortable with more complex queries and sub-queries. One tip: Ask the recruiter what types of questions will come up in the interview. This will give you an idea of where to focus your study.
If you need additional help, be sure to check out the SQL module in our Data Science Course. It offers a solid review of basic-to-advanced concepts. You might also want to see our Top 25+ Data Science SQL Interview Questions.