SQL proficiency is essential for anyone seeking a job in data analysis or data science. It’s the standard language for managing and querying relational databases, so most organizations use it to make data-driven decisions. So it almost always comes up in job interviews for data-related positions.
This SQL cheat sheet for interviews is designed to help you prepare for these interviews, providing a comprehensive yet concise overview of the most important SQL concepts, commands, and techniques.
Each section gives a brief explanation of how SQL clauses and functions work, and then provides a brief summary cheatsheet of the concepts mentioned.
Before diving into specific queries, it’s crucial to understand the basics of SQL and the syntax used for writing SQL commands. SQL is a declarative language, which means you specify what you want to accomplish without outlining the exact steps to get there.
The SELECT
statement is used to retrieve data from one or more tables. List the column names you want to retrieve, separated by commas. The FROM
clause follows the SELECT
clause and specifies the table(s) from which the data is fetched. Here’s an example:
SELECT column1, column2
FROM table_name;
The WHERE
clause filters the results of a query based on a specified condition. It appears after the FROM
clause. For instance:
SELECT column1, column2
FROM table_name
WHERE column1 = 'value
The HAVING
clause filters the results of a GROUP BY
query based on a specified condition. It appears after the GROUP BY
clause. For example:
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1
HAVING COUNT(*) > 10;
The ORDER BY
clause appears after the HAVING
clause (or if there’s no HAVING
, whatever comes earlier).
It sorts the result table based on specified column(s). You must specify the column names separated by commas, and then specify if you want them to be ordered in ascending (ASC) or descending (DESC) order.
SELECT column1, column2
FROM table_name
WHERE column1 > 10
ORDER BY column1, column2 ASC;
Here’s a brief syntax cheat sheet of what we just mentioned:
SELECT
: Used to retrieve data from one or more tables.
Syntax: SELECT column1, column2, ... FROM table_name;
FROM
: Specifies the table(s) from which data should be retrieved.
Syntax: SELECT column1, column2, ... FROM table_name;
WHERE
: Filters the data based on a specified condition.
Syntax: SELECT column1, column2, ... FROM table_name WHERE condition;
GROUP BY
: Groups rows that have the same values in specified columns.
Syntax: SELECT column1, column2, … FROM table_name GROUP BY column1, column2, …;
HAVING
: Filters the results of a GROUP BY query based on a specified condition.
Syntax: SELECT column1, column2, ... FROM table_name GROUP BY column1, column2, ... HAVING condition;
ORDER BY
: Sorts the result set in ascending or descending order based on one or more columns.
Syntax: SELECT column1, column2, ... FROM table_name ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;
These basic commands form the foundation of SQL queries, and mastering them is essential for succeeding in data analysis and data science interviews. In the next sections, we’ll provide a cheat sheet of common SQL queries and functions to enhance your interview preparation further.
Mastering SQL queries is essential for data analysis and data science interviews. In this section, we provide a cheat sheet of the most common SQL queries that may come up in a job interview.
Sometimes, you may need to retrieve only specific pieces of data. SQL allows you to either fetch all columns from a table or just a selection of them:
(*)
after the SELECT statement:SELECT * FROM table_name;
-If you’re interested in only certain data, like extracting the names and email addresses of customers for a marketing campaign, enumerate the columns you want, separated by commas:
SELECT column1, column2 FROM table_name;
SQL enables you to retrieve subsets of rows that meet specific criteria using the WHERE
clause. This can be particularly helpful for carrying out analyses on certain groups:
WHERE
clause followed by the column name, an equals sign, and the desired value:SELECT * FROM table_name WHERE column1 = 'value';
-You can also filter for other conditions than equality. For example, to retrieve rows where the value of a column is greater than a specified value, use the following query, use the WHERE clause followed by the column name, the greater-than operator, and the desired value:
SELECT * FROM table_name WHERE column1 > 'value';
This can be useful for identifying trends or outliers, like finding products with unusually high prices or orders that exceed a certain quantity.
Sorting data can help you organize information in a more readable format or rank it based on specific criteria:
ORDER BY
clause followed by the column name and the ASC
keyword:SELECT * FROM table_name ORDER BY column1 ASC;
DESC
keyword instead:SELECT * FROM table_name ORDER BY column1 DESC;
When working with relational databases, you’ll often need to join tables to retrieve related data:
SELECT table1.column1, table2.column2
FROM table1
INNER JOIN table2
ON table1.column1 = table2.column1;
This can be useful for combining related data from different tables, like determining the total revenue generated by each customer by joining the Customers and Orders tables.
Grouping and aggregating data helps you summarize information for better insights:
GROUP BY
clause followed by the column name:SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1;
This is useful for finding the total number of products sold per category or the average salary of employees in each department.
Limiting the number of rows in the output can help you focus on specific portions of data:
LIMIT
keyword followed by the desired number of rows:SELECT * FROM table_name
LIMIT 10;
This is helpful for displaying only the top 10 best-selling products or the 5 most recent orders.
This SQL query cheat sheet covers fundamental queries that form the foundation for data analysis tasks. Practicing these queries and understanding their use cases will help you succeed in data analysis and data science interviews.
Here’s a brief syntax cheat sheet of the common SQL queries:
SELECT * FROM table_name;
Retrieves all columns from the specified table.SELECT column1, column2 FROM table_name;
Retrieves specific columns from the specified table.SELECT * FROM table_name WHERE column1 = 'value';
Retrieves rows where the value of column1 matches the specified value.SELECT * FROM table_name WHERE column1 > 'value';
Retrieves rows where the value of column1 is greater than the specified value.SELECT * FROM table_name ORDER BY column1 ASC;
Retrieves data sorted by column1 in ascending order.SELECT * FROM table_name ORDER BY column1 DESC;
Retrieves data sorted by column1 in descending order.SELECT table1.column1, table2.column2 FROM table1 INNER JOIN table2 ON table1.column1 = table2.column1;
Retrieves specific columns from two tables, joining them based on the specified condition.SELECT column1, COUNT(*) FROM table_name GROUP BY column1;
Groups data by column1 and counts the number of rows in each group.SELECT * FROM table_name LIMIT 10;
SQL functions are indispensable in data analysis and manipulation, as they simplify complex operations and improve the efficiency of data processing. They’re versatile and can be used with various data types and scenarios. In this section, we’ll explore essential SQL functions, including aggregate functions, string functions, and date functions.
Aggregate functions perform calculations on a set of values and return a single value. Some common aggregate functions include:
COUNT()
: Counts the number of rows that match a specified condition.SELECT COUNT(column_name) FROM table_name GROUP BY column_name;
SUM()
: Adds up the values of a specified column.SELECT SUM(column_name) FROM table_name GROUP BY column_name;
AVG()
: Calculates the average value of a specified column.SELECT AVG(column_name) FROM table_name GROUP BY column_name;
MIN()
: Finds the smallest value in a specified column.SELECT MIN(column_name) FROM table_name GROUP BY column_name;
MAX()
: Finds the largest value in a specified column.SELECT MAX(column_name) FROM table_name GROUP BY column_name;
Aggregate functions are often used in conjunction with the GROUP BY
clause to perform calculations on each group of rows with the same value in specified columns. Let’s look at an example:
SELECT department, COUNT(employee_id) AS employee_count
FROM employees
GROUP BY department;
In this example, we group the rows by the department
column and then count the number of employees in each department using the COUNT()
function. The result is a summary table with each department’s employee count.
String functions manipulate and transform text data. Some essential string functions include:
CONCAT()
: Joins two or more strings together. For example, CONCAT('Hello', ' World')
returns 'Hello World'
.SELECT CONCAT(column1, column2) FROM table_name;
SUBSTRING()
: Extracts a part of a string. For example, SUBSTRING('Hello World', 1, 5)
returns 'Hello'
.SELECT SUBSTRING(column_name, start_position, length) FROM table_name
LENGTH()
: Returns the length of a string. For example, LENGTH('Hello')
returns 5
.SELECT LENGTH(column_name) FROM table_name;
REPLACE()
: Replaces a specified part of a string with another string. For example, REPLACE('Hello World', 'World', 'There')
returns 'Hello There'
.SELECT REPLACE(column_name, 'old_string', 'new_string') FROM table_name;
TRIM(column):
Removes leading and trailing spaces from the specified column. For example, TRIM(' Hello World ')
returns 'Hello World'
Date functions help in manipulating and formatting date and time values. They tend to come up frequently in job interviews because they are necessary for any data analysis that involves dates, which is crucial for reporting. Some common date functions include:
NOW()
: Returns the current date and time.SELECT NOW();
DATE()
: Extracts the date part of a date-time expression.SELECT DATE(column_name) FROM table_name;
DATEDIFF()
: Calculates the difference between two dates.SELECT DATEDIFF(date1, date2) FROM table_name;
DATE_ADD()
: Adds a specified time interval to a date.SELECT DATE_ADD(date, INTERVAL value unit) FROM table_name;
You need to use the INTERVAL
keyword followed by the value and the time unit (DAY
, MONTH
, YEAR
, HOUR
, MINUTE
, or SECOND
). For example, in INTERVAL 5 DAY
, 5
would be the value and DAY
would be the unit.
Here’s a brief syntax cheat sheet of what we just mentioned:
COUNT(column):
Counts the number of non-null values in the specified column.SUM(column):
Calculates the sum of non-null values in the specified column.AVG(column):
Computes the average of non-null values in the specified column.MIN(column):
Finds the minimum value in the specified column.MAX(column):
Identifies the maximum value in the specified column.CONCAT(column1, column2)
: Concatenates two columns of strings.SUBSTRING(column, start, length)
: Extracts a substring from the specified column, starting at the given position for the specified length.LENGTH(column)
: Returns the string length for the value in the column.REPLACE(column, search_string, replacement_string)
: Replaces all occurrences of the search_string with the replacement_string in the specified column.TRIM(column):
Removes leading and trailing spaces from the specified column.NOW()
: Returns the current date and time.DATE(column)
: Extracts the date part of a datetime value in the specified column.DATEDIFF(column1, column2)
: Calculates the difference in days between two dates.DATE_ADD(date, INTERVAL value unit)
SQL window functions enable advanced data analysis by performing calculations across a set of rows related to the current row. These functions are powerful tools for solving complex analytical problems, such as cumulative sums, running averages, and ranking. In this section, we’ll explore commonly used window functions, including ROW_NUMBER
, RANK
, DENSE_RANK
, and NTILE
, and explain how they can be used in data analysis tasks.
The ROW_NUMBER()
function assigns a unique, sequential number to each row within the result set. Here’s an example:
SELECT product_name, sales, ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_number
FROM products;
In this query, we assign a row number to each product based on their sales in descending order.
The ROW_NUMBER
function is useful for pagination because it lets us relate each row to a unique and sequential number. Therefore, we can choose to show the rows with values between 1 and 10 on one page, then the rows with values between 11 and 20 on another, and so on, and know that each of these pages will have 10 rows. We couldn’t use the rows’ ids directly because there’s no guarantee that it will be sequential (it may skip values, so there may be less than 10 rows with ids between 1 and 10).
The RANK()
function assigns a unique rank to each row within the result set, with the same rank for rows with equal values. If two rows have the same rank, the next rank is skipped. For example:
SELECT product_name, sales, RANK() OVER (ORDER BY sales DESC) AS rank
FROM products;
This query ranks products by sales, with higher sales receiving a lower rank.
By ranking records, you can analyze trends, detect outliers, and make informed decisions about your data. For instance, a sales manager can use the RANK()
function to identify the top 10 salespeople in the team and reward them accordingly.
The DENSE_RANK()
function is similar to RANK()
, but it does not skip any rank. Rows with equal values receive the same rank, and the next rank is assigned immediately after. Example:
SELECT product_name, sales, DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rank
FROM products;
In this query, we assign a dense rank to products based on their sales.
The NTILE()
function divides the result set into a specified number of equal groups. It assigns each row to one of these groups based on the order specified. For instance:
SELECT product_name, sales, NTILE(4) OVER (ORDER BY sales DESC) AS quartile
FROM products;
This query divides products into four quartiles based on their sales.
NTILE()
is a valuable tool in data analysis for segmenting your data into different groups or categories, such as quartiles, quintiles, or percentiles. By dividing your dataset into equal parts based on a specific measure, you can gain insights into the distribution and trends of your data. For example, a marketing team can use NTILE()
to segment customers into quartiles based on their spending habits. This information can then be used to create targeted marketing campaigns for different customer segments, ultimately improving marketing efficiency and customer engagement.
Here’s a brief syntax cheat sheet of what we just mentioned:
ROW_NUMBER()
: Assigns a unique, sequential number to each row within the result set, starting at 1.
Syntax: ROW_NUMBER() OVER (ORDER BY column)
RANK()
: Assigns a unique rank to each row within the result set, with the same rank assigned to rows with equal values. Rows with equal values will leave a gap in the sequence.
Syntax: RANK() OVER (ORDER BY column)
DENSE_RANK()
: Similar to RANK()
, but without gaps in the rank sequence for equal values.
Syntax: DENSE_RANK() OVER (ORDER BY column)
NTILE(n)
: Divides the result set into a specified number of groups (n) and assigns a group number to each row.
Syntax: NTILE(n) OVER (ORDER BY column)
Understanding and mastering SQL window functions is essential for tackling sophisticated data analysis tasks, making them a valuable addition to your SQL interview cheat sheet.
Understanding SQL data types is crucial for interview success, as it ensures you can create and manipulate database structures effectively. Here’s a cheat sheet of common SQL data types:
INT
: A whole number with a range depending on the database system.FLOAT
: A floating-point number, used to store approximate values with varying degrees of precision.DECIMAL(p, s)
: A fixed-point number with user-defined precision (p) and scale (s). Scale represents the number of digits to the right of the decimal point.VARCHAR(n)
: A variable-length character string with a maximum length of n characters.DATE
: Represents a date value (YYYY-MM-DD).TIMESTAMP
: Represents a date and time value (YYYY-MM-DD HH:MI:SS).Knowing these common SQL data types will help you create and modify database schemas, write efficient queries, and avoid data type conversion errors during your interview.
Further your SQL practice with the SQL learning path, and brush up your skills with our comprehensive SQL question database.
While it’s generally not acceptable to bring an actual cheat sheet into an interview, you can use it effectively during your interview preparation. Here are some tips:
By using the cheat sheet strategically during your preparation, you’ll build a strong foundation in SQL that will improve your chances of landing the job.
A SQL cheat sheet for interviews is a valuable resource for honing your SQL skills and boosting your confidence in data analysis and data science job interviews. By mastering the concepts covered in this cheat sheet, you’ll be well-prepared to tackle any SQL-related interview questions and showcase your expertise.