Common SQL Interview Mistakes and How to Avoid Them

Common SQL Interview Mistakes and How to Avoid Them

Overview

Database and SQL questions are essential in interviews for data-related roles. While SQL is a relatively straightforward and very popular query language for DBMS that can be mastered through advanced knowledge and consistent practice, we’ve noticed our candidates making recurring mistakes. These errors are not typically due to a lack of technical skills but rather a result of limited industry experience.

While practicing SQL interview questions, you may encounter some of these common mistakes that can be detrimental during high-pressure technical interviews. To help you stay prepared and avoid these pitfalls, we’ve compiled a list of the most frequent SQL mistakes candidates make.

Not Reading the Questions Carefully

One typical SQL interview mistake is failing to thoroughly read and understand the question before attempting to write a query. Candidates often jump into writing code without fully digesting the requirements, leading to incomplete, incorrect, or overly complicated solutions. This is frequently a direct result of nervousness under pressure and overconfidence. Carefully read the questions at least twice to ensure you understand all the requirements and edge cases. Dividing the questions into parts may also facilitate better handling.

Also, don’t waste time trying to understand intentionally vague questions. Feel free to ask the interviewer clarifying questions where required and allowed.

For example:

Question: Display the total revenue as total_revenue and the count of orders as order_count.

Common Mistake: You may forget to use aliases for the output columns as specified in the question.

-- Incorrect: Lacks required column aliases
SELECT SUM(revenue), COUNT(order_id)
FROM orders;

Correct Approach: Ensure the output matches the specified format.

-- Correct: Uses aliases for clarity
SELECT SUM(revenue) AS total_revenue, COUNT(order_id) AS order_count
FROM orders;

Using UNION for Stacking Tables

UNION operator combines the result sets of two or more SELECT statements and removes duplicate rows from the final result, whereas the UNION ALL operator also combines the result sets of two or more SELECT statements, but without eliminating duplicates.

One SQL mistake made during interviews is using UNION to stack tables when UNION ALL would be a better choice. This mistake often happens because candidates fail to consider the difference in functionality between these two operators and the performance implications of using them.

However, this mistake often occurs against the assumption that UNION is the default choice for combining results from multiple SELECT statements, and it might seem like the “safer” option because it eliminates duplicates automatically. Before you use UNION, ask yourself whether duplicate elimination is necessary. If you don’t need to remove duplicates, always choose UNION ALL for better performance.

For instance:

Question: Combine the results of two tables, employees and contractors, which both have a column employee_id.

Mistake:

Using UNION when there is no need to eliminate duplicates, leading to unnecessary overhead.

-- Incorrect: Using UNION which removes duplicates, even if they're not needed
SELECT employee_id, name FROM employees
UNION
SELECT employee_id, name FROM contractors;

In this case, UNION will perform extra work to eliminate any duplicate employee_ids between the employees and contractors tables, which is not needed if you’re simply stacking the results.

Correct Approach:

Use UNION ALL when you want to retain all rows from both tables, and there’s no need for deduplication.

-- Correct: Use UNION ALL to stack rows without eliminating duplicates
SELECT employee_id, name FROM employees
UNION ALL
SELECT employee_id, name FROM contractors;

This approach is faster and more efficient since it doesn’t perform the unnecessary deduplication step.

Using = NULL instead of IS NULL

A high-pressure common SQL interview mistake is using = NULL to compare values to NULL. This is incorrect because NULL represents an unknown value, and the SQL = (equal) operator cannot be used to compare NULL values. Instead, the correct approach is to use the IS NULL or IS NOT NULL condition to check for NULL values.

For example:

The candidate incorrectly uses the = operator to check if a column contains a NULL value.

-- Incorrect: Using "=" to compare NULL
SELECT * FROM employees WHERE department_id = NULL;

This query will return no results, even if some rows in the department_id column are NULL, because the expression department_id = NULL evaluates to unknown, not true. SQL will not return rows where the condition is unknown.

Correct Approach:

Instead of = NULL, use IS NULL to check for NULL values.

-- Correct: Use IS NULL to check for NULL values
SELECT * FROM employees WHERE department_id IS NULL;

Now, this query will return all rows where the department_id is actually NULL.

This usually happens because when learning SQL, you might develop the habit of using = for comparisons in general and may apply this to NULL values without realizing the difference. It’s often mitigated with enough practice and staying alert while preparing for the interview with an AI Interviewer.

Putting Left Join in a WHERE Clause

Another SQL mistake we frequently notice is where candidates place conditions related to the right table of a LEFT JOIN directly in the WHERE clause. This can unintentionally convert the LEFT JOIN into an INNER JOIN, leading to incorrect results. The LEFT JOIN is meant to return all rows from the left table, along with matching rows from the right table (or NULL if no match exists), but putting conditions for the right table in the WHERE clause can filter out those NULL values and eliminate rows that don’t have a match in the right table.

For example:

Scenario:

You want to retrieve a list of all employees and their department names, including those not assigned to any department.

-- Incorrect: Filtering out employees with no department
SELECT e.employee_id, e.employee_name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
WHERE d.department_name = 'Sales';

Explanation of the Mistake:

In this example, the LEFT JOIN is used to retrieve all employees and their department names, even if some employees are not assigned to any department (i.e., department_name is NULL). However, by placing the condition WHERE d.department_name = 'Sales' in the WHERE clause, you’re unintentionally converting the LEFT JOIN to an INNER JOIN because it filters out all rows where d.department_name is NULL. As a result, only employees who belong to the ‘Sales’ department will be returned, and employees with no department will be excluded.

Not Formatting Your SQL Code

Failing to format SQL code properly is a common mistake that, while not affecting the execution of the query, can significantly hinder readability, maintainability, and debugging. Unformatted or poorly formatted SQL code can lead to confusion, errors during code reviews, and difficulty understanding complex queries.

It’s more apparent in beginner coders who might not have been trained in or exposed to best practices for SQL formatting, so they default to writing code in a single line or with inconsistent indentation. Debugging unformatted code takes longer, as understanding the structure of the query requires additional effort.

Example:

SELECT e.name, d.department_name FROM employees e JOIN departments d ON e.department_id = d.department_id WHERE d.department_name = 'Sales' AND e.salary > 50000 ORDER BY e.name;

The query is written in a single line, making it difficult to distinguish between different clauses. Also, logical groupings of the query elements (e.g., SELECT, JOIN, WHERE, ORDER BY) are not visually separated.

Not Checking for Duplicate Matched When using JOIN

Failing to account for duplicate matches when performing JOIN operations can lead to inflated result sets, as rows from one table may join with multiple matching rows in another table. This often results in duplicate or unexpected data, especially when relationships between tables aren’t carefully understood or when the data itself allows for multiple matches.

This usually occurs when you presume that every row in one table has a one-to-one match in the other table when, in reality, the relationship could be one-to-many or many-to-many. However, if the database lacks proper constraints (e.g., unique keys, foreign key relationships), duplicate rows in either table may cause unintended results.

For instance:

You want to list all employees and the projects they are working on using an employees table and a projects table, joined through an employee_projects bridge table.

-- Incorrect: May result in duplicate matches
SELECT
    e.employee_id,
    e.name,
    p.project_name
FROM
    employees e
JOIN
    employee_projects ep
    ON e.employee_id = ep.employee_id
JOIN
    projects p
    ON ep.project_id = p.project_id;

If the employee_projects table contains multiple rows for the same employee_id and project_id (e.g., tracking different roles on the same project), the query may return duplicate rows.

Solution: Use DISTINCT to Remove Duplicates in the Result Set:

If duplicates are caused by the JOIN but aren’t needed in the result, use DISTINCT to ensure unique rows.

SELECT DISTINCT
    e.employee_id,
    e.name,
    p.project_name
FROM
    employees e
JOIN
    employee_projects ep
    ON e.employee_id = ep.employee_id
JOIN
    projects p
    ON ep.project_id = p.project_id;

Using LIMIT Without ORDER BY

Developers often assume that records are returned in the order they were inserted or by primary key, but this is not guaranteed unless explicitly specified. Using the LIMIT clause in SQL without specifying an ORDER BY clause can produce unpredictable results. When you don’t explicitly define an order for your query results, the database may return rows in an arbitrary order, which can vary depending on the underlying data structure, query execution plan, or database engine.

When displaying results across multiple pages, ORDER BY ensures consistent ordering across page views.

SELECT *
FROM users
ORDER BY user_id ASC
LIMIT 10 OFFSET 20;

This fetches rows 21–30 in a consistent order. Feel free to check our ORDER BY and LIMIT Guide for more info.

Treating Dates as Strings

A standard SQL mistake is treating date and time fields as plain strings instead of leveraging the database’s native DATE, DATETIME, or TIMESTAMP data types. This leads to inefficient queries, incorrect results, and missed opportunities to use powerful date-related functions provided by the database.

This issue occurs because when importing data, you may store date values as strings (VARCHAR or TEXT) instead of converting them to proper date formats. Beginners may also be unfamiliar with date-handling functions or find it easier to work with strings for operations like concatenation or pattern matching.

Forgetting to ROUND Numbers

Failing to round numbers in SQL queries can lead to unexpected results, especially when dealing with financial data, statistics, or any calculation requiring precise representation. Unrounded values can cause inaccuracies in reports, user-facing displays, or downstream systems that rely on consistent data formats.

Databases store numeric values in floating-point formats, which can result in minor, unexpected inaccuracies due to how computers handle decimal arithmetic. Developers often assume that raw numeric values are “close enough” or that rounding will be handled elsewhere, such as in the application layer. Moreover, in multi-step calculations, intermediate results often accumulate precision errors if not rounded at appropriate points.

The ROUND function rounds numbers to a specified number of decimal places.

ROUND(number, decimal_places)

Ignoring the Importance of Indexing

One of the most common mistakes in SQL is failing to understand and leverage indexing, leading to slow query performance and inefficient database operations. Indexes are a crucial feature in relational databases, used to speed up data retrieval, yet they are often overlooked or misused.

This is primarily a rookie mistake where beginners don’t understand what indexes are and how they work. It’s usually resolved with more practice and precision. However, it could also be that during development, the primary focus is on functionality rather than performance, leading to overlooked optimization opportunities like indexing.

Using Problematic Column Names

Choosing column names that are reserved words, contain spaces, or lack clarity can lead to complications in writing and maintaining SQL queries. These names often require additional syntax to function correctly and can introduce errors or inconsistencies in your database management.

Developers may prioritize naming columns in a way that seems intuitive or readable without considering SQL syntax rules. Also, including spaces in column names, such as First Name, seems user-friendly but leads to query complexity.

Overusing SELECT

Developers often default to SELECT * when querying a database, which retrieves all columns in a table regardless of whether they are needed. While convenient during initial development or debugging, this habit can lead to inefficient queries, performance issues, and increased maintenance complexity in production environments.

As SELECT * is faster to write than specifying individual columns, especially in exploratory queries, developers may not realize the performance and resource implications of fetching unnecessary data. They may also use SELECT * to avoid modifying queries when table schemas change, mistakenly believing this is a flexible approach.

For example:

Suppose you want to retrieve a customer’s name and email from the customers table.

Query:

SELECT *
FROM customers
WHERE customer_id = 123;

This query retrieves all columns, including unnecessary data like address, phone number, and sensitive details.

Misunderstanding Three-Valued Logic

Expounding on the IS NULL issue, SQL uses three-valued logic (3VL) to handle comparisons involving NULL values. Many developers mistakenly treat NULL as if it were a regular value, leading to incorrect query results or unintended behavior. The three possible logical values are TRUE, FALSE, and UNKNOWN, and failing to account for UNKNOWN can cause logical errors in filtering, conditions, and joins.

In SQL, a NULL represents the absence of a value, and any operation involving NULL results in UNKNOWN. For example:

  • TRUE AND UNKNOWNUNKNOWN
  • FALSE OR UNKNOWNUNKNOWN
  • NOT UNKNOWNUNKNOWN

This logic often leads to confusion because NULL does not behave like other values.

Fixing Cross Joins with DISTINCT

A common SQL mistake involves using DISTINCT to fix the incorrect results caused by a CROSS JOIN. Cross joins generate a Cartesian product of two tables, meaning that every row from one table is combined with every row from another table, which can lead to an unnecessarily large result set. Developers often mistakenly use DISTINCT to remove duplicates caused by a **CROSS** **JOIN** rather than addressing the root cause of the issue, which is usually an incorrect join type.

For example, if table A has 5 rows and table B has 10 rows, a CROSS JOIN will return 50 rows.

Example:

SELECT *
FROM employees
CROSS JOIN departments;

If employees has 100 records and departments has 5 records, the result will be 500 rows (100 * 5).

The incorrect approach:

You want to combine employees and departments, but you’ve accidentally used a CROSS JOIN when you meant to filter employees by department. Instead of fixing the join type, you use DISTINCT to remove duplicate records caused by the Cartesian product.

Incorrect Query:

SELECT DISTINCT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;

This query will return every employee for each department, but you probably only wanted to associate each employee with their actual department, not generate every possible combination of employees and departments. The use of DISTINCT does not solve the root issue—it only hides the unnecessary combinations.

Correct Approach:

To fix the issue, you need to use the appropriate type of join, typically an INNER JOIN, LEFT JOIN, or RIGHT JOIN, depending on the desired relationship between the tables. These joins combine data based on matching criteria rather than generating all possible combinations.

The Bottom Line

Mastering SQL for interviews requires not only technical knowledge but also attention to detail. Many candidates make common mistakes that stem from a lack of industry experience rather than skill gaps. By understanding and avoiding issues like incorrect join types, misusing NULL, or over-relying on DISTINCT, you can write more efficient, accurate queries.