Top 27 Data Science Coding Interview Questions (Updated for 2025)

Python Data Science Coding Interview Questions

A fundamental requirement for succeeding as a data scientist is demonstrating a problem-solving approach and applying algorithmic and coding skills to real-world analytical challenges.

To assess your coding abilities, data science interviewers typically ask Python coding questions. Here are some of them:

6. Given two sorted lists, write a function to merge them into one sorted list.

Bonus: What’s the time complexity?

Example:

Input:

list1 = [1,2,5]
list2 = [2,4,6]

Output:

def merge_list(list1,list2) -> [1,2,2,4,5,6]
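A minimal two-pointer sketch: walk both lists once, always taking the smaller front element. Each element is visited exactly once, so for lists of lengths n and m the time complexity (the bonus question) is O(n + m).

```python
def merge_list(list1, list2):
    """Merge two already-sorted lists into one sorted list in O(n + m) time."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element from either list
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # One list is exhausted; append whatever remains of the other
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged


print(merge_list([1, 2, 5], [2, 4, 6]))  # [1, 2, 2, 4, 5, 6]
```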

7. You are given a singly linked list; write a function to find and return the last node of the list. If the list is empty, return `None`.
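The question names neither a node type nor a function, so the sketch below assumes a hypothetical `ListNode` class with `value` and `next` attributes and a hypothetical function name `last_node`. It follows `next` pointers until it reaches a node whose `next` is `None`, which takes O(n) time.

```python
class ListNode:
    """Hypothetical singly linked list node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def last_node(head):
    """Return the last node of the list, or None if the list is empty."""
    if head is None:
        return None
    node = head
    # Advance until the current node has no successor
    while node.next is not None:
        node = node.next
    return node


print(last_node(ListNode(1, ListNode(2, ListNode(3)))).value)  # 3
```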

8. You are given two rectangles `a` and `b`, each defined by four ordered pairs denoting their corners on the `x`, `y` plane. Write a function `rectangle_overlap` to determine whether or not they overlap. Return `True` if so, and `False` otherwise.

Note: Rectangles that share an edge, or that touch only at a corner like two diagonally adjacent squares on a chessboard, count as overlapping.

Note: The corners in each list are in no particular order. For example, the first entry in `a` could be the top-left corner while the first entry in `b` is the bottom-right corner.

Example:

Input:

a = [(-3,5), (-3,2),(0,5),(0,2)]
b = [(-1,4), (3,4), (3,1), (-1,1)]

Output:

def rectangle_overlap(a, b) -> True

Point (0,2) is fully contained in rectangle b, and point (-1,4) is fully contained in rectangle a.
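One common approach, assuming the rectangles are axis-aligned (the question implies but does not state this): two rectangles overlap exactly when their x-intervals intersect and their y-intervals intersect. Only each rectangle's minimum and maximum coordinates matter, so the corner order is irrelevant, and using `<=` makes shared edges and corners count as overlaps, as the note requires.

```python
def rectangle_overlap(a, b):
    """Overlap test for axis-aligned rectangles given as four corner points each."""
    ax = [x for x, _ in a]
    ay = [y for _, y in a]
    bx = [x for x, _ in b]
    by = [y for _, y in b]
    # The projections onto each axis must intersect; <= lets touching count
    x_overlap = max(min(ax), min(bx)) <= min(max(ax), max(bx))
    y_overlap = max(min(ay), min(by)) <= min(max(ay), max(by))
    return x_overlap and y_overlap


a = [(-3, 5), (-3, 2), (0, 5), (0, 2)]
b = [(-1, 4), (3, 4), (3, 1), (-1, 1)]
print(rectangle_overlap(a, b))  # True
```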

9. You have an array of integers, `nums`, of length `n` containing the values `0` to `n` with exactly one missing. Write a function `missing_number` that returns the missing number in the array.

Note: Your solution must run in O(n) time.

Example:

Input:

nums = [0,1,2,4,5]
missing_number(nums) -> 3
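A sketch using the arithmetic-series formula: the values 0 through n sum to n(n + 1)/2, so subtracting the actual sum of `nums` leaves the missing number. Because the array holds n of the n + 1 possible values, n is simply `len(nums)`. This runs in O(n) time with O(1) extra space.

```python
def missing_number(nums):
    """Return the one value from 0..len(nums) that is absent from nums."""
    n = len(nums)
    # 0 + 1 + ... + n == n * (n + 1) / 2; the shortfall is the missing value
    return n * (n + 1) // 2 - sum(nums)


print(missing_number([0, 1, 2, 4, 5]))  # 3
```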

10. The probability that it will rain tomorrow depends on whether or not it is raining today and whether or not it rained yesterday. Given that it is raining today and that it rained yesterday, write a function `rain_days` to calculate the probability that it will rain on the nth day after today.

Example:

Input:

n=5

Output:

def rain_days(n) -> 0.39968
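The question as reproduced here omits the transition probabilities. The sketch below assumes a 20% chance of rain after two rainy days, 60% after exactly one rainy day, and 20% after two dry days; with these assumed values it reproduces the example output of 0.39968 for n = 5. It tracks a probability distribution over the four (yesterday, today) states, which makes this a small Markov chain, and advances it one day at a time.

```python
def rain_days(n, p_both=0.2, p_one=0.6, p_neither=0.2):
    """Probability of rain on the nth day after today (n >= 1).

    ASSUMED transition probabilities (not stated in the question):
    p_both    - chance of rain if it rained on both of the previous two days
    p_one     - chance of rain if it rained on exactly one of them
    p_neither - chance of rain if it rained on neither
    """
    def p_rain(rained_yesterday, rained_today):
        if rained_yesterday and rained_today:
            return p_both
        if rained_yesterday or rained_today:
            return p_one
        return p_neither

    # Distribution over (rained_yesterday, rained_today); it rained on both days
    states = {(True, True): 1.0, (True, False): 0.0, (False, True): 0.0, (False, False): 0.0}
    p_rain_on_day = 0.0
    for _ in range(n):
        next_states = dict.fromkeys(states, 0.0)
        p_rain_on_day = 0.0
        for (yesterday, today), prob in states.items():
            r = p_rain(yesterday, today)
            # Advance one day: today becomes yesterday, the new day becomes today
            next_states[(today, True)] += prob * r
            next_states[(today, False)] += prob * (1 - r)
            p_rain_on_day += prob * r
        states = next_states
    return p_rain_on_day


print(round(rain_days(5), 5))  # 0.39968 under the assumed probabilities
```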

Data Structures Interview Questions

In addition to algorithms and coding, a grasp of data structure fundamentals, especially trees, lists, and maps, contributes to successful data science projects. Our database contains many data structure interview questions, including:

11. Given two strings `A` and `B`, write a function `can_shift` to return whether or not `A` can be shifted some number of places to get `B`.

Example:

Input:

A = 'abcde'
B = 'cdeab'
can_shift(A, B) == True

A = 'abc'
B = 'acb'
can_shift(A, B) == False
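A standard trick for this problem: if `A` and `B` have the same length, `B` is a shifted (rotated) version of `A` exactly when `B` appears as a substring of `A + A`, because `A + A` contains every rotation of `A`.

```python
def can_shift(A, B):
    """True if B is a cyclic shift (rotation) of A."""
    # Every rotation of A is a substring of A + A; lengths must match too
    return len(A) == len(B) and B in A + A


print(can_shift("abcde", "cdeab"))  # True
print(can_shift("abc", "acb"))      # False
```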

12. Build a random forest model from scratch with the following conditions:

The model takes as input a dataframe `data` and an array `new_point` whose length equals the number of feature columns in `data`.
All values of both `data` and `new_point` are 0 or 1, i.e., all fields are dummy variables and there are only two classes.
Rather than randomly choosing a feature subspace for each tree, as a standard random forest does, build one decision tree for every permutation of the feature columns. Each tree splits the data according to the value `new_point` takes in that column.
Return the majority vote on the class of `new_point`.
You may use pandas and NumPy but NOT scikit-learn.

Bonus: The `permutations` function in the `itertools` module makes it easy to generate every ordering of an iterable.

Example:

Input:

new_point = [0,1,0,1]
print(data)
...
    Var1  Var2  Var3  Var4  Target
0    1.0   1.0   1.0   0.0       1
1    0.0   0.0   0.0   0.0       0
2    1.0   0.0   1.0   0.0       0
3    0.0   1.0   1.0   1.0       1
4    1.0   0.0   1.0   0.0       0
..   ...   ...   ...   ...     ...
95   0.0   1.0   0.0   1.0       0
96   1.0   1.0   0.0   0.0       0
97   0.0   0.0   1.0   1.0       0
98   1.0   0.0   0.0   0.0       0
99   0.0   1.0   0.0   0.0       0

[100 rows x 5 columns]

Output:

def random_forest(new_point, data) -> 0
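The rules leave the tree-building details open, so the sketch below rests on assumptions: the class column is named `Target` (as in the example), each tree splits on the feature columns in its permutation's order, and a tree stops early when its node is pure or when the next split would leave no rows (without early stopping, every permutation would end at the same set of rows). Each leaf predicts its majority class, and the forest returns the most common prediction.

```python
from collections import Counter
from itertools import permutations

import pandas as pd


def random_forest(new_point, data, target="Target"):
    """Majority vote over one tree per ordering of the feature columns.

    Assumed tree rule: split on each column in order, keeping only rows that
    match new_point in that column; stop early if the node is pure or if the
    split would leave no rows. The leaf predicts its majority class.
    """
    features = [col for col in data.columns if col != target]
    point = dict(zip(features, new_point))
    votes = []
    for order in permutations(features):
        node = data
        for col in order:
            if node[target].nunique() <= 1:
                break  # pure node: nothing left to split
            child = node[node[col] == point[col]]
            if child.empty:
                break  # no matching rows; predict from the current node
            node = child
        votes.append(node[target].mode().iloc[0])  # ties go to the smaller class
    return Counter(votes).most_common(1)[0][0]


toy = pd.DataFrame({"Var1": [1, 1, 0, 0], "Var2": [1, 0, 1, 0], "Target": [1, 1, 0, 0]})
print(random_forest([1, 0], toy))  # 1
```

The number of trees grows factorially with the number of feature columns (24 trees for the four columns in the example), so this design only suits small feature sets.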

13. You are given a list of tuples, where the first element of each tuple is the slope of a line and the second is its y-intercept. Write a function `find_intersecting` that returns the lines, if any, that intersect at least one of the other lines within the given `x_range`.

Example:

Input:

tuple_list = [(2, 3), (-3, 5), (4, 6), (5, 7)]
x_range = (0, 1)

Output:

def find_intersecting(tuple_list, x_range) ->  [(2,3), (-3,5)]
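A straightforward O(n²) sketch: for every pair of lines, solve m₁x + b₁ = m₂x + b₂ for x = (b₂ − b₁)/(m₁ − m₂) and check whether it falls inside `x_range`. Two assumptions here are not settled by the question: the range endpoints are inclusive, and identical lines count as intersecting while distinct parallel lines do not. Lines are returned in their original input order.

```python
from itertools import combinations


def find_intersecting(tuple_list, x_range):
    """Return the (slope, intercept) lines that cross another line within x_range."""
    lo, hi = min(x_range), max(x_range)
    hits = set()
    for (i, (m1, b1)), (j, (m2, b2)) in combinations(enumerate(tuple_list), 2):
        if m1 == m2:
            if b1 == b2:  # identical lines coincide everywhere
                hits.update((i, j))
            continue  # distinct parallel lines never meet
        x = (b2 - b1) / (m1 - m2)  # solve m1*x + b1 == m2*x + b2
        if lo <= x <= hi:
            hits.update((i, j))
    return [line for idx, line in enumerate(tuple_list) if idx in hits]


print(find_intersecting([(2, 3), (-3, 5), (4, 6), (5, 7)], (0, 1)))  # [(2, 3), (-3, 5)]
```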

14. Build a k-nearest neighbors classification model from scratch with the following conditions:

Use Euclidean distance (a.k.a. the “2-norm”) as your closeness metric.
Your function should be able to handle data frames of many arbitrary rows and columns.
If there is a tie in the class of the k-nearest neighbors, rerun the search using k-1 neighbors instead.
You may use pandas and NumPy but NOT scikit-learn.

Example:

Input:

k = 5
new_point = [0.5,-2,8]
print(data)
...
        Var1      Var2      Var3  Target
0  -3.279536  3.362223  2.847892       2
1  -0.791565  1.742475  2.151587       2
2  -0.785992 -0.938681 -0.459770       0
3  -1.068190  1.461051  0.127130       3
4  -0.367568 -0.870240 -0.225734       0
..       ...       ...       ...     ...
95 -1.327175  1.971085 -0.690689       2
96 -3.203714  1.847649  0.778901       2
97 -0.587640  0.647458  2.094385       2
98  0.363644 -0.509795  2.514191       1
99 -0.673498  2.955285  2.102122       4

[100 rows x 4 columns]

Output:

def kNN(k, new_point, data) -> 2
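A minimal NumPy-based sketch, assuming (as in the example) that the class column is named `Target` and every other column is a numeric feature. It computes all Euclidean distances at once with `np.linalg.norm`, sorts the rows by distance, and applies the tie rule from the question by retrying with one fewer neighbor; k = 1 can never tie, so the loop always terminates for k ≥ 1.

```python
from collections import Counter

import numpy as np
import pandas as pd


def kNN(k, new_point, data, target="Target"):
    """Classify new_point by majority vote among its k nearest rows of data."""
    features = data.drop(columns=[target]).to_numpy(dtype=float)
    labels = data[target].to_numpy()
    # Euclidean (L2) distance from new_point to every row
    dists = np.linalg.norm(features - np.asarray(new_point, dtype=float), axis=1)
    order = np.argsort(dists, kind="stable")
    while k > 0:
        counts = Counter(labels[order[:k]]).most_common()
        # A unique winner exists if one class appears or the top count is strictly largest
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        k -= 1  # tie: rerun with k - 1 neighbors


toy = pd.DataFrame({"Var1": [0, 0, 10, 10, 0.5], "Var2": [0, 1, 10, 11, 0.5], "Target": [0, 0, 1, 1, 0]})
print(kNN(3, [0, 0], toy))  # 0
```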

15. Given a dictionary whose keys are letters and whose values are lists of letters, write a function `closest_key` that returns the key whose list contains the input value closest to the beginning of the list.

Example:

Input:

dictionary = {
    'a' : ['b','c','e'],
    'm' : ['c','e'],
}
input = 'c'

Output:

closest_key(dictionary, input) -> 'm'

'c' is at index 1 in the list for 'a' and at index 0 in the list for 'm'. Hence, the closest key for 'c' is 'm'.
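A simple linear-scan sketch. The parameter is named `value` rather than `input` to avoid shadowing Python's built-in `input`; on a tie the first key encountered wins, and the function returns `None` when no list contains the value. Both behaviors are assumptions, since the question does not specify them.

```python
def closest_key(dictionary, value):
    """Return the key whose list contains value at the smallest index (None if absent)."""
    best_key, best_index = None, None
    for key, letters in dictionary.items():
        if value in letters:
            index = letters.index(value)
            if best_index is None or index < best_index:
                best_key, best_index = key, index
    return best_key


print(closest_key({"a": ["b", "c", "e"], "m": ["c", "e"]}, "c"))  # m
```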

NumPy Data Science Coding Interview Questions

NumPy is a fundamental Python library for scientific computing that provides high-performance multidimensional array objects and tools for working with them. For mathematical calculations on large datasets, NumPy arrays are faster and more memory-efficient than Python’s built-in lists. We have an extensive list of NumPy interview questions, some of which are discussed here:

16. How can you initialize a three-dimensional array in NumPy? Give an example.
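A few common ways to create a three-dimensional array: from a shape tuple, by reshaping a 1-D range, or from nested Python lists, where the nesting depth determines the number of dimensions.

```python
import numpy as np

# From a shape tuple: 2 blocks x 3 rows x 4 columns, filled with 0.0
zeros = np.zeros((2, 3, 4))

# From a 1-D range reshaped into three dimensions (values 0..23)
counting = np.arange(24).reshape(2, 3, 4)

# From nested Python lists; three levels of nesting give three dimensions
literal = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(zeros.shape, counting.shape, literal.shape)  # (2, 3, 4) (2, 3, 4) (2, 2, 2)
```

`np.ones`, `np.full`, and `np.empty` accept the same kind of shape tuple as `np.zeros`.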

17. How can we reverse a NumPy array? Give an example.
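Two idiomatic options: slicing with a step of -1, or `np.flip`, which also accepts an `axis` argument for multidimensional arrays. Both return views rather than copies.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
reversed_view = arr[::-1]   # slice with step -1
flipped = np.flip(arr)      # same result for a 1-D array

matrix = np.arange(6).reshape(2, 3)
rows_reversed = np.flip(matrix, axis=0)  # reverse the order of the rows only

print(reversed_view)   # [5 4 3 2 1]
print(rows_reversed)   # [[3 4 5]
                       #  [0 1 2]]
```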

18. How can we reshape a NumPy array? Give an example.
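`reshape` returns an array with the same data in a new shape, as long as the total number of elements is unchanged; passing `-1` for one dimension lets NumPy infer it.

```python
import numpy as np

arr = np.arange(6)            # shape (6,)
grid = arr.reshape(2, 3)      # shape (2, 3)
column = arr.reshape(-1, 1)   # -1 is inferred as 6, giving shape (6, 1)
flat = grid.ravel()           # back to 1-D, shape (6,)

print(grid)  # [[0 1 2]
             #  [3 4 5]]
```

Asking for an incompatible shape, such as `arr.reshape(4, 2)` for six elements, raises a `ValueError`.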

19. Given a list of integers, write a function `gcd` to find the greatest common divisor of all of them.

Example:

Input:

int_list = [8, 16, 24]

Output:

def gcd(int_list) -> 8
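A sketch that implements Euclid's algorithm for a pair of numbers and folds it across the list with `functools.reduce`, relying on gcd(a, b, c) = gcd(gcd(a, b), c). The helper name `_gcd_pair` is illustrative.

```python
from functools import reduce


def _gcd_pair(a, b):
    """Euclid's algorithm: gcd(a, b) == gcd(b, a % b) until b is 0."""
    while b:
        a, b = b, a % b
    return abs(a)


def gcd(int_list):
    """Greatest common divisor of every integer in int_list."""
    return reduce(_gcd_pair, int_list)


print(gcd([8, 16, 24]))  # 8
```

In NumPy, `np.gcd.reduce(np.array(int_list))` computes the same result using the `np.gcd` ufunc.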

20. Given a NumPy array of integers and an integer called `num`, remove all elements with an instance lower than `num`. Minimize reliance on Python loops and use NumPy functions wherever possible.
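The phrase "an instance lower than `num`" can be read two ways, so the sketch below covers both with hypothetical function names: dropping elements whose value is below `num`, and dropping elements that occur fewer than `num` times. Both use vectorized boolean masks instead of Python loops; the second maps each element to its count via `np.unique(..., return_inverse=True, return_counts=True)`.

```python
import numpy as np


def remove_below_value(arr, num):
    """Reading 1: keep only elements whose value is at least num."""
    return arr[arr >= num]


def remove_rare(arr, num):
    """Reading 2: keep only elements that occur at least num times."""
    _, inverse, counts = np.unique(arr, return_inverse=True, return_counts=True)
    # counts[inverse] gives, for each position, how often its value occurs
    return arr[counts[inverse] >= num]


arr = np.array([5, 1, 1, 7])
print(remove_below_value(arr, 2))  # [5 7]
print(remove_rare(arr, 2))         # [1 1]
```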

Machine Learning Data Science Coding Interview Questions

Machine learning helps data scientists extract information faster and supports trend analysis. How much you build or “code” ML models depends on the company and your role, and data scientists are generally not expected to approach machine learning interview questions from a strict software-development standpoint. However, you may be expected to answer algorithm coding questions, such as:

21. Write a function `search_list` that returns a Boolean indicating whether the `target` value is in `linked_list`.

You receive the head of the linked list, which is a dictionary with two keys: `value` (the value of the node) and `next` (the next node in the list, or `None`).

If the linked list is empty, you’ll receive None since there is no head node for an empty list.

Example:

Input:

target = 2
linked_list = 3 -> 2 -> 5 -> 6 -> 8 -> None

Output:

search_list(target, linked_list) -> True
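A minimal sketch that walks the dictionary-based list by following each node's `next` key. An empty list arrives as `None`, so the loop body never runs and the function returns `False`.

```python
def search_list(target, linked_list):
    """Return True if target appears in the dict-based linked list."""
    node = linked_list  # head node, or None for an empty list
    while node is not None:
        if node["value"] == target:
            return True
        node = node["next"]
    return False


# Build 3 -> 2 -> 5 -> 6 -> 8 -> None by prepending from the tail
example = None
for value in reversed([3, 2, 5, 6, 8]):
    example = {"value": value, "next": example}

print(search_list(2, example))  # True
```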

22. Given two strings `A` and `B`, write a function `can_shift` to return whether or not `A` can be shifted some number of places to get `B`.

Example:

Input:

A = 'abcde'
B = 'cdeab'
can_shift(A, B) == True

A = 'abc'
B = 'acb'
can_shift(A, B) == False

23. You’re given two words, `begin_word` and `end_word`, which are elements of `word_list`.

Write a function `shortest_transformation` to find the length of the shortest transformation sequence from `begin_word` to `end_word` through the elements of `word_list`.

Note: Only one letter can be changed at a time, and each intermediate word must exist in `word_list`.

Note: In all test cases, a path exists between `begin_word` and `end_word`.

Example:

Input:

begin_word = "same"
end_word = "cost"
word_list = ["same","came","case","cast","lost","last","cost"]

Output:

def shortest_transformation(begin_word, end_word, word_list) -> 5

The shortest transformation sequence is:

'same' -> 'came' -> 'case' -> 'cast' -> 'cost'

which is five words long.
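This is a shortest-path problem on an implicit graph whose nodes are words and whose edges connect words differing in one letter, so breadth-first search finds the minimum length. The sketch assumes words use lowercase ASCII letters, and it returns 0 if no path exists even though the note guarantees one.

```python
import string
from collections import deque


def shortest_transformation(begin_word, end_word, word_list):
    """Length (in words) of the shortest one-letter-at-a-time path, via BFS."""
    words = set(word_list)
    queue = deque([(begin_word, 1)])
    visited = {begin_word}
    while queue:
        word, length = queue.popleft()
        if word == end_word:
            return length
        # Try every single-letter substitution (assumes lowercase a-z words)
        for i in range(len(word)):
            for ch in string.ascii_lowercase:
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in words and candidate not in visited:
                    visited.add(candidate)
                    queue.append((candidate, length + 1))
    return 0  # no path found


words = ["same", "came", "case", "cast", "lost", "last", "cost"]
print(shortest_transformation("same", "cost", words))  # 5
```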

24. Given two strings, `string1` and `string2`, write a function `max_substring` to return the longest subsequence shared by both strings. As the example shows, the characters of the result must appear in the same order in both strings but need not be contiguous.

Example:

Input:

string1 = 'mississippi'

string2 = 'mossyistheapple'

Output:

def max_substring(string1, string2) -> 'mssispp'

Note: If there are multiple maximal subsequences with the same length, just return any one of them.
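The expected output 'mssispp' is not contiguous in 'mississippi', so it is a longest common subsequence (LCS) rather than a contiguous substring. The sketch below uses the classic LCS dynamic program, where `dp[i][j]` holds the LCS length of the suffixes `string1[i:]` and `string2[j:]`, then walks the table forward to rebuild one optimal answer. It runs in O(mn) time and space.

```python
def max_substring(string1, string2):
    """Return one longest common subsequence of the two strings."""
    m, n = len(string1), len(string2)
    # dp[i][j] = LCS length of string1[i:] and string2[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if string1[i] == string2[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward, taking matches and otherwise moving toward the larger subproblem
    i = j = 0
    out = []
    while i < m and j < n:
        if string1[i] == string2[j]:
            out.append(string1[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(max_substring("mississippi", "mossyistheapple"))  # mssispp
```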

25. Given a sorted list of integers `ints` with no duplicates, write an efficient function `nearest_entries` that takes in integers `N` and `k`.

The function should:

Find the element of the list closest to `N`.
Return that element along with the `k` elements before it and the `k` elements after it.
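Because the list is sorted, binary search (via Python's `bisect` module) locates the closest element in O(log n), and slicing collects its neighbors, for O(log n + k) overall. The signature `nearest_entries(ints, N, k)` is an assumption, as are two edge-case choices: on a tie the smaller element wins, and near either end of the list fewer than `k` neighbors are returned on that side.

```python
from bisect import bisect_left


def nearest_entries(ints, N, k):
    """Return the element of sorted ints closest to N plus up to k neighbors on each side."""
    if not ints:
        return []
    pos = bisect_left(ints, N)  # first index whose value is >= N
    # The closest element is at pos or pos - 1; on a tie, prefer the smaller value
    if pos == len(ints) or (pos > 0 and N - ints[pos - 1] <= ints[pos] - N):
        pos -= 1
    # max() clips the slice at the start; Python slicing clips at the end
    return ints[max(0, pos - k): pos + k + 1]


print(nearest_entries([1, 4, 6, 10, 15], 9, 1))  # [6, 10, 15]
```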

26. You’ve been asked to build a machine learning model that maps a person’s legal first name to the nicknames they are likely to have. How would you go about designing this model?

27. Design a machine learning model that, given a set of health features, predicts whether an individual will experience major health issues.

Tips to Ace Data Science Coding Interview Questions

To excel in data science coding interviews, build a strong foundation in data structures and algorithms.

Practice coding regularly on our platform and use our AI Interviewer feature. Understand the trade-offs between different approaches and articulate your thought process clearly, especially in ML coding questions.

Emphasize code readability and efficiency, and think through test cases.

Additionally, get comfortable with Python libraries like NumPy, pandas, and scikit-learn for efficient data manipulation and modeling. All the best!
