In machine learning interviews, Python questions come up very often, along with algorithm questions and general machine learning and modeling questions.
Typically, these questions test your algorithmic coding ability in a machine learning context: you may be asked to manipulate data with pandas, write machine learning algorithms from scratch, or answer basic Python questions related to machine learning.
These questions assess your fundamental knowledge of Python’s use in machine learning, as well as practical Python coding skills:
Basic Python Machine Learning Questions - Basic questions test your knowledge of fundamental concepts in machine learning, as well as Python’s most basic uses in model building.
Python Pandas Machine Learning Questions - Pandas is a data analysis library in Python. In machine learning, pandas is commonly used for data manipulation, cleaning, and preparation.
Machine Learning Algorithms From Scratch - These questions ask you to write classic algorithms from scratch, typically without the use of Python packages.
Basic questions in machine learning Python interviews tend to be definition-based and are asked to quickly gauge the depth of your ML and algorithmic coding knowledge.
Some of the most common topics include data structures, general machine-learning algorithm questions, comparisons of algorithms, and how Python techniques are used in machine learning.
Pre-processing techniques are used to prepare data in Python, and there are many different techniques you can use. Some common ones you might talk about include:
Brute force algorithms try every possibility to find a solution. For example, if you were trying to crack a 3-digit PIN code, brute force would require you to test every combination from 000 to 999.
One common brute force algorithm is linear search, which traverses an array element by element to check for a match. One disadvantage of brute force algorithms is that they can be inefficient, and it’s usually difficult to improve their performance without abandoning the exhaustive approach altogether.
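To make the idea concrete, here is a minimal sketch of both examples from the text: a linear search, and an exhaustive search over every 3-digit PIN. The `check` callback that reports whether a guess is correct is a hypothetical stand-in for whatever system the PIN unlocks.

```python
from itertools import product


def linear_search(items, target):
    """Return the index of target in items, or -1 if it is absent (brute force)."""
    for i, item in enumerate(items):  # check every element in order
        if item == target:
            return i
    return -1


def crack_pin(check):
    """Try every 3-digit PIN from "000" to "999" until check(pin) is True.

    check is a hypothetical callback that returns True for the correct PIN.
    """
    for digits in product("0123456789", repeat=3):  # "000", "001", ..., "999"
        pin = "".join(digits)
        if check(pin):
            return pin
    return None


print(linear_search([4, 8, 15, 16, 23, 42], 16))  # 3
print(crack_pin(lambda pin: pin == "407"))  # 407
```

In the worst case, crack_pin makes all 1,000 guesses, which is exactly why brute force scales poorly as the PIN gets longer.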
An imbalanced dataset has skewed class proportions in a classification problem. Some of the ways to handle this include:
There are two common strategies: omission and imputation. Omission refers to removing rows or columns with missing values, while imputation refers to filling in missing observations with estimated values.
There are some helpful classes in Scikit-learn that you can use for imputation. One is SimpleImputer, which fills missing values with a constant (such as zero) or with the column’s mean, median, or most frequent value (the mode). Another is IterativeImputer, which models each feature with missing values as a function of the other features.
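A minimal sketch of both strategies on a small, made-up dataframe (the column names and values are illustrative assumptions): pandas’ dropna performs omission, and SimpleImputer with strategy="median" performs imputation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; NaN marks a missing value
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [50.0, 62.0, np.nan, 58.0],
})

# Omission: drop every row that contains at least one missing value
omitted = df.dropna()

# Imputation: replace each NaN with the median of its column
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(omitted)
print(imputed)
```

Note that IterativeImputer is still marked experimental in current Scikit-learn releases, so it has to be enabled with `from sklearn.experimental import enable_iterative_imputer` before it can be imported from sklearn.impute.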
Regression is a supervised machine learning technique. It’s primarily used to model relationships between variables and to predict the value of a dependent variable. Regression algorithms are generally used for prediction, forecasting, and time-series modeling; with careful study design, they can also help estimate causal relationships, although a fitted regression alone does not establish causation.
Most of these algorithms, like linear regression or logistic regression, can be implemented with Scikit-learn in Python.
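As a minimal sketch, here is Scikit-learn’s LinearRegression fitted to made-up, noise-free data drawn from y = 2x + 1, so the recovered coefficient and intercept are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from y = 2x + 1 with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # features must be 2-D
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # approximately [2.] and 1.0
print(model.predict(np.array([[5.0]])))  # approximately [11.]
```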
In Python, you can split a dataset with Scikit-learn’s train_test_split function (in sklearn.model_selection), which splits arrays or matrices into random training and testing subsets.
Generally, about 75% of the data goes to the training set, which is also train_test_split’s default when you don’t specify a size; however, you will likely experiment with different splits.
Here’s a code example:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.4)
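The one-line example above assumes a `data` object with `.data` and `.target` attributes already exists. A self-contained sketch that uses Scikit-learn’s built-in Iris dataset (chosen here only for convenience) and a 75/25 split could look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris (150 samples, 4 features) is used only as a convenient built-in dataset
data = load_iris()

# test_size=0.25 gives a 75/25 split; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```

For classification problems with imbalanced classes, passing stratify=data.target keeps the class proportions similar in both subsets.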
Some of the most common ones you could mention include:
The two most commonly used methods are grid search and random search. Grid search defines a grid of candidate values for each hyperparameter and then exhaustively evaluates every combination in that grid to find the best one.
Random search samples hyperparameter combinations at random from a (typically wide) range of values. With random search, you specify the number of iterations, i.e., how many combinations to try, which you do not do in grid search because it always tries them all.
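The following sketch contrasts the two approaches using Scikit-learn’s GridSearchCV and RandomizedSearchCV. The k-nearest-neighbors model, the Iris dataset, and the candidate values are illustrative choices, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier()

# Grid search: every value in the grid is evaluated (3 candidates here)
grid = GridSearchCV(model, {"n_neighbors": [1, 3, 5]}, cv=5)
grid.fit(X, y)

# Random search: only n_iter candidates are sampled from the search space
rand = RandomizedSearchCV(
    model, {"n_neighbors": list(range(1, 31))}, n_iter=5, cv=5, random_state=0
)
rand.fit(X, y)

print(grid.best_params_, len(grid.cv_results_["params"]))  # 3 candidates tried
print(rand.best_params_, len(rand.cv_results_["params"]))  # 5 candidates tried
```

Grid search tries all 3 grid values, whereas random search evaluates exactly n_iter=5 of the 30 candidates, which is why it scales better to large search spaces.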
In machine learning interviews, Python pandas questions are among the most common coding challenges. These questions test your ability to manipulate and prepare data for machine learning models, and they cover techniques like normalization, imputation, and working with dataframes.
This type of Python question has a correct answer, and you will be required to write code in a shared editor or on a whiteboard. Here are some Python Pandas machine learning interview questions:
More context: You’re given a dataframe, df_rain, containing rainfall data. The dataframe has two columns: day of the week and rainfall in inches.
With this question, there are two key steps:
This question requires you to use two built-in pandas methods:
dataframe.column.median()
This method returns the median of a column in a dataframe.
dataframe.column.fillna(value)
This method replaces every NaN value in the given column with value. It returns a new Series, so assign the result back to the column.
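Putting the two methods together, a minimal sketch might look like the following. The column names day and rainfall are assumptions, since the exact names in df_rain aren’t given; attribute access such as df_rain.rainfall also only works when the column name is a valid Python identifier.

```python
import numpy as np
import pandas as pd


def fill_rainfall_with_median(df_rain):
    """Replace missing rainfall values with the column median.

    Assumes a hypothetical column name "rainfall" for the inches column.
    """
    df_rain = df_rain.copy()  # avoid mutating the caller's dataframe
    median = df_rain.rainfall.median()  # median() skips NaN by default
    df_rain["rainfall"] = df_rain.rainfall.fillna(median)
    return df_rain


# Hypothetical example data
df_rain = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu"],
    "rainfall": [1.0, np.nan, 3.0, 2.0],
})
print(fill_rainfall_with_median(df_rain))  # Tue's rainfall becomes 2.0
```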
This easy Python question deals with preprocessing. In it, you’re provided with a sorted list of positive integers in which some entries are None.
Here’s the solution code for this problem:
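def fill_none(input_list):
    # Replace each None with the most recent non-None value (0 if none seen yet)
    prev_value = 0
    result = []
    for value in input_list:  # iterate over the function's argument
        if value is None:
            result.append(prev_value)
        else:
            result.append(value)
            prev_value = value
    return result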
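If pandas is allowed, the same forward-fill logic can be sketched with Series.ffill. This version assumes the same behavior as the loop above, where a leading None becomes 0.

```python
import pandas as pd


def fill_none_pandas(input_list):
    """Forward-fill None entries; leading None values become 0."""
    series = pd.Series(input_list, dtype="float64")  # None becomes NaN
    # ffill copies the last valid value forward; fillna(0) handles leading gaps
    return series.ffill().fillna(0).astype(int).tolist()


print(fill_none_pandas([None, 1, None, 3, None]))  # [0, 1, 1, 3, 3]
```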
This question requires us to filter a dataframe by two conditions: first, the student’s grade, and second, their favorite color.
Start by filtering by grade, since it’s a bit simpler than filtering by strings. We can filter rows in pandas by reassigning the dataframe to itself with a boolean filter applied.
In this case:
df_students = df_students[df_students["grade"] > 90]
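The second condition can be combined with the first in a single boolean mask. In this sketch, the column name favorite_color and the target colors green and red are hypothetical, since the exact colors the question asks for aren’t stated here.

```python
import pandas as pd

# Hypothetical data; "favorite_color" and the target colors are assumptions
df_students = pd.DataFrame({
    "name": ["Ann", "Bo", "Cy", "Di"],
    "grade": [95, 88, 92, 97],
    "favorite_color": ["green", "red", "blue", "red"],
})

# Combine both conditions with & and wrap each one in parentheses
target_colors = ["green", "red"]
mask = (df_students["grade"] > 90) & (df_students["favorite_color"].isin(target_colors))
result = df_students[mask]
print(result)  # keeps Ann and Di
```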
This Python question has been asked in Facebook machine-learning interviews.
More context: You are given a dataframe with a single column, var. You do not have to calculate the p-value of the test or run the test.
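The question text is incomplete here, so the following is only a sketch under an assumption: that the test in question is a one-sample t-test of whether the mean of var equals a hypothesized value mu0 (a hypothetical parameter). Under that assumption, the t-statistic is the sample mean minus mu0, divided by the standard error s / sqrt(n). Note that the column must be accessed as df["var"], because df.var refers to the DataFrame.var method.

```python
import pandas as pd


def t_statistic(df, mu0=0.0):
    """One-sample t-statistic for the mean of df["var"] against mu0.

    Assumes the question asks for a one-sample t-test; mu0 is a hypothetical
    parameter for the hypothesized mean.
    """
    n = df["var"].count()  # number of non-missing observations
    mean = df["var"].mean()
    std = df["var"].std()  # sample standard deviation (ddof=1 by default)
    return (mean - mu0) / (std / n ** 0.5)


df = pd.DataFrame({"var": [2.0, 4.0, 6.0]})
print(t_statistic(df, mu0=2.0))  # about 1.732
```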
Problems that ask you to write an algorithm from scratch are increasingly common in machine learning and computer vision interviews. The algorithms you are asked to write are similar to those implemented in Scikit-learn.
In general, this type of question tests your familiarity with an algorithm, as well as your ability to code a bug-free version efficiently. Most importantly, building an algorithm from scratch tests your grasp of the underlying ML concepts. In other words, you can’t just write rfr = RandomForest(x,y).
Although this can seem intimidating, remember two things. 1) Interviewers don’t want the most optimized version of an algorithm. Instead, they want the most “vanilla” version that shows you understand the basics.
And 2) you don’t have to study every algorithm. Only a few fit the format of an hour-long on-site interview, as many are too complicated to break down in such a short timeframe.
These are the algorithms you should study for machine learning Python interviews:
You can practice with these sample interview questions that ask you to write machine learning algorithms from scratch:
Example Output:
kNN(k, data, new_point) -> 2
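A minimal “vanilla” kNN sketch with NumPy and pandas follows. The data layout (numeric feature columns plus a label column) and the example values are assumptions for illustration; distances are Euclidean, and ties in the vote go to the label that appears first among the nearest neighbors.

```python
from collections import Counter

import numpy as np
import pandas as pd


def kNN(k, data, new_point):
    """Predict the class of new_point by majority vote of its k nearest rows.

    Assumes a hypothetical layout: numeric feature columns plus a "label" column.
    """
    features = data.drop(columns="label").to_numpy(dtype=float)
    labels = data["label"].to_numpy()
    point = np.asarray(new_point, dtype=float)
    # Euclidean distance from new_point to every row of the dataframe
    distances = np.sqrt(((features - point) ** 2).sum(axis=1))
    # Indices of the k closest rows (stable sort keeps row order on ties)
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote; on a tie, Counter keeps the label encountered first
    votes = Counter(labels[nearest])
    return votes.most_common(1)[0][0]


# Hypothetical example data with two classes
data = pd.DataFrame({
    "x": [0.0, 1.0, 0.0, 5.0, 6.0, 5.0],
    "y": [0.0, 0.0, 1.0, 5.0, 5.0, 6.0],
    "label": [1, 1, 1, 2, 2, 2],
})
print(kNN(3, data, [5.5, 5.5]))  # 2
```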
The model should have these conditions:
new_point with a length equal to the number of fields in the df.
new_point are 0 or 1, i.e., all fields are dummy variables, and there are only two classes.
new_point for that column.
The model should have these conditions:
The model should have these conditions:
Example:
After clustering the points with two clusters, the points will be clustered as follows.
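The figure for this example isn’t reproduced here, but a basic k-means (Lloyd’s algorithm) sketch in NumPy shows the two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The random initialization, seed, and convergence check are illustrative choices, and the example points are hypothetical.

```python
import numpy as np


def k_means(points, k, n_iter=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm (basic k-means)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct points at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical example: two well-separated groups of points
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = k_means(points, k=2)
print(labels)
```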
Note: There could be an infinite number of separating lines in this example.
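The note implies a linearly separable dataset. Assuming the task is to find one such separating line, a perceptron is one simple from-scratch option; this sketch is an assumption about the intended approach, not the question’s reference solution, and it returns just one of the many valid lines. The labels are assumed to be -1 or +1, and the example points are hypothetical.

```python
import numpy as np


def perceptron(X, y, n_epochs=100, lr=1.0):
    """Find a separating line w . x + b = 0 for labels y assumed to be -1 or +1.

    Converges only if the data are linearly separable; it returns one of the
    many possible separating lines, not a unique answer.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or exactly on the line
                w += lr * yi * xi  # nudge the line toward the correct side
                b += lr * yi
                errors += 1
        if errors == 0:
            break  # every point is on the correct side of the line
    return w, b


# Hypothetical linearly separable data
X = [[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = perceptron(X, y)
print(w, b)
```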