Interview Query
Complete Pandas Cheat Sheet for 2025

Overview

pandas is an open-source Python library for manipulating and analyzing structured (tabular) data. It is useful for a wide range of professionals, including data scientists, financial analysts, and anyone who needs to organize and analyze data efficiently.

This cheat sheet provides a detailed overview of essential pandas operations and functions.

How to Import pandas

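To start using pandas, import it under its conventional alias, pd. A few later examples also use NumPy (np.nan and np.random), so import that as well:

import pandas as pd
import numpy as np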

How to Create DataFrames

DataFrames are the primary data structure in pandas. Here are different ways to create them:

From a Dictionary

# Creating a simple employee database
employee_data = {
    'Name': ['Alice Wonder', 'Bob Builder', 'Charlie Chaplin', 'Diana Prince'],
    'Department': ['IT', 'HR', 'Marketing', 'Finance'],
    'Salary': [75000, 65000, 78000, 82000],
    'Years of Experience': [3, 5, 2, 7]
}

df_employees = pd.DataFrame(employee_data)
print(df_employees)

This creates a neat table of employee information. You can almost hear HR sighing with relief!

From a List of Dictionaries

# Creating a product inventory
products = [
    {'name': 'Laptop', 'price': 1200, 'stock': 50},
    {'name': 'Mouse', 'price': 25, 'stock': 100},
    {'name': 'Keyboard', 'price': 50, 'stock': 75},
    {'name': 'Monitor', 'price': 200, 'stock': 30}
]

df_inventory = pd.DataFrame(products)
print(df_inventory)

Perfect for when you have a list of similar items, like products in an inventory.

From a CSV File

df = pd.read_csv('filename.csv')

This is how you’d typically load real-world data. It’s like opening a treasure chest of information!
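If you want to try read_csv without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer instead. The CSV content is made up for illustration; usecols and dtype are just two of the options read_csv offers for controlling what gets loaded.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'
csv_text = """name,price,stock
Laptop,1200,50
Mouse,25,100
Keyboard,50,75
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['name', 'price'],   # load only the columns you need
    dtype={'price': 'float64'},  # set a column type explicitly
)
print(df)
print(df.dtypes)
```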

Basic DataFrame Operations

Here are some basic operations you can perform on DataFrames:

How to View the Data

print(df_employees.head(3))  # First 3 rows
print(df_employees.tail(2))  # Last 2 rows
df_employees.info()          # DataFrame info (prints directly, so no print() needed)
print(df_employees.describe())  # Summary statistics

These commands give you a quick overview of your data.

How to Select Columns

# Single column
salaries = df_employees['Salary']
print(salaries)

# Multiple columns
name_and_dept = df_employees[['Name', 'Department']]
print(name_and_dept)

This is how you slice and dice your data. Want just the names? The salaries? You got it!
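One detail worth seeing in action: single brackets return a Series, while a list of column names returns a DataFrame, even when the list has only one name. A small sketch, using a cut-down version of the employee table:

```python
import pandas as pd

df_employees = pd.DataFrame({
    'Name': ['Alice Wonder', 'Bob Builder'],
    'Salary': [75000, 65000],
})

as_series = df_employees['Salary']    # single brackets -> Series
as_frame = df_employees[['Salary']]   # list of names -> DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The difference matters when a later step expects one type or the other, for example when passing the result to a function that needs a two-dimensional table.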

How to Add/Remove Columns

# Adding a new column
df_employees['Bonus'] = df_employees['Salary'] * 0.1
print(df_employees)

# Removing a column
df_employees_no_exp = df_employees.drop('Years of Experience', axis=1)
print(df_employees_no_exp)

Columns come and go, but the DataFrame remains. It’s data flexibility at its finest!
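Note that drop returns a new DataFrame and leaves the original alone. If you prefer the same non-mutating style for adding columns, assign is one option. A minimal sketch with a tiny stand-in table:

```python
import pandas as pd

df = pd.DataFrame({'Salary': [75000, 65000], 'Years of Experience': [3, 5]})

# assign() returns a new DataFrame with the extra column; df is untouched
with_bonus = df.assign(Bonus=df['Salary'] * 0.1)

# drop() also returns a new DataFrame; columns=[...] is equivalent to axis=1
without_exp = with_bonus.drop(columns=['Years of Experience'])

print(list(df.columns))
print(list(with_bonus.columns))
print(list(without_exp.columns))
```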

How to Filter Data

Here's where pandas really shines: it lets you pick out exactly the rows you want.

Using Boolean Indexing

# Employees with salary > 70000
high_earners = df_employees[df_employees['Salary'] > 70000]
print(high_earners)

# Employees in IT department with more than 2 years experience
experienced_it = df_employees[(df_employees['Department'] == 'IT') & (df_employees['Years of Experience'] > 2)]
print(experienced_it)

This allows you to select data based on specific conditions.
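Beyond & ("and"), a few other building blocks come up often: | for "or", ~ for "not", isin for membership tests, and query for writing the filter as a string. The sketch below uses a simplified copy of the employee table; the names and numbers are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Department': ['IT', 'HR', 'Marketing', 'Finance'],
    'Salary': [75000, 65000, 78000, 82000],
})

# | is "or", ~ is "not"; wrap each condition in parentheses
it_or_high = df[(df['Department'] == 'IT') | (df['Salary'] > 80000)]
not_hr = df[~(df['Department'] == 'HR')]

# isin() tests membership in a list of values
it_or_finance = df[df['Department'].isin(['IT', 'Finance'])]

# query() expresses the same kind of filter as a string
mid_range = df.query('70000 < Salary < 80000')

print(it_or_high['Name'].tolist())
print(not_hr['Name'].tolist())
print(it_or_finance['Name'].tolist())
print(mid_range['Name'].tolist())
```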

Using loc and iloc

# loc: label-based selection
print(df_employees.loc[1, 'Name'])  # Get name of the second employee

# iloc: integer position-based selection
print(df_employees.iloc[0, 2])  # Get salary of the first employee

loc and iloc are precise tools for data selection.
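With the default 0, 1, 2, ... index, labels and positions happen to coincide, which can hide the difference between the two. The sketch below uses a deliberately non-default index (an assumption made purely for illustration) to show where they diverge, including how slices behave.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Salary': [75000, 65000, 78000]},
    index=[10, 20, 30],  # labels deliberately differ from positions
)

print(df.loc[20, 'Name'])  # row labeled 20
print(df.iloc[1, 0])       # row at position 1, column at position 0 (same cell)

# loc slices include the end label; iloc slices exclude the end position
by_label = df.loc[10:20]
by_position = df.iloc[0:1]
print(len(by_label), len(by_position))
```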

How to Handle Missing Data

In the real world, data often comes with holes. pandas helps you deal with them:

# Let's introduce some missing data
df_employees.loc[1, 'Salary'] = np.nan
df_employees.loc[3, 'Department'] = np.nan

# Check for missing values
print(df_employees.isnull().sum())

# Drop rows with missing values
df_clean = df_employees.dropna()
print(df_clean)

# Fill missing values
df_filled = df_employees.fillna({'Salary': df_employees['Salary'].mean(), 'Department': 'Unknown'})
print(df_filled)

These tools help you manage and clean datasets with missing values.
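Dropping every row that has any missing value can throw away more than you intend. As a sketch of some gentler options, the example below drops rows only when a specific column is missing, lists the incomplete rows for inspection, and fills with the median rather than the mean. The small table is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', None, 'Finance'],
    'Salary': [75000, np.nan, 78000, 82000],
})

# Drop rows only when Salary is missing; a missing Department is tolerated
has_salary = df.dropna(subset=['Salary'])

# Rows with at least one missing value, for manual inspection
incomplete = df[df.isna().any(axis=1)]

# Fill missing salaries with the median of the known salaries
median_filled = df['Salary'].fillna(df['Salary'].median())

print(has_salary)
print(incomplete)
print(median_filled)
```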

How to Transform Data

pandas offers various data transformation techniques:

How to Use Sorting

# Sort employees by salary, descending
df_sorted = df_employees.sort_values('Salary', ascending=False)
print(df_sorted)

This allows you to order your data based on specific columns.
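Sorting by more than one column works the same way: pass a list of column names and, if needed, a matching list of directions. A minimal sketch with illustrative data, which also shows nlargest as a shortcut for "top N" questions:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Department': ['IT', 'HR', 'IT', 'HR'],
    'Salary': [75000, 65000, 82000, 70000],
})

# Department A-Z, then highest salary first within each department
df_sorted = df.sort_values(['Department', 'Salary'], ascending=[True, False])
df_sorted = df_sorted.reset_index(drop=True)  # renumber rows after sorting
print(df_sorted)

# The two highest-paid employees overall
top_two = df.nlargest(2, 'Salary')
print(top_two)
```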

How to Use Grouping and Aggregation

# Average salary by department
avg_salary = df_employees.groupby('Department')['Salary'].mean()
print(avg_salary)

# Multiple aggregations
dept_stats = df_employees.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Years of Experience': 'mean'
})
print(dept_stats)

Grouping allows you to perform calculations on subsets of your data.
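The dictionary form of agg produces two-level column names, which can be awkward to work with afterwards. Named aggregation is an alternative that lets you choose flat output column names up front. In the sketch below, the output names (avg_salary, max_salary, headcount) and the data are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Department': ['IT', 'HR', 'IT', 'HR'],
    'Salary': [75000, 65000, 82000, 70000],
})

# Each keyword is an output column: (input column, aggregation function)
dept_stats = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    headcount=('Name', 'count'),
)
print(dept_stats)
```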

How to Merge DataFrames

# Let's create another DataFrame with department locations
dept_locations = pd.DataFrame({
    'Department': ['IT', 'HR', 'Marketing', 'Finance'],
    'Location': ['Floor 1', 'Floor 2', 'Floor 3', 'Floor 2']
})

# Merge with employee data
df_merged = pd.merge(df_employees, dept_locations, on='Department')
print(df_merged)

Merging allows you to combine data from different DataFrames.
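By default, merge performs an inner join, so rows without a match on either side silently disappear (including any employee whose Department is missing). The sketch below contrasts that with a left join and uses indicator=True to show where each row came from. The 'Legal' department is a made-up example of an unmatched key.

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Eve'],
    'Department': ['IT', 'HR', 'Legal'],  # 'Legal' has no location entry
})
dept_locations = pd.DataFrame({
    'Department': ['IT', 'HR'],
    'Location': ['Floor 1', 'Floor 2'],
})

# Inner join (the default): Eve is dropped because 'Legal' has no match
inner = pd.merge(employees, dept_locations, on='Department')

# Left join: keep every employee; unmatched locations become NaN
left = pd.merge(employees, dept_locations, on='Department',
                how='left', indicator=True)

print(inner)
print(left)
```

The _merge column added by indicator=True is a quick way to audit a join before relying on its output.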

Advanced Techniques

Here are some advanced pandas techniques:

How to Apply Custom Functions

# Define a function to categorize salaries
def salary_category(salary):
    if pd.isna(salary):  # a missing salary would otherwise fall through to 'High'
        return 'Unknown'
    if salary < 70000:
        return 'Low'
    elif salary < 80000:
        return 'Medium'
    else:
        return 'High'

# Apply the function to create a new column
df_employees['Salary Category'] = df_employees['Salary'].apply(salary_category)
print(df_employees)

This demonstrates how to apply custom functions to your data.
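For simple binning like this, apply with a Python function is not the only route. As one alternative, pd.cut can assign the same categories in a single vectorized call. The sketch below assumes the same thresholds as the function above (70000 and 80000) and uses right=False so each bin includes its lower edge, mirroring the salary < 70000 style of comparison.

```python
import pandas as pd

salaries = pd.Series([65000, 70000, 75000, 80000, 82000])

categories = pd.cut(
    salaries,
    bins=[-float('inf'), 70000, 80000, float('inf')],
    labels=['Low', 'Medium', 'High'],
    right=False,  # bins are [low, high): 70000 counts as 'Medium'
)
print(categories.tolist())
```

Missing values stay missing with pd.cut, so they do not end up in a category by accident.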

How to Use Pivot Tables

# Create a pivot table of average salary by department and salary category
pivot_table = pd.pivot_table(df_employees, values='Salary', index='Department',
                             columns='Salary Category', aggfunc='mean')
print(pivot_table)

Pivot tables are useful for summarizing and analyzing data.
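Department and category combinations with no employees show up as NaN in a pivot table. If a placeholder reads better in your report, fill_value is one way to supply it. A small sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR'],
    'Salary Category': ['Medium', 'High', 'Low', 'Low'],
    'Salary': [75000, 82000, 65000, 67000],
})

# fill_value replaces empty cells (no matching rows) with 0
pivot = pd.pivot_table(df, values='Salary', index='Department',
                       columns='Salary Category', aggfunc='mean',
                       fill_value=0)
print(pivot)
```

Whether 0 is a sensible placeholder depends on the report; for averages, leaving NaN in place may be the more honest choice.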

How to Perform Time-Series Operations

# Create a DataFrame with a daily date index
date_range = pd.date_range(start='2025-01-01', end='2025-12-31', freq='D')
time_series = pd.DataFrame({'Value': np.random.randn(len(date_range))}, index=date_range)

# Resample to monthly frequency ('ME' = month end; pandas before 2.2 uses 'M')
monthly_avg = time_series.resample('ME').mean()
print(monthly_avg)

This shows how to work with time-series data in pandas.
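Two other operations that a date index makes easy are slicing by date strings and computing rolling statistics. The sketch below uses a short, deterministic series (values 0 through 9, chosen purely for illustration) so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start='2025-01-01', periods=10, freq='D')
ts = pd.DataFrame({'Value': np.arange(10, dtype=float)}, index=dates)

# Select a date range by label; both endpoints are included
first_week = ts.loc['2025-01-01':'2025-01-07']
print(first_week)

# 3-day rolling mean; the first two entries are NaN (window not yet full)
rolling = ts['Value'].rolling(window=3).mean()
print(rolling)
```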

How to Export Data

After processing your data, you might want to save it:

# Export to CSV
df_employees.to_csv('processed_employees.csv', index=False)

# Export to Excel (requires an Excel writer engine such as openpyxl to be installed)
df_employees.to_excel('employee_report.xlsx', sheet_name='Employee Data')

These commands allow you to save your data in different formats.
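A quick way to sanity-check an export is to read it straight back. The sketch below does a CSV round trip entirely in memory, relying on the fact that to_csv returns the CSV text when no path is given. The two-row table is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [75000, 65000]})

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)

# Read the text back and compare with the original table
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(df))
```

Leaving out index=False would add the row index as an extra unnamed column on export, which is a common reason a round trip fails to match.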

The Bottom Line

pandas is a comprehensive toolkit for data manipulation and analysis in Python, covering everything from importing data to performing complex analyses. The key to mastering it is practice, so use it regularly with a variety of datasets.