Apply and Map Functions
Learn to run custom functions on DataFrame columns and rows
Why Use Apply and Map?
Sometimes you need a transformation that pandas does not offer as a built-in method.
Examples:
- Convert temperature from Celsius to Fahrenheit
- Clean messy text in a specific way
- Calculate custom scores
Apply and map let you run any function on your data.
Map - Transform Single Column
map() transforms a Series one value at a time. Given a dictionary, it looks up each value and returns the matching entry:
import pandas as pd
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Grade': ['A', 'B', 'A']
})
# Create a mapping dictionary
grade_points = {'A': 4.0, 'B': 3.0, 'C': 2.0}
# Map grades to points
df['Points'] = df['Grade'].map(grade_points)
print(df)
Output:
Name Grade Points
0 John A 4.0
1 Sarah B 3.0
2 Mike A 4.0
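To illustrate what happens when a value has no entry in the dictionary, here is a minimal sketch. The grade 'D' and the choice of 0.0 as a fallback are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical data: 'D' has no entry in the mapping dictionary
grades = pd.Series(['A', 'B', 'D'])
grade_points = {'A': 4.0, 'B': 3.0, 'C': 2.0}

# Values without a matching key come back as NaN
points = grades.map(grade_points)
print(points)

# One possible fallback (an assumption): treat unknown grades as 0 points
points_filled = points.fillna(0.0)
print(points_filled)
```

Whether 0 is a sensible default depends on the data. Leaving the NaN in place can be the safer choice when a missing key signals a data problem.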
Map with a Function
df = pd.DataFrame({
    'Name': ['john', 'sarah', 'mike'],
    'Age': [25, 30, 28]
})
# Convert names to uppercase using a function
df['Name'] = df['Name'].map(str.upper)
print(df)
Output:
Name Age
0 JOHN 25
1 SARAH 30
2 MIKE 28
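If the column contains missing values, a function such as str.upper would raise an error on them. The sketch below uses the na_action='ignore' option of Series.map to skip those entries. The sample names are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing name
names = pd.Series(['john', None, 'mike'])

# str.upper cannot handle a missing value, so ask map() to skip them
upper_names = names.map(str.upper, na_action='ignore')
print(upper_names)
```

The missing entry stays missing in the result, and the other names are uppercased.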
Apply - Run Function on Column
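apply() takes a function rather than a dictionary. It works on both a Series and a DataFrame, and it can forward extra arguments to your function: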
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})
# Custom function
def add_bonus(salary):
    return salary * 1.10  # 10% bonus
# Apply to column
df['With_Bonus'] = df['Salary'].apply(add_bonus)
print(df)
Output:
Name Salary With_Bonus
0 John 50000 55000.0
1 Sarah 60000 66000.0
2 Mike 55000 60500.0
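One thing apply() makes easy is passing extra arguments to the function. The following sketch extends add_bonus with a rate parameter, which is an assumption added for this example and not part of the original function.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

def add_bonus(salary, rate=0.10):
    """Return salary increased by `rate` (hypothetical parameter)."""
    return salary * (1 + rate)

# Extra keyword arguments given to apply() are forwarded to add_bonus
df['With_20pct'] = df['Salary'].apply(add_bonus, rate=0.20)
print(df)
```

This keeps one reusable function instead of writing a separate function or lambda for each bonus rate.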
Apply with Lambda
Lambda is a short way to write simple functions:
# Same as above, but shorter
df['With_Bonus'] = df['Salary'].apply(lambda x: x * 1.10)
More examples:
# Double the value
df['Double'] = df['Salary'].apply(lambda x: x * 2)
# Check if above average
avg = df['Salary'].mean()
df['Above_Avg'] = df['Salary'].apply(lambda x: 'Yes' if x > avg else 'No')
print(df)
Apply to Entire DataFrame
Apply a function to each column:
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Get max of each column
print(df.apply(max))
Output:
A 3
B 6
dtype: int64
Apply a function to each row:
# Sum of each row
df['Row_Sum'] = df.apply(sum, axis=1)
print(df)
Output:
A B Row_Sum
0 1 4 5
1 2 5 7
2 3 6 9
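To make the difference between the two directions concrete, the sketch below runs the same custom function along each axis. The helper value_range is hypothetical and exists only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

def value_range(values):
    """Hypothetical helper: spread between the largest and smallest value."""
    return values.max() - values.min()

# axis=0 (default): the function receives each column as a Series
col_ranges = df.apply(value_range)
print(col_ranges)   # result is indexed by column name: A, B

# axis=1: the function receives each row as a Series
row_ranges = df.apply(value_range, axis=1)
print(row_ranges)   # result is indexed by row label: 0, 1, 2
```

In both cases the function sees a Series. Only the slice of the DataFrame it receives changes.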
Apply with Multiple Columns
Access multiple columns in one function:
df = pd.DataFrame({
    'Price': [100, 200, 150],
    'Quantity': [5, 3, 4]
})
# Calculate total using both columns
df['Total'] = df.apply(lambda row: row['Price'] * row['Quantity'], axis=1)
print(df)
Output:
Price Quantity Total
0 100 5 500
1 200 3 600
2 150 4 600
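For simple arithmetic like this, the row-wise lambda can usually be replaced by operating on whole columns, which avoids calling a Python function once per row. The sketch below compares the two approaches and also shows numpy.where for a conditional label. The Big_Order column and its threshold of 600 are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Price': [100, 200, 150],
    'Quantity': [5, 3, 4]
})

# Row-wise apply, as in the example above
total_apply = df.apply(lambda row: row['Price'] * row['Quantity'], axis=1)

# Column arithmetic computes the same values without a per-row function call
total_vectorized = df['Price'] * df['Quantity']
print(total_apply.tolist() == total_vectorized.tolist())

# A conditional label built with numpy.where instead of a lambda
df['Total'] = total_vectorized
df['Big_Order'] = np.where(df['Total'] >= 600, 'Yes', 'No')  # 600 is an arbitrary threshold
print(df)
```

Row-wise apply remains the practical choice when the logic cannot be expressed as column operations.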
Applymap - Apply to Every Cell
Apply a function to every single cell:
df = pd.DataFrame({
    'A': [1.234, 2.567],
    'B': [3.891, 4.123]
})
# Round every number
df_rounded = df.applymap(lambda x: round(x, 1))
print(df_rounded)
Output:
A B
0 1.2 3.9
1 2.6 4.1
Note: pandas 2.1 added DataFrame.map() and deprecated applymap(). In pandas 2.1 or later, use df.map() instead of df.applymap().
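If the same code has to run on both older and newer pandas versions, one possible approach is to check whether DataFrame.map exists (it was added in pandas 2.1) and fall back to applymap otherwise. This is a sketch, and the helper name round_one is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1.234, 2.567],
    'B': [3.891, 4.123]
})

def round_one(x):
    """Hypothetical helper: round a single value to one decimal place."""
    return round(x, 1)

# Use DataFrame.map where available, otherwise fall back to applymap
if hasattr(pd.DataFrame, 'map'):
    df_rounded = df.map(round_one)
else:
    df_rounded = df.applymap(round_one)
print(df_rounded)

# For plain rounding, the built-in DataFrame.round avoids a per-cell function call
print(df.round(1))
```

The element-wise methods are most useful when no built-in such as round() covers the transformation.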
Practice Example
import pandas as pd
# Employee data
employees = pd.DataFrame({
    'Name': ['john doe', 'sarah smith', 'mike johnson'],
    'Department': ['sales', 'it', 'sales'],
    'Salary': [50000, 70000, 55000],
    'Years': [3, 5, 2]
})
print("Original data:")
print(employees)
# 1. Convert names to title case
employees['Name'] = employees['Name'].apply(str.title)
# 2. Map departments to codes
dept_codes = {'sales': 'S', 'it': 'I', 'hr': 'H'}
employees['Dept_Code'] = employees['Department'].map(dept_codes)
# 3. Calculate raise based on years
def calculate_raise(years):
    if years >= 5:
        return 0.15  # 15% raise
    elif years >= 3:
        return 0.10  # 10% raise
    else:
        return 0.05  # 5% raise
employees['Raise_Pct'] = employees['Years'].apply(calculate_raise)
# 4. Calculate new salary
employees['New_Salary'] = employees.apply(
    lambda row: row['Salary'] * (1 + row['Raise_Pct']),
    axis=1
)
print("\nProcessed data:")
print(employees)
Map vs Apply vs Applymap
| Method | Works On | Best For |
|---|---|---|
| map | Series (column) | Simple replacements |
| apply | Series or DataFrame | Custom functions |
| applymap (DataFrame.map in pandas 2.1+) | DataFrame | Every cell |
Key Points
- map(): Replace values using dictionary or function
- apply(): Run any function on column or row
- lambda: Short way to write simple functions
- axis=1: Apply function to each row
- axis=0: Apply function to each column (default)
Common Mistakes
Mistake 1: Forgetting axis for row operations
# This applies to columns (default)
df.apply(sum)
# For rows, add axis=1
df.apply(sum, axis=1)
Mistake 2: Mapping dictionary is missing a value
mapping = {'A': 1, 'B': 2}
df['Grade'].map(mapping)  # any grade not in the dictionary, such as 'C', becomes NaN
# Handle missing values by filling in a default
df['Grade'].map(mapping).fillna(0)
Mistake 3: Using apply when a vectorized operation exists
# Slow
df['Double'] = df['Value'].apply(lambda x: x * 2)
# Fast
df['Double'] = df['Value'] * 2
What's Next?
You learned apply and map. Next, you'll learn Window Functions - calculating moving averages and rankings.