5 min read
Filling Missing Data
Learn to replace missing values with useful data
Why Fill Instead of Drop?
Dropping removes entire rows. Filling keeps your data and replaces empty cells with something useful.
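A minimal sketch of the difference, using a small made-up table (the names and values are illustrative only):

```python
import pandas as pd

# Illustrative data: two of the three rows have one missing cell each
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, None, 30],
    'Score': [85, 90, None]
})

dropped = df.dropna()   # removes every row that has any missing value
filled = df.fillna(0)   # keeps every row, replaces missing cells with 0

print(len(dropped))     # rows left after dropping
print(len(filled))      # rows left after filling
```

Here dropping keeps only one of the three rows, while filling keeps all three.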
Fill with a Fixed Value
code.py
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['John', 'Sarah', None],
'Age': [25, None, 30],
'Score': [85, 90, None]
})
# Fill all missing with 0
filled = df.fillna(0)
print(filled)
Output:
Name Age Score
0 John 25.0 85.0
1 Sarah 0.0 90.0
2 0 30.0 0.0
Fill Different Values for Different Columns
code.py
filled = df.fillna({
'Name': 'Unknown',
'Age': 0,
'Score': 50
})
print(filled)
Output:
Name Age Score
0 John 25.0 85.0
1 Sarah 0.0 90.0
2 Unknown 30.0 50.0
Fill with Average (Mean)
Best for numbers. Keeps the overall average of the column the same.
code.py
df = pd.DataFrame({
'Score': [80, 90, None, 85, None]
})
# Fill with average
avg = df['Score'].mean()
df['Score'] = df['Score'].fillna(avg)
print(df)
Output:
Score
0 80.0
1 90.0
2 85.0 <- was missing, now average
3 85.0
4 85.0 <- was missing, now average
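As a quick check of the claim that the average stays the same, here is a small sketch on the same scores. It also prints the standard deviation, since filling with the mean tends to shrink the spread of the data:

```python
import pandas as pd

scores = pd.Series([80, 90, None, 85, None], name='Score', dtype='float')

mean_before = scores.mean()            # mean() skips missing values by default
filled = scores.fillna(mean_before)
mean_after = filled.mean()

print(mean_before, mean_after)         # the two means match
print(scores.std(), filled.std())      # the spread is smaller after filling
```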
Fill with Middle Value (Median)
Better when the data has extreme values (outliers), because the median is not pulled toward them.
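To see why, compare the mean and the median of a column with one extreme value. The numbers below are made up for illustration:

```python
import pandas as pd

# Illustrative values with one outlier (1000) and one missing entry
income = pd.Series([30, 32, 35, None, 1000], dtype='float')

print(income.mean())    # 274.25, pulled up by the outlier
print(income.median())  # 33.5, close to the typical values

# Fill the gap with the median instead of the mean
filled = income.fillna(income.median())
print(filled)
```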
code.py
df['Score'] = df['Score'].fillna(df['Score'].median())
Fill with Most Common Value (Mode)
Best for categories like "Male/Female" or "Yes/No".
code.py
df = pd.DataFrame({
'City': ['NYC', 'LA', None, 'NYC', 'NYC']
})
# Most common city
most_common = df['City'].mode()[0]
df['City'] = df['City'].fillna(most_common)
print(df)
Output:
City
0 NYC
1 LA
2 NYC <- was missing, now most common
3 NYC
4 NYC
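One detail behind `mode()[0]`: `mode()` returns a Series, because several values can be tied for most common. A small sketch with made-up data, where the `'Unknown'` fallback is an assumption for the case of a column with no values at all:

```python
import pandas as pd

# Illustrative data where 'NYC' and 'LA' are tied
cities = pd.Series(['NYC', 'LA', None, 'LA', 'NYC'])

modes = cities.mode()      # every tied value, in sorted order
print(list(modes))         # ['LA', 'NYC']

# [0] picks the first tied value; fall back if the column is entirely missing
fill_value = modes[0] if not modes.empty else 'Unknown'
filled = cities.fillna(fill_value)
print(filled)
```

With a tie, `[0]` simply takes the first value in sorted order, so it is worth checking whether that choice makes sense for your data.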
Fill with Previous Value (Forward Fill)
Good for time data. Uses the value before the empty cell.
code.py
df = pd.DataFrame({
'Day': [1, 2, 3, 4, 5],
'Temp': [20, None, None, 25, 26]
})
df['Temp'] = df['Temp'].ffill()
print(df)
Output:
Day Temp
0 1 20.0
1 2 20.0 <- copied from row 0
2 3 20.0 <- copied from row 1
3 4 25.0
4 5 26.0
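Two details worth knowing, shown in a small sketch with made-up readings: forward fill cannot fill a gap at the very start (there is no earlier value to copy), and the `limit` parameter caps how many consecutive gaps get filled.

```python
import pandas as pd

# Illustrative readings: a gap at the start and a run of three gaps
temps = pd.Series([None, 20, None, None, None, 25], dtype='float')

# Carry each known value forward by at most one row
filled = temps.ffill(limit=1)
print(filled)

# Rows that are still missing after the limited forward fill
print(filled.isna().tolist())
```

Capping the fill this way can help avoid stretching one old reading across a long gap.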
Fill with Next Value (Backward Fill)
Uses the value after the empty cell.
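As a small worked sketch, here is backward fill on temperature data similar to the forward fill example, with one extra gap at the end (an illustrative change) to show that a trailing gap has no later value to copy:

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Temp': [20, None, None, 25, None]
})

# Rows 1 and 2 take the value from row 3; row 4 stays missing
df['Temp'] = df['Temp'].bfill()
print(df)
```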
code.py
df['Temp'] = df['Temp'].bfill()
Which Method to Use?
| Data Type | Best Method |
|---|---|
| Numbers (normal) | Mean |
| Numbers (has outliers) | Median |
| Categories | Mode |
| Time series | Forward/Backward fill |
| Unknown | Fixed value like 0 or "Unknown" |
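The table can be sketched as a small helper. `fill_by_type` and its `time_cols` argument are hypothetical names made up for this illustration, not part of pandas. It also assumes the median is a safe default for every numeric column, which is a simplification of the table above.

```python
import pandas as pd

def fill_by_type(df, time_cols=()):
    """Hypothetical helper: choose a fill per column, loosely following the table."""
    out = df.copy()
    for col in out.columns:
        if col in time_cols:
            # Time series: forward fill, then backward fill any leading gap
            out[col] = out[col].ffill().bfill()
        elif pd.api.types.is_numeric_dtype(out[col]):
            # Numbers: median, which is robust to outliers
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categories: most common value, or a fixed value if none exists
            modes = out[col].mode()
            out[col] = out[col].fillna(modes[0] if not modes.empty else 'Unknown')
    return out

# Illustrative data
df = pd.DataFrame({
    'City': ['NYC', 'LA', None, 'NYC'],
    'Score': [80, None, 90, 1000],
    'Temp': [20, None, None, 25],
})
result = fill_by_type(df, time_cols=['Temp'])
print(result)
```

In practice the right choice depends on what the column means, so treat a helper like this as a starting point rather than a rule.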
Key Points
- fillna(value) replaces all missing values with one value
- fillna({'col': value}) sets different values per column
- fillna(df['col'].mean()) fills with average
- ffill() fills with previous value
- bfill() fills with next value
What's Next?
For time data, there's a smarter way to fill: interpolation. Learn it next!