4 min read
Dropping Missing Data
Learn to remove rows or columns with missing values
When to Drop Missing Data?
Drop missing data when:
- Only a few rows have missing values
- The missing data is random
- You have enough data left after dropping
Don't drop when:
- Too many rows would be removed
- Missing data follows a pattern (important information!)
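Before deciding, it can help to measure how much data a drop would cost. The sketch below uses a small made-up DataFrame (the sample values are assumptions for illustration only) to count missing values per column and the share of rows that `dropna()` would remove.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['John', 'Sarah', None, 'Mike'],
    'Age': [25, None, 30, 28],
    'City': ['NYC', 'LA', 'Chicago', None],
})

# Number of missing values in each column
missing_per_column = df.isna().sum()

# True for every row that has at least one missing value
rows_with_missing = df.isna().any(axis=1)

# Fraction of rows that a plain dropna() would remove
share_lost = rows_with_missing.mean()

print(missing_per_column)
print(f"Rows lost by dropna(): {rows_with_missing.sum()} of {len(df)} ({share_lost:.0%})")
```

If the share is large, dropping is probably the wrong tool for this dataset.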
Drop Rows with Any Missing Value
code.py
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['John', 'Sarah', None, 'Mike'],
'Age': [25, None, 30, 28],
'City': ['NYC', 'LA', 'Chicago', None]
})
# Drop rows with ANY missing value
clean_df = df.dropna()
print(clean_df)
Output:
   Name   Age City
0  John  25.0  NYC
Only 1 row left! (Others had at least one missing value)
Drop Rows Where ALL Values are Missing
code.py
df = pd.DataFrame({
'A': [1, None, None],
'B': [2, None, 5],
'C': [3, None, 6]
})
# Drop only if ALL columns are missing
clean_df = df.dropna(how='all')
print(clean_df)
Output:
     A    B    C
0  1.0  2.0  3.0
2  NaN  5.0  6.0
Row 1 was completely empty, so it's removed.
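To see the difference between the default `how='any'` and `how='all'`, here is a minimal side-by-side sketch on the same data. The variable names `any_dropped` and `all_dropped` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, None],
    'B': [2, None, 5],
    'C': [3, None, 6],
})

# Default behaviour: drop a row if ANY value is missing
any_dropped = df.dropna(how='any')

# Drop a row only if ALL of its values are missing
all_dropped = df.dropna(how='all')

print(list(any_dropped.index))  # [0]
print(list(all_dropped.index))  # [0, 2]
```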
Drop Based on Specific Columns
code.py
df = pd.DataFrame({
'Name': ['John', 'Sarah', None],
'Age': [25, None, 30],
'Email': ['j@mail.com', None, 'x@mail.com']
})
# Only drop if Name is missing
clean_df = df.dropna(subset=['Name'])
print(clean_df)
Output:
    Name   Age       Email
0   John  25.0  j@mail.com
1  Sarah   NaN        None
Sarah stays even though her Age and Email are missing, because only the Name column is checked.
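`subset` also accepts several columns and can be combined with `how`. The sketch below is one possible illustration on the same sample data; `strict` and `lenient` are hypothetical variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', None],
    'Age': [25, None, 30],
    'Email': ['j@mail.com', None, 'x@mail.com'],
})

# Drop a row if Name OR Age is missing
strict = df.dropna(subset=['Name', 'Age'])

# Drop a row only if BOTH Age and Email are missing
lenient = df.dropna(subset=['Age', 'Email'], how='all')

print(list(strict.index))   # [0]
print(list(lenient.index))  # [0, 2]
```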
Drop Columns with Missing Values
code.py
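# Drop columns (not rows) that contain any missing value
clean_df = df.dropna(axis=1)
print(clean_df)
axis=1 means columns; axis=0 (the default) means rows. In the Name/Age/Email DataFrame from the previous example, every column has at least one missing value, so this call removes all of them and leaves a DataFrame with no columns.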
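A column-wise drop is easier to follow on data where only some columns have gaps. The sketch below uses a made-up DataFrame (the `Notes` column is an assumption added for illustration) to contrast the default with `how='all'`.

```python
import pandas as pd

# Hypothetical data: Age has one gap, Notes is completely empty
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, None, 28],
    'City': ['NYC', 'LA', 'Chicago'],
    'Notes': [None, None, None],
})

# Drop every column that contains at least one missing value
no_missing_cols = df.dropna(axis=1)

# Drop only the columns that are entirely missing
no_empty_cols = df.dropna(axis=1, how='all')

print(list(no_missing_cols.columns))  # ['Name', 'City']
print(list(no_empty_cols.columns))    # ['Name', 'Age', 'City']
```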
Keep Rows with Minimum Values
code.py
df = pd.DataFrame({
'A': [1, None, None, 4],
'B': [None, 2, None, 5],
'C': [3, 3, None, 6]
})
# Keep rows with at least 2 non-missing values
clean_df = df.dropna(thresh=2)
print(clean_df)
Output:
     A    B    C
0  1.0  NaN  3.0
1  NaN  2.0  3.0
3  4.0  5.0  6.0
Row 2 has no non-missing values (fewer than the required 2), so it's dropped.
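One way to understand `thresh` is to count the non-missing values per row yourself. The sketch below does this with `notna()` and a boolean mask, then checks that it gives the same result as `dropna(thresh=2)`. The names `non_missing`, `by_mask` and `by_thresh` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, None, 4],
    'B': [None, 2, None, 5],
    'C': [3, 3, None, 6],
})

# Count the non-missing values in each row
non_missing = df.notna().sum(axis=1)
print(list(non_missing))  # [2, 2, 0, 3]

# Keep rows with at least 2 non-missing values, two equivalent ways
by_mask = df[non_missing >= 2]
by_thresh = df.dropna(thresh=2)

print(by_mask.equals(by_thresh))  # True
```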
Important: dropna() Returns New DataFrame
code.py
# This does NOT change original df
df.dropna()
# To change original, reassign or use inplace
df = df.dropna()
# OR
df.dropna(inplace=True)
Key Points
- dropna() removes rows with any missing value (the default)
- how='all' removes a row only if ALL its values are missing
- subset=['col'] checks only the listed columns
- axis=1 drops columns instead of rows
- thresh=n keeps rows with at least n non-missing values
- Original data unchanged unless you reassign
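The last point is easy to verify. The sketch below, on a small made-up DataFrame, shows that a bare `dropna()` leaves the original untouched, while `inplace=True` modifies it and returns `None`.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

# dropna() returns a new DataFrame; df itself is unchanged
result = df.dropna()
print(len(df), len(result))  # 3 1

# inplace=True modifies df directly and returns None
returned = df.dropna(inplace=True)
print(returned)  # None
print(len(df))   # 1
```

Because `inplace=True` returns `None`, writing `df = df.dropna(inplace=True)` would replace the DataFrame with `None`; use one approach or the other, not both.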
What's Next?
Sometimes dropping data removes too much. Next, learn to fill missing values instead.