Conditional Filtering
Learn advanced filtering with np.where, apply, and custom functions
Conditional Filtering
np.where() for Conditional Values
Create new column based on condition.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['John', 'Sarah', 'Mike', 'Emma'],
'Score': [85, 92, 78, 95]
})
df['Grade'] = np.where(df['Score'] >= 90, 'A', 'B')
print(df)Output:
Name Score Grade
0 John 85 B
1 Sarah 92 A
2 Mike 78 B
3 Emma 95 A
How it works:
- If Score >= 90, set 'A'
- Otherwise, set 'B'
Multiple Conditions with np.where
Nest conditions for multiple cases.
df['Grade'] = np.where(df['Score'] >= 90, 'A',
np.where(df['Score'] >= 80, 'B',
np.where(df['Score'] >= 70, 'C', 'F')))
print(df)What this creates:
- 90+: A
- 80-89: B
- 70-79: C
- Below 70: F
np.select() for Better Readability
Better for many conditions.
conditions = [
df['Score'] >= 90,
df['Score'] >= 80,
df['Score'] >= 70,
df['Score'] >= 60
]
choices = ['A', 'B', 'C', 'D']
df['Grade'] = np.select(conditions, choices, default='F')
print(df)Much cleaner than nested np.where!
apply() with Custom Function
Apply function to each row.
def categorize_score(row):
score = row['Score']
if score >= 90:
return 'Excellent'
elif score >= 75:
return 'Good'
elif score >= 60:
return 'Pass'
else:
return 'Fail'
df['Category'] = df.apply(categorize_score, axis=1)
print(df)axis=1 means apply to each row.
Lambda Functions
Short inline functions.
df['Passed'] = df['Score'].apply(lambda x: 'Yes' if x >= 60 else 'No')
print(df)Same as:
def check_pass(score):
return 'Yes' if score >= 60 else 'No'
df['Passed'] = df['Score'].apply(check_pass)Multiple Column Conditions
df = pd.DataFrame({
'Name': ['John', 'Sarah', 'Mike', 'Emma'],
'Written': [85, 92, 78, 88],
'Practical': [90, 85, 70, 95]
})
def final_grade(row):
avg = (row['Written'] + row['Practical']) / 2
if avg >= 90:
return 'A'
elif avg >= 80:
return 'B'
elif avg >= 70:
return 'C'
else:
return 'F'
df['Final'] = df.apply(final_grade, axis=1)
print(df)Conditional Replacement
df = pd.DataFrame({
'Status': ['active', 'inactive', 'pending', 'active']
})
df['Status'] = df['Status'].replace({
'active': 1,
'inactive': 0,
'pending': -1
})
print(df)map() Method
Map values from dictionary.
df = pd.DataFrame({
'Size': ['S', 'M', 'L', 'XL']
})
size_map = {'S': 'Small', 'M': 'Medium', 'L': 'Large', 'XL': 'Extra Large'}
df['Size_Full'] = df['Size'].map(size_map)
print(df)Practice Example
The scenario: Categorize employees and calculate bonuses.
import pandas as pd
import numpy as np
employees = pd.DataFrame({
'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David', 'Lisa'],
'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
'Salary': [50000, 85000, 55000, 65000, 90000, 52000],
'Years': [3, 8, 4, 6, 10, 2],
'Performance': [85, 92, 78, 88, 95, 80]
})
print("Original data:")
print(employees)
print()
print("1. Categorize by experience:")
conditions = [
employees['Years'] >= 8,
employees['Years'] >= 5,
employees['Years'] >= 2
]
choices = ['Senior', 'Mid-Level', 'Junior']
employees['Level'] = np.select(conditions, choices, default='Entry')
print(employees[['Name', 'Years', 'Level']])
print()
print("2. Performance rating:")
def get_rating(score):
if score >= 90:
return 'Outstanding'
elif score >= 85:
return 'Excellent'
elif score >= 75:
return 'Good'
else:
return 'Needs Improvement'
employees['Rating'] = employees['Performance'].apply(get_rating)
print(employees[['Name', 'Performance', 'Rating']])
print()
print("3. Calculate bonus:")
def calculate_bonus(row):
base = row['Salary'] * 0.05
if row['Performance'] >= 90:
return base * 2
elif row['Performance'] >= 85:
return base * 1.5
else:
return base
employees['Bonus'] = employees.apply(calculate_bonus, axis=1)
print(employees[['Name', 'Salary', 'Performance', 'Bonus']])
print()
print("4. Promotion eligible:")
employees['Eligible'] = np.where(
(employees['Years'] >= 3) & (employees['Performance'] >= 85),
'Yes',
'No'
)
print(employees[['Name', 'Years', 'Performance', 'Eligible']])
print()
print("5. Department code:")
dept_map = {'Sales': 'SLS', 'IT': 'ITC', 'HR': 'HRD'}
employees['Dept_Code'] = employees['Department'].map(dept_map)
print(employees[['Name', 'Department', 'Dept_Code']])
print()
print("Summary:")
print("Total bonus pool:", employees['Bonus'].sum())
print("Promotion eligible:", (employees['Eligible'] == 'Yes').sum())
print("Outstanding performers:", (employees['Rating'] == 'Outstanding').sum())Chaining Conditions
df['Category'] = (
df['Score']
.apply(lambda x: 'High' if x >= 80 else 'Medium' if x >= 60 else 'Low')
)Conditional with Missing Values
df['Status'] = np.where(
df['Score'].isnull(),
'No Data',
np.where(df['Score'] >= 60, 'Pass', 'Fail')
)Update Existing Values
df.loc[df['Score'] < 60, 'Score'] = 60
print("Scores adjusted to minimum 60")Conditional Aggregation
high_scores = df[df['Score'] >= 80]['Score'].mean()
low_scores = df[df['Score'] < 80]['Score'].mean()
print("Avg high scores:", high_scores)
print("Avg low scores:", low_scores)Complex Business Logic
def assign_territory(row):
if row['Department'] == 'Sales':
if row['Years'] >= 5:
return 'Regional'
else:
return 'Local'
else:
return 'N/A'
df['Territory'] = df.apply(assign_territory, axis=1)Filter Then Transform
high_performers = df[df['Performance'] >= 85].copy()
high_performers['Bonus_Multiplier'] = 2.0
low_performers = df[df['Performance'] < 85].copy()
low_performers['Bonus_Multiplier'] = 1.0
result = pd.concat([high_performers, low_performers])Vectorized Operations
Faster than apply for simple cases.
df['Bonus'] = df['Salary'] * 0.1 * (df['Performance'] / 100)Much faster than:
def calc_bonus(row):
return row['Salary'] * 0.1 * (row['Performance'] / 100)
df['Bonus'] = df.apply(calc_bonus, axis=1)Key Points to Remember
np.where(condition, value_if_true, value_if_false) for simple conditions.
np.select() better for multiple conditions. More readable than nested np.where.
apply() with axis=1 applies function to each row. Use for complex logic.
Lambda functions good for simple one-line conditions.
map() converts values using dictionary mapping.
Vectorized operations faster than apply when possible.
Common Mistakes
Mistake 1: Wrong axis
df.apply(func) # Applies to columns (axis=0)
df.apply(func, axis=1) # Applies to rows - what you usually wantMistake 2: Modifying filtered DataFrame
df[df['Score'] > 80]['Grade'] = 'A' # May not work!
df.loc[df['Score'] > 80, 'Grade'] = 'A' # CorrectMistake 3: Not using copy
subset = df[df['Score'] > 80]
subset['New'] = 1 # Warning!
subset = df[df['Score'] > 80].copy() # SafeMistake 4: Slow apply when vectorized possible
df['Double'] = df['Score'].apply(lambda x: x * 2) # Slow
df['Double'] = df['Score'] * 2 # FastMistake 5: Forgetting default in np.select
result = np.select(conditions, choices) # NaN for unmatched
result = np.select(conditions, choices, default='Other') # BetterWhat's Next?
You now know advanced conditional filtering. Next, you'll learn about sorting data - ordering DataFrames by values and custom criteria.