#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
6 min read min read

Conditional Filtering

Learn advanced filtering with np.where, apply, and custom functions

Conditional Filtering

np.where() for Conditional Values

Create new column based on condition.

code.py
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Score': [85, 92, 78, 95]
})

df['Grade'] = np.where(df['Score'] >= 90, 'A', 'B')
print(df)

Output:

Name Score Grade 0 John 85 B 1 Sarah 92 A 2 Mike 78 B 3 Emma 95 A

How it works:

  • If Score >= 90, set 'A'
  • Otherwise, set 'B'

Multiple Conditions with np.where

Nest conditions for multiple cases.

code.py
df['Grade'] = np.where(df['Score'] >= 90, 'A',
              np.where(df['Score'] >= 80, 'B',
              np.where(df['Score'] >= 70, 'C', 'F')))

print(df)

What this creates:

  • 90+: A
  • 80-89: B
  • 70-79: C
  • Below 70: F

np.select() for Better Readability

Better for many conditions.

code.py
conditions = [
    df['Score'] >= 90,
    df['Score'] >= 80,
    df['Score'] >= 70,
    df['Score'] >= 60
]

choices = ['A', 'B', 'C', 'D']

df['Grade'] = np.select(conditions, choices, default='F')
print(df)

Much cleaner than nested np.where!

apply() with Custom Function

Apply function to each row.

code.py
def categorize_score(row):
    score = row['Score']
    if score >= 90:
        return 'Excellent'
    elif score >= 75:
        return 'Good'
    elif score >= 60:
        return 'Pass'
    else:
        return 'Fail'

df['Category'] = df.apply(categorize_score, axis=1)
print(df)

axis=1 means apply to each row.

Lambda Functions

Short inline functions.

code.py
df['Passed'] = df['Score'].apply(lambda x: 'Yes' if x >= 60 else 'No')
print(df)

Same as:

code.py
def check_pass(score):
    return 'Yes' if score >= 60 else 'No'

df['Passed'] = df['Score'].apply(check_pass)

Multiple Column Conditions

code.py
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Written': [85, 92, 78, 88],
    'Practical': [90, 85, 70, 95]
})

def final_grade(row):
    avg = (row['Written'] + row['Practical']) / 2
    if avg >= 90:
        return 'A'
    elif avg >= 80:
        return 'B'
    elif avg >= 70:
        return 'C'
    else:
        return 'F'

df['Final'] = df.apply(final_grade, axis=1)
print(df)

Conditional Replacement

code.py
df = pd.DataFrame({
    'Status': ['active', 'inactive', 'pending', 'active']
})

df['Status'] = df['Status'].replace({
    'active': 1,
    'inactive': 0,
    'pending': -1
})

print(df)

map() Method

Map values from dictionary.

code.py
df = pd.DataFrame({
    'Size': ['S', 'M', 'L', 'XL']
})

size_map = {'S': 'Small', 'M': 'Medium', 'L': 'Large', 'XL': 'Extra Large'}
df['Size_Full'] = df['Size'].map(size_map)
print(df)

Practice Example

The scenario: Categorize employees and calculate bonuses.

code.py
import pandas as pd
import numpy as np

employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David', 'Lisa'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
    'Salary': [50000, 85000, 55000, 65000, 90000, 52000],
    'Years': [3, 8, 4, 6, 10, 2],
    'Performance': [85, 92, 78, 88, 95, 80]
})

print("Original data:")
print(employees)
print()

print("1. Categorize by experience:")
conditions = [
    employees['Years'] >= 8,
    employees['Years'] >= 5,
    employees['Years'] >= 2
]
choices = ['Senior', 'Mid-Level', 'Junior']
employees['Level'] = np.select(conditions, choices, default='Entry')
print(employees[['Name', 'Years', 'Level']])
print()

print("2. Performance rating:")
def get_rating(score):
    if score >= 90:
        return 'Outstanding'
    elif score >= 85:
        return 'Excellent'
    elif score >= 75:
        return 'Good'
    else:
        return 'Needs Improvement'

employees['Rating'] = employees['Performance'].apply(get_rating)
print(employees[['Name', 'Performance', 'Rating']])
print()

print("3. Calculate bonus:")
def calculate_bonus(row):
    base = row['Salary'] * 0.05
    if row['Performance'] >= 90:
        return base * 2
    elif row['Performance'] >= 85:
        return base * 1.5
    else:
        return base

employees['Bonus'] = employees.apply(calculate_bonus, axis=1)
print(employees[['Name', 'Salary', 'Performance', 'Bonus']])
print()

print("4. Promotion eligible:")
employees['Eligible'] = np.where(
    (employees['Years'] >= 3) & (employees['Performance'] >= 85),
    'Yes',
    'No'
)
print(employees[['Name', 'Years', 'Performance', 'Eligible']])
print()

print("5. Department code:")
dept_map = {'Sales': 'SLS', 'IT': 'ITC', 'HR': 'HRD'}
employees['Dept_Code'] = employees['Department'].map(dept_map)
print(employees[['Name', 'Department', 'Dept_Code']])
print()

print("Summary:")
print("Total bonus pool:", employees['Bonus'].sum())
print("Promotion eligible:", (employees['Eligible'] == 'Yes').sum())
print("Outstanding performers:", (employees['Rating'] == 'Outstanding').sum())

Chaining Conditions

code.py
df['Category'] = (
    df['Score']
    .apply(lambda x: 'High' if x >= 80 else 'Medium' if x >= 60 else 'Low')
)

Conditional with Missing Values

code.py
df['Status'] = np.where(
    df['Score'].isnull(),
    'No Data',
    np.where(df['Score'] >= 60, 'Pass', 'Fail')
)

Update Existing Values

code.py
df.loc[df['Score'] < 60, 'Score'] = 60
print("Scores adjusted to minimum 60")

Conditional Aggregation

code.py
high_scores = df[df['Score'] >= 80]['Score'].mean()
low_scores = df[df['Score'] < 80]['Score'].mean()

print("Avg high scores:", high_scores)
print("Avg low scores:", low_scores)

Complex Business Logic

code.py
def assign_territory(row):
    if row['Department'] == 'Sales':
        if row['Years'] >= 5:
            return 'Regional'
        else:
            return 'Local'
    else:
        return 'N/A'

df['Territory'] = df.apply(assign_territory, axis=1)

Filter Then Transform

code.py
high_performers = df[df['Performance'] >= 85].copy()
high_performers['Bonus_Multiplier'] = 2.0

low_performers = df[df['Performance'] < 85].copy()
low_performers['Bonus_Multiplier'] = 1.0

result = pd.concat([high_performers, low_performers])

Vectorized Operations

Faster than apply for simple cases.

code.py
df['Bonus'] = df['Salary'] * 0.1 * (df['Performance'] / 100)

Much faster than:

code.py
def calc_bonus(row):
    return row['Salary'] * 0.1 * (row['Performance'] / 100)

df['Bonus'] = df.apply(calc_bonus, axis=1)

Key Points to Remember

np.where(condition, value_if_true, value_if_false) for simple conditions.

np.select() better for multiple conditions. More readable than nested np.where.

apply() with axis=1 applies function to each row. Use for complex logic.

Lambda functions good for simple one-line conditions.

map() converts values using dictionary mapping.

Vectorized operations faster than apply when possible.

Common Mistakes

Mistake 1: Wrong axis

code.py
df.apply(func)  # Applies to columns (axis=0)
df.apply(func, axis=1)  # Applies to rows - what you usually want

Mistake 2: Modifying filtered DataFrame

code.py
df[df['Score'] > 80]['Grade'] = 'A'  # May not work!
df.loc[df['Score'] > 80, 'Grade'] = 'A'  # Correct

Mistake 3: Not using copy

code.py
subset = df[df['Score'] > 80]
subset['New'] = 1  # Warning!

subset = df[df['Score'] > 80].copy()  # Safe

Mistake 4: Slow apply when vectorized possible

code.py
df['Double'] = df['Score'].apply(lambda x: x * 2)  # Slow
df['Double'] = df['Score'] * 2  # Fast

Mistake 5: Forgetting default in np.select

code.py
result = np.select(conditions, choices)  # NaN for unmatched
result = np.select(conditions, choices, default='Other')  # Better

What's Next?

You now know advanced conditional filtering. Next, you'll learn about sorting data - ordering DataFrames by values and custom criteria.

SkillsetMaster - AI, Web Development & Data Analytics Courses