#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
6 min read min read

Conditional Filtering

Learn advanced filtering with np.where, apply, and custom functions

Conditional Filtering

np.where() for Conditional Values

Create new column based on condition.

code.pyPython
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Score': [85, 92, 78, 95]
})

df['Grade'] = np.where(df['Score'] >= 90, 'A', 'B')
print(df)

Output:

Name Score Grade 0 John 85 B 1 Sarah 92 A 2 Mike 78 B 3 Emma 95 A

How it works:

  • If Score >= 90, set 'A'
  • Otherwise, set 'B'

Multiple Conditions with np.where

Nest conditions for multiple cases.

code.pyPython
df['Grade'] = np.where(df['Score'] >= 90, 'A',
              np.where(df['Score'] >= 80, 'B',
              np.where(df['Score'] >= 70, 'C', 'F')))

print(df)

What this creates:

  • 90+: A
  • 80-89: B
  • 70-79: C
  • Below 70: F

np.select() for Better Readability

Better for many conditions.

code.pyPython
conditions = [
    df['Score'] >= 90,
    df['Score'] >= 80,
    df['Score'] >= 70,
    df['Score'] >= 60
]

choices = ['A', 'B', 'C', 'D']

df['Grade'] = np.select(conditions, choices, default='F')
print(df)

Much cleaner than nested np.where!

apply() with Custom Function

Apply function to each row.

code.pyPython
def categorize_score(row):
    score = row['Score']
    if score >= 90:
        return 'Excellent'
    elif score >= 75:
        return 'Good'
    elif score >= 60:
        return 'Pass'
    else:
        return 'Fail'

df['Category'] = df.apply(categorize_score, axis=1)
print(df)

axis=1 means apply to each row.

Lambda Functions

Short inline functions.

code.pyPython
df['Passed'] = df['Score'].apply(lambda x: 'Yes' if x >= 60 else 'No')
print(df)

Same as:

code.pyPython
def check_pass(score):
    return 'Yes' if score >= 60 else 'No'

df['Passed'] = df['Score'].apply(check_pass)

Multiple Column Conditions

code.pyPython
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Written': [85, 92, 78, 88],
    'Practical': [90, 85, 70, 95]
})

def final_grade(row):
    avg = (row['Written'] + row['Practical']) / 2
    if avg >= 90:
        return 'A'
    elif avg >= 80:
        return 'B'
    elif avg >= 70:
        return 'C'
    else:
        return 'F'

df['Final'] = df.apply(final_grade, axis=1)
print(df)

Conditional Replacement

code.pyPython
df = pd.DataFrame({
    'Status': ['active', 'inactive', 'pending', 'active']
})

df['Status'] = df['Status'].replace({
    'active': 1,
    'inactive': 0,
    'pending': -1
})

print(df)

map() Method

Map values from dictionary.

code.pyPython
df = pd.DataFrame({
    'Size': ['S', 'M', 'L', 'XL']
})

size_map = {'S': 'Small', 'M': 'Medium', 'L': 'Large', 'XL': 'Extra Large'}
df['Size_Full'] = df['Size'].map(size_map)
print(df)

Practice Example

The scenario: Categorize employees and calculate bonuses.

code.pyPython
import pandas as pd
import numpy as np

employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David', 'Lisa'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
    'Salary': [50000, 85000, 55000, 65000, 90000, 52000],
    'Years': [3, 8, 4, 6, 10, 2],
    'Performance': [85, 92, 78, 88, 95, 80]
})

print("Original data:")
print(employees)
print()

print("1. Categorize by experience:")
conditions = [
    employees['Years'] >= 8,
    employees['Years'] >= 5,
    employees['Years'] >= 2
]
choices = ['Senior', 'Mid-Level', 'Junior']
employees['Level'] = np.select(conditions, choices, default='Entry')
print(employees[['Name', 'Years', 'Level']])
print()

print("2. Performance rating:")
def get_rating(score):
    if score >= 90:
        return 'Outstanding'
    elif score >= 85:
        return 'Excellent'
    elif score >= 75:
        return 'Good'
    else:
        return 'Needs Improvement'

employees['Rating'] = employees['Performance'].apply(get_rating)
print(employees[['Name', 'Performance', 'Rating']])
print()

print("3. Calculate bonus:")
def calculate_bonus(row):
    base = row['Salary'] * 0.05
    if row['Performance'] >= 90:
        return base * 2
    elif row['Performance'] >= 85:
        return base * 1.5
    else:
        return base

employees['Bonus'] = employees.apply(calculate_bonus, axis=1)
print(employees[['Name', 'Salary', 'Performance', 'Bonus']])
print()

print("4. Promotion eligible:")
employees['Eligible'] = np.where(
    (employees['Years'] >= 3) & (employees['Performance'] >= 85),
    'Yes',
    'No'
)
print(employees[['Name', 'Years', 'Performance', 'Eligible']])
print()

print("5. Department code:")
dept_map = {'Sales': 'SLS', 'IT': 'ITC', 'HR': 'HRD'}
employees['Dept_Code'] = employees['Department'].map(dept_map)
print(employees[['Name', 'Department', 'Dept_Code']])
print()

print("Summary:")
print("Total bonus pool:", employees['Bonus'].sum())
print("Promotion eligible:", (employees['Eligible'] == 'Yes').sum())
print("Outstanding performers:", (employees['Rating'] == 'Outstanding').sum())

Chaining Conditions

code.pyPython
df['Category'] = (
    df['Score']
    .apply(lambda x: 'High' if x >= 80 else 'Medium' if x >= 60 else 'Low')
)

Conditional with Missing Values

code.pyPython
df['Status'] = np.where(
    df['Score'].isnull(),
    'No Data',
    np.where(df['Score'] >= 60, 'Pass', 'Fail')
)

Update Existing Values

code.pyPython
df.loc[df['Score'] < 60, 'Score'] = 60
print("Scores adjusted to minimum 60")

Conditional Aggregation

code.pyPython
high_scores = df[df['Score'] >= 80]['Score'].mean()
low_scores = df[df['Score'] < 80]['Score'].mean()

print("Avg high scores:", high_scores)
print("Avg low scores:", low_scores)

Complex Business Logic

code.pyPython
def assign_territory(row):
    if row['Department'] == 'Sales':
        if row['Years'] >= 5:
            return 'Regional'
        else:
            return 'Local'
    else:
        return 'N/A'

df['Territory'] = df.apply(assign_territory, axis=1)

Filter Then Transform

code.pyPython
high_performers = df[df['Performance'] >= 85].copy()
high_performers['Bonus_Multiplier'] = 2.0

low_performers = df[df['Performance'] < 85].copy()
low_performers['Bonus_Multiplier'] = 1.0

result = pd.concat([high_performers, low_performers])

Vectorized Operations

Faster than apply for simple cases.

code.pyPython
df['Bonus'] = df['Salary'] * 0.1 * (df['Performance'] / 100)

Much faster than:

code.pyPython
def calc_bonus(row):
    return row['Salary'] * 0.1 * (row['Performance'] / 100)

df['Bonus'] = df.apply(calc_bonus, axis=1)

Key Points to Remember

np.where(condition, value_if_true, value_if_false) for simple conditions.

np.select() better for multiple conditions. More readable than nested np.where.

apply() with axis=1 applies function to each row. Use for complex logic.

Lambda functions good for simple one-line conditions.

map() converts values using dictionary mapping.

Vectorized operations faster than apply when possible.

Common Mistakes

Mistake 1: Wrong axis

code.pyPython
df.apply(func)  # Applies to columns (axis=0)
df.apply(func, axis=1)  # Applies to rows - what you usually want

Mistake 2: Modifying filtered DataFrame

code.pyPython
df[df['Score'] > 80]['Grade'] = 'A'  # May not work!
df.loc[df['Score'] > 80, 'Grade'] = 'A'  # Correct

Mistake 3: Not using copy

code.pyPython
subset = df[df['Score'] > 80]
subset['New'] = 1  # Warning!

subset = df[df['Score'] > 80].copy()  # Safe

Mistake 4: Slow apply when vectorized possible

code.pyPython
df['Double'] = df['Score'].apply(lambda x: x * 2)  # Slow
df['Double'] = df['Score'] * 2  # Fast

Mistake 5: Forgetting default in np.select

code.pyPython
result = np.select(conditions, choices)  # NaN for unmatched
result = np.select(conditions, choices, default='Other')  # Better

What's Next?

You now know advanced conditional filtering. Next, you'll learn about sorting data - ordering DataFrames by values and custom criteria.