Conditional Filtering

np.where() for Conditional Values

Create new column based on condition.

code.py

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Score': [85, 92, 78, 95]
})

df['Grade'] = np.where(df['Score'] >= 90, 'A', 'B')
print(df)

Output:

    Name  Score Grade
0   John     85     B
1  Sarah     92     A
2   Mike     78     B
3   Emma     95     A

How it works:

If Score >= 90, set 'A'
Otherwise, set 'B'

Multiple Conditions with np.where

Nest conditions for multiple cases.

code.py

df['Grade'] = np.where(df['Score'] >= 90, 'A',
              np.where(df['Score'] >= 80, 'B',
              np.where(df['Score'] >= 70, 'C', 'F')))

print(df)

What this creates:

90+: A
80-89: B
70-79: C
Below 70: F

np.select() for Better Readability

Better for many conditions.

code.py

conditions = [
    df['Score'] >= 90,
    df['Score'] >= 80,
    df['Score'] >= 70,
    df['Score'] >= 60
]

choices = ['A', 'B', 'C', 'D']

df['Grade'] = np.select(conditions, choices, default='F')
print(df)

Much cleaner than nested np.where!

apply() with Custom Function

Apply function to each row.

code.py

def categorize_score(row):
    score = row['Score']
    if score >= 90:
        return 'Excellent'
    elif score >= 75:
        return 'Good'
    elif score >= 60:
        return 'Pass'
    else:
        return 'Fail'

df['Category'] = df.apply(categorize_score, axis=1)
print(df)

axis=1 means apply to each row.

Lambda Functions

Short inline functions.

code.py

df['Passed'] = df['Score'].apply(lambda x: 'Yes' if x >= 60 else 'No')
print(df)

Same as:

code.py

def check_pass(score):
    return 'Yes' if score >= 60 else 'No'

df['Passed'] = df['Score'].apply(check_pass)

Multiple Column Conditions

code.py

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Written': [85, 92, 78, 88],
    'Practical': [90, 85, 70, 95]
})

def final_grade(row):
    avg = (row['Written'] + row['Practical']) / 2
    if avg >= 90:
        return 'A'
    elif avg >= 80:
        return 'B'
    elif avg >= 70:
        return 'C'
    else:
        return 'F'

df['Final'] = df.apply(final_grade, axis=1)
print(df)

Conditional Replacement

code.py

df = pd.DataFrame({
    'Status': ['active', 'inactive', 'pending', 'active']
})

df['Status'] = df['Status'].replace({
    'active': 1,
    'inactive': 0,
    'pending': -1
})

print(df)

map() Method

Map values from dictionary.

code.py

df = pd.DataFrame({
    'Size': ['S', 'M', 'L', 'XL']
})

size_map = {'S': 'Small', 'M': 'Medium', 'L': 'Large', 'XL': 'Extra Large'}
df['Size_Full'] = df['Size'].map(size_map)
print(df)

Practice Example

The scenario: Categorize employees and calculate bonuses.

code.py

import pandas as pd
import numpy as np

employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David', 'Lisa'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
    'Salary': [50000, 85000, 55000, 65000, 90000, 52000],
    'Years': [3, 8, 4, 6, 10, 2],
    'Performance': [85, 92, 78, 88, 95, 80]
})

print("Original data:")
print(employees)
print()

print("1. Categorize by experience:")
conditions = [
    employees['Years'] >= 8,
    employees['Years'] >= 5,
    employees['Years'] >= 2
]
choices = ['Senior', 'Mid-Level', 'Junior']
employees['Level'] = np.select(conditions, choices, default='Entry')
print(employees[['Name', 'Years', 'Level']])
print()

print("2. Performance rating:")
def get_rating(score):
    if score >= 90:
        return 'Outstanding'
    elif score >= 85:
        return 'Excellent'
    elif score >= 75:
        return 'Good'
    else:
        return 'Needs Improvement'

employees['Rating'] = employees['Performance'].apply(get_rating)
print(employees[['Name', 'Performance', 'Rating']])
print()

print("3. Calculate bonus:")
def calculate_bonus(row):
    base = row['Salary'] * 0.05
    if row['Performance'] >= 90:
        return base * 2
    elif row['Performance'] >= 85:
        return base * 1.5
    else:
        return base

employees['Bonus'] = employees.apply(calculate_bonus, axis=1)
print(employees[['Name', 'Salary', 'Performance', 'Bonus']])
print()

print("4. Promotion eligible:")
employees['Eligible'] = np.where(
    (employees['Years'] >= 3) & (employees['Performance'] >= 85),
    'Yes',
    'No'
)
print(employees[['Name', 'Years', 'Performance', 'Eligible']])
print()

print("5. Department code:")
dept_map = {'Sales': 'SLS', 'IT': 'ITC', 'HR': 'HRD'}
employees['Dept_Code'] = employees['Department'].map(dept_map)
print(employees[['Name', 'Department', 'Dept_Code']])
print()

print("Summary:")
print("Total bonus pool:", employees['Bonus'].sum())
print("Promotion eligible:", (employees['Eligible'] == 'Yes').sum())
print("Outstanding performers:", (employees['Rating'] == 'Outstanding').sum())

Chaining Conditions

code.py

df['Category'] = (
    df['Score']
    .apply(lambda x: 'High' if x >= 80 else 'Medium' if x >= 60 else 'Low')
)

Conditional with Missing Values

code.py

df['Status'] = np.where(
    df['Score'].isnull(),
    'No Data',
    np.where(df['Score'] >= 60, 'Pass', 'Fail')
)

Update Existing Values

code.py

df.loc[df['Score'] < 60, 'Score'] = 60
print("Scores adjusted to minimum 60")

Conditional Aggregation

code.py

high_scores = df[df['Score'] >= 80]['Score'].mean()
low_scores = df[df['Score'] < 80]['Score'].mean()

print("Avg high scores:", high_scores)
print("Avg low scores:", low_scores)

Complex Business Logic

code.py

def assign_territory(row):
    if row['Department'] == 'Sales':
        if row['Years'] >= 5:
            return 'Regional'
        else:
            return 'Local'
    else:
        return 'N/A'

df['Territory'] = df.apply(assign_territory, axis=1)

Filter Then Transform

code.py

high_performers = df[df['Performance'] >= 85].copy()
high_performers['Bonus_Multiplier'] = 2.0

low_performers = df[df['Performance'] < 85].copy()
low_performers['Bonus_Multiplier'] = 1.0

result = pd.concat([high_performers, low_performers])

Vectorized Operations

Faster than apply for simple cases.

code.py

df['Bonus'] = df['Salary'] * 0.1 * (df['Performance'] / 100)

Much faster than:

code.py

def calc_bonus(row):
    return row['Salary'] * 0.1 * (row['Performance'] / 100)

df['Bonus'] = df.apply(calc_bonus, axis=1)

Key Points to Remember

np.where(condition, value_if_true, value_if_false) for simple conditions.

np.select() better for multiple conditions. More readable than nested np.where.

apply() with axis=1 applies function to each row. Use for complex logic.

Lambda functions good for simple one-line conditions.

map() converts values using dictionary mapping.

Vectorized operations faster than apply when possible.

Common Mistakes

Mistake 1: Wrong axis

code.py

df.apply(func)  # Applies to columns (axis=0)
df.apply(func, axis=1)  # Applies to rows - what you usually want

Mistake 2: Modifying filtered DataFrame

code.py

df[df['Score'] > 80]['Grade'] = 'A'  # May not work!
df.loc[df['Score'] > 80, 'Grade'] = 'A'  # Correct

Mistake 3: Not using copy

code.py

subset = df[df['Score'] > 80]
subset['New'] = 1  # Warning!

subset = df[df['Score'] > 80].copy()  # Safe

Mistake 4: Slow apply when vectorized possible

code.py

df['Double'] = df['Score'].apply(lambda x: x * 2)  # Slow
df['Double'] = df['Score'] * 2  # Fast

Mistake 5: Forgetting default in np.select

code.py

result = np.select(conditions, choices)  # NaN for unmatched
result = np.select(conditions, choices, default='Other')  # Better

What's Next?

You now know advanced conditional filtering. Next, you'll learn about sorting data - ordering DataFrames by values and custom criteria.