Boolean Indexing and Filtering

Boolean Masks

A boolean mask is an array of True/False values used to filter another array.

import numpy as np

numbers = np.array([10, 25, 30, 15, 40])
mask = numbers > 20
print("Mask:", mask)
print("Filtered:", numbers[mask])

Output:

Mask: [False  True  True False  True]
Filtered: [25 30 40]

How it works: True positions are kept, False positions are filtered out.
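To make the keep/drop rule concrete, here is a small sketch that writes the mask by hand. The name manual_mask is illustrative only. The mask needs one True/False entry per element of the array it filters.

```python
import numpy as np

numbers = np.array([10, 25, 30, 15, 40])

# A mask written by hand behaves the same as one built from a comparison.
manual_mask = np.array([False, True, True, False, True])
comparison_mask = numbers > 20

print(manual_mask.dtype)                              # bool
print(np.array_equal(manual_mask, comparison_mask))   # True
print(numbers[manual_mask].tolist())                  # [25, 30, 40]
```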

Direct Filtering

You don't need to create the mask separately; put the condition directly inside the brackets.

import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72])
high_scores = scores[scores > 80]
print("Scores above 80:", high_scores)

Output:

Scores above 80: [85 92 95]

Multiple Conditions

AND (&)

Both conditions must be true.

import numpy as np

prices = np.array([45, 32, 67, 28, 51, 39])
mid_range = prices[(prices >= 30) & (prices <= 50)]
print("Prices 30-50:", mid_range)

Output:

Prices 30-50: [45 32 39]

OR (|)

At least one condition must be true.

import numpy as np

temps = np.array([72, 85, 68, 90, 75])
extreme = temps[(temps < 70) | (temps > 80)]
print("Extreme temps:", extreme)

Output:

Extreme temps: [85 68 90]

NOT (~)

Inverts the condition.

import numpy as np

numbers = np.array([1, 2, 3, 4, 5])
not_three = numbers[numbers != 3]
print("Not 3:", not_three)

not_small = numbers[~(numbers < 3)]
print("Not small:", not_small)

Output:

Not 3: [1 2 4 5]
Not small: [3 4 5]

Counting Matches

import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72, 88])

passing = scores >= 70
count = np.sum(passing)
print("Passing students:", count)

percentage = (count / len(scores)) * 100
print("Pass rate:", round(percentage, 1), "percent")

Why sum works: True counts as 1, False as 0.
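The sketch below illustrates that rule by counting the same mask three ways. np.count_nonzero is not used elsewhere in this lesson; it is shown only as an equivalent way to count True values.

```python
import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72, 88])
passing = scores >= 70

# Booleans behave like 0 and 1 in arithmetic, so both counts agree.
count_sum = int(np.sum(passing))
count_nonzero = int(np.count_nonzero(passing))

# The mean of a boolean array is the fraction of True values.
pass_rate = float(np.mean(passing)) * 100

print(count_sum, count_nonzero)   # 6 6
print(round(pass_rate, 1))        # 85.7
```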

Finding Positions

import numpy as np

temps = np.array([72, 68, 75, 70, 73])
cold_indices = np.where(temps < 70)[0]
print("Cold day indices:", cold_indices)
print("Cold temps:", temps[cold_indices])

Output:

Cold day indices: [1]
Cold temps: [68]
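The [0] is needed because np.where with a single argument returns a tuple holding one index array per dimension. As an alternative sketch, np.flatnonzero returns the index array directly. The warm_indices name and the 70 threshold are illustrative.

```python
import numpy as np

temps = np.array([72, 68, 75, 70, 73])

# np.flatnonzero returns the indices of True values directly (no [0] needed).
warm_indices = np.flatnonzero(temps > 70)

print(warm_indices.tolist())          # [0, 2, 4]
print(temps[warm_indices].tolist())   # [72, 75, 73]
```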

Conditional Replacement

Replace values that meet a condition.

import numpy as np

scores = np.array([78, 65, 92, 58, 88])
scores[scores < 70] = 70
print("After curve:", scores)

Output:

After curve: [78 70 92 70 88]

Use case: Apply minimum grade, cap maximum values, fix outliers.
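As a sketch of the "cap maximum values" case, the example below uses made-up scores with one value above 100. Boolean assignment changes the array in place, so it works on a copy here. np.clip is shown as one possible alternative that applies a floor and a ceiling together.

```python
import numpy as np

scores = np.array([78, 65, 104, 58, 88])  # made-up data with one value over 100

# Boolean assignment: cap anything above 100.
capped = scores.copy()          # work on a copy so the original is untouched
capped[capped > 100] = 100

# np.clip applies a minimum and a maximum in one call and returns a new array.
clipped = np.clip(scores, 70, 100)

print(capped.tolist())    # [78, 65, 100, 58, 88]
print(clipped.tolist())   # [78, 70, 100, 70, 88]
print(scores.tolist())    # [78, 65, 104, 58, 88]
```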

Using np.where for Replacement

import numpy as np

scores = np.array([85, 92, 78, 95, 88])
grades = np.where(scores >= 90, "A", "B")
print(grades)

Output: ['B' 'A' 'B' 'A' 'B']

Syntax: np.where(condition, value_if_true, value_if_false)
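Two common variations on this syntax, sketched with made-up scores: passing the array itself as value_if_false keeps unmatched values as they are, and nesting a second np.where handles more than two outcomes. The grade cutoffs are assumptions for illustration.

```python
import numpy as np

scores = np.array([85, 92, 78, 65, 88])

# Keep the original value where the condition is False.
curved = np.where(scores < 70, 70, scores)

# Nest np.where calls for more than two outcomes.
grades = np.where(scores >= 90, "A", np.where(scores >= 80, "B", "C"))

print(curved.tolist())   # [85, 92, 78, 70, 88]
print(grades.tolist())   # ['B', 'A', 'C', 'C', 'B']
print(scores.tolist())   # [85, 92, 78, 65, 88]  (np.where returns a new array)
```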

Complex Conditions

import numpy as np

data = np.array([15, 25, 35, 45, 55, 65])

condition1 = data > 20
condition2 = data < 50
condition3 = data % 2 == 1

result = data[condition1 & condition2 & condition3]
print("Odd, between 20 and 50:", result)

Output:

Odd, between 20 and 50: [25 35 45]
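When the conditions live in a list, for example because they are built in a loop, one possible approach is to combine them with np.logical_and.reduce instead of chaining & by hand. This is a sketch; the conditions list is illustrative.

```python
import numpy as np

data = np.array([15, 25, 35, 45, 55, 65])

conditions = [data > 20, data < 50, data % 2 == 1]

# Reduce a list of masks with AND; equivalent to chaining them with &.
combined = np.logical_and.reduce(conditions)

print(combined.tolist())         # [False, True, True, True, False, False]
print(data[combined].tolist())   # [25, 35, 45]
```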

Filtering 2D Arrays

import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

above_five = matrix[matrix > 5]
print("Values > 5:", above_five)

Output:

Values > 5: [6 7 8 9]

Note: A boolean mask applied to a 2D array returns a 1D array of the matching values.
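If the 2D layout matters, two options are sketched here: np.where with three arguments keeps the shape by filling non-matching cells (the filler 0 is an arbitrary choice), and np.argwhere reports where the matches are.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.where with three arguments keeps the 2D shape; 0 is an arbitrary filler.
kept_shape = np.where(matrix > 5, matrix, 0)

# np.argwhere returns one (row, column) pair per matching element.
positions = np.argwhere(matrix > 5)

print(kept_shape.tolist())   # [[0, 0, 0], [0, 0, 6], [7, 8, 9]]
print(positions.tolist())    # [[1, 2], [2, 0], [2, 1], [2, 2]]
```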

Filter Rows

import numpy as np

data = np.array([[10, 25], [30, 15], [20, 40]])

high_first_col = data[data[:, 0] > 15]
print("Rows where first column > 15:")
print(high_first_col)

Output:

[[30 15]
 [20 40]]
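Columns can be filtered the same way by putting the mask on the second axis. A minimal sketch, assuming we want columns whose largest value exceeds 30 (an arbitrary threshold):

```python
import numpy as np

data = np.array([[10, 25], [30, 15], [20, 40]])

# Build a mask with one entry per column, then apply it on the column axis.
col_mask = data.max(axis=0) > 30      # column maxima are 30 and 40
selected_cols = data[:, col_mask]

print(col_mask.tolist())        # [False, True]
print(selected_cols.tolist())   # [[25], [15], [40]]
```

The result stays 2D because whole columns are kept.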

Practice Example

The scenario: Analyze and filter student performance data.

import numpy as np

student_scores = np.array([78, 85, 92, 68, 95, 72, 88, 76, 90, 82])

print("All scores:", student_scores)
print("Total students:", len(student_scores))
print()

passing = student_scores >= 70
print("Passing scores:", student_scores[passing])
print("Passing count:", np.sum(passing))
print("Pass rate:", round(np.mean(passing) * 100, 1), "percent")
print()

excellent = student_scores[student_scores >= 90]
print("Excellent (90+):", excellent)
print("Count:", len(excellent))
print()

needs_help = student_scores[student_scores < 75]
print("Needs help (<75):", needs_help)
print("Count:", len(needs_help))
print()

mid_range = student_scores[(student_scores >= 75) & (student_scores < 90)]
print("Mid range (75-89):", mid_range)
print()

above_average = student_scores[student_scores > student_scores.mean()]
print("Above average:", above_average)
print("Average:", round(student_scores.mean(), 1))
print()

outliers = student_scores[(student_scores < 70) | (student_scores > 95)]
print("Outliers:", outliers)

What this analysis shows:

All student scores
How many passed (70+)
Excellent performers (90+)
Students needing help (<75)
Mid-range students
Above-average performers
Outliers (very low or very high)

Using isin()

Check if values are in a list.

import numpy as np

grades = np.array(["A", "B", "C", "A", "D", "B", "A"])
high_grades = np.isin(grades, ["A", "B"])
print("High grades:", grades[high_grades])

Output:

High grades: ['A' 'B' 'A' 'B' 'A']
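To select everything that is not in the list, np.isin accepts invert=True, which gives the same mask as applying ~ to the result. A short sketch with the same grades:

```python
import numpy as np

grades = np.array(["A", "B", "C", "A", "D", "B", "A"])

# invert=True flips the membership test; same result as ~np.isin(...).
low_mask = np.isin(grades, ["A", "B"], invert=True)

print(grades[low_mask].tolist())   # ['C', 'D']
```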

Masking Invalid Data

import numpy as np

data = np.array([10, -999, 25, -999, 30])
mask = data != -999
valid_data = data[mask]
print("Valid data:", valid_data)
print("Average:", valid_data.mean())

Use case: Remove placeholder values before calculations.
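Missing values are often stored as NaN rather than a placeholder like -999. The sketch below assumes that variant. A != comparison does not help here, because NaN is not equal to anything, including itself, so the mask comes from np.isnan.

```python
import numpy as np

# Assumed variant: missing readings stored as NaN instead of -999.
data = np.array([10.0, np.nan, 25.0, np.nan, 30.0])

# Find NaN with np.isnan, then invert the mask to keep the valid entries.
valid = data[~np.isnan(data)]
average = round(float(valid.mean()), 2)

print(valid.tolist())   # [10.0, 25.0, 30.0]
print(average)          # 21.67
```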

Selecting Random Subset

import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = np.random.random(len(data)) > 0.5
sample = data[mask]
print("Random sample:", sample)

What this does: Keeps each value independently with probability 0.5, so the sample changes from run to run and contains about half the values on average.
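If the sample needs to be repeatable, one option is a seeded generator from np.random.default_rng. This is a sketch; the seed 42 is arbitrary.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# A seeded Generator makes the random mask repeatable; 42 is an arbitrary seed.
rng = np.random.default_rng(42)
mask = rng.random(len(data)) > 0.5
sample = data[mask]

# Re-creating the generator with the same seed reproduces the same sample.
rng_again = np.random.default_rng(42)
sample_again = data[rng_again.random(len(data)) > 0.5]

print("Random sample:", sample)
print(np.array_equal(sample, sample_again))   # True
```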

Key Points to Remember

Boolean indexing uses True/False arrays to filter data. Create with comparison operators.

Combine conditions with & (and), | (or), ~ (not). Always wrap each condition in parentheses, because & and | bind more tightly than comparison operators.

np.sum() on a boolean array counts the True values. np.mean() gives the proportion that are True.

np.where() with one argument finds positions; with three arguments it chooses between two values element by element.

Filtering 2D arrays returns 1D results unless you filter entire rows/columns.

Common Mistakes

Mistake 1: Using "and" instead of &

arr[(arr > 5) and (arr < 10)]  # Error!
arr[(arr > 5) & (arr < 10)]  # Correct

Mistake 2: Forgetting parentheses

arr[arr > 5 & arr < 10]  # Wrong!
arr[(arr > 5) & (arr < 10)]  # Correct
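Why the parentheses matter: & binds more tightly than > and <, so the unparenthesized form is read as arr > (5 & arr) < 10. The sketch below, using a made-up array, shows that this raises an error rather than filtering.

```python
import numpy as np

arr = np.array([3, 6, 9, 12])

# Without parentheses this is a chained comparison on arrays, which needs
# a single True/False value and therefore raises ValueError.
try:
    arr[arr > 5 & arr < 10]
    failed = False
except ValueError:
    failed = True

print(failed)                                  # True
print(arr[(arr > 5) & (arr < 10)].tolist())    # [6, 9]
```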

Mistake 3: Counting the slow way

len(arr[arr > 5])  # Works, but builds a filtered copy just to count it
np.sum(arr > 5)  # Counts the True values in the mask directly

Mistake 4: Modifying a filtered copy

filtered = arr[arr > 5]
filtered[0] = 99  # Doesn't change original arr
arr[arr > 5] = 99  # This changes original
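A runnable sketch of the same point, with a made-up array. np.shares_memory is used here only to confirm that the filtered result is a separate array.

```python
import numpy as np

arr = np.array([2, 7, 4, 9])

filtered = arr[arr > 5]          # boolean indexing returns a new array
filtered[0] = 99

shared = bool(np.shares_memory(arr, filtered))
print(shared)                    # False
print(arr.tolist())              # [2, 7, 4, 9]

arr[arr > 5] = 99                # assignment through the mask edits arr itself
print(arr.tolist())              # [2, 99, 4, 99]
```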

What's Next?

You now know boolean indexing and filtering. Next, you'll learn statistical functions - advanced statistics and analysis with NumPy.