Boolean Indexing and Filtering
Learn to filter and select data using boolean conditions
Boolean Masks
A boolean mask is an array of True/False values used to filter another array.
import numpy as np
numbers = np.array([10, 25, 30, 15, 40])
mask = numbers > 20
print("Mask:", mask)
print("Filtered:", numbers[mask])
Output:
Mask: [False  True  True False  True]
Filtered: [25 30 40]
How it works: True positions are kept, False positions are filtered out.
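To make the "kept vs. filtered out" rule concrete, the short sketch below inspects the mask itself. It reuses the numbers array from the example above; the variable name kept is only illustrative.

```python
import numpy as np

numbers = np.array([10, 25, 30, 15, 40])
mask = numbers > 20

# The mask is a boolean array with the same shape as the array it filters.
print(mask.dtype)                   # bool
print(mask.shape == numbers.shape)  # True

# One element is kept for every True entry in the mask.
kept = numbers[mask]
print(len(kept) == mask.sum())      # True
```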
Direct Filtering
You don't need to create the mask separately. Put the condition directly inside the brackets.
import numpy as np
scores = np.array([78, 85, 92, 68, 95, 72])
high_scores = scores[scores > 80]
print("Scores above 80:", high_scores)
Output:
Scores above 80: [85 92 95]
Multiple Conditions
AND (&)
Both conditions must be true.
import numpy as np
prices = np.array([45, 32, 67, 28, 51, 39])
mid_range = prices[(prices >= 30) & (prices <= 50)]
print("Prices 30-50:", mid_range)
Output:
Prices 30-50: [45 32 39]
OR (|)
At least one condition must be true.
import numpy as np
temps = np.array([72, 85, 68, 90, 75])
extreme = temps[(temps < 70) | (temps > 80)]
print("Extreme temps:", extreme)
Output:
Extreme temps: [85 68 90]
NOT (~)
Inverts the condition.
import numpy as np
numbers = np.array([1, 2, 3, 4, 5])
not_three = numbers[numbers != 3]
print("Not 3:", not_three)
not_small = numbers[~(numbers < 3)]
print("Not small:", not_small)
Output:
Not 3: [1 2 4 5]
Not small: [3 4 5]
Counting Matches
import numpy as np
scores = np.array([78, 85, 92, 68, 95, 72, 88])
passing = scores >= 70
count = np.sum(passing)
print("Passing students:", count)
percentage = (count / len(scores)) * 100
print("Pass rate:", round(percentage, 1), "percent")
Why sum works: True counts as 1 and False as 0, so the sum equals the number of True values.
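The True-as-1 rule can be checked directly. This sketch, using the same scores, converts the mask to integers and shows two equivalent ways to count and to get the pass rate; np.count_nonzero is an alternative to np.sum, not something the example above requires.

```python
import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72, 88])
passing = scores >= 70

# Viewing the mask as integers shows what np.sum adds up.
as_ints = passing.astype(int)
print(as_ints)                      # [1 1 1 0 1 1 1]

# np.count_nonzero counts the True entries, same result as np.sum.
count = np.count_nonzero(passing)
print(count)                        # 6

# The mean of a boolean mask is the fraction of True values.
rate = passing.mean()
print(round(rate * 100, 1), "percent")
```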
Finding Positions
import numpy as np
temps = np.array([72, 68, 75, 70, 73])
cold_indices = np.where(temps < 70)[0]
print("Cold day indices:", cold_indices)
print("Cold temps:", temps[cold_indices])
Output:
Cold day indices: [1]
Cold temps: [68]
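The [0] in the example is needed because np.where with a single argument returns a tuple with one index array per dimension. The sketch below shows that, along with np.flatnonzero as a 1D shortcut; the small grid array is made up for illustration.

```python
import numpy as np

temps = np.array([72, 68, 75, 70, 73])

# With only a condition, np.where returns a tuple of index arrays.
result = np.where(temps < 70)
print(len(result))        # 1 (one entry per dimension)
print(result[0])          # [1]

# For 1D data, np.flatnonzero gives the index array directly.
flat = np.flatnonzero(temps < 70)
print(flat)               # [1]

# For 2D data, the tuple holds row indices and column indices.
grid = np.array([[72, 68], [65, 80]])  # hypothetical data
rows, cols = np.where(grid < 70)
print(rows, cols)         # [0 1] [1 0]
```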
Conditional Replacement
Replace values that meet a condition by assigning through the mask. This modifies the array in place.
import numpy as np
scores = np.array([78, 65, 92, 58, 88])
scores[scores < 70] = 70
print("After curve:", scores)
Output:
After curve: [78 70 92 70 88]
Use case: Apply minimum grade, cap maximum values, fix outliers.
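For the floor and cap use cases, NumPy also offers np.maximum, np.minimum, and np.clip. The sketch below is one possible alternative to mask assignment; the limits 70 and 90 are assumptions chosen for illustration. Unlike mask assignment, these calls return new arrays and leave scores unchanged.

```python
import numpy as np

scores = np.array([78, 65, 92, 58, 88])

# Minimum grade: every value below 70 becomes 70.
floored = np.maximum(scores, 70)
print(floored)    # [78 70 92 70 88]

# Maximum cap: every value above 90 becomes 90.
capped = np.minimum(scores, 90)
print(capped)     # [78 65 90 58 88]

# Both limits at once.
clipped = np.clip(scores, 70, 90)
print(clipped)    # [78 70 90 70 88]

# The original array is untouched.
print(scores)     # [78 65 92 58 88]
```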
Using np.where for Replacement
import numpy as np
scores = np.array([85, 92, 78, 95, 88])
grades = np.where(scores >= 90, "A", "B")
print(grades)
Output:
['B' 'A' 'B' 'A' 'B']
Syntax: np.where(condition, value_if_true, value_if_false)
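The three-argument form handles two outcomes. When more are needed, one option is to nest a second np.where in the value_if_false slot, as sketched below. The 80-point cutoff for "B" and the "C" grade are assumptions added for this illustration.

```python
import numpy as np

scores = np.array([85, 92, 78, 95, 88])

# Outer condition is checked first; the inner np.where
# decides between "B" and "C" for everything below 90.
grades = np.where(scores >= 90, "A",
                  np.where(scores >= 80, "B", "C"))
print(grades)     # ['B' 'A' 'C' 'A' 'B']
```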
Complex Conditions
import numpy as np
data = np.array([15, 25, 35, 45, 55, 65])
condition1 = data > 20
condition2 = data < 50
condition3 = data % 2 == 1
result = data[condition1 & condition2 & condition3]
print("Odd, between 20 and 50:", result)
Output:
Odd, between 20 and 50: [25 35 45]
Filtering 2D Arrays
import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
above_five = matrix[matrix > 5]
print("Values > 5:", above_five)
Output:
Values > 5: [6 7 8 9]
Note: Masking a 2D array with a mask of the same shape returns a 1D array of the matching values, in row-by-row order.
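If the 2D shape needs to be preserved, one approach is to replace the non-matching values instead of dropping them. The sketch below uses three-argument np.where with 0 as the fill value; the fill value is an arbitrary choice for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Keep values above 5, put 0 everywhere else; shape stays (3, 3).
kept_shape = np.where(matrix > 5, matrix, 0)
print(kept_shape)
# [[0 0 0]
#  [0 0 6]
#  [7 8 9]]
print(kept_shape.shape)   # (3, 3)
```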
Filter Rows
import numpy as np
data = np.array([[10, 25], [30, 15], [20, 40]])
high_first_col = data[data[:, 0] > 15]
print("Rows where first column > 15:")
print(high_first_col)
Output:
Rows where first column > 15:
[[30 15]
 [20 40]]
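The same idea extends to conditions that look at a whole row or to filtering columns. The sketch below reuses the data array; the thresholds 25 and 30 are made up for illustration.

```python
import numpy as np

data = np.array([[10, 25], [30, 15], [20, 40]])

# Keep rows where ANY value in the row exceeds 25.
row_mask = (data > 25).any(axis=1)
print(row_mask)           # [False  True  True]
print(data[row_mask])
# [[30 15]
#  [20 40]]

# Keep columns whose maximum exceeds 30 (mask goes in the column slot).
col_mask = data.max(axis=0) > 30
print(data[:, col_mask])
# [[25]
#  [15]
#  [40]]
```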
Practice Example
The scenario: Analyze and filter student performance data.
import numpy as np
student_scores = np.array([78, 85, 92, 68, 95, 72, 88, 76, 90, 82])
print("All scores:", student_scores)
print("Total students:", len(student_scores))
print()
passing = student_scores >= 70
print("Passing scores:", student_scores[passing])
print("Passing count:", np.sum(passing))
print("Pass rate:", round(np.mean(passing) * 100, 1), "percent")
print()
excellent = student_scores[student_scores >= 90]
print("Excellent (90+):", excellent)
print("Count:", len(excellent))
print()
needs_help = student_scores[student_scores < 75]
print("Needs help (<75):", needs_help)
print("Count:", len(needs_help))
print()
mid_range = student_scores[(student_scores >= 75) & (student_scores < 90)]
print("Mid range (75-89):", mid_range)
print()
above_average = student_scores[student_scores > student_scores.mean()]
print("Above average:", above_average)
print("Average:", round(student_scores.mean(), 1))
print()
outliers = student_scores[(student_scores < 70) | (student_scores > 95)]
print("Outliers:", outliers)
What this analysis shows:
- All student scores
- How many passed (70+)
- Excellent performers (90+)
- Students needing help (<75)
- Mid-range students
- Above-average performers
- Outliers (very low or very high)
Using isin()
Check if values are in a list.
import numpy as np
grades = np.array(["A", "B", "C", "A", "D", "B", "A"])
high_grades = np.isin(grades, ["A", "B"])
print("High grades:", grades[high_grades])
Output:
High grades: ['A' 'B' 'A' 'B' 'A']
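To select the values that are not in the list, np.isin accepts invert=True, which gives the same mask as applying ~ to the result. A brief sketch with the same grades array:

```python
import numpy as np

grades = np.array(["A", "B", "C", "A", "D", "B", "A"])

# invert=True flips the membership test.
low_grades = grades[np.isin(grades, ["A", "B"], invert=True)]
print(low_grades)         # ['C' 'D']

# Equivalent: negate the mask with ~.
same_result = grades[~np.isin(grades, ["A", "B"])]
print(same_result)        # ['C' 'D']
```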
Masking Invalid Data
import numpy as np
data = np.array([10, -999, 25, -999, 30])
mask = data != -999
valid_data = data[mask]
print("Valid data:", valid_data)
print("Average:", valid_data.mean())
Use case: Remove placeholder values before calculations.
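When the positions of the readings matter, an alternative sketch is to keep the array length and turn the placeholders into NaN, then use NaN-aware functions such as np.nanmean. This is one option, not a replacement for the filtering shown above; note that the result becomes a float array because NaN is a float value.

```python
import numpy as np

data = np.array([10, -999, 25, -999, 30])

# Replace the placeholder with NaN; the length and positions are preserved.
cleaned = np.where(data == -999, np.nan, data)
print(cleaned.dtype)               # float64

# Count the missing entries and average the rest.
missing = np.isnan(cleaned).sum()
print("Missing:", missing)         # Missing: 2
average = np.nanmean(cleaned)
print("Average:", round(average, 2))
```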
Selecting Random Subset
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = np.random.random(len(data)) > 0.5
sample = data[mask]
print("Random sample:", sample)
What this does: Randomly selects about half the values.
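The sample above changes on every run and its size varies. If repeatable results or an exact sample size are needed, one possible sketch uses a seeded generator from np.random.default_rng. The seed 42 and the sample size 5 are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator gives the same random numbers on every run.
rng = np.random.default_rng(seed=42)

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Same mask technique: each value is kept with probability 0.5.
mask = rng.random(len(data)) > 0.5
sample = data[mask]
print("Random sample:", sample)

# For an exact sample size, draw without replacement instead.
exact = rng.choice(data, size=5, replace=False)
print("Exactly five:", exact)
```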
Key Points to Remember
Boolean indexing uses True/False arrays to filter data. Create with comparison operators.
Combine conditions with & (and), | (or), ~ (not). Always use parentheses around conditions.
np.sum() on a boolean array counts the True values. np.mean() gives the proportion that are True.
np.where() finds positions or does conditional replacement.
Filtering 2D arrays returns 1D results unless you filter entire rows/columns.
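The operator and parentheses rules can be verified with a small experiment. In this sketch (the arr values are made up), the correct form works, while both the "and" form and the form without parentheses raise ValueError, because Python ends up asking for the truth value of a whole array.

```python
import numpy as np

arr = np.array([3, 6, 8, 12])

# Correct: element-wise & with each condition in parentheses.
correct = arr[(arr > 5) & (arr < 10)]
print(correct)            # [6 8]

errors = []

# "and" needs a single True/False, which a multi-element array cannot give.
try:
    arr[(arr > 5) and (arr < 10)]
except ValueError:
    errors.append("and")

# Without parentheses, & binds first: arr > (5 & arr) < 10,
# a chained comparison that hits the same problem.
try:
    arr[arr > 5 & arr < 10]
except ValueError:
    errors.append("no parentheses")

print("Raised ValueError:", errors)
```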
Common Mistakes
Mistake 1: Using "and" instead of &
arr[(arr > 5) and (arr < 10)] # Error! Raises ValueError (ambiguous truth value)
arr[(arr > 5) & (arr < 10)] # Correct
Mistake 2: Forgetting parentheses
arr[arr > 5 & arr < 10] # Wrong! & binds tighter than > and <, so this is arr > (5 & arr) < 10
arr[(arr > 5) & (arr < 10)] # Correct
Mistake 3: Counting wrong
len(arr[arr > 5]) # Works, but builds the filtered array just to count it
np.sum(arr > 5) # Counts True values directly
Mistake 4: Modifying filtered copy
filtered = arr[arr > 5] # Boolean indexing returns a copy
filtered[0] = 99 # Doesn't change original arr
arr[arr > 5] = 99 # This changes original
What's Next?
You now know boolean indexing and filtering. Next, you'll learn statistical functions: advanced statistics and analysis with NumPy.