NumPy Essentials
The foundation of scientific computing in Python: Arrays, broadcasting, and vectorization
What You'll Learn
- Creating NumPy arrays
- Array indexing and slicing
- Vectorized operations (speed!)
- Broadcasting
- Common statistical functions
Getting Started
Why NumPy?
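- Speed: Core routines are implemented in C, so whole-array operations are typically much faster than looping over Python lists.
- Functionality: Huge library of mathematical functions.
- Foundation: Pandas and scikit-learn are built on top of NumPy, and TensorFlow interoperates closely with NumPy arrays.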
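The speed claim is easy to check on your own machine. The following is a minimal sketch, not a rigorous benchmark: the array size, the repeat count, and the helper names `square_with_loop` and `square_vectorized` are illustrative choices, and the measured times will vary by hardware.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
# Explicit 64-bit integers so the squares cannot overflow on any platform
np_arr = np.arange(n, dtype=np.int64)


def square_with_loop():
    """Square every element with a plain Python list comprehension."""
    return [x * x for x in py_list]


def square_vectorized():
    """Square every element with a single NumPy expression."""
    return np_arr * np_arr


# Both approaches produce the same values
same_result = square_with_loop() == square_vectorized().tolist()
print("Same result:", same_result)

# Time 20 repetitions of each approach
loop_time = timeit.timeit(square_with_loop, number=20)
vec_time = timeit.timeit(square_vectorized, number=20)
print(f"Python loop: {loop_time:.4f} s")
print(f"NumPy:       {vec_time:.4f} s")
```

On most machines the NumPy version should finish noticeably faster, because the loop over elements runs in compiled code rather than in the Python interpreter.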
code.py
import numpy as np
# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
print(type(arr)) # <class 'numpy.ndarray'>
Creating Arrays
code.py
# From list
a = np.array([1, 2, 3])
# Zeros and Ones
zeros = np.zeros(5) # [0. 0. 0. 0. 0.]
ones = np.ones((2, 3)) # 2x3 matrix of 1s
# Ranges
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5) # 5 evenly spaced points from 0 to 1, both ends included
# Random
rand = np.random.rand(3, 3) # 3x3 random values [0, 1)
randn = np.random.randn(5) # Standard normal distribution
randint = np.random.randint(0, 10, 5) # 5 random ints [0, 10)
Array Operations
Vectorization (Element-wise operations):
code.py
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Math operations apply to ALL elements at once
print(a + b) # [5 7 9]
print(a * 2) # [2 4 6]
print(a ** 2) # [1 4 9]
print(np.sqrt(a)) # Square root of each element
Broadcasting: NumPy handles operations between arrays of different shapes automatically.
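"Automatically" follows a specific rule: shapes are compared from the rightmost dimension, and two dimensions are compatible when they are equal or one of them is 1. Here is a small sketch of that rule, using made-up arrays named `col` and `row_vals`:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row_vals = np.array([1, 2, 3])     # shape (3,)

# (3, 1) and (3,) broadcast to (3, 3): the column is stretched across
# the columns and the row is stretched down the rows
grid = col + row_vals
print(grid.shape)  # (3, 3)
print(grid)

# Incompatible shapes: trailing dimensions 3 and 2 differ and neither is 1
try:
    np.ones((3, 3)) + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```

When shapes do not line up under this rule, NumPy raises a `ValueError` rather than guessing.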
code.py
matrix = np.ones((3, 3))
row = np.array([1, 2, 3])
# Adds row to EVERY row of matrix
result = matrix + row
Indexing and Slicing
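One detail worth knowing before slicing: a basic slice of a NumPy array is a view onto the same memory, not a copy, so writing through the slice changes the original. A short sketch with an illustrative array named `grid`:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)  # [[0 1 2], [3 4 5], [6 7 8]]

# A basic slice is a view: it shares memory with the original array
first_row = grid[0, :]
first_row[0] = 99
print(grid[0, 0])                        # 99
print(np.shares_memory(first_row, grid)) # True

# Call .copy() when an independent array is needed
safe_row = grid[1, :].copy()
safe_row[0] = -1
print(grid[1, 0])                        # 3 (unchanged)
```

This differs from Python lists, where slicing always creates a new list.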
code.py
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access element [row, col]
print(arr[0, 0]) # 1
print(arr[1, 2]) # 6
# Slicing
print(arr[0, :]) # First row: [1 2 3]
print(arr[:, 1]) # Second column: [2 5 8]
print(arr[0:2, 0:2]) # Top-left 2x2 sub-array
Boolean Indexing (Filtering)
code.py
arr = np.array([1, 10, 20, 3, 40])
# Create mask
mask = arr > 10
print(mask) # [False False True False True]
# Filter array
filtered = arr[arr > 10]
print(filtered) # [20 40]
Statistical Functions
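The reductions in this section work on the whole array by default, but they also accept an `axis` argument for per-column or per-row results. `np.std` additionally takes `ddof`, which switches between the population and sample formulas. A minimal sketch with small hand-checkable arrays (`scores` and `sample` are illustrative names):

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(np.mean(scores))          # 3.5 (all elements)
print(np.mean(scores, axis=0))  # [2.5 3.5 4.5] (one value per column)
print(np.mean(scores, axis=1))  # [2. 5.] (one value per row)

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default ddof=0 divides by N (population standard deviation)
population_std = np.std(sample)
print(population_std)           # 2.0

# ddof=1 divides by N - 1 (sample standard deviation), so it is slightly larger
sample_std = np.std(sample, ddof=1)
print(sample_std)
```

`axis=0` collapses the rows and `axis=1` collapses the columns, which is why `axis=1` yields one result per row.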
code.py
arr = np.random.randn(1000)
print(np.mean(arr)) # Average
print(np.median(arr)) # Median
print(np.std(arr)) # Standard deviation (population formula, ddof=0, by default)
print(np.min(arr)) # Minimum
print(np.max(arr)) # Maximum
print(np.sum(arr)) # Sum
Practice Exercise
code.py
import numpy as np
# 1. Create a 5x5 matrix of random integers between 1 and 100
matrix = np.random.randint(1, 101, (5, 5))
# 2. Find the mean of the entire matrix
print("Mean:", np.mean(matrix))
# 3. Find the max value in each row
print("Row maxes:", np.max(matrix, axis=1))
# 4. Filter values greater than 50
high_values = matrix[matrix > 50]
print("Values > 50 count:", len(high_values))
Next Steps
Now let's apply this power to tabular data with Pandas DataFrames!