What is NumPy for Analysts — Fast Numerical Computing in Python?

Learn NumPy for data analysis. Master arrays, vectorized operations, statistical functions, and numerical computations essential for Python analysts.

Is NumPy for Analysts — Fast Numerical Computing in Python suitable for beginners?

This topic is designed for beginner-level learners. It takes approximately 10 minutes to complete and includes 10 interactive quizzes to test your understanding.

How long does it take to learn NumPy for Analysts — Fast Numerical Computing in Python?

You can complete this topic in about 10 minutes. The topic is part 30 of our comprehensive Data Analytics Learning Path.

NumPy for Data Analysts — Arrays, Vectorization, Statistics | DataPath

🔢

What is NumPy and Why Analysts Need It

NumPy (Numerical Python) is the foundation of Python's scientific computing stack. It provides fast, memory-efficient arrays and mathematical functions — the engine behind Pandas, scikit-learn, and most data science libraries.

Why NumPy Matters for Analysts

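Speed: NumPy operations are often 10-100x faster than equivalent loops over Python lists because the work runs in optimized, compiled C code. Element-wise operations on millions of numbers typically finish in milliseconds.

Memory Efficiency: NumPy arrays use less memory than Python lists. On 64-bit CPython, a list of 1 million distinct integers needs roughly 36 bytes per element (an 8-byte pointer plus a 28-byte int object), while an int64 array needs 8 bytes per element, so the list takes about 4-5x more memory.

Vectorization: Apply operations to entire arrays without writing loops. Instead of iterating through 1 million values in Python, you write one expression and NumPy runs the loop in compiled code.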

code.pyPython

import numpy as np

# Python list approach (slow)
amounts = [2500, 3200, 1800, 4100, 2900]
with_gst = []
for amount in amounts:
    with_gst.append(amount * 1.18)

# NumPy array approach (fast, clean)
amounts = np.array([2500, 3200, 1800, 4100, 2900])
with_gst = amounts * 1.18  # Vectorized operation — all at once
print(with_gst)  # [2950. 3776. 2124. 4838. 3422.]
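
The memory difference described above can be checked on your own machine. The sketch below is illustrative only: it assumes 64-bit CPython, and the exact ratio varies with the Python version and the size of the integers stored.

```python
import sys
import numpy as np

n = 1_000_000
# Start at 1000 so every value is a separate int object rather than a cached small int.
py_list = list(range(1000, 1000 + n))
arr = np.arange(1000, 1000 + n, dtype=np.int64)

# A list stores pointers to separate int objects, so count the container and the objects.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = arr.nbytes  # raw data buffer: 8 bytes per int64 element

print(f"list:  {list_bytes / 1e6:.1f} MB")
print(f"array: {array_bytes / 1e6:.1f} MB")
print(f"ratio: {list_bytes / array_bytes:.1f}x")
```

Choosing a smaller dtype such as np.int32 or np.int16 widens the gap further when the values fit in that range.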

When to Use NumPy vs Pandas

Use NumPy when:

You need pure numerical operations (math, statistics, linear algebra)
You're working with multi-dimensional arrays (matrices, images, tensors)
Performance is critical and you don't need labeled rows/columns

Use Pandas when:

You're working with tabular data (rows and columns with labels)
You need to merge, group, or pivot data
You want to handle missing data elegantly

In Practice: Most analysts use both — NumPy powers Pandas under the hood, and Pandas makes NumPy easier to use for tabular data.

Think of it this way...

If Pandas is Excel with programming, NumPy is a high-performance calculator. Pandas gives you tables and labels; NumPy gives you raw speed and mathematical power.

📦

NumPy Arrays — The Core Data Structure

A NumPy array is a grid of values, all of the same type. Unlike Python lists, arrays are fixed-size and homogeneous (all elements must be the same data type).
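
To see what "homogeneous" means in practice, this short sketch shows how NumPy picks one data type for the whole array when the inputs are mixed. The variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])    # the integers are upcast to float
text = np.array([1, "two", 3])   # every element becomes a string

print(ints.dtype.kind)   # i (integer)
print(mixed.dtype)       # float64
print(text.dtype.kind)   # U (Unicode string)

# Writing a float into an integer array silently drops the fractional part.
ints[0] = 9.99
print(ints)  # [9 2 3]
```

The last lines are a common surprise for analysts: if a column may hold decimals, create the array as float from the start (for example with dtype=float).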

Creating Arrays

code.pyPython

import numpy as np

# From a Python list
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # [1 2 3 4 5]

# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# Array of zeros
zeros = np.zeros(5)  # [0. 0. 0. 0. 0.]
zeros_matrix = np.zeros((3, 4))  # 3 rows, 4 columns

# Array of ones
ones = np.ones(5)  # [1. 1. 1. 1. 1.]

# Array with a range of values
range_arr = np.arange(0, 10, 2)  # [0 2 4 6 8] (start, stop, step)

# Array with evenly spaced values
linspace = np.linspace(0, 1, 5)  # [0.   0.25 0.5  0.75 1.  ] (start, stop, count)

# Random arrays
random_arr = np.random.rand(5)  # 5 random values between 0 and 1
random_int = np.random.randint(1, 100, size=10)  # 10 random integers between 1 and 99
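
The random calls above use NumPy's older global random functions, which still work. NumPy's documentation recommends the newer Generator interface for new code, and a seed makes results repeatable. A minimal sketch, with an arbitrary seed value chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

random_floats = rng.random(5)                # 5 floats in [0, 1)
random_ints = rng.integers(1, 100, size=10)  # 10 integers from 1 to 99 (high is exclusive)

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(random_floats, rng_again.random(5)))  # True
```

Reproducible random numbers are useful when you want a teammate to rerun an analysis and get the same sample.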

Array Attributes

code.pyPython

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)   # (2, 3) — 2 rows, 3 columns
print(arr.ndim)    # 2 — number of dimensions
print(arr.size)    # 6 — total number of elements
print(arr.dtype)   # dtype('int64') — data type of elements

Array Indexing and Slicing

code.pyPython

arr = np.array([10, 20, 30, 40, 50])

# Indexing (like Python lists)
print(arr[0])   # 10 (first element)
print(arr[-1])  # 50 (last element)

# Slicing
print(arr[1:4])  # [20 30 40] (index 1 to 3)
print(arr[:3])   # [10 20 30] (first 3)
print(arr[2:])   # [30 40 50] (from index 2 onward)

# 2D array indexing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[0, 0])    # 1 (row 0, column 0)
print(matrix[1, 2])    # 6 (row 1, column 2)
print(matrix[:, 1])    # [2 5 8] (all rows, column 1)
print(matrix[1, :])    # [4 5 6] (row 1, all columns)

Boolean Indexing (Filtering)

code.pyPython

amounts = np.array([2500, 3200, 1800, 4100, 2900])

# Filter: amounts greater than 3000
high_value = amounts[amounts > 3000]
print(high_value)  # [3200 4100]

# Multiple conditions
medium = amounts[(amounts > 2000) & (amounts < 4000)]
print(medium)  # [2500 3200 2900]
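
It can help to look at the boolean mask itself, since filtering is just indexing with an array of True/False values. The sketch below reuses the amounts array from above and also shows OR (|) and NOT (~) conditions.

```python
import numpy as np

amounts = np.array([2500, 3200, 1800, 4100, 2900])

mask = amounts > 3000
print(mask)         # [False  True False  True False]
print(mask.sum())   # 2 (True counts as 1, so this counts matches)
print(mask.mean())  # 0.4 (share of amounts above 3000)

# OR uses |, NOT uses ~. Each condition needs its own parentheses.
extremes = amounts[(amounts < 2000) | (amounts > 4000)]
print(extremes)     # [1800 4100]

not_high = amounts[~mask]
print(not_high)     # [2500 1800 2900]
```

Python's and/or keywords do not work element-wise on arrays, which is why the examples use & and | instead.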


⚡

Array Operations and Vectorization

NumPy's superpower is vectorization — applying operations to entire arrays without explicit loops.
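
To get a feel for the difference, you can time a Python loop against the vectorized version. This is a rough sketch, not a rigorous benchmark: the array size and repeat count are arbitrary, and timings vary by machine.

```python
import timeit
import numpy as np

values = np.arange(200_000, dtype=np.float64)
values_list = values.tolist()

def loop_version():
    # Pure Python: one multiplication per interpreter step
    return [v * 1.18 for v in values_list]

def vectorized_version():
    # NumPy: the loop runs inside compiled code
    return values * 1.18

loop_time = timeit.timeit(loop_version, number=5)
vec_time = timeit.timeit(vectorized_version, number=5)

print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vec_time:.4f} s")
```

Both functions produce the same numbers; only the place where the loop runs is different.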

Arithmetic Operations

code.pyPython

amounts = np.array([2500, 3200, 1800, 4100, 2900])

# Scalar operations (applied to every element)
with_gst = amounts * 1.18
print(with_gst)  # [2950. 3776. 2124. 4838. 3422.]

discounted = amounts - 500
print(discounted)  # [2000 2700 1300 3600 2400]

# Element-wise array operations
revenue_day1 = np.array([45000, 38000, 52000])
revenue_day2 = np.array([48000, 39000, 55000])
total_revenue = revenue_day1 + revenue_day2
print(total_revenue)  # [ 93000  77000 107000]

growth = (revenue_day2 - revenue_day1) / revenue_day1 * 100
print(growth)  # [6.66666667 2.63157895 5.76923077]

Aggregation Functions

code.pyPython

amounts = np.array([2500, 3200, 1800, 4100, 2900])

print(amounts.sum())      # 14500 (total)
print(amounts.mean())     # 2900.0 (average)
print(np.median(amounts)) # 2900.0 (middle value; arrays have no .median() method, so use np.median)
print(amounts.std())      # 761.58 (population standard deviation)
print(amounts.min())      # 1800 (minimum)
print(amounts.max())      # 4100 (maximum)
print(amounts.argmin())   # 2 (index of minimum)
print(amounts.argmax())   # 3 (index of maximum)

# Percentiles
print(np.percentile(amounts, 25))  # 2500.0 (25th percentile)
print(np.percentile(amounts, 75))  # 3200.0 (75th percentile)
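
By default, np.percentile sorts the data and interpolates linearly between the two nearest values. The sketch below reproduces that calculation by hand; percentile_linear is a hypothetical helper written for illustration, not a NumPy function.

```python
import numpy as np

amounts = np.array([2500, 3200, 1800, 4100, 2900])

def percentile_linear(values, q):
    """Hypothetical helper mirroring NumPy's default linear interpolation."""
    ordered = np.sort(values)
    pos = (q / 100) * (len(ordered) - 1)  # fractional position in the sorted data
    lower = int(np.floor(pos))
    upper = int(np.ceil(pos))
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

print(percentile_linear(amounts, 25))  # 2500.0
print(np.percentile(amounts, 25))      # 2500.0

# A percentile that falls between two data points
print(percentile_linear(amounts, 40))
print(np.percentile(amounts, 40))
```

With five sorted values, the 25th percentile lands exactly on the second value (2500), while the 40th falls between 2500 and 2900.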

Axis-Wise Operations on 2D Arrays

code.pyPython

# City revenue by day (rows=cities, columns=days)
revenue = np.array([
    [45000, 48000, 52000],  # Mumbai
    [38000, 39000, 41000],  # Delhi
    [35000, 37000, 36000]   # Bangalore
])

# Total revenue per city (sum across columns)
city_totals = revenue.sum(axis=1)
print(city_totals)  # [145000 118000 108000]

# Total revenue per day (sum across rows)
day_totals = revenue.sum(axis=0)
print(day_totals)  # [118000 124000 129000]

# Average revenue per city
city_avg = revenue.mean(axis=1)
print(city_avg)  # [48333.33 39333.33 36000.]

Axis Reminder:

axis=0: operate down rows (column-wise aggregation)
axis=1: operate across columns (row-wise aggregation)
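
One way to remember the axis rule is that the axis you pass is the one that disappears from the shape. The sketch below uses only the Mumbai and Delhi rows from the example above so that the two dimensions have different lengths.

```python
import numpy as np

revenue = np.array([
    [45000, 48000, 52000],  # Mumbai
    [38000, 39000, 41000],  # Delhi
])

print(revenue.shape)              # (2, 3)
print(revenue.sum(axis=0).shape)  # (3,) rows collapsed: one total per day
print(revenue.sum(axis=1).shape)  # (2,) columns collapsed: one total per city

# keepdims=True keeps the collapsed axis with length 1, so the result
# lines up with the original array for element-wise division.
day_totals = revenue.sum(axis=0, keepdims=True)  # shape (1, 3)
city_share = revenue / day_totals * 100          # each city's % of that day's revenue
print(city_share.round(1))
```

Each column of city_share adds up to 100, because every day's revenue is split between the two cities.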

Universal Functions (ufuncs)

NumPy provides fast mathematical functions that work element-wise:

code.pyPython

amounts = np.array([100, 1000, 10000, 100000])

# Logarithm (useful for skewed data)
log_amounts = np.log10(amounts)
print(log_amounts)  # [2. 3. 4. 5.]

# Square root
sqrt_amounts = np.sqrt(amounts)
print(sqrt_amounts)  # [ 10.  31.62  100.  316.23]

# Exponential
exp_vals = np.exp([1, 2, 3])
print(exp_vals)  # [ 2.72  7.39 20.09]

# Rounding
values = np.array([2.3, 4.7, 5.5, 6.2])
print(np.round(values))    # [2. 5. 6. 6.]
print(np.floor(values))    # [2. 4. 5. 6.]
print(np.ceil(values))     # [3. 5. 6. 7.]

📊

Statistical Functions for Analysts

NumPy includes functions for common statistical calculations — essential for exploratory analysis.

Descriptive Statistics

code.pyPython

# Zomato order amounts
amounts = np.array([450, 680, 520, 890, 340, 720, 550, 480, 650, 920])

# Central tendency
mean = np.mean(amounts)      # 620.0 (average)
median = np.median(amounts)  # 600.0 (middle value)

# Spread
std = np.std(amounts)        # 178.66 (population standard deviation)
var = np.var(amounts)        # 31920.0 (population variance)
range_val = np.ptp(amounts)  # 580 (peak-to-peak: max - min)

# Percentiles/Quantiles
q25 = np.percentile(amounts, 25)   # 490.0 (25th percentile)
q75 = np.percentile(amounts, 75)   # 710.0 (75th percentile)
IQR = q75 - q25                     # 220.0 (interquartile range)

print(f"Mean: ₹{mean:.2f}")
print(f"Median: ₹{median:.2f}")
print(f"Std Dev: ₹{std:.2f}")
print(f"IQR: ₹{IQR:.2f}")
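
One detail worth knowing: np.std and np.var divide by n by default (population statistics). If your data is a sample and you want the n - 1 version, pass ddof=1. A short sketch using the same order amounts:

```python
import numpy as np

amounts = np.array([450, 680, 520, 890, 340, 720, 550, 480, 650, 920])

population_std = np.std(amounts)      # ddof=0 (default): divide by n
sample_std = np.std(amounts, ddof=1)  # ddof=1: divide by n - 1

print(f"Population std: {population_std:.2f}")  # 178.66
print(f"Sample std:     {sample_std:.2f}")      # 188.33
```

Pandas defaults to ddof=1 for Series.std(), so the same column can give slightly different results in NumPy and Pandas unless you set ddof explicitly.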

Correlation and Covariance

code.pyPython

# Swiggy: delivery time vs customer rating
delivery_time = np.array([25, 30, 35, 40, 45, 50, 55, 60])
rating = np.array([4.8, 4.7, 4.5, 4.3, 4.0, 3.8, 3.5, 3.2])

# Correlation coefficient (-1 to 1)
correlation = np.corrcoef(delivery_time, rating)[0, 1]
print(f"Correlation: {correlation:.3f}")  # -0.993 (strong negative correlation)

# Covariance
covariance = np.cov(delivery_time, rating)[0, 1]
print(f"Covariance: {covariance:.2f}")  # -7.00 (sample covariance, divides by n - 1)
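
To see what these two functions compute, here is the same calculation written out by hand. It is a sketch for understanding, using the standard Pearson formula (covariance divided by the product of the two standard deviations); in real work, np.corrcoef and np.cov are the simpler choice.

```python
import numpy as np

delivery_time = np.array([25, 30, 35, 40, 45, 50, 55, 60])
rating = np.array([4.8, 4.7, 4.5, 4.3, 4.0, 3.8, 3.5, 3.2])

# Deviations from each mean
dx = delivery_time - delivery_time.mean()
dy = rating - rating.mean()

# Sample covariance: average product of deviations, dividing by n - 1
cov_xy = (dx * dy).sum() / (len(dx) - 1)

# Pearson correlation: covariance scaled by both sample standard deviations
r = cov_xy / (delivery_time.std(ddof=1) * rating.std(ddof=1))

print(f"Covariance:  {cov_xy:.2f}")  # -7.00
print(f"Correlation: {r:.3f}")       # -0.993
```

Covariance depends on the units (minutes times rating points), while correlation is unit-free, which is why correlation is easier to compare across different pairs of variables.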

Handling NaN Values

code.pyPython

# Data with missing values
amounts = np.array([2500, np.nan, 1800, 4100, np.nan, 2900])

# Regular mean fails
print(np.mean(amounts))  # nan

# NaN-safe functions
print(np.nanmean(amounts))  # 2825.0 (ignores NaN)
print(np.nanmedian(amounts))  # 2700.0
print(np.nansum(amounts))  # 11300.0
print(np.nanstd(amounts))  # 834.79
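
Before summarizing, it is often useful to know how many values are missing. A small sketch using np.isnan to count and drop NaN values from the same array:

```python
import numpy as np

amounts = np.array([2500, np.nan, 1800, 4100, np.nan, 2900])

missing = np.isnan(amounts)  # True where the value is NaN
print(missing.sum())         # 2 (number of missing values)

clean = amounts[~missing]    # keep only the non-missing values
print(clean)                 # [2500. 1800. 4100. 2900.]
print(clean.mean())          # 2825.0

# NaN never equals anything, including itself, so == cannot find it.
print(np.nan == np.nan)      # False
```

This is why the comparison amounts == np.nan never finds anything, and np.isnan is the right tool.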

Random Sampling (for A/B Testing)

code.pyPython

# Randomly assign users to test groups
user_ids = np.arange(1, 10001)  # 10,000 users
np.random.shuffle(user_ids)

control_group = user_ids[:5000]   # First 5000
test_group = user_ids[5000:]      # Last 5000

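# Random sample with replacement (amounts is defined here so this block runs on its own)
amounts = np.array([2500, 3200, 1800, 4100, 2900])
sample = np.random.choice(amounts, size=100, replace=True)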

# Random sample without replacement
sample_unique = np.random.choice(amounts, size=5, replace=False)
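
For a real experiment you usually want the group assignment to be repeatable. The sketch below does the same split with a seeded Generator; the seed value is arbitrary and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # arbitrary seed for reproducibility

user_ids = np.arange(1, 10001)          # 10,000 users
shuffled = rng.permutation(user_ids)    # shuffled copy; user_ids stays in order

control_group = shuffled[:5000]
test_group = shuffled[5000:]

# Sanity checks: no user is in both groups, and nobody is left out.
print(len(np.intersect1d(control_group, test_group)))  # 0
print(len(control_group) + len(test_group))            # 10000
```

Unlike np.random.shuffle, permutation returns a new array, so the original ID list is still available afterwards.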

Info

For Analysts: Use NumPy for pure numerical calculations (mean, std, correlation). Use Pandas when you need to group by categories, handle missing data with business logic, or work with labeled data.
