Descriptive Statistics Review

Review key statistical concepts for data analysis

What are Descriptive Statistics?

Numbers that describe your data:

  • What's the average?
  • How spread out is it?
  • What's typical?

Measures of Center

Mean (Average)

code.py
import numpy as np
import pandas as pd

data = [10, 20, 30, 40, 50]

# NumPy
mean = np.mean(data)
print(mean)  # 30.0

# Pandas
df = pd.DataFrame({'values': data})
print(df['values'].mean())  # 30.0

Median (Middle Value)

code.py
data = [10, 20, 30, 100, 200]

print(np.median(data))  # 30.0

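The median is the better measure of center when your data has outliers: extreme values pull the mean toward them but barely move the median.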
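To make the outlier effect concrete, here is a minimal sketch that compares the mean and median of the same list before and after one value is replaced by an extreme one. The lists `clean` and `with_outlier` are made-up illustration data, not from the lesson.

```python
import numpy as np

clean = [10, 20, 30, 40, 50]
with_outlier = [10, 20, 30, 40, 500]  # last value replaced by an extreme one (illustrative)

mean_clean, median_clean = np.mean(clean), np.median(clean)
mean_out, median_out = np.mean(with_outlier), np.median(with_outlier)

print(mean_clean, median_clean)  # 30.0 30.0
print(mean_out, median_out)      # 120.0 30.0
```

One extreme value moves the mean from 30 to 120, while the median does not change at all.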

Mode (Most Common)

code.py
from scipy import stats

data = [1, 2, 2, 3, 3, 3, 4]
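# stats.mode() returns a result object; .mode holds the most common value
print(stats.mode(data).mode)  # 3 (appears most; older SciPy versions print [3])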
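A data set can have more than one mode. As a small sketch using only Python's standard library (the `statistics` module, Python 3.8+), `multimode` returns every value tied for most common. The list `tied` is made-up illustration data.

```python
from statistics import mode, multimode

single = [1, 2, 2, 3, 3, 3, 4]
tied = [1, 1, 2, 2, 3]  # 1 and 2 both appear twice (illustrative)

single_mode = mode(single)    # one clear winner
tied_modes = multimode(tied)  # all values tied for most common

print(single_mode)  # 3
print(tied_modes)   # [1, 2]
```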

Measures of Spread

Range

code.py
data = [10, 20, 30, 40, 50]

range_val = max(data) - min(data)
print(range_val)  # 40

Variance

The average squared distance of values from the mean:

code.py
print(np.var(data))  # 200.0 (population variance; pass ddof=1 for sample variance)
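It can help to compute the variance by hand once. The sketch below works through the same five values step by step and compares the population variance (divide by n) with the sample variance (divide by n - 1). NumPy defaults to the population version. The variable names are illustrative.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

mean = sum(data) / len(data)                     # 30.0
squared_diffs = [(x - mean) ** 2 for x in data]  # [400.0, 100.0, 0.0, 100.0, 400.0]

population_var = sum(squared_diffs) / len(data)    # divide by n
sample_var = sum(squared_diffs) / (len(data) - 1)  # divide by n - 1

print(population_var, np.var(data))      # 200.0 200.0
print(sample_var, np.var(data, ddof=1))  # 250.0 250.0
```

`np.std` accepts the same `ddof` argument. Pandas `.var()` and `.std()` default to `ddof=1`, so they report the sample versions.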

Standard Deviation

Square root of variance (same units as data):

code.py
print(np.std(data))  # 14.14

Rule: For roughly bell-shaped (normal) data, about 68% of values fall within 1 standard deviation of the mean and about 95% fall within 2.
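The rule can be checked with a quick simulation. This sketch assumes normally distributed data: the mean of 50, standard deviation of 10, sample size, and seed are arbitrary choices for illustration. Skewed or heavy-tailed data will not necessarily follow the same percentages.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
samples = rng.normal(loc=50, scale=10, size=100_000)

mean, std = samples.mean(), samples.std()

# Fraction of values within 1 and 2 standard deviations of the mean
within_1 = np.mean(np.abs(samples - mean) <= 1 * std)
within_2 = np.mean(np.abs(samples - mean) <= 2 * std)

print(f"Within 1 std: {within_1:.1%}")  # roughly 68%
print(f"Within 2 std: {within_2:.1%}")  # roughly 95%
```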

Quartiles and IQR

code.py
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Q1 = np.percentile(data, 25)  # 3.25
Q2 = np.percentile(data, 50)  # 5.5 (median)
Q3 = np.percentile(data, 75)  # 7.75

IQR = Q3 - Q1  # 4.5
print(f"Q1: {Q1}, Q3: {Q3}, IQR: {IQR}")

IQR = Interquartile Range (middle 50% of data)
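A common use of the IQR is flagging possible outliers. The sketch below uses the widely used 1.5 × IQR convention, which is a rule of thumb and not something this lesson prescribes. The value 30 is added to the data purely for illustration.

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # 30 is an intentionally extreme value

q1, q3 = np.percentile(data, [25, 75])  # 3.5 and 8.5
iqr = q3 - q1                           # 5.0

# Values beyond 1.5 * IQR from the quartiles are flagged as possible outliers
lower = q1 - 1.5 * iqr  # -4.0
upper = q3 + 1.5 * iqr  # 16.0

outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [30]
```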

Pandas describe()

Get all stats at once:

code.py
df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
})

print(df.describe())

Output:

Stat      Age     Salary
count    5.00       5.00
mean    35.00   70000.00
std      7.91   15811.39
min     25.00   50000.00
25%     30.00   60000.00
50%     35.00   70000.00
75%     40.00   80000.00
max     45.00   90000.00
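The result of `describe()` is itself a DataFrame, so individual statistics can be pulled out with `.loc`. A minimal sketch, reusing the same Age/Salary data (the names `summary`, `mean_salary`, and `iqr` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
})

summary = df.describe()

# Rows are statistic names, columns are the original column names
mean_salary = summary.loc['mean', 'Salary']
iqr = summary.loc['75%'] - summary.loc['25%']  # IQR for every column at once

print(mean_salary)                # 70000.0
print(iqr['Age'], iqr['Salary'])  # 10.0 20000.0
```

The `std` row is the sample standard deviation (`ddof=1`), which is why Age shows 7.91 rather than the 7.07 that `np.std` returns by default.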

Skewness

Is data tilted left or right?

code.py
from scipy.stats import skew

data = [1, 2, 2, 3, 3, 3, 10]  # Has outlier

print(skew(data))  # Positive = right skew

  • Positive skew: Tail goes right (outliers are high)
  • Negative skew: Tail goes left (outliers are low)
  • Zero: Symmetric
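The three cases in the list above can be seen side by side in a short sketch. The `symmetric` and `left_skewed` lists are made up for illustration (the latter is just the lesson's data mirrored).

```python
import numpy as np
from scipy.stats import skew

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 10]        # one high outlier
left_skewed = [-x for x in right_skewed]     # mirrored: one low outlier

skew_sym = skew(symmetric)
skew_right = skew(right_skewed)
skew_left = skew(left_skewed)

print(skew_sym)    # 0.0 (symmetric)
print(skew_right)  # positive
print(skew_left)   # negative, same size as skew_right

# With right skew the outlier pulls the mean above the median
print(np.mean(right_skewed) > np.median(right_skewed))  # True
```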

Kurtosis

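How heavy are the tails of the data (and how sharp is the peak)?

code.py
from scipy.stats import kurtosis

# Uses `data` from the skewness example above
print(kurtosis(data))  # Excess kurtosis: a normal distribution scores 0

  • High kurtosis: Heavy tails (more extreme values), often with a sharp peak
  • Low kurtosis: Light tails (fewer extreme values), often with a flat peak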
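SciPy's `kurtosis` can report two definitions that differ by a constant of 3. By default it returns excess kurtosis (Fisher's definition, where a normal distribution scores 0). Passing `fisher=False` gives Pearson's definition (normal scores 3). A short sketch on the same data as the skewness example:

```python
from scipy.stats import kurtosis

data = [1, 2, 2, 3, 3, 3, 10]

excess = kurtosis(data)                 # Fisher (default): normal distribution = 0
pearson = kurtosis(data, fisher=False)  # Pearson: normal distribution = 3

print(excess)
print(pearson)
print(pearson - excess)  # 3, up to floating-point rounding
```

When comparing kurtosis values from different tools, check which definition each one uses.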

Complete Example

code.py
import pandas as pd
import numpy as np
from scipy import stats

# Sample data
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 200, 90, 160, 140, 170, 130]
})

# All descriptive stats
print("=== Descriptive Statistics ===")
print(f"Mean: {df['Sales'].mean():.2f}")
print(f"Median: {df['Sales'].median():.2f}")
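# Every value in this sample is unique, so mode() returns all of them; [0] is just the smallest (90)
print(f"Mode: {df['Sales'].mode()[0]}")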
print(f"Std Dev: {df['Sales'].std():.2f}")
print(f"Variance: {df['Sales'].var():.2f}")
print(f"Min: {df['Sales'].min()}")
print(f"Max: {df['Sales'].max()}")
print(f"Range: {df['Sales'].max() - df['Sales'].min()}")
print(f"Q1: {df['Sales'].quantile(0.25):.2f}")
print(f"Q3: {df['Sales'].quantile(0.75):.2f}")
print(f"IQR: {df['Sales'].quantile(0.75) - df['Sales'].quantile(0.25):.2f}")
print(f"Skewness: {df['Sales'].skew():.2f}")

When to Use What?

Statistic   Use When
Mean        Data is symmetric, no outliers
Median      Data has outliers or is skewed
Std Dev     Measuring spread around the mean (symmetric data)
IQR         Measuring spread around the median (skewed data or outliers)
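The same reasoning applies to measures of spread. As a rough sketch, the code below adds a single extreme value (1000, chosen for illustration) to the Sales figures from the complete example and compares how much the standard deviation and the IQR change. The helper `spread` is a hypothetical name.

```python
import pandas as pd

sales = pd.Series([100, 150, 120, 180, 200, 90, 160, 140, 170, 130])
sales_with_outlier = pd.concat([sales, pd.Series([1000])], ignore_index=True)

def spread(s):
    """Return (sample standard deviation, IQR) for a pandas Series."""
    return s.std(), s.quantile(0.75) - s.quantile(0.25)

std_before, iqr_before = spread(sales)
std_after, iqr_after = spread(sales_with_outlier)

print(f"Std Dev: {std_before:.2f} -> {std_after:.2f}")  # grows several times over
print(f"IQR: {iqr_before:.2f} -> {iqr_after:.2f}")      # 45.00 -> 50.00
```

The standard deviation is inflated by the single outlier, while the IQR barely moves, which is why IQR pairs naturally with the median.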

Key Points

  • Mean = average (affected by outliers)
  • Median = middle value (robust to outliers)
  • Std Dev = typical distance from mean
  • IQR = spread of middle 50%
  • Use describe() for quick summary
  • Check skewness for data shape

What's Next?

Learn the basics of probability.
