Descriptive Statistics
Learn the key numbers that summarize your data
What are Descriptive Statistics?
Numbers that describe your data in simple terms:
- Average salary is $50,000
- Ages range from 20 to 65
- Most people live in NYC
The Big Three: Mean, Median, Mode
Mean (Average)
Add all values, divide by count.
```python
import pandas as pd

df = pd.DataFrame({'Salary': [40000, 50000, 60000, 50000, 100000]})
print(df['Salary'].mean())  # 60000.0
```
Problem: The mean is sensitive to extreme values. The single $100,000 salary pulls the average up to $60,000. Without it, the other four salaries average $50,000.
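Here is a small sketch that follows the "add all values, divide by count" rule by hand and then drops the outlier to show how far it moved the mean. The variable names (`salaries`, `manual_mean`, `without_outlier`) are illustrative only.

```python
import pandas as pd

# Same salaries as in the example above
salaries = pd.Series([40000, 50000, 60000, 50000, 100000])

# Mean by hand: add all values, then divide by how many there are
manual_mean = salaries.sum() / len(salaries)
print(manual_mean)  # 60000.0

# Drop the single extreme value to see how much it pulled the mean up
without_outlier = salaries[salaries < 100000]
print(without_outlier.mean())  # 50000.0
```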
Median (Middle Value)
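The middle value once the data is sorted. With an even number of values, the median is the average of the two middle values.
```python
print(df['Salary'].median())  # 50000.0
```
Better for: Data with extreme values (salaries, house prices), because a single extreme value does not move it.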
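To show the sorting step and the even-count rule, here is a sketch of the median computed by hand. `median_by_hand` is a hypothetical helper written for illustration, not a pandas function. It is compared against pandas' own `median()`.

```python
import pandas as pd

def median_by_hand(values):
    """Hypothetical helper: sort the values, then take the middle one.

    With an even count there is no single middle value, so the two
    middle values are averaged (pandas follows the same convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [40000, 50000, 60000, 50000, 100000]   # 5 values: one middle value
even = [40000, 50000, 60000, 100000]         # 4 values: average the two middle

print(median_by_hand(odd))        # 50000
print(median_by_hand(even))       # 55000.0
print(pd.Series(even).median())   # 55000.0
```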
Mode (Most Common)
The value that appears most often.
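```python
# mode() returns a Series, since several values can tie; .iloc[0] takes the first
print(df['Salary'].mode().iloc[0])  # 50000
```
Best for: Categories (most popular product, most common city).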
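Because `mode()` returns a Series rather than a single number, it helps to see what happens when values tie and when the data is text. A short sketch, using made-up example series:

```python
import pandas as pd

# One clear most-common value
salaries = pd.Series([40000, 50000, 60000, 50000, 100000])
print(salaries.mode().tolist())  # [50000]

# A tie: 1 and 2 both appear twice, so both are returned
tied = pd.Series([1, 1, 2, 2, 3])
print(tied.mode().tolist())  # [1, 2]

# Mode also works on text, where mean and median do not apply
cities = pd.Series(['NYC', 'LA', 'NYC', 'Chicago', 'NYC'])
print(cities.mode().iloc[0])  # NYC
```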
Spread: How Different Are the Values?
Range
Difference between max and min.
```python
range_val = df['Salary'].max() - df['Salary'].min()
print(range_val)  # 60000
```
Standard Deviation
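How far values typically fall from the mean, in the same units as the data.
```python
# pandas computes the sample standard deviation (ddof=1) by default
print(df['Salary'].std())  # about 23452.08
```
- Low std = values are close together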
- High std = values are spread out
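Range is a single subtraction, but standard deviation hides a few steps. This sketch computes both by hand for the salary data and compares the result with pandas. It assumes the sample formula (divide by n - 1), which is what `Series.std()` uses by default; passing `ddof=0` gives the population version that divides by n instead.

```python
import math
import pandas as pd

salaries = pd.Series([40000, 50000, 60000, 50000, 100000])

# Range: largest value minus smallest value
print(salaries.max() - salaries.min())  # 60000

# Sample standard deviation by hand:
# 1. find how far each value is from the mean
# 2. square those distances and add them up
# 3. divide by (n - 1), then take the square root
mean = salaries.mean()
squared_diffs = ((salaries - mean) ** 2).sum()
manual_std = math.sqrt(squared_diffs / (len(salaries) - 1))

print(round(manual_std, 2))            # 23452.08
print(round(salaries.std(), 2))        # 23452.08 (pandas default, ddof=1)
print(round(salaries.std(ddof=0), 2))  # 20976.18 (population version, divides by n)
```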
Quick Summary with describe()
```python
df = pd.DataFrame({
    'Age': [25, 30, 28, 35, 22, 45, 33],
    'Salary': [50000, 60000, 55000, 70000, 45000, 80000, 65000]
})
print(df.describe())
```
Output:
```text
             Age        Salary
count   7.000000      7.000000
mean   31.142857  60714.285714
std     7.559289  12051.476890
min    22.000000  45000.000000
25%    26.500000  52500.000000
50%    30.000000  60000.000000
75%    34.000000  67500.000000
max    45.000000  80000.000000
```
What Each Stat Means
| Stat | Meaning |
|---|---|
| count | How many values |
| mean | Average |
| std | Spread (standard deviation) |
| min | Smallest value |
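| 25% | 25th percentile (first quartile): about a quarter of values fall below it |
| 50% | 50th percentile (median): the middle value |
| 75% | 75th percentile (third quartile): about three quarters of values fall below it |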
| max | Largest value |
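Each row in this table corresponds to a method you can call on a single column. The sketch below, assuming the same `Age`/`Salary` DataFrame as the `describe()` example, checks that mapping for the `Age` column. The `checks` dictionary is just an illustrative way to line the numbers up.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 28, 35, 22, 45, 33],
    'Salary': [50000, 60000, 55000, 70000, 45000, 80000, 65000],
})
summary = df.describe()

# Each describe() row matches an individual Series method
age = df['Age']
checks = {
    'count': age.count(),
    'mean': age.mean(),
    'std': age.std(),
    'min': age.min(),
    '25%': age.quantile(0.25),
    '50%': age.median(),
    '75%': age.quantile(0.75),
    'max': age.max(),
}

for stat, value in checks.items():
    print(f"{stat:>5}  describe: {summary.loc[stat, 'Age']:.6f}  method: {value:.6f}")
```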
Percentiles Explained
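- Roughly 25% of people earn below $52,500 (25th percentile)
- About half earn below $60,000 and about half above it (median)
- Roughly 75% of people earn below $67,500 (75th percentile)

With only 7 salaries, pandas interpolates between neighboring values: $52,500 sits halfway between the sorted values $50,000 and $55,000.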
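To see where a number like $52,500 comes from, here is a sketch of linear-interpolation percentiles, which is the default `interpolation='linear'` behavior of `Series.quantile()`. `percentile_by_hand` is a hypothetical helper for illustration: the p-th percentile sits at position p × (n − 1) in the sorted data, and a fractional position is filled in between its two neighbors.

```python
import pandas as pd

def percentile_by_hand(values, p):
    """Hypothetical helper mirroring linear interpolation.

    Position p * (n - 1) in the sorted data; if it falls between two
    values, move the matching fraction of the way from one to the next.
    """
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower = int(pos)                              # index just below pos
    upper = min(lower + 1, len(ordered) - 1)      # index just above pos
    fraction = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

salaries = [50000, 60000, 55000, 70000, 45000, 80000, 65000]

for p in (0.25, 0.5, 0.75):
    print(p, percentile_by_hand(salaries, p), pd.Series(salaries).quantile(p))
```

For p = 0.25 the position is 0.25 × 6 = 1.5, halfway between the second and third sorted salaries.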
For Text/Categories
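```python
df = pd.DataFrame({
    'City': ['NYC', 'LA', 'NYC', 'Chicago', 'NYC']
})
# Count how many times each value appears
print(df['City'].value_counts())
```
Output (pandas also prints name and dtype information, omitted here):
```text
NYC        3
LA         1
Chicago    1
```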
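Raw counts are often easier to compare as shares, and `describe()` also works on a text column, where it reports different statistics than for numbers. A short sketch on the same city data:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'Chicago', 'NYC']})

# normalize=True returns each category's share of the rows instead of a count
shares = df['City'].value_counts(normalize=True)
print(shares['NYC'])  # 0.6

# describe() on a text column reports count, number of unique values,
# the most frequent value (top) and how often it appears (freq)
summary = df['City'].describe()
print(summary['top'], summary['freq'])  # NYC 3
```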
Key Points
- Mean = average (affected by extremes)
- Median = middle value (better for skewed data)
- Mode = most common value
- Std = how spread out values are
- describe() gives count, mean, std, min, quartiles, and max in one call
- value_counts() counts how often each category appears
What's Next?
Learn to analyze one column at a time (univariate analysis).