Descriptive Statistics

Learn the key numbers that summarize your data

What are Descriptive Statistics?

Descriptive statistics are numbers that summarize your data in simple terms. For example:

  • Average salary is $50,000
  • Ages range from 20 to 65
  • Most people live in NYC

The Big Three: Mean, Median, Mode

Mean (Average)

Add all values, divide by count.

code.py
import pandas as pd

df = pd.DataFrame({'Salary': [40000, 50000, 60000, 50000, 100000]})

print(df['Salary'].mean())  # 60000.0

Problem: The mean is sensitive to extreme values. Here, one person earning 100,000 pulls the average up to 60,000, even though three of the five people earn 50,000 or less.
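To see the effect of that one high salary, here is a small sketch that computes the mean by hand and then again without the top earner. The variable names are only for illustration.

```python
import pandas as pd

salaries = pd.Series([40000, 50000, 60000, 50000, 100000])

# Mean by hand: add all values, then divide by how many there are
manual_mean = salaries.sum() / len(salaries)

# Same calculation, leaving out the 100,000 salary
mean_without_top = salaries[salaries < 100000].mean()

print(manual_mean)       # 60000.0
print(mean_without_top)  # 50000.0
```

Dropping a single value moves the mean by 10,000, which is why the mean alone can mislead when the data has outliers.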

Median (Middle Value)

The middle number when sorted.

code.py
print(df['Salary'].median())  # 50000.0

Better for: Data with extreme values (salaries, house prices).
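As a rough illustration of why the median handles extreme values better, the sketch below swaps the top salary for a much larger, made-up number and compares mean and median. It also shows what happens with an even number of values.

```python
import pandas as pd

salaries = pd.Series([40000, 50000, 60000, 50000, 100000])

# Replace the top salary with a far more extreme, made-up value
extreme = salaries.replace(100000, 1000000)

print(extreme.mean())    # 240000.0
print(extreme.median())  # 50000.0

# With an even number of values, the median is the average of the two middle ones
even = pd.Series([40000, 50000, 60000, 100000])
print(even.median())     # 55000.0
```

The mean jumps from 60,000 to 240,000, while the median stays at 50,000.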

Mode (Most Common)

The value that appears most often.

code.py
# mode() returns a Series (there can be ties), so take the first entry
print(df['Salary'].mode()[0])  # 50000

Best for: Categories (most popular product, most common city).
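Here is a short sketch of the mode on category data. The city lists are made up for illustration; the second one shows that mode() can return more than one value when there is a tie.

```python
import pandas as pd

cities = pd.Series(['NYC', 'LA', 'NYC', 'Chicago', 'NYC'])

# mode() returns a Series, so take the first entry to get a single value
top_city = cities.mode()[0]
print(top_city)  # NYC

# When two values tie, mode() returns both of them, sorted
tied = pd.Series(['NYC', 'LA', 'NYC', 'LA', 'Chicago'])
print(tied.mode().tolist())  # ['LA', 'NYC']
```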

Spread: How Different are Values?

Range

Difference between max and min.

code.py
range_val = df['Salary'].max() - df['Salary'].min()
print(range_val)  # 60000

Standard Deviation

How spread out the values are from the mean.

code.py
print(df['Salary'].std())  # about 23452.08 (pandas uses the sample formula, dividing by n - 1)

  • Low std = values are close together
  • High std = values are spread out
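To make "spread from the mean" concrete, this sketch rebuilds the standard deviation step by step and compares it with pandas. It assumes the sample formula (divide by n - 1), which is the pandas default; the ddof=0 call shows the population version for contrast.

```python
import pandas as pd

salaries = pd.Series([40000, 50000, 60000, 50000, 100000])

# Step 1: distance of each value from the mean
deviations = salaries - salaries.mean()

# Step 2: sum the squared distances and divide by n - 1 (sample variance)
variance = (deviations ** 2).sum() / (len(salaries) - 1)

# Step 3: the square root brings the result back to the original units
manual_std = variance ** 0.5

print(round(manual_std, 2))            # 23452.08
print(round(salaries.std(), 2))        # 23452.08
print(round(salaries.std(ddof=0), 2))  # 20976.18 (divides by n instead)
```

Note that NumPy's np.std divides by n by default, so the two libraries can give different numbers for the same data unless ddof is set explicitly.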

Quick Summary with describe()

code.py
df = pd.DataFrame({
    'Age': [25, 30, 28, 35, 22, 45, 33],
    'Salary': [50000, 60000, 55000, 70000, 45000, 80000, 65000]
})

print(df.describe())

Output:

             Age        Salary
count   7.000000      7.000000
mean   31.142857  60714.285714
std     7.559289  12051.476890
min    22.000000  45000.000000
25%    26.500000  52500.000000
50%    30.000000  60000.000000
75%    34.000000  67500.000000
max    45.000000  80000.000000

What Each Stat Means

Stat    Meaning
count   How many values
mean    Average
std     Spread (standard deviation)
min     Smallest value
25%     Lower quarter (25th percentile)
50%     Middle (median)
75%     Upper quarter (75th percentile)
max     Largest value

Percentiles Explained

  • 25% of people earn below 52,500 (25th percentile)
  • 50% of people earn below 60,000 (median)
  • 75% of people earn below 67,500 (75th percentile)
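If you only need one percentile rather than the full describe() table, quantile() is one way to get it. The sketch below reuses the salary values from the describe() example; the last line checks what share of the sample actually falls below the 25th percentile.

```python
import pandas as pd

salary = pd.Series([50000, 60000, 55000, 70000, 45000, 80000, 65000])

# quantile() takes a fraction between 0 and 1
q25 = salary.quantile(0.25)
q50 = salary.quantile(0.50)
q75 = salary.quantile(0.75)

print(q25)  # 52500.0
print(q50)  # 60000.0
print(q75)  # 67500.0

# Share of values strictly below the 25th percentile
print(round((salary < q25).mean(), 3))  # 0.286
```

With only seven values the share is 2 out of 7 rather than exactly 25%, because pandas interpolates between neighboring values by default. On larger datasets the match gets closer.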

For Text/Categories

code.py
df = pd.DataFrame({
    'City': ['NYC', 'LA', 'NYC', 'Chicago', 'NYC']
})

# Count each value
print(df['City'].value_counts())

Output:

NYC        3
LA         1
Chicago    1
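A couple of follow-ups that are often useful with category counts, sketched on the same city data: reading off the most common value, turning counts into shares, and calling describe() on a text column.

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'Chicago', 'NYC']})

counts = df['City'].value_counts()

# Counts are sorted high to low, so the first entry is the most common value
print(counts.index[0])  # NYC
print(counts.iloc[0])   # 3

# normalize=True turns counts into shares of the total
shares = df['City'].value_counts(normalize=True)
print(shares['NYC'])    # 0.6

# describe() on a text column reports count, unique, top, and freq
summary = df['City'].describe()
print(summary['top'], summary['freq'])  # NYC 3
```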

Key Points

  • Mean = average (affected by extremes)
  • Median = middle value (better for skewed data)
  • Mode = most common value
  • Std = how spread out values are
  • describe() gives all stats at once
  • value_counts() for categories

What's Next?

Learn to analyze one column at a time (univariate analysis).