
Mean vs Median vs Mode — Which Measure to Use When

Mean, median, and mode all measure a 'typical value', but they can tell very different stories. Knowing which to use (and when) is crucial for honest, accurate data analysis.


Mean, Median, Mode — The Big Three

These three measures all answer the same question: "What's the typical value?" But they define "typical" differently.

Mean (Average)

Definition: Sum of all values divided by count.

Formula: Mean = (x₁ + x₂ + ... + xₙ) / n

Example — Swiggy Delivery Times (5 orders):

Data: 22, 25, 28, 30, 95 minutes
Mean = (22 + 25 + 28 + 30 + 95) / 5 = 200 / 5 = 40 minutes

Interpretation: "Average delivery time is 40 minutes"

Problem: One outlier (95 min) inflates the mean. Most deliveries (4 out of 5) took 22-30 minutes, but the mean says 40.
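To check the arithmetic and see how much a single order moves the mean, here is a minimal sketch using Python's standard `statistics` module. The list reuses the five delivery times above; the variable names are illustrative.

```python
import statistics

# Delivery times from the example above (minutes)
times = [22, 25, 28, 30, 95]

mean_all = statistics.mean(times)                     # (22 + 25 + 28 + 30 + 95) / 5
mean_without_outlier = statistics.mean(times[:-1])    # drop the 95-minute order

print(f"Mean of all 5 orders: {mean_all} min")                        # 40 min
print(f"Mean without the 95-min order: {mean_without_outlier} min")   # 26.25 min
```

One order out of five shifts the mean by almost 14 minutes.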


Median (Middle Value)

Definition: The middle value when the data is sorted. Half of the values are at or below it, half at or above.

How to Calculate:

  1. Sort data (ascending or descending)
  2. Odd count: Pick middle value
  3. Even count: Average the two middle values
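The three steps translate directly into a short function. This is a minimal sketch (`median` here is a hypothetical helper, not a library API); in practice Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Median following the steps above: sort, then pick or average the middle."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)          # Step 1: sort ascending
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd count -> the single middle value
        return ordered[mid]
    # Step 3: even count -> average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([30, 22, 95, 25, 28]))   # odd count (5 values) -> 28
print(median([30, 22, 25, 28]))       # even count (4 values) -> (25 + 28) / 2 = 26.5
```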

Example — Same Swiggy Data:

Data: 22, 25, 28, 30, 95 minutes
Sorted: 22, 25 | 28 | 30, 95 (two values below, two values above)
Median = 28 minutes (middle value)

Interpretation: "Half of deliveries took ≤28 minutes, half took ≥28 minutes"

Advantage: The outlier (95 min) doesn't affect the median. Whether the slowest delivery took 31 or 95 minutes, the median stays 28 minutes.
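A sketch of the same robustness claim under a hypothetical change: making the slowest order far worse (300 minutes instead of 95, an invented value) moves the mean a lot and leaves the median where it was.

```python
import statistics

times = [22, 25, 28, 30, 95]
worse = [22, 25, 28, 30, 300]   # hypothetical: slowest order takes 300 min instead of 95

for label, data in [("original", times), ("slowest = 300 min", worse)]:
    print(f"{label}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
# original: mean = 40, median = 28
# slowest = 300 min: mean = 81, median = 28
```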


Mode (Most Frequent)

Definition: The value that appears most often in the dataset.

Example — Flipkart Product Ratings:

Data: 5★, 5★, 5★, 4★, 4★, 3★, 1★, 1★
Mode = 5★ (appears 3 times, more than any other rating)

Interpretation: "Most common rating is 5 stars"

Special Cases:

  • No mode: All values appear equally (e.g., 1, 2, 3, 4, 5 — each appears once)
  • Bimodal: Two values tie for most frequent (e.g., 1, 1, 2, 3, 3 — mode is 1 AND 3)
  • Multimodal: Three or more values tie
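One way to apply these special cases in code, assuming Python 3.8+ for `statistics.multimode`, which returns every value tied for the highest count. The helper `describe_mode` is hypothetical; it follows this page's convention that when every distinct value appears equally often there is no mode (`multimode` itself would return all the values in that case).

```python
import statistics


def describe_mode(values):
    """Classify data as unimodal, bimodal, multimodal, or 'no mode'."""
    if not values:
        return "no data", []
    tops = statistics.multimode(values)   # every value tied for the highest count
    # Convention from the text: all distinct values tie -> no mode
    if len(tops) > 1 and len(tops) == len(set(values)):
        return "no mode", []
    label = {1: "unimodal", 2: "bimodal"}.get(len(tops), "multimodal")
    return label, tops


print(describe_mode([5, 5, 5, 4, 4, 3, 1, 1]))  # ('unimodal', [5])
print(describe_mode([1, 1, 2, 3, 3]))           # ('bimodal', [1, 3])
print(describe_mode([1, 2, 3, 4, 5]))           # ('no mode', [])
print(describe_mode(["S", "M", "M", "L"]))      # ('unimodal', ['M']), works for categories too
```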

When it matters: Categorical data (sizes: S, M, L), discrete data (ratings: 1-5), identifying most popular product/category.


Quick Comparison

| Measure | Affected by Outliers? | Best For | Example Use |
|---------|----------------------|----------|-------------|
| Mean | Yes (highly sensitive) | Symmetric data, no outliers | Test scores, heights, sensor data |
| Median | No (resistant) | Skewed data, outliers present | Income, real estate prices, order values |
| Mode | No | Categorical data, finding most common | Shoe sizes, product colors, customer segments |

Think of it this way...

Imagine 5 people's salaries: ₹5L, ₹5L, ₹6L, ₹7L, ₹1Cr. Mean salary = ₹24.6L (misleading: four of the five earn ₹7L or less, and only the ₹1Cr earner, say the CEO, is above the mean). Median = ₹6L (typical employee). Mode = ₹5L (most common). Each tells a different story, so choose based on what you want to communicate.
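Checking the salary example with Python's `statistics` module (salaries expressed in lakhs, so ₹1Cr = 100) shows why the mean misleads here: four of the five people earn less than it.

```python
import statistics

# Salaries from the example, in lakhs (₹1Cr = 100 lakhs)
salaries = [5, 5, 6, 7, 100]

avg = statistics.mean(salaries)
print("Mean:  ", avg)                          # 123 / 5 = 24.6
print("Median:", statistics.median(salaries))  # 6
print("Mode:  ", statistics.mode(salaries))    # 5
print("Earn less than the mean:", sum(s < avg for s in salaries), "of", len(salaries))  # 4 of 5
```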


When to Use Each Measure

Choosing the right measure depends on: (1) Data distribution, (2) Presence of outliers, (3) What story you want to tell.

Use Mean When...

Data is symmetric (bell curve, no skew)

  • Heights of adults: Most near average, few very tall/short (symmetric)
  • Test scores: Most students near average, few very high/low
  • Manufacturing measurements: Part dimensions cluster around target

No significant outliers

  • Daily website traffic: Consistent range (no viral spikes)
  • Sensor readings: Small natural variation

You need mathematical properties

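  • Mean has convenient algebraic properties: total = mean × count, so group means combine into an overall mean, and it feeds directly into formulas such as variance and regression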
  • Sum of deviations from mean = 0 (useful property)
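Both properties can be seen on two small hypothetical groups of numbers. In this minimal sketch, deviations from the overall mean cancel out, and weighting each group's mean by its size reproduces the overall mean, which is what makes the mean easy to use inside formulas.

```python
import statistics

# Two hypothetical groups of measurements
group_a = [10, 20, 30]
group_b = [40, 50]
combined = group_a + group_b

# Property 1: deviations from the mean sum to zero
overall = statistics.fmean(combined)
deviations = [x - overall for x in combined]
print(deviations, "sum =", sum(deviations))   # [-20.0, -10.0, 0.0, 10.0, 20.0] sum = 0.0

# Property 2: total = mean x count, so group means combine into the overall mean
pooled = (statistics.fmean(group_a) * len(group_a)
          + statistics.fmean(group_b) * len(group_b)) / len(combined)
print(pooled, overall)                        # 30.0 30.0
```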

Example — Zomato Restaurant Ratings:

Ratings (out of 5): 4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2
Mean = 4.24 (good summary: the data is tightly clustered)
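A quick check of the ratings example, sketched with `statistics`: the mean and median nearly agree and the standard deviation is small, which is what "tightly clustered" looks like numerically.

```python
import statistics

ratings = [4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2]

print(f"Mean:   {statistics.mean(ratings):.2f}")    # 4.24
print(f"Median: {statistics.median(ratings):.2f}")  # 4.20
print(f"Stdev:  {statistics.stdev(ratings):.2f}")   # 0.10, values sit close together
```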

When mean works: Data is consistent, no extreme values, symmetric distribution.


Use Median When...

Data is skewed (long tail on one side)

  • Income: Most people earn ₹5-10L, few earn crores (right-skewed)
  • Real estate prices: Most homes ₹50L-₹1Cr, few luxury ₹10Cr+ (right-skewed)
  • Website load time: Most pages load in about 2s, a few are very slow (right-skewed)

Outliers are present

  • E-commerce order values: Most ₹500-₹2,000, occasional ₹50K laptop orders
  • Delivery times: Most 20-30 min, occasional 2-hour delays (traffic, weather)

You want 'typical' experience

  • Median is the middle value (the 50th percentile), so extreme values barely move it
  • Better for stakeholder communication: "Half our customers wait ≤25 minutes"

Example — Flipkart Order Values:

Orders: ₹350, ₹480, ₹920, ₹1,200, ₹1,500, ₹50,000 (laptop)
Mean = ₹9,075 (misleading: the laptop inflates the average)
Median = ₹1,060 (typical order for most customers)

Rule of thumb: If mean >> median (much larger), data is right-skewed → Use median.
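The rule of thumb can be turned into a rough check. This is a sketch under stated assumptions: `skew_hint` is a hypothetical helper, it assumes positive values such as order amounts (so the median is never zero), and the cutoff of 1.5 for "much larger" is an arbitrary illustrative choice, not a standard.

```python
import statistics


def skew_hint(values, ratio_cutoff=1.5):
    """Rough right-skew check for positive data: is the mean much larger than the median?

    The 1.5 cutoff is an illustrative assumption, not a standard threshold.
    """
    avg = statistics.mean(values)
    mid = statistics.median(values)
    ratio = avg / mid
    verdict = "right-skewed: report the median" if ratio > ratio_cutoff else "no strong right skew"
    return avg, mid, ratio, verdict


orders = [350, 480, 920, 1200, 1500, 50000]   # order values from the example above (₹)
avg, mid, ratio, verdict = skew_hint(orders)
print(f"mean = {avg}, median = {mid}, mean/median = {ratio:.1f} -> {verdict}")
# mean = 9075, median = 1060.0, mean/median = 8.6 -> right-skewed: report the median
```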


Use Mode When...

Categorical data (non-numeric)

  • Most popular product category: Electronics, Fashion, Home
  • Most common traffic source: Organic, Paid, Direct, Social
  • Preferred payment method: UPI, Card, COD

Discrete data with clear peaks

  • Shoe sizes: Most common size 8 (mode = 8)
  • Star ratings: Most customers give 5★ (mode = 5)
  • Number of items per order: Most orders have 1 item (mode = 1)

You want 'most common' value

  • Mode answers: "What do most people do/choose?"
  • Inventory planning: Stock more of mode size (best-selling size)

Example — T-Shirt Size Sales:

Sales: S (15), M (40), L (55), XL (30), XXL (10)
Mode = L (sold 55 units, the most popular size)
Mean = Not applicable (sizes aren't numeric)
Median = L (if you order S < M < L < XL < XXL, the middle is L)

When mode is essential: Non-numeric data (can't calculate mean/median for categories like "Red, Blue, Green").
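For ordered categories such as sizes, the mode and median can be computed from the sales counts without converting sizes to numbers. A minimal sketch, assuming the counts above and Python 3.7+ (dicts keep insertion order, so the dict order encodes S < M < L < XL < XXL).

```python
from itertools import accumulate

# Units sold per size, listed in size order (S < M < L < XL < XXL)
sales = {"S": 15, "M": 40, "L": 55, "XL": 30, "XXL": 10}

# Mode: the category with the highest count
mode_size = max(sales, key=sales.get)

# Median for ordered categories: walk the cumulative counts until the
# running total reaches half of all units sold. If the two middle units
# fall in different sizes, this returns the lower of the two.
total = sum(sales.values())                               # 150 units
running = dict(zip(sales, accumulate(sales.values())))    # S:15, M:55, L:110, XL:140, XXL:150
median_size = next(size for size, cum in running.items() if cum >= total / 2)

print(f"Mode = {mode_size}, Median = {median_size}")      # Mode = L, Median = L
```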


Decision Tree: Which Measure?

START
├─ Is the data numeric?
│  ├─ NO  → Use MODE (categorical data)
│  └─ YES → Continue
├─ Are there outliers?
│  ├─ YES → Use MEDIAN (robust to outliers)
│  └─ NO  → Continue
└─ Is the data skewed (long tail)?
   ├─ YES → Use MEDIAN (better represents the typical value)
   └─ NO  → Use MEAN (symmetric distribution)
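The decision tree can be sketched as a function, but "outliers" and "skewed" need concrete tests that the tree leaves open. The choices below are assumptions, not part of the original: Tukey's fences (values more than 1.5 × IQR outside the quartiles) for outliers, and Pearson's second skewness coefficient, 3 × (mean - median) / standard deviation, with an illustrative cutoff of 0.5 for skew. `choose_measure` and its helpers are hypothetical names.

```python
import statistics
from numbers import Number


def has_outliers(values):
    """Assumed outlier rule (Tukey's fences): beyond 1.5 x IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return any(v < q1 - fence or v > q3 + fence for v in values)


def is_skewed(values, cutoff=0.5):
    """Assumed skew test: Pearson's second coefficient 3 x (mean - median) / stdev.

    The 0.5 cutoff is an illustrative choice, not a standard threshold.
    """
    sd = statistics.stdev(values)
    if sd == 0:
        return False
    coef = 3 * (statistics.mean(values) - statistics.median(values)) / sd
    return abs(coef) > cutoff


def choose_measure(values):
    """Walk the decision tree: returns 'mode', 'median', or 'mean'."""
    if not all(isinstance(v, Number) for v in values):
        return "mode"        # categorical data
    if len(values) < 2:
        raise ValueError("need at least two numeric values")
    if has_outliers(values):
        return "median"      # robust to outliers
    if is_skewed(values):
        return "median"      # long tail: median better represents the typical value
    return "mean"            # roughly symmetric


print(choose_measure(["UPI", "Card", "UPI", "COD"]))                       # mode
print(choose_measure([22, 25, 28, 30, 95]))                                # median (95 is an outlier)
print(choose_measure([35, 45, 50, 85, 88, 90, 92, 95, 95, 98]))            # median (skewed)
print(choose_measure([165, 168, 170, 172, 170, 173, 175, 172, 170, 168]))  # mean
```

On these examples, the left-skewed test scores are not flagged by Tukey's fences, so it is the skew test that routes them to the median.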
Info

In practice, report ALL THREE when appropriate. Example dashboard: "Average order value: ₹1,250 (mean), Typical order: ₹950 (median), Most common order: ₹800 (mode)." Together they give a fuller picture of the data distribution.



Visualizing Mean, Median, Mode

Seeing how mean, median, and mode behave with different distributions clarifies when to use each.

Symmetric Distribution (Normal/Bell Curve)

Shape: Data centered around middle, tapers evenly on both sides.

[Figure: symmetric bell-shaped frequency curve, tapering evenly on both sides; Mean = Median = Mode (all equal, at the peak)]

Example — Heights of Adult Men:

Data: 165, 168, 170, 172, 170, 173, 175, 172, 170, 168 cm
Mean = 170.3 cm
Median = 170 cm
Mode = 170 cm
All three are nearly equal (symmetric distribution)

When this happens: Natural phenomena (heights, IQ scores, measurement errors), consistent processes (manufacturing).

Takeaway: For symmetric, single-peaked data, mean ≈ median ≈ mode, so all three are valid. Use the mean (the standard choice in statistics).


Right-Skewed Distribution (Long Tail Right)

Shape: Most data on left (low values), long tail on right (high values).

[Figure: frequency curve peaking at low values with a long tail to the right; labels, left to right: Mode (most common), Median (middle value), Mean (inflated by outliers)]

Example — Income Distribution:

Data: ₹4L, ₹5L, ₹5L, ₹6L, ₹7L, ₹8L, ₹10L, ₹15L, ₹50L, ₹1Cr
Mode = ₹5L (most common)
Median = ₹7.5L (middle: half earn less, half more)
Mean = ₹21L (inflated by the ₹50L and ₹1Cr outliers)

Rule of thumb: Mode < Median < Mean is the usual ordering in right-skewed data (typical, not guaranteed)

When this happens: Income, wealth, real estate prices, order values, website load times.

Takeaway: Use the median for right-skewed data; it represents the typical value, while the mean overstates it.


Left-Skewed Distribution (Long Tail Left)

Shape: Most data on right (high values), long tail on left (low values).

[Figure: frequency curve peaking at high values with a long tail to the left; labels, left to right: Mean, Median, Mode]

Example — Student Test Scores (Easy Exam):

Data: 35, 45, 50, 85, 88, 90, 92, 95, 95, 98
Mode = 95 (most common score)
Median = 89 (average of the two middle values, 88 and 90)
Mean = 77.3 (pulled down by the low scores 35, 45, and 50)

Rule of thumb: Mean < Median < Mode is the usual ordering in left-skewed data (typical, not guaranteed)

When this happens: Test scores (when most students do well, few fail), age at retirement, product ratings (most 5★, few 1★).

Takeaway: Use median (less affected by low outliers). Mean understates typical performance.
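Both skewed examples can be checked with Python's `statistics` module. This sketch reuses the income figures (in lakhs, so ₹1Cr = 100) and the test scores above, and prints the three measures for each.

```python
import statistics

income_lakhs = [4, 5, 5, 6, 7, 8, 10, 15, 50, 100]   # right-skewed example
scores = [35, 45, 50, 85, 88, 90, 92, 95, 95, 98]    # left-skewed example

for name, data in [("Income (lakhs)", income_lakhs), ("Test scores", scores)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{name}: mode = {mode}, median = {median}, mean = {mean}")
# Income (lakhs): mode = 5, median = 7.5, mean = 21
# Test scores: mode = 95, median = 89.0, mean = 77.3
```

Income gives Mode < Median < Mean (5 < 7.5 < 21); the test scores give Mean < Median < Mode (77.3 < 89 < 95).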


Bimodal Distribution (Two Peaks)

Shape: Two distinct clusters (two modes).

[Figure: frequency curve with two peaks; labels, left to right: Mode 1, Mean/Median (between the peaks when the two groups are similar in size), Mode 2]

Example — Website Traffic (Weekday vs Weekend):

Weekday traffic: 5,000-6,000 sessions/day (peak 1)
Weekend traffic: 1,500-2,000 sessions/day (peak 2)
Mode 1 = 5,500 (weekdays: most common high-traffic value)
Mode 2 = 1,800 (weekends: most common low-traffic value)
Mean = 4,200 (between the two peaks: misleading, almost no day looks like this)
Median = a weekday value (5 of every 7 days are weekdays, so the middle-ranked day falls inside the weekday cluster: misleading, because it hides the weekend dip)

Takeaway: The mean falls BETWEEN the peaks (representative of neither group), and the median describes only the larger group. Report both modes or segment the data ("Weekday avg: 5,500, Weekend avg: 1,800").
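A small hypothetical week of traffic makes the effect concrete. The daily numbers below are invented to fall inside the ranges above; the sketch compares the overall mean and median with per-segment means.

```python
import statistics

# Hypothetical week of daily sessions (Mon-Fri, then Sat-Sun)
week = {"Mon": 5200, "Tue": 5500, "Wed": 5800, "Thu": 5500, "Fri": 5600,
        "Sat": 1800, "Sun": 1700}

all_days = list(week.values())
weekdays = all_days[:5]
weekend = all_days[5:]

print(f"Overall mean:   {statistics.mean(all_days):.0f}")    # 4443, between the peaks
print(f"Overall median: {statistics.median(all_days):.0f}")  # 5500, a weekday value
print(f"Weekday mean:   {statistics.mean(weekdays):.0f}")    # 5520
print(f"Weekend mean:   {statistics.mean(weekend):.0f}")     # 1750
```

Here the overall mean matches no actual day, and because five of the seven days are weekdays, the overall median is simply a weekday value. Segmenting recovers both typical levels.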


Real-World Examples: Mean vs Median Decisions

Let's see how companies use mean vs median for honest communication and decision-making.

Example 1: Swiggy Delivery Time Promise

Context: Swiggy wants to set customer expectations for delivery time on app.

Data: 100,000 deliveries last month

Mean delivery time: 38 minutes
Median delivery time: 32 minutes
90th percentile: 55 minutes

Analysis:

  • Mean (38 min) is inflated by long-tail delays (traffic, weather, far locations)
  • Median (32 min) represents typical delivery (half faster, half slower)
  • 90th percentile (55 min) = worst 10% took ≥55 minutes
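To see how the median and 90th percentile are computed, here is a sketch on a hypothetical sample of ten delivery times (invented numbers, not Swiggy's data). It uses `statistics.quantiles` (Python 3.8+) with `method="inclusive"`, which interpolates linearly between sorted values.

```python
import statistics

# Hypothetical sample of 10 delivery times in minutes
deliveries = [22, 25, 28, 30, 31, 33, 35, 40, 52, 70]

median_t = statistics.median(deliveries)
# quantiles(n=10) returns the 9 cut points between deciles; the last is the 90th percentile.
p90 = statistics.quantiles(deliveries, n=10, method="inclusive")[-1]
share = sum(t <= p90 for t in deliveries) / len(deliveries)

print(f"Mean: {statistics.mean(deliveries)} min")   # 36.6 min
print(f"Median: {median_t} min")                    # 32.0 min
print(f"90th percentile: {p90} min")                # 53.8 min
print(f"Delivered within p90: {share:.0%}")         # 90%
```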

Decision: Show median + percentile on app:

  • "Typical delivery: 30-35 minutes" (median)
  • "90% of orders delivered within 55 minutes" (90th percentile, a cautious estimate)

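Why: The median sets a realistic expectation for the typical customer, and the 90th percentile gives a cautious upper bound. Quoting the mean (38 min) would overstate the typical wait (32 min) while still understating the slowest deliveries.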


Example 2: Flipkart Seller Dashboard (Revenue Reporting)

Context: Flipkart shows sellers their "average order value" to help plan inventory/pricing.

Seller's Data (last 1,000 orders):

Mean order value: ₹1,850
Median order value: ₹950
Mode: ₹800 (most common, single-item orders)

Analysis:

  • Mean (₹1,850) is inflated by occasional high-value orders (multi-item, electronics)
  • Median (₹950) represents typical single order
  • Mode (₹800) shows most common order size

Decision: Show ALL THREE on dashboard:

┌─────────────────────────────────────┐
│ Order Value Summary                 │
├─────────────────────────────────────┤
│ Average order: ₹1,850 (mean)        │
│ Typical order: ₹950 (median)        │
│ Most common order: ₹800 (mode)      │
└─────────────────────────────────────┘

Why: Sellers need full picture. Mean for revenue forecasting, median for pricing strategy, mode for inventory planning (stock more of ₹800 items).
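Why the mean suits revenue forecasting: total revenue equals mean × number of orders, an identity the median doesn't satisfy. A minimal sketch with five hypothetical orders chosen so their mean, median, and mode match the dashboard above.

```python
import statistics

# Hypothetical orders (₹) whose summary matches the dashboard
orders = [800, 800, 950, 1200, 5500]

mean_v = statistics.mean(orders)      # 1850
median_v = statistics.median(orders)  # 950
mode_v = statistics.mode(orders)      # 800

print("Actual revenue: ", sum(orders))               # 9250
print("Mean x orders:  ", mean_v * len(orders))      # 9250, exact by construction of the mean
print("Median x orders:", median_v * len(orders))    # 4750, underestimates revenue
```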


Example 3: Real Estate Listing (Property Prices)

Context: Real estate website shows "Average home price in Bangalore" on city page.

Data: 5,000 home sales last quarter

Mean: ₹85 lakhs
Median: ₹62 lakhs
Distribution: 70% of homes sold for ₹40L-₹80L, 30% for ₹1Cr-₹5Cr (luxury)

Analysis:

  • Mean (₹85L) is inflated by luxury properties (₹1Cr-₹5Cr segment)
  • Median (₹62L) represents typical buyer's budget
  • Right-skewed distribution (high-end outliers)

Decision: Use median for city-wide summary:

  • "Median home price: ₹62 lakhs" (more honest for buyers)
  • Include note: "30% of homes sold above ₹1 crore (luxury segment)"

Why: Median protects buyers from false expectations. Saying "average ₹85L" misleads budget-conscious buyers (most homes are ₹40L-₹80L). Median is industry standard in real estate.


Example 4: Salary Negotiation (Company Offer vs Market Data)

Context: Data analyst receives offer of ₹12 LPA. HR says "Our average analyst salary is ₹14 LPA — this is below average."

You investigate Glassdoor data for company:

Mean salary: ₹14 LPA
Median salary: ₹10 LPA
Distribution: 80% earn ₹8-12 LPA, 20% earn ₹25-40 LPA (senior analysts/managers)

Analysis:

  • Mean (₹14L) is inflated by senior roles (₹25-40L)
  • Median (₹10L) represents typical analyst
  • Your offer (₹12L) is ABOVE median (better than 50% of analysts)

Counter-argument: "Median analyst salary is ₹10 LPA — my ₹12L offer is actually 20% above typical. The ₹14L average includes senior analysts and managers. For entry-level, ₹12L is competitive."

Why: Median prevents misleading comparisons. HR's "below average" claim is technically true but contextually misleading. Median gives fair comparison.


Calculating Mean, Median, Mode in Python/SQL

Here's how to calculate these measures in real analysis workflows.

Python (Pandas)

```python
import pandas as pd

# Sample data: e-commerce order values (₹)
orders = pd.Series([450, 520, 680, 920, 1100, 1250, 1500, 55000])

# Mean
mean_val = orders.mean()
print(f"Mean: ₹{mean_val:,.1f}")  # Mean: ₹7,677.5

# Median
median_val = orders.median()
print(f"Median: ₹{median_val:,.0f}")  # Median: ₹1,010

# Mode: pandas returns every value tied for the highest count.
# Here each order value appears once, so orders.mode() would return all 8 values.
if orders.value_counts().max() == 1:
    print("Mode: none (every value appears once)")
else:
    print(f"Mode: {orders.mode().tolist()}")

# For a dataset with a clear mode
ratings = pd.Series([5, 5, 5, 4, 4, 3, 1, 1])
mode_rating = ratings.mode()[0]  # 5 (most frequent)
print(f"Mode rating: {mode_rating}★")

# Percentiles (bonus: 25th, 50th = median, 75th); linear interpolation by default
print(orders.quantile([0.25, 0.5, 0.75]))
# 0.25     640.0
# 0.50    1010.0  ← Median
# 0.75    1312.5

# Skewness (positive = right-skewed)
skew = orders.skew()
print(f"Skewness: {skew:.2f}")  # 2.83 (strongly right-skewed)
```

When mean >> median (like here: ₹7,677 vs ₹1,010), data is right-skewed → Use median.


SQL (Most Databases)

```sql
-- Mean (AVG is built in everywhere)
SELECT AVG(order_value) AS mean_order_value
FROM orders;
-- Result: 7677.5

-- Median (PostgreSQL ordered-set aggregate; BigQuery instead uses
-- PERCENTILE_CONT(order_value, 0.5) OVER () as an analytic function)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_value) AS median_order_value
FROM orders;
-- Result: 1010

-- Median (MySQL 8.0+: no built-in, so number the rows with window
-- functions and average the one or two middle rows)
SELECT AVG(order_value) AS median_order_value
FROM (
  SELECT order_value,
         ROW_NUMBER() OVER (ORDER BY order_value) AS rn,
         COUNT(*) OVER () AS cnt
  FROM orders
) sub
WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2));
-- Result: 1010

-- Mode (most frequent value; ties go to the smallest value.
-- SQL Server uses SELECT TOP 1 instead of LIMIT 1)
SELECT order_value AS mode_order_value, COUNT(*) AS frequency
FROM orders
GROUP BY order_value
ORDER BY COUNT(*) DESC, order_value
LIMIT 1;
-- Returns the most common order value

-- Percentiles: 25th, 50th, 75th (PostgreSQL syntax)
SELECT
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY order_value) AS p25,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY order_value) AS p50,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY order_value) AS p75
FROM orders;
-- Result: 640, 1010, 1312.5
```

Excel (Quick Analysis)

Data in column A (A1:A8)
Mean: =AVERAGE(A1:A8)
Median: =MEDIAN(A1:A8)
Mode: =MODE.SNGL(A1:A8) [single mode; returns #N/A if no value repeats]
      =MODE.MULT(A1:A8) [multiple modes, returns an array]
Percentiles:
  25th: =QUARTILE(A1:A8, 1)
  50th: =QUARTILE(A1:A8, 2) [same as MEDIAN]
  75th: =QUARTILE(A1:A8, 3)
  90th: =PERCENTILE(A1:A8, 0.9)
Info

For large datasets, use SQL or Python (Pandas): an Excel worksheet holds at most 1,048,576 rows and slows down well before that limit. For quick exploration (up to roughly 100K rows), Excel's AVERAGE/MEDIAN functions work fine.
