Metric Selection
Choose the right metrics for experiments
What You'll Learn
- Choosing primary metrics
- Metric characteristics
- Leading vs lagging metrics
- Sensitivity and robustness
- Common metric pitfalls
Characteristics of Good Metrics
1. Aligned with business goals: measures what you actually care about
2. Sensitive: detects meaningful changes
3. Robust: not easily gamed or manipulated
4. Interpretable: stakeholders understand it
5. Timely: measurable within a reasonable timeframe
Primary Metric Selection
Choose ONE primary metric: the decision criterion for whether the change ships
Must be:
- Directly related to goal
- Measurable for all users
- Available during test period
Examples:
E-commerce:
- Conversion rate
- Revenue per visitor
- Average order value
Social media:
- Daily active users
- Time spent
- Engagement rate
SaaS:
- Sign-up rate
- Activation rate
- Retention rate
Click-Through Rate (CTR)
Definition: Clicks / Impressions
Pros:
- Easy to measure
- Quick feedback
- High sensitivity
Cons:
- Doesn't measure outcomes
- Can be gamed (clickbait)
- May not correlate with business value
When to use: Early funnel optimization
When NOT to use: When you care about final conversions
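As a minimal sketch, here is how a CTR comparison is often evaluated with a two-proportion z-test (the counts below are hypothetical, for illustration only):

```python
import numpy as np
from scipy.stats import norm

def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Two-proportion z-test comparing CTR between control (a) and treatment (b)."""
    ctr_a, ctr_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (ctr_b - ctr_a) / se
    return ctr_a, ctr_b, z, 2 * norm.sf(abs(z))  # two-sided p-value

# Hypothetical counts, for illustration only
print(ctr_z_test(clicks_a=480, imps_a=10_000, clicks_b=540, imps_b=10_000))
```

Note this treats every impression as independent; in practice impressions cluster within users, which the delta method (covered in the Ratio Metrics section below) handles more carefully.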
Conversion Rate
Definition: Conversions / Visitors
Pros:
- Directly measures outcomes
- Clear business value
- Interpretable
Cons:
- May need large sample
- Slower than CTR
- Binary (doesn't capture magnitude)
Variations:
- Purchase conversion
- Sign-up conversion
- Free-to-paid conversion
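To make the "may need large sample" point concrete, here is a rough sample-size sketch using the standard normal-approximation formula for a two-sided two-proportion test (the baseline rate and target lift are illustrative):

```python
from scipy.stats import norm

def n_per_arm_proportion(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate visitors per arm to detect a relative lift in a
    conversion rate with a two-sided two-proportion z-test."""
    p1, p2 = p_base, p_base * (1 + rel_lift)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p2 - p1) ** 2

# 5% baseline conversion, detecting a 10% relative lift
print(round(n_per_arm_proportion(0.05, 0.10)))  # ~31,000 visitors per arm
```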
Revenue Metrics
Revenue per visitor (RPV): Total revenue / Visitors
Pros:
- Captures magnitude
- Direct business impact
- Comprehensive
Cons:
- High variance (a few large purchases dominate)
- Need large samples
- Outlier sensitive
Average order value (AOV): Revenue / Orders
Pros:
- Easier to move than volume
- Lower variance than RPV
Cons:
- Ignores conversion rate
- Can improve even while total order volume falls (fewer, larger orders)
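A small sketch of RPV and AOV computed from per-visitor revenue, with winsorization as one common way to blunt outlier sensitivity (the 99th-percentile cap is an assumption, not a rule):

```python
import numpy as np

def revenue_metrics(revenue_per_visitor, cap_pct=99):
    """RPV, AOV, and a winsorized RPV from per-visitor revenue
    (0 for non-buyers; assumes one order per buyer for AOV)."""
    rev = np.asarray(revenue_per_visitor, dtype=float)
    rpv = rev.mean()                              # revenue per visitor
    buyers = rev[rev > 0]
    aov = buyers.mean() if buyers.size else 0.0   # average order value
    cap = np.percentile(rev, cap_pct)             # clip extreme purchases
    return rpv, aov, np.minimum(rev, cap).mean()
```

The cap percentile is a judgment call: clipping reduces variance but biases the estimate, so it is worth reporting both the raw and winsorized values.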
Composite Metrics
Combine multiple signals:
Example: OEC (Overall Evaluation Criterion), a weighted combination of normalized components:
OEC = 0.5 × Revenue + 0.3 × Engagement + 0.2 × Retention
Pros:
- Holistic view
- Balance trade-offs
Cons:
- Complex to explain
- Weights are subjective
- Harder to debug issues
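One way to make the weighted combination well-defined is to normalize each component first, since raw dollars, minutes, and rates are not on comparable scales. A sketch (z-scoring is one of several reasonable normalization choices):

```python
import numpy as np

def oec_per_user(revenue, engagement, retention, weights=(0.5, 0.3, 0.2)):
    """Per-user OEC: z-score each component across all users (both arms
    pooled) so the components share a scale, then combine with weights."""
    total = np.zeros(len(revenue))
    for w, comp in zip(weights, (revenue, engagement, retention)):
        c = np.asarray(comp, dtype=float)
        total += w * (c - c.mean()) / c.std()
    return total  # compare mean OEC between arms like any user-level metric
```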
Leading vs Lagging Metrics
Leading metrics: Early indicators, quick feedback
- Click rate
- Add-to-cart
- Email opens
Lagging metrics: Final outcomes, slower
- Revenue
- Retention
- Lifetime value
Strategy:
- Optimize leading metrics for speed
- Validate with lagging metrics
- Ensure correlation between them!
Example:
- Leading: email click rate
- Lagging: conversions from email
→ Only optimize clicks if they convert!
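Checking that leading and lagging metrics actually move together can be as simple as correlating their lifts across past experiments. A sketch with made-up numbers:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical lifts from five past experiments
click_lifts = np.array([0.02, 0.05, -0.01, 0.08, 0.03])            # leading
conversion_lifts = np.array([0.010, 0.030, -0.005, 0.040, 0.012])  # lagging

r, p = pearsonr(click_lifts, conversion_lifts)
print(f"correlation={r:.2f}, p={p:.3f}")
# Only lean on the leading metric if this relationship is strong and stable.
```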
Sensitivity and Variance
Sensitivity: Ability to detect small changes
High sensitivity:
- Frequent events
- Low variance
- Large sample
Low sensitivity:
- Rare events
- High variance
- Need huge sample
Example comparison:
Metric A: Page views per user
- Mean: 5.2, SD: 2.1
- Sensitive to changes
Metric B: Purchases per user
- Mean: 0.05, SD: 0.22
- Less sensitive (rare event)
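Plugging these two example metrics into a standard two-sample sample-size approximation shows how large the sensitivity gap is (2% relative lift, 5% alpha, 80% power; all numbers illustrative):

```python
from scipy.stats import norm

def n_per_arm_continuous(mean, sd, rel_lift=0.02, alpha=0.05, power=0.80):
    """Users per arm to detect a relative lift in a continuous metric
    (two-sample z-test, normal approximation)."""
    delta = mean * rel_lift
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sd / delta) ** 2

print(round(n_per_arm_continuous(5.2, 2.1)))    # Metric A: ~6,400 per arm
print(round(n_per_arm_continuous(0.05, 0.22)))  # Metric B: ~760,000 per arm
```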
Count vs Rate Metrics
Count metrics: Total clicks, revenue, users
- Affected by exposure
- Need to normalize
Rate metrics: CTR, conversion rate, RPV
- Already normalized
- Better for comparison
Example: ❌ "Treatment had 1000 more clicks" (What if it had more traffic?)
✓ "Treatment had 2% higher CTR" (Normalized)
User-Level vs Session-Level
User-level: Metrics per user
- Better for retention
- Removes session count effect
Session-level: Metrics per session
- Better for engagement
- Captures revisit behavior
Example:
User-level: 60% of users converted
Session-level: 40% of sessions converted
(Same users, counted across multiple sessions)
Choose based on goal!
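The same event log yields both views; the choice is just the denominator. A pandas sketch with made-up sessions:

```python
import pandas as pd

# Hypothetical log: one row per session
log = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "session_id": [10, 11, 12, 20, 21, 30],
    "converted":  [1, 0, 0, 1, 1, 0],
})

session_rate = log["converted"].mean()                        # 3/6 sessions
user_rate = log.groupby("user_id")["converted"].max().mean()  # 2/3 users
print(f"session-level: {session_rate:.0%}, user-level: {user_rate:.0%}")
```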
Ratio Metrics
Form: Numerator / Denominator
Examples:
- CTR = Clicks / Impressions
- Conversion = Orders / Visitors
- Engagement = Time / Sessions
Challenge: both numerator and denominator are random, so naive variance formulas understate uncertainty!
Delta method: approximates the ratio's variance so standard statistical tests apply
Alternative: Bootstrap confidence intervals
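A compact sketch of the delta-method variance for a ratio of means, using the standard first-order Taylor approximation over per-user numerator and denominator values:

```python
import numpy as np

def delta_method_ratio_var(y, x):
    """Approximate Var(mean(y) / mean(x)) for a user-level ratio metric,
    e.g. y = clicks per user, x = impressions per user (CTR)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.cov(y, x)[0, 1]
    # First-order Taylor expansion of mean(y)/mean(x) around (mu_y, mu_x)
    return (var_y / mu_x**2
            - 2 * mu_y * cov_xy / mu_x**3
            + mu_y**2 * var_x / mu_x**4) / n
```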
Guardrail Metrics
Purpose: Ensure you don't break things
Examples:
Performance:
- Page load time
- Error rate
- Crash rate
User experience:
- Bounce rate
- Time to first action
- Help tickets
Revenue:
- Revenue per user shouldn't tank
Set thresholds: "Stop test if page load > 3 seconds"
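Operationally, guardrails are just thresholded checks run alongside the primary analysis. A sketch (the metric names and limits below are made up):

```python
def check_guardrails(metrics, thresholds):
    """Return the guardrail metrics that breached their limits."""
    breaches = []
    for name, (limit, direction) in thresholds.items():
        value = metrics[name]
        if (direction == "max" and value > limit) or \
           (direction == "min" and value < limit):
            breaches.append(name)
    return breaches

thresholds = {
    "p95_page_load_s":  (3.0, "max"),   # stop if page load > 3 seconds
    "error_rate":       (0.01, "max"),
    "revenue_per_user": (0.95, "min"),  # treatment/control ratio
}
metrics = {"p95_page_load_s": 3.4, "error_rate": 0.004, "revenue_per_user": 0.99}
print(check_guardrails(metrics, thresholds))  # ['p95_page_load_s'] → stop test
```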
Metric Tradeoffs
Common tensions:
Short-term vs long-term: Clicks (short) vs retention (long)
Engagement vs monetization: Time spent vs revenue
Growth vs experience: New users vs user satisfaction
Strategy:
- Define acceptable tradeoffs
- Monitor secondary metrics
- Long-term tracking
Novelty and Primacy Effects
Novelty effect: users react to the change itself → metrics improve at first, then regress
Primacy effect: existing users prefer the old version → metrics look worse at first, then recover
Solution:
- Run longer tests (2-4 weeks)
- Analyze new vs returning separately
- Focus on new users for true effect
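Segmenting the daily lift by cohort makes these effects visible. A sketch with fabricated numbers showing a lift that decays only among returning users:

```python
import pandas as pd

# Hypothetical daily lift (%) by user cohort
df = pd.DataFrame({
    "day":    [1, 1, 2, 2, 3, 3] * 2,
    "cohort": ["new"] * 6 + ["returning"] * 6,
    "lift":   [4.0, 3.8, 3.9, 4.1, 4.0, 3.9,    # stable for new users
               6.0, 5.1, 3.2, 2.0, 1.1, 0.4],   # decaying → novelty effect
})
print(df.groupby(["cohort", "day"])["lift"].mean().unstack())
```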
Statistical Properties
Good metric properties:
1. Low variance: more stable, easier to detect changes
2. Normal distribution: standard tests work well
3. No ceiling/floor effects: room to improve
Example:
Metric A: 50% engagement (can move up or down)
Metric B: 95% engagement (ceiling effect!)
Metric A is better for testing
Metric Debugging
When results seem weird:
1. Check the metric definition: is it calculated correctly?
2. Check data quality: is logging working?
3. Check the sample ratio: is the split actually 50/50?
4. Check segments: does the effect appear in all groups?
5. Check time patterns: is the effect consistent across days?
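Step 3 is the classic sample ratio mismatch (SRM) check: a chi-square test of the observed arm sizes against the configured split. A sketch:

```python
from scipy.stats import chisquare

def srm_p_value(n_control, n_treatment, split=(0.5, 0.5)):
    """Chi-square test of observed arm sizes vs. the intended split."""
    total = n_control + n_treatment
    expected = [total * split[0], total * split[1]]
    return chisquare([n_control, n_treatment], f_exp=expected).pvalue

# A "50/50" test that landed 50,000 vs 50,900
print(f"SRM p-value: {srm_p_value(50_000, 50_900):.4f}")
# A tiny p-value (~0.005 here) signals a randomization or logging bug.
```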
Industry-Specific Metrics
Marketplace:
- Gross merchandise value (GMV)
- Take rate
- Liquidity (supply/demand balance)
Content:
- Time on site
- Pages per session
- Return rate
Freemium:
- Free-to-paid conversion
- Feature adoption
- Upgrade rate
Gaming:
- Day 1/7/30 retention
- ARPU (average revenue per user)
- Session length
Proxy Metrics
When long-term metrics take too long:
Ultimate metric: 1-year retention
Proxy: week-1 engagement
Requirements:
- Correlation with ultimate metric
- Measurable quickly
- Validate relationship regularly
Example: Videos watched (proxy) → Retention (ultimate)
Practice Exercise
Scenario: Testing new product recommendation algorithm
Questions:
- What could be primary metrics?
- What are guardrail metrics?
- Leading vs lagging considerations?
- How long should the test run?
Possible answers:
- Primary: CTR on recommendations, Add-to-cart rate, Revenue per user
- Guardrails: Overall conversion, customer satisfaction, page load time
- Leading: Click-through, Lagging: Purchases
- At least 2 weeks to capture repeat behavior
Next Steps
Learn about Avoiding Peeking!
Tip: The right metric makes or breaks your experiment!