Metric Selection
Choose the right metrics for experiments
What You'll Learn
- Choosing primary metrics
- Metric characteristics
- Leading vs lagging metrics
- Sensitivity and robustness
- Common metric pitfalls
Characteristics of Good Metrics
1. Aligned with business goals: measures what you actually care about
2. Sensitive: detects meaningful changes
3. Robust: not easily gamed or manipulated
4. Interpretable: stakeholders understand it
5. Timely: measurable within a reasonable timeframe
Primary Metric Selection
Choose ONE primary metric: the decision criterion for whether the change ships
Must be:
- Directly related to goal
- Measurable for all users
- Available during test period
Examples:
E-commerce:
- Conversion rate
- Revenue per visitor
- Average order value
Social media:
- Daily active users
- Time spent
- Engagement rate
SaaS:
- Sign-up rate
- Activation rate
- Retention rate
Click-Through Rate (CTR)
Definition: Clicks / Impressions
Pros:
- Easy to measure
- Quick feedback
- High sensitivity
Cons:
- Doesn't measure outcomes
- Can be gamed (clickbait)
- May not correlate with business value
When to use: Early funnel optimization
When NOT to use: When you care about final conversions
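As a minimal sketch, here is how a CTR comparison is often evaluated with a two-proportion z-test (the counts below are hypothetical, for illustration only):

```python
import numpy as np
from scipy.stats import norm

def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Two-proportion z-test comparing CTR between control (a) and treatment (b)."""
    ctr_a, ctr_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (ctr_b - ctr_a) / se
    return ctr_a, ctr_b, z, 2 * norm.sf(abs(z))  # two-sided p-value

# Hypothetical counts, for illustration only
print(ctr_z_test(clicks_a=480, imps_a=10_000, clicks_b=540, imps_b=10_000))
```

Note this treats every impression as independent; in practice impressions cluster within users, which the delta method (covered in the Ratio Metrics section below) handles more carefully.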
Conversion Rate
Definition: Conversions / Visitors
Pros:
- Directly measures outcomes
- Clear business value
- Interpretable
Cons:
- May need large sample
- Slower than CTR
- Binary (doesn't capture magnitude)
Variations:
- Purchase conversion
- Sign-up conversion
- Free-to-paid conversion
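To make the "may need large sample" point concrete, here is a rough sample-size sketch using the standard normal-approximation formula for a two-sided two-proportion test (the baseline rate and target lift are illustrative):

```python
from scipy.stats import norm

def n_per_arm_proportion(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate visitors per arm to detect a relative lift in a
    conversion rate with a two-sided two-proportion z-test."""
    p1, p2 = p_base, p_base * (1 + rel_lift)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p2 - p1) ** 2

# 5% baseline conversion, detecting a 10% relative lift
print(round(n_per_arm_proportion(0.05, 0.10)))  # ~31,000 visitors per arm
```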
Revenue Metrics
Revenue per visitor (RPV): Total revenue / Visitors
Pros:
- Captures magnitude
- Direct business impact
- Comprehensive
Cons:
- High variance (a few large purchases dominate)
- Need large samples
- Outlier sensitive
Average order value (AOV): Revenue / Orders
Pros:
- Easier to move than volume
- Lower variance than RPV
Cons:
- Ignores conversion rate
- Can improve even while total order volume falls (fewer, larger orders)
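A small sketch of RPV and AOV computed from per-visitor revenue, with winsorization as one common way to blunt outlier sensitivity (the 99th-percentile cap is an assumption, not a rule):

```python
import numpy as np

def revenue_metrics(revenue_per_visitor, cap_pct=99):
    """RPV, AOV, and a winsorized RPV from per-visitor revenue
    (0 for non-buyers; assumes one order per buyer for AOV)."""
    rev = np.asarray(revenue_per_visitor, dtype=float)
    rpv = rev.mean()                              # revenue per visitor
    buyers = rev[rev > 0]
    aov = buyers.mean() if buyers.size else 0.0   # average order value
    cap = np.percentile(rev, cap_pct)             # clip extreme purchases
    return rpv, aov, np.minimum(rev, cap).mean()
```

The cap percentile is a judgment call: clipping reduces variance but biases the estimate, so it is worth reporting both the raw and winsorized values.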
Composite Metrics
Combine multiple signals:
Example: OEC (Overall Evaluation Criterion), a weighted combination of normalized components:
OEC = 0.5 × Revenue + 0.3 × Engagement + 0.2 × Retention
Pros:
- Holistic view
- Balance trade-offs
Cons:
- Complex to explain
- Weights are subjective
- Harder to debug issues
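One way to make the weighted combination well-defined is to normalize each component first, since raw dollars, minutes, and rates are not on comparable scales. A sketch (z-scoring is one of several reasonable normalization choices):

```python
import numpy as np

def oec_per_user(revenue, engagement, retention, weights=(0.5, 0.3, 0.2)):
    """Per-user OEC: z-score each component across all users (both arms
    pooled) so the components share a scale, then combine with weights."""
    total = np.zeros(len(revenue))
    for w, comp in zip(weights, (revenue, engagement, retention)):
        c = np.asarray(comp, dtype=float)
        total += w * (c - c.mean()) / c.std()
    return total  # compare mean OEC between arms like any user-level metric
```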
Leading vs Lagging Metrics
Leading metrics: Early indicators, quick feedback
- Click rate
- Add-to-cart
- Email opens
Lagging metrics: Final outcomes, slower
- Revenue
- Retention
- Lifetime value
Strategy:
- Optimize leading metrics for speed
- Validate with lagging metrics
- Ensure correlation between them!
Example:
- Leading: email click rate
- Lagging: conversions from email
→ Only optimize clicks if they convert!
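Checking that leading and lagging metrics actually move together can be as simple as correlating their lifts across past experiments. A sketch with made-up numbers:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical lifts from five past experiments
click_lifts = np.array([0.02, 0.05, -0.01, 0.08, 0.03])            # leading
conversion_lifts = np.array([0.010, 0.030, -0.005, 0.040, 0.012])  # lagging

r, p = pearsonr(click_lifts, conversion_lifts)
print(f"correlation={r:.2f}, p={p:.3f}")
# Only lean on the leading metric if this relationship is strong and stable.
```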
Sensitivity and Variance
Sensitivity: Ability to detect small changes
High sensitivity:
- Frequent events
- Low variance
- Large sample
Low sensitivity:
- Rare events
- High variance
- Need huge sample
Example comparison:
Metric A: Page views per user
- Mean: 5.2, SD: 2.1
- Sensitive to changes
Metric B: Purchases per user
- Mean: 0.05, SD: 0.22
- Less sensitive (rare event)
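Plugging these two example metrics into a standard two-sample sample-size approximation shows how large the sensitivity gap is (2% relative lift, 5% alpha, 80% power; all numbers illustrative):

```python
from scipy.stats import norm

def n_per_arm_continuous(mean, sd, rel_lift=0.02, alpha=0.05, power=0.80):
    """Users per arm to detect a relative lift in a continuous metric
    (two-sample z-test, normal approximation)."""
    delta = mean * rel_lift
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sd / delta) ** 2

print(round(n_per_arm_continuous(5.2, 2.1)))    # Metric A: ~6,400 per arm
print(round(n_per_arm_continuous(0.05, 0.22)))  # Metric B: ~760,000 per arm
```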
Count vs Rate Metrics
Count metrics: Total clicks, revenue, users
- Affected by exposure
- Need to normalize
Rate metrics: CTR, conversion rate, RPV
- Already normalized
- Better for comparison
Example: ❌ "Treatment had 1000 more clicks" (What if it had more traffic?)
✓ "Treatment had 2% higher CTR" (Normalized)
User-Level vs Session-Level
User-level: Metrics per user
- Better for retention
- Removes session count effect
Session-level: Metrics per session
- Better for engagement
- Captures revisit behavior
Example:
User-level: 60% of users converted
Session-level: 40% of sessions converted
(Same users, counted across multiple sessions)
Choose based on goal!
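The same event log yields both views; the choice is just the denominator. A pandas sketch with made-up sessions:

```python
import pandas as pd

# Hypothetical log: one row per session
log = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "session_id": [10, 11, 12, 20, 21, 30],
    "converted":  [1, 0, 0, 1, 1, 0],
})

session_rate = log["converted"].mean()                        # 3/6 sessions
user_rate = log.groupby("user_id")["converted"].max().mean()  # 2/3 users
print(f"session-level: {session_rate:.0%}, user-level: {user_rate:.0%}")
```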
Ratio Metrics
Form: Numerator / Denominator
Examples:
- CTR = Clicks / Impressions
- Conversion = Orders / Visitors
- Engagement = Time / Sessions
Challenge: both numerator and denominator are random, so naive variance formulas understate uncertainty!
Delta method: approximates the ratio's variance so standard statistical tests apply
Alternative: Bootstrap confidence intervals
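A compact sketch of the delta-method variance for a ratio of means, using the standard first-order Taylor approximation over per-user numerator and denominator values:

```python
import numpy as np

def delta_method_ratio_var(y, x):
    """Approximate Var(mean(y) / mean(x)) for a user-level ratio metric,
    e.g. y = clicks per user, x = impressions per user (CTR)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.cov(y, x)[0, 1]
    # First-order Taylor expansion of mean(y)/mean(x) around (mu_y, mu_x)
    return (var_y / mu_x**2
            - 2 * mu_y * cov_xy / mu_x**3
            + mu_y**2 * var_x / mu_x**4) / n
```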
Guardrail Metrics
Purpose: Ensure you don't break things
Examples:
Performance:
- Page load time
- Error rate
- Crash rate
User experience:
- Bounce rate
- Time to first action
- Help tickets
Revenue:
- Revenue per user shouldn't tank
Set thresholds: "Stop test if page load > 3 seconds"
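Operationally, guardrails are just thresholded checks run alongside the primary analysis. A sketch (the metric names and limits below are made up):

```python
def check_guardrails(metrics, thresholds):
    """Return the guardrail metrics that breached their limits."""
    breaches = []
    for name, (limit, direction) in thresholds.items():
        value = metrics[name]
        if (direction == "max" and value > limit) or \
           (direction == "min" and value < limit):
            breaches.append(name)
    return breaches

thresholds = {
    "p95_page_load_s":  (3.0, "max"),   # stop if page load > 3 seconds
    "error_rate":       (0.01, "max"),
    "revenue_per_user": (0.95, "min"),  # treatment/control ratio
}
metrics = {"p95_page_load_s": 3.4, "error_rate": 0.004, "revenue_per_user": 0.99}
print(check_guardrails(metrics, thresholds))  # ['p95_page_load_s'] → stop test
```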
Metric Tradeoffs
Common tensions:
Short-term vs long-term: Clicks (short) vs retention (long)
Engagement vs monetization: Time spent vs revenue
Growth vs experience: New users vs user satisfaction
Strategy:
- Define acceptable tradeoffs
- Monitor secondary metrics
- Long-term tracking
Novelty and Primacy Effects
Novelty effect: users react to the change itself → metrics improve at first, then regress
Primacy effect: existing users prefer the old version → metrics look worse at first, then recover
Solution:
- Run longer tests (2-4 weeks)
- Analyze new vs returning separately
- Focus on new users for true effect
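Segmenting the daily lift by cohort makes these effects visible. A sketch with fabricated numbers showing a lift that decays only among returning users:

```python
import pandas as pd

# Hypothetical daily lift (%) by user cohort
df = pd.DataFrame({
    "day":    [1, 1, 2, 2, 3, 3] * 2,
    "cohort": ["new"] * 6 + ["returning"] * 6,
    "lift":   [4.0, 3.8, 3.9, 4.1, 4.0, 3.9,    # stable for new users
               6.0, 5.1, 3.2, 2.0, 1.1, 0.4],   # decaying → novelty effect
})
print(df.groupby(["cohort", "day"])["lift"].mean().unstack())
```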
Statistical Properties
Good metric properties:
1. Low variance: more stable, easier to detect changes
2. Normal distribution: standard tests work well
3. No ceiling/floor effects: room to improve
Example:
Metric A: 50% engagement (can move up or down)
Metric B: 95% engagement (ceiling effect!)
Metric A is better for testing
Metric Debugging
When results seem weird:
1. Check the metric definition: is it calculated correctly?
2. Check data quality: is logging working?
3. Check the sample ratio: is the split actually 50/50?
4. Check segments: does the effect appear in all groups?
5. Check time patterns: is the effect consistent across days?
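Step 3 is the classic sample ratio mismatch (SRM) check: a chi-square test of the observed arm sizes against the configured split. A sketch:

```python
from scipy.stats import chisquare

def srm_p_value(n_control, n_treatment, split=(0.5, 0.5)):
    """Chi-square test of observed arm sizes vs. the intended split."""
    total = n_control + n_treatment
    expected = [total * split[0], total * split[1]]
    return chisquare([n_control, n_treatment], f_exp=expected).pvalue

# A "50/50" test that landed 50,000 vs 50,900
print(f"SRM p-value: {srm_p_value(50_000, 50_900):.4f}")
# A tiny p-value (~0.005 here) signals a randomization or logging bug.
```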
Industry-Specific Metrics
Marketplace:
- Gross merchandise value (GMV)
- Take rate
- Liquidity (supply/demand balance)
Content:
- Time on site
- Pages per session
- Return rate
Freemium:
- Free-to-paid conversion
- Feature adoption
- Upgrade rate
Gaming:
- Day 1/7/30 retention
- ARPU (average revenue per user)
- Session length
Proxy Metrics
When long-term metrics take too long:
Ultimate metric: 1-year retention
Proxy: week-1 engagement
Requirements:
- Correlation with ultimate metric
- Measurable quickly
- Validate relationship regularly
Example: Videos watched (proxy) → Retention (ultimate)
Practice Exercise
Scenario: Testing new product recommendation algorithm
Questions:
- What could be primary metrics?
- What are guardrail metrics?
- Leading vs lagging considerations?
- How long should the test run?
Possible answers:
- Primary: CTR on recommendations, Add-to-cart rate, Revenue per user
- Guardrails: Overall conversion, customer satisfaction, page load time
- Leading: Click-through, Lagging: Purchases
- At least 2 weeks to capture repeat behavior
Next Steps
Learn about Avoiding Peeking!
Tip: The right metric makes or breaks your experiment!