Topic 11 of 12

A/B Testing & Experimentation

A/B testing removes guesswork from product decisions. Learn to run experiments that prove what actually works.

📚 Advanced
⏱️ 17 min
✅ 6 quizzes

What is A/B Testing?

A/B testing compares two versions (A vs B) to see which performs better.

Examples:

  • Button color: Blue vs Green
  • Headline: "Buy Now" vs "Get Started"
  • Pricing: ₹999 vs ₹1,499
  • Email subject line

Why it matters:

  • Remove opinions, trust data
  • Optimize conversion rates
  • Increase revenue scientifically

When to A/B Test

✅ Good use cases:

  • Landing page design
  • Email subject lines
  • Product pricing
  • Call-to-action buttons
  • Checkout flow

โŒ Don't A/B test:

  • When you have <1000 users/week
  • Critical bug fixes (just fix it!)
  • Unethical changes

The A/B Testing Process

Step 1: Form Hypothesis

Bad: "Let's test a green button"

Good: "Green button will increase clicks by 10% because it stands out more"

Template: "Changing [X] will [increase/decrease] [metric] by [Y]% because [reason]"

Step 2: Choose Metric

Primary metric (one only!):

  • Click-through rate
  • Conversion rate
  • Revenue per user

Secondary metrics:

  • Time on page
  • Bounce rate

Step 3: Calculate Sample Size

Inputs needed:

  • Baseline conversion rate
  • Minimum detectable effect
  • Statistical power (usually 80%)
  • Significance level (usually 0.05)

Example:

  • Current conversion: 5%
  • Want to detect: 10% increase (to 5.5%)
  • Need: ~31,000 visitors per variant (at 80% power, α = 0.05)

Tools: Use online calculators (Optimizely, VWO, Evan's AB Test Calculator)
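The same calculation can be done in plain Python using the standard two-proportion normal-approximation formula (a sketch; online calculators use essentially this formula, though some tools differ slightly):

```python
from scipy.stats import norm

def sample_size(p1, p2, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a shift from p1 to p2
    with a two-sided two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 0.84 for 80% power
    p_bar = (p1 + p2) / 2               # pooled rate under the null
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(round(numerator / (p2 - p1) ** 2))

n = sample_size(0.05, 0.055)
print(f"Visitors per variant: {n:,}")  # roughly 31,000
```

Note how quickly the requirement grows as the detectable effect shrinks: halving the minimum detectable effect roughly quadruples the required sample.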

Step 4: Run Experiment

Duration:

  • Minimum: 1-2 weeks
  • Run full business cycles
  • Don't stop early!

Random assignment:

  • 50% see version A
  • 50% see version B
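In practice, the 50/50 split is usually made deterministic by hashing a stable user ID, so the same user always sees the same variant across visits. A minimal sketch (the experiment name `checkout_test` is just an illustrative label):

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to variant A or B (50/50 split)."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # stable bucket in 0-99
    return "A" if bucket < 50 else "B"

# The same user always gets the same variant
print(assign_variant("user_42", "checkout_test"))
```

Including the experiment name in the hash means the same user can land in different variants across different experiments, which avoids correlated assignments.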

Step 5: Analyze Results

Calculate:

  • Conversion rate for each variant
  • Statistical significance (p-value)
  • Confidence interval

Decide:

  • p < 0.05: Winner! (statistically significant)
  • p ≥ 0.05: No clear winner (keep original)

Statistical Significance

p-value < 0.05 means:

  • If there were truly no difference, results this extreme would occur less than 5% of the time
  • Strong evidence of a real effect (note: not the same as "95% probability the effect is real")

Example Results:

| Variant | Visitors | Conversions | Conv. Rate |
|---------|----------|-------------|------------|
| A (Control) | 10,000 | 500 | 5.0% |
| B (Test) | 10,000 | 570 | 5.7% |

Analysis:

  • Lift: +14%
  • p-value: ≈ 0.03
  • Result: B wins!

Sample Size Matters

Why larger is better:

| Sample Size | Conversion A | Conversion B | Significant? |
|-------------|--------------|--------------|--------------|
| 100 | 5% | 10% | No (p ≈ 0.18) |
| 1,000 | 5% | 8% | Yes (p ≈ 0.007) |
| 10,000 | 5% | 6% | Yes (p ≈ 0.002) |

Lesson: Small samples miss even large effects; large samples can detect small ones.

Common Pitfalls

1. Peeking Problem

โŒ Wrong: Check results daily, stop when significant

Why it's bad: Increases false positives

โœ… Right: Decide duration upfront, run full period
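The inflation from peeking is easy to demonstrate by simulation. In the sketch below both variants have the identical 5% conversion rate, yet checking for significance after every batch and stopping at the first p < 0.05 declares a "winner" far more often than the promised 5% of the time:

```python
import random
from scipy.stats import chi2_contingency

random.seed(0)

def peeks_falsely(n_batches=10, batch=500, true_rate=0.05):
    """Return True if a test with NO real difference ever looks
    'significant' when we peek after every batch."""
    conv_a = conv_b = n_a = n_b = 0
    for _ in range(n_batches):
        conv_a += sum(random.random() < true_rate for _ in range(batch))
        conv_b += sum(random.random() < true_rate for _ in range(batch))
        n_a += batch
        n_b += batch
        table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
        _, p, _, _ = chi2_contingency(table)
        if p < 0.05:
            return True   # would have stopped early and shipped a false winner
    return False

runs = 500
false_positives = sum(peeks_falsely() for _ in range(runs))
print(f"False positive rate with peeking: {false_positives / runs:.0%}")
```

Because each peek is an extra chance to cross the threshold by luck, the realized false positive rate climbs well above the nominal 5%.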

2. Multiple Testing

โŒ Wrong: Test 20 variants, pick the one with p<0.05

Why it's bad: 5% false positive rate means 1 in 20 will be "significant" by chance!

โœ… Right: Test A vs B only, or adjust significance level
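The simplest such adjustment is the Bonferroni correction: divide the overall significance level by the number of comparisons. A minimal sketch:

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance threshold that keeps the overall
    false positive rate at roughly alpha across num_tests comparisons."""
    return alpha / num_tests

# Testing 20 variants against control at an overall alpha of 0.05
threshold = bonferroni_threshold(0.05, 20)
print(f"Each individual test must reach p < {threshold}")  # p < 0.0025
```

Bonferroni is conservative (it sacrifices some power); it trades a few missed wins for far fewer false ones.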

3. Ignoring External Factors

โŒ Wrong: Run test during festival season

Why it's bad: Seasonal effects confound results

โœ… Right: Run during normal periods, avoid holidays

4. Small Sample Size

โŒ Wrong: 100 visitors per variant

Why it's bad: Not enough power to detect real effects

โœ… Right: Use sample size calculator upfront

5. Testing Too Many Things

โŒ Wrong: Change button color AND text AND position

Why it's bad: Can't tell what caused the change

โœ… Right: Test one change at a time

Real Example: Email Subject Line Test

Scenario: E-commerce company wants to increase email open rates.

Hypothesis: Personalized subject line will increase opens by 15%.

Variants:

  • A (Control): "New arrivals this week"
  • B (Test): "[Name], check out these new arrivals!"

Setup:

  • Send to 20,000 subscribers
  • 10,000 get A, 10,000 get B
  • Measure: Open rate

Results:

| Variant | Sent | Opens | Open Rate |
|---------|------|-------|-----------|
| A | 10,000 | 1,500 | 15% |
| B | 10,000 | 1,800 | 18% |

Analysis:

  • Lift: +20%
  • p-value: < 0.001
  • Decision: Use personalized subject lines!

Impact:

  • 3 percentage point increase in open rate (15% → 18%)
  • On 1M emails/month = 30,000 extra opens
  • Drives significant revenue

Multivariate Testing

Test multiple elements simultaneously.

Example:

  • Button color (Blue/Green)
  • Button text (Buy/Get)
  • = 4 combinations to test
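Enumerating the combinations is a one-liner with `itertools.product`; note that with two options per element, the number of variants doubles with every element you add:

```python
from itertools import product

colors = ["Blue", "Green"]
texts = ["Buy", "Get"]

# Every (color, text) combination becomes its own variant
combinations = list(product(colors, texts))
for combo in combinations:
    print(combo)
print(f"{len(combinations)} variants to test")  # 4 variants to test
```

Adding a third element with two options (say, button position) would already mean 8 variants, which is why multivariate tests need so much more traffic.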

When to use:

  • High traffic sites
  • Want to optimize multiple elements
  • Need faster results

Downside:

  • Requires much larger sample size
  • More complex analysis

Bayesian A/B Testing

Alternative to traditional (frequentist) approach.

Advantages:

  • Can peek at results anytime
  • Shows probability of A being better than B
  • More intuitive interpretation

Disadvantage:

  • Requires choosing a prior distribution

Tools: VWO, Optimizely (Bayesian mode)
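A common Bayesian approach models each variant's conversion rate with a Beta posterior and estimates P(B beats A) by Monte Carlo sampling. A sketch with a uniform Beta(1, 1) prior, reusing the numbers from the earlier example (the prior choice is the assumption flagged above):

```python
import random

random.seed(42)

# Observed data: conversions out of visitors
conv_a, n_a = 500, 10_000
conv_b, n_b = 570, 10_000

# With a uniform Beta(1, 1) prior, the posterior for each rate is
# Beta(conversions + 1, non-conversions + 1)
samples = 20_000
wins_b = 0
for _ in range(samples):
    rate_a = random.betavariate(conv_a + 1, n_a - conv_a + 1)
    rate_b = random.betavariate(conv_b + 1, n_b - conv_b + 1)
    if rate_b > rate_a:
        wins_b += 1

print(f"P(B beats A): {wins_b / samples:.1%}")  # probability B is truly better
```

Instead of a binary significant/not-significant verdict, this yields a direct probability statement ("B is better with ~99% probability"), which is what makes the Bayesian framing easier to communicate.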

Tools for A/B Testing

| Tool | Best For | Cost |
|------|----------|------|
| Google Optimize | Websites | Free (discontinued by Google in 2023) |
| Optimizely | Enterprise | $$$$ |
| VWO | Mid-size | $$$ |
| Unbounce | Landing pages | $$ |
| Mailchimp | Email testing | $ |

Python for A/B Testing

```python
from scipy import stats

# Sample data
conversions_a = 500  # out of 10,000
conversions_b = 570  # out of 10,000

visitors_a = 10000
visitors_b = 10000

# Conversion rates
rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Chi-square test on the 2x2 contingency table
observed = [[conversions_a, visitors_a - conversions_a],
            [conversions_b, visitors_b - conversions_b]]

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Conversion A: {rate_a:.2%}")
print(f"Conversion B: {rate_b:.2%}")
print(f"Lift: {(rate_b/rate_a - 1):.2%}")
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    print("✅ Statistically significant!")
else:
    print("❌ Not significant")
```

Summary

✅ A/B testing removes guesswork from decisions
✅ Form a clear hypothesis before testing
✅ Calculate required sample size upfront
✅ Run the test for a full business cycle
✅ Don't peek early (peeking problem)
✅ p < 0.05 = statistically significant
✅ Test one change at a time
✅ Consider practical significance, not just statistical

Next: Capstone Project — Put it all together! 🚀