T-Tests and Chi-Square
Learn specific statistical tests for different situations
When to Use Which Test?
| Test | Data Type | Purpose |
|---|---|---|
| T-test | Numerical | Compare means |
| Chi-Square | Categorical | Compare frequencies |
T-Test Types Summary
1. One-Sample T-Test
Compare sample mean to known value:
code.py
from scipy import stats
# Average human body temp = 98.6°F
# Our sample:
temps = [98.4, 98.6, 98.8, 98.2, 98.9, 98.5, 98.7, 98.3, 98.6, 98.5]
t_stat, p_value = stats.ttest_1samp(temps, 98.6)
print(f"P-value: {p_value:.4f}")
2. Independent T-Test
Compare two different groups:
code.py
# Male vs Female heights
males = [175, 180, 172, 178, 176, 182, 174, 179, 177, 181]
females = [162, 165, 160, 168, 163, 167, 161, 166, 164, 169]
t_stat, p_value = stats.ttest_ind(males, females)
print(f"P-value: {p_value:.4f}")
3. Paired T-Test
Same group, two measurements:
code.py
# Blood pressure before/after medication
before = [140, 135, 150, 145, 138, 142, 148, 136, 144, 139]
after = [130, 128, 142, 138, 132, 135, 140, 130, 138, 134]
t_stat, p_value = stats.ttest_rel(before, after)
print(f"P-value: {p_value:.4f}")
Welch's T-Test
Use when groups have unequal variances:
code.py
group1 = [10, 12, 11, 13, 12]
group2 = [20, 25, 30, 22, 28, 35, 40] # Different size, more variance
# Default ttest_ind assumes equal variance
# Use equal_var=False for Welch's test
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"P-value (Welch's): {p_value:.4f}")
Chi-Square Test
For categorical data (counts/frequencies).
Chi-Square Goodness of Fit
Test if observed matches expected:
code.py
from scipy.stats import chisquare
# Dice roll: Is it fair?
# Rolled 60 times, expected 10 of each
observed = [8, 12, 9, 11, 10, 10] # actual counts
expected = [10, 10, 10, 10, 10, 10] # if fair
chi_stat, p_value = chisquare(observed, expected)
print(f"Chi-square: {chi_stat:.3f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Dice does not appear to be fair")
else:
    print("Dice appears fair")
Chi-Square Test of Independence
Test if two categorical variables are related:
code.py
from scipy.stats import chi2_contingency
import numpy as np
# Survey: Is gender related to product preference?
#          Product A  Product B  Product C
# Male         30         20         10
# Female       20         35         25
observed = np.array([
[30, 20, 10],
[20, 35, 25]
])
chi_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"Chi-square: {chi_stat:.3f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")
if p_value < 0.05:
    print("Gender and product preference appear to be related")
else:
    print("No significant association between gender and product preference")
Effect Size
The p-value tells you IF there is a difference; the effect size tells you HOW BIG it is.
Cohen's d (for T-tests)
code.py
import numpy as np

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    return (np.mean(group1) - np.mean(group2)) / pooled_std

group_a = [85, 90, 88, 92, 87]
group_b = [75, 80, 78, 82, 77]
d = cohens_d(group_a, group_b)
print(f"Cohen's d: {d:.2f}")
# Interpretation:
# 0.2 = small effect
# 0.5 = medium effect
# 0.8 = large effect
Cramér's V (for Chi-Square)
code.py
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(contingency_table):
    chi2 = chi2_contingency(contingency_table)[0]
    n = contingency_table.sum()
    min_dim = min(contingency_table.shape) - 1
    return np.sqrt(chi2 / (n * min_dim))

observed = np.array([[30, 20, 10], [20, 35, 25]])
v = cramers_v(observed)
print(f"Cramér's V: {v:.2f}")
# Interpretation:
# 0.1 = small
# 0.3 = medium
# 0.5 = large
Assumptions
T-Test Assumptions
- Data is numerical
- Approximately normal distribution (a quick check is sketched after this list)
- Independent observations
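A common way to check the normality assumption is a Shapiro-Wilk test on the sample (or, for a paired t-test, on the differences). A minimal sketch, reusing the `temps` sample from the one-sample example above:
code.py
from scipy import stats
temps = [98.4, 98.6, 98.8, 98.2, 98.9, 98.5, 98.7, 98.3, 98.6, 98.5]
# Shapiro-Wilk: null hypothesis is that the data come from a normal distribution
w_stat, p_value = stats.shapiro(temps)
print(f"Shapiro-Wilk p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Normality assumption looks questionable")
else:
    print("No evidence against normality")
With small samples this test has limited power, so a visual check (for example, a histogram) is a useful complement.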
Chi-Square Assumptions
- Data is categorical
- Expected count ≥ 5 in each cell (a check is sketched after this list)
- Independent observations
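The expected counts needed for this check are returned directly by chi2_contingency, so the rule can be verified in code. A minimal sketch, reusing the gender/product table from the independence example above:
code.py
from scipy.stats import chi2_contingency
import numpy as np
observed = np.array([[30, 20, 10], [20, 35, 25]])
chi_stat, p_value, dof, expected = chi2_contingency(observed)
print("Expected counts under independence:")
print(np.round(expected, 1))
if (expected < 5).any():
    print("Warning: some expected counts are below 5 - the chi-square approximation may be unreliable")
else:
    print("All expected counts are at least 5")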
Complete Example
code.py
from scipy import stats
import numpy as np
# Study: Does a training program improve test scores?
np.random.seed(42)
before = np.random.normal(70, 10, 30)
after = before + np.random.normal(5, 3, 30) # Small improvement
print("=== Training Program Effectiveness ===")
print(f"Before: Mean = {np.mean(before):.1f}, Std = {np.std(before):.1f}")
print(f"After: Mean = {np.mean(after):.1f}, Std = {np.std(after):.1f}")
# Paired t-test
t_stat, p_value = stats.ttest_rel(after, before)
print(f"\nPaired T-test:")
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.6f}")
# Effect size
improvement = after - before
d = np.mean(improvement) / np.std(improvement, ddof=1)  # Cohen's d for paired data (sample std)
print(f"Effect size (d): {d:.2f}")
if p_value < 0.05:
    print("\n✓ Training program significantly improves scores!")
Key Points
- T-test: Compare numerical means
- Chi-square: Compare categorical frequencies
- Use paired t-test for before/after
- Use independent t-test for two groups
- Use Welch's t-test for unequal variances
- Always check effect size, not just p-value
- Verify assumptions before using tests
What's Next?
Learn the basics of Machine Learning.