Correlation Analysis
Learn to measure relationships between numeric columns
What is Correlation?
Correlation measures how strongly two numeric columns move together in a straight-line (linear) pattern.
- When one goes up, does the other go up too?
- When one goes up, does the other go down?
- Or is there no pattern?
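To make "moving together" concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand and compares it with pandas. The helper name `pearson_r` is made up for illustration; the values are the study-hours figures used later in this lesson.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation: shared variation divided by total variation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # how far each x is from the average x
    dy = y - y.mean()  # how far each y is from the average y
    # Positive products mean both values sit on the same side of their averages
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

hours = [1, 2, 3, 4, 5, 6]
scores = [50, 55, 65, 70, 80, 85]

manual = pearson_r(hours, scores)
builtin = pd.Series(hours).corr(pd.Series(scores))  # pandas uses Pearson by default
print(round(manual, 3), round(builtin, 3))
```

Both numbers should agree, which suggests that `corr()` is doing this same calculation under the hood.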
Correlation Values
| Value | Meaning |
|---|---|
| +1.0 | Perfect positive (both go up) |
| +0.7 to +1.0 | Strong positive |
| +0.3 to +0.7 | Moderate positive |
| -0.3 to +0.3 | Weak or none |
| -0.7 to -0.3 | Moderate negative |
| -1.0 to -0.7 | Strong negative |
| -1.0 | Perfect negative (one up, one down) |
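The table can be turned into a small lookup helper. This is only a sketch: the function name `describe_correlation` is hypothetical, and because the table's ranges share their endpoints, it assumes that a value exactly at 0.3 or 0.7 belongs to the stronger category.

```python
def describe_correlation(r):
    """Return the label from the table above for a correlation value r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must be between -1 and +1")
    strength = abs(r)
    if strength == 1.0:
        return "perfect positive" if r > 0 else "perfect negative"
    if strength < 0.3:
        return "weak or none"
    direction = "positive" if r > 0 else "negative"
    # Assumption: boundary values (0.3, 0.7) go to the stronger category
    level = "strong" if strength >= 0.7 else "moderate"
    return f"{level} {direction}"

print(describe_correlation(0.85))
print(describe_correlation(-0.5))
print(describe_correlation(0.1))
```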
Calculate Correlation
```python
import pandas as pd

df = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5, 6],
    'Test_Score': [50, 55, 65, 70, 80, 85],
    'Hours_Sleep': [8, 7, 6, 5, 4, 3]
})

# Correlation between study and score
print(df['Hours_Studied'].corr(df['Test_Score']))
# Output: about 0.995 (strong positive)

# Correlation between study and sleep
print(df['Hours_Studied'].corr(df['Hours_Sleep']))
# Output: -1.0, up to floating-point rounding (perfect negative)
```
Correlation Matrix
See all correlations at once:
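```python
print(df.corr())
```
Output (rounded to three decimals):
```
               Hours_Studied  Test_Score  Hours_Sleep
Hours_Studied          1.000       0.995       -1.000
Test_Score             0.995       1.000       -0.995
Hours_Sleep           -1.000      -0.995        1.000
```
Reading: Study hours and test score have a correlation of about 0.995 (strong positive). The diagonal is always 1.0 because every column is perfectly correlated with itself.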
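When the matrix is large, it can help to pull out one column and sort it. The sketch below assumes `Test_Score` is the column of interest and ranks the other columns by their correlation with it.

```python
import pandas as pd

df = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5, 6],
    'Test_Score': [50, 55, 65, 70, 80, 85],
    'Hours_Sleep': [8, 7, 6, 5, 4, 3]
})

corr = df.corr()

# Take the Test_Score column, drop its self-correlation, sort high to low
score_corr = corr['Test_Score'].drop('Test_Score').sort_values(ascending=False)
print(score_corr.round(3))
```

`Hours_Studied` comes first (strong positive) and `Hours_Sleep` last (strong negative).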
Interpret Results
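```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [40000, 50000, 60000, 70000, 80000],
    'Shoe_Size': [9, 10, 9, 11, 10]
})

corr_matrix = df.corr()
print(corr_matrix)
```
- Age & Salary: Correlation of 1.0 in this toy data, because salary rises by exactly 10,000 for every 5 years of age. A high value makes sense here (experience).
- Age & Shoe Size: About 0.57, which counts as moderate in the table above even though there is no real connection. With only five rows, a pattern like this can easily appear by chance.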
Find Highly Correlated Pairs
```python
# Find pairs whose correlation is stronger than 0.7 in either direction
corr = df.corr()
cols = corr.columns

for i, col_a in enumerate(cols):
    # Start after col_a: skips self-correlation and reports each pair only once
    for col_b in cols[i + 1:]:
        value = corr.loc[col_a, col_b]
        if abs(value) > 0.7:
            print(f"{col_a} & {col_b}: {value:.2f}")
```
Important: Correlation is Not Causation!
High correlation doesn't mean one causes the other.
Example: Ice cream sales and drowning deaths are correlated.
- Ice cream doesn't cause drowning!
- Both increase in summer (hidden factor: hot weather)
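The hidden-factor idea can be simulated. The sketch below uses synthetic data only: the coefficients, noise levels, and column meanings are invented for illustration and are not real statistics. Temperature drives both series, so they correlate strongly with each other, but once the effect of temperature is subtracted from each one, the leftover correlation is close to zero.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 500

# Synthetic data: temperature is the hidden factor behind both series
temperature = rng.uniform(10, 35, size=n)
ice_cream_sales = 20 * temperature + rng.normal(0, 40, size=n)
drownings = 0.3 * temperature + rng.normal(0, 1.0, size=n)

raw_corr = pd.Series(ice_cream_sales).corr(pd.Series(drownings))

def remove_linear_effect(y, x):
    """Subtract the best-fit straight line of y on x, leaving the residuals."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlate what is left after temperature has been accounted for
ice_resid = remove_linear_effect(ice_cream_sales, temperature)
drown_resid = remove_linear_effect(drownings, temperature)
adjusted_corr = pd.Series(ice_resid).corr(pd.Series(drown_resid))

print(f"Raw correlation: {raw_corr:.2f}")
print(f"After removing temperature: {adjusted_corr:.2f}")
```

The raw value is high, while the adjusted value sits near zero, which is what you would expect if hot weather explains the link.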
Key Points
- corr() calculates correlation (Pearson by default)
- The value is always between -1 and +1
- Closer to 1 or -1 = stronger relationship
- Closer to 0 = weaker relationship
- Positive = both go same direction
- Negative = opposite directions
- Correlation ≠ causation
What's Next?
Learn multivariate analysis - looking at many variables together.