Correlation Analysis

Learn to measure relationships between numeric columns

What is Correlation?

Correlation measures how strongly, and in which direction, two numeric variables move together.

  • When one goes up, does the other go up too?
  • When one goes up, does the other go down?
  • Or is there no pattern?
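The three questions above can be answered with a single number. The sketch below computes the Pearson correlation directly from its definition and compares it with the pandas result; the helper name `pearson_r` is made up for this illustration, and the data is the small study-hours example used later in this lesson.

```python
import math

import pandas as pd

# Small illustrative dataset (same values as the study-hours example in this lesson)
hours = [1, 2, 3, 4, 5, 6]
scores = [50, 55, 65, 70, 80, 85]


def pearson_r(x, y):
    """Pearson correlation computed directly from its definition (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations: positive when x and y rise together
    co_movement = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Dividing by both spreads keeps the result between -1 and +1
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_movement / (spread_x * spread_y)


manual = pearson_r(hours, scores)
builtin = pd.Series(hours).corr(pd.Series(scores))
print(round(manual, 3), round(builtin, 3))
```

Both values should agree, which shows that `corr()` is doing nothing more mysterious than this arithmetic.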

Correlation Values

Value           Meaning
+1.0            Perfect positive (both go up)
+0.7 to +1.0    Strong positive
+0.3 to +0.7    Moderate positive
-0.3 to +0.3    Weak or none
-0.7 to -0.3    Moderate negative
-1.0 to -0.7    Strong negative
-1.0            Perfect negative (one up, one down)
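As a rough sketch, the table above can be turned into a small lookup function. The name `describe_correlation` is hypothetical, and because the table's ranges share their endpoints, the choice to count exactly 0.7 as "strong" and exactly 0.3 as "moderate" is an assumption made here.

```python
def describe_correlation(r):
    """Map a correlation value to the labels in the table (hypothetical helper)."""
    # Allow a tiny tolerance for floating-point results such as 1.0000000000000002
    if abs(r) > 1.0 + 1e-9:
        raise ValueError("correlation must be between -1 and +1")
    strength = abs(r)
    if strength < 0.3:
        return "weak or none"
    # Assumption: boundary values 0.3 and 0.7 fall into the stronger category
    label = "strong" if strength >= 0.7 else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


for value in (0.995, -0.5, 0.1):
    print(value, "->", describe_correlation(value))
```

These cut-offs are rules of thumb, so treat the labels as a starting point for discussion and not as a verdict.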

Calculate Correlation

code.py
import pandas as pd

df = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5, 6],
    'Test_Score': [50, 55, 65, 70, 80, 85],
    'Hours_Sleep': [8, 7, 6, 5, 4, 3]
})

# Correlation between study and score
print(df['Hours_Studied'].corr(df['Test_Score']))
# Output: about 0.995 (strong positive)

# Correlation between study and sleep
print(df['Hours_Studied'].corr(df['Hours_Sleep']))
# Output: -1.0 (perfect negative)

Correlation Matrix

See all correlations at once:

code.py
print(df.corr())

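Output (rounded to three decimals):

               Hours_Studied  Test_Score  Hours_Sleep
Hours_Studied          1.000       0.995       -1.000
Test_Score             0.995       1.000       -0.995
Hours_Sleep           -1.000      -0.995        1.000

Reading: Find the row for one column and the column for the other. Study hours and test score have a correlation of about 0.995 (strong positive). The diagonal is always 1 because every column is perfectly correlated with itself.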
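When the matrix gets large, it is often easier to pull out one column of it and rank the results. A minimal sketch, assuming the same study data as above and that `Test_Score` is the column of interest:

```python
import pandas as pd

df = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5, 6],
    'Test_Score': [50, 55, 65, 70, 80, 85],
    'Hours_Sleep': [8, 7, 6, 5, 4, 3]
})

corr = df.corr()

# Take the Test_Score column of the matrix and drop its self-correlation of 1.0
with_score = corr['Test_Score'].drop('Test_Score')

# Order by absolute value so strong negative correlations are not pushed to the bottom
order = with_score.abs().sort_values(ascending=False).index
with_score = with_score.reindex(order)

print(with_score.round(3))
```

Sorting by absolute value matters here because a value near -1 is just as strong a relationship as a value near +1.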

Interpret Results

code.py
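import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [40000, 50000, 60000, 70000, 80000],
    'Shoe_Size': [9, 10, 9, 11, 10]
})

# Pairwise correlations between all numeric columns
corr_matrix = df.corr()
print(corr_matrix)

  • Age & Salary: 1.00, a perfect positive correlation in this toy data (the direction makes sense: salary tends to rise with experience)
  • Age & Shoe_Size: about 0.57, which the table above would call moderate. With only five rows this is most likely chance and not a real connection, so always check the sample size before trusting a value.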
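One way to see how fragile a correlation from a tiny sample can be is to recompute it with each row left out in turn. This is only an illustrative sketch using the Age and Shoe_Size columns from the example; the leave-one-out check is an addition here, not part of the original lesson.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Shoe_Size': [9, 10, 9, 11, 10]
})

full = df['Age'].corr(df['Shoe_Size'])
print(f"All rows: {full:.2f}")

# Recompute the correlation with each row left out in turn
leave_one_out = []
for row in df.index:
    subset = df.drop(index=row)
    r = subset['Age'].corr(subset['Shoe_Size'])
    leave_one_out.append(r)
    print(f"Without row {row}: {r:.2f}")
```

If removing a single row moves the value a lot, the dataset is too small to support a conclusion about the relationship.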

Find Highly Correlated Pairs

code.py
# Correlation matrix for all numeric columns
corr = df.corr()

# Report each pair once; abs() also catches strong negative correlations
cols = corr.columns
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):  # j > i skips self-pairs and mirrored duplicates
        value = corr.iloc[i, j]
        if abs(value) > 0.7:
            print(f"{cols[i]} & {cols[j]}: {value:.2f}")
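For wider tables, the same search can be written without explicit loops. The sketch below is one possible alternative, assuming NumPy is available: it masks everything except the upper triangle of the matrix so that each pair appears once, then filters by absolute value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [40000, 50000, 60000, 70000, 80000],
    'Shoe_Size': [9, 10, 9, 11, 10]
})

corr = df.corr()

# True only above the diagonal (k=1), so self-pairs and mirrored pairs are excluded
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Masked-out cells become NaN; stack() turns the matrix into one value per pair
pairs = corr.where(mask).stack().dropna()

# Keep pairs whose correlation is strong in either direction
strong = pairs[pairs.abs() > 0.7]
print(strong)
```

The result is a Series indexed by column pairs, which is convenient to sort or export when there are many columns.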

Important: Correlation is Not Causation!

High correlation doesn't mean one causes the other.

Example: Ice cream sales and drowning deaths are correlated.

  • Ice cream doesn't cause drowning!
  • Both increase in summer (hidden factor: hot weather)
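The hidden-factor effect can be reproduced with made-up numbers. In the sketch below, all data is synthetic and invented purely for illustration (it is not real sales or safety data): temperature drives both variables, and neither affects the other. Removing the linear effect of temperature from each one, a technique known as partial correlation, shows what is left.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic data: temperature influences both variables independently
temperature = rng.uniform(10, 35, size=365)
ice_cream_sales = 20 * temperature + rng.normal(0, 60, size=365)
drownings = 0.3 * temperature + rng.normal(0, 1.5, size=365)

# Raw correlation looks strong even though neither variable affects the other
raw = pd.Series(ice_cream_sales).corr(pd.Series(drownings))


def residuals(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Correlate what remains after the temperature effect is removed from both
partial = pd.Series(residuals(ice_cream_sales, temperature)).corr(
    pd.Series(residuals(drownings, temperature))
)

print(f"Raw correlation: {raw:.2f}")
print(f"After removing temperature: {partial:.2f}")
```

The raw value should come out high and the adjusted value close to zero, which is the pattern a hidden common cause produces.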

Key Points

  • corr() calculates correlation (Pearson by default in pandas)
  • The value is always between -1 and +1
  • Closer to +1 or -1 = stronger relationship
  • Closer to 0 = weaker relationship
  • Positive = both move in the same direction
  • Negative = they move in opposite directions
  • Correlation ≠ causation

What's Next?

Learn multivariate analysis - looking at many variables together.