5 min read
Multivariate Analysis
Learn to analyze multiple columns together
What is Multivariate Analysis?
Multivariate = many variables. You look at 3 or more columns together instead of one or two at a time.
Real data has many factors. Salary depends on age AND education AND city AND experience.
Group by Multiple Columns
code.py
import pandas as pd
df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC', 'LA'],
    'Salary': [60000, 65000, 55000, 58000, 62000, 60000]
})
# Average salary by gender AND city
result = df.groupby(['Gender', 'City'])['Salary'].mean()
print(result)
Output:
Gender  City
F       LA      59000.0
        NYC     65000.0
M       LA      55000.0
        NYC     61000.0
Name: Salary, dtype: float64
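The grouped result above is a Series with a two-level index (Gender, then City). The sketch below reuses the same sample data and shows a few common ways to work with that result. The variable names `f_la`, `flat`, and `wide` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC', 'LA'],
    'Salary': [60000, 65000, 55000, 58000, 62000, 60000]
})

# Grouping by two columns gives a Series with a two-level index
result = df.groupby(['Gender', 'City'])['Salary'].mean()

# Look up one combination with a (Gender, City) tuple
f_la = result.loc[('F', 'LA')]
print(f_la)

# Turn the index levels back into ordinary columns
flat = result.reset_index()
print(flat)

# Or move the City level into columns for a wide table
wide = result.unstack('City')
print(wide)
```

`reset_index()` is handy when you want to keep filtering or plotting the result, while `unstack()` gives a layout similar to a pivot table.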
Pivot Table for Summary
code.py
# Create summary table
pivot = pd.pivot_table(
    df,
    values='Salary',
    index='Gender',
    columns='City',
    aggfunc='mean'
)
print(pivot)
Output:
City         LA      NYC
Gender
F       59000.0  65000.0
M       55000.0  61000.0
Easy to compare all combinations!
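If you also want an overall figure for each row and column, `pivot_table()` accepts `margins=True`. This is a minimal sketch on the same sample data. The label `'Overall'` is just a name chosen here for illustration (pandas uses `'All'` by default).

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC', 'LA'],
    'Salary': [60000, 65000, 55000, 58000, 62000, 60000]
})

# margins=True adds an extra row and column with the mean
# computed over all rows in that gender / city
pivot_totals = pd.pivot_table(
    df,
    values='Salary',
    index='Gender',
    columns='City',
    aggfunc='mean',
    margins=True,
    margins_name='Overall'
)
print(pivot_totals)
```

The margin values are computed from the underlying rows, so they can differ from a simple average of the cells when groups have different sizes.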
Multiple Aggregations
code.py
result = df.groupby(['Gender', 'City']).agg({
    'Salary': ['mean', 'min', 'max', 'count']
})
print(result)
Correlation Matrix
See how all numeric columns relate:
code.py
df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45, 50],
    'Experience': [2, 5, 8, 12, 15, 20],
    'Salary': [40000, 50000, 60000, 70000, 80000, 90000],
    'Hours': [45, 42, 40, 38, 35, 35]
})
print(df.corr())
Look for: strong correlations (above 0.7 or below -0.7). Values near 0 mean little linear relationship.
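With more than a handful of columns, scanning the matrix by eye gets tedious. The sketch below is one possible way to list only the pairs that pass the 0.7 cutoff, using the same sample data. The names `threshold` and `strong_pairs` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45, 50],
    'Experience': [2, 5, 8, 12, 15, 20],
    'Salary': [40000, 50000, 60000, 70000, 80000, 90000],
    'Hours': [45, 42, 40, 38, 35, 35]
})

corr = df.corr()
threshold = 0.7  # same cutoff as in the text; adjust as needed

# Visit each pair of columns once (skip the diagonal and mirrored pairs)
cols = list(corr.columns)
strong_pairs = []
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        r = corr.loc[a, b]
        if abs(r) > threshold:
            strong_pairs.append((a, b, round(r, 3)))

for a, b, r in strong_pairs:
    print(f"{a} vs {b}: {r}")
```

In this tiny sample every pair passes the cutoff, because the columns were built to move together. Hours is the only column that correlates negatively with the others.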
Summarize by Category
code.py
df = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR'],
    'Level': ['Junior', 'Senior', 'Senior', 'Junior', 'Senior'],
    'Salary': [50000, 80000, 60000, 65000, 55000]
})
# Average by department and level
summary = df.pivot_table(
    values='Salary',
    index='Department',
    columns='Level',
    aggfunc='mean',
    fill_value=0  # combinations with no rows (e.g. HR/Junior) show 0 instead of NaN
)
print(summary)
Find Patterns in Groups
code.py
# Which combination has highest salary?
result = df.groupby(['Department', 'Level'])['Salary'].mean()
print("Highest:", result.idxmax(), "=", result.max())
Quick Multi-Column Summary
code.py
# Describe all numeric columns
print(df.describe())
# Count all categorical combinations
print(df.groupby(['Department', 'Level']).size())
Key Points
- groupby() with multiple columns for detailed breakdown
- pivot_table() creates easy-to-read summaries
- corr() shows how every pair of numeric columns is correlated
- Look for patterns across multiple factors
- Real-world data needs multivariate thinking
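To see a couple of these points side by side, here is a small sketch on the department data from earlier. It ranks every Department/Level combination by average salary and counts the rows in each combination with `pd.crosstab()`, which this lesson has not used so far. The names `ranked` and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR'],
    'Level': ['Junior', 'Senior', 'Senior', 'Junior', 'Senior'],
    'Salary': [50000, 80000, 60000, 65000, 55000]
})

# Rank every Department/Level combination by average salary
ranked = (
    df.groupby(['Department', 'Level'])['Salary']
      .mean()
      .sort_values(ascending=False)
)
print(ranked)

# Count rows per combination as a grid; combinations with no rows show 0
counts = pd.crosstab(df['Department'], df['Level'])
print(counts)
```

Checking the counts next to the averages helps you notice when a high or low average comes from a single row.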
What's Next?
Learn to find outliers - unusual values in your data.