EDA Workflow II

Bivariate and Multivariate analysis: Finding relationships and correlations

What You'll Learn

  • Bivariate analysis (two variables)
  • Correlation vs Causation
  • Scatter plots and Line plots
  • Multivariate analysis (3+ variables)
  • Heatmaps

Bivariate Analysis

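Bivariate analysis examines the relationship between two variables. The right plot or statistic depends on whether each variable is numerical or categorical.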

Numerical vs Numerical:

code.py
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')

# Scatter Plot
sns.scatterplot(x='total_bill', y='tip', data=df)
plt.title('Bill vs Tip')
plt.show()

# Correlation (Pearson by default; ranges from -1 to 1)
correlation = df['total_bill'].corr(df['tip'])
print(f"Correlation: {correlation:.2f}")
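
To make the number returned by .corr() less of a black box, here is a minimal sketch that computes the Pearson coefficient by hand and compares it with pandas. The toy DataFrame is made up for illustration and is not the tips dataset. The Spearman line shows a rank-based alternative that is often used when the relationship is monotonic but not linear.

```python
import numpy as np
import pandas as pd

# Made-up toy data, for illustration only (not the tips dataset)
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.5, 3.0, 4.0, 6.5, 7.0],
})

x = toy["total_bill"].to_numpy()
y = toy["tip"].to_numpy()

# Pearson r = sum of centered products / sqrt(product of centered sums of squares)
x_centered = x - x.mean()
y_centered = y - y.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

# pandas computes the same quantity (method="pearson" is the default)
r_pandas = toy["total_bill"].corr(toy["tip"])

# Spearman correlates the ranks instead of the raw values
r_spearman = toy["total_bill"].corr(toy["tip"], method="spearman")

print(f"Manual Pearson: {r_manual:.4f}")
print(f"pandas Pearson: {r_pandas:.4f}")
print(f"Spearman:       {r_spearman:.4f}")
```

A coefficient close to 1 only says that the two columns rise together. It does not show that a larger bill causes a larger tip; a third factor, such as party size, could drive both.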

Numerical vs Categorical:

code.py
# Box Plot (Distribution by category)
sns.boxplot(x='day', y='total_bill', data=df)
plt.title('Bill Distribution by Day')
plt.show()

# Bar Plot (Mean by category)
sns.barplot(x='sex', y='total_bill', data=df) # Shows mean with confidence interval
plt.show()
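
The box plot and bar plot summarize a numerical column within each category. The same summary can be read as a table with groupby, which is a useful cross-check on what the plots show. This is a small sketch on made-up toy data, not the tips dataset.

```python
import pandas as pd

# Made-up toy data, for illustration only (not the tips dataset)
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 14.0, 20.0, 30.0, 18.0, 40.0],
})

# One row per category, one column per statistic.
# The mean is what a bar plot draws; the median is the line inside each box.
summary = toy.groupby("day")["total_bill"].agg(["count", "mean", "median", "min", "max"])
print(summary)
```

On the tips data, day is a categorical column, so passing observed=True to groupby keeps only the categories that actually appear.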

Categorical vs Categorical:

code.py
import pandas as pd

# Cross Tabulation (counts for each day/sex combination)
ct = pd.crosstab(df['day'], df['sex'])
print(ct)

# Heatmap of counts
sns.heatmap(ct, annot=True, fmt='d', cmap='Blues')
plt.show()
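
Raw counts can be hard to compare when the categories have different totals. As a sketch, pd.crosstab accepts normalize='index' to turn each row into proportions that sum to 1. The toy data below is made up for illustration.

```python
import pandas as pd

# Made-up toy data, for illustration only (not the tips dataset)
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Thur", "Sat", "Sat", "Sat", "Sat"],
    "sex": ["Female", "Female", "Male", "Male", "Male", "Male", "Male", "Female"],
})

# Raw counts per (day, sex) combination
counts = pd.crosstab(toy["day"], toy["sex"])

# Row-wise proportions: each day's row sums to 1
row_share = pd.crosstab(toy["day"], toy["sex"], normalize="index")

print(counts)
print(row_share)
```

Because row_share holds fractions rather than integers, a heatmap of it would need a float format such as fmt='.2f' instead of fmt='d'.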

Multivariate Analysis

Multivariate analysis brings in a third (or fourth) variable, typically by mapping it to color, marker size, or a grid of plots.

code.py
# Scatter plot with Color (Hue)
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=df)
plt.title('Bill vs Tip by Sex')
plt.show()

# Scatter plot with Size
sns.scatterplot(x='total_bill', y='tip', size='size', data=df)
plt.show()

# Pair Plot (All numerical relationships)
sns.pairplot(df, hue='sex')
plt.show()
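
Plots are not the only way to look at three variables at once. A pivot table can summarize one numerical variable across two categorical ones. The sketch below uses made-up toy data and shows the mean tip for every day/sex combination.

```python
import pandas as pd

# Made-up toy data, for illustration only (not the tips dataset)
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Thur", "Sat", "Sat", "Sat", "Sat"],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "tip": [2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 6.0],
})

# Rows = day, columns = sex, cell = mean tip for that combination
mean_tip = toy.pivot_table(values="tip", index="day", columns="sex", aggfunc="mean")
print(mean_tip)
```

The resulting table has the same shape as a crosstab, so it can be passed to sns.heatmap in the same way.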

Correlation Heatmap

Visualizing correlations between all numerical variables.

code.py
# Calculate correlation matrix
# numeric_only=True skips non-numeric columns such as 'sex' and 'day' (requires pandas 1.5+)
corr_matrix = df.corr(numeric_only=True)

# Plot heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix')
plt.show()
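
With many columns, a heatmap gets hard to scan by eye. One possible follow-up, sketched here on made-up toy data, is to flatten the correlation matrix into a list of variable pairs ranked by absolute correlation. Only the upper triangle is kept so that each pair appears once and the diagonal of 1.0 values is dropped.

```python
import numpy as np
import pandas as pd

# Made-up toy data, for illustration only (not the tips dataset)
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.5, 3.0, 4.0, 6.5, 7.0],
    "size": [2, 1, 4, 3, 2],
})

corr_matrix = toy.corr(numeric_only=True)

# Boolean mask for the upper triangle; k=1 excludes the diagonal
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)

# Masked-out cells become NaN; stack to (row, column) pairs and drop the NaNs
pairs = corr_matrix.where(mask).stack().dropna()

# Strongest relationships first, regardless of sign
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```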

Practice Exercise

code.py
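import seaborn as sns
import matplotlib.pyplot as plt

# Load diamonds dataset
diamonds = sns.load_dataset('diamonds')

# 1. Correlation between price and carat
print("Correlation:", diamonds['price'].corr(diamonds['carat']))

# 2. Price distribution by cut (Boxplot), one possible solution
sns.boxplot(x='cut', y='price', data=diamonds)
plt.title('Price Distribution by Cut')
plt.show()

# 3. Price vs Carat colored by Clarity, one possible solution
sns.scatterplot(x='carat', y='price', hue='clarity', data=diamonds)
plt.title('Price vs Carat by Clarity')
plt.show()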

Next Steps

Now that we understand our data, let's start modeling!