5 min read
Univariate Analysis - Numerical
Learn to analyze one numerical column at a time
What is Univariate Analysis?
Univariate means "one variable": you analyze a single column on its own, without comparing it to other columns.
Analyzing Numbers
For numerical columns (age, salary, price):
code.py
import pandas as pd
df = pd.DataFrame({
    'Age': [22, 25, 28, 30, 32, 35, 40, 45, 50, 65]
})
# Summary statistics: count, mean, std, min, 25%, 50% (median), 75%, max
print(df['Age'].describe())
Key Statistics to Check
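Mean and median both measure the center, but they react differently to extreme values. The sketch below uses the same ten ages plus one hypothetical, implausible value (120, standing in for a data-entry error).

```python
import pandas as pd

# The same ten ages, plus one hypothetical extreme value (e.g. a typo)
ages = pd.Series([22, 25, 28, 30, 32, 35, 40, 45, 50, 65])
with_outlier = pd.concat([ages, pd.Series([120])], ignore_index=True)

# The mean shifts noticeably when one extreme value is added...
print("Mean:  ", ages.mean(), "->", round(with_outlier.mean(), 2))
# ...while the median barely moves
print("Median:", ages.median(), "->", with_outlier.median())
```

The single extra value raises the mean from 37.2 to about 44.73. The median only moves from 33.5 to 35.0. A large gap between the two is a hint to check for skew or outliers.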
code.py
# Center of data
print("Mean:", df['Age'].mean())
print("Median:", df['Age'].median())
# Spread of data
print("Min:", df['Age'].min())
print("Max:", df['Age'].max())
print("Std:", df['Age'].std())  # sample std (ddof=1 by default)
# Quartiles
print("25%:", df['Age'].quantile(0.25))
print("75%:", df['Age'].quantile(0.75))
Check Distribution Shape
code.py
# Skewness: is the data tilted left or right?
print("Skewness:", df['Age'].skew())
- Skew near 0: Balanced (roughly symmetric)
- Skew > 0: Long tail on the right (most values are low, a few are much higher)
- Skew < 0: Long tail on the left (most values are high, a few are much lower)
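The sign of the skewness follows the tail. This sketch uses three small made-up series: one symmetric, one with a single large value on the right, and its mirror image with a single small value on the left.

```python
import pandas as pd

# Made-up series chosen only to illustrate each case
symmetric = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
right_tail = pd.Series([1, 1, 2, 2, 2, 3, 3, 4, 15])        # one large value on the right
left_tail = pd.Series([1, 12, 13, 13, 14, 14, 14, 15, 15])  # mirror image of right_tail

for name, s in [("symmetric", symmetric), ("right tail", right_tail), ("left tail", left_tail)]:
    # skew() returns the sample skewness of the series
    print(f"{name:>10}: skew={s.skew():+.2f}  mean={s.mean():.2f}  median={s.median():.1f}")
```

The symmetric series should report a skew of 0. The right-tailed one should report a positive value, and the left-tailed one a negative value of the same size. In the right-tailed series the mean (about 3.67) also sits above the median (2.0), which is common in right-skewed data.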
Count Values in Ranges (Bins)
code.py
# How many people are in each age group?
# pd.cut intervals include the right edge: (0, 30], (30, 40], (40, 50], (50, 100]
bins = [0, 30, 40, 50, 100]
labels = ['Young', 'Middle', 'Senior', 'Old']
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels)
print(df['Age_Group'].value_counts())
Simple Histogram (Text)
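`value_counts(bins=5)` splits the range of the column into five equal-width intervals and counts the values in each one. Printing one `#` per value turns those counts into a rough text histogram. Here is a minimal sketch, assuming the same ten ages.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 25, 28, 30, 32, 35, 40, 45, 50, 65]})

# Count values in 5 equal-width intervals, ordered from lowest to highest
counts = df['Age'].value_counts(bins=5).sort_index()

# Draw one '#' per value so the shape is visible at a glance
for interval, n in counts.items():
    print(f"{str(interval):>18} | {'#' * int(n)}")
```

For these ages the bars read 4, 2, 2, 1, 1 from the lowest interval to the highest. Most values sit at the young end, and a few stretch to the right.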
code.py
# Quick view of the distribution: counts in 5 equal-width intervals
print(df['Age'].value_counts(bins=5).sort_index())
Check for Problems
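For the clean Age column, the missing, zero, and negative checks below all report 0, which makes it hard to see what they catch. This sketch runs the same checks on a hypothetical messy series with one missing value, one zero, and one negative entry.

```python
import numpy as np
import pandas as pd

# Hypothetical messy ages: one missing value, one zero, one negative typo
messy = pd.Series([25, np.nan, 0, 31, -4, 40], name='Age')

print("Missing:", messy.isna().sum())
print("Zeros:", (messy == 0).sum())
print("Negative:", (messy < 0).sum())
# count() skips NaN, so it is lower than len() when values are missing
print("Count vs length:", messy.count(), len(messy))
```

Each check finds exactly one problem. Comparisons such as `messy == 0` evaluate to False for NaN, so only `isna()` counts the missing value. `count()` also skips it.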
code.py
# Any missing?
print("Missing:", df['Age'].isna().sum())
# Any zeros (where zero is not a valid value)?
print("Zeros:", (df['Age'] == 0).sum())
# Any negatives?
print("Negative:", (df['Age'] < 0).sum())
# Unique values
print("Unique values:", df['Age'].nunique())
Find Unusual Values
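This section's code flags values by a standard-deviation cutoff. A common alternative is Tukey's 1.5 × IQR rule, which builds on the quartiles computed earlier. It flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1. Here is a minimal sketch on the same ages.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 25, 28, 30, 32, 35, 40, 45, 50, 65]})

q1 = df['Age'].quantile(0.25)
q3 = df['Age'].quantile(0.75)
iqr = q3 - q1

# Tukey's fences: flag values more than 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = df[(df['Age'] < lower) | (df['Age'] > upper)]
print(f"Fences: {lower} to {upper}")
print(outliers)
```

For these ages, Q1 = 28.5 and Q3 = 43.75, so the fences are 5.625 and 66.625 and no value is flagged. The mean ± 2 std rule, by contrast, flags 65, because its upper cutoff is only about 63.5. The two rules can disagree, so it is worth knowing which one you are applying.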
code.py
# Values far from the mean
mean = df['Age'].mean()
std = df['Age'].std()
# Values more than 2 standard deviations from the mean
unusual = df[(df['Age'] < mean - 2*std) | (df['Age'] > mean + 2*std)]
print("Unusual values:", unusual)
Quick Analysis Template
code.py
def analyze_numeric(series):
    """Print a quick summary of one numeric pandas Series."""
    print(f"Column: {series.name}")
    print(f"Count: {series.count()}")  # non-missing values only
    print(f"Missing: {series.isna().sum()}")
    print(f"Mean: {series.mean():.2f}")
    print(f"Median: {series.median():.2f}")
    print(f"Std: {series.std():.2f}")
    print(f"Min: {series.min()}")
    print(f"Max: {series.max()}")
    print(f"Skew: {series.skew():.2f}")
analyze_numeric(df['Age'])
Key Points
- Univariate = analyze one column
- Check: mean, median, min, max, std
- Look at distribution shape (skewness)
- Find missing values and outliers
- Group into bins to see patterns
What's Next?
Learn to analyze categorical columns (text like city, gender, status).