5 min read
Histograms
Learn to show how data is distributed
What is a Histogram?
A histogram shows how data is spread out:
- How many people are age 20-30? 30-40? 40-50?
- How many products cost $10-20? $20-30?
It groups data into bins and counts each bin.
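To make the grouping step concrete, here is a minimal sketch that does the counting with NumPy's `np.histogram`, without drawing anything. The age values match the basic example that follows; the bin edges (20, 30, 40, 50, 60) are chosen only for illustration.

```python
import numpy as np

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
edges = [20, 30, 40, 50, 60]  # illustrative bin edges

# Each bin includes its left edge and excludes its right edge,
# except the last bin, which includes both edges.
counts, _ = np.histogram(ages, bins=edges)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low}-{high}: {n}")
```

An age of exactly 30 is counted in the 30-40 bin, not the 20-30 bin, because of the edge rule noted in the comment.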
Basic Histogram
code.py
import matplotlib.pyplot as plt
ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
fig, ax = plt.subplots()
ax.hist(ages)
ax.set_xlabel('Age')
ax.set_ylabel('Count')
ax.set_title('Age Distribution')
plt.show()
Set Number of Bins
code.py
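# Pick one of these calls; running all three draws them on top of each other
ax.hist(ages, bins=5) # 5 bins
ax.hist(ages, bins=10) # 10 bins
ax.hist(ages, bins=20) # 20 bins
More bins show more detail, but with little data the plot can look noisy.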
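One way to judge this trade-off is to draw the same data with several bin counts side by side. The following is a sketch, assuming the `ages` list from the basic example; the figure size and the choice of 5, 10, and 20 bins are arbitrary.

```python
import matplotlib.pyplot as plt

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
bin_counts = [5, 10, 20]

# One panel per bin count, sharing the y-axis so bar heights are comparable
fig, axes = plt.subplots(1, len(bin_counts), figsize=(12, 3), sharey=True)

empty_bins = {}
for ax, n_bins in zip(axes, bin_counts):
    # ax.hist returns the per-bin counts, the bin edges, and the drawn bars
    counts, edges, bars = ax.hist(ages, bins=n_bins, edgecolor='black')
    empty_bins[n_bins] = int((counts == 0).sum())
    ax.set_title(f'bins={n_bins}')

print(empty_bins)  # number of bins with no data, per bin count
```

Call `plt.show()` afterwards to display the figure. With only 14 values, the 20-bin panel must leave at least six bins empty, which is the kind of noise to watch for. Passing a string such as `bins='auto'` lets NumPy choose a bin count from the data instead.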
Set Specific Bin Edges
code.py
# Custom ranges: 20-30, 30-40, 40-50, 50-60
ax.hist(ages, bins=[20, 30, 40, 50, 60])
Change Color
code.py
ax.hist(ages, color='green')
Add Edge Color
code.py
ax.hist(ages, color='skyblue', edgecolor='black')
Multiple Histograms
code.py
import matplotlib.pyplot as plt
men_ages = [25, 28, 30, 32, 35, 38, 40, 42]
women_ages = [22, 25, 27, 30, 32, 33, 36, 39]
fig, ax = plt.subplots()
ax.hist(men_ages, alpha=0.5, label='Men', color='blue')
ax.hist(women_ages, alpha=0.5, label='Women', color='red')
ax.legend()
plt.show()
alpha makes the bars semi-transparent, so both distributions stay visible where they overlap.
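By default each `hist` call picks its bin edges from its own data, so the two sets of bars may not line up. Here is a minimal sketch of one way to handle this: compute shared edges from the combined data with `np.histogram_bin_edges` and pass them to both calls. The choice of 5 bins is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

men_ages = [25, 28, 30, 32, 35, 38, 40, 42]
women_ages = [22, 25, 27, 30, 32, 33, 36, 39]

# One set of bin edges spanning both groups (22 to 42 here)
shared_edges = np.histogram_bin_edges(men_ages + women_ages, bins=5)

fig, ax = plt.subplots()
# Passing the same edges to both calls makes the bars directly comparable
men_counts, _, _ = ax.hist(men_ages, bins=shared_edges, alpha=0.5,
                           label='Men', color='blue')
women_counts, _, _ = ax.hist(women_ages, bins=shared_edges, alpha=0.5,
                             label='Women', color='red')
ax.legend()
```

Call `plt.show()` to display the result. With shared edges, a taller bar in one color always means more people in that exact age range.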
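Show Density Instead of Count
code.py
# density=True rescales the bar heights so the total bar area is 1
# (a probability density, not a percentage per bar)
ax.hist(ages, density=True)
Horizontal Histogram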
code.py
ax.hist(ages, orientation='horizontal')
Add Mean Line
code.py
import matplotlib.pyplot as plt
import numpy as np
ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
fig, ax = plt.subplots()
ax.hist(ages, color='skyblue', edgecolor='black')
# Add vertical line at mean
mean_age = np.mean(ages)
ax.axvline(mean_age, color='red', linestyle='--', label=f'Mean: {mean_age:.1f}')
ax.legend()
plt.show()
Complete Example
code.py
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
np.random.seed(42)
salaries = np.random.normal(50000, 15000, 500) # 500 people
fig, ax = plt.subplots(figsize=(10, 6))
# Create histogram
ax.hist(salaries, bins=20, color='steelblue', edgecolor='white')
# Add mean and median lines
mean_sal = np.mean(salaries)
median_sal = np.median(salaries)
ax.axvline(mean_sal, color='red', linestyle='--', label=f'Mean: ${mean_sal:,.0f}')
ax.axvline(median_sal, color='green', linestyle='--', label=f'Median: ${median_sal:,.0f}')
ax.set_xlabel('Salary ($)')
ax.set_ylabel('Number of Employees')
ax.set_title('Salary Distribution')
ax.legend()
plt.show()
Histogram vs Bar Chart
| Histogram | Bar Chart |
|---|---|
| Continuous data | Categories |
| Shows distribution | Shows comparison |
| Bars touch | Bars separate |
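The difference is easiest to see with both chart types next to each other. The following sketch uses the ages from earlier for the histogram; the department names and headcounts for the bar chart are invented purely for illustration.

```python
import matplotlib.pyplot as plt

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]

# Hypothetical categorical data (made up for this example)
departments = ['Sales', 'Support', 'Engineering']
headcounts = [5, 3, 6]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: numeric values are grouped into adjacent ranges, so bars touch
counts, edges, hist_bars = ax_hist.hist(ages, bins=[20, 30, 40, 50, 60],
                                        edgecolor='black')
ax_hist.set_title('Histogram (age ranges)')

# Bar chart: one bar per category, drawn with gaps between them
bar_container = ax_bar.bar(departments, headcounts)
ax_bar.set_title('Bar chart (categories)')
```

Call `plt.show()` to display both panels. Reordering the bars in the bar chart changes nothing about its meaning, while the histogram's bins have a fixed numeric order.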
Key Points
- ax.hist() creates a histogram
- bins sets the number of bins or the exact bin edges
- Shows how data is distributed
- Use alpha for overlapping histograms
- Use density=True to normalize the bars so their total area is 1 (a probability density, not percentages)
- Add axvline for mean/median markers
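A note on `density=True`: the bar heights are scaled so that the bar areas (height times bin width) sum to 1, which is not the same as each bar showing its share of the data. The sketch below checks this with `np.histogram`, which applies the same normalization, and shows one common way to get percentages through the `weights` parameter. The bin edges are illustrative.

```python
import numpy as np

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
edges = [20, 30, 40, 50, 60]  # illustrative bin edges

# density=True: each height is count / (total count * bin width)
density, _ = np.histogram(ages, bins=edges, density=True)
area = (density * np.diff(edges)).sum()

# Percentages: weight every value by 100 / N so the heights sum to 100
weights = np.full(len(ages), 100 / len(ages))
percent, _ = np.histogram(ages, bins=edges, weights=weights)

print(area)              # total area under the density histogram
print(percent.round(1))  # share of the data in each bin, in percent
```

The same `weights` array can be passed to `ax.hist(ages, bins=edges, weights=weights)` to plot percentages directly.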
What's Next?
Learn box plots for showing data spread and outliers.