6 min read min read
Pivot Tables
Learn to create summary tables like in Excel with pivot_table()
Pivot Tables
What is a Pivot Table?
A pivot table is like a summary report.
Imagine you have thousands of sales records. A pivot table shows you:
- Total sales by store (rows)
- Broken down by product (columns)
Just like Excel pivot tables!
Basic Pivot Table
code.py
import pandas as pd
sales = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA'],
'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
'Sales': [100, 150, 120, 180]
})
print("Raw Data:")
print(sales)
# Create pivot table
pivot = pd.pivot_table(
sales,
values='Sales', # What to calculate
index='Store', # Rows
columns='Product' # Columns
)
print("\nPivot Table:")
print(pivot)Output:
Raw Data:
Store Product Sales
0 NYC Apple 100
1 NYC Banana 150
2 LA Apple 120
3 LA Banana 180
Pivot Table:
Product Apple Banana
Store
LA 120 180
NYC 100 150
Easy to read summary!
Choose the Calculation (aggfunc)
By default, pivot_table calculates the mean. Change it:
code.py
sales = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA', 'LA'],
'Product': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana'],
'Sales': [100, 120, 150, 180, 200]
})
# Sum instead of mean
pivot = pd.pivot_table(
sales,
values='Sales',
index='Store',
columns='Product',
aggfunc='sum'
)
print(pivot)Output:
Product Apple Banana
Store
LA 150 380
NYC 220 NaN
Common aggfunc Options
'sum'- Total'mean'- Average (default)'count'- How many'min'- Smallest'max'- Largest
Fill Missing Values
code.py
pivot = pd.pivot_table(
sales,
values='Sales',
index='Store',
columns='Product',
aggfunc='sum',
fill_value=0 # Fill NaN with 0
)
print(pivot)Output:
Product Apple Banana
Store
LA 150 380
NYC 220 0
Add Row and Column Totals
code.py
pivot = pd.pivot_table(
sales,
values='Sales',
index='Store',
columns='Product',
aggfunc='sum',
fill_value=0,
margins=True, # Add totals
margins_name='Total' # Name for totals
)
print(pivot)Output:
Product Apple Banana Total
Store
LA 150 380 530
NYC 220 0 220
Total 370 380 750
Grand total is 750.
Multiple Values
Calculate sum AND count:
code.py
pivot = pd.pivot_table(
sales,
values='Sales',
index='Store',
columns='Product',
aggfunc=['sum', 'count']
)
print(pivot)Multiple Index Rows
code.py
data = pd.DataFrame({
'Year': [2023, 2023, 2024, 2024],
'Store': ['NYC', 'LA', 'NYC', 'LA'],
'Sales': [100, 150, 120, 180]
})
pivot = pd.pivot_table(
data,
values='Sales',
index=['Year', 'Store'], # Two row levels
aggfunc='sum'
)
print(pivot)Output:
Sales
Year Store
2023 LA 150
NYC 100
2024 LA 180
NYC 120
Practice Example
code.py
import pandas as pd
# Monthly sales data
orders = pd.DataFrame({
'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
'Region': ['East', 'West', 'East', 'East', 'West', 'West'],
'Product': ['A', 'A', 'B', 'A', 'B', 'A'],
'Sales': [100, 150, 200, 120, 180, 160]
})
print("Order Data:")
print(orders)
# Sales by Region and Product
pivot1 = pd.pivot_table(
orders,
values='Sales',
index='Region',
columns='Product',
aggfunc='sum',
fill_value=0,
margins=True
)
print("\nSales by Region and Product:")
print(pivot1)
# Sales by Month and Region
pivot2 = pd.pivot_table(
orders,
values='Sales',
index='Month',
columns='Region',
aggfunc='sum',
fill_value=0
)
print("\nSales by Month and Region:")
print(pivot2)Pivot Table vs GroupBy
Both can summarize data, but:
| Feature | Pivot Table | GroupBy |
|---|---|---|
| Output | 2D table | Series or DataFrame |
| Columns | Auto-created | Need unstack() |
| Totals | Easy with margins | Manual |
| Excel-like | Yes | No |
code.py
# GroupBy way
grouped = sales.groupby(['Store', 'Product'])['Sales'].sum().unstack(fill_value=0)
# Pivot Table way
pivot = pd.pivot_table(sales, values='Sales', index='Store',
columns='Product', aggfunc='sum', fill_value=0)
# Both give same result!Key Points
pivot_table()creates summary tablesvalues= what to calculateindex= row labelscolumns= column labelsaggfunc= how to calculate (sum, mean, count)fill_value=0fills missing datamargins=Trueadds totals
Common Mistakes
Mistake 1: Forgetting aggfunc defaults to mean
code.py
# This calculates MEAN, not sum
pd.pivot_table(data, values='Sales', index='Store')
# For sum, specify it
pd.pivot_table(data, values='Sales', index='Store', aggfunc='sum')Mistake 2: Confusing index and columns
code.py
# index = rows, columns = columns
# Think: index goes DOWN, columns go ACROSSWhat's Next?
You learned pivot tables. Next, you'll learn Cross Tabulation - counting combinations of categories.