Pivot Tables

What is a Pivot Table?

A pivot table is like a summary report.

Imagine you have thousands of sales records. A pivot table shows you:

Total sales by store (rows)
Broken down by product (columns)

Just like Excel pivot tables!

Basic Pivot Table

code.py

import pandas as pd

sales = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA'],
    'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
    'Sales': [100, 150, 120, 180]
})

print("Raw Data:")
print(sales)

# Create pivot table
pivot = pd.pivot_table(
    sales,
    values='Sales',      # What to calculate
    index='Store',       # Rows
    columns='Product'    # Columns
)
print("\nPivot Table:")
print(pivot)

Output:

Raw Data:
  Store Product  Sales
0   NYC   Apple    100
1   NYC  Banana    150
2    LA   Apple    120
3    LA  Banana    180

Pivot Table:
Product  Apple  Banana
Store
LA         120     180
NYC        100     150

Easy to read summary!

Choose the Calculation (aggfunc)

By default, pivot_table calculates the mean. Change it:

code.py

sales = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA', 'LA'],
    'Product': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana'],
    'Sales': [100, 120, 150, 180, 200]
})

# Sum instead of mean
pivot = pd.pivot_table(
    sales,
    values='Sales',
    index='Store',
    columns='Product',
    aggfunc='sum'
)
print(pivot)

Output:

Product  Apple  Banana
Store
LA         150     380
NYC        220     NaN

Common aggfunc Options

'sum' - Total
'mean' - Average (default)
'count' - How many
'min' - Smallest
'max' - Largest

Fill Missing Values

code.py

pivot = pd.pivot_table(
    sales,
    values='Sales',
    index='Store',
    columns='Product',
    aggfunc='sum',
    fill_value=0    # Fill NaN with 0
)
print(pivot)

Output:

Product  Apple  Banana
Store
LA         150     380
NYC        220       0

Add Row and Column Totals

code.py

pivot = pd.pivot_table(
    sales,
    values='Sales',
    index='Store',
    columns='Product',
    aggfunc='sum',
    fill_value=0,
    margins=True,           # Add totals
    margins_name='Total'    # Name for totals
)
print(pivot)

Output:

Product  Apple  Banana  Total
Store
LA         150     380    530
NYC        220       0    220
Total      370     380    750

Grand total is 750.

Multiple Values

Calculate sum AND count:

code.py

pivot = pd.pivot_table(
    sales,
    values='Sales',
    index='Store',
    columns='Product',
    aggfunc=['sum', 'count']
)
print(pivot)

Multiple Index Rows

code.py

data = pd.DataFrame({
    'Year': [2023, 2023, 2024, 2024],
    'Store': ['NYC', 'LA', 'NYC', 'LA'],
    'Sales': [100, 150, 120, 180]
})

pivot = pd.pivot_table(
    data,
    values='Sales',
    index=['Year', 'Store'],    # Two row levels
    aggfunc='sum'
)
print(pivot)

Output:

            Sales
Year Store
2023 LA       150
     NYC      100
2024 LA       180
     NYC      120

Practice Example

code.py

import pandas as pd

# Monthly sales data
orders = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Region': ['East', 'West', 'East', 'East', 'West', 'West'],
    'Product': ['A', 'A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 200, 120, 180, 160]
})

print("Order Data:")
print(orders)

# Sales by Region and Product
pivot1 = pd.pivot_table(
    orders,
    values='Sales',
    index='Region',
    columns='Product',
    aggfunc='sum',
    fill_value=0,
    margins=True
)
print("\nSales by Region and Product:")
print(pivot1)

# Sales by Month and Region
pivot2 = pd.pivot_table(
    orders,
    values='Sales',
    index='Month',
    columns='Region',
    aggfunc='sum',
    fill_value=0
)
print("\nSales by Month and Region:")
print(pivot2)

Pivot Table vs GroupBy

Both can summarize data, but:

Feature	Pivot Table	GroupBy
Output	2D table	Series or DataFrame
Columns	Auto-created	Need unstack()
Totals	Easy with margins	Manual
Excel-like	Yes	No

code.py

# GroupBy way
grouped = sales.groupby(['Store', 'Product'])['Sales'].sum().unstack(fill_value=0)

# Pivot Table way
pivot = pd.pivot_table(sales, values='Sales', index='Store',
                        columns='Product', aggfunc='sum', fill_value=0)

# Both give same result!

Key Points

pivot_table() creates summary tables
values = what to calculate
index = row labels
columns = column labels
aggfunc = how to calculate (sum, mean, count)
fill_value=0 fills missing data
margins=True adds totals

Common Mistakes

Mistake 1: Forgetting aggfunc defaults to mean

code.py

# This calculates MEAN, not sum
pd.pivot_table(data, values='Sales', index='Store')

# For sum, specify it
pd.pivot_table(data, values='Sales', index='Store', aggfunc='sum')

Mistake 2: Confusing index and columns

code.py

# index = rows, columns = columns
# Think: index goes DOWN, columns go ACROSS

What's Next?

You learned pivot tables. Next, you'll learn Cross Tabulation - counting combinations of categories.