5 min read min read
Multiple GroupBy Operations
Learn to group data by multiple columns for detailed analysis
Multiple GroupBy Operations
Why Group by Multiple Columns?
Single groupby: "What's the total sales per store?"
Multiple groupby: "What's the total sales per store per product?"
More detail = better insights.
Basic Multiple GroupBy
code.py
import pandas as pd
sales = pd.DataFrame({
'Store': ['NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA'],
'Product': ['Apple', 'Apple', 'Banana', 'Apple', 'Banana', 'Banana'],
'Sales': [100, 50, 150, 120, 180, 90]
})
print("Sales Data:")
print(sales)
# Group by Store AND Product
result = sales.groupby(['Store', 'Product'])['Sales'].sum()
print("\nSales by Store and Product:")
print(result)Output:
Sales Data:
Store Product Sales
0 NYC Apple 100
1 NYC Apple 50
2 NYC Banana 150
3 LA Apple 120
4 LA Banana 180
5 LA Banana 90
Sales by Store and Product:
Store Product
LA Apple 120
Banana 270
NYC Apple 150
Banana 150
Name: Sales, dtype: int64
Get DataFrame Result
code.py
result = sales.groupby(['Store', 'Product'], as_index=False)['Sales'].sum()
print(result)Output:
Store Product Sales
0 LA Apple 120
1 LA Banana 270
2 NYC Apple 150
3 NYC Banana 150
Easier to read as a table.
Multiple Columns, Multiple Aggregations
code.py
data = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA'],
'Category': ['Food', 'Electronics', 'Food', 'Electronics'],
'Sales': [100, 500, 150, 600],
'Quantity': [20, 5, 25, 8]
})
result = data.groupby(['Store', 'Category']).agg({
'Sales': 'sum',
'Quantity': 'mean'
}).reset_index()
print(result)Output:
Store Category Sales Quantity
0 LA Electronics 600 8.0
1 LA Food 150 25.0
2 NYC Electronics 500 5.0
3 NYC Food 100 20.0
Three or More Groups
code.py
orders = pd.DataFrame({
'Year': [2023, 2023, 2023, 2024, 2024, 2024],
'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA', 'LA'],
'Product': ['A', 'B', 'A', 'A', 'B', 'A'],
'Sales': [100, 150, 200, 120, 180, 220]
})
# Group by Year, Store, and Product
result = orders.groupby(['Year', 'Store', 'Product'])['Sales'].sum()
print(result)Output:
Year Store Product
2023 LA A 200
NYC A 100
B 150
2024 LA A 220
B 180
NYC A 120
Name: Sales, dtype: int64
Unstack - Convert to Wide Format
code.py
sales = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA'],
'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
'Sales': [100, 150, 120, 180]
})
# Group and unstack
result = sales.groupby(['Store', 'Product'])['Sales'].sum().unstack()
print(result)Output:
Product Apple Banana
Store
LA 120 180
NYC 100 150
Now it's like a spreadsheet!
Fill Missing Values After Unstack
code.py
sales = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA'],
'Product': ['Apple', 'Banana', 'Apple'],
'Sales': [100, 150, 120]
})
result = sales.groupby(['Store', 'Product'])['Sales'].sum().unstack(fill_value=0)
print(result)Output:
Product Apple Banana
Store
LA 120 0
NYC 100 150
LA has no Banana sales, so it's 0.
Practice Example
code.py
import pandas as pd
# Employee performance data
data = pd.DataFrame({
'Department': ['Sales', 'Sales', 'IT', 'IT', 'Sales', 'IT'],
'Region': ['East', 'West', 'East', 'West', 'East', 'East'],
'Employee': ['John', 'Sarah', 'Mike', 'Emma', 'Tom', 'Lisa'],
'Sales': [50000, 60000, 0, 0, 45000, 0],
'Projects': [0, 0, 5, 3, 0, 4]
})
print("Employee Data:")
print(data)
# Summary by Department and Region
summary = data.groupby(['Department', 'Region']).agg({
'Employee': 'count',
'Sales': 'sum',
'Projects': 'sum'
}).reset_index()
summary.columns = ['Department', 'Region', 'Employees', 'Total Sales', 'Total Projects']
print("\nSummary:")
print(summary)Key Points
- Group by multiple columns:
groupby(['col1', 'col2']) - Order matters: first column is main group
- Use
as_index=Falseorreset_index()for DataFrame unstack()converts to wide format- Fill missing with
fill_value=0
Order of Grouping
The first column is the main grouping:
code.py
# Group by Store first, then Product
groupby(['Store', 'Product'])
# Result: NYC-Apple, NYC-Banana, LA-Apple, LA-Banana
# Group by Product first, then Store
groupby(['Product', 'Store'])
# Result: Apple-NYC, Apple-LA, Banana-NYC, Banana-LAWhat's Next?
You learned multiple groupby. Next, you'll learn Pivot Tables - an even easier way to summarize data.