#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
5 min read min read

Multiple GroupBy Operations

Learn to group data by multiple columns for detailed analysis

Multiple GroupBy Operations

Why Group by Multiple Columns?

Single groupby: "What's the total sales per store?"

Multiple groupby: "What's the total sales per store per product?"

More detail = better insights.

Basic Multiple GroupBy

code.py
import pandas as pd

sales = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA'],
    'Product': ['Apple', 'Apple', 'Banana', 'Apple', 'Banana', 'Banana'],
    'Sales': [100, 50, 150, 120, 180, 90]
})

print("Sales Data:")
print(sales)

# Group by Store AND Product
result = sales.groupby(['Store', 'Product'])['Sales'].sum()
print("\nSales by Store and Product:")
print(result)

Output:

Sales Data: Store Product Sales 0 NYC Apple 100 1 NYC Apple 50 2 NYC Banana 150 3 LA Apple 120 4 LA Banana 180 5 LA Banana 90 Sales by Store and Product: Store Product LA Apple 120 Banana 270 NYC Apple 150 Banana 150 Name: Sales, dtype: int64

Get DataFrame Result

code.py
result = sales.groupby(['Store', 'Product'], as_index=False)['Sales'].sum()
print(result)

Output:

Store Product Sales 0 LA Apple 120 1 LA Banana 270 2 NYC Apple 150 3 NYC Banana 150

Easier to read as a table.

Multiple Columns, Multiple Aggregations

code.py
data = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA'],
    'Category': ['Food', 'Electronics', 'Food', 'Electronics'],
    'Sales': [100, 500, 150, 600],
    'Quantity': [20, 5, 25, 8]
})

result = data.groupby(['Store', 'Category']).agg({
    'Sales': 'sum',
    'Quantity': 'mean'
}).reset_index()

print(result)

Output:

Store Category Sales Quantity 0 LA Electronics 600 8.0 1 LA Food 150 25.0 2 NYC Electronics 500 5.0 3 NYC Food 100 20.0

Three or More Groups

code.py
orders = pd.DataFrame({
    'Year': [2023, 2023, 2023, 2024, 2024, 2024],
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA', 'LA'],
    'Product': ['A', 'B', 'A', 'A', 'B', 'A'],
    'Sales': [100, 150, 200, 120, 180, 220]
})

# Group by Year, Store, and Product
result = orders.groupby(['Year', 'Store', 'Product'])['Sales'].sum()
print(result)

Output:

Year Store Product 2023 LA A 200 NYC A 100 B 150 2024 LA A 220 B 180 NYC A 120 Name: Sales, dtype: int64

Unstack - Convert to Wide Format

code.py
sales = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA'],
    'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
    'Sales': [100, 150, 120, 180]
})

# Group and unstack
result = sales.groupby(['Store', 'Product'])['Sales'].sum().unstack()
print(result)

Output:

Product Apple Banana Store LA 120 180 NYC 100 150

Now it's like a spreadsheet!

Fill Missing Values After Unstack

code.py
sales = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA'],
    'Product': ['Apple', 'Banana', 'Apple'],
    'Sales': [100, 150, 120]
})

result = sales.groupby(['Store', 'Product'])['Sales'].sum().unstack(fill_value=0)
print(result)

Output:

Product Apple Banana Store LA 120 0 NYC 100 150

LA has no Banana sales, so it's 0.

Practice Example

code.py
import pandas as pd

# Employee performance data
data = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'IT', 'IT', 'Sales', 'IT'],
    'Region': ['East', 'West', 'East', 'West', 'East', 'East'],
    'Employee': ['John', 'Sarah', 'Mike', 'Emma', 'Tom', 'Lisa'],
    'Sales': [50000, 60000, 0, 0, 45000, 0],
    'Projects': [0, 0, 5, 3, 0, 4]
})

print("Employee Data:")
print(data)

# Summary by Department and Region
summary = data.groupby(['Department', 'Region']).agg({
    'Employee': 'count',
    'Sales': 'sum',
    'Projects': 'sum'
}).reset_index()

summary.columns = ['Department', 'Region', 'Employees', 'Total Sales', 'Total Projects']
print("\nSummary:")
print(summary)

Key Points

  • Group by multiple columns: groupby(['col1', 'col2'])
  • Order matters: first column is main group
  • Use as_index=False or reset_index() for DataFrame
  • unstack() converts to wide format
  • Fill missing with fill_value=0

Order of Grouping

The first column is the main grouping:

code.py
# Group by Store first, then Product
groupby(['Store', 'Product'])
# Result: NYC-Apple, NYC-Banana, LA-Apple, LA-Banana

# Group by Product first, then Store
groupby(['Product', 'Store'])
# Result: Apple-NYC, Apple-LA, Banana-NYC, Banana-LA

What's Next?

You learned multiple groupby. Next, you'll learn Pivot Tables - an even easier way to summarize data.

SkillsetMaster - AI, Web Development & Data Analytics Courses