5 min read min read
GroupBy Aggregations
Learn to calculate multiple statistics at once with agg()
GroupBy Aggregations
What is Aggregation?
Aggregation means combining many values into one.
- Sum of all sales = aggregation
- Average price = aggregation
- Count of orders = aggregation
Multiple Aggregations at Once
Instead of doing sum(), mean(), count() separately, do them all together:
code.py
import pandas as pd
sales = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA', 'LA'],
'Sales': [100, 150, 120, 180, 90]
})
# Get sum, mean, and count for each store
result = sales.groupby('Store')['Sales'].agg(['sum', 'mean', 'count'])
print(result)Output:
sum mean count
Store
LA 390 130.0 3
NYC 250 125.0 2
One line gives you three calculations!
Available Aggregation Functions
code.py
result = sales.groupby('Store')['Sales'].agg([
'sum', # Total
'mean', # Average
'median', # Middle value
'min', # Smallest
'max', # Largest
'count', # How many
'std' # Spread of values
])
print(result)Different Aggregations for Different Columns
code.py
data = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA'],
'Sales': [100, 150, 120, 180],
'Customers': [20, 25, 30, 35]
})
# Sum for Sales, Mean for Customers
result = data.groupby('Store').agg({
'Sales': 'sum',
'Customers': 'mean'
})
print(result)Output:
Sales Customers
Store
LA 300 32.5
NYC 250 22.5
Multiple Aggregations per Column
code.py
result = data.groupby('Store').agg({
'Sales': ['sum', 'mean'],
'Customers': ['min', 'max']
})
print(result)Output:
Sales Customers
sum mean min max
Store
LA 300 150.0 30 35
NYC 250 125.0 20 25
Custom Aggregation Names
code.py
result = data.groupby('Store').agg(
total_sales=('Sales', 'sum'),
avg_sales=('Sales', 'mean'),
total_customers=('Customers', 'sum')
)
print(result)Output:
total_sales avg_sales total_customers
Store
LA 300 150.0 65
NYC 250 125.0 45
Column names are now clear and readable.
Using Custom Functions
code.py
# Custom function: range (max - min)
def sales_range(x):
return x.max() - x.min()
result = data.groupby('Store')['Sales'].agg(['sum', 'mean', sales_range])
print(result)Output:
sum mean sales_range
Store
LA 300 150.0 60
NYC 250 125.0 50
Lambda Functions
code.py
result = data.groupby('Store')['Sales'].agg([
'sum',
'mean',
('range', lambda x: x.max() - x.min()),
('above_100', lambda x: (x > 100).sum())
])
print(result)Practice Example
code.py
import pandas as pd
# Order data
orders = pd.DataFrame({
'Customer': ['Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
'Product': ['Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
'Amount': [1000, 25, 1000, 75, 25],
'Quantity': [1, 2, 1, 1, 3]
})
print("Order Data:")
print(orders)
# Summary per customer
summary = orders.groupby('Customer').agg(
total_spent=('Amount', 'sum'),
avg_order=('Amount', 'mean'),
num_orders=('Amount', 'count'),
total_items=('Quantity', 'sum')
)
print("\nCustomer Summary:")
print(summary)Output:
total_spent avg_order num_orders total_items
Customer
Alice 1025 512.5 2 3
Bob 1100 366.7 3 5
Reset Index After Aggregation
code.py
# Get flat DataFrame
result = orders.groupby('Customer').agg(
total_spent=('Amount', 'sum')
).reset_index()
print(result)Output:
Customer total_spent
0 Alice 1025
1 Bob 1100
Key Points
agg(['sum', 'mean'])calculates multiple stats at onceagg({'col1': 'sum', 'col2': 'mean'})different stats per column- Use named aggregations for clear column names
- Custom functions work with agg()
- Use
reset_index()to get regular DataFrame
Common Aggregation Functions
| Function | What It Does |
|---|---|
| sum | Total of all values |
| mean | Average |
| median | Middle value |
| min | Smallest value |
| max | Largest value |
| count | Number of values |
| std | Standard deviation |
| var | Variance |
| first | First value |
| last | Last value |
What's Next?
You learned aggregations. Next, you'll learn to group by multiple columns for detailed analysis.