#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
5 min read min read

GroupBy Aggregations

Learn to calculate multiple statistics at once with agg()

GroupBy Aggregations

What is Aggregation?

Aggregation means combining many values into one.

  • Sum of all sales = aggregation
  • Average price = aggregation
  • Count of orders = aggregation

Multiple Aggregations at Once

Instead of doing sum(), mean(), count() separately, do them all together:

code.py
import pandas as pd

sales = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA', 'LA'],
    'Sales': [100, 150, 120, 180, 90]
})

# Get sum, mean, and count for each store
result = sales.groupby('Store')['Sales'].agg(['sum', 'mean', 'count'])
print(result)

Output:

sum mean count Store LA 390 130.0 3 NYC 250 125.0 2

One line gives you three calculations!

Available Aggregation Functions

code.py
result = sales.groupby('Store')['Sales'].agg([
    'sum',      # Total
    'mean',     # Average
    'median',   # Middle value
    'min',      # Smallest
    'max',      # Largest
    'count',    # How many
    'std'       # Spread of values
])
print(result)

Different Aggregations for Different Columns

code.py
data = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA'],
    'Sales': [100, 150, 120, 180],
    'Customers': [20, 25, 30, 35]
})

# Sum for Sales, Mean for Customers
result = data.groupby('Store').agg({
    'Sales': 'sum',
    'Customers': 'mean'
})
print(result)

Output:

Sales Customers Store LA 300 32.5 NYC 250 22.5

Multiple Aggregations per Column

code.py
result = data.groupby('Store').agg({
    'Sales': ['sum', 'mean'],
    'Customers': ['min', 'max']
})
print(result)

Output:

Sales Customers sum mean min max Store LA 300 150.0 30 35 NYC 250 125.0 20 25

Custom Aggregation Names

code.py
result = data.groupby('Store').agg(
    total_sales=('Sales', 'sum'),
    avg_sales=('Sales', 'mean'),
    total_customers=('Customers', 'sum')
)
print(result)

Output:

total_sales avg_sales total_customers Store LA 300 150.0 65 NYC 250 125.0 45

Column names are now clear and readable.

Using Custom Functions

code.py
# Custom function: range (max - min)
def sales_range(x):
    return x.max() - x.min()

result = data.groupby('Store')['Sales'].agg(['sum', 'mean', sales_range])
print(result)

Output:

sum mean sales_range Store LA 300 150.0 60 NYC 250 125.0 50

Lambda Functions

code.py
result = data.groupby('Store')['Sales'].agg([
    'sum',
    'mean',
    ('range', lambda x: x.max() - x.min()),
    ('above_100', lambda x: (x > 100).sum())
])
print(result)

Practice Example

code.py
import pandas as pd

# Order data
orders = pd.DataFrame({
    'Customer': ['Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
    'Amount': [1000, 25, 1000, 75, 25],
    'Quantity': [1, 2, 1, 1, 3]
})

print("Order Data:")
print(orders)

# Summary per customer
summary = orders.groupby('Customer').agg(
    total_spent=('Amount', 'sum'),
    avg_order=('Amount', 'mean'),
    num_orders=('Amount', 'count'),
    total_items=('Quantity', 'sum')
)
print("\nCustomer Summary:")
print(summary)

Output:

total_spent avg_order num_orders total_items Customer Alice 1025 512.5 2 3 Bob 1100 366.7 3 5

Reset Index After Aggregation

code.py
# Get flat DataFrame
result = orders.groupby('Customer').agg(
    total_spent=('Amount', 'sum')
).reset_index()

print(result)

Output:

Customer total_spent 0 Alice 1025 1 Bob 1100

Key Points

  • agg(['sum', 'mean']) calculates multiple stats at once
  • agg({'col1': 'sum', 'col2': 'mean'}) different stats per column
  • Use named aggregations for clear column names
  • Custom functions work with agg()
  • Use reset_index() to get regular DataFrame

Common Aggregation Functions

FunctionWhat It Does
sumTotal of all values
meanAverage
medianMiddle value
minSmallest value
maxLargest value
countNumber of values
stdStandard deviation
varVariance
firstFirst value
lastLast value

What's Next?

You learned aggregations. Next, you'll learn to group by multiple columns for detailed analysis.