Aggregation Functions โ Summarizing Data
Aggregation functions collapse many rows into a single value. They're always used with GROUP BY (or across the entire table if no GROUP BY is specified).
| Function | What it does | Example |
|----------|-------------|---------|
| COUNT(*) | Counts all rows | Total number of orders |
| COUNT(column) | Counts non-NULL values in a column | Orders with a delivery time recorded |
| SUM(column) | Adds up all values | Total revenue |
| AVG(column) | Average of all values | Average order value |
| MIN(column) | Smallest value | Cheapest order |
| MAX(column) | Largest value | Most expensive order |
Aggregating the Entire Table
Without GROUP BY, the function applies to all rows at once:
-- How many orders total?
SELECT COUNT(*) AS total_orders FROM orders;
-- What's the total revenue?
SELECT SUM(amount) AS total_revenue FROM orders;
-- What's the average order value?
SELECT AVG(amount) AS avg_order_value FROM orders;
-- What's the range of order values?
SELECT MIN(amount) AS cheapest, MAX(amount) AS most_expensive FROM orders;Imagine you have a jar of 10,000 marbles (rows). COUNT(*) tells you there are 10,000 marbles. AVG(weight) tells you the average weight of a marble. GROUP BY color is like sorting the marbles by color first, then computing the count or average for each color separately.
GROUP BY โ Summarizing by Category
GROUP BY splits your data into groups based on unique values in one or more columns, then applies the aggregation function to each group separately.
-- How many orders per city?
SELECT
city,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM orders
GROUP BY city
ORDER BY total_revenue DESC;Result:
| city | order_count | total_revenue | avg_order_value | | --------- | ----------- | ------------- | --------------- | | Mumbai | 12,450 | 62,25,000 | 500 | | Bangalore | 10,230 | 48,08,100 | 470 | | Delhi | 9,880 | 41,49,600 | 420 |
Grouping by Multiple Columns
-- Orders and revenue per city per restaurant
SELECT
city,
restaurant,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM orders
GROUP BY city, restaurant
ORDER BY city, total_revenue DESC;Key Rule: Every column in your SELECT that isn't inside an aggregation function MUST appear in your GROUP BY clause. If you SELECT city and restaurant, both must be in GROUP BY.
โ ๏ธ CheckpointQuiz error: Missing or invalid options array
HAVING โ Filtering Grouped Results
WHERE filters rows before grouping. HAVING filters after grouping โ it filters the groups themselves.
-- Cities with more than 10,000 orders
SELECT
city,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM orders
GROUP BY city
HAVING COUNT(*) > 10000
ORDER BY order_count DESC;
-- Cities where average order value exceeds โน600
SELECT
city,
AVG(amount) AS avg_order_value
FROM orders
GROUP BY city
HAVING AVG(amount) > 600;WHERE vs HAVING โ The Critical Difference
-- WRONG: Can't use WHERE to filter on an aggregate
SELECT city, COUNT(*) FROM orders
WHERE COUNT(*) > 1000 -- โ Error: can't use aggregate in WHERE
GROUP BY city;
-- RIGHT: Use HAVING for aggregate conditions
SELECT city, COUNT(*) FROM orders
GROUP BY city
HAVING COUNT(*) > 1000; -- โ
-- Combining WHERE and HAVING:
-- Filter rows first (WHERE), then group, then filter groups (HAVING)
SELECT city, COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2026-01-01' -- filter rows first
GROUP BY city
HAVING COUNT(*) > 500; -- then filter groupsA Zomato analyst wants to find restaurants with more than 100 orders AND an average rating above 4.2. WHERE can't do this โ it runs before grouping. HAVING is the tool: GROUP BY restaurant_name HAVING COUNT(*) > 100 AND AVG(rating) > 4.2.
COUNT(*) vs COUNT(column) โ The Difference Matters
SELECT
COUNT(*) AS all_rows,
COUNT(delivery_time) AS rows_with_delivery_time,
COUNT(*) - COUNT(delivery_time) AS rows_missing_delivery_time
FROM orders;COUNT(*)counts every row, including those with NULLsCOUNT(column)counts only rows where the column is NOT NULL- This difference is crucial for data quality checks
COUNT DISTINCT
-- How many unique customers placed orders this month?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= '2026-03-01';
-- Average order count per customer
SELECT AVG(orders_per_customer) FROM (
SELECT customer_id, COUNT(*) AS orders_per_customer
FROM orders
GROUP BY customer_id
) sub;โ ๏ธ FinalQuiz error: Missing or invalid questions array
โ ๏ธ SummarySection error: Missing or invalid items array
Received: {"hasItems":false,"isArray":false}