Topic 19 of

SQL GROUP BY and Aggregation Functions

GROUP BY is the command that turns a million rows into a handful of meaningful insights. It's how analysts answer 'which city had the most orders?' and 'what'''s the average delivery time by restaurant?'

๐Ÿ“šBeginner
โฑ๏ธ11 min
โœ…7 quizzes
๐Ÿ“Š

Aggregation Functions โ€” Summarizing Data

Aggregation functions collapse many rows into a single value. They're always used with GROUP BY (or across the entire table if no GROUP BY is specified).

| Function | What it does | Example | |----------|-------------|---------| | COUNT(*) | Counts all rows | Total number of orders | | COUNT(column) | Counts non-NULL values in a column | Orders with a delivery time recorded | | SUM(column) | Adds up all values | Total revenue | | AVG(column) | Average of all values | Average order value | | MIN(column) | Smallest value | Cheapest order | | MAX(column) | Largest value | Most expensive order |

Aggregating the Entire Table

Without GROUP BY, the function applies to all rows at once:

query.sqlSQL
-- How many orders total?
SELECT COUNT(*) AS total_orders FROM orders;

-- What's the total revenue?
SELECT SUM(amount) AS total_revenue FROM orders;

-- What's the average order value?
SELECT AVG(amount) AS avg_order_value FROM orders;

-- What's the range of order values?
SELECT MIN(amount) AS cheapest, MAX(amount) AS most_expensive FROM orders;
Think of it this way...

Imagine you have a jar of 10,000 marbles (rows). COUNT(*) tells you there are 10,000 marbles. AVG(weight) tells you the average weight of a marble. GROUP BY color is like sorting the marbles by color first, then computing the count or average for each color separately.

๐Ÿ—‚๏ธ

GROUP BY โ€” Summarizing by Category

GROUP BY splits your data into groups based on unique values in one or more columns, then applies the aggregation function to each group separately.

query.sqlSQL
-- How many orders per city?
SELECT
  city,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue,
  AVG(amount) AS avg_order_value
FROM orders
GROUP BY city
ORDER BY total_revenue DESC;

Result:

| city | order_count | total_revenue | avg_order_value | | --------- | ----------- | ------------- | --------------- | | Mumbai | 12,450 | 62,25,000 | 500 | | Bangalore | 10,230 | 48,08,100 | 470 | | Delhi | 9,880 | 41,49,600 | 420 |

Grouping by Multiple Columns

query.sqlSQL
-- Orders and revenue per city per restaurant
SELECT
  city,
  restaurant,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM orders
GROUP BY city, restaurant
ORDER BY city, total_revenue DESC;
Info

Key Rule: Every column in your SELECT that isn't inside an aggregation function MUST appear in your GROUP BY clause. If you SELECT city and restaurant, both must be in GROUP BY.

โš ๏ธ CheckpointQuiz error: Missing or invalid options array

๐Ÿ”ฌ

HAVING โ€” Filtering Grouped Results

WHERE filters rows before grouping. HAVING filters after grouping โ€” it filters the groups themselves.

query.sqlSQL
-- Cities with more than 10,000 orders
SELECT
  city,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM orders
GROUP BY city
HAVING COUNT(*) > 10000
ORDER BY order_count DESC;

-- Cities where average order value exceeds โ‚น600
SELECT
  city,
  AVG(amount) AS avg_order_value
FROM orders
GROUP BY city
HAVING AVG(amount) > 600;

WHERE vs HAVING โ€” The Critical Difference

query.sqlSQL
-- WRONG: Can't use WHERE to filter on an aggregate
SELECT city, COUNT(*) FROM orders
WHERE COUNT(*) > 1000  -- โŒ Error: can't use aggregate in WHERE
GROUP BY city;

-- RIGHT: Use HAVING for aggregate conditions
SELECT city, COUNT(*) FROM orders
GROUP BY city
HAVING COUNT(*) > 1000;  -- โœ…

-- Combining WHERE and HAVING:
-- Filter rows first (WHERE), then group, then filter groups (HAVING)
SELECT city, COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2026-01-01'  -- filter rows first
GROUP BY city
HAVING COUNT(*) > 500;            -- then filter groups
Think of it this way...

A Zomato analyst wants to find restaurants with more than 100 orders AND an average rating above 4.2. WHERE can't do this โ€” it runs before grouping. HAVING is the tool: GROUP BY restaurant_name HAVING COUNT(*) > 100 AND AVG(rating) > 4.2.

โš ๏ธ

COUNT(*) vs COUNT(column) โ€” The Difference Matters

query.sqlSQL
SELECT
  COUNT(*) AS all_rows,
  COUNT(delivery_time) AS rows_with_delivery_time,
  COUNT(*) - COUNT(delivery_time) AS rows_missing_delivery_time
FROM orders;
  • COUNT(*) counts every row, including those with NULLs
  • COUNT(column) counts only rows where the column is NOT NULL
  • This difference is crucial for data quality checks

COUNT DISTINCT

query.sqlSQL
-- How many unique customers placed orders this month?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= '2026-03-01';

-- Average order count per customer
SELECT AVG(orders_per_customer) FROM (
  SELECT customer_id, COUNT(*) AS orders_per_customer
  FROM orders
  GROUP BY customer_id
) sub;

โš ๏ธ FinalQuiz error: Missing or invalid questions array

โš ๏ธ SummarySection error: Missing or invalid items array

Received: {"hasItems":false,"isArray":false}