SQL Aggregations: Summarizing Data
So far you've been retrieving rows — individual records from the ShopFlow database. But some of the most important questions an application needs to answer aren't about individual rows at all:
- How many orders are in each status?
- What's the total revenue from shipped orders?
- Which product category has the highest average price?
These questions need aggregations — SQL's ability to collapse many rows into a single summary value. Let's discover how this works.
Quick Reference
Aggregate functions:
SELECT
COUNT(*) AS total_rows,
COUNT(column) AS non_null_count,
SUM(price) AS total,
AVG(price) AS average,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products;
Grouping:
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;
Filtering groups (not rows):
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 2;
Gotchas:
- ⚠️ Every column in
SELECTmust either be inside an aggregate function or listed inGROUP BY - ⚠️ Use
HAVINGto filter on aggregated values,WHEREto filter on raw row values - ⚠️
COUNT(column)skipsNULLvalues;COUNT(*)counts all rows
Version Information
Tested with:
- PostgreSQL: 15.x, 16.x
- Node.js: v18.x, v20.x, v22.x
Last verified: May 2025
What You Need to Know First
Required reading:
- Relational Databases: How Structured Data Works
- SQL Basics: Reading Data with SELECT —
WHERE,ORDER BY, andLIMITare used throughout this article
What We'll Cover in This Article
By the end of this guide, you'll understand:
- The five core aggregate functions:
COUNT,SUM,AVG,MIN,MAX - How
GROUP BYsplits rows into groups before aggregating - Why
HAVINGexists and when to use it instead ofWHERE - How
DISTINCTremoves duplicates - How to combine aggregations with
ORDER BYandLIMIT
What We'll Explain Along the Way
We'll introduce these concepts with full explanations:
- The split-apply-combine mental model for
GROUP BY - Why
NULLvalues behave differently in aggregate functions - The order SQL processes clauses (which explains the
WHEREvsHAVINGdistinction)
Part 1: Aggregate Functions — Collapsing Many Rows Into One
An aggregate function takes a set of rows and returns a single value. Think of it as compression: you feed in 8 product rows, and the function hands back one number.
COUNT — How Many Rows?
The most commonly used aggregate. COUNT(*) counts every row:
-- How many products are in the ShopFlow catalog?
SELECT COUNT(*) AS total_products
FROM products;
| total_products |
|---|
| 8 |
COUNT(column_name) counts only rows where that column is not NULL:
-- How many products have a description?
SELECT
COUNT(*) AS total_rows,
COUNT(description) AS rows_with_description
FROM products;
| total_rows | rows_with_description |
|---|---|
| 8 | 8 |
In our dataset, all products have descriptions. But if any had NULL in that column, COUNT(description) would skip them while COUNT(*) would still count them. This is a subtle but important distinction.
SUM — What's the Total?
SUM adds up the values in a numeric column:
-- What's the total value of all inventory?
SELECT SUM(price * stock_quantity) AS total_inventory_value
FROM products;
| total_inventory_value |
|---|
| 18333.35 |
You can use expressions inside aggregate functions — here, price * stock_quantity computes the value per product before summing.
-- Total revenue from all order items
SELECT SUM(price_at_purchase * quantity) AS total_revenue
FROM order_items;
| total_revenue |
|---|
| 958.82 |
AVG — What's the Average?
-- Average product price
SELECT
AVG(price) AS avg_price,
ROUND(AVG(price), 2) AS avg_price_rounded
FROM products;
| avg_price | avg_price_rounded |
|---|---|
| 64.9862500000000000 | 64.99 |
AVG returns full precision by default. ROUND(value, decimal_places) trims it to something readable.
Important: AVG ignores NULL values — it averages only the rows where the column has a value. This is almost always what you want, but be aware of it.
MIN and MAX — The Extremes
-- Cheapest and most expensive products
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products;
| cheapest | most_expensive |
|---|---|
| 19.99 | 129.99 |
MIN and MAX work on text columns too — they return the alphabetically first and last values:
SELECT
MIN(name) AS first_alphabetically,
MAX(name) AS last_alphabetically
FROM products;
| first_alphabetically | last_alphabetically |
|---|---|
| Cable Management | Wireless Keyboard |
Part 2: DISTINCT — Removing Duplicates
Before we get to GROUP BY, let's look at DISTINCT — a simpler tool for a common problem.
-- What categories exist in the product catalog?
SELECT DISTINCT category
FROM products;
| category |
|---|
| Electronics |
| Accessories |
Without DISTINCT, you'd get one row per product (8 rows, with "Electronics" and "Accessories" repeated). DISTINCT collapses duplicates so each unique value appears once.
DISTINCT can apply to multiple columns — in that case, it deduplicates on the combination:
-- What unique status values appear in orders?
SELECT DISTINCT status FROM orders;
| status |
|---|
| shipped |
| pending |
| cancelled |
COUNT(DISTINCT column) combines both — count unique values:
-- How many distinct categories are there?
SELECT COUNT(DISTINCT category) AS category_count
FROM products;
| category_count |
|---|
| 2 |
Part 3: GROUP BY — The Heart of Aggregation
DISTINCT shows what unique values exist. GROUP BY lets you compute something for each group. This is where aggregations become genuinely powerful.
The Mental Model: Split → Apply → Combine
When you write GROUP BY category, think of the database doing three steps:
- Split the rows into groups — one group per unique value of
category - Apply the aggregate function to each group separately
- Combine the results into one row per group
Let's see it in action:
-- Count products per category
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category;
| category | product_count |
|---|---|
| Accessories | 3 |
| Electronics | 5 |
The database split the 8 product rows into two groups (Electronics and Accessories), counted each group, and returned one summary row per group.
-- Average price per category
SELECT
category,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products
GROUP BY category;
| category | avg_price | cheapest | most_expensive |
|---|---|---|---|
| Accessories | 33.32 | 19.99 | 49.99 |
| Electronics | 76.99 | 39.99 | 129.99 |
With one query, you've computed three different summary statistics for each product category.
The GROUP BY Rule: Every Column Must Be Grouped or Aggregated
Here's a rule PostgreSQL will enforce with an error if you break it:
Every column in the
SELECTlist must either appear inGROUP BY, or be wrapped in an aggregate function.
Why? Because GROUP BY collapses many rows into one. If you ask for a column that isn't grouped or aggregated, the database doesn't know which row's value to show for that group.
-- ❌ Error: "name" is not in GROUP BY and not aggregated
SELECT name, category, COUNT(*)
FROM products
GROUP BY category;
-- PostgreSQL error:
-- ERROR: column "products.name" must appear in the GROUP BY
-- clause or be used in an aggregate function
There are 5 Electronics products — if you asked for name without grouping by it, which of the 5 names should appear in the "Electronics" row? The database refuses to guess.
-- ✅ Include name in GROUP BY (now you get one row per category+name combination)
SELECT name, category, COUNT(*)
FROM products
GROUP BY category, name;
-- ✅ Or aggregate it (though MIN(name) is rarely useful in practice)
SELECT category, MIN(name) AS sample_name, COUNT(*)
FROM products
GROUP BY category;
Grouping by Multiple Columns
You can group by several columns — each unique combination becomes its own group:
-- Count orders by status, broken down by... (we'll need order data)
-- For now: count products by category and stock availability
SELECT
category,
CASE WHEN stock_quantity > 0 THEN 'in_stock' ELSE 'out_of_stock' END AS availability,
COUNT(*) AS product_count
FROM products
GROUP BY category, availability
ORDER BY category, availability;
| category | availability | product_count |
|---|---|---|
| Accessories | in_stock | 2 |
| Accessories | out_of_stock | 1 |
| Electronics | in_stock | 4 |
| Electronics | out_of_stock | 1 |
Notice the CASE WHEN ... END expression — it creates a derived column on the fly. We can group by it just like any real column. We'll use CASE WHEN more in later articles; for now, read it as "if stock_quantity > 0, label it 'in_stock', otherwise 'out_of_stock'".
Part 4: HAVING — Filtering Groups
You already know WHERE filters rows. But what if you want to filter based on an aggregated value — like "only show me categories with more than 3 products"?
You can't use WHERE for this. Here's why, and what to use instead.
Why WHERE Doesn't Work on Aggregated Values
SQL processes clauses in a specific order:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
Notice that WHERE happens before GROUP BY. That means WHERE runs on individual rows before any grouping or aggregation has occurred. At WHERE time, there's no COUNT(*) yet — that number doesn't exist until after GROUP BY runs.
If you try to use an aggregate in WHERE, PostgreSQL tells you clearly:
-- ❌ Error: aggregate functions not allowed in WHERE
SELECT category, COUNT(*) AS product_count
FROM products
WHERE COUNT(*) > 3
GROUP BY category;
-- ERROR: aggregate functions are not allowed in WHERE
HAVING: WHERE for Groups
HAVING runs after GROUP BY — after the aggregation has happened. This is when you can filter on computed values:
-- ✅ Only show categories with more than 3 products
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 3;
| category | product_count |
|---|---|
| Electronics | 5 |
Accessories (3 products) doesn't appear — it failed the HAVING condition.
-- Categories where average price exceeds $50
SELECT
category,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;
| category | avg_price |
|---|---|
| Electronics | 76.99 |
WHERE and HAVING Together
They serve different purposes and can both appear in the same query:
-- Among in-stock products only,
-- find categories where average price exceeds $50
SELECT
category,
ROUND(AVG(price), 2) AS avg_price,
COUNT(*) AS product_count
FROM products
WHERE stock_quantity > 0 -- filter individual rows FIRST
GROUP BY category
HAVING AVG(price) > 50 -- then filter groups
ORDER BY avg_price DESC;
Here's what happens step by step:
WHERE stock_quantity > 0— removes out-of-stock products (Laptop Stand, Cable Management) from considerationGROUP BY category— groups the remaining 6 rows by categoryHAVING AVG(price) > 50— keeps only groups where the average price of in-stock items exceeds $50ORDER BYsorts the result
This is a meaningful business query: "among products you can actually sell, which categories have a high average price point?"
Part 5: Combining Aggregations with ORDER BY and LIMIT
Aggregate queries work naturally with ORDER BY and LIMIT — you just use the aggregate value in ORDER BY:
-- Top 3 most-stocked product categories by total units
SELECT
category,
SUM(stock_quantity) AS total_units
FROM products
GROUP BY category
ORDER BY total_units DESC
LIMIT 3;
| category | total_units |
|---|---|
| Electronics | 221 |
| Accessories | 75 |
You can reference the alias (total_units) in ORDER BY — PostgreSQL resolves it. Some databases don't allow alias references in ORDER BY, requiring you to repeat the expression: ORDER BY SUM(stock_quantity) DESC.
Part 6: Practical Aggregation Queries
Let's apply what we've learned to realistic ShopFlow business questions.
Order Summary by Status
-- How many orders are in each status?
SELECT
status,
COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;
| status | order_count |
|---|---|
| shipped | 3 |
| pending | 2 |
| cancelled | 1 |
Revenue from Order Items
-- Total revenue per order
SELECT
order_id,
SUM(price_at_purchase * quantity) AS order_total,
SUM(quantity) AS total_items
FROM order_items
GROUP BY order_id
ORDER BY order_total DESC;
| order_id | order_total | total_items |
|---|---|---|
| 6 | 119.97 | 3 |
| 1 | 169.97 | 3 |
| 5 | 139.98 | 2 |
| 3 | 94.98 | 2 |
| 2 | 59.99 | 1 |
| 4 | 129.99 | 1 |
Finding Orders Above a Threshold
-- Orders where the total exceeds $100
SELECT
order_id,
SUM(price_at_purchase * quantity) AS order_total
FROM order_items
GROUP BY order_id
HAVING SUM(price_at_purchase * quantity) > 100
ORDER BY order_total DESC;
| order_id | order_total |
|---|---|
| 1 | 169.97 |
| 5 | 139.98 |
| 4 | 129.99 |
| 6 | 119.97 |
Common Misconceptions
❌ Misconception: Use WHERE to filter on aggregated values
Reality: WHERE filters individual rows before grouping. Aggregate functions like COUNT(*) don't exist at WHERE time. Use HAVING for filtering on aggregate results.
Why this matters: The query will fail with an error — or worse, if you construct the condition in a way that doesn't use the aggregate, it silently returns wrong results.
Example:
-- ❌ Error: can't use COUNT(*) in WHERE
SELECT category, COUNT(*) FROM products
WHERE COUNT(*) > 2
GROUP BY category;
-- ✅ Correct: HAVING filters after grouping
SELECT category, COUNT(*) FROM products
GROUP BY category
HAVING COUNT(*) > 2;
❌ Misconception: COUNT(*) and COUNT(column) are the same
Reality: COUNT(*) counts every row. COUNT(column) skips rows where that column is NULL.
Why this matters: If you're counting an optional column (like description in products), you'll get a lower count than expected and might not notice.
Example:
-- Imagine some products have NULL descriptions
-- COUNT(*) = 8 (all rows)
-- COUNT(description) = 6 (only rows with non-NULL description)
SELECT COUNT(*), COUNT(description) FROM products;
❌ Misconception: You can SELECT non-grouped columns freely
Reality: PostgreSQL enforces the rule strictly — every SELECT column must be in GROUP BY or inside an aggregate. MySQL (with default settings) allows this but returns a random row's value, which is a silent bug.
Why this matters: If you're used to MySQL's lenient behavior, switching to PostgreSQL will surface these as errors. The PostgreSQL behavior is correct — MySQL's permissive behavior masks real ambiguity.
Troubleshooting Common Issues
Problem: ERROR: column must appear in GROUP BY clause
Symptoms: PostgreSQL refuses to run your query with this error.
Cause: You're selecting a column that isn't in GROUP BY and isn't inside an aggregate function.
Solution:
-- ❌ Causes error
SELECT name, category, COUNT(*) FROM products GROUP BY category;
-- ✅ Option 1: Add the column to GROUP BY
SELECT name, category, COUNT(*) FROM products GROUP BY category, name;
-- ✅ Option 2: Aggregate the column
SELECT MAX(name) AS sample_name, category, COUNT(*) FROM products GROUP BY category;
-- ✅ Option 3: Remove the column if you don't need it
SELECT category, COUNT(*) FROM products GROUP BY category;
Problem: HAVING condition produces unexpected results
Symptoms: Your HAVING filter seems too aggressive or too lenient.
Diagnostic steps:
-- Step 1: Run without HAVING to see all groups
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category;
-- Step 2: Add HAVING and compare
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
HAVING COUNT(*) > 2;
-- Step 3: Verify the threshold is what you meant
-- (is it > 2, or >= 2, or > 3?)
Problem: AVG returns NULL
Symptoms: AVG(column) returns NULL instead of a number.
Cause: The column has NULL values for every row in the group, or the table has no rows.
Solution:
-- Use COALESCE to provide a fallback value
SELECT COALESCE(AVG(price), 0) AS avg_price FROM products;
-- Or check if there are rows before aggregating
SELECT COUNT(*), AVG(price) FROM products;
Check Your Understanding
Quick Quiz
-
What's the difference between
WHEREandHAVING?Show Answer
WHEREfilters individual rows beforeGROUP BYruns.HAVINGfilters groups afterGROUP BYruns. UseWHEREfor conditions on raw column values; useHAVINGfor conditions on aggregate values likeCOUNT(*),SUM(), orAVG(). -
What will this query return?
SELECT COUNT(*), COUNT(description) FROM products
WHERE category = 'Electronics';Show Answer
Both return
5in our dataset, because all 5 Electronics products have non-NULL descriptions. If any Electronics product had aNULLdescription,COUNT(description)would be lower thanCOUNT(*). -
Why does this query fail?
SELECT category, name, COUNT(*)
FROM products
GROUP BY category;Show Answer
nameis in theSELECTlist but not inGROUP BYand not wrapped in an aggregate function. PostgreSQL doesn't know which product's name to show for each category group. Fix by addingnametoGROUP BY, removing it fromSELECT, or wrapping it in an aggregate likeMIN(name).
Hands-On Challenge
Task: Write a query that shows each order status with its order count and the percentage of total orders it represents. Sort by order count descending.
Expected result:
| status | order_count | percentage |
|---|---|---|
| shipped | 3 | 50.00 |
| pending | 2 | 33.33 |
| cancelled | 1 | 16.67 |
Show Solution
SELECT
status,
COUNT(*) AS order_count,
ROUND(
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM orders),
2
) AS percentage
FROM orders
GROUP BY status
ORDER BY order_count DESC;
Key techniques:
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM orders)— the innerSELECT COUNT(*) FROM ordersis a subquery that returns the total order count. Dividing by it gives a fraction; multiplying by100.0converts to a percentage. The.0ensures decimal division instead of integer division.ROUND(..., 2)— rounds to 2 decimal places- We'll cover subqueries in more depth in a later article; for now, know that a
SELECTstatement can be embedded inside another one
Summary: Key Takeaways
- Aggregate functions collapse many rows into a single summary value:
COUNT,SUM,AVG,MIN,MAX COUNT(*)counts all rows;COUNT(column)skipsNULLvaluesDISTINCTremoves duplicate values before aggregating or displayingGROUP BYsplits rows into groups, applies the aggregate to each group, and returns one row per group- The
GROUP BYrule: every column inSELECTmust be inGROUP BYor inside an aggregate — PostgreSQL enforces this strictly WHEREfilters rows before grouping;HAVINGfilters groups after grouping- Use
HAVINGfor any condition that involves an aggregate value WHEREandHAVINGcan both appear in the same query — they serve different purposes
What's Next?
You can now summarize data from a single table using aggregate functions and grouping. But ShopFlow's most interesting questions involve multiple tables: "which users placed the most orders?" or "what's the total revenue broken down by product?"
The next step is SQL Joins: Combining Tables — where you'll learn to combine data from multiple tables in a single query using JOIN. The aggregation skills from this article pair directly with joins, letting you ask cross-table questions like "total revenue per user" with a single query.