Aggregate functions provide us with more tools to combine and analyse our data in meaningful ways. These include finding averages, counting, and summing based on grouping data together.

The functions outlined in this post are known as *aggregate functions* because they work on aggregated data. Make sense so far? Let’s look at some examples to illustrate.

For the purposes of these examples, let’s pretend we’ve created a table called ‘books' to keep track of our reading:

book_id | title | auth_fname | auth_lname | published | pages |
---|---|---|---|---|---|

1 | The Alchemist | Paulo | Coehlo | 1988 | 161 |

2 | Dune | Frank | Herbert | 1966 | 577 |

3 | Deep Work | Cal | Newport | 2016 | 296 |

4 | War God | Graham | Hancock | 2013 | 551 |

5 | 12 Rules For Life | Jordan | Peterson | 2018 | 320 |

6 | Magicians Of The Gods | Graham | Hancock | 2016 | 591 |

*COUNT()* returns the number of rows that match specified criteria. There are 7 entries in the ’books’ table, so the following query will return 7 (i.e. the total number of items in the table):

```
SELECT COUNT(*) FROM books;
Returns:
7
```

Getting more specific, what if we wanted to know how many authors were in our table?

```
SELECT COUNT(auth_fname) FROM books;
Returns:
7
```

Note that *this also returns 7*, even though there are two books by Graham Hancock. As mentioned, COUNT() adds up all instances of the data queried. To find out how many unique authors are in the table, we can use our old friend *DISTINCT()*:

```
SELECT COUNT(DISTINCT auth_fname) FROM sales;
Returns:
6
```

This now produces the desired result, as there are 6 different authors in total.

COUNT() can also be used in combination with operators, such as *LIKE*:

```
SELECT COUNT(*) FROM books WHERE auth_fname LIKE ‘%Cal%’;
Returns:
1
```

*GROUP BY* summarises or aggregates identical data into single rows. Essentially, it will group rows that have the same value into ’summary’ or ’super’ rows.

GROUP BY can be used alongside aggregate functions to group the result-set by one or more columns. It’s a little tricky to explain, so let’s look at some examples.

Suppose we want to find out how many books we have read by each author. We can display *author first name* in one column, the *count* in another, and group by the number of *author last names*:

```
SELECT auth_lname, COUNT(*) FROM books GROUP BY auth_lname;
Returns:
auth_lname COUNT(*)
--------------------
Cialdini 1
Coehlo 1
Hancock 2
Herbert 1
Newport 1
Peterson 1
```

Another example of a query that would make use of this might be ‘Find the number of books published for each published year’:

```
SELECT published, COUNT(*) FROM books GROUP BY published;
Returns:
published COUNT(*)
--------------------
1966 1
1984 1
1988 1
2013 1
2016 2
2018 1
```

Here you can see that 2 books were published in 2016.

*MIN()* & *MAX()* are pretty self explanatory. They will produce the minimum (smallest) or maximum (largest) values.

Let’s say we want to find the publish year of the oldest book:

```
SELECT MIN(published) FROM books;
Returns:
1966
```

Or, perhaps we need to find out the number of pages of the largest book:

```
SELECT MAX(pages) FROM books;
Returns:
591
```

But what if we want to search for the title of the book that will take the longest to read? This is a little trickier, as the following won’t work:

```
SELECT title, MAX(pages) FROM books;
Returns:
title MAX(pages)
----------------------------
The Alchemist 591
```

We know from looking at the original table that this isn’t correct - The Alchemist has just 161 pages. What’s happening here? Well, since there is only one maximum value, SQL will only return one row - the first row. It’s seems a little counterintuitive at first but it makes sense from SQL’s point of view.

To find out which book is definitely the longest, we’ll need to make use of a ‘subquery'. A subquery, is simply a query within another query, wrapped up using brackets. For example:

```
SELECT title, pages FROM books
WHERE pages = (SELECT MAX(pages) FROM books);
Returns:
title pages
------------------------------
Magicians Of The Gods 591
```

The easiest way to think about this is actually to work *backwards*. Let’s break it down:

- In the subquery above, we’re simply selecting the highest number of pages from the table. This results in a value of 591.
- Then, we’re selecting the
*title*and*pages*columns where*pages*is equal to the result from the subquery (591). - As
*Magicians Of The Gods*' page count matches 591, SQL returns that result.

Pretty nifty. HOWEVER…

Using MIN() or MAX() can take up *a lot* of processing power, especially when working with large databases containing thousands of records. They need to query the entire table to figure out which value is the lowest or highest, which isn’t the most efficient way.

Instead, we can use *ORDER BY* and *LIMIT*:

```
SELECT title, pages FROM books
ORDER BY pages
DESC LIMIT 1;
Returns:
title pages
------------------------------
Magicians Of The Gods 591
```

Yep, that returns the same result. Plus, instead of looking for a maximum number, SQL can simply return the first row it finds in descending order of pages. Nice.

Where COUNT() returns the number of rows that match the criteria we pass in, *SUM()* returns the *sum total* of a numeric column.

We can use SUM() to find out the total number of pages from all of the books in the table. This query isn’t very useful in itself, but it helps to illustrate how SUM() works:

```
SELECT SUM(pages) FROM books;
Returns:
2905
```

So, rather than adding up the number of rows, it sums all *pages*.

The *AVG()* function will return the (you guessed it) *average value* of a numeric column. Following on from the last example, let’s calculate the average page count between all of the books:

```
SELECT AVG(pages) FROM books;
Returns:
415
```

You can imagine how useful this functionality would be if you were dealing with book sales, costs, prices etc.

We could even work out the average number of pages in the books by the year they were released:

```
SELECT published, AVG(pages) FROM books
GROUP BY published;
Returns:
published AVG(pages)
---------------------------
1966 577.0000
1984 320.0000
1988 161.0000
2013 551.0000
2016 443.5000
2018 409.0000
```

Note that integers are returned to *4 decimal places* - this is standard in SQL.

There we have it! You may have noticed that the dataset provided is super simple, but hopefully it helped you to understand the core functionality of aggregate functions.

You can see that, when working with large quantities of complex data, these tools would enable you to return useful information and answer complex questions. You'll be the talk of the town.