Data Aggregation Using GROUP BY

Anytime you need to aggregate some data you have to define two things:

  • what are the dimensions
  • what are the facts

First of all, what is a dimension and what is a fact? Let me explain that

Dimension table

Dimension tables group dimensions facts together based on their nature. They contain basic business entities (such as customers, products, salesrooms, employees, etc.). They are mostly represented by a string value.

Fact table 

Fact tables are represented mostly by numbers and are related to a particular business process. Fact tables contain some measurement that might be easily manipulated (sum, count, …). To give you a few examples, fact tables might store information about calls, sales, payments, purchases, etc.

Now, when you know what is the difference between the dimension and fact, let me show you how would a general query look like.

SELECT dimension_1
, dimension_2
, dimension_3
, count(fact_1)
, sum(fact_2)
FROM sample_table
GROUP BY dimension_1
, dimension_2
, dimension_3

As you can see in the example, we only mostly aggregate facts. We can aggregate also dimensions in order to get the number of customers we would need to count them

-- to get unique number of customers who purchased a TV
SELECT count(distinct customers)
FROM purchase
AND id_product = 10 -- TV

You can execute these single row functions against a single report or grouped data (multiple dimensions). As long as you do not have any dimension involved, there is no need to use GROUP BY  (as you can see in the example above).

However, as long as you select a dimension (non-aggregated field) you have to list it down to the GROUP BY section.

Unlike in ORDER BY, you can use neither aliases or number references to a column position in GROUP BY. On the other side, you do not have to put your GROUP BY columns in a SELECT area necessarily (which does not work the other way around – if you have a certain column in your selection (SELECT part) and it is not aggregated, it must be listed in GROUP BY as well).

-- this query will get the count of employees in each department
SELECT count(id_employee) as emp_nbr
FROM employee
GROUP BY id_department


-- this is the exact query except you can see the departments
SELECT id_department
, count(id_employee) as emp_nbr
FROM employee
GROUP BY id_department

id_department | emb_nbr
    1001      |  10
    1010      |  13
    1055      |   5
    1068      |   8 

To be able to filter aggregated results, Oracle prepared a feature that goes hand in hand with GROUP BY. It is called HAVING and you can easily filter aggregated records with this directive. Find more details about it in the article SELECT statement (decomposed).

GROUP BY clause does not have to be used for aggregates only but it is also capable of giving you the unique records from your selection.