Subqueries

Subqueries come in handy when it comes to a complex logic. There are many cases where it is easier to tackle the problem using subqueries.

Let me start with an example right away.

/*
  Task: Display all products with price lower than average price of all products
*/

SELECT id
, product_name
FROM products
WHERE product_price < (SELECT avg(product_price) FROM products)

You can use a subquery  in SELECT, FROM, WHERE clause and even as a JOINed result set. See some examples below

-- subquery in SELECT
SELECT product_name
, (SELECT count(*) FROM product) as total_products
FROM product

-- subquery in FROM 
SELECT product_name
, product_price
FROM (SELECT * FROM product WHERE product_type = 'TV')

-- subquery as a JOIN result set
SELECT p.product_name
, p.product_price
, o.orders#
FROM product p
JOIN (SELECT id_product
      , count(*) as orders#
      FROM order
      WHERE status = 'finsihed'
      GROUP BY id_product
     ) o
ON p.id_product = o.id_product

-- subquery in WHERE
SELECT product_name
, product_price
FROM product
WHERE product_name IN (SELECT product_name FROM promo_products)

-- you can even use multiple-level subqueries if needed
SELECT product_name
, (SELECT count(*) 
   FROM product 
    WHERE product_price < (SELECT avg(product_price) 
                           FROM product) 
   ) as under_avg_prod_cnt 
FROM product
WHERE 1=1
AND id_product IN (SELECT id_product
                   FROM (
                          SELECT id_product
                          , count(*) as cnt
                          FROM order
                          WHERE date_order >= trunc(sysdate, 'MM')
                         )
                    ) 

Subqueries are executed first before their parent queries. It is very important to realize which type of subquery we use:

  1. Single-row – will return only one record and thus we can easily use any operator we want
  2. Multi-row – will return two or more records and therefore we have to use set operators (IN, ALL, ANY, NOT IN)
  3. Correlated – these are very inefficient and should be avoided at all costs because they will be executed for every single row and therefore will generate a high cost

Not only they are handy in terms of shorter code but also with regard to the performance. With subqueries, you can restrict selected data and extract only those you need instead of performing Full Table Scan (reading all records in a table).

SELECT e.employee_name
, e.employee_number
, a.date_attendace
, a.clock_in
, a.clock_out
FROM employee e
JOIN (SELECT id_employee
      , date_attedance
      , clock_in
      , clock_out
      FROM attendance
      WHERE date_attendance >= trunc(sysdate)-7
     ) a
ON a.id_employee = e.id_employee

Another great feature for subqueries is called WITH clause. This is especially useful when you have multiple long subqueries and the code is becoming very messy.

--- instead of this
SELECT *
FROM table_1 t1
LEFT JOIN (SELECT 
           FROM table_Y 
           /* multiple lines here */
           WHERE 1=1
           AND ...
           ) sub_A
ON sub_A.id = t1.id
LEFT JOIN (SELECT 
           FROM table_X 
           /* multiple lines here */
           WHERE 1=1
           ) sub_B
ON sub_B.id = t1.id
WHERE 1=1
AND t1.col_name IN (SELECT col_other_name FROM table_2 WHERE parameter = 'XNA')

-- with the WITH clause it's way more readable
WITH w_sub_A AS (SELECT 
           FROM table_Y 
           /* multiple lines here */
           WHERE 1=1
           AND ...
           )
, w_sub_B AS (SELECT 
           FROM table_X 
           /* multiple lines here */
           WHERE 1=1
           )
, w_t2 (SELECT t2.col_div / t3.col_data as col_other_name 
        FROM table_2  t2
        LEFT JOIN table_3 t3
         ON t2.id = t3.id
         AND t2.date_time = t3.date_time
         WHERE t2.parameter = 'XNA')

--- instead of this
SELECT *
FROM table_1 t1
LEFT JOIN w_sub_A
ON sub_A.id = t1.id
LEFT JOIN w_sub_B
ON sub_B.id = t1.id
WHERE 1=1
AND t1.col_name IN (SELECT col_other_name FROM w_t2)

The final query looks here very friendly and even the rest is very well organized. We can work with that as with separate data subsets. Another advantage of using WITH clauses is that you can easily reuse the same data set as many times as you want and in case of any change in your code, you will need to change it only once at one place and thus you will not most likely make a mistake.

WITH w_reuse AS (
 SELECT *
 FROM multi_src
 )

SELECT *
FROM w_reuse a
LEFT JOIN w_reuse b
ON a.id = b.id
LEFT JOIN w_reuse c
ON c.id2 = b.id2
WHERE 1=1

There is one thing you have to be cautious about and it is when using big tables (big result sets) in a WITH clause and you are going to use it multiple times, you have to use MATERIALIZE hint. You can find more in my article about hints.

Subqueries are very powerful and may help you with your tasks but this door swings both ways. It means it could be a killer for your performance when incorrectly written.