Statistics

Statistics are one of the crucial parts of Oracle when it comes to performance. Invalid or obsolete statistics means that the optimizer will choose a bad plan and therefore the query might run pretty slow.

Purpose

Every time the Optimizer is about to compose an execution plan – it is all based on information about tables (heap and partitioned) and indexes. If this information is not correct the execution plan will most probably be incorrect as well.

How does it work

The usual scenario in a Data Warehouse is that the number of rows in tables changes (usually grows). To provide a corresponding execution plan, the Optimizer must know about the changes. Originally, there might be just a couple of rows in a table and Full Table Scan was a better choice even though there was an index. However, in time the number of records will grow and an Index scan might be a faster solution. To keep the Optimizer updated Oracle prepared statistics.

They analyze data in all tables and prepare a corresponding output. If you have ever seen any execution plan, there was a column named ROWS. This is exactly the output from statistics.  Statistics are usually updated during the night after the load into the Data Warehouse.

To check your statistics whether they are stale run this command

-- table statistics
SELECT owner
, table_name
, last_analyzed
, stale_stats
FROM dba_tab_statistics
WHERE 1=1
AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated)
OR stale_stats = 'YES';

-- column statistics
SELECT owner
, table_name
, column_name
, last_analyzed
FROM dba_tab_col_statistics
WHERE 1=1
AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated)
;


-- index statistics
SELECT owner
, index_name
, table_owner
, table_name
, last_analyzed
, stale_stats
FROM dba_ind_statistics
WHERE 1=1
AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated)
OR stale_stats = 'YES';


-- partition statistics
SELECT owner
, table_name
, partition_name
, column_name
, last_analyzed
FROM dba_ind_statistics
WHERE 1=1
AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated)
;

To gather statistics, there is a system package called DBMS_STATS. See some examples of how you can use it.

-- gather whole schema stats
BEGIN 
  dbms_stats.gather_schema_stats(ownname => 'MY_SCHEMA_NAME', 
                                 cascade => TRUE);
END;

-- gather table stats
BEGIN
  dbms_stats.gather_table_stats(ownname => 'MY_SCHEMA_NAME',
                                tabname => 'TABLE_NAME',
                                cascade => TRUE);
END;

-- gather index stats
BEGIN
   dbms_stats.gather_index_stats(ownname => 'MY_SCHEMA_NAME',
                                 indname => 'INDEX_NAME');
END;

-- gather histogram for a particular column
BEGIN
   dbms_stats.gather_table_stats(ownname => 'MY_SCHEMA_NAME',
                                 tabname => 'TABLE_NAME', 
                                 method_opt => ('FOR COLUMN ID_USER SIZE 254');
END;

* CASCADE option enforces indexes to be analyzed as well.

Global Temporary Tables (GTT) and statistics

Since Global Temporary Tables are most of the time empty, statistics will not be accurate during the “after-load” update. It is a good idea to include the manual update of statistics after the load of your GTT so the Optimizer can get a valid input for its decision-making process.