Statistics are one of the crucial parts of Oracle when it comes to performance. Invalid or obsolete statistics means that the optimizer will choose a bad plan and therefore the query might run pretty slow.
Purpose
Every time the Optimizer is about to compose an execution plan – it is all based on information about tables (heap and partitioned) and indexes. If this information is not correct the execution plan will most probably be incorrect as well.
How does it work
The usual scenario in a Data Warehouse is that the number of rows in tables changes (usually grows). To provide a corresponding execution plan, the Optimizer must know about the changes. Originally, there might be just a couple of rows in a table and Full Table Scan was a better choice even though there was an index. However, in time the number of records will grow and an Index scan might be a faster solution. To keep the Optimizer updated Oracle prepared statistics.
They analyze data in all tables and prepare a corresponding output. If you have ever seen any execution plan, there was a column named ROWS. This is exactly the output from statistics. Statistics are usually updated during the night after the load into the Data Warehouse.
To check your statistics whether they are stale run this command
-- table statistics SELECT owner , table_name , last_analyzed , stale_stats FROM dba_tab_statistics WHERE 1=1 AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated) OR stale_stats = 'YES'; -- column statistics SELECT owner , table_name , column_name , last_analyzed FROM dba_tab_col_statistics WHERE 1=1 AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated) ; -- index statistics SELECT owner , index_name , table_owner , table_name , last_analyzed , stale_stats FROM dba_ind_statistics WHERE 1=1 AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated) OR stale_stats = 'YES'; -- partition statistics SELECT owner , table_name , partition_name , column_name , last_analyzed FROM dba_ind_statistics WHERE 1=1 AND last_analyzed < sysdate - 30 -- older than a month (doesn't necessarily mean outdated) ;
To gather statistics, there is a system package called DBMS_STATS. See some examples of how you can use it.
-- gather whole schema stats BEGIN dbms_stats.gather_schema_stats(ownname => 'MY_SCHEMA_NAME', cascade => TRUE); END; -- gather table stats BEGIN dbms_stats.gather_table_stats(ownname => 'MY_SCHEMA_NAME', tabname => 'TABLE_NAME', cascade => TRUE); END; -- gather index stats BEGIN dbms_stats.gather_index_stats(ownname => 'MY_SCHEMA_NAME', indname => 'INDEX_NAME'); END; -- gather histogram for a particular column BEGIN dbms_stats.gather_table_stats(ownname => 'MY_SCHEMA_NAME', tabname => 'TABLE_NAME', method_opt => ('FOR COLUMN ID_USER SIZE 254'); END;
* CASCADE option enforces indexes to be analyzed as well.
Global Temporary Tables (GTT) and statistics
Since Global Temporary Tables are most of the time empty, statistics will not be accurate during the “after-load” update. It is a good idea to include the manual update of statistics after the load of your GTT so the Optimizer can get a valid input for its decision-making process.