Partitioning comes into play where indexes are not enough. Not only they enhance the performance but also improve the manageability and reduce the cost of queries. Tables, indexes and index-organized tables can be divided (partitioned) into smaller chunks of data to allow users or applications to manage the data easier. The biggest advantage of partitioning is that they act as individual objects and therefore they can be managed either individually or collectively. They have their names and eventually their storage.
It is all interesting but why don’t make all the tables partitioned, you might ask. Well, it is easy to answer. There is no point in partitioning small tables. You have to keep in mind that it consumes additional space and adds a little more agenda on Oracle’s side. Only huge tables with millions of rows are the best fit for partitioning. A general rule of thumbs says that partition should have roughly around 1 to 5 million rows. Historical tables are, for example, great candidates.
In order to be able to maintain a partitioned table properly always enable row movement. You can define this at the table creation or easily alter the existing partition table by executing the command:
ALTER TABLE my_table ENABLE ROW MOVEMENT;
Why? Because if you don’t do that and will try to update the partitioned key (considering the new value is outside of the partition definition – meaning it should be stored in another partition), it will end up with an error ORA-14402. However, if you enable row movement, you can easily update the partition key according to your needs with no bothering about any system errors 🙂
There are multiple different strategies you can use with regard to partitioning:
- Single-Level partitioning
- Hash
- Range (Interval)
- List
- Composite partitioning
- all partitions can be combined with the same methods for subpartitions (with certain limitations 🙂 )
- Reference partitioning
Single-Level partitioning
See the picture below to get a better understanding of how partitioning in Oracle works.
As I said, partitioning divides the table into smaller pieces that you can easily operate with. Imagine the table ORDER having billions of records in total with a couple of million rows monthly. You need to work with last month data (or a couple of last months) only and if the table was not partitioned, it would be very costly to run such queries; Oracle will perform a full table scan (meaning, you have to read all records (billions) to extract only a few millions. On the other hand, if the table is partitioned it is a piece of cake because you read only that particular partition (or multiple partitions) instead of the whole table. P1 represents the first partition where only (let’s say) 4 million records are stored. I bet you will agree that reading 4 million records is much easier than reading several billion records 🙂 As you can see from the picture above, not all partitions must have (and they don’t in reality) even distribution of records; some might have lesser records some more. It’s ok as long as the distribution IS somehow even. Having partitions within the same table once with a couple of hundred records and then with millions is definitely not ok and you should reconsider the partitioning definition.
Hash
Hash partitioning is very handy when you don’t have any definite partition key and still want to have them evenly distributed with roughly the same size. You just define the number of partitions and Oracle, based on his internal algorithm, will hash the partition key and store it in a particular partition (based on the generated hash). It is a great alternative to range partitioning. You can see some examples below.
-- hash partitioning by id_order, the number of partitions is set to 16 (a system generated name will be assigned to each partition) CREATE TABLE order ( id_order NUMBER(4) NOT NULL, id_product NUMBER(6), id_customer NUMBER(6), amt_price INTEGER ) PARTITION BY HASH(id_order) PARTITIONS 16; -- hash partitioning by id_order, the number of partitions is listed with custom names as well as tablespaces where those partitions will reside CREATE TABLE order ( id_order NUMBER(4) NOT NULL, id_product NUMBER(6), id_customer NUMBER(6), amt_price INTEGER ) PARTITION BY HASH(id_order) ( PARTITION p1 TABLESPACE tbs1, PARTITION p2 TABLESPACE tbs2, PARTITION p3 TABLESPACE tbs1, PARTITION p4 TABLESPACE tbs3 );
Range (Interval)
This is the most commonly used strategy. It organizes data using range mechanism which fits best for dates (but numbers might be used as well). All partitions contain keyword VALUES LESS THAN which defines the ranges between partitions. This works the best with an INTERVAL clause which helps you to define auto-creation of new partitions if needed. Without this keyword, you have to define all partitions before you load the data otherwise you will end up with a failure. I would recommend you go always with INTERVAL every time you are about to use RANGE partitioning; you can’t go wrong 🙂 See some examples below to get a clearer picture of how to work with RANGE partitioning.
With INTERVAL, you can specify whether you want to have it on a daily and lower basis (weeks, days, hours, minutes, …) or on a monthly and higher basis (monthly, quarterly, half-yearly, yearly, …).
- NumToYmInterval( x, y )
- Ym stands for Year2Month
- X stands for a number of periods to be stored in the partition
- Y stands for a period (you have only two choices: YEAR and MONTH)
- Examples:
- NumToYmInterval(1, ‘MONTH’) -> monthly partitions
- NumToYmInterval(3,’MONTH’)Â -> quarterly
- NumToYmInterval(4,’MONTH’) -> partitions by 4 months
- NumToYmInterval(1,’YEAR’) -> yearly
- etc …
- NumToDsInterval( x , y)
- Ds stands for Day2Second
- X stands for a number of periods to be stored in the partition
- Y stands for a period (you have only four choices: DAY, HOUR, MINUTE, and SECOND)
- Examples:
- NumToDsInterval(1, ‘DAY’) -> daily
- NumToDsInterval(7, ‘DAY’) -> weekly
- NumToDsInterval(60,’MINUTE’) -> hourly
- NumToDsInterval(1, ‘HOUR’) -> hourly (same as the one above)
- etc
-- range partitioning by number CREATE TABLE order ( id_order NUMBER(4) NOT NULL, id_product NUMBER(6), id_customer NUMBER(6), amt_price INTEGER ) PARTITION BY RANGE(id_product) ( PARTITION p1 VALUES LESS THAN (6), PARTITION p2 VALUES LESS THAN (12), PARTITION p3 VALUES LESS THAN (18), PARTITION p4 VALUES LESS THAN (24), PARTITION p5 VALUES LESS THAN MAXVALUE -- everything above 24 (including) will be stored here in this default partition ); -- range partitioning by date using interval and only one default partition ... the rest will be generated based on the definition (on monthly basis) CREATE TABLE order ( id_order NUMBER(4) NOT NULL, id_product NUMBER(6), id_customer NUMBER(6), amt_price INTEGER, dtime_order DATE ) PARTITION BY RANGE(dtime_order ) INTERVAL (NumToYmInterval(1,'MONTH')) ( PARTITION p_init VALUES LESS THAN (DATE '1900-01-01') ) ENABLE ROW MOVEMENT -- enable the partition key to be moved across partitions /* in case you want to populate the table with data already uncomment this AS SELECT .... .. you query goes here .... */ ;
There might be cases where you just a few historical data and generating the auto-partitioning might not be a good idea.
Let imagine that the distribution per a specific period (monthly in this case) is as follow:
2015/01 | 400 |
2015/02 | 1239 |
2015/03 | 3490 |
2015/04 | 3803 |
2015/05 | 9389 |
2015/06 | 3493 |
2015/07 | 93489 |
2015/08 | 109382 |
2015/09 | 203290 |
2015/10 | 539080 |
2015/11 | 890380 |
2015/12 | 1308302 |
2016/01 | 1302183 |
2016/02 | 2091830 |
2016/03 | 2309480 |
2016/04 | 5484799 |
2016/05 | 3908090 |
2016/06 | 5009230 |
2016/07 | 7028030 |
2016/08 | 7608908 |
To follow the general rule of thumb we need to try to have an even distribution of rows at around 1-5 million records per partition. If we generated partitions dynamically with INTERVAL from the beginning it would be a pure waste. Let’s have a closer look a try to compose it. The year 2015 puts together something above 3 million and the first three months in 2016 nearly 6 million. According to the trend, we can expect that the future records will be around 5 million records per month which we are satisfied with and will have Oracle to create partitions dynamically. See the example below how we can achieve it:
CREATE TABLE order ( id_order NUMBER(4) NOT NULL, id_product NUMBER(6), id_customer NUMBER(6), amt_price INTEGER, dtime_order DATE ) PARTITION BY RANGE(dtime_order ) INTERVAL (NumToYmInterval(1,'MONTH')) ( PARTITION p_2015 VALUES LESS THAN (DATE '2016-01-01'), -- all records less than 2016 will be stored here PARTITION p_2016_Q1 VALUES LESS THAN (DATE '2016-04-01') -- all records within the first three months of 2016 will be stored here -- there is no need to create any other partition definition. Oracle will use the INTERVAL to create monthly partitions if there will be an attempt to insert date bigger than what is defined the previous partition ) ;
With this example, you can easily adjust row distribution according to your needs and optimize the structures.
List
List partitioning is very easy. It’s most commonly used in a case where you have a definite number of values that will act as a partition key. Then, you just have to define the distribution of values across partitions and that’s it 🙂 With this strategy, you have two more wild-cards to be used: DEFAULT and NULL. NULL keyword is used when the partition key is NULL and the DEFAULT when the partition key does not match with already defined partitions. See the example below:
CREATE TABLE order_by_product ( qty INTEGER, amt_total INTEGER, product_desc VARCHAR2(255), product_code VARCHAR2(20) ) PARTITION BY LIST (product_code) ( PARTITION prod_online VALUES ('PA1','PA2','PA3','PA4'), PARTITION prod_promo VALUES ('ACT_01','ACT_O2','ACT_03','ACT_04','ACT_05','ACT_XMAS2016'), PARTITION prod_tele VALUES ('TPA','TPB','TPC','TPC_01','TPC_02','TPD','TPE','TPF'), PARTITION prod_null VALUES (NULL), PARTITION prod_unknown VALUES (DEFAULT) -- all product codes that are not listed above and are NOT NULL will be stored here );
There might be a need for adding or removing values from the definition of partitions. It’s a piece of cake too 🙂
-- adding new values ALTER TABLE order_by_product MODIFY PARTITION prod_online ADD VALUES ('PA5'); -- removing existing values ALTER TABLE order_by_product MODIFY PARTITION prod_promo DROP VALUES ('ACT_02');
Composite partitioning
As a matter of fact, it is exactly the same with just one difference; also partitions are divided into smaller chunks called subpartitions 🙂 See the picture below for better illustration.
Here are the only allowed combinations for composite partitioning. The definition is the same as it is with single-level partitioning.
- Range-Range
- Range-Hash
- Range-List
- List-Range
- List-Hash
- List-List
The general syntax for composite partitioning is:
CREATE TABLE order ( id_order NUMBER(4) NOT NULL, code_product VARCHAR2(50), id_customer NUMBER(6), amt_price INTEGER, dtime_order DATE ) PARTITION BY RANGE(dtime_order ) INTERVAL (NumToYmInterval(1,'MONTH')) SUBPARTITION BY LIST (id_product) SUBPARTITION TEMPLATE ( SUBPARTITION prod_online VALUES ('PA1','PA2','PA3','PA4'), SUBPARTITION prod_promo VALUES ('ACT_01','ACT_O2','ACT_03','ACT_04','ACT_05','ACT_XMAS2016'), SUBPARTITION prod_tele VALUES ('TPA','TPB','TPC','TPC_01','TPC_02','TPD','TPE','TPF'), SUBPARTITION prod_null VALUES (NULL), SUBPARTITION prod_unknown VALUES (DEFAULT) -- all product codes that are not listed above and are NOT NULL will ) ( PARTITION p_init VALUES LESS THAN (DATE '1900-01-01') ) ENABLE ROW MOVEMENT -- enable the partition key to be moved across partitions ;
This will crate RANGE-LIST composite partitioning and automatically creates partitions for newly inserted records based on the definition. You can use any combination allowed to solve your needs 🙂
Reference partitioning
Reference partitioning is a very special approach. It refers to a relationship between two tables where only one is partitioned. The relationship is defined by primary and foreign keys (parent-child relationship). In order to simplify the logic and make it less error-prone, you can store the partition key only in the parent table instead of storing it in both tables. All strategies are available in reference to partitioning (hash, list, range).
Structure without reference partitioning:
CREATE TABLE order ( id_order INTEGER NOT NULL, dtime_order DATE, -- partition key -- rest of data amt_total INTEGER, qty INTEGER ); CREATE TABLE order_item ( id_order INTEGER NOT NULL, dtime_order DATE, -- duplicated partition key -- rest of data id_product INTEGER, code_action VARCHAR2(50) ); -- in order to activate partition pruning in both tables, you have to use both dates in filter (not to mention the maintenance of both tables will be clumsy) SELECT * FROM order o JOIN order_item oi ON oi.id_order = o.id_order AND oi.dtime_order = o.dtime_order WHERE 1=1 AND o.dtime_order BETWEEN trunc(sysdate)-3 AND sysdate
Here is the version with reference partitioning
CREATE TABLE order ( id_order INTEGER NOT NULL, -- primary key dtime_order DATE, -- partition key -- rest of data amt_total INTEGER, qty INTEGER, CONSTRAINT order_pk PRIMARY KEY (id_order) ); CREATE TABLE order_item ( id_order INTEGER NOT NULL, -- foreign key -- rest of data id_product INTEGER, code_action VARCHAR2(50), CONSTRAINT order_item_fk FOREIGN KEY (id_order) REFERRENCES order(id_order) -- ON DELETE [CASCADE, SET NULL, NO ACTION] ); /**** * * You can alternatively set the ON DELETE clause to cascade the action done on a parent key * CASCADE - when you delete a row in a parent table, all rows referencing to that particular row will be deleted in the child table as well * SET NULL - when you delete a row in a parent table, all rows referencing to that particular row will be set to null * NO ACTION - it is set as default when you don't specify other; when you try to delete a row in a parent table, you will be unable to do so until there is at least one record in the child table * ****/ -- then the query might look as follows and the partitioning will be activated for the child table as well (meaning only those id_orders will be selected that are located within the selected period) SELECT * FROM order o JOIN order_item oi ON oi.id_order = o.id_order WHERE 1=1 AND o.dtime_order BETWEEN trunc(sysdate)-3 AND sysdate
Merge Partitions/Subpartitions
You might need to merge partitions in order to optimize them. There is no need to rebuild the table again with just a slight mistake you did while creating the table. However, in 11g you can only merge two consecutive (neighboring) partitions (the preceding must be merged into the following one). On the other hand, 12c is way more mature and allows us to merge multiple partitions into one 🙂
The general syntax is:
- 11g
- ALTER TABLE table_name MERGE PARTITIONS partition_name, partition_name2 INTO PARTITION partition_with_new_name;
- 12c
- ALTER TABLE table_name MERGE PARTITIONS (list_of_partitions; comma separated) INTO PARTITION partition_with_new_name;
- or ALTER TABLE table_name MERGE PARTITIONS p_01 TO p_05 INTO PARTITION partition_with_new_name;
ALTER TABLE order MERGE PARTITIONS p_201501, p_201502 INTO PARTITION p_2015_01_02;
Conclusion
To use partitions properly you have to follow the same logic as with indexes; you have to use the partitioned key as it is defined. Otherwise, partition pruning won’t be activated. To activate partitions and subpartitions as well, you have to use both keys – it’s not enough when you filter the subpartition only. You can use function-based partitioning as well but be careful about the usage – as mentioned above, you have to use the exact same definition! Be very mindful while creating the function-based partitions.
There is one thing you should know about partitions at the end of this article. Once you have a regular heap table, there is no chance you can convert it to a partitioned table. The best way to “partition” it is to recreate it again and, as I mentioned in my previous article about DLL statements, CTAS is the name of the game 🙂 Do NOT (ever!) create an empty partitioned table structure and then migrate the data there.