Database Partitioning and Multiple Data Source Considerations
- by Jeffrey McDaniel
With the release of P6 Reporting Database 3.0 partitioning was added as a feature to help with performance and data management. Careful investigation of requirements should be conducting prior to installation to help improve overall performance throughout the lifecycle of the data warehouse, preventing future maintenance that would result in data loss. Before installation try to determine how many data sources and partitions will be required along with the ranges. In P6 Reporting Database 3.0 any adjustments outside of defaults must be made in the scripts and changes will require new ETL runs for each data source.
Considerations:
1. Standard Edition or Enterprise Edition of Oracle Database.
If you aren't using Oracle Enterprise Edition Database; the partitioning feature is not available. Multiple Data sources are only supported on Enterprise Edition of Oracle Database.
2. Number of Data source Ids for partitioning during configuration.
This setting will specify how many partitions will be allocated for tables containing data source information. This setting requires some evaluation prior to installation as there are repercussions if you don't estimate correctly.
For example, if you configured the software for only 2 data sources and the partition setting was set to 2, however along came a 3rd data source. The necessary steps to accommodate this change are as follows:
a) By default, 3 partitions are configured in the Reporting Database scripts. Edit the create_star_tables_part.sql script located in <installation directory>\star\scripts and search for partition. You’ll see P1, P2, P3. Add additional partitions and sub-partitions for P4 and so on. These will appear in several areas. (See P6 Reporting Database 3.0 Installation and Configuration guide for more information on this and how to adjust partition ranges).
b) Run starETL -r. This will recreate each table with the new partition key. The effect of this step is that all tables data will be lost except for history related tables.
c) Run starETL for each of the 3 data sources (with the data source # (starETL.bat "-s2" -as defined in P6 Reporting Database 3.0 Installation and Configuration guide)
The best strategy for this setting is to overestimate based on possible growth. If during implementation it is deemed that there are atleast 2 data sources with possibility for growth, it is a better idea to set this setting to 4 or 5, allowing room for the future and preventing a ‘start over’ scenario.
3. The Number of Partitions and the Number of Months per Partitions are not specific to multi-data source. These settings work in accordance to a sub partition of larger tables with regard to time related data. These settings are dataset specific for optimization. The number of months per partition is self explanatory, optimally the smaller the partition, the better query performance so if the dataset has an extremely large number of spread/history records, a lower number of months is optimal. Working in accordance with this setting is the number of partitions, this will determine how many "buckets" will be created per the number of months setting. For example, if you kept the default for # of partitions of 3, and select 2 months for each partitions you would end up with:
-1st partition, 2 months
-2nd partition, 2 months
-3rd partition, all the remaining records
Therefore with records to this setting, it is important to analyze your source db spread ranges and history settings when determining the proper number of months per partition and number of partitions to optimize performance. Also be aware the DBA will need to monitor when these partition ranges will fill up and when additional partitions will need to be added. If you get to the final range partition and there are no additional range partitions all data will be included into the last partition.