One of the biggest challenges for content management solutions is related to
the storage management due the high volumes of the unstoppable growing of
information.
Even if you have storage appliances and a lot of terabytes, thinks like
backup, compression, deduplication, storage relocation, encryption, availability
could be a nightmare.
One standard option that you have with the Oracle WebCenter Content is to
store data to the database. And the Oracle Database allows you leverage features
like compression, deduplication, encryption and seamless backup.
But with a huge volume, the challenge is passed to the DBA to keep the
WebCenter Content Database up and running.
One solution is the use of DB partitions for your content repository, but
what are the implications of this? Can I fit this with my business requirements?
Well, yes. It’s up to you how you will manage that, you just need a good
plan. During you “storage brainstorm plan” take in your mind what you need, such
as storage petabytes of documents? You need everything on-line? There’s a way to
logically separate the “good content” from the “legacy content”?
The first thing that comes to my mind is to use the creation date of the
document, but you need to remember that this document could receive a lot of
revisions and maybe you can consider the revision creation date. Your plan can
have also complex rules like per Document Type or per a custom metadata like
department or an hybrid per date, per DocType and an specific virtual folder.
Extrapolation the use, you can have your repository distributed in different
servers, different disks, different disk types (Such as ssds, sas, sata,
tape,…), separated accordingly your business requirements, separating the “hot”
content from the legacy and easily matching your compliance requirements.
If you think to use by revision, the simple way is to consider the dId, that
is the sequential unique id for every content created using the WebCenter
Content or the dLastModified that is the date field of the FileStorage table
that contains the date of inclusion of the content to the DB Table using
SecureFiles.
Using the scenario of partitioned repository using an hierarchical separation
by date, we will transform the FileStorage table in an partitioned table using
“Partition by Range” of the dLastModified column (You can use the dId or a join
with other tables for other metadata such as dDocType, Security, etc…).
The test scenario bellow covers:
Previous existent data on the JDBC Storage to be migrated to the new
partitioned JDBC Storage
Partition by Date
Automatically generation of new partitions based on a pre-defined interval
(Available only with Oracle Database 11g+)
Deduplication and Compression for legacy data
Oracle WebCenter Content 11g PS5 (Could present some
customizations that do not affect the test scenario)
For the test case you need some data stored using JDBC Storage to be the
“legacy” data. If you do not have done before, just create an Storage rule
pointed to the JDBC Storage:
Enable the metadata StorageRule in the UI and upload some documents using
this rule.
For this test case you can run using the schema owner or an dba user. We will
use the schema owner TESTS_OCS.
I can’t forgot to tell that this is just a test and you should do a proper
backup of your environment.
When you use the schema owner, you need some privileges, using the dba user
grant the privileges needed:
REM Grant privileges required for online redefinition.
GRANT EXECUTE ON DBMS_REDEFINITION TO TESTS_OCS;
GRANT ALTER ANY TABLE TO TESTS_OCS;
GRANT DROP ANY TABLE TO TESTS_OCS;
GRANT LOCK ANY TABLE TO TESTS_OCS;
GRANT CREATE ANY TABLE TO TESTS_OCS;
GRANT SELECT ANY TABLE TO TESTS_OCS;
REM Privileges required to perform cloning of dependent objects.
GRANT CREATE ANY TRIGGER TO TESTS_OCS;
GRANT CREATE ANY INDEX TO TESTS_OCS;
In our test scenario we will separate the content as Legacy, Day1, Day2, Day3
and Future. This last one will partitioned automatically using 3 tablespaces in
a round robin mode. In a real scenario the partition rule could be per month,
per year or any rule that you choose.
Table spaces for the test scenario:
CREATE TABLESPACE TESTS_OCS_PART_LEGACY DATAFILE 'tests_ocs_part_legacy.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_DAY1 DATAFILE 'tests_ocs_part_day1.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_DAY2 DATAFILE 'tests_ocs_part_day2.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_DAY3 DATAFILE 'tests_ocs_part_day3.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_A 'tests_ocs_part_round_robin_a.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_B 'tests_ocs_part_round_robin_b.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_C 'tests_ocs_part_round_robin_c.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
Before start, gather optimizer statistics on the actual FileStorage table:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FileStorage', cascade => TRUE);
Now check if is possible execute the redefinition process:
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('TESTS_OCS', 'FileStorage',DBMS_REDEFINITION.CONS_USE_PK);
If no errors messages, you are good to go.
Create a Partitioned Interim FileStorage table.
You need to create a new table with the partition information to act as an
interim table:
CREATE TABLE FILESTORAGE_Part
(
DID NUMBER(*,0) NOT NULL ENABLE,
DRENDITIONID VARCHAR2(30 CHAR) NOT NULL ENABLE,
DLASTMODIFIED TIMESTAMP (6),
DFILESIZE NUMBER(*,0),
DISDELETED VARCHAR2(1 CHAR),
BFILEDATA BLOB
)
LOB (BFILEDATA) STORE AS SECUREFILE
(
ENABLE STORAGE IN ROW
NOCACHE LOGGING
KEEP_DUPLICATES
NOCOMPRESS
)
PARTITION BY RANGE (DLASTMODIFIED)
INTERVAL (NUMTODSINTERVAL(1,'DAY'))
STORE IN (TESTS_OCS_PART_ROUND_ROBIN_A, TESTS_OCS_PART_ROUND_ROBIN_B, TESTS_OCS_PART_ROUND_ROBIN_C)
(
PARTITION FILESTORAGE_PART_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_LEGACY
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_LEGACY
RETENTION NONE
DEDUPLICATE
COMPRESS HIGH
),
PARTITION FILESTORAGE_PART_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_DAY1
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_DAY1
RETENTION AUTO
KEEP_DUPLICATES
COMPRESS
),
PARTITION FILESTORAGE_PART_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_DAY2
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_DAY2
RETENTION AUTO
KEEP_DUPLICATES
NOCOMPRESS
),
PARTITION FILESTORAGE_PART_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_DAY3
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_DAY3
RETENTION AUTO
KEEP_DUPLICATES
NOCOMPRESS
)
);
After the creation you should see your partitions defined.
Note that only the fixed range partitions have been created, none of the
interval partition have been created.
Start the redefinition process:
BEGIN
DBMS_REDEFINITION.START_REDEF_TABLE(
uname => 'TESTS_OCS'
,orig_table => 'FileStorage'
,int_table => 'FileStorage_PART'
,col_mapping => NULL
,options_flag => DBMS_REDEFINITION.CONS_USE_PK
);
END;
This operation can take some time to complete, depending how many contents
that you have and on the size of the table.
Using the DBA user you can check the progress with this command:
SELECT * FROM v$sesstat WHERE sid = 1;
Copy dependent objects:
DECLARE
redefinition_errors PLS_INTEGER := 0;
BEGIN
DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS(
uname => 'TESTS_OCS'
,orig_table => 'FileStorage'
,int_table => 'FileStorage_PART'
,copy_indexes => DBMS_REDEFINITION.CONS_ORIG_PARAMS
,copy_triggers => TRUE
,copy_constraints => TRUE
,copy_privileges => TRUE
,ignore_errors => TRUE
,num_errors => redefinition_errors
,copy_statistics => FALSE
,copy_mvlog => FALSE
);
IF (redefinition_errors > 0) THEN
DBMS_OUTPUT.PUT_LINE('>>> FileStorage to FileStorage_PART temp copy Errors: ' || TO_CHAR(redefinition_errors));
END IF;
END;
With the DBA user, verify that there's no errors:
SELECT object_name, base_table_name, ddl_txt FROM DBA_REDEFINITION_ERRORS;
*Note that will show 2 lines related to the constrains, this is expected.
Synchronize the interim table FileStorage_PART:
BEGIN
DBMS_REDEFINITION.SYNC_INTERIM_TABLE(
uname => 'TESTS_OCS',
orig_table => 'FileStorage',
int_table => 'FileStorage_PART');
END;
Gather statistics on the new table:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FileStorage_PART', cascade => TRUE);
Complete the redefinition:
BEGIN
DBMS_REDEFINITION.FINISH_REDEF_TABLE(
uname => 'TESTS_OCS',
orig_table => 'FileStorage',
int_table => 'FileStorage_PART');
END;
During the execution the FileStorage table is locked in exclusive mode until
finish the operation.
After the last command the FileStorage table is partitioned.
If you have contents out of the range partition, you should see the new
partitions created automatically, not generating an error if you “forgot” to
create all the future ranges. You will see something like:
You now can drop the FileStorage_PART table:
border-bottom-width: 1px; border-bottom-style: solid; text-align: left; border-left-color: silver; border-left-width: 1px; border-left-style: solid; padding-bottom: 4px; line-height: 12pt; background-color: #f4f4f4; margin-top: 20px; margin-right: 0px; margin-bottom: 10px; margin-left: 0px; padding-left: 4px; width: 97.5%; padding-right: 4px; font-family: 'Courier New', Courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow-x: auto; overflow-y: auto; border-top-color: silver; border-top-width: 1px; border-top-style: solid; cursor: text; border-right-color: silver; border-right-width: 1px; border-right-style: solid; padding-top: 4px; " id="codeSnippetWrapper">
DROP TABLE FileStorage_PART PURGE;
To check the FileStorage table is valid and is partitioned, use the
command:
SELECT num_rows,partitioned
FROM user_tables
WHERE table_name = 'FILESTORAGE';
You can list the contents of the FileStorage table in a specific partition,
per example:
SELECT *
FROM FileStorage
PARTITION (FILESTORAGE_PART_LEGACY)
Some useful commands that you can use to check the partitions, note that you
need to run using a DBA user:
SELECT *
FROM DBA_TAB_PARTITIONS
WHERE table_name = 'FILESTORAGE';
SELECT *
FROM DBA_TABLESPACES
WHERE tablespace_name like 'TESTS_OCS%';
After the redefinition process complete you have a new FileStorage table
storing all content that has the Storage rule pointed to the JDBC Storage and
partitioned using the rule set during the creation of the temporary interim
FileStorage_PART table.
At this point you can test the WebCenter Content downloading the documents
(Original and Renditions). Note that the content could be already in the cache
area, take a look in the weblayout directory to see if a file with the same id
is there, then click on the web rendition of your test file and see if have
created the file and you can open, this means that is all working.
The redefinition process can be repeated many times, this allow you test what
the better layout, over and over again.
Now some interesting maintenance actions related to the partitions:
Make an tablespace read only.
No issues viewing, the WebCenter Content do not alter the revisions
When try to delete an content that is part of an read only tablespace, an
error will occurs and the document will not be deleted
The only way to prevent errors today is creating an custom component that
checks the partitions and if you have an document in an “Read Only” repository,
execute the deletion process of the metadata and mark the document to be deleted
on the next db maintenance, like a new redefinition.
Take an tablespace off-line for archiving purposes or any other reason.
When you try open an document that is included in this tablespace will
receive an error that was unable to retrieve the content, but the others online
tablespaces are not affected.
Same behavior when deleting documents.
Again, an custom component is the solution. If you have an document “out of
range”, the component can show an message that the repository for that document
is offline. This can be extended to a option to the user to request to put
online again.
Moving some legacy content to an offline repository (table) using the
Exchange option to move the content from one partition to a empty
nonpartitioned table like FileStorage_LEGACY. Note that this option will remove
the registers from the FileStorage and will not be able to open the stored
content. You always need to keep in mind the indexes and constrains.
An redefinition separating the original content (vault) from the renditions
and separate by date ate the same time. This could be an option for DAM
environments that want to have an special place for the renditions and put the
original files in a storage with less performance.
The process will be the same, you just need to change the script of
the interim table to use composite partitioning. Will be something like:
CREATE TABLE FILESTORAGE_RenditionPart
(
DID NUMBER(*,0) NOT NULL ENABLE,
DRENDITIONID VARCHAR2(30 CHAR) NOT NULL ENABLE,
DLASTMODIFIED TIMESTAMP (6),
DFILESIZE NUMBER(*,0),
DISDELETED VARCHAR2(1 CHAR),
BFILEDATA BLOB
)
LOB (BFILEDATA) STORE AS SECUREFILE
(
ENABLE STORAGE IN ROW
NOCACHE LOGGING
KEEP_DUPLICATES
NOCOMPRESS
)
PARTITION BY LIST (DRENDITIONID)
SUBPARTITION BY RANGE (DLASTMODIFIED)
(
PARTITION Vault VALUES ('primaryFile')
( SUBPARTITION FILESTORAGE_VAULT_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_FUTURE VALUES LESS THAN (MAXVALUE)
)
,PARTITION WebLayout VALUES ('webViewableFile')
( SUBPARTITION FILESTORAGE_WEBLAYOUT_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_FUTURE VALUES LESS THAN (MAXVALUE)
)
,PARTITION Special VALUES ('Special')
( SUBPARTITION FILESTORAGE_SPECIAL_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_FUTURE VALUES LESS THAN (MAXVALUE)
)
)ENABLE ROW MOVEMENT;
The next post related to partitioned repository will come with an sample
component to handle the possible exceptions when you need to take off line an
tablespace/partition or move to another place.
Also, we can include some integration to the Retention Management and Records
Management.
Another subject related to partitioning is the ability to create an FileStore
Provider pointed to a different database, raising the level of the distributed
storage vs. performance.
Let us know if this is important to you or you have an use case not listed,
leave a comment.
Cross-posted on the blog.ContentrA.com