At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved.
What is an OLAP data mart?
In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically create and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools.
Why build OLAP data marts?
Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel.
Extension Attributes: The Challenge
One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes.
There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem of the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of future blog entry.
What is involved in building an OLAP data mart?
There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes).
Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements.
So, in broad terms the steps required to fulfil a client order are as follows:
Step 1: Prepare metadata
Create a set of database names unique to the client's order
Modify all package connection strings to be used by SSIS to point to new databases and file locations.
Step 2: Create relational databases
Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything
Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases
Step 3: Load staging database
Use SSIS to load all data files into the staging database in a parallel operation
Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube.
Step 4: Load data mart relational database
Load the data from staging into the data mart relational database, again in parallel where possible
Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables
Step 5: Load extension tables & attributes
Pivot the extension attributes from their native name/value pairs into proper relational tables
Add the extension attributes to the views used by OLAP cube
Step 6: Deploy & Process OLAP cube
Deploy the OLAP database directly to the server using a C# script task in SSIS
Modify the connection string used by the OLAP cube to point to the data mart relational database
Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions
Remove any standard attributes that not required
Process the OLAP cube
Step 7: Backup and drop databases
Drop staging database as it is no longer required
Backup data mart relational and OLAP database and ship these to the client's infrastructure
Drop data mart relational and OLAP database from the build server
Mark order complete
Start processing the next order, ad infinitum.
So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.