This post describes a tool I developed that compares the data types and data sizes in a cube’s data source view (DSV) with the data types and sizes of the corresponding dimension attributes.
Why is this important? When you create named queries in a cube’s data source view, you often need the SQL CAST or CONVERT operation to change a column’s data type to something more appropriate for SSAS. This is particularly important when your cube is based on an Oracle data source, or when it uses custom SQL queries rather than views in the relational database. The problem with BIDS is that if you change the underlying SQL query, the data size on the dimension attribute does not update automatically. The mismatch then surfaces at deployment: processing the dimension fails because the data in the relational database is wider than the dimension attribute allows.
In particular, if you use string manipulation functions provided by SQL Server or Oracle in your queries, you may find that the 10-character string you expect suddenly turns into an 8,000-character monster. For example, the SQL Server REPLACE function returns a column with a width of 8,000 characters, so if you use it in a named query in your DSV, you get a column width of 8,000 characters. Although the Oracle REPLACE function is far more intelligent, the generated column size could still be way bigger than the maximum length of the data actually in the field.
Now this may not be a problem when prototyping, but in your production cubes you really should clean up this kind of thing, as these massive strings add to processing times and storage space. Similarly, you do not want to forget to change the size of the dimension attribute if your database columns increase in size.
Introducing the CheckCubeDataTypes Utility
The CheckCubeDataTypes application extracts the data types and data sizes of all attributes in the cube and compares them to the data types and data sizes in the cube’s data source view. It then generates a CSV file, which you can open in Excel, containing all this metadata along with a flag indicating whether there is a mismatch between the DSV and the dimension attribute. Note that the app checks not only the attribute keys but also the name and value columns of each attribute. Another benefit of holding the metadata in a CSV text file is that you can place the file under source control. This allows you to compare the metadata of the previous cube release with your new release to highlight problems introduced by new development.
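To make the comparison concrete, here is a minimal Python sketch of the idea, not the actual C# implementation. The class name, field names, CSV column headings and sample rows are all hypothetical, and it assumes the DSV and attribute data types have already been normalised to the same naming. It also assumes the flag is raised when either the type or the size differs.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnBinding:
    """One attribute column (hypothetical shape) and its DSV source column."""
    dimension: str
    attribute: str
    role: str        # "Key", "Name" or "Value"
    attr_type: str   # data type declared on the dimension attribute
    attr_size: int   # data size declared on the dimension attribute
    dsv_type: str    # data type reported by the data source view
    dsv_size: int    # data size reported by the data source view

    @property
    def mismatch(self) -> bool:
        # Assumption: flag the row when either the type or the size disagrees.
        return self.attr_type != self.dsv_type or self.attr_size != self.dsv_size


# Hypothetical column headings; the real tool's CSV layout may differ.
HEADER = ["Dimension", "Attribute", "Role", "AttributeType",
          "AttributeSize", "DsvType", "DsvSize", "Mismatch"]


def write_report(bindings, stream):
    """Write one CSV row per binding, with TRUE/FALSE in the last column."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(HEADER)
    # Sort the rows so the file is stable from one run to the next.
    for b in sorted(bindings, key=lambda b: (b.dimension, b.attribute, b.role)):
        writer.writerow([b.dimension, b.attribute, b.role,
                         b.attr_type, b.attr_size, b.dsv_type, b.dsv_size,
                         "TRUE" if b.mismatch else "FALSE"])


# Made-up sample data: the name column was widened to 8,000 in the DSV.
sample = [
    ColumnBinding("Customer", "Customer Name", "Name", "WChar", 50, "WChar", 8000),
    ColumnBinding("Customer", "Customer Name", "Key", "Integer", 4, "Integer", 4),
]

buffer = io.StringIO()
write_report(sample, buffer)
print(buffer.getvalue())
```

Sorting the rows before writing is a design choice in this sketch: it keeps the file stable between runs, so a plain text diff of two releases under source control shows only genuine metadata changes.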
You can download the C# source code from here: CheckCubeDataTypes.zip
A typical example of the output CSV file is shown below. Note that TRUE in the last column indicates a data size mismatch.