Working with packed dates in SSIS
- by Jim Giercyk
One of the challenges recently thrown my way was to read an EBCDIC flat file, decode packed dates, and insert the dates into a SQL table. For those unfamiliar with packed data, it is a way to store data at the nibble level (half a byte), and was often used by mainframe programmers to conserve storage space. In the case of my input file, the dates were 2 bytes long and represented the number of days that have past since 01/01/1950. My first thought was, in the words of Scooby, Hmmmmph? But, I love a good challenge, so I dove in. Reading in the flat file was rather simple. The only difference between reading an EBCDIC and an ASCII file is the Code Page option in the connection manager. In my case, I needed to use Code Page 1140 for EBCDIC (I could have also used Code Page 37). Once the code page is set correctly, SSIS can understand what it is reading and it will convert the output to the default code page, 1252. However, packed data is either unreadable or produces non-alphabetic characters, as we can see in the preview window. Column 1 is actually the packed date, columns 0 and 2 are the values in the rest of the file. We are only interested in Column 1, which is a 2 byte field representing a packed date. We know that 2 bytes of packed data can be stored in 1 byte of character data, so we are working with 4 packed digits in 2 character bytes. If you are confused, stay tuned….this will make sense in a minute. Right-click on your Flat File Source shape and select “Show Advanced Editor”. Here is where the magic begins. By changing the properties of the output columns, we can access the packed digits from each byte. By default, the Output Column data type is DT_STR. Since we want to look at the bytes individually and not the entire string, change the data type to DT_BYTES. Next, and most important, set UseBinaryFormat to TRUE. This will write the HEX VALUES of the output string instead of writing the character values. Now we are getting somewhere! Next, you will need to use a Data Conversion shape in your Data Flow to transform the 2 position byte stream to a 4 position Unicode string containing the packed data. You need the string to be 4 bytes long because it will contain the 4 packed digits. Here is what that should look like in the Data Conversion shape: Direct the output of your data flow to a test table or file to see the results. In my case, I created a test table. The results looked like this: Hold on a second! That doesn't look like a date at all. No, of course not. It is a hex number which represents the days which have passed between 01/01/1950 and the date. We have to convert the Hex value to a decimal value, and use the DATEADD function to get a date value. Luckily, I have created a function to convert Hex to Decimal: -- ============================================= -- Author: Jim Giercyk -- Create date: March, 2012 -- Description: Converts a Hex string to a decimal value -- ============================================= CREATE FUNCTION [dbo].[ftn_HexToDec] ( @hexValue NVARCHAR(6) ) RETURNS DECIMAL AS BEGIN -- Declare the return variable here DECLARE @decValue DECIMAL IF @hexValue LIKE '0x%' SET @hexValue = SUBSTRING(@hexValue,3,4) DECLARE @decTab TABLE ( decPos1 VARCHAR(2), decPos2 VARCHAR(2), decPos3 VARCHAR(2), decPos4 VARCHAR(2) ) DECLARE @pos1 VARCHAR(1) = SUBSTRING(@hexValue,1,1) DECLARE @pos2 VARCHAR(1) = SUBSTRING(@hexValue,2,1) DECLARE @pos3 VARCHAR(1) = SUBSTRING(@hexValue,3,1) DECLARE @pos4 VARCHAR(1) = SUBSTRING(@hexValue,4,1) INSERT @decTab VALUES (CASE WHEN @pos1 = 'A' THEN '10' WHEN @pos1 = 'B' THEN '11' WHEN @pos1 = 'C' THEN '12' WHEN @pos1 = 'D' THEN '13' WHEN @pos1 = 'E' THEN '14' WHEN @pos1 = 'F' THEN '15' ELSE @pos1 END, CASE WHEN @pos2 = 'A' THEN '10' WHEN @pos2 = 'B' THEN '11' WHEN @pos2 = 'C' THEN '12' WHEN @pos2 = 'D' THEN '13' WHEN @pos2 = 'E' THEN '14' WHEN @pos2 = 'F' THEN '15' ELSE @pos2 END, CASE WHEN @pos3 = 'A' THEN '10' WHEN @pos3 = 'B' THEN '11' WHEN @pos3 = 'C' THEN '12' WHEN @pos3 = 'D' THEN '13' WHEN @pos3 = 'E' THEN '14' WHEN @pos3 = 'F' THEN '15' ELSE @pos3 END, CASE WHEN @pos4 = 'A' THEN '10' WHEN @pos4 = 'B' THEN '11' WHEN @pos4 = 'C' THEN '12' WHEN @pos4 = 'D' THEN '13' WHEN @pos4 = 'E' THEN '14' WHEN @pos4 = 'F' THEN '15' ELSE @pos4 END) SET @decValue = (CONVERT(INT,(SELECT decPos4 FROM @decTab))) + (CONVERT(INT,(SELECT decPos3 FROM @decTab))*16) + (CONVERT(INT,(SELECT decPos2 FROM @decTab))*(16*16)) + (CONVERT(INT,(SELECT decPos1 FROM @decTab))*(16*16*16)) RETURN @decValue END GO Making use of the function, I found the decimal conversion, added that number of days to 01/01/1950 and FINALLY arrived at my “unpacked relative date”. Here is the query I used to retrieve the formatted date, and the result set which was returned: SELECT [packedDate] AS 'Hex Value', dbo.ftn_HexToDec([packedDate]) AS 'Decimal Value', CONVERT(DATE,DATEADD(day,dbo.ftn_HexToDec([packedDate]),'01/01/1950'),101) AS 'Relative String Date' FROM [dbo].[Output Table] This technique can be used any time you need to retrieve the hex value of a character string in SSIS. The date example may be a bit difficult to understand at first, but with SSIS becoming the preferred tool for enterprise level integration for many companies, there is no doubt that developers will encounter these types of requirements with regularity in the future. Please feel free to contact me if you have any questions.