Loading XML file containing leading zeros with SSIS preserving the zeros
- by Compudicted
Visiting the MSDN SQL Server Integration Services Forum oftentimes I could see that people would pop up asking this question: “why I am not able to load an element from an XML file that contains zeros so the leading/trailing zeros would remain intact?”. I started to suspect that such a trivial and often-required operation perhaps is being misunderstood by the developer community. I would also like to add that the whole state of affairs surrounding the XML today is probably also going to be increasingly affected by a motion of people who dislike XML in general and many aspects of it as XSD and XSLT invoke a negative reaction at best. Nevertheless, XML is in wide use today and its importance as a bridge between diverse systems is ever increasing. Therefore, I deiced to write up an example of loading an arbitrary XML file that contains leading zeros in one of its elements using SSIS so the leading zeros would be preserved keeping in mind the goal on simplicity into a table in SQL Server database. To start off bring up your BIDS (running as admin) and add a new Data Flow Task (DFT). This DFT will serve as container to adding our XML processing elements (besides, the XML Source is not available anywhere else other than from within the DFT). Double-click your DFT and drag and drop the XMS Source component from the Tool Box’s Data Flow Sources. Now, let the fun begin! Being inspired by the upcoming Christmas I created a simple XML file with one set of data that contains an imaginary SSN number of Rudolph containing several leading zeros like 0000003. This file can be viewed here. To configure the XML Source of course it is quite intuitive to point it to our XML file, next what the XML source needs is either an embedded schema (XSD) or it can generate one for us. In lack of the one I opted to auto-generate it for me and I ended up with an XSD that looked like: <?xml version="1.0"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="XMasEvent">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="CaseInfo">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="ID" type="xs:unsignedByte" />
<xs:element minOccurs="0" name="CreatedDate" type="xs:unsignedInt" />
<xs:element minOccurs="0" name="LastName" type="xs:string" />
<xs:element minOccurs="0" name="FirstName" type="xs:string" />
<xs:element minOccurs="0" name="SSN" type="xs:unsignedByte" /> <!-- Becomes string -- >
<xs:element minOccurs="0" name="DOB" type="xs:unsignedInt" />
<xs:element minOccurs="0" name="Event" type="xs:string" />
<xs:element minOccurs="0" name="ClosedDate" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
As an aside on the XML file: if your XML file does not contain the outer node (<XMasEvent>) then you may end up in a situation where you see just one field in the output.
Now please note that the SSN element’s data type was chosen to be of unsignedByte (and this is for a reason). The reason is stemming from the fact all our figures in the element are digits, this is good, but this is not exactly what we need, because if we will attempt to load the data with this XSD then we are going to either get errors on the destination or most typically lose the leading zeros. So the next intuitive choice is to change the data type to string. Besides, if a SSIS package was already created based on this XSD and the data type change was done thereafter, one should re-set the metadata by right-clicking the XML Source and choosing “Advanced Editor” in which there is a refresh button at the bottom left which will do the trick.
So far so good, we are ready to load our XML file, well actually yes, and no, in my experience typically some data conversion may be required. So depending on your data destination you may need to tweak the data types targeted. Let’s add a Data Conversion Task to our DFT.
Your package should look like:
To make the story short I only will cover the SSN field, so in my data source the target SQL Table has it as nchar(10) and we chose string in our XSD (yes, this is a big difference), under such circumstances the SSIS will complain. So will go and manipulate on the data type of SSN by making it Unicode String (DT_WSTR), World String per se.
The conversion should look like:
The peek at the Metadata:
We are almost there, now all we need is to configure the destination. For simplicity I chose SQL Server Destination. The mapping is a breeze, F5 and I am able to insert my data into SQL Server now!
Checking the zeros – they are all intact!