I recently needed to load several thousand XML files into a SQL database, I created an SSIS package which was created as followed:
Using a foreach container to loop through a directory and load each file path into a variable, the “Import XML” dataflow would then load each XML file into a SQL table.
Running this, it took approximately 1 second to load each file which seemed a massive amount of time to parse the XML and load the data, speaking to my colleague Martin Croft, he suggested the use of T-SQL Bulk Insert and OpenRowset, so we adjusted the package as followed:
The same foreach container was used but instead the following SQL command was executed (this is an expression):
"INSERT INTO MyTable(FileDate) SELECT CAST(bulkcolumn AS XML) FROM OPENROWSET( BULK '" + @[User::CurrentFile] + "', SINGLE_BLOB ) AS x"
Using this method we managed to load approximately 20 records per second, much faster…for data loading!
For what we wanted to achieve this was perfect but I’ll leave you with the following points when making your own decision on which solution you decide to choose!
Openrowset Method
Much faster to get the data into SQL
You’ll need to parse or create a view over the XML data to allow the data to be more usable(another post on this!)
Not able to apply validation/transformation against the data when loading it
The SQL Server service account will need permission to the file
No schema validation when loading files
SSIS
Slower (in our case)
Schema validation
Allows you to apply transformations/joins to the data
Permissions should be less of a problem
Data can be loaded into the final form through the package
When using a schema validation errors can fail the package (I’ll do another post on this)