Connecting to DB2 from SSIS
- by Christopher House
The project I'm currently working on involves moving various pieces of data from a legacy DB2 environment to some SQL Server and flat file locations. Most of the data flows are real time, so they were a natural fit for the client's MQSeries on their iSeries servers and BizTalk to handle the messaging. Some of the data flows, however, are daily batch type transmissions. For the daily batch transmissions, it was decided that we'd use SSIS to pull the data direct from DB2 to either a SQL Server or flat file. I'm not at all an SSIS guy, I've done a bit here and there, but mainly for situations were we needed to move data from a dev environment to QA, mostly informal stuff like that. And, as much as I'm not an SSIS guy, I'm even less a DB2/iSeries guy. Prior to this engagement, my knowledge of DB2 was limited to the fact that it's an IBM product and that it was probably a DBMS flatform (that's what the DB in DB2 means, right?).
One of my first goals when I came onto this project was to develop of POC SSIS package to pull some data from DB2 and dump it to a flat file. It sounded like a pretty straight forward task. As always, the devil is in the details. Configuring the DB2 connection manager took a bit of trial and error. As such, I thought I'd post my experiences here in hopes that they might save someone the efforts I went through. That being said, please keep in mind, as I pointed out, I'm not at all a DB2 guy, so my terminology and explanations may not be 100% spot on.
Before you get started, you need to figure out how you're going to connect to DB2. From the research I did, it looks like there are a few options. IBM has both an OLE DB and .Net data provider which can be found here. I installed their client access tools and tried to use both the .Net and OLE DB providers but I received an error message from both when attempting to connect to the iSeries that indicated I needed a license for a product called DB2 Connect. I inquired with one of my client's iSeries resources about a license for this product and it appears they didn't have one, so that meant the IBM drivers were out. The other option that I found quite a bit of discussion around was Microsoft's OLE DB Provider for DB2. This driver is part of the feature pack for SQL Server 2008 Enterprise Edition and can be downloaded here.
As it turns out, I already had Microsoft's driver installed on my dev VM, which stuck me as odd since I hadn't installed it. I discovered that the driver is installed with the BizTalk adapter pack for host systems, which was also installed on my VM. However, it looks like the version used by the adapter pack is newer than the version provided in the SQL Server feature pack.
Once you get the driver installed, create a connection manager in your package just like you normally would and select the Microsoft OLE DB Provider for DB2 from the list of available drivers.
After you select the driver, you'll need to enter in your host name, login credentials and initial catalog.
A couple of things to note here. First, the Initial catalog needs to be the same as your host name. Not sure why that is, but trust me, it just does. Second, for credentials, in my environment, we're using what the client's iSeries people refer to as "profiles". I guess this is similar to SQL auth in the SQL Server world. In other words, they've given me a username and password for connecting to DB, so I've entered it here.
Next, click the Data Links button. On the Data Links screen, enter your package collection on the first tab.
Package collection is one of those DB2 concepts I'm still trying to figure out. From the little bit I've read, packages are used to control SQL compilation and each DB2 connection needs one. The package collection, I believe, controls where your package is created. One of the iSeries folks I've been working with told me that I should always use QGPL for my package collection, as QGPL is "general purpose" and doesn't require any additional authority.
Next click the ellipsis next to the Network drop-down. Here you'll want to enter your host name again.
Again, not sure why you need to do this, but trust me, my connection wouldn't work until I entered my hostname here.
Finally, go to the Advanced tab, select your DBMS platform and check Process binary as character.
My environment is DB2 on the iSeries and iSeries is the replacement for AS/400, so I selected DB2/AS400 for my platform. Process binary as character was necessary to handle some of the DB2 data types. I had a few columns that showed all their data as "System.Byte[]". Checking Process binary as character resolved this.
At this point, you should be good to go. You can go back to the Connection tab on the Data Links dialog to perform a couple of tests to validate your configuration. The Test Connection button is obvious, this just verifies you can connect to the host using the configuration data you've entered. The Packages button will attempt to connect to the host and create the packages required to execute queries.
This isn't meant to be a comprehensive look SSIS and DB2, these are just some of the notes I've come up with since I've started working with DB2 and SSIS. I'm sure as I continue developing my packages, I'll find more quirks and will post them here.