Search Results

Search found 97 results on 4 pages for 'sqlis'.

Page 1/4 | 1 2 3 4 | Next Page >

Welcome to the new SQLIS site

If you're reading this and have visited us before, then you will probably have noticed we have released a new site. We have migrated all the content, and old links will continue to work for now. We have also updated all of our tasks and components for SQL Server 2008. See the Component Downloads category for a full list. I hope it all looks good and works fine, but if you have any issues or problems them please let us know.

Read the article
Change Data Capture Webinar

I am going to be doing a webinar with our friends at Attunity on Change Data Capture. Attunity have a good story around this technology and you can use it in your SSIS loads to great effect. Join Attunity and Konesans/SQLIS for a Webinar on 17 September Space is limited. Reserve your Webinar seat now at: https://www1.gotomeeting.com/register/693735512 Want increased efficiency and real-time speed when conducting ETL loads? Need lower implementation costs while minimizing system impact? Learn how change data capture (CDC) technologies can reduce ETL load times. Allan Mitchell, Principal Consultant at Konesans and SQLServer MVP specialising in ETL, will explain CDC concepts and benefits and how CDC can dramatically reduce ETL load times. Ian Archibald, Pre-Sales Director EMEA for Attunity, will present and demonstrate Attunity's award-winning Oracle-CDC for SSIS, a fully-integrated SSIS solution for designing, deploying and managing Oracle CDC processes. Title: Change Data Capture - Reducing ETL Load Times Date: Thursday, September 17, 2009 Time: 10:00 AM - 11:00 AM BST ABOUT THE SPEAKERS: Allan Mitchell is the joint owner of Konesans Ltd, a UK based consultancy specializing in SQL Server, and most importantly SQL Server Integration Services. Having been working with SQL Server from 6.5 onwards, he has extensive experience in many aspects of SQL Server, but now focuses on the BI suite of tools. He is a SQL Server MVP, a frequent poster on the MS SSIS/DTS newsgroups, and runs the sqldts.com and sqlis.com resource sites. Ian Archibald, Attunity Pre-Sales Director EMEA, has worked in Attunity’s UK Office for 17 years. An expert in Attunity solutions, Ian has extensive knowledge of Attunity’s products and data integration & CDC technologies. After registering you will receive a confirmation email containing information about joining the Webinar. System Requirements PC-based attendees Required: Windows® 2000, XP Home, XP Pro, 2003 Server, Vista Macintosh®-based attendees Required: Mac OS® X 10.4 (Tiger®) or newer

Read the article
Change Data Capture Webinar

I am going to be doing a webinar with our friends at Attunity on Change Data Capture. Attunity have a good story around this technology and you can use it in your SSIS loads to great effect. Join Attunity and Konesans/SQLIS for a Webinar on 17 September Space is limited. Reserve your Webinar seat now at: https://www1.gotomeeting.com/register/693735512 Want increased efficiency and real-time speed when conducting ETL loads? Need lower implementation costs while minimizing system impact? Learn how change data capture (CDC) technologies can reduce ETL load times. Allan Mitchell, Principal Consultant at Konesans and SQLServer MVP specialising in ETL, will explain CDC concepts and benefits and how CDC can dramatically reduce ETL load times. Ian Archibald, Pre-Sales Director EMEA for Attunity, will present and demonstrate Attunity's award-winning Oracle-CDC for SSIS, a fully-integrated SSIS solution for designing, deploying and managing Oracle CDC processes. Title: Change Data Capture - Reducing ETL Load Times Date: Thursday, September 17, 2009 Time: 10:00 AM - 11:00 AM BST ABOUT THE SPEAKERS: Allan Mitchell is the joint owner of Konesans Ltd, a UK based consultancy specializing in SQL Server, and most importantly SQL Server Integration Services. Having been working with SQL Server from 6.5 onwards, he has extensive experience in many aspects of SQL Server, but now focuses on the BI suite of tools. He is a SQL Server MVP, a frequent poster on the MS SSIS/DTS newsgroups, and runs the sqldts.com and sqlis.com resource sites. Ian Archibald, Attunity Pre-Sales Director EMEA, has worked in Attunity’s UK Office for 17 years. An expert in Attunity solutions, Ian has extensive knowledge of Attunity’s products and data integration & CDC technologies. After registering you will receive a confirmation email containing information about joining the Webinar. System Requirements PC-based attendees Required: Windows® 2000, XP Home, XP Pro, 2003 Server, Vista Macintosh®-based attendees Required: Mac OS® X 10.4 (Tiger®) or newer

Read the article
Issuing Current Time Increments in StreamInsight (A Practical Example)

The issuing of a Current Time Increment, Cti, in StreamInsight is very definitely one of the most important concepts to learn if you want your Streams to be responsive. A full discussion of how to issue Ctis is beyond the scope of this article but a very good explanation in addition to Books Online can be found in these three articles by a member of the StreamInsight team at Microsoft, Ciprian Gerea. Time in StreamInsight Series http://blogs.msdn.com/b/streaminsight/archive/2010/07/23/time-in-streaminsight-i.aspx http://blogs.msdn.com/b/streaminsight/archive/2010/07/30/time-in-streaminsight-ii.aspx http://blogs.msdn.com/b/streaminsight/archive/2010/08/03/time-in-streaminsight-iii.aspx A lot of the problems I see with unresponsive or stuck streams on the MSDN Forums are to do with how Ctis are enqueued or in a lot of cases not enqueued. If you enqueue events and never enqueue a Cti then StreamInsight will be perfectly happy. You, on the other hand, will never see data on the output as you have not told StreamInsight to flush the stream. This article deals with a specific implementation problem I had recently whilst working on a StreamInsight project. I look at some possible options and discuss why they would not work before showing the way I solved the problem. The stream of data I was dealing with on this project was very bursty that is to say when events were flowing they came through very quickly and in large numbers (1000 events/sec), but when the stream calmed down it could be a few seconds between each event. When enqueuing events into the StreamInsight engne it is best practice to do so with a StartTime that is given to you by the system producing the event . StreamInsight processes events and it doesn't matter whether those events are being pushed into the engine by a source system or the events are being read from something like a flat file in a directory somewhere. You can apply the same logic and temporal algebra to both situations. Reading from a file is an excellent example of where the time of the event on the source itself is very important. We could be reading that file a long time after it was written. Being able to read the StartTime from the events allows us to define windows that will hold the correct sets of events. I was able to do this with my stream but this is where my problems started. Below is a very simple script to create a SQL Server table and populate it with sample data that will show exactly the problem I had. CREATE TABLE [dbo].[t] ( [c1] [int] PRIMARY KEY, [c2] [datetime] NULL ) INSERT t VALUES (1,'20100810'),(2,'20100810'),(3,'20100810') Column c2 defines the StartTime of the event on the source and as you can see the values in all 3 rows of data is the same. If we read Ciprian’s articles we know that we can define how Ctis get injected into the stream in 3 different places The Stream Definition The Input Factory The Input Adapter I personally have always been a fan of enqueing Ctis through the factory. Below is code typical of what I would use to do this On the class itself I do some inheriting public class SimpleInputFactory : ITypedInputAdapterFactory<SimpleInputConfig>, ITypedDeclareAdvanceTimeProperties<SimpleInputConfig> And then I implement the following function public AdapterAdvanceTimeSettings DeclareAdvanceTimeProperties<TPayload>(SimpleInputConfig configInfo, EventShape eventShape) { return new AdapterAdvanceTimeSettings( new AdvanceTimeGenerationSettings(configInfo.CtiFrequency, TimeSpan.FromTicks(-1)), AdvanceTimePolicy.Adjust); } The configInfo .CtiFrequency property is a value I pass through to define after how many events I want a Cti to be injected and this in turn will flush through the stream of data. I usually pass a value of 1 for this setting. The second parameter determines the CTI timestamp in terms of a delay relative to the events. -1 ticks in the past results in 1 tick in the future, i.e., ahead of the event. The problem with this method though is that if consecutive events have the same StartTime then only one of those events will be enqueued. In this example I use the following to define how I assign the StartTime of my events currEvent.StartTime = (DateTimeOffset)dt.c2; If I go ahead and run my StreamInsight process with this configuration i can see on the output adapter that two events have been removed To see this in a little more depth I can use the StreamInsight Debugger and see what happens internally. What is happening here is that the first event arrives and a Cti is injected with a time of 1 tick after the StartTime of that event (Also the EndTime of the event). The second event arrives and it has a StartTime of before the Cti and even though we specified AdvanceTimePolicy.Adjust on the factory we know that a point event can never be adjusted like this and the event is dropped. The same happens for the third event as well (The second and third events get trumped by the Cti). For a more detailed discussion of why this happens look here http://www.sqlis.com/sqlis/post/AdvanceTimePolicy-and-Point-Event-Streams-In-StreamInsight.aspx We end up with a single event being pushed into the output adapter and our result now makes sense. The next way I tried to solve this problem by changing the value of the second parameter to TimeSpan.Zero Here is how my factory code now looks public AdapterAdvanceTimeSettings DeclareAdvanceTimeProperties<TPayload>(SimpleInputConfig configInfo, EventShape eventShape) { return new AdapterAdvanceTimeSettings( new AdvanceTimeGenerationSettings(configInfo.CtiFrequency, TimeSpan.Zero), AdvanceTimePolicy.Adjust); } What I am doing here is declaring a policy that says inject a Cti together with every event and stamp it with a StartTime that is equal to the start time of the event itself (TimeSpan.Zero). This method has plus points as well as a downside. The upside is that no events will be lost by having the same StartTime as previous events. The Downside is that because the Cti is declared with the StartTime of the event itself then it does not actually flush that particular event because in the StreamInsight algebra, a Cti commits only those events that occurred strictly before them. To flush the events we need a Cti to be enqueued with a greater StartTime than the events themselves. Here is what happened when I ran this configuration As you can see all we got through was the Cti and none of the events. The debugger output shows the stamps on the Cti and the events themselves. Because the Cti issued has the same timestamp (StartTime) as the events then none of the events get flushed. I was nearly there but not quite. Because my stream was bursty it was possible that the next event would not come along for a few seconds and this was far too long for an event to be enqueued and not be flushed to the output adapter. I needed another solution. Two possible solutions crossed my mind although only one of them made sense when I explored it some more. Where multiple events have the same StartTime I could add 1 tick to the first event, two to the second, three to third etc thereby giving them unique StartTime values. Add a timer to manually inject Ctis The problem with the first implementation is that I would be giving the events a new StartTime. This would cause me the following problems If I want to define windows over the stream then some events may not be captured in the right windows and therefore any calculations on those windows I did would be wrong What would happen if we had 10,000 events with the same StartTime? I would enqueue them with StartTime + n ticks. Along comes a genuine event with a StartTime of the very first event + 1 tick. It is now too far in the past as far as my stream is concerned and it would be dropped. Not what I would want to do at all. I decided then to look at the Timer based solution I created a timer on my input adapter that elapsed every 200ms. private Timer tmr; public SimpleInputAdapter(SimpleInputConfig configInfo) { ctx = new SimpleTimeExtractDataContext(configInfo.ConnectionString); this.configInfo = configInfo; tmr = new Timer(200); tmr.Elapsed += new ElapsedEventHandler(t_Elapsed); tmr.Enabled = true; } void t_Elapsed(object sender, ElapsedEventArgs e) { ts = DateTime.Now - dtCtiIssued; if (ts.TotalMilliseconds >= 200 && TimerIssuedCti == false) { EnqueueCtiEvent(System.DateTime.Now.AddTicks(-100)); TimerIssuedCti = true; } } In the t_Elapsed event handler I find out the difference in time between now and when the last event was processed (dtCtiIssued). I then check to see if that is greater than or equal to 200ms and if the last issuing of a Cti was done by the timer or by a genuine event (TimerIssuedCti). If I didn’t do this check then I would enqueue a Cti every time the timer elapsed which is not something I wanted. If the difference between the two times is greater than or equal to 500ms and the last event enqueued was by a real event then I issue a Cti through the timer to flush the event Queue, otherwise I do nothing. When I enqueue the Ctis into my stream in my ProduceEvents method I also set the values of dtCtiIssued and TimerIssuedCti currEvent = CreateInsertEvent(); currEvent.StartTime = (DateTimeOffset)dt.c2; TimerIssuedCti = false; dtCtiIssued = currEvent.StartTime; If I go ahead and run this configuration I see the following in my output. As we can see the first Cti gets enqueued as before but then another is enqueued by the timer and because this has a later timestamp it flushes the enqueued events through the engine. Conclusion Hopefully this has shown how the enqueuing of Ctis can have a dramatic effect on the responsiveness of your output in StreamInsight. Understanding the temporal nature of the product is for me one of the most important things you can learn. I have attached my solution for the demos. It is all in one project and testing each variation is a simple matter of commenting and un-commenting the parts in the code we have been dealing with here.

Read the article
CreationName for SSIS 2008 and adding components programmatically

If you are building SSIS 2008 packages programmatically and adding data flow components, you will probably need to know the creation name of the component to add. I can never find a handy reference when I need one, hence this rather mundane post. See also CreationName for SSS 2005. We start with a very simple snippet for adding a component: // Add the Data Flow Task package.Executables.Add("STOCK:PipelineTask"); // Get the task host wrapper, and the Data Flow task TaskHost taskHost = package.Executables[0] as TaskHost; MainPipe dataFlowTask = (MainPipe)taskHost.InnerObject; // Add OLE-DB source component - ** This is where we need the creation name ** IDTSComponentMetaData90 componentSource = dataFlowTask.ComponentMetaDataCollection.New(); componentSource.Name = "OLEDBSource"; componentSource.ComponentClassID = "DTSAdapter.OLEDBSource.2"; So as you can see the creation name for a OLE-DB Source is DTSAdapter.OLEDBSource.2. CreationName Reference ADO NET Destination Microsoft.SqlServer.Dts.Pipeline.ADONETDestination, Microsoft.SqlServer.ADONETDest, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91 ADO NET Source Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter, Microsoft.SqlServer.ADONETSrc, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91 Aggregate DTSTransform.Aggregate.2 Audit DTSTransform.Lineage.2 Cache Transform DTSTransform.Cache.1 Character Map DTSTransform.CharacterMap.2 Checksum Konesans.Dts.Pipeline.ChecksumTransform.ChecksumTransform, Konesans.Dts.Pipeline.ChecksumTransform, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b2ab4a111192992b Conditional Split DTSTransform.ConditionalSplit.2 Copy Column DTSTransform.CopyMap.2 Data Conversion DTSTransform.DataConvert.2 Data Mining Model Training MSMDPP.PXPipelineProcessDM.2 Data Mining Query MSMDPP.PXPipelineDMQuery.2 DataReader Destination Microsoft.SqlServer.Dts.Pipeline.DataReaderDestinationAdapter, Microsoft.SqlServer.DataReaderDest, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91 Derived Column DTSTransform.DerivedColumn.2 Dimension Processing MSMDPP.PXPipelineProcessDimension.2 Excel Destination DTSAdapter.ExcelDestination.2 Excel Source DTSAdapter.ExcelSource.2 Export Column TxFileExtractor.Extractor.2 Flat File Destination DTSAdapter.FlatFileDestination.2 Flat File Source DTSAdapter.FlatFileSource.2 Fuzzy Grouping DTSTransform.GroupDups.2 Fuzzy Lookup DTSTransform.BestMatch.2 Import Column TxFileInserter.Inserter.2 Lookup DTSTransform.Lookup.2 Merge DTSTransform.Merge.2 Merge Join DTSTransform.MergeJoin.2 Multicast DTSTransform.Multicast.2 OLE DB Command DTSTransform.OLEDBCommand.2 OLE DB Destination DTSAdapter.OLEDBDestination.2 OLE DB Source DTSAdapter.OLEDBSource.2 Partition Processing MSMDPP.PXPipelineProcessPartition.2 Percentage Sampling DTSTransform.PctSampling.2 Performance Counters Source DataCollectorTransform.TxPerfCounters.1 Pivot DTSTransform.Pivot.2 Raw File Destination DTSAdapter.RawDestination.2 Raw File Source DTSAdapter.RawSource.2 Recordset Destination DTSAdapter.RecordsetDestination.2 RegexClean Konesans.Dts.Pipeline.RegexClean.RegexClean, Konesans.Dts.Pipeline.RegexClean, Version=2.0.0.0, Culture=neutral, PublicKeyToken=d1abe77e8a21353e Row Count DTSTransform.RowCount.2 Row Count Plus Konesans.Dts.Pipeline.RowCountPlusTransform.RowCountPlusTransform, Konesans.Dts.Pipeline.RowCountPlusTransform, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b2ab4a111192992b Row Number Konesans.Dts.Pipeline.RowNumberTransform.RowNumberTransform, Konesans.Dts.Pipeline.RowNumberTransform, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b2ab4a111192992b Row Sampling DTSTransform.RowSampling.2 Script Component Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost, Microsoft.SqlServer.TxScript, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91 Slowly Changing Dimension DTSTransform.SCD.2 Sort DTSTransform.Sort.2 SQL Server Compact Destination Microsoft.SqlServer.Dts.Pipeline.SqlCEDestinationAdapter, Microsoft.SqlServer.SqlCEDest, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91 SQL Server Destination DTSAdapter.SQLServerDestination.2 Term Extraction DTSTransform.TermExtraction.2 Term Lookup DTSTransform.TermLookup.2 Trash Destination Konesans.Dts.Pipeline.TrashDestination.Trash, Konesans.Dts.Pipeline.TrashDestination, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b8351fe7752642cc TxTopQueries DataCollectorTransform.TxTopQueries.1 Union All DTSTransform.UnionAll.2 Unpivot DTSTransform.UnPivot.2 XML Source Microsoft.SqlServer.Dts.Pipeline.XmlSourceAdapter, Microsoft.SqlServer.XmlSrc, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91 Here is a simple console program that can be used to enumerate the pipeline components installed on your machine, and dumps out a list of all components like that above. You will need to add a reference to the Microsoft.SQLServer.ManagedDTS assembly. using System; using System.Diagnostics; using Microsoft.SqlServer.Dts.Runtime; public class Program { static void Main(string[] args) { Application application = new Application(); PipelineComponentInfos componentInfos = application.PipelineComponentInfos; foreach (PipelineComponentInfo componentInfo in componentInfos) { Debug.WriteLine(componentInfo.Name + "\t" + componentInfo.CreationName); } Console.Read(); } }

Read the article
Where is my app.config for SSIS?

Sometimes when working with SSIS you need to add or change settings in the .NET application configuration file, which can be a bit confusing when you are building a SSIS package not an application. First of all lets review a couple of examples where you may need to do this. You are using referencing an assembly in a Script Task that uses Enterprise Library (aka EntLib), so you need to add the relevant configuration sections and settings, perhaps for the logging application block. You are using using Enterprise Library in a custom task or component, and again you need to add the relevant configuration sections and settings. You are using a web service with Microsoft Web Services Enhancements (WSE) 3.0 and hosting the proxy in SSIS, in an assembly used by your package, and need to add the configuration sections and settings. You need to change behaviours of the .NET framework which can be influenced by a configuration file, such as the System.Net.Mail default SMTP settings. Perhaps you wish to configure System.Net and the httpWebRequest header for parsing unsafe header (useUnsafeHeaderParsing), which will change the way the HTTP Connection manager behaves. You are consuming a WCF service and wish to specify the endpoint in configuration. There are no doubt plenty more examples but each of these requires us to identify the correct configuration file and and make the relevant changes. There are actually several configuration files, each used by a different execution host depending on how you are working with the SSIS package. The folders we need to look in will actually vary depending on the version of SQL Server as well as the processor architecture, but most are all what we can call the Binn folder. The SQL Server 2005 Binn folder is at C:\Program Files\Microsoft SQL Server\90\DTS\Binn\, compared to C:\Program Files\Microsoft SQL Server\100\DTS\Binn\ for SQL Server 2008. If you are on a 64-bit machine then you will see C:\Program Files (x86)\Microsoft SQL Server\90\DTS\Binn\ for the 32-bit executables and C:\Program Files\Microsoft SQL Server\90\DTS\Binn\ for 64-bit, so be sure to check all relevant locations. Of course SQL Server 2008 may have a C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\ on a 64-bit machine too. To recap, the version of SQL Server determines if you look in the 90 or 100 sub-folder under SQL Server in Program Files (C:\Program Files\Microsoft SQL Server\nn\) . If you are running a 64-bit operating system then you will have two instances program files, C:\Program Files (x86)\ for 32-bit and C:\Program Files\ for 64-bit. You may wish to check both depending on what you are doing, but this is covered more under each section below. There are a total of five specific configuration files that you may need to change, each one is detailed below: DTExec.exe.config DTExec.exe is the standalone command line tool used for executing SSIS packages, and therefore it is an execution host with an app.config file. e.g. C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DTExec.exe.config The file can be found in both the 32-bit and 64-bit Binn folders. DtsDebugHost.exe.config DtsDebugHost.exe is the execution host used by Business Intelligence Development Studio (BIDS) / Visual Studio when executing a package from the designer in debug mode, which is the default behaviour. e.g. C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DtsDebugHost.exe.config The file can be found in both the 32-bit and 64-bit Binn folders. This may surprise some people as Visual Studio is only 32-bit, but thankfully the debugger supports both. This can be set in the project properties, see the Run64BitRuntime property (true or false) in the Debugging pane of the Project Properties. dtshost.exe.config dtshost.exe is the execution host used by what I think of as the built-in features of SQL Server such as SQL Server Agent e.g. C:\Program Files\Microsoft SQL Server\90\DTS\Binn\dtshost.exe.config This file can be found in both the 32-bit and 64-bit Binn folders devenv.exe.config Something slightly different is devenv.exe which is Visual Studio. This configuration file may also need changing if you need a feature at design-time such as in a Task Editor or Connection Manager editor. Visual Studio 2005 for SQL Server 2005 - C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\devenv.exe.config Visual Studio 2008 for SQL Server 2008 - C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\devenv.exe.config Visual Studio is only available for 32-bit so on a 64-bit machine you will have to look in C:\Program Files (x86)\ only. DTExecUI.exe.config The DTExec UI tool can also have a configuration file and these cab be found under the Tools folders for SQL Sever as shown below. C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\DTExecUI.exe C:\Program Files\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\DTExecUI.exe A configuration file may not exist, but if you can find the matching executable you know you are in the right place so can go ahead and add a new file yourself. In summary we have covered the assembly configuration files for all of the standard methods of building and running a SSIS package, but obviously if you are working programmatically you will need to make the relevant modifications to your program’s app.config as well.

Read the article
New SSIS tool on Codeplex – SSIS Log Analyzer

I stumbled across a new SSIS tool on Codeplex today, the SSIS Log Analyzer which was only released a few days ago. Whilst it is a beta release and currently only supports 2005 (2008 is promised) it looks quite interesting. It seems to be a fancy log viewer, but with some clever features and a nice looking front-end. I’ve only read the documentation so far, but it has graphs and a debug view that shows your package with the colour animations similar to when debugging in BIDS, and everyone knows, the way the pretty colours and numbers change is the best bit! I’ll quote some of the features for you here and then let you make your own mind up, is it useful in the real world? Option to analyze the logs manually by applying row and column filters over the log data or by using queries to specify more complex criterions. Automated Performance Analysis which provides a quick graphical look on which tasks spent most time during package execution. Rerun (debug) the entire sequence of events which happened during package execution showing the flow of control in graphical form, changes in runtime values for each task like execution duration etc. Support for Auto Analyzers to automatically find out issues and provide suggestions for problems which can be figured out with the help of SSIS logs and/or package. Option to analyze just log file or log and package together. Provides a lightweight environment to have a quick look at the package. Opening it in BIDS takes some time as being an authoring environment it does all sorts of validations resulting in some delay. See http://ssisloganalyzer.codeplex.com/ for more details.

Read the article
New SSIS tool on Codeplex – SSIS Log Analyzer

I stumbled across a new SSIS tool on Codeplex today, the SSIS Log Analyzer which was only released a few days ago. Whilst it is a beta release and currently only supports 2005 (2008 is promised) it looks quite interesting. It seems to be a fancy log viewer, but with some clever features and a nice looking front-end. I’ve only read the documentation so far, but it has graphs and a debug view that shows your package with the colour animations similar to when debugging in BIDS, and everyone knows, the way the pretty colours and numbers change is the best bit! I’ll quote some of the features for you here and then let you make your own mind up, is it useful in the real world? Option to analyze the logs manually by applying row and column filters over the log data or by using queries to specify more complex criterions. Automated Performance Analysis which provides a quick graphical look on which tasks spent most time during package execution. Rerun (debug) the entire sequence of events which happened during package execution showing the flow of control in graphical form, changes in runtime values for each task like execution duration etc. Support for Auto Analyzers to automatically find out issues and provide suggestions for problems which can be figured out with the help of SSIS logs and/or package. Option to analyze just log file or log and package together. Provides a lightweight environment to have a quick look at the package. Opening it in BIDS takes some time as being an authoring environment it does all sorts of validations resulting in some delay. See http://ssisloganalyzer.codeplex.com/ for more details.

Read the article
SQL Server Editions and Integration Services

The SQL Server 2005 and SQL Server 2008 product family has quite a few editions now, so what does this mean for SQL Server Integration Services? Starting from the bottom we have the free edition known as Express, and the entry level Workgroup edition, as well as the new Web edition. None of these three include the full SSIS product, but they do all include the SQL Server Import and Export Wizard, with access to basic data sources but nothing more, so for simple loading and extraction of data this should suffice. You will not be able to build packages though, this is just a one shot deal aimed at using the wizard on an ad-hoc basis. To get the full power of Integration Services you need to start with Standard edition. This includes the BI Development Studio, for building your own packages, and fully functional IDE integrated into Visual Studio. (You get the full VS 2005/2008 IDE with the product). All core functions will be available but with a restricted set of transformations and tasks. The SQL Server 2005 Features Comparison or Features Supported by the Editions of SQL Server 2008 describes standard edition as having basic transforms, compared to Enterprise which includes the advanced transforms. I think basic is a little harsh considering the power you get with Standard, but the advanced covers the truly ground-breaking capabilities of data mining, text mining and cleansing or fuzzy transforms. The power of performing these operations within your ETL pipeline should not be underestimated, but not all processes will require these capabilities, so it seems like a reasonable delineation. Thankfully there are no feature limitations or artificial governors within Standard compared to Enterprise. The same control flow and data flow engines underpin both editions, with the same configuration and deployment options allowing you to work seamlessly between environments and editions if using the common components. In fact there are no govenors at all in SSIS, so whilst the SQL Database engine is limited to 4 CPUs in Standard edition, SSIS is only limited by the base operating system. The advanced transforms only available with Enterprise edition: Data Mining Training Destination Data Mining Query Component Fuzzy Grouping Fuzzy Lookup Term Extraction Term Lookup Dimension Processing Destination Partition Processing Destination The advanced tasks only available with Enterprise edition: Data Mining Query Task So in summary, if you want SQL Server Integration Services, you need SQL Server Standard edition, and for the more advanced tasks and transforms you need SQL Server Enterprise edition. To recap, the answer to the often asked question is no, SQL Server Integration Services is not available in SQL Server Express or Workgroup editions.

Read the article
Creating packages in code – Execute SQL Task

The Execute SQL Task is for obvious reasons very well used, so I thought if you are building packages in code the chances are you will be using it. Using the task basic features of the task are quite straightforward, add the task and set some properties, just like any other. When you start interacting with variables though it can be a little harder to grasp so these samples should see you through. Some of these more advanced features are explained in much more detail in our ever popular post The Execute SQL Task, here I’ll just be showing you how to implement them in code. The abbreviated code blocks below demonstrate the different features of the task. The complete code has been encapsulated into a sample class which you can download (ExecSqlPackage.cs). Each feature described has its own method in the sample class which is mentioned after the code block. This first sample just shows adding the task, setting the basic properties for a connection and of course an SQL statement. Package package = new Package(); // Add the SQL OLE-DB connection ConnectionManager sqlConnection = AddSqlConnection(package, "localhost", "master"); // Add the SQL Task package.Executables.Add("STOCK:SQLTask"); // Get the task host wrapper TaskHost taskHost = package.Executables[0] as TaskHost; // Set required properties taskHost.Properties["Connection"].SetValue(taskHost, sqlConnection.ID); taskHost.Properties["SqlStatementSource"].SetValue(taskHost, "SELECT * FROM sysobjects"); For the full version of this code, see the CreatePackage method in the sample class. The AddSqlConnection method is a helper method that adds an OLE-DB connection to the package, it is of course in the sample class file too. Returning a single value with a Result Set The following sample takes a different approach, getting a reference to the ExecuteSQLTask object task itself, rather than just using the non-specific TaskHost as above. Whilst it means we need to add an extra reference to our project (Microsoft.SqlServer.SQLTask) it makes coding much easier as we have compile time validation of any property and types we use. For the more complex properties that is very valuable and saves a lot of time during development. The query has also been changed to return a single value, one row and one column. The sample shows how we can return that value into a variable, which we also add to our package in the code. To do this manually you would set the Result Set property on the General page to Single Row and map the variable on the Result Set page in the editor. Package package = new Package(); // Add the SQL OLE-DB connection ConnectionManager sqlConnection = AddSqlConnection(package, "localhost", "master"); // Add the SQL Task package.Executables.Add("STOCK:SQLTask"); // Get the task host wrapper TaskHost taskHost = package.Executables[0] as TaskHost; // Add variable to hold result value package.Variables.Add("Variable", false, "User", 0); // Get the task object ExecuteSQLTask task = taskHost.InnerObject as ExecuteSQLTask; // Set core properties task.Connection = sqlConnection.Name; task.SqlStatementSource = "SELECT id FROM sysobjects WHERE name = 'sysrowsets'"; // Set single row result set task.ResultSetType = ResultSetType.ResultSetType_SingleRow; // Add result set binding, map the id column to variable task.ResultSetBindings.Add(); IDTSResultBinding resultBinding = task.ResultSetBindings.GetBinding(0); resultBinding.ResultName = "id"; resultBinding.DtsVariableName = "User::Variable"; For the full version of this code, see the CreatePackageResultVariable method in the sample class. The other types of Result Set behaviour are just a variation on this theme, set the property and map the result binding as required. Parameter Mapping for SQL Statements This final example uses a parameterised SQL statement, with the coming from a variable. The syntax varies slightly between connection types, as explained in the Working with Parameters and Return Codes in the Execute SQL Taskhelp topic, but OLE-DB is the most commonly used, for which a question mark is the parameter value placeholder. Package package = new Package(); // Add the SQL OLE-DB connection ConnectionManager sqlConnection = AddSqlConnection(package, ".", "master"); // Add the SQL Task package.Executables.Add("STOCK:SQLTask"); // Get the task host wrapper TaskHost taskHost = package.Executables[0] as TaskHost; // Get the task object ExecuteSQLTask task = taskHost.InnerObject as ExecuteSQLTask; // Set core properties task.Connection = sqlConnection.Name; task.SqlStatementSource = "SELECT id FROM sysobjects WHERE name = ?"; // Add variable to hold parameter value package.Variables.Add("Variable", false, "User", "sysrowsets"); // Add input parameter binding task.ParameterBindings.Add(); IDTSParameterBinding parameterBinding = task.ParameterBindings.GetBinding(0); parameterBinding.DtsVariableName = "User::Variable"; parameterBinding.ParameterDirection = ParameterDirections.Input; parameterBinding.DataType = (int)OleDBDataTypes.VARCHAR; parameterBinding.ParameterName = "0"; parameterBinding.ParameterSize = 255; For the full version of this code, see the CreatePackageParameterVariable method in the sample class. You’ll notice the data type has to be specified for the parameter IDTSParameterBinding .DataType Property, and these type codes are connection specific too. My enumeration I wrote several years ago is shown below was probably done by reverse engineering a package and also the API header file, but I recently found a very handy post that covers more connections as well for exactly this, Setting the DataType of IDTSParameterBinding objects (Execute SQL Task). /// <summary> /// Enumeration of OLE-DB types, used when mapping OLE-DB parameters. /// </summary> private enum OleDBDataTypes { BYTE = 0x11, CURRENCY = 6, DATE = 7, DB_VARNUMERIC = 0x8b, DBDATE = 0x85, DBTIME = 0x86, DBTIMESTAMP = 0x87, DECIMAL = 14, DOUBLE = 5, FILETIME = 0x40, FLOAT = 4, GUID = 0x48, LARGE_INTEGER = 20, LONG = 3, NULL = 1, NUMERIC = 0x83, NVARCHAR = 130, SHORT = 2, SIGNEDCHAR = 0x10, ULARGE_INTEGER = 0x15, ULONG = 0x13, USHORT = 0x12, VARCHAR = 0x81, VARIANT_BOOL = 11 } Download Sample code ExecSqlPackage.cs (10KB)

Read the article
Where is my app.config for SSIS?

Sometimes when working with SSIS you need to add or change settings in the .NET application configuration file, which can be a bit confusing when you are building a SSIS package not an application. First of all lets review a couple of examples where you may need to do this. You are using referencing an assembly in a Script Task that uses Enterprise Library (aka EntLib), so you need to add the relevant configuration sections and settings, perhaps for the logging application block. You are using using Enterprise Library in a custom task or component, and again you need to add the relevant configuration sections and settings. You are using a web service with Microsoft Web Services Enhancements (WSE) 3.0 and hosting the proxy in SSIS, in an assembly used by your package, and need to add the configuration sections and settings. You need to change behaviours of the .NET framework which can be influenced by a configuration file, such as the System.Net.Mail default SMTP settings. Perhaps you wish to configure System.Net and the httpWebRequest header for parsing unsafe header (useUnsafeHeaderParsing), which will change the way the HTTP Connection manager behaves. You are consuming a WCF service and wish to specify the endpoint in configuration. There are no doubt plenty more examples but each of these requires us to identify the correct configuration file and and make the relevant changes. There are actually several configuration files, each used by a different execution host depending on how you are working with the SSIS package. The folders we need to look in will actually vary depending on the version of SQL Server as well as the processor architecture, but most are all what we can call the Binn folder. The SQL Server 2005 Binn folder is at C:\Program Files\Microsoft SQL Server\90\DTS\Binn\, compared to C:\Program Files\Microsoft SQL Server\100\DTS\Binn\ for SQL Server 2008. If you are on a 64-bit machine then you will see C:\Program Files (x86)\Microsoft SQL Server\90\DTS\Binn\ for the 32-bit executables and C:\Program Files\Microsoft SQL Server\90\DTS\Binn\ for 64-bit, so be sure to check all relevant locations. Of course SQL Server 2008 may have a C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\ on a 64-bit machine too. To recap, the version of SQL Server determines if you look in the 90 or 100 sub-folder under SQL Server in Program Files (C:\Program Files\Microsoft SQL Server\nn\) . If you are running a 64-bit operating system then you will have two instances program files, C:\Program Files (x86)\ for 32-bit and C:\Program Files\ for 64-bit. You may wish to check both depending on what you are doing, but this is covered more under each section below. There are a total of five specific configuration files that you may need to change, each one is detailed below: DTExec.exe.config DTExec.exe is the standalone command line tool used for executing SSIS packages, and therefore it is an execution host with an app.config file. e.g. C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DTExec.exe.config The file can be found in both the 32-bit and 64-bit Binn folders. DtsDebugHost.exe.config DtsDebugHost.exe is the execution host used by Business Intelligence Development Studio (BIDS) / Visual Studio when executing a package from the designer in debug mode, which is the default behaviour. e.g. C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DtsDebugHost.exe.config The file can be found in both the 32-bit and 64-bit Binn folders. This may surprise some people as Visual Studio is only 32-bit, but thankfully the debugger supports both. This can be set in the project properties, see the Run64BitRuntime property (true or false) in the Debugging pane of the Project Properties. dtshost.exe.config dtshost.exe is the execution host used by what I think of as the built-in features of SQL Server such as SQL Server Agent e.g. C:\Program Files\Microsoft SQL Server\90\DTS\Binn\dtshost.exe.config This file can be found in both the 32-bit and 64-bit Binn folders devenv.exe.config Something slightly different is devenv.exe which is Visual Studio. This configuration file may also need changing if you need a feature at design-time such as in a Task Editor or Connection Manager editor. Visual Studio 2005 for SQL Server 2005 - C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\devenv.exe.config Visual Studio 2008 for SQL Server 2008 - C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\devenv.exe.config Visual Studio is only available for 32-bit so on a 64-bit machine you will have to look in C:\Program Files (x86)\ only. DTExecUI.exe.config The DTExec UI tool can also have a configuration file and these cab be found under the Tools folders for SQL Sever as shown below. C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\DTExecUI.exe C:\Program Files\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\DTExecUI.exe A configuration file may not exist, but if you can find the matching executable you know you are in the right place so can go ahead and add a new file yourself. In summary we have covered the assembly configuration files for all of the standard methods of building and running a SSIS package, but obviously if you are working programmatically you will need to make the relevant modifications to your program’s app.config as well.

Read the article
Creating packages in code - Package Configurations

Continuing my theme of building various types of packages in code, this example shows how to building a package with package configurations. Incidentally it shows you how to add a variable, and a connection too. It covers the five most common configurations: Configuration File Indirect Configuration File SQL Server Indirect SQL Server Environment Variable For a general overview try the SQL Server Books Online Package Configurations topic. The sample uses a a simple helper function ApplyConfig to create or update a configuration, although in the example we will only ever create. The most useful knowledge is the configuration string (Configuration.ConfigurationString) that you need to set. Configuration Type Configuration String Description Configuration File The full path and file name of an XML configuration file. The file can contain one or more configuration and includes the target path and new value to set. Indirect Configuration File An environment variable the value of which contains full path and file name of an XML configuration file as per the Configuration File type described above. SQL Server A three part configuration string, with each part being quote delimited and separated by a semi-colon. -- The first part is the connection manager name. The connection tells you which server and database to look for the configuration table. -- The second part is the name of the configuration table. The table is of a standard format, use the Package Configuration Wizard to help create an example, or see the sample script files below. The table contains one or more rows or configuration items each with a target path and new value. -- The third and final part is the optional filter name. A configuration table can contain multiple configurations, and the filter is literal value that can be used to group items together and act as a filter clause when configurations are being read. If you do not need a filter, just leave the value empty. Indirect SQL Server An environment variable the value of which is the three part configuration string as per the SQL Server type described above. Environment Variable An environment variable the value of which is the value to set in the package. This is slightly different to the other examples as the configuration definition in the package also includes the target information. In our ApplyConfig function this is the only example that actually supplies a target value for the Configuration.PackagePath property. The path is an XPath style path for the target property, \Package.Variables[User::Variable].Properties[Value], the equivalent of which can be seen in the screenshot below, with the object being our variable called Variable, and the property to set is the Value property of that variable object. The configurations as seen when opening the generated package in BIDS: The sample code creates the package, adds a variable and connection manager, enables configurations, and then adds our example configurations. The package is then saved to disk, useful for checking the package and testing, before finally executing, just to prove it is valid. There are some external resources used here, namely some environment variables and a table, see below for more details. namespace Konesans.Dts.Samples { using System; using Microsoft.SqlServer.Dts.Runtime; public class PackageConfigurations { public void CreatePackage() { // Create a new package Package package = new Package(); package.Name = "ConfigurationSample"; // Add a variable, the target for our configurations package.Variables.Add("Variable", false, "User", 0); // Add a connection, for SQL configurations // Add the SQL OLE-DB connection ConnectionManager connectionManagerOleDb = package.Connections.Add("OLEDB"); connectionManagerOleDb.Name = "SQLConnection"; connectionManagerOleDb.ConnectionString = "Provider=SQLOLEDB.1;Data Source=(local);Initial Catalog=master;Integrated Security=SSPI;"; // Add our example configurations, first must enable package setting package.EnableConfigurations = true; // Direct configuration file, see sample file this.ApplyConfig(package, "Configuration File", DTSConfigurationType.ConfigFile, "C:\\Temp\\XmlConfig.dtsConfig", string.Empty); // Indirect configuration file, the emvironment variable XmlConfigFileEnvironmentVariable // contains the path to the configuration file, e.g. C:\Temp\XmlConfig.dtsConfig this.ApplyConfig(package, "Indirect Configuration File", DTSConfigurationType.IConfigFile, "XmlConfigFileEnvironmentVariable", string.Empty); // Direct SQL Server configuration, uses the SQLConnection package connection to read // configurations from the [dbo].[SSIS Configurations] table, with a filter of "SampleFilter" this.ApplyConfig(package, "SQL Server", DTSConfigurationType.SqlServer, "\"SQLConnection\";\"[dbo].[SSIS Configurations]\";\"SampleFilter\";", string.Empty); // Indirect SQL Server configuration, the environment variable "SQLServerEnvironmentVariable" // contains the configuration string e.g. "SQLConnection";"[dbo].[SSIS Configurations]";"SampleFilter"; this.ApplyConfig(package, "Indirect SQL Server", DTSConfigurationType.ISqlServer, "SQLServerEnvironmentVariable", string.Empty); // Direct environment variable, the value of the EnvironmentVariable environment variable is // applied to the target property, the value of the "User::Variable" package variable this.ApplyConfig(package, "EnvironmentVariable", DTSConfigurationType.EnvVariable, "EnvironmentVariable", "\\Package.Variables[User::Variable].Properties[Value]"); #if DEBUG // Save package to disk, DEBUG only new Application().SaveToXml(String.Format(@"C:\Temp\{0}.dtsx", package.Name), package, null); Console.WriteLine(@"C:\Temp\{0}.dtsx", package.Name); #endif // Execute package package.Execute(); // Basic check for warnings foreach (DtsWarning warning in package.Warnings) { Console.WriteLine("WarningCode : {0}", warning.WarningCode); Console.WriteLine(" SubComponent : {0}", warning.SubComponent); Console.WriteLine(" Description : {0}", warning.Description); Console.WriteLine(); } // Basic check for errors foreach (DtsError error in package.Errors) { Console.WriteLine("ErrorCode : {0}", error.ErrorCode); Console.WriteLine(" SubComponent : {0}", error.SubComponent); Console.WriteLine(" Description : {0}", error.Description); Console.WriteLine(); } package.Dispose(); } /// <summary> /// Add or update an package configuration. /// </summary> /// <param name="package">The package.</param> /// <param name="name">The configuration name.</param> /// <param name="type">The type of configuration</param> /// <param name="setting">The configuration setting.</param> /// <param name="target">The target of the configuration, leave blank if not required.</param> internal void ApplyConfig(Package package, string name, DTSConfigurationType type, string setting, string target) { Configurations configurations = package.Configurations; Configuration configuration; if (configurations.Contains(name)) { configuration = configurations[name]; } else { configuration = configurations.Add(); } configuration.Name = name; configuration.ConfigurationType = type; configuration.ConfigurationString = setting; configuration.PackagePath = target; } } } The following table lists the environment variables required for the full example to work along with some sample values. Variable Sample value EnvironmentVariable 1 SQLServerEnvironmentVariable "SQLConnection";"[dbo].[SSIS Configurations]";"SampleFilter"; XmlConfigFileEnvironmentVariable C:\Temp\XmlConfig.dtsConfig Sample code, package and configuration file. ConfigurationApplication.cs ConfigurationSample.dtsx XmlConfig.dtsConfig

Read the article
The Execute SQL Task

In this article we are going to take you through the Execute SQL Task in SQL Server Integration Services for SQL Server 2005 (although it appies just as well to SQL Server 2008). We will be covering all the essentials that you will need to know to effectively use this task and make it as flexible as possible. The things we will be looking at are as follows: A tour of the Task. The properties of the Task. After looking at these introductory topics we will then get into some examples. The examples will show different types of usage for the task: Returning a single value from a SQL query with two input parameters. Returning a rowset from a SQL query. Executing a stored procedure and retrieveing a rowset, a return value, an output parameter value and passing in an input parameter. Passing in the SQL Statement from a variable. Passing in the SQL Statement from a file. Tour Of The Task Before we can start to use the Execute SQL Task in our packages we are going to need to locate it in the toolbox. Let's do that now. Whilst in the Control Flow section of the package expand your toolbox and locate the Execute SQL Task. Below is how we found ours. Now drag the task onto the designer. As you can see from the following image we have a validation error appear telling us that no connection manager has been assigned to the task. This can be easily remedied by creating a connection manager. There are certain types of connection manager that are compatable with this task so we cannot just create any connection manager and these are detailed in a few graphics time. Double click on the task itself to take a look at the custom user interface provided to us for this task. The task will open on the general tab as shown below. Take a bit of time to have a look around here as throughout this article we will be revisting this page many times. Whilst on the general tab, drop down the combobox next to the ConnectionType property. In here you will see the types of connection manager which this task will accept. As with SQL Server 2000 DTS, SSIS allows you to output values from this task in a number of formats. Have a look at the combobox next to the Resultset property. The major difference here is the ability to output into XML. If you drop down the combobox next to the SQLSourceType property you will see the ways in which you can pass a SQL Statement into the task itself. We will have examples of each of these later on but certainly when we saw these for the first time we were very excited. Next to the SQLStatement property if you click in the empty box next to it you will see ellipses appear. Click on them and you will see the very basic query editor that becomes available to you. Alternatively after you have specified a connection manager for the task you can click on the Build Query button to bring up a completely different query editor. This is slightly inconsistent. Once you've finished looking around the general tab, move on to the next tab which is the parameter mapping tab. We shall, again, be visiting this tab throughout the article but to give you an initial heads up this is where you define the input, output and return values from your task. Note this is not where you specify the resultset. If however you now move on to the ResultSet tab this is where you define what variable will receive the output from your SQL Statement in whatever form that is. Property Expressions are one of the most amazing things to happen in SSIS and they will not be covered here as they deserve a whole article to themselves. Watch out for this as their usefulness will astound you. For a more detailed discussion of what should be the parameter markers in the SQL Statements on the General tab and how to map them to variables on the Parameter Mapping tab see Working with Parameters and Return Codes in the Execute SQL Task. Task Properties There are two places where you can specify the properties for your task. One is in the task UI itself and the other is in the property pane which will appear if you right click on your task and select Properties from the context menu. We will be doing plenty of property setting in the UI later so let's take a moment to have a look at the property pane. Below is a graphic showing our properties pane. Now we shall take you through all the properties and tell you exactly what they mean. A lot of these properties you will see across all tasks as well as the package because of everything's base structure The Container. BypassPrepare Should the statement be prepared before sending to the connection manager destination (True/False) Connection This is simply the name of the connection manager that the task will use. We can get this from the connection manager tray at the bottom of the package. DelayValidation Really interesting property and it tells the task to not validate until it actually executes. A usage for this may be that you are operating on table yet to be created but at runtime you know the table will be there. Description Very simply the description of your Task. Disable Should the task be enabled or not? You can also set this through a context menu by right clicking on the task itself. DisableEventHandlers As a result of events that happen in the task, should the event handlers for the container fire? ExecValueVariable The variable assigned here will get or set the execution value of the task. Expressions Expressions as we mentioned earlier are a really powerful tool in SSIS and this graphic below shows us a small peek of what you can do. We select a property on the left and assign an expression to the value of that property on the right causing the value to be dynamically changed at runtime. One of the most obvious uses of this is that the property value can be built dynamically from within the package allowing you a great deal of flexibility FailPackageOnFailure If this task fails does the package? FailParentOnFailure If this task fails does the parent container? A task can he hosted inside another container i.e. the For Each Loop Container and this would then be the parent. ForcedExecutionValue This property allows you to hard code an execution value for the task. ForcedExecutionValueType What is the datatype of the ForcedExecutionValue? ForceExecutionResult Force the task to return a certain execution result. This could then be used by the workflow constraints. Possible values are None, Success, Failure and Completion. ForceExecutionValue Should we force the execution result? IsolationLevel This is the transaction isolation level of the task. IsStoredProcedure Certain optimisations are made by the task if it knows that the query is a Stored Procedure invocation. The docs say this will always be false unless the connection is an ADO connection. LocaleID Gets or sets the LocaleID of the container. LoggingMode Should we log for this container and what settings should we use? The value choices are UseParentSetting, Enabled and Disabled. MaximumErrorCount How many times can the task fail before we call it a day? Name Very simply the name of the task. ResultSetType How do you want the results of your query returned? The choices are ResultSetType_None, ResultSetType_SingleRow, ResultSetType_Rowset and ResultSetType_XML. SqlStatementSource Your Query/SQL Statement. SqlStatementSourceType The method of specifying the query. Your choices here are DirectInput, FileConnection and Variables TimeOut How long should the task wait to receive results? TransactionOption How should the task handle being asked to join a transaction? Usage Examples As we move through the examples we will only cover in them what we think you must know and what we think you should see. This means that some of the more elementary steps like setting up variables will be covered in the early examples but skipped and simply referred to in later ones. All these examples used the AventureWorks database that comes with SQL Server 2005. Returning a Single Value, Passing in Two Input Parameters So the first thing we are going to do is add some variables to our package. The graphic below shows us those variables having been defined. Here the CountOfEmployees variable will be used as the output from the query and EndDate and StartDate will be used as input parameters. As you can see all these variables have been scoped to the package. Scoping allows us to have domains for variables. Each container has a scope and remember a package is a container as well. Variable values of the parent container can be seen in child containers but cannot be passed back up to the parent from a child. Our following graphic has had a number of changes made. The first of those changes is that we have created and assigned an OLEDB connection manager to this Task ExecuteSQL Task Connection. The next thing is we have made sure that the SQLSourceType property is set to Direct Input as we will be writing in our statement ourselves. We have also specified that only a single row will be returned from this query. The expressions we typed in was: SELECT COUNT(*) AS CountOfEmployees FROM HumanResources.Employee WHERE (HireDate BETWEEN ? AND ?) Moving on now to the Parameter Mapping tab this is where we are going to tell the task about our input paramaters. We Add them to the window specifying their direction and datatype. A quick word here about the structure of the variable name. As you can see SSIS has preceeded the variable with the word user. This is a default namespace for variables but you can create your own. When defining your variables if you look at the variables window title bar you will see some icons. If you hover over the last one on the right you will see it says "Choose Variable Columns". If you click the button you will see a list of checkbox options and one of them is namespace. after checking this you will see now where you can define your own namespace. The next tab, result set, is where we need to get back the value(s) returned from our statement and assign to a variable which in our case is CountOfEmployees so we can use it later perhaps. Because we are only returning a single value then if you remember from earlier we are allowed to assign a name to the resultset but it must be the name of the column (or alias) from the query. A really cool feature of Business Intelligence Studio being hosted by Visual Studio is that we get breakpoint support for free. In our package we set a Breakpoint so we can break the package and have a look in a watch window at the variable values as they appear to our task and what the variable value of our resultset is after the task has done the assignment. Here's that window now. As you can see the count of employess that matched the data range was 2. Returning a Rowset In this example we are going to return a resultset back to a variable after the task has executed not just a single row single value. There are no input parameters required so the variables window is nice and straight forward. One variable of type object. Here is the statement that will form the soure for our Resultset. select p.ProductNumber, p.name, pc.Name as ProductCategoryNameFROM Production.ProductCategory pcJOIN Production.ProductSubCategory pscON pc.ProductCategoryID = psc.ProductCategoryIDJOIN Production.Product pON psc.ProductSubCategoryID = p.ProductSubCategoryID We need to make sure that we have selected Full result set as the ResultSet as shown below on the task's General tab. Because there are no input parameters we can skip the parameter mapping tab and move straight to the Result Set tab. Here we need to Add our variable defined earlier and map it to the result name of 0 (remember we covered this earlier) Once we run the task we can again set a breakpoint and have a look at the values coming back from the task. In the following graphic you can see the result set returned to us as a COM object. We can do some pretty interesting things with this COM object and in later articles that is exactly what we shall be doing. Return Values, Input/Output Parameters and Returning a Rowset from a Stored Procedure This example is pretty much going to give us a taste of everything. We have already covered in the previous example how to specify the ResultSet to be a Full result set so we will not cover it again here. For this example we are going to need 4 variables. One for the return value, one for the input parameter, one for the output parameter and one for the result set. Here is the statement we want to execute. Note how much cleaner it is than if you wanted to do it using the current version of DTS. In the Parameter Mapping tab we are going to Add our variables and specify their direction and datatypes. In the Result Set tab we can now map our final variable to the rowset returned from the stored procedure. It really is as simple as that and we were amazed at how much easier it is than in DTS 2000. Passing in the SQL Statement from a Variable SSIS as we have mentioned is hugely more flexible than its predecessor and one of the things you will notice when moving around the tasks and the adapters is that a lot of them accept a variable as an input for something they need. The ExecuteSQL task is no different. It will allow us to pass in a string variable as the SQL Statement. This variable value could have been set earlier on from inside the package or it could have been populated from outside using a configuration. The ResultSet property is set to single row and we'll show you why in a second when we look at the variables. Note also the SQLSourceType property. Here's the General Tab again. Looking at the variable we have in this package you can see we have only two. One for the return value from the statement and one which is obviously for the statement itself. Again we need to map the Result name to our variable and this can be a named Result Name (The column name or alias returned by the query) and not 0. The expected result into our variable should be the amount of rows in the Person.Contact table and if we look in the watch window we see that it is. Passing in the SQL Statement from a File The final example we are going to show is a really interesting one. We are going to pass in the SQL statement to the task by using a file connection manager. The file itself contains the statement to run. The first thing we are going to need to do is create our file connection mananger to point to our file. Click in the connections tray at the bottom of the designer, right click and choose "New File Connection" As you can see in the graphic below we have chosen to use an existing file and have passed in the name as well. Have a look around at the other "Usage Type" values available whilst you are here. Having set that up we can now see in the connection manager tray our file connection manager sitting alongside our OLE-DB connection we have been using for the rest of these examples. Now we can go back to the familiar General Tab to set up how the task will accept our file connection as the source. All the other properties in this task are set up exactly as we have been doing for other examples depending on the options chosen so we will not cover them again here. We hope you will agree that the Execute SQL Task has changed considerably in this release from its DTS predecessor. It has a lot of options available but once you have configured it a few times you get to learn what needs to go where. We hope you have found this article useful.

Read the article
Manchester UG Presentation Video

In July I was invited to speak at the UK SQL Server UG event in Manchester. I spoke about Excel being a good data mining client. I was a little rushed at the end as Chris Testa-ONeill told me I had only 5 minutes to go when I had only been talking for 10 minutes. Apparently I have a reputation for running over my time allocation. At the event we also had a product demo from SQL Sentry around their BI monitoring dashboard solution. This includes SSIS but the main thrust was SSAS Then came Chris with a look at Analysis Services. If you have never heard Chris talk then take the opportunity now, he is a top class presenter and I am often found sat at the back of his classes. Here is the video link

Read the article
Creating packages in code – Execute SQL Task

The Execute SQL Task is for obvious reasons very well used, so I thought if you are building packages in code the chances are you will be using it. Using the task basic features of the task are quite straightforward, add the task and set some properties, just like any other. When you start interacting with variables though it can be a little harder to grasp so these samples should see you through. Some of these more advanced features are explained in much more detail in our ever popular post The Execute SQL Task, here I’ll just be showing you how to implement them in code. The abbreviated code blocks below demonstrate the different features of the task. The complete code has been encapsulated into a sample class which you can download (ExecSqlPackage.cs). Each feature described has its own method in the sample class which is mentioned after the code block. This first sample just shows adding the task, setting the basic properties for a connection and of course an SQL statement. Package package = new Package(); // Add the SQL OLE-DB connection ConnectionManager sqlConnection = AddSqlConnection(package, "localhost", "master"); // Add the SQL Task package.Executables.Add("STOCK:SQLTask"); // Get the task host wrapper TaskHost taskHost = package.Executables[0] as TaskHost; // Set required properties taskHost.Properties["Connection"].SetValue(taskHost, sqlConnection.ID); taskHost.Properties["SqlStatementSource"].SetValue(taskHost, "SELECT * FROM sysobjects"); For the full version of this code, see the CreatePackage method in the sample class. The AddSqlConnection method is a helper method that adds an OLE-DB connection to the package, it is of course in the sample class file too. Returning a single value with a Result Set The following sample takes a different approach, getting a reference to the ExecuteSQLTask object task itself, rather than just using the non-specific TaskHost as above. Whilst it means we need to add an extra reference to our project (Microsoft.SqlServer.SQLTask) it makes coding much easier as we have compile time validation of any property and types we use. For the more complex properties that is very valuable and saves a lot of time during development. The query has also been changed to return a single value, one row and one column. The sample shows how we can return that value into a variable, which we also add to our package in the code. To do this manually you would set the Result Set property on the General page to Single Row and map the variable on the Result Set page in the editor. Package package = new Package(); // Add the SQL OLE-DB connection ConnectionManager sqlConnection = AddSqlConnection(package, "localhost", "master"); // Add the SQL Task package.Executables.Add("STOCK:SQLTask"); // Get the task host wrapper TaskHost taskHost = package.Executables[0] as TaskHost; // Add variable to hold result value package.Variables.Add("Variable", false, "User", 0); // Get the task object ExecuteSQLTask task = taskHost.InnerObject as ExecuteSQLTask; // Set core properties task.Connection = sqlConnection.Name; task.SqlStatementSource = "SELECT id FROM sysobjects WHERE name = 'sysrowsets'"; // Set single row result set task.ResultSetType = ResultSetType.ResultSetType_SingleRow; // Add result set binding, map the id column to variable task.ResultSetBindings.Add(); IDTSResultBinding resultBinding = task.ResultSetBindings.GetBinding(0); resultBinding.ResultName = "id"; resultBinding.DtsVariableName = "User::Variable"; For the full version of this code, see the CreatePackageResultVariable method in the sample class. The other types of Result Set behaviour are just a variation on this theme, set the property and map the result binding as required. Parameter Mapping for SQL Statements This final example uses a parameterised SQL statement, with the coming from a variable. The syntax varies slightly between connection types, as explained in the Working with Parameters and Return Codes in the Execute SQL Taskhelp topic, but OLE-DB is the most commonly used, for which a question mark is the parameter value placeholder. Package package = new Package(); // Add the SQL OLE-DB connection ConnectionManager sqlConnection = AddSqlConnection(package, ".", "master"); // Add the SQL Task package.Executables.Add("STOCK:SQLTask"); // Get the task host wrapper TaskHost taskHost = package.Executables[0] as TaskHost; // Get the task object ExecuteSQLTask task = taskHost.InnerObject as ExecuteSQLTask; // Set core properties task.Connection = sqlConnection.Name; task.SqlStatementSource = "SELECT id FROM sysobjects WHERE name = ?"; // Add variable to hold parameter value package.Variables.Add("Variable", false, "User", "sysrowsets"); // Add input parameter binding task.ParameterBindings.Add(); IDTSParameterBinding parameterBinding = task.ParameterBindings.GetBinding(0); parameterBinding.DtsVariableName = "User::Variable"; parameterBinding.ParameterDirection = ParameterDirections.Input; parameterBinding.DataType = (int)OleDBDataTypes.VARCHAR; parameterBinding.ParameterName = "0"; parameterBinding.ParameterSize = 255; For the full version of this code, see the CreatePackageParameterVariable method in the sample class. You’ll notice the data type has to be specified for the parameter IDTSParameterBinding .DataType Property, and these type codes are connection specific too. My enumeration I wrote several years ago is shown below was probably done by reverse engineering a package and also the API header file, but I recently found a very handy post that covers more connections as well for exactly this, Setting the DataType of IDTSParameterBinding objects (Execute SQL Task). /// <summary> /// Enumeration of OLE-DB types, used when mapping OLE-DB parameters. /// </summary> private enum OleDBDataTypes { BYTE = 0x11, CURRENCY = 6, DATE = 7, DB_VARNUMERIC = 0x8b, DBDATE = 0x85, DBTIME = 0x86, DBTIMESTAMP = 0x87, DECIMAL = 14, DOUBLE = 5, FILETIME = 0x40, FLOAT = 4, GUID = 0x48, LARGE_INTEGER = 20, LONG = 3, NULL = 1, NUMERIC = 0x83, NVARCHAR = 130, SHORT = 2, SIGNEDCHAR = 0x10, ULARGE_INTEGER = 0x15, ULONG = 0x13, USHORT = 0x12, VARCHAR = 0x81, VARIANT_BOOL = 11 } Download Sample code ExecSqlPackage.cs (10KB)

Read the article
Downloading a file over HTTP the SSIS way

This post shows you how to download files from a web site whilst really making the most of the SSIS objects that are available. There is no task to do this, so we have to use the Script Task and some simple VB.NET or C# (if you have SQL Server 2008) code. Very often I see suggestions about how to use the .NET class System.Net.WebClient and of course this works, you can code pretty much anything you like in .NET. Here I’d just like to raise the profile of an alternative. This approach uses the HTTP Connection Manager, one of the stock connection managers, so you can use configurations and property expressions in the same way you would for all other connections. Settings like the security details that you would want to make configurable already are, but if you take the .NET route you have to write quite a lot of code to manage those values via package variables. Using the connection manager we get all of that flexibility for free. The screenshot below illustrate some of the options we have. Using the HttpClientConnection class makes for much simpler code as well. I have demonstrated two methods, DownloadFile which just downloads a file to disk, and DownloadData which downloads the file and retains it in memory. In each case we show a message box to note the completion of the download. You can download a sample package below, but first the code: Imports System Imports System.IO Imports System.Text Imports System.Windows.Forms Imports Microsoft.SqlServer.Dts.Runtime Public Class ScriptMain Public Sub Main() ' Get the unmanaged connection object, from the connection manager called "HTTP Connection Manager" Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing) ' Create a new HTTP client connection Dim connection As New HttpClientConnection(nativeObject) ' Download the file #1 ' Save the file from the connection manager to the local path specified Dim filename As String = "C:\Temp\Sample.txt" connection.DownloadFile(filename, True) ' Confirm file is there If File.Exists(filename) Then MessageBox.Show(String.Format("File {0} has been downloaded.", filename)) End If ' Download the file #2 ' Read the text file straight into memory Dim buffer As Byte() = connection.DownloadData() Dim data As String = Encoding.ASCII.GetString(buffer) ' Display the file contents MessageBox.Show(data) Dts.TaskResult = Dts.Results.Success End Sub End Class Sample Package HTTPDownload.dtsx (74KB)

Read the article
Searching for tasks with code – Executables and Event Handlers

Searching packages or just enumerating through all tasks is not quite as straightforward as it may first appear, mainly because of the way you can nest tasks within other containers. You can see this illustrated in the sample package below where I have used several sequence containers and loops. To complicate this further all containers types, including packages and tasks, can have event handlers which can then support the full range of nested containers again. Towards the lower right, the task called SQL In FEL also has an event handler not shown, within which is another Execute SQL Task, so that makes a total of 6 Execute SQL Tasks 6 tasks spread across the package. In my previous post about such as adding a property expressionI kept it simple and just looked at tasks at the package level, but what if you wanted to find any or all tasks in a package? For this post I've written a console program that will search a package looking at all tasks no matter how deeply nested, and check to see if the name starts with "SQL". When it finds a matching task it writes out the hierarchy by name for that task, starting with the package and working down to the task itself. The output for our sample package is shown below, note it has found all 6 tasks, including the one on the OnPreExecute event of the SQL In FEL task TaskSearch v1.0.0.0 (1.0.0.0) Copyright (C) 2009 Konesans Ltd Processing File - C:\Projects\Alpha\Packages\MyPackage.dtsx MyPackage\FOR Counter Loop\SQL In Counter Loop MyPackage\SEQ For Each Loop Wrapper\FEL Simple Loop\SQL In FEL MyPackage\SEQ For Each Loop Wrapper\FEL Simple Loop\SQL In FEL\OnPreExecute\SQL On Pre Execute for FEL SQL Task MyPackage\SEQ Top Level\SEQ Nested Lvl 1\SEQ Nested Lvl 2\SQL In Nested Lvl 2 MyPackage\SEQ Top Level\SEQ Nested Lvl 1\SQL In Nested Lvl 1 #1 MyPackage\SEQ Top Level\SEQ Nested Lvl 1\SQL In Nested Lvl 1 #2 6 matching tasks found in package. The full project and code is available for download below, but first we can walk through the project to highlight the most important sections of code. This code has been abbreviated for this description, but is complete in the download. First of all we load the package, and then start by looking at the Executables for the package. // Load the package file Application application = new Application(); using (Package package = application.LoadPackage(filename, null)) { int matchCount = 0; // Look in the package's executables ProcessExecutables(package.Executables, ref matchCount); ... // // ... // Write out final count Console.WriteLine("{0} matching tasks found in package.", matchCount); } The ProcessExecutables method is a key method, as an executable could be described as the the highest level of a working functionality or container. There are several of types of executables, such as tasks, or sequence containers and loops. To know what to do next we need to work out what type of executable we are dealing with as the abbreviated version of method shows below. private static void ProcessExecutables(Executables executables, ref int matchCount) { foreach (Executable executable in executables) { TaskHost taskHost = executable as TaskHost; if (taskHost != null) { ProcessTaskHost(taskHost, ref matchCount); ProcessEventHandlers(taskHost.EventHandlers, ref matchCount); continue; } ... // // ... ForEachLoop forEachLoop = executable as ForEachLoop; if (forEachLoop != null) { ProcessExecutables(forEachLoop.Executables, ref matchCount); ProcessEventHandlers(forEachLoop.EventHandlers, ref matchCount); continue; } } } As you can see if the executable we find is a task we then call out to our ProcessTaskHost method. As with all of our executables a task can have event handlers which themselves contain more executables such as task and loops, so we also make a call out our ProcessEventHandlers method. The other types of executables such as loops can also have event handlers as well as executables. As shown with the example for the ForEachLoop we call the same ProcessExecutables and ProcessEventHandlers methods again to drill down into the hierarchy of objects that the package may contain. This code needs to explicitly check for each type of executable (TaskHost, Sequence, ForLoop and ForEachLoop) because whilst they all have an Executables property this is not from a common base class or interface. This example was just a simple find a task by its name, so ProcessTaskHost really just does that. We also get the hierarchy of objects so we can write out for information, obviously you can adapt this method to do something more interesting such as adding a property expression. private static void ProcessTaskHost(TaskHost taskHost, ref int matchCount) { if (taskHost == null) { return; } // Check if the task matches our match name if (taskHost.Name.StartsWith(TaskNameFilter, StringComparison.OrdinalIgnoreCase)) { // Build up the full object hierarchy of the task // so we can write it out for information StringBuilder path = new StringBuilder(); DtsContainer container = taskHost; while (container != null) { path.Insert(0, container.Name); container = container.Parent; if (container != null) { path.Insert(0, "\\"); } } // Write the task path // e.g. Package\Container\Event\Task Console.WriteLine(path); Console.WriteLine(); // Increment match counter for info matchCount++; } } Just for completeness, the other processing method we covered above is for event handlers, but really that just calls back to the executables. This same method is called in our main package method, but it was omitted for brevity here. private static void ProcessEventHandlers(DtsEventHandlers eventHandlers, ref int matchCount) { foreach (DtsEventHandler eventHandler in eventHandlers) { ProcessExecutables(eventHandler.Executables, ref matchCount); } } As hopefully the code demonstrates, executables (Microsoft.SqlServer.Dts.Runtime.Executable) are the workers, but within them you can nest more executables (except for task tasks).Executables themselves can have event handlers which can in turn hold more executables. I have tried to illustrate this highlight the relationships in the following diagram. Download Sample code project TaskSearch.zip (11KB)

Read the article
Trace File Source Adapter

The Trace File Source adapter is a useful addition to your SSIS toolbox. It allows you to read 2005 and 2008 profiler traces stored as .trc files and read them into the Data Flow. From there you can perform filtering and analysis using the power of SSIS. There is no need for a SQL Server connection this just uses the trace file. Example Usages Cache warming for SQL Server Analysis Services Reading the flight recorder Find out the longest running queries on a server Analyze statements for CPU, memory by user or some other criteria you choose Properties The Trace File Source adapter has two properties, both of which combine to control the source trace file that is read at runtime. SQL Server 2005 and SQL Server 2008 trace files are supported for both the Database Engine (SQL Server) and Analysis Services. The properties are managed by the Editor form or can be set directly from the Properties Grid in Visual Studio. Property Type Description AccessMode Enumeration This property determines how the Filename property is interpreted. The values available are: DirectInput Variable Filename String This property holds the path for trace file to load (*.trc). The value is either a full path, or the name of a variable which contains the full path to the trace file, depending on the AccessMode property. Trace Column Definition Hopefully the majority of you can skip this section entirely, but if you encounter some problems processing a trace file this may explain it and allow you to fix the problem. The component is built upon the trace management API provided by Microsoft. Unfortunately API methods that expose the schema of a trace file have known issues and are unreliable, put simply the data often differs from what was specified. To overcome these limitations the component uses some simple XML files. These files enable the trace column data types and sizing attributes to be overridden. For example SQL Server Profiler or TMO generated structures define EventClass as an integer, but the real value is a string. TraceDataColumnsSQL.xml - SQL Server Database Engine Trace Columns TraceDataColumnsAS.xml - SQL Server Analysis Services Trace Columns The files can be found in the %ProgramFiles%\Microsoft SQL Server\100\DTS\PipelineComponents folder, e.g. "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsSQL.xml" "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsAS.xml" If at runtime the component encounters a type conversion or sizing error it is most likely due to a discrepancy between the column definition as reported by the API and the actual value encountered. Whilst most common issues have already been fixed through these files we have implemented specific exception traps to direct you to the files to enable you to fix any further issues due to different usage or data scenarios that we have not tested. An example error that you can fix through these files is shown below. Buffer exception writing value to column 'Column Name'. The string value is 999 characters in length, the column is only 111. Columns can be overridden by the TraceDataColumns XML files in "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsAS.xml". Installation The component is provided as an MSI file which you can download and run to install it. This simply places the files on disk in the correct locations and also installs the assemblies in the Global Assembly Cache as per Microsoft’s recommendations. You may need to restart the SQL Server Integration Services service, as this caches information about what components are installed, as well as restarting any open instances of Business Intelligence Development Studio (BIDS) / Visual Studio that you may be using to build your SSIS packages. Finally you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the Trace File Source transformation in the Choose Toolbox Items window. This process has been described in detail in the related FAQ entry for How do I install a task or transform component? We recommend you follow best practice and apply the current Microsoft SQL Server Service pack to your SQL Server servers and workstations. Please note that the Microsoft Trace classes used in the component are not supported on 64-bit platforms. To use the Trace File Source on a 64-bit host you need to ensure you have the 32-bit (x86) tools available, and the way you execute your package is setup to use them, please see the help topic 64-bit Considerations for Integration Services for more details. Downloads Trace Sources for SQL Server 2005 -- Trace Sources for SQL Server 2008 Version History SQL Server 2008 Version 2.0.0.382 - SQL Sever 2008 public release. (9 Apr 2009) SQL Server 2005 Version 1.0.0.321 - SQL Server 2005 public release. (18 Nov 2008) -- Screenshots

Read the article
Data Profiling without SSIS

Strangely enough for a predominantly SSIS blog, this post is all about how to perform data profiling without using SSIS. Whilst the Data Profiling Task is a worthy addition, there are a couple of limitations I’ve encountered of late. The first is that it requires SQL Server 2008, and not everyone is there yet. The second is that it can only target SQL Server 2005 and above. What about older systems, which are the ones that we probably need to investigate the most, or other vendor databases such as Oracle? With these limitations in mind I did some searching to find a quick and easy alternative to help me perform some data profiling for a project I was working on recently. I only had SQL Server 2005 available, and anyway most of my target source systems were Oracle, and of course I had short timescales. I looked at several options. Some never got beyond the download stage, they failed to install or just did not run, and others provided less than I could have produced myself by spending 2 minutes writing some basic SQL queries. In the end I settled on an open source product called DataCleaner. To quote from their website: DataCleaner is an Open Source application for profiling, validating and comparing data. These activities help you administer and monitor your data quality in order to ensure that your data is useful and applicable to your business situation. DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more. DataCleaner is developed in Java and licensed under LGPL. As quoted above it claims to support profiling, validating and comparing data, but I didn’t really get past the profiling functions, so won’t comment on the other two. The profiling whilst not prefect certainly saved some time compared to the limited alternatives. The ability to profile heterogeneous data sources is a big advantage over the SSIS option, and I found it overall quite easy to use and performance was good. I could see it struggling at times, but actually for what it does I was impressed. It had some data type niggles with Oracle, and some metrics seem a little strange, although thankfully they were easy to augment with some SQL queries to ensure a consistent picture. The report export options didn’t do it for me, but copy and paste with a bit of Excel magic was sufficient. One initial point for me personally is that I have had limited exposure to things of the Java persuasion and whilst I normally get by fine, sometimes the simplest things can throw me. For example installing a JDBC driver, why do I have to copy files to make it all work, has nobody ever heard of an MSI? In case there are other people out there like me who have become totally indoctrinated with the Microsoft software paradigm, I’ve written a quick start guide that details every step required. Steps 1- 5 are the key ones, the rest is really an excuse for some screenshots to show you the tool. Quick Start Guide Step 1 - Download Data Cleaner. The Microsoft Windows zipped exe option, and I chose the latest stable build, currently DataCleaner 1.5.3 (final). Extract the files to a suitable location. Step 2 - Download Java. If you try and run datacleaner.exe without Java it will warn you, and then open your default browser and take you to the Java download site. Follow the installation instructions from there, normally just click Download Java a couple of times and you’re done. Step 3 - Download Microsoft SQL Server JDBC Driver. You may have SQL Server installed, but you won’t have a JDBC driver. Version 3.0 is the latest as of April 2010. There is no real installer, we are in the Java world here, but run the exe you downloaded to extract the files. The default Unzip to folder is not much help, so try a fully qualified path such as C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\ to ensure you can find the files afterwards. Step 4 - If you wish to use Windows Authentication to connect to your SQL Server then first we need to copy a file so that Data Cleaner can find it. Browse to the JDBC extract location from Step 3 and drill down to the file sqljdbc_auth.dll. You will have to choose the correct directory for your processor architecture. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\auth\x86\sqljdbc_auth.dll. Now copy this file to the Data Cleaner extract folder you chose in Step 1. An alternative method is to edit datacleaner.cmd in the data cleaner extract folder as detailed in this data cleaner wiki topic, but I find copying the file simpler. Step 5 – Now lets run Data Cleaner, just run datacleaner.exe from the extract folder you chose in Step 1. Step 6 – Complete or skip the registration screen, and ignore the task window for now. In the main window click settings. Step 7 – In the Settings dialog, select the Database drivers tab, then click Register database driver and select the Local JAR file option. Step 8 – Browse to the JDBC driver extract location from Step 3 and drill down to select sqljdbc4.jar. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\sqljdbc4.jar Step 9 – Select the Database driver class as com.microsoft.sqlserver.jdbc.SQLServerDriver, and then click the Test and Save database driver button. Step 10 - You should be back at the Settings dialog with a the list of drivers that includes SQL Server. Just click Save Settings to persist all your hard work. Step 11 – Now we can start to profile some data. In the main Data Cleaner window click New Task, and then Profile from the task window. Step 12 – In the Profile window click Open Database Step 13 – Now choose the SQL Server connection string option. Selecting a connection string gives us a template like jdbc:sqlserver://<hostname>:1433;databaseName=<database>, but obviously it requires some details to be entered for example jdbc:sqlserver://localhost:1433;databaseName=SQLBits. This will connect to the database called SQLBits on my local machine. The port may also have to be changed if using such as when you have a multiple instances of SQL Server running. If using SQL Server Authentication enter a username and password as required and then click Connect to database. You can use Window Authentication, just add integratedSecurity=true to the end of your connection string. e.g jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true. If you didn’t complete Step 4 above you will need to do so now and restart Data Cleaner before it will work. Manually setting the connection string is fine, but creating a named connection makes more sense if you will be spending any length of time profiling a specific database. As highlighted in the left-hand screen-shot, at the bottom of the dialog it includes partial instructions on how to create named connections. In the folder shown C:\Users\<Username>\.datacleaner\1.5.3, open the datacleaner-config.xml file in your editor of choice add your own details. You’ll see a sample connection in the file already, just add yours following the same pattern. e.g.  <bean class="dk.eobjects.datacleaner.gui.model.NamedConnection"> <property name="name" value="SQLBits Local Connection" /> <property name="driverClass" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" /> <property name="connectionString" value="jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true" /> <property name="tableTypes"> <list> <value>TABLE</value> <value>VIEW</value> </list> </property> </bean> Step 14 – Once back at the Profile window, you should now see your schemas, tables and/or views listed down the left hand side. Browse this tree and double-click a table to select it for profiling. You can then click Add profile, and choose some profiling options, before finally clicking Run profiling. You can see below a sample output for three of the most common profiles, click the image for full size. I hope this has given you a taster for DataCleaner, and should help you get up and running pretty quickly.

Read the article
Logging connection strings

If you some of the dynamic features of SSIS such as package configurations or property expressions then sometimes trying to work out were your connections are pointing can be a bit confusing. You will work out in the end but it can be useful to explicitly log this information so that when things go wrong you can just review the logs. You may wish to develop this idea further and encapsulate such logging into a custom task, but for now lets keep it simple and use the Script Task. The Script Task code below will raise an Information event showing the name and connection string for a connection. Imports System Imports Microsoft.SqlServer.Dts.Runtime Public Class ScriptMain Public Sub Main() Dim fireAgain As Boolean ' Get the connection string, we need to know the name of the connection Dim connectionName As String = "My OLE-DB Connection" Dim connectionString As String = Dts.Connections(connectionName).ConnectionString ' Format the message and log it via an information event Dim message As String = String.Format("Connection ""{0}"" has a connection string of ""{1}"".", _ connectionName, connectionString) Dts.Events.FireInformation(0, "Information", message, Nothing, 0, fireAgain) Dts.TaskResult = Dts.Results.Success End Sub End Class Building on that example it is probably more flexible to log all connections in a package as shown in the next example. Imports System Imports Microsoft.SqlServer.Dts.Runtime Public Class ScriptMain Public Sub Main() Dim fireAgain As Boolean ' Loop through all connections in the package For Each connection As ConnectionManager In Dts.Connections ' Get the connection string and log it via an information event Dim message As String = String.Format("Connection ""{0}"" has a connection string of ""{1}"".", _ connection.Name, connection.ConnectionString) Dts.Events.FireInformation(0, "Information", message, Nothing, 0, fireAgain) Next Dts.TaskResult = Dts.Results.Success End Sub End Class By using the Information event it makes it readily available in the designer, for example the Visual Studio Output window (Ctrl+Alt+O) or the package designer Execution Results tab, and also allows you to readily control the logging by choosing which events to log in the normal way. Now before somebody starts commenting that this is a security risk, I would like to highlight good practice for building connection managers. Firstly the Password property, or any other similar sensitive property is always defined as write-only, and secondly the connection string property only uses the public properties to assemble the connection string value when requested. In other words the connection string will never contain the password. I have seen a couple of cases where this is not true, but that was just bad development by third-parties, you won’t find anything like that in the box from Microsoft. Whilst writing this code it made me wish that there was a custom log entry that you could just turn on that did this for you, but alas connection managers do not even seem to support custom events. It did however remind me of a very useful event that is often overlooked and fits rather well alongside connection string logging, the Execute SQL Task’s custom ExecuteSQLExecutingQuery event. To quote the help reference Custom Messages for Logging - Provides information about the execution phases of the SQL statement. Log entries are written when the task acquires connection to the database, when the task starts to prepare the SQL statement, and after the execution of the SQL statement is completed. The log entry for the prepare phase includes the SQL statement that the task uses. It is the last part that is so useful, how often have you used an expression to derive a SQL statement and you want to log that to make sure the correct SQL is being returned? You need to turn it one, by default no custom log events are captured, but I’ll refer you to a walkthrough on setting up the logging for ExecuteSQLExecutingQuery by Jamie.

Read the article
RegexClean Transformation

Use the power of regular expressions to cleanse your data right there inside the Data Flow. This transformation includes a full user interface for simple configuration, as well as advanced features such as error output configuration. Two regular expressions are used, a match expression and a replace expression. The transformation is designed around the named capture groups or match groups, and even supports multiple expressions. This allows for rich and complex expressions to be built, all through an easy to reuse transformation where a bespoke Script Component was previously the only alternative. Some simple properties are available for each column selected – Behaviour The two behaviour modes offer similar functionality but with a difference. Replace, replaces tokens with the input, and Emit overwrites the whole string. Cascade Cascade allows you to define multiple expressions, each on a new line. The match expression will be processed into one operation per line, which are then processed in order at run-time. Multiple replace expressions can also be specified, again each on a new line. If there is no corresponding replace expression for a match expression line, then the last replace expression will be used instead. It is common to have multiple match expressions, but only a single replace expression. Match Expression The expression used to define the named capture groups. This is where you can analyse the data, and tag or name elements within it as found by the match expression. Replace Expression The replace determines the final output. It will reference the named groups from the match expression and assembles them into the final output. If you want to use regular expressions to validate data then try the Regular Expression Transformation. Quick Start Guide Select a column. A new output column is created for each selected column; there is no option for in-place replacement of column values. One input column can be used to populate multiple output columns, just select the column again in the lower grid, using the Input Columns drop-down selector. Amend the output column name and size as required. They default to the same as the input column selected. Amend the behaviour as required, the default is Replace. Amend the cascade option as required, the default is true. Finally enter your match and replace regular expressions Quick Sample #1 Parse an email address and extract the user and domain portions. Format as a web address passing the user portion as a URL parameter. This uses two match groups, user and host, which correspond to the text before the @ and after it respectively. Behaviour is Emit, and cascade of false, we only have a single match expression. Match Expression ^(?<user>[^@]+)@(?<host>.+)$ Replace Expression - http://www.${host}?user=${user} Results Sample Input Sample Output [email protected] http://www.adventure-works.com?user=zheng0 The component is provided as an MSI file, however to complete the installation, you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the RegexClean Transformation from the list. Downloads The RegexClean Transformation is available for both SQL Server 2005 and SQL Server 2008. Please choose the version to match your SQL Server version, or you can install both versions and use them side by side if you have both SQL Server 2005 and SQL Server 2008 installed. RegexClean Transformation for SQL Server 2005 RegexClean Transformation for SQL Server 2008 Version History SQL Server 2005 Version 1.0.0.105 - Public Release (28 Jan 2008) SQL Server 2005 Version 1.0.0.105 - Public Release (28 Jan 2008) Screenshot

Read the article
Downloading a file over HTTP the SSIS way

This post shows you how to download files from a web site whilst really making the most of the SSIS objects that are available. There is no task to do this, so we have to use the Script Task and some simple VB.NET or C# (if you have SQL Server 2008) code. Very often I see suggestions about how to use the .NET class System.Net.WebClient and of course this works, you can code pretty much anything you like in .NET. Here I’d just like to raise the profile of an alternative. This approach uses the HTTP Connection Manager, one of the stock connection managers, so you can use configurations and property expressions in the same way you would for all other connections. Settings like the security details that you would want to make configurable already are, but if you take the .NET route you have to write quite a lot of code to manage those values via package variables. Using the connection manager we get all of that flexibility for free. The screenshot below illustrate some of the options we have. Using the HttpClientConnection class makes for much simpler code as well. I have demonstrated two methods, DownloadFile which just downloads a file to disk, and DownloadData which downloads the file and retains it in memory. In each case we show a message box to note the completion of the download. You can download a sample package below, but first the code: Imports System Imports System.IO Imports System.Text Imports System.Windows.Forms Imports Microsoft.SqlServer.Dts.Runtime Public Class ScriptMain Public Sub Main() ' Get the unmanaged connection object, from the connection manager called "HTTP Connection Manager" Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing) ' Create a new HTTP client connection Dim connection As New HttpClientConnection(nativeObject) ' Download the file #1 ' Save the file from the connection manager to the local path specified Dim filename As String = "C:\Temp\Sample.txt" connection.DownloadFile(filename, True) ' Confirm file is there If File.Exists(filename) Then MessageBox.Show(String.Format("File {0} has been downloaded.", filename)) End If ' Download the file #2 ' Read the text file straight into memory Dim buffer As Byte() = connection.DownloadData() Dim data As String = Encoding.ASCII.GetString(buffer) ' Display the file contents MessageBox.Show(data) Dts.TaskResult = Dts.Results.Success End Sub End Class Sample Package HTTPDownload.dtsx (74KB)

Read the article
Exploring packages in code

In my previous post Searching for tasks with code you can see how to explore the control flow side of packages, drilling down through containers, task, and event handlers, but it didn’t cover the data flow. I recently saw a post on the MSDN forum asking how to edit an existing package programmatically, and the sticking point was how to find the the data flow and the components inside. This post builds on some of the previous code and shows how you can explore all objects inside a package. I took the sample Task Search application I’d written previously, and came up with a totally pointless little console application that just walks through the package and writes out the basic type and name of every object it finds, starting with the package itself e.g. Package – MyPackage . The sample package we used last time showed nested objects as well an event handler; a OnPreExecute event tucked away on the task SQL In FEL. The output of this sample tool would look like this: PackageObjects v1.0.0.0 (1.0.0.26627) Copyright (C) 2009 Konesans Ltd Processing File - Z:\Users\Darren Green\Documents\Visual Studio 2005\Projects\SSISTestProject\EventsAndContainersWithExe cSQLForSearch.dtsx Package - EventsAndContainersWithExecSQLForSearch For Loop - FOR Counter Loop Task - SQL In Counter Loop Sequence Container - SEQ For Each Loop Wrapper For Each Loop - FEL Simple Loop Task - SQL In FEL Task - SQL On Pre Execute for FEL SQL Task Sequence Container - SEQ Top Level Sequence Container - SEQ Nested Lvl 1 Sequence Container - SEQ Nested Lvl 2 Task - SQL In Nested Lvl 2 Task - SQL In Nested Lvl 1 #1 Task - SQL In Nested Lvl 1 #2 Connection Manager – LocalHost The code is very similar to what we had previously, but there are a couple of extra bits to deal with connections and to look more closely at a task and see if it is a Data Flow task. For connections your just examine the package's Connections collection as shown in the abridged snippets below. First you can see the call to the ProcessConnections method, followed by the method itself. // Load the package file Application application = new Application(); using (Package package = application.LoadPackage(filename, null)) { // Write out the package name Console.WriteLine("Package - {0}", package.Name); ... More ... // Look and the connections ProcessConnections(package.Connections); } private static void ProcessConnections(Connections connections) { foreach (ConnectionManager connectionManager in connections) { Console.WriteLine("Connection Manager - {0}", connectionManager.Name); } } What we didn’t see in the sample output above was anything to do with the Data Flow, but rest assured the code now handles it too. The following snippet shows how each task is examined to see if it is a Data Flow task, and if so we can then loop through all of the components inside the data flow. private static void ProcessTaskHost(TaskHost taskHost) { if (taskHost == null) { return; } Console.WriteLine("Task - {0}", taskHost.Name); // Check if the task is a Data Flow task MainPipe pipeline = taskHost.InnerObject as MainPipe; if (pipeline != null) { ProcessPipeline(pipeline); } } private static void ProcessPipeline(MainPipe pipeline) { foreach (IDTSComponentMetaData90 componentMetadata in pipeline.ComponentMetaDataCollection) { Console.WriteLine("Pipeline Component - {0}", componentMetadata.Name); // If you wish to make changes to the component then you should really use the managed wrapper. // CManagedComponentWrapper wrapper = componentMetadata.Instantiate(); // wrapper.SetComponentProperty("PropertyName", "Value"); } } Hopefully you can see how we get a reference to the Data Flow task, and then use the ComponentMetaDataCollection to find out what components we have inside the pipeline. If you wanted to know more about the component you could look at the ObjectType or ComponentClassID properties. After that it gets a bit harder and you should get a reference to the wrapper object as the comment suggest and start using the properties, just like you would in the create packages samples, see our Code Development category for some for these examples. Download Sample code project PackageObjects.zip (5KB)

Read the article
Integrating Data Mining into your BI Solution (Presentation)

I recently gave a live meeting presentation to the UK User Group on Integrating Data Mining into your BI Solution. In it I talk about and demo ways of using your data mining models inside Integration Services, Analysis Services and Reporting Services. This is the first in a series of presentations I will be doing for the UG as I try to get the word out that Data Mining can be for the masses. You can download my deck and my line meeting recording from here.

Read the article
UK SQL Server User Group Events (June)

There are two events of note for the SQL Server User Group in June. The first is a Live Meeting event with myself on 04.06.2009. I am going to be looking at how to integrate Data Mining into your BI solution. I will be looking at putting DM into SSIS, SSAS and SSRS. It will be very demo oriented. You can register for the event here The second event is an event at Microsoft Reading on 10.06.2009. The evening will be a BI/Data Mining event. Chris Webb and myself are organizing it and we want speakers. We would love to see new faces up there telling us about their BI/DM solutions/Tips and Tricks. If you want to speak at the event then let me or Chris know. If you just want to attend then you can register here.

Read the article