Search Results

Search found 808 results on 33 pages for 'ssis snack'.

Page 16/33

  • Investigation: Can different combinations of components affect Dataflow performance?

    - by jamiet
    Introduction

    The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising; it's an incredibly complicated beast and we're abstracted away from that complexity via some boxes that go yellow, red or green and that have some lines drawn between them.

    [Screenshot: Example dataflow]

    In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input and all produce the same output, but which operate slightly differently by way of having different transformation components.

    I also want to use this blog post to challenge a commonly held opinion that I see perpetuated over and over again on the SSIS forum: that adding components to a dataflow will be detrimental to overall performance. It's not surprising that people think this (it is intuitive to think that more components means more work), however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and that the number of components is actually one of the less important ones; having said that, I have never proven that assertion, and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I'll happily call that one out as a myth even without any investigation!

    The Setup

    I have a 2GB datafile which is a list of 4,731,904 (~4.7 million) customer records with various attributes against them, and it contains two columns that I am going to use for categorisation: [YearlyIncome] and [BirthDate]. The data file is an SSIS raw format file, which I chose to use because it is the quickest way of getting data into a dataflow; given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible.

    In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories, and in order to do it I shall be using different combinations of SSIS' Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a Row Count component, again because this is the least resource-intensive means of terminating a datapath.

    The test is being carried out on a Dell XPS Studio laptop with a quad-core (8 logical procs) Intel Core i7 at 1.73GHz and a Samsung SSD hard drive, running SQL Server 2008 R2 on Windows 7.

    The Variables

    Here are the three combinations of components that I am going to test:

    1. One Conditional Split - A single Conditional Split component, CSPL Split by Month of Birth and Income Category, that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use:

    [Screenshot: CSPL Split by Month of Birth and Income Category expressions]

    2. Derived Column & Conditional Split - A Derived Column component, DER Income Category, that adds a new column [IncomeCategory] which will contain one of two possible text values {"LessThan50000", "GreaterThan50000"} and uses [YearlyIncome] to determine which value each row should get.
    A Conditional Split component, CSPL Split by Month of Birth and Income Category, then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use:

    [Screenshot: DER Income Category]
    [Screenshot: CSPL Split by Month of Birth and Income Category]

    3. Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each income category. Each of those outputs goes to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case, then, I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use:

    [Screenshot: CSPL Split by Income Category]
    [Screenshot: CSPL Split by Month of Birth 1 & 2]

    Each of these combinations will provide an input to one of the 24 Row Count components, just the same as before. For illustration, here is a screenshot of the dataflow containing three Conditional Split components:

    [Screenshot: dataflow with three Conditional Split components]

    As you can see, these dataflows have a fair bit of work to do, and remember that they're doing that work for 4.7 million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes:

    1. The dataflow containing just one Conditional Split (i.e. #1) will be quicker
    2. There is no significant difference between any of them
    3. One of the two dataflows containing multiple transformation components will be quicker

    Regardless of which of those outcomes comes to pass, we will have learnt something, and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS.

    The Results and Analysis

    The table below shows all of the executions, 10 for each dataflow, along with the average for each and a standard deviation. All durations are in seconds. I'm pasting a screenshot because I frankly can't be bothered with the faffing about needed to make a presentable HTML table.

    [Screenshot: results table]

    It is plain to see from the averages that the dataflow containing three Conditional Splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it, and I'll explain it below. Before progressing, a side note: the standard deviation for the "Three Conditional Splits" dataflow is orders of magnitude smaller, indicating that its performance can also be predicted with much greater confidence.

    The Explanation

    I refer you to the screenshot above that shows how CSPL Split by Month of Birth and Income Category in the first dataflow is set up. Observe that there is a case for each combination of month of birth and income category, 24 in total. These expressions get evaluated in the order that they appear, hence if we assume that month of birth and income category are uniformly distributed in the dataset, we can deduce that the expected number of expression evaluations for each row is (1 + 24) / 2 = 12.5, i.e. the minimum plus the maximum, divided by 2. Now take a look at the screenshots for the second dataflow.
    We are doing one expression evaluation in DER Income Category, and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before; only the expression differs slightly. In this case, then, we have 1 + 12.5 = 13.5 expected evaluations for each row, which would account for the slightly longer average execution time for this dataflow.

    Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations, thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 and CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for, thus the expected number of expression evaluations in each is (1 + 12) / 2 = 6.5. There are two of them, so the total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5.

    14.5 is still more than 12.5 and 13.5 though, so why is the third dataflow so much quicker? Simple: the conditional expressions in the first two dataflows have two boolean predicates to evaluate, one for income category and one for month of birth; the expressions in the Conditional Splits in the third dataflow only have one predicate each, thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between:

        MONTH(BirthDate) == 1 && YearlyIncome <= 50000

    and

        MONTH(BirthDate) == 1

    In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row, whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra evaluations by 4.7 million rows and you get a significant number of extra CPU cycles; that's where our duration difference comes from.

    The Wrap-up

    The obvious point here is that adding new components to a dataflow isn't necessarily going to make it go any slower; moreover, you may be able to achieve significant improvements by splitting logic over multiple components rather than cramming it into one. Performance tuning is all about reducing the amount of work that needs to be done, and that doesn't necessarily mean using fewer components; indeed, sometimes you may be able to reduce the workload in ways that aren't immediately obvious, as I think I have proven here.

    Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results - let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS.

    If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out:

    - Inequality joins, Asynchronous transformations and Lookups
    - Destination Adapter Comparison
    - Don't turn the dataflow into a cursor
    - SSIS Dataflow - Designing for performance (webinar)

    Any comments? Let me know! @Jamiet
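
    To make the expected-evaluation arithmetic above concrete, here is a quick back-of-envelope model of the three designs. This is my own sketch rather than anything from Jamie's package, and it assumes (as the post does) that month of birth and income category are uniformly distributed; note that it counts per-row cost, so only one of the two month splits applies to any given row.

        // Rough model of expected predicate evaluations per row for the three
        // dataflow designs, multiplied out over the 4.7 million test rows.
        using System;

        class EvaluationCostModel
        {
            // Expected number of cases tried by a Conditional Split that tests
            // `cases` uniformly likely conditions in order: (1 + cases) / 2.
            static double ExpectedCasesTried(int cases)
            {
                return (1 + cases) / 2.0;
            }

            static void Main()
            {
                const double rows = 4731904;

                // #1: one 24-case split, two predicates per case expression.
                double oneSplit = ExpectedCasesTried(24) * 2;

                // #2: one Derived Column evaluation plus the same 24-case split.
                double derivedPlusSplit = 1 + ExpectedCasesTried(24) * 2;

                // #3: a 2-case split, then one of two 12-case splits; every case
                // expression contains a single predicate.
                double threeSplits = ExpectedCasesTried(2) + ExpectedCasesTried(12);

                Console.WriteLine("#1 one split:       {0,15:N0}", oneSplit * rows);
                Console.WriteLine("#2 derived + split: {0,15:N0}", derivedPlusSplit * rows);
                Console.WriteLine("#3 three splits:    {0,15:N0}", threeSplits * rows);
            }
        }

    Run over 4.7 million rows, the first two designs cost roughly 118 and 123 million predicate evaluations against around 38 million for the third, which is consistent with the duration gap observed above.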

    Read the article

  • Presenting to the New England SQL Server Users Group 10 Jun 2010!

    - by andyleonard
    I am honored to present Applied SSIS Design Patterns to the New England SQL Server Users Group on 10 Jun 2010! This is a reprise of the spotlight session presented at the PASS Summit 2009.

    Abstract: "Design Patterns" is more than a trendy buzz phrase; design patterns are a way of breaking down complex development projects into manageable tasks. They lend themselves to several development methodologies and apply to SSIS development. Chances are you're using your own design patterns now! In this spotlight...(read more)

    Read the article

  • SSIS 2008 - How to read from a SQL Server Compact Edition file?

    - by Gustavo Cavalcanti
    I can see "SQL Server Compact Destination" under Data Flow Destinations, but I am looking for its source counterpart. If I choose ADO.Net source and create a new connection, there's no provider for SQL CE. What am I missing? Thanks! Update: I am able to create a "Data Source" (under "Data Sources" folder in my SSIS project") that connects to an existing Sql CE file. But how can I use this Data Source in my data flow?

    Read the article

  • Can SQL Server Compact be used as both a Source and Destination in SSIS?

    - by Rich
    I'm wondering if SQL Server Compact Edition can be used as both a Source and Destination in an SSIS dataflow. I know I can setup a SQLMOBILE connection manager, and I've found some information that mentions using it as a Destination, but nothing on using it as a Source. What I'm looking to do is to transfer data from one SQL Server Compact file to another.

    Read the article

  • How do I reference SSIS on a build machine without installing SQL Server 2008 Client Tools?

    - by freshr
    I need to build SSIS packages on a build machine, and do not want the overhead of installing SQL Server Management Studio on this machine. A SQL Server 2008 SDK would be ideal, but I could not find where to download it. The DLLs I require are (for example):

    - Microsoft.SQLServer.ManagedDTS
    - Microsoft.SqlServer.PipelineHost
    - Microsoft.SqlServer.DTSPipelineWrap
    - Microsoft.SQLServer.DTSRuntimeWrap

    I could attempt to copy them to the build machine individually, but I would rather just use an SDK if possible. Where can I get the SDK, or alternatively, what suggestions are there?

    Read the article

  • SSIS Send Mail task limited to 255 characters in address?

    - by CodeByMoonlight
    Is this a bug, or some hidden limit I can't find any documentation about? When creating a Send Mail Task in SSIS 2008, the TO, CC and BCC fields seem to have a hidden limit of 255 characters. I'm aware this is the standard limit for individual email addresses, but all three are commonly used for multiple addresses and the comment for the To field even says "separate the recipients with semicolons". But nevertheless, it truncates the address to a maximum of 255 characters. Bug, non-obvious standard, or something I'm missing? Any way around this? We were trying to build a CC list dynamically, but this has caused a rethink.
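
    One way people sidestep the limit (not confirmed in this thread) is to replace the Send Mail Task with a Script Task that sends via System.Net.Mail, which imposes no cap on the recipient list. A sketch, with the SMTP server, addresses and variable name as hypothetical placeholders:

        // Inside the Script Task's ScriptMain (SSIS 2008, C#).
        using System.Net.Mail;

        public void Main()
        {
            // A semicolon-separated CC list assembled earlier in the package.
            string ccList = (string)Dts.Variables["User::CcList"].Value;

            using (var message = new MailMessage("etl@example.com", "dba-team@example.com"))
            {
                foreach (string address in ccList.Split(';'))
                {
                    string trimmed = address.Trim();
                    if (trimmed.Length > 0)
                        message.CC.Add(trimmed); // no 255-character truncation here
                }
                message.Subject = "Nightly load complete";
                message.Body = "The nightly load finished.";

                new SmtpClient("smtp.example.com").Send(message);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }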

    Read the article

  • Code changes in the Dtsx file in an SSIS package not reflected after deploying and running on the server

    - by SKumar
    I have a folder called Test-Deploy in which I keep all the dtsx files, the manifest and the configuration files. Whenever I want to deploy the SSIS package, I run the manifest file in this folder and deploy. The problem is that I had to change one of the dtsx files. I opened only that particular dtsx file in BI Studio, updated it and built it. After the build, I copied the dtsx file from the bin folder to my Test-Deploy folder. When I deployed and ran this new package from the Test-Deploy folder, the changes I made were not reflected in the result; I could not find any difference in the results before and after the change. My question is: has the server saved my previous dtsx file somewhere, and is it executing that one instead of the new one?

    Read the article

  • SSIS 2005 - How to Import a Fixed Width Flat File?

    - by Greg
    I have a flat file that looks something like this:

        junk I don't care about

        columns names
        val1 val2 val3
        val1 val2 val3
        columns names
        val1 val2 val3

    I only care about the lines with values. These value lines are all in fixed-width format and have the same line length. The other junk lines and column-name lines can have any line width. When I try the flat file fixed width option or the ragged right option, the preview looks all wrong. Any ideas what the easiest way to get this into SSIS is?
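
    One way to tackle this (my sketch, not from the question) is to treat each line as a single string, either by pre-filtering the file or inside a Script Component, keeping only lines of the fixed value-line length and slicing them by position. The line length and field widths below are hypothetical; adjust them to the real layout.

        // Keep only the fixed-width value lines and split them into fields.
        using System;
        using System.IO;

        class FixedWidthFilter
        {
            const int ValueLineLength = 15;             // only data lines have this exact length
            static readonly int[] Widths = { 5, 5, 5 }; // val1, val2, val3

            static void Main()
            {
                foreach (string line in File.ReadAllLines(@"C:\data\input.txt"))
                {
                    if (line.Length != ValueLineLength)
                        continue; // skip junk and column-name lines

                    var fields = new string[Widths.Length];
                    int pos = 0;
                    for (int i = 0; i < Widths.Length; i++)
                    {
                        fields[i] = line.Substring(pos, Widths[i]).Trim();
                        pos += Widths[i];
                    }
                    Console.WriteLine(string.Join("\t", fields));
                }
            }
        }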

    Read the article

  • SSIS - Can I get the column schema for a flat file source from a database?

    - by Steve Clement
    We receive a nightly data export from a vendor in the form of about 10 tab-delimited flat files without column headers. In addition, the vendor provides us with the SQL scripts for the database tables so that we can import the files into our system. Unfortunately, the vendor recently changed the schema for the flat files. Each file has upwards of 150 columns, and having to go through the DB schema and adjust column types on a Flat File Data Source in SSIS is extremely time consuming, not to mention a royal pain. Since I know the file data layout from the database schema, is there any way I can dynamically pull that into a Flat File source to set the columns correctly? Or am I just stuck with manually setting everything?
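
    As a starting point (my own sketch, not a built-in SSIS feature), you can at least pull the column layout out of the database rather than retyping it: query INFORMATION_SCHEMA.COLUMNS and use the result to script the Flat File connection's columns, for example when generating the package programmatically through the SSIS API. The connection string and table name are hypothetical.

        // Dump column name, type and length for a table, in ordinal order.
        using System;
        using System.Data.SqlClient;

        class SchemaDump
        {
            static void Main()
            {
                const string sql =
                    @"SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
                      FROM INFORMATION_SCHEMA.COLUMNS
                      WHERE TABLE_NAME = @table
                      ORDER BY ORDINAL_POSITION";

                using (var conn = new SqlConnection("Server=.;Database=Staging;Integrated Security=SSPI"))
                using (var cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@table", "VendorExtract1");
                    conn.Open();
                    using (SqlDataReader reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            Console.WriteLine("{0}\t{1}\t{2}",
                                reader.GetString(0),
                                reader.GetString(1),
                                reader.IsDBNull(2) ? "" : reader.GetValue(2).ToString());
                        }
                    }
                }
            }
        }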

    Read the article

  • Tuesday 6th Manchester SQL User Group - Chris Testa-O'Neil (Loading a datawarehouse using SSIS) and

    - by tonyrogerson
    Chris will give a talk on Loading a Datawarehouse using SQL Server Integration Services, and Tony Rogerson will give a talk on Database Design: Normalisation/Denormalisation and using Surrogate Keys - practicalities, pitfalls and benefits in Microsoft SQL Server. Registration is essential, which you can do here: http://sqlserverfaq.com?eid=218. Come and join us for an evening of SQL Server discussion; as well as the two formal sessions by Chris Testa-O'Neil and Tony Rogerson there will be a chance...(read more)

    Read the article

  • Upgrading SSIS Custom Components for SQL Server 2012

    Having finally got around to upgrading my custom components to SQL Server 2012, I thought I'd share some notes on the process. One of the goals was minimal duplication, so the same code files are used to build the 2008 and 2012 components; I just have a separate project file. The high-level steps are listed below, followed by some more details.

    - Create a 2012 copy of the project file
    - Upgrade the project: just open the new project file in VS2010
    - Change the target framework to .NET 4.0
    - Set a conditional compilation symbol for DENALI
    - Change any conditional code, including the assembly version and UI type name
    - Edit the project file to change referenced assemblies for 2012

    Change target framework to .NET 4.0

    Open the project properties. On the Application page, change the Target framework to .NET Framework 4.

    Set conditional compilation symbol for DENALI

    Re-open the project properties. On the Build tab, first change the Configuration to All Configurations, then set a Conditional compilation symbol of DENALI.

    Change any conditional code, including assembly version and UI type name

    The value doesn't have to be DENALI; it can actually be anything you like, that is just what I use. It is how I control sections of code that vary between versions. There were several API changes between 2005 and 2008, as well as interface name changes. Whilst we don't have the same issues between 2008 and 2012, I still have some sections of code that do change, such as the assembly attributes.

        #if DENALI
        [assembly: AssemblyDescription("Data Generator Source for SQL Server Integration Services 2012")]
        [assembly: AssemblyCopyright("Copyright © 2012 Konesans Ltd")]
        [assembly: AssemblyVersion("3.0.0.0")]
        #else
        [assembly: AssemblyDescription("Data Generator Source for SQL Server Integration Services 2008")]
        [assembly: AssemblyCopyright("Copyright © 2008 Konesans Ltd")]
        [assembly: AssemblyVersion("2.0.0.0")]
        #endif

    The Visual Studio editor automatically formats the code based on the current compilation symbols, hence in this case the 2008 code is greyed out to indicate it is disabled. As you can see in the previous example, I have distinct assembly version attributes, ensuring I can run the 2008 and 2012 versions of my component side by side.

    For custom components with a user interface, be sure to update the UITypeName property of the DtsTask or DtsPipelineComponent attributes. As above, I use the conditional compilation symbol to control the code.

        #if DENALI
        [DtsTask
        (
            DisplayName = "File Watcher Task",
            Description = "File Watcher Task",
            IconResource = "Konesans.Dts.Tasks.FileWatcherTask.FileWatcherTask.ico",
            UITypeName = "Konesans.Dts.Tasks.FileWatcherTask.FileWatcherTaskUI,Konesans.Dts.Tasks.FileWatcherTask,Version=3.0.0.0,Culture=Neutral,PublicKeyToken=b2ab4a111192992b",
            TaskContact = "File Watcher Task; Konesans Ltd; Copyright © 2012 Konesans Ltd; http://www.konesans.com"
        )]
        #else
        [DtsTask
        (
            DisplayName = "File Watcher Task",
            Description = "File Watcher Task",
            IconResource = "Konesans.Dts.Tasks.FileWatcherTask.FileWatcherTask.ico",
            UITypeName = "Konesans.Dts.Tasks.FileWatcherTask.FileWatcherTaskUI,Konesans.Dts.Tasks.FileWatcherTask,Version=2.0.0.0,Culture=Neutral,PublicKeyToken=b2ab4a111192992b",
            TaskContact = "File Watcher Task; Konesans Ltd; Copyright © 2004-2008 Konesans Ltd; http://www.konesans.com"
        )]
        #endif
        public sealed class FileWatcherTask : Task, IDTSComponentPersist, IDTSBreakpointSite, IDTSSuspend
        {
            // .. code goes on...
        }

    Shown below is another example I found that needed changing. I borrow one of the MS editors and use it against a custom property, but need to ensure I reference the correct version of the MS controls assembly. This section of code is actually shared between the 2005, 2008 and 2012 versions of my component, hence it tests for both the DENALI and KATMAI symbols.

        #if DENALI
        const string multiLineUI = "Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91";
        #elif KATMAI
        const string multiLineUI = "Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91";
        #else
        const string multiLineUI = "Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91";
        #endif

        // Create Match Expression parameter
        IDTSCustomPropertyCollection100 propertyCollection = outputColumn.CustomPropertyCollection;
        IDTSCustomProperty100 property = propertyCollection.New();
        property.Name = MatchParams.Name;
        property.Description = MatchParams.Description;
        property.TypeConverter = typeof(MultilineStringConverter).AssemblyQualifiedName;
        property.UITypeEditor = multiLineUI;
        property.Value = MatchParams.DefaultValue;

    Edit project file to change referenced assemblies for 2012

    We now need to edit the project file itself. Open MyComponent2012.csproj in your favourite text editor, and then perform a couple of find-and-replaces as listed below:

    Find:    Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91
    Replace: Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91
    Comment: Change the assembly reference versions from SQL Server 2008 to SQL Server 2012.

    Find:    Microsoft SQL Server\100\
    Replace: Microsoft SQL Server\110\
    Comment: Change any assembly reference hint path locations from SQL Server 2008 to SQL Server 2012.

    If you use any Build Events during development, such as copying the component assembly to the DTS folder, or calling GACUTIL to install it into the GAC, you can also change these now. An example of my new post-build event for a pipeline component is shown below, which uses the .NET 4.0 path for GACUTIL. It also uses the 110 folder location instead of 100 for SQL Server 2008, but that was covered by the previous find and replace.

        "C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\NETFX 4.0 Tools\gacutil.exe" /if "$(TargetPath)"
        copy "$(TargetPath)" "%ProgramFiles%\Microsoft SQL Server\110\DTS\PipelineComponents" /Y

    Read the article

  • Importing Excel data into SSIS 2008 using Data Conversion Transformation

    Despite its benefits, SQL Server Integration Services Import Export Wizard has a number of limitations, resulting in part from a new set of rules that eliminate implicit data type conversion mechanisms present in Data Transformation Services. This article discusses a method that addresses such limitations, focusing in particular on importing the content of Excel spreadsheets into SQL Server.

    Read the article

  • The SSIS File Sweeper

    Moving files around is a task that many DBAs need to accomplish. Whether as part of an import or export process, or just for administration purposes, this new article from JD Gonzalez can help you solve this problem.

    Read the article

  • Sorting data in the SSIS Pipeline (Video)

    In this post I want to show a couple of ways to order the data that comes into the pipeline. A number of people have asked me about this, primarily because there are a number of ways to do it, but also because some components in the pipeline take sorted inputs. One of the methods I show is visually easy to understand and the other is less visual but potentially more performant.

    Read the article
