Search Results

Search found 808 results on 33 pages for 'ssis snack'.

Page 20/33 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >

  • where to find Microsoft.SqlServer.Dts.Pipeline

    - by CoffeeAddict
    I'm opening a 2005 SSIS pakage and also an old C# project..both are in this solution here. I'm missing namespaces and I can't find the assemblies to add back to my references folder for my C# Project Microsoft.SqlServer.Dts.Pipeline for example is not one I find in the list of references in the .NET references tab. So how the hell do I get these SQL Server assemblies? Do I have to install the SQL Server 2008 sdk? Lost.

    Read the article

  • select xml data column into flat reporting table

    - by Bernard
    We have a xml column in SQL Server 2008. We need to do reporting off the data in the xml so we're going to select the xml into a flat table. The flat table has columns that correspond to various nodes in the xml. What is the best way to do this using SSIS? Is this a good approach? Or should we just try and write the reports directly off the xml column?

    Read the article

  • timestamp in C#

    - by praveen
    Hi all, I need to convert the system datetime to timestamp in script task SSIS. input date format: 29/05/2010 2:36 AM output format: 29-15-2010 14:36:00 thanks prav

    Read the article

  • SQL Server Agent 2005 job runs but provides no output

    - by alimack
    Essentially I have a job which runs in BIDS and as as a stand lone package and while it runs under the SQL Server Agent it doesn't complete properly (no error messages though). The job steps are: 1) Delete all rows from table; 2) Use For each loop to fill up table from Excel spreasheets; 3) Clean up table. I've tried this MS page (steps 1 & 2), didn't see any need to start changing from Server side security. Also SQLServerCentral.com for this page, no resolution. How can I get error logging or a fix?

    Read the article

  • DMZ and LAN on the same Windows Storage Server 2008 R2

    - by Sergei
    We are moving from EMC Celerra NS20 to Windows Storage Server 2008 R2. I would like to use deduplication feature in WSS (Single Instance Storage) for hosting data for our external FTP server.The idea is to use NFS server on WSS as datastore for Linux FTP server located in DMZ and CIFS services for servers in LAN. Using Celerra fileserver I was able to create multiple instances of fileservers with multiple virtual interfaces and separate filesystems so all data and networks would be separated. Would it be possible to do something similiar on WSS?

    Read the article

  • dtexec with password

    - by user1602996
    I have added a new step in my job activity monitor which runs ssis package(encrepted with password). dtexec /f "\\sw-conf-dev-01\projects\dtsx\Email.dtsx" /de "ssispassword" error message: Description: The package is encrypted with a password. The password was not specified, or is not correct. End Error Could not load package "\sw-conf-dev-01\projects\dtsx\Email.dtsx" because of error 0xC0014037. Description: Failed to remove package protection with error 0xC0014037 "The package is encrypted with a password. The password was not specified, or is not correct.". This occurs in the CPackage::LoadFromXML method I have used the same password in the package as well, but i don't know why I'm still getting an error message. Any ideas?

    Read the article

  • Visual Studio and SQL Server - correct installation sequence?

    - by cdonner
    I am rebuilding my development machine. This issue is not new to me, but I don't remember the solution. I started with VS 2008 Pro, then SQL 2008 Developer, then the SQL SP1, then VS SP1. The result is that I cannot open SSIS projects (see the error below). What is the correct order so that I can avoid the installation of SQL Server Express and still have all the features working? --------------------------- Microsoft Visual Studio --------------------------- Package Load Failure Package 'DataWarehouse VSIntegration layer' has failed to load properly ( GUID = {4A0C6509-BF90-43DA-ABEE-0ABA3A8527F1} ). Please contact package vendor for assistance. Application restart is recommended, due to possible environment corruption. Would you like to disable loading this package in the future? You may use 'devenv /resetskippkgs' to re-enable package loading. --------------------------- Yes No ---------------------------

    Read the article

  • SSIS- Sharepoint list data transfer issue

    - by Vicky
    Hi , We are trying to transfer data from oracle database (about 60,0000) records only to a sharepoint list using SSIS. But we are getting following error when records reaches around 19000 . The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020 and System.ServiceModel.ProtocolException: The remote server returned an unexpected response: (400) Bad Request. Earlier we thought if could because of Sharepoint list limit so we tried by reducing two of the columns and then it has went fine. So we left with one of the column of Datatype DT_STR and length 400 in oracle beacuse of which issue might be happening, It is mapped to sharepoint custom list field of multiline type. We also verified if length of field is issue but in oracle DB for all records max length for this column is only 239 so length issue is also ruled out. Any one who has faced this kind of issue or knows cause of this issue.Kindly let us know.. Thanks and regards, Vicky

    Read the article

  • Best Practise to populate Fact and Dimension Tables from Transactional Flat DB

    - by alex25
    Hi! I want to populate a star schema / cube in SSIS / SSAS. I prepared all my dimension tables and my fact table, primary keys etc. The source is a 'flat' (item level) table and my problem is now how to split it up and get it from one into the respective tables. I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?! Thanks, alexl

    Read the article

  • ETL mechanisms for MySQL to SQL Server over WAN

    - by Troy Hunt
    I’m looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 with an external host down to an internal SQL Server 05 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection to an internal corporate environment before some BI work is performed. This should just be change-sets making their way down each night. I’m interested in thoughts on the ETL mechanisms people have successfully used in similar scenarios before. SSIS seems like a potential candidate; can anyone comment on the suitability for this scenario? Alternatively, other thoughts on how to do this in a cost-conscious way would be most appreciated. Thanks!

    Read the article

  • Compare date fields in SQL server

    - by huslayer
    Hi all, I've a flat file that I cleaned the data out using SSIS, the output looks like that : MEDICAL ADMIT PATIENT PATIENT DATE OF DX REC NO DATE NUMBER NAME DISCHARGE Code DRG # 123613 02/16/09 12413209 MORIBALDI ,GEMMA 02/19/09 428.20 988 130897 01/23/09 12407193 TINLEY ,PATRICIA 01/23/09 535.10 392 139367 02/27/09 36262509 THARPE ,GLORIA 03/05/09 562.10 392 141954 02/25/09 72779499 SHUMATE ,VALERIA 02/25/09 112.84 370 141954 03/07/09 36271732 SHUMATE ,VALERIA 03/10/09 493.92 203 145299 01/21/09 12406294 BAUGH ,MARIA 01/21/09 366.17 117 and the report (final results) attached in the screen shot from the final excel report. so what's happening is IF the same name or same account number is duplicate, that means the patient has entered the hospital again and needs to be included in the report. ![alt text][1] what I need to do is... Eliminate any rows that is NOT duplicate (not everybody in this file has been admitted again) and compare the dates to get the ReAdmitdate and ReDischargedate I dumped the data into a SQL table and trying to compare the dates to figure out "ReAdmitdate" and "ReDischargedate" any help is appreciated. Thanks [link text][1]

    Read the article

  • sqL table is sorted by default

    - by Pramodtech
    I have simple SSIS package where I import data from flat file into sql table(MS sql 2005). File contains 70k rows and table has no primary key. Importing is sucessful but when I open sql table the order of rwos is different from the that of file. After observing closely I see that data in table is sorted by default by first column. Why this is happening? and how I can avoid default sort? Thanks.

    Read the article

  • Loading Fact Table + Lookup / UnionAll for SK lookups.

    - by Nev_Rahd
    I got to populate FactTable with 12 lookups to dimension table to get SK's, of which 6 are to different Dim Tables and rest 6 are lookup to same DimTable (type II) doing lookup to same natural key. Ex: PrimeObjectID = lookup to DimObject.ObjectID = get ObjectSK and got other columns which does same OtherObjectID1 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID2 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID3 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID4 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID5 = lookup to DimObject.ObjectID = get ObjectSK for such multiple lookup how should go in my SSIS package. for now am using lookup / unionall foreach lookup. Is there a better way to this.

    Read the article

  • MaximumErrorCount has now effect

    - by Rob Bowman
    Hi I'm quite new to SSIS - using 2008 version. I have a job that uses a few data flow tasks. On the third one I'm getting a primary key violation on the last row that it needs to insert, but only sometimes! I'd like to ignore this problem for now and let the job continue. I have set the MaximumErrorCount property to 10 for the DataFlowTaks, the SequenceContainer and for the Package but still taks fails and this causes the package to stop. Could anyone please advise how I can get the package to ignore the error? Thanks Rob.

    Read the article

  • Firing events in script task

    - by Anonymouslemming
    I've got an SSIS project where I am constructing an SQL command based on some variables. I'm constructing the command in a script task, and want to output the constructed SQL to the 'Execution Results' window. I am trying to do this using a FireInformation line from inside my script as follows: Dts.Events.FireInformation(99, "test", "Make this event appear!", "", 0, true); However, in the script editor when editing ScriptMain.cs, that line is underlined in red, and on mouseover, I get the following message: Error: The best overloaded method match for 'Microsoft.SqlServer.Dts.Tasks.ScriptTask.EventsObjectWrapper.FireInformation(int, string, string, string, int, ref bool') has some invalid arguments As a result, my script does not compile and I cannot execute it. Any idea what I'm doing wrong here, or what I need to change to be able to see the values of my variables at this point in the Execution output?

    Read the article

  • Tutorial needed to learn Microsoft BI

    - by Zerotoinfinite
    I am a .NET developer and have a little experience in Sql Server. As we all know that .NET developer mostly deals with stored procedures in Sql Server. I am willing to learn BI (SSIS,SSRS,SSAS) from home. I have seen some article but I don't think that's going to work for me. I am seeking some Free video tutorials I can get to learn BI. Imp: I am asking this question here because I believe that many of you are genius BI and .NET developers so you guys can help me figure out which tutorial will take benift from my .net skills. Apologize : I used free in my question. I am guilt to say that I am one of the guy who is not willing to invest but the reason is this that currently I am Jobless. :-(

    Read the article

  • Excel Import how would you do it?

    - by Rico
    Ok i have a Excel import written. It uses excel automation to go through all the records and get the job done. BUt how would you do it if you had to do it? Would you use SSIS? Would You use a Dataconnection? I am really confused as to the best way to get this done properly. So that it doesn't slow down the actual application for the other clients when one client does an import. Thanks

    Read the article

  • Problem with the row count transform

    - by abkl
    Hi, I currently deployed an SSIS package (Developed on the 2005 version) (developed on my local server) in a pre production environment for testing. I have used the Row count transform to get a count of good/bad records. It works fine on my local system . However when i deploy this on the pre prod server, the row count does not work! (as in it does not recognize the vairbales i have assigned to the relevant transofm - no drop down abvaliable in the variables attribute part. tried deleting and adding a new transoform.. no luck. Strangely this does not work for any of the other packages also present/deployed on the same server (tried this out by dropping an rc tramsform onto an existing package... same problem) Any suggestions? Thanks a tonne

    Read the article

  • How to import data to SAP

    - by Mehmet AVSAR
    Hi, As a complete stranger in town of SAP, I want to transfer my own application's (mobile salesforce automation) data to SAP. My application has records of customers, stocks, inventory, invoices (and waybills), cheques, payments, collections, stock transfer data etc. I have an additional database which holds matchings of records. ie. A customer with ID 345 in my application has key 120-035-0223 in SAP. Every record, for sure, has to know it's counterpart, including parameters. After searching Google and SAP help site for a day, I covered that it's going to be a bit more pain than I expected. Especially SAP site does not give even a clue on it. Say I couldn't find. We transferred our data to some other ERP systems, some of which wanted XML files, some other exposed their APIs. My point is, is Sql Server's SSIS an option for me? I hope it is, so I can fight on my own territory. Since client requests would vary a lot, I count flexibility as most important criteria. Also, I want to transfer as much data as I could. Any help is appreciated. Regards,

    Read the article

  • Field specific errors for ETL

    - by AaronLS
    I am creating a ETL process in MS SQL Server and I would like to have errors specific to a particular column of a particular row. For example, the data is initially loaded from excel files into a table(we'll call the Initial table) where all columns are varchar(2000) and then I stage the data to another table(the DataTypedTable) that contains more specific data types (datetime,int, etc.) or more tightly constrained varchar lengths. I need to be able to create error messages for a specific field such as: "Jan. 13th" is not a valid date format for the submission date. Please use a format of MM/DD/YYYY These error messages would need to be stored in some way such that later in the process a automated process can create reports with the error messages such that each message references a specific row and field(someone will need to go back and correct the data in the source system and resubmit the excel file). So ideally it would be inserted into a Failures tables of some sort and contain the primary key of the failed row, the column name, and the error message. Question: So I am wondering if this can be accomplished with SSIS, or some open source tool like Talend, and if so, what would be your general approach? Or what hand coded approach you would take? Couple approaches I've thought of using SQL(up until no I have done ETL by hand in SQL procs, but I want to consider other approaches. Possible C# even.): Use a cursor to read through the Initial table, and for each row insert a blank record with only the primary key into the DataTyped table, then use a single update statement for each column, such that if that update fails I can insert a very specific error message specific to that column in the error messages table. Insert all the data as is into the DataTyped table, but have duplicate columns like SubmissionDate and SubmissionDateOld. After the initial insert the *Old columns have data, the rest are blank, and I have a single update for each column that sets the SubmissionDate based on the SubmissionDateOld. In addition to suggesting an approach, I'd like to know if you are using that approach or something similar already in the work you do.

    Read the article

  • "Unable to read data from the transport connection: net_io_connectionclosed." - Windows Vista Busine

    - by John DaCosta
    Unable to test sending email from .NET code in Windows Vista Business. I am writing code which I will migrate to an SSIS Package once it its proven. The code is to send an error message via email to a list of recipients. The code is below, however I am getting an exception when I execute the code. I created a simple class to do the mailing... the design could be better, I am testing functionality before implementing more robust functionality, methods, etc. namespace LabDemos { class Program { static void Main(string[] args) { Mailer m = new Mailer(); m.test(); } } } namespace LabDemos { class MyMailer { List<string> _to = new List<string>(); List<string> _cc = new List<string>(); List<string> _bcc = new List<string>(); String _msgFrom = ""; String _msgSubject = ""; String _msgBody = ""; public void test(){ //create the mail message MailMessage mail = new MailMessage(); //set the addresses mail.From = new MailAddress("[email protected]"); //set the content mail.Subject = "This is an email"; mail.Body = "this is a sample body"; mail.IsBodyHtml = false; //send the message SmtpClient smtp = new SmtpClient(); smtp.Host = "emailservername"; smtp.Port = 25; smtp.UseDefaultCredentials = true; smtp.Send(mail); } } Exception Message Inner Exception {"Unable to read data from the transport connection: net_io_connectionclosed."} Stack Trace " at System.Net.Mail.SmtpReplyReaderFactory.ProcessRead(Byte[] buffer, Int32 offset, Int32 read, Boolean readLine)\r\n at System.Net.Mail.SmtpReplyReaderFactory.ReadLines(SmtpReplyReader caller, Boolean oneLine)\r\n at System.Net.Mail.SmtpReplyReaderFactory.ReadLine(SmtpReplyReader caller)\r\n at System.Net.Mail.SmtpConnection.GetConnection(String host, Int32 port)\r\n at System.Net.Mail.SmtpTransport.GetConnection(String host, Int32 port)\r\n at System.Net.Mail.SmtpClient.GetConnection()\r\n at System.Net.Mail.SmtpClient.Send(MailMessage message)" Outer Exception System.Net.Mail.SmtpException was unhandled Message="Failure sending mail." Source="System" StackTrace: at System.Net.Mail.SmtpClient.Send(MailMessage message) at LabDemos.Mailer.test() in C:\Users\username\Documents\Visual Studio 2008\Projects\LabDemos\LabDemos\Mailer.cs:line 40 at LabDemos.Program.Main(String[] args) in C:\Users\username\Documents\Visual Studio 2008\Projects\LabDemos\LabDemos\Program.cs:line 48 at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args) at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args) at System.Runtime.Hosting.ManifestRunner.Run(Boolean checkAptModel) at System.Runtime.Hosting.ManifestRunner.ExecuteAsAssembly() at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext, String[] activationCustomData) at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext) at System.Activator.CreateInstance(ActivationContext activationContext) at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssemblyDebugInZone() at System.Threading.ThreadHelper.ThreadStart_Context(Object state) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.ThreadHelper.ThreadStart() InnerException: System.IO.IOException Message="Unable to read data from the transport connection: net_io_connectionclosed." Source="System" StackTrace: at System.Net.Mail.SmtpReplyReaderFactory.ProcessRead(Byte[] buffer, Int32 offset, Int32 read, Boolean readLine) at System.Net.Mail.SmtpReplyReaderFactory.ReadLines(SmtpReplyReader caller, Boolean oneLine) at System.Net.Mail.SmtpReplyReaderFactory.ReadLine(SmtpReplyReader caller) at System.Net.Mail.SmtpConnection.GetConnection(String host, Int32 port) at System.Net.Mail.SmtpTransport.GetConnection(String host, Int32 port) at System.Net.Mail.SmtpClient.GetConnection() at System.Net.Mail.SmtpClient.Send(MailMessage message) InnerException:

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

< Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >