Search Results

Search found 3921 results on 157 pages for 'from zero to ssis!'.

Page 29/157 | < Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36 | Next Page >

SQL Server Agent 2005 job runs but provides no output

- by alimack

Essentially I have a job which runs in BIDS and as as a stand lone package and while it runs under the SQL Server Agent it doesn't complete properly (no error messages though). The job steps are: 1) Delete all rows from table; 2) Use For each loop to fill up table from Excel spreasheets; 3) Clean up table. I've tried this MS page (steps 1 & 2), didn't see any need to start changing from Server side security. Also SQLServerCentral.com for this page, no resolution. How can I get error logging or a fix?

Read the article
DMZ and LAN on the same Windows Storage Server 2008 R2

- by Sergei

We are moving from EMC Celerra NS20 to Windows Storage Server 2008 R2. I would like to use deduplication feature in WSS (Single Instance Storage) for hosting data for our external FTP server.The idea is to use NFS server on WSS as datastore for Linux FTP server located in DMZ and CIFS services for servers in LAN. Using Celerra fileserver I was able to create multiple instances of fileservers with multiple virtual interfaces and separate filesystems so all data and networks would be separated. Would it be possible to do something similiar on WSS?

Read the article
How can I handle emailed job control files for SQL Server? [closed]

- by ganesh

How can I detect control files from any mailbox, scan that file, decrypt it and queue it as a job using job manager in SQL Server?

Read the article
dtexec with password

- by user1602996

I have added a new step in my job activity monitor which runs ssis package(encrepted with password). dtexec /f "\\sw-conf-dev-01\projects\dtsx\Email.dtsx" /de "ssispassword" error message: Description: The package is encrypted with a password. The password was not specified, or is not correct. End Error Could not load package "\sw-conf-dev-01\projects\dtsx\Email.dtsx" because of error 0xC0014037. Description: Failed to remove package protection with error 0xC0014037 "The package is encrypted with a password. The password was not specified, or is not correct.". This occurs in the CPackage::LoadFromXML method I have used the same password in the package as well, but i don't know why I'm still getting an error message. Any ideas?

Read the article
Visual Studio and SQL Server - correct installation sequence?

- by cdonner

I am rebuilding my development machine. This issue is not new to me, but I don't remember the solution. I started with VS 2008 Pro, then SQL 2008 Developer, then the SQL SP1, then VS SP1. The result is that I cannot open SSIS projects (see the error below). What is the correct order so that I can avoid the installation of SQL Server Express and still have all the features working? --------------------------- Microsoft Visual Studio --------------------------- Package Load Failure Package 'DataWarehouse VSIntegration layer' has failed to load properly ( GUID = {4A0C6509-BF90-43DA-ABEE-0ABA3A8527F1} ). Please contact package vendor for assistance. Application restart is recommended, due to possible environment corruption. Would you like to disable loading this package in the future? You may use 'devenv /resetskippkgs' to re-enable package loading. --------------------------- Yes No ---------------------------

Read the article
SSIS- Sharepoint list data transfer issue

- by Vicky

Hi , We are trying to transfer data from oracle database (about 60,0000) records only to a sharepoint list using SSIS. But we are getting following error when records reaches around 19000 . The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020 and System.ServiceModel.ProtocolException: The remote server returned an unexpected response: (400) Bad Request. Earlier we thought if could because of Sharepoint list limit so we tried by reducing two of the columns and then it has went fine. So we left with one of the column of Datatype DT_STR and length 400 in oracle beacuse of which issue might be happening, It is mapped to sharepoint custom list field of multiline type. We also verified if length of field is issue but in oracle DB for all records max length for this column is only 239 so length issue is also ruled out. Any one who has faced this kind of issue or knows cause of this issue.Kindly let us know.. Thanks and regards, Vicky

Read the article
Best Practise to populate Fact and Dimension Tables from Transactional Flat DB

- by alex25

Hi! I want to populate a star schema / cube in SSIS / SSAS. I prepared all my dimension tables and my fact table, primary keys etc. The source is a 'flat' (item level) table and my problem is now how to split it up and get it from one into the respective tables. I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?! Thanks, alexl

Read the article
ETL mechanisms for MySQL to SQL Server over WAN

- by Troy Hunt

I’m looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 with an external host down to an internal SQL Server 05 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection to an internal corporate environment before some BI work is performed. This should just be change-sets making their way down each night. I’m interested in thoughts on the ETL mechanisms people have successfully used in similar scenarios before. SSIS seems like a potential candidate; can anyone comment on the suitability for this scenario? Alternatively, other thoughts on how to do this in a cost-conscious way would be most appreciated. Thanks!

Read the article
Compare date fields in SQL server

- by huslayer

Hi all, I've a flat file that I cleaned the data out using SSIS, the output looks like that : MEDICAL ADMIT PATIENT PATIENT DATE OF DX REC NO DATE NUMBER NAME DISCHARGE Code DRG # 123613 02/16/09 12413209 MORIBALDI ,GEMMA 02/19/09 428.20 988 130897 01/23/09 12407193 TINLEY ,PATRICIA 01/23/09 535.10 392 139367 02/27/09 36262509 THARPE ,GLORIA 03/05/09 562.10 392 141954 02/25/09 72779499 SHUMATE ,VALERIA 02/25/09 112.84 370 141954 03/07/09 36271732 SHUMATE ,VALERIA 03/10/09 493.92 203 145299 01/21/09 12406294 BAUGH ,MARIA 01/21/09 366.17 117 and the report (final results) attached in the screen shot from the final excel report. so what's happening is IF the same name or same account number is duplicate, that means the patient has entered the hospital again and needs to be included in the report. ![alt text][1] what I need to do is... Eliminate any rows that is NOT duplicate (not everybody in this file has been admitted again) and compare the dates to get the ReAdmitdate and ReDischargedate I dumped the data into a SQL table and trying to compare the dates to figure out "ReAdmitdate" and "ReDischargedate" any help is appreciated. Thanks [link text][1]

Read the article
sqL table is sorted by default

- by Pramodtech

I have simple SSIS package where I import data from flat file into sql table(MS sql 2005). File contains 70k rows and table has no primary key. Importing is sucessful but when I open sql table the order of rwos is different from the that of file. After observing closely I see that data in table is sorted by default by first column. Why this is happening? and how I can avoid default sort? Thanks.

Read the article
Loading Fact Table + Lookup / UnionAll for SK lookups.

- by Nev_Rahd

I got to populate FactTable with 12 lookups to dimension table to get SK's, of which 6 are to different Dim Tables and rest 6 are lookup to same DimTable (type II) doing lookup to same natural key. Ex: PrimeObjectID = lookup to DimObject.ObjectID = get ObjectSK and got other columns which does same OtherObjectID1 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID2 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID3 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID4 = lookup to DimObject.ObjectID = get ObjectSK OtherObjectID5 = lookup to DimObject.ObjectID = get ObjectSK for such multiple lookup how should go in my SSIS package. for now am using lookup / unionall foreach lookup. Is there a better way to this.

Read the article
MaximumErrorCount has now effect

- by Rob Bowman

Hi I'm quite new to SSIS - using 2008 version. I have a job that uses a few data flow tasks. On the third one I'm getting a primary key violation on the last row that it needs to insert, but only sometimes! I'd like to ignore this problem for now and let the job continue. I have set the MaximumErrorCount property to 10 for the DataFlowTaks, the SequenceContainer and for the Package but still taks fails and this causes the package to stop. Could anyone please advise how I can get the package to ignore the error? Thanks Rob.

Read the article
Firing events in script task

- by Anonymouslemming

I've got an SSIS project where I am constructing an SQL command based on some variables. I'm constructing the command in a script task, and want to output the constructed SQL to the 'Execution Results' window. I am trying to do this using a FireInformation line from inside my script as follows: Dts.Events.FireInformation(99, "test", "Make this event appear!", "", 0, true); However, in the script editor when editing ScriptMain.cs, that line is underlined in red, and on mouseover, I get the following message: Error: The best overloaded method match for 'Microsoft.SqlServer.Dts.Tasks.ScriptTask.EventsObjectWrapper.FireInformation(int, string, string, string, int, ref bool') has some invalid arguments As a result, my script does not compile and I cannot execute it. Any idea what I'm doing wrong here, or what I need to change to be able to see the values of my variables at this point in the Execution output?

Read the article
Tutorial needed to learn Microsoft BI

- by Zerotoinfinite

I am a .NET developer and have a little experience in Sql Server. As we all know that .NET developer mostly deals with stored procedures in Sql Server. I am willing to learn BI (SSIS,SSRS,SSAS) from home. I have seen some article but I don't think that's going to work for me. I am seeking some Free video tutorials I can get to learn BI. Imp: I am asking this question here because I believe that many of you are genius BI and .NET developers so you guys can help me figure out which tutorial will take benift from my .net skills. Apologize : I used free in my question. I am guilt to say that I am one of the guy who is not willing to invest but the reason is this that currently I am Jobless. :-(

Read the article
Excel Import how would you do it?

- by Rico

Ok i have a Excel import written. It uses excel automation to go through all the records and get the job done. BUt how would you do it if you had to do it? Would you use SSIS? Would You use a Dataconnection? I am really confused as to the best way to get this done properly. So that it doesn't slow down the actual application for the other clients when one client does an import. Thanks

Read the article
Problem with the row count transform

- by abkl

Hi, I currently deployed an SSIS package (Developed on the 2005 version) (developed on my local server) in a pre production environment for testing. I have used the Row count transform to get a count of good/bad records. It works fine on my local system . However when i deploy this on the pre prod server, the row count does not work! (as in it does not recognize the vairbales i have assigned to the relevant transofm - no drop down abvaliable in the variables attribute part. tried deleting and adding a new transoform.. no luck. Strangely this does not work for any of the other packages also present/deployed on the same server (tried this out by dropping an rc tramsform onto an existing package... same problem) Any suggestions? Thanks a tonne

Read the article
No SFTP Task Component in SSIS 2005/2008? No Problem!

SSIS 2005 and 2008 have an FTP Task component, but no SFTP (SSH/TLS-encrypted FTP) Task component. This tutorial explains how to overcome this omission.

Read the article
How to import data to SAP

- by Mehmet AVSAR

Hi, As a complete stranger in town of SAP, I want to transfer my own application's (mobile salesforce automation) data to SAP. My application has records of customers, stocks, inventory, invoices (and waybills), cheques, payments, collections, stock transfer data etc. I have an additional database which holds matchings of records. ie. A customer with ID 345 in my application has key 120-035-0223 in SAP. Every record, for sure, has to know it's counterpart, including parameters. After searching Google and SAP help site for a day, I covered that it's going to be a bit more pain than I expected. Especially SAP site does not give even a clue on it. Say I couldn't find. We transferred our data to some other ERP systems, some of which wanted XML files, some other exposed their APIs. My point is, is Sql Server's SSIS an option for me? I hope it is, so I can fight on my own territory. Since client requests would vary a lot, I count flexibility as most important criteria. Also, I want to transfer as much data as I could. Any help is appreciated. Regards,

Read the article
Field specific errors for ETL

- by AaronLS

I am creating a ETL process in MS SQL Server and I would like to have errors specific to a particular column of a particular row. For example, the data is initially loaded from excel files into a table(we'll call the Initial table) where all columns are varchar(2000) and then I stage the data to another table(the DataTypedTable) that contains more specific data types (datetime,int, etc.) or more tightly constrained varchar lengths. I need to be able to create error messages for a specific field such as: "Jan. 13th" is not a valid date format for the submission date. Please use a format of MM/DD/YYYY These error messages would need to be stored in some way such that later in the process a automated process can create reports with the error messages such that each message references a specific row and field(someone will need to go back and correct the data in the source system and resubmit the excel file). So ideally it would be inserted into a Failures tables of some sort and contain the primary key of the failed row, the column name, and the error message. Question: So I am wondering if this can be accomplished with SSIS, or some open source tool like Talend, and if so, what would be your general approach? Or what hand coded approach you would take? Couple approaches I've thought of using SQL(up until no I have done ETL by hand in SQL procs, but I want to consider other approaches. Possible C# even.): Use a cursor to read through the Initial table, and for each row insert a blank record with only the primary key into the DataTyped table, then use a single update statement for each column, such that if that update fails I can insert a very specific error message specific to that column in the error messages table. Insert all the data as is into the DataTyped table, but have duplicate columns like SubmissionDate and SubmissionDateOld. After the initial insert the *Old columns have data, the rest are blank, and I have a single update for each column that sets the SubmissionDate based on the SubmissionDateOld. In addition to suggesting an approach, I'd like to know if you are using that approach or something similar already in the work you do.

Read the article
Symantec publie son bilan 2010 et ses prévisions pour 2011, plus d'exploits de failles zero-day et d'attaques sensibles

Symantec fait son bilan 2010 et donne ses perspectives pour 2011 : plus d'attaques contre les infrastructures vitales et d'exploits de failles zero-day Symantec publie aujourd'hui ses perspectives en termes de sécurité informatique pour 2011, grâce à l'observation des phénomènes apparus ou s'étant développés en 2010, et s'appuyant sur son réseau de plus de 240.000 capteurs dans le monde entier. Première tendance lourde : « L'hactivisme » - La fréquence des attaques contre les infrastructures vitales va augmenter et les fournisseurs de services vont réagir, mais les gouvernements risquent d'être plus lents Les pirates ont certainement été attentifs aux effets produits par la menace Stuxnet sur les secteurs d'...

Read the article
Simulating smooth movement along a line after calculating a collision containing a restitution of zero in 2D

- by Casey

[for tl;dr see after listing] //...Code to determine shapes types involved in collision here... //...Rectangle-Line collision detected. if(_rbTest->GetCollisionShape()->Intersects(*_ground->GetCollisionShape())) { //Convert incoming shape to a line. a2de::Line l(*dynamic_cast<a2de::Line*>(_ground->GetCollisionShape())); //Get line's normal. a2de::Vector2D normal_vector(l.GetSlope().GetY(), -l.GetSlope().GetX()); a2de::Vector2D::Normalize(normal_vector); //Accumulate forces involved. a2de::Vector2D intermediate_forces; a2de::Vector2D normal_force = normal_vector * _rbTest->GetMass() * _world->GetGravityHandler()->GetGravityValue(); intermediate_forces += normal_force; //Calculate final velocity: See [1] double Ma = _rbTest->GetMass(); a2de::Vector2D Ua = _rbTest->GetVelocity(); double Mb = _ground->GetMass(); a2de::Vector2D Ub = _ground->GetVelocity(); double mCr = Mb * _ground->GetRestitution(); a2de::Vector2D collision_velocity( ((Ma * Ua) + (Mb * Ub) + ((mCr * Ub) - (mCr * Ua))) / (Ma + Mb)); //Calculate reflection vector: See [2] a2de::Vector2D reflect_velocity( -collision_velocity + 2 * (a2de::Vector2D::DotProduct(collision_velocity, normal_vector)) * normal_vector ); //Affect velocity to account for restitution of colliding bodies. reflect_velocity *= (_ground->GetRestitution() * _rbTest->GetRestitution()); _rbTest->SetVelocity(reflect_velocity); //THE ULTIMATE ISSUE STEMS FROM THE FOLLOWING LINE: //Move object away from collision one pixel to prevent constant collision. _rbTest->SetPosition(_rbTest->GetPosition() + normal_vector); _rbTest->ApplyImpulse(intermediate_forces); } Sources: (1) Wikipedia: Coefficient of Restitution: Speeds after impact (2) Wikipedia: Specular Reflection: Direction of reflection First, I have a system in place to account for friction (that is, a coefficient of friction) but is not used right now (in addition, it is zero, which should not affect the math anyway). I'll deal with that after I get this working. Anyway, when the restitution of either object involved in the collision is zero the object stops as required, but if movement along the same direction (again, irrespective of the friction value that isn't used) as the line is attempted the object moves as if slogging through knee deep snow. If I remove the line of code in question and the object is not push away one pixel the object barely moves at all. All because the object collides, is stopped, is pushed up, collides, is stopped...etc. OR collides, is stopped, collides, is stopped, etc... TL;DR How do I only account for a collision ONCE for restitution purposes (BONUS: but CONTINUALLY for frictional purposes, to be implemented later)

Read the article
Zero-channel RAID for High Performance MySQL Server (IBM ServeRAID 8k) : Any Experience/Recommendation?

- by prs563

We are getting this IBM rack mount server and it has this IBM ServeRAID8k storage controller with Zero-Channel RAID and 256MB battery backed cache. It can support RAID 10 which we need for our high performance MySQL server which will have 4 x 15000K RPM 300GB SAS HDD. This is mission-critical and we want as much bandwidth and performance. Is this a good card or should we replace with another IBM RAID card? IBM ServeRAID 8k SAS Controller option provides 256 MB of battery backed 533 MHz DDR2 standard power memory in a fixed mounting arrangement. The device attaches directly to IBM planar which can provide full RAID capability. Manufacturer IBM Manufacturer Part # 25R8064 Cost Central Item # 10025907 Product Description IBM ServeRAID 8k SAS - Storage controller (zero-channel RAID) - RAID 0, 1, 5, 6, 10, 1E Device Type Storage controller (zero-channel RAID) - plug-in module Buffer Size 256 MB Supported Devices Disk array (RAID) Max Storage Devices Qty 8 RAID Level RAID 0, RAID 1, RAID 5, RAID 6, RAID 10, RAID 1E Manufacturer Warranty 1 year warranty

Read the article
"Unable to read data from the transport connection: net_io_connectionclosed." - Windows Vista Busine

- by John DaCosta

Unable to test sending email from .NET code in Windows Vista Business. I am writing code which I will migrate to an SSIS Package once it its proven. The code is to send an error message via email to a list of recipients. The code is below, however I am getting an exception when I execute the code. I created a simple class to do the mailing... the design could be better, I am testing functionality before implementing more robust functionality, methods, etc. namespace LabDemos { class Program { static void Main(string[] args) { Mailer m = new Mailer(); m.test(); } } } namespace LabDemos { class MyMailer { List<string> _to = new List<string>(); List<string> _cc = new List<string>(); List<string> _bcc = new List<string>(); String _msgFrom = ""; String _msgSubject = ""; String _msgBody = ""; public void test(){ //create the mail message MailMessage mail = new MailMessage(); //set the addresses mail.From = new MailAddress("[email protected]"); //set the content mail.Subject = "This is an email"; mail.Body = "this is a sample body"; mail.IsBodyHtml = false; //send the message SmtpClient smtp = new SmtpClient(); smtp.Host = "emailservername"; smtp.Port = 25; smtp.UseDefaultCredentials = true; smtp.Send(mail); } } Exception Message Inner Exception {"Unable to read data from the transport connection: net_io_connectionclosed."} Stack Trace " at System.Net.Mail.SmtpReplyReaderFactory.ProcessRead(Byte[] buffer, Int32 offset, Int32 read, Boolean readLine)\r\n at System.Net.Mail.SmtpReplyReaderFactory.ReadLines(SmtpReplyReader caller, Boolean oneLine)\r\n at System.Net.Mail.SmtpReplyReaderFactory.ReadLine(SmtpReplyReader caller)\r\n at System.Net.Mail.SmtpConnection.GetConnection(String host, Int32 port)\r\n at System.Net.Mail.SmtpTransport.GetConnection(String host, Int32 port)\r\n at System.Net.Mail.SmtpClient.GetConnection()\r\n at System.Net.Mail.SmtpClient.Send(MailMessage message)" Outer Exception System.Net.Mail.SmtpException was unhandled Message="Failure sending mail." Source="System" StackTrace: at System.Net.Mail.SmtpClient.Send(MailMessage message) at LabDemos.Mailer.test() in C:\Users\username\Documents\Visual Studio 2008\Projects\LabDemos\LabDemos\Mailer.cs:line 40 at LabDemos.Program.Main(String[] args) in C:\Users\username\Documents\Visual Studio 2008\Projects\LabDemos\LabDemos\Program.cs:line 48 at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args) at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args) at System.Runtime.Hosting.ManifestRunner.Run(Boolean checkAptModel) at System.Runtime.Hosting.ManifestRunner.ExecuteAsAssembly() at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext, String[] activationCustomData) at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext) at System.Activator.CreateInstance(ActivationContext activationContext) at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssemblyDebugInZone() at System.Threading.ThreadHelper.ThreadStart_Context(Object state) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.ThreadHelper.ThreadStart() InnerException: System.IO.IOException Message="Unable to read data from the transport connection: net_io_connectionclosed." Source="System" StackTrace: at System.Net.Mail.SmtpReplyReaderFactory.ProcessRead(Byte[] buffer, Int32 offset, Int32 read, Boolean readLine) at System.Net.Mail.SmtpReplyReaderFactory.ReadLines(SmtpReplyReader caller, Boolean oneLine) at System.Net.Mail.SmtpReplyReaderFactory.ReadLine(SmtpReplyReader caller) at System.Net.Mail.SmtpConnection.GetConnection(String host, Int32 port) at System.Net.Mail.SmtpTransport.GetConnection(String host, Int32 port) at System.Net.Mail.SmtpClient.GetConnection() at System.Net.Mail.SmtpClient.Send(MailMessage message) InnerException:

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article

Search Results

Search found 3921 results on 157 pages for 'from zero to ssis!'.

Page 29/157 | < Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36 | Next Page >

- by alimack

- by Sergei

- by ganesh

- by user1602996

- by cdonner

- by Vicky

- by alex25

- by Troy Hunt

- by huslayer

- by Pramodtech

- by Nev_Rahd

- by Rob Bowman

- by Anonymouslemming

- by Zerotoinfinite

- by Rico

- by abkl

- by Mehmet AVSAR

- by AaronLS

- by Casey

- by prs563

- by John DaCosta

- by Rob Farley

- by Rob Farley

< Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36 | Next Page >