Search Results

Search found 13654 results on 547 pages for 'ssis performance'.

Page 19/547 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >

  • Using The Data Mining Query Task in SSIS

    SQL Server Integration Services (SSIS) is a Business Intelligence tool which can be used by database developers or administrators to perform Extract, Transform & Load (ETL) operations. In my previous article Using Analysis Services Processing Task & Analysis Services ... [Read Full Article]

    Read the article

  • Performance of ClearCase servers on VMs?

    - by Garen
    Where I work, we are in need of upgrading our ClearCase servers and it's been proposed that we move them into a new (yet-to-be-deployed) VMmare system. In the past I've not noticed a significant problem with performance with most applications when running in VMs, but given that ClearCase "speed" (i.e. dynamic-view response times) is so latency sensitive I am concerned that this will not be a good idea. VMWare has numerous white-papers detailing performance related issues based on network traffic patterns that re-inforces my hypothesis, but nothing particularly concrete for this particular use case that I can see. What I can find are various forum posts online, but which are somewhat dated, e.g.: ClearCase clients are supported on VMWare, but not for performance issues. I would never put a production server on VM. It will work but will be slower. The more complex the slower it gets. accessing or building from a local snapshot view will be the fastest, building in a remote VM stored dynamic view using clearmake will be painful..... VMWare is best used for test environments (via http://www.cmcrossroads.com/forums?func=view&catid=31&id=44094&limit=10&start=10) and: VMware + ClearCase = works but SLUGGISH!!!!!! (windows)(not for production environment) My company tried to mandate that all new apps or app upgrades needed to be on/moved VMware instances. The VMware instance could not handle the demands of ClearCase. (come to find out that I was sharing a box with a database server) Will you know what else would be on that box besides ClearCase? Karl (via http://www.cmcrossroads.com/forums?func=view&id=44094&catid=31) and: ... are still finding we can't get the performance using dynamic views to below 2.5 times that of a physical machine. Interestingly, speaking to a few people with much VMWare experience and indeed from running builds, we are finding that typically, VMWare doesn't take that much longer for most applications and about 10-20% longer has been quoted. (via http://www.cmcrossroads.com/forums?func=view&catid=31&id=44094&limit=10&start=10) Which brings me to the more direct question: Does anyone have any more recent experience with ClearCase servers on VMware (if not any specific, relevant performance advice)?

    Read the article

  • Snap to object layout in SSIS

    - by simonsabin
    If you’ve ever used SSIS you will have found that getting a decent layout is a pain. It would be nice to have more features to help layout things nicely. Jamie has proposed such a suggestion to allow you to align objects to each other, a bit like what you get with reporting services. Have a look at Jamie’s suggestion and vote for it if you agree https://connect.microsoft.com/SQLServer/feedback/details/644668/ssis-snap-to...(read more)

    Read the article

  • Tips and Tricks in SSIS Presentation

    Here is a Live Meeting presentation I did for the Polish SQL Server User Group on Tips and Tricks I use when out and about using SSIS (It is in English).  It lasts about an hour and you can either watch it online or download the wmv to view later   Tips and Tricks in SSIS

    Read the article

  • New and Improved Functionality in SSIS 2008

    While it does not bring revolutionary changes, the release of SQL Server 2008 delivers functionality, scalability, and performance improvements to SQL Server Integration Services (SSIS). Here is a comprehensive listing of new and enhanced features of SSIS with a short description of each.

    Read the article

  • SSIS Basics: Introducing Variables

    In the third of her SSIS Basics articles, Annette Allen shows you how to use Variables in your SSIS Packages, and explains the functions of the system-defined variables. Are you sure you can restore your backups? Run full restore + DBCC CHECKDB quickly and easily with SQL Backup Pro's new automated verification. Check for corruption and prepare for when disaster strikes. Try it now.

    Read the article

  • Export to XML Using SSIS

    Exporting data to XML format using SSIS initially seems like it should be straight forward – just dump it in a flat file and give it a name xml type, however SSIS has no XML destination just an XML source. I had no luck dumping the XML in flat file. My solution was to use a script task which worked well. Check SQL Server performance at a glanceWe consulted 1000 SQL Server professionals to make SQL Monitor’s UI as clear as possible. Start monitoring with a free trial.

    Read the article

  • How To Support Multiple SQL Server Package Configurations in SSIS

    I have several applications that use SSIS packages and I want to be able to store all the configurations together in a single table when I deploy. When a package executes I need a way of specifying the "application" and having SSIS automatically handle the package configuration based on the application. Compress live data by 73% Red Gate's SQL Storage Compress reduces the size of live SQL Server databases, saving you disk space and storage costs. Learn more.

    Read the article

  • New and Improved Functionality in SSIS 2008

    While it does not bring revolutionary changes, the release of SQL Server 2008 delivers functionality, scalability, and performance improvements to SQL Server Integration Services (SSIS). Here is a comprehensive listing of new and enhanced features of SSIS with a short description of each.

    Read the article

  • Logging hurts MySQL performance - but, why?

    - by jimbo
    I'm quite surprised that I can't see an answer to this anywhere on the site already, nor in the MySQL documentation (section 5.2 seems to have logging otherwise well covered!) If I enable binlogs, I see a small performance hit (subjectively), which is to be expected with a little extra IO -- but when I enable a general query log, I see an enormous performance hit (double the time to run queries, or worse), way in excess of what I see with binlogs. Of course I'm now logging every SELECT as well as every UPDATE/INSERT, but, other daemons record their every request (Apache, Exim) without grinding to a halt. Am I just seeing the effects of being close to a performance "tipping point" when it comes to IO, or is there something fundamentally difficult about logging queries that causes this to happen? I'd love to be able to log all queries to make development easier, but I can't justify the kind of hardware it feels like we'd need to get performance back up with general query logging on. I do, of course, log slow queries, and there's negligible improvement in general usage if I disable this. (All of this is on Ubuntu 10.04 LTS, MySQLd 5.1.49, but research suggests this is a fairly universal issue)

    Read the article

  • Looking for a short term solution to improve website performance with additional server

    - by Tanim Mirza
    I am working with a small team to run an internal website running with PHP 5.3.9, MySQL 5.0.77. All the files and database are hosted on a dedicated Linux machine with the following configuration: Intel Xeon E5450 8 CPU cores @3.00GHz, 2992.498 MHz, Cache 6148 KB, Cent OS – Red Hat Enterprise Linux Server release 5.4 We started small and then the database got bigger and now the website performance degraded significantly. We often get server space overrun, mysql overloaded with too many calls, etc. We don't have much experience dealing with these issues. We recently got another server that we were thinking to use to improve performance. Since it has better configuration, some of us wanted to completely move everything to the new machine. But I am trying to find out how we can utilize both machine for optimized performance. I found options such as MySQL clustering, Load balancer, etc. I was wondering if I could get any suggestion for this situation "How to utilize two machines in short term for best performance", that would be great. By short term we are looking for something that we can deploy in a month or so. Thanks in advance for your time.

    Read the article

  • Improving browser performance while using lots of tabs?

    - by Andrew
    My browsing habits cause me to open lots of windows and tabs, either related to different projects I'm working on or things I may want to read later. I use OSX and use about 5 spaces with multiple windows in each space. The problem is eventually I'll have around 200 or more tabs open (spread over 15-20 windows) that I don't want to close. Needless to say, my computer's performance starts to degrade. As I write this on my mobile, Safari on my laptop is locking up the computer. I used to use Chrome but found better performance with Safari. What I'd like to know, is there a graph of browser performance based on tab usage? I don't need a browser that keeps all tabs active. It would be great if the browser could increase performance by "putting tabs to sleep". Or if there was some sort of tool for saving a "workspace" of tabs that you could reactivate the next time you are working on that project. What sort of solution can you recommend to solve this problem?

    Read the article

  • Load and Web Performance Testing using Visual Studio Ultimate 2010-Part 3

    - by Tarun Arora
    Welcome back once again, in Part 1 of Load and Web Performance Testing using Visual Studio 2010 I talked about why Performance Testing the application is important, the test tools available in Visual Studio Ultimate 2010 and various test rig topologies, in Part 2 of Load and Web Performance Testing using Visual Studio 2010 I discussed the details of web performance & load tests as well as why it’s important to follow a goal based pattern while performance testing your application. In part 3 I’ll be discussing Test Result Analysis, Test Result Drill through, Test Report Generation, Test Run Comparison, Asp.net Profiler and some closing thoughts. Test Results – I see some creepy worms! In Part 2 we put together a web performance test and a load test, lets run the test to see load test to see how the Web site responds to the load simulation. While the load test is running you will be able to see close to real time analysis in the Load Test Analyser window. You can use the Load Test Analyser to conduct load test analysis in three ways: Monitor a running load test - A condensed set of the performance counter data is maintained in memory. To prevent the results memory requirements from growing unbounded, up to 200 samples for each performance counter are maintained. This includes 100 evenly spaced samples that span the current elapsed time of the run and the most recent 100 samples.         After the load test run is completed - The test controller spools all collected performance counter data to a database while the test is running. Additional data, such as timing details and error details, is loaded into the database when the test completes. The performance data for a completed test is loaded from the database and analysed by the Load Test Analyser. Below you can see a screen shot of the summary view, this provides key results in a format that is compact and easy to read. You can also print the load test summary, this is generated after the test has completed or been stopped.         Analyse the load test results of a previously run load test – We’ll see this in the section where i discuss comparison between two test runs. The performance counters can be plotted on the graphs. You also have the option to highlight a selected part of the test and view details, drill down to the user activity chart where you can hover over to see more details of the test run.   Generate Report => Test Run Comparisons The level of reports you can generate using the Load Test Analyser is astonishing. You have the option to create excel reports and conduct side by side analysis of two test results or to track trend analysis. The tools also allows you to export the graph data either to MS Excel or to a CSV file. You can view the ASP.NET profiler report to conduct further analysis as well. View Data and Diagnostic Attachments opens the Choose Diagnostic Data Adapter Attachment dialog box to select an adapter to analyse the result type. For example, you can select an IntelliTrace adapter, click OK and open the IntelliTrace summary for the test agent that was used in the load test.   Compare results This creates a set of reports that compares the data from two load test results using tables and bar charts. I have taken these screen shots from the MSDN documentation, I would highly recommend exploring the wealth of knowledge available on MSDN. Leaving Thoughts While load testing the application with an excessive load for a longer duration of time, i managed to bring the IIS to its knees by piling up a huge queue of requests waiting to be processed. This clearly means that the IIS had run out of threads as all the threads were busy processing existing request, one easy way of fixing this is by increasing the default number of allocated threads, but this might escalate the problem. The better suggestion is to try and drill down to the actual root cause of the problem. When ever the garbage collection runs it stops processing any pages so all requests that come in during that period are queued up, but realistically the garbage collection completes in fraction of a a second. To understand this better lets look at the .net heap, it is divided into large heap and small heap, anything greater than 85kB in size will be allocated to the Large object heap, the Large object heap is non compacting and remember large objects are expensive to move around, so if you are allocating something in the large object heap, make sure that you really need it! The small object heap on the other hand is divided into generations, so all objects that are supposed to be short-lived are suppose to live in Gen-0 and the long living objects eventually move to Gen-2 as garbage collection goes through.  As you can see in the picture below all < 85 KB size objects are first assigned to Gen-0, when Gen-0 fills up and a new object comes in and finds Gen-0 full, the garbage collection process is started, the process checks for all the dead objects and assigns them as the valid candidate for deletion to free up memory and promotes all the remaining objects in Gen-0 to Gen-1. So in the future when ever you clean up Gen-1 you have to clean up Gen-0 as well. When you fill up Gen – 0 again, all of Gen – 1 dead objects are drenched and rest are moved to Gen-2 and Gen-0 objects are moved to Gen-1 to free up Gen-0, but by this time your Garbage collection process has started to take much more time than it usually takes. Now as I mentioned earlier when garbage collection is being run all page requests that come in during that period are queued up. Does this explain why possibly page requests are getting queued up, apart from this it could also be the case that you are waiting for a long running database process to complete.      Lets explore the heap a bit more… What is really a case of crisis is when the objects are living long enough to make it to Gen-2 and then dying, this is definitely a high cost operation. But sometimes you need objects in memory, for example when you cache data you hold on to the objects because you need to use them right across the user session, which is acceptable. But if you wanted to see what extreme caching can do to your server then write a simple application that chucks in a lot of data in cache, run a load test over it for about 10-15 minutes, forcing a lot of data in memory causing the heap to run out of memory. If you get to such a state where you start running out of memory the IIS as a mode of recovery restarts the worker process. It is great way to free up all your memory in the heap but this would clear the cache. The problem with this is if the customer had 10 items in their shopping basket and that data was stored in the application cache, the user basket will now be empty forcing them either to get frustrated and go to a competitor website or if the customer is really patient, give it another try! How can you address this, well two ways of addressing this; 1. Workaround – A x86 bit processor only allows a maximum of 4GB of RAM, this means the machine effectively has around 3.4 GB of RAM available, the OS needs about 1.5 GB of RAM to run efficiently, the IIS and .net framework also need their share of memory, leaving you a heap of around 800 MB to play with. Because Team builds by default build your application in ‘Compile as any mode’ it means the application is build such that it will run in x86 bit mode if run on a x86 bit processor and run in a x64 bit mode if run on a x64 but processor. The problem with this is not all applications are really x64 bit compatible specially if you are using com objects or external libraries. So, as a quick win if you compiled your application in x86 bit mode by changing the compile as any selection to compile as x86 in the team build, you will be able to run your application on a x64 bit machine in x86 bit mode (WOW – By running Windows on Windows) and what that means is, you could use 8GB+ worth of RAM, if you take away everything else your application will roughly get a heap size of at least 4 GB to play with, which is immense. If you need a heap size of more than 4 GB you have either build a software for NASA or there is something fundamentally wrong in your application. 2. Solution – Now that you have put a workaround in place the IIS will not restart the worker process that regularly, which means you can take a breather and start working to get to the root cause of this memory leak. But this begs a question “How do I Identify possible memory leaks in my application?” Well i won’t say that there is one single tool that can tell you where the memory leak is, but trust me, ‘Performance Profiling’ is a great start point, it definitely gets you started in the right direction, let’s have a look at how. Performance Wizard - Start the Performance Wizard and select Instrumentation, this lets you measure function call counts and timings. Before running the performance session right click the performance session settings and chose properties from the context menu to bring up the Performance session properties page and as shown in the screen shot below, check the check boxes in the group ‘.NET memory profiling collection’ namely ‘Collect .NET object allocation information’ and ‘Also collect the .NET Object lifetime information’.    Now if you fire off the profiling session on your pages you will notice that the results allows you to view ‘Object Lifetime’ which shows you the number of objects that made it to Gen-0, Gen-1, Gen-2, Large heap, etc. Another great feature about the profile is that if your application has > 5% cases where objects die right after making to the Gen-2 storage a threshold alert is generated to alert you. Since you have the option to also view the most expensive methods and by capturing the IntelliTrace data you can drill in to narrow down to the line of code that is the root cause of the problem. Well now that we have seen how crucial memory management is and how easy Visual Studio Ultimate 2010 makes it for us to identify and reproduce the problem with the best of breed tools in the product. Caching One of the main ways to improve performance is Caching. Which basically means you tell the web server that instead of going to the database for each request you keep the data in the webserver and when the user asks for it you serve it from the webserver itself. BUT that can have consequences! Let’s look at some code, trust me caching code is not very intuitive, I define a cache key for almost all searches made through the common search page and cache the results. The approach works fine, first time i get the data from the database and second time data is served from the cache, significant performance improvement, EXCEPT when two users try to do the same operation and run into each other. But it is easy to handle this by adding the lock as you can see in the snippet below. So, as long as a user comes in and finds that the cache is empty, the user locks and starts to get the cache no more concurrency issues. But lets say you are processing 10 requests per second, by the time i have locked the operation to get the results from the database, 9 other users came in and found that the cache key is null so after i have come out and populated the cache they will still go in to get the results again. The application will still be faster because the next set of 10 users and so on would continue to get data from the cache. BUT if we added another null check after locking to build the cache and before actual call to the db then the 9 users who follow me would not make the extra trip to the database at all and that would really increase the performance, but didn’t i say that the code won’t be very intuitive, may be you should leave a comment you don’t want another developer to come in and think what a fresher why is he checking for the cache key null twice !!! The downside of caching is, you are storing the data outside of the database and the data could be wrong because the updates applied to the database would make the data cached at the web server out of sync. So, how do you invalidate the cache? Well if you only had one way of updating the data lets say only one entry point to the data update you can write some logic to say that every time new data is entered set the cache object to null. But this approach will not work as soon as you have several ways of feeding data to the system or your system is scaled out across a farm of web servers. The perfect solution to this is Micro Caching which means you cache the query for a set time duration and invalidate the cache after that set duration. The advantage is every time the user queries for that data with in the time span for which you have cached the results there are no calls made to the database and the data is served right from the server which makes the response immensely quick. Now figuring out the appropriate time span for which you micro cache the query results really depends on the application. Lets say your website gets 10 requests per second, if you retain the cache results for even 1 minute you will have immense performance gains. You would reduce 90% hits to the database for searching. Ever wondered why when you go to e-bookers.com or xpedia.com or yatra.com to book a flight and you click on the book button because the fare seems too exciting and you get an error message telling you that the fare is not valid any more. Yes, exactly => That is a cache failure! These travel sites or price compare engines are not going to hit the database every time you hit the compare button instead the results will be served from the cache, because the query results are micro cached, its a perfect trade-off, by micro caching the results the site gains 100% performance benefits but every once in a while annoys a customer because the fare has expired. But the trade off works in the favour of these sites as they are still able to process up to 30+ page requests per second which means cater to the site traffic by may be losing 1 customer every once in a while to a competitor who is also using a similar caching technique what are the odds that the user will not come back to their site sooner or later? Recap   Resources Below are some Key resource you might like to review. I would highly recommend the documentation, walkthroughs and videos available on MSDN. You can always make use of Fiddler to debug Web Performance Tests. Some community test extensions and plug ins available on Codeplex might also be of interest to you. The Road Ahead Thank you for taking the time out and reading this blog post, you may also want to read Part I and Part II if you haven’t so far. If you enjoyed the post, remember to subscribe to http://feeds.feedburner.com/TarunArora. Questions/Feedback/Suggestions, etc please leave a comment. Next ‘Load Testing in the cloud’, I’ll be working on exploring the possibilities of running Test controller/Agents in the Cloud. See you on the other side! Thank You!   Share this post : CodeProject

    Read the article

  • Effects of HTTP/TCP connection handshakes and server performance

    - by Blankman
    When running apache bench on the same server as the website like: ab -n 1000 -c 10 localhost:8080/ I am most probably not getting accurate results when compared to users hitting the server from various locations. I'm trying to understand how or rather why this will effect real world performance since a user in china will have different latency issues when compared to someone in the same state/country. Say my web server has a maximum thread limit of 100. Can someone explain in detail how end user latency can/will effect server performance. I'm assuming here that each request will be computed equally at say 10ms. What I'm not understand is how external factors can effect overal server performance, specifically internet connections (location, or even device like mobile) and http/tcp handshakes etc.

    Read the article

  • Query performance counters from powershell

    - by Frane Borozan
    I am trying this script to query performance counters in different localized windows server versions. http://www.powershellmagazine.com/2013/07/19/querying-performance-counters-from-powershell/ Everything works as in the article, well partially :-) I am trying to access a counter ID 3906 Terminal Services Session and works well for English windows. However for example in French and German that counter doesn't exist under that ID. I think I figured to find the exact counter under ID 1548 in french and German, but that ID in English is something completely different. Anybody seen this behavior on the performance counters?

    Read the article

  • RAID Array performance on an HP Proliant ML350 G5 Smart Array E200i

    - by Nate Pinchot
    We have a client who is complaining about performance of an application which utilizes an MS SQL database. They do not believe the performance issues are the fault of the application itself. The Smart Array E200i RAID controller has 128MB cache and we have the cache set to 75% read/25% write. The disk array set to enable write caching. Recently we ran a disk performance test using SQLIO based on this guide. We used a 10 GB file for the test found that the average sequential read rate was ~60 MB/sec (megabytes/sec) and the average random read rate was ~30 MB/sec. Are these numbers on par for what the server should be performing? Better than on par? Horrible? Amazing?

    Read the article

  • How does NTFS compression affect performance?

    - by DragonLord
    I've heard that NTFS compression can reduce performance due to extra CPU usage, but I've read reports that it may actually increase performance because of reduced disk reads. How exactly does NTFS compression affect system performance? Notes: I'm running a laptop with a 5400 RPM hard drive, and many of the things I do on it are I/O bound. The processor is a AMD Phenom II with four cores running at 2.0 GHz. The system is defragmented regularly using UltraDefrag. The workload is mixed read-write, with reads occurring somewhat more often than writes. The files to be compressed include personal documents and selected programs, including several (less demanding) games and Visual Studio (which tends to be I/O bound more often than not).

    Read the article

  • Enable: Asp.net connection pool monitoring with performance monitor

    - by BlackHawkDesign
    If this question is at the wrong forum, be free to tell me. I'm a c# developer, but I'm running in a system management issue here. Intro: Im suspecting that an asp.net application is having some issues with the connection pool and that the pool is flooding from time to time. So to make sure, I want to monitor the connection pool. After some searching I found this article : http://blog.idera.com/sql-server/performance-and-monitoring/ensure-proper-sql-server-connection-pooling-2/ Basicly it explains stuff about connection pools and how you can monitor the application pool with performance monitor. The problem: So I logged in to the asp.net server(The sql database is hosted on a different server) which hosts the website. Started performance monitor. But when I want to select 'Current # pooled and nonpooled connections', I have no instance to select. There fore I can't add it. Question How can I create/supply an instance so I can monitor the connection pool? Thanks in advance BHD

    Read the article

  • Server Performance

    - by Burt
    We have a dedicated server that we use to stage websites (our test server). The performance of the server has become really bad and we regularly have to restart it. When performance is poor I have checked task manager for the processes and memory but everything looks OK. We use a content management system and it is always when using the admin section of this CMS that we notice the performance degrade which makes me think it may have something to do with DB calls the CMS is making. Does this sound viable? Any other sggestions of how I can go about testing this? Thanks in advance...

    Read the article

  • Amazon EC2 performance vs desktop

    - by flashnik
    I'm wondering how to compare performance of EC2 instances with standard dedicated servers and desktop. I've found only comparance of defferent clouds. I need to find a solution to perform some computations which require CPU and memory (disc IO is not used). The choice is to use: EC2 (High-CPU) or Xeon 5620/5630 with DDR3 or Core i7-960/980 with DDR3 Can anybody help, how to compare their performance? I'm not speaking about reliability of alternatives, I want to understand pros and cons from the point of just performance.

    Read the article

  • SQL Server Performance & Latching

    - by Colin
    I have a SQL server 2000 instance which runs several concurrent select statements on a group of 4 or 5 tables. Often the performance of the server during these queries becomes extremely diminished. The querys can take up to 10x as long as other runs of the same query, and it gets to the point where simple operations like getting the table list in object explorer or running sp_who can take several minutes. I've done my best to identify the cause of these issues, and the only performance metric which I've found to be off base is Average Latch Wait time. I've read that over 1 second wait time is bad, and mine ranges anywhere from 20 to 75 seconds under heavy use. So my question is, what could be the issue? Shouldn't SQL be able to handle multiple selects on a single table without losing so much performance? Can anyone suggest somewhere to go from here to investigate this problem? Thanks for the help.

    Read the article

  • Flushing disk cache for performance benchmarks?

    - by Ido Hadanny
    I'm doing some performance benchmark on some heavy SQL script running on postgres 8.4 on a ubuntu box (natty). I'm experiencing some pretty un-stable performance, even though I'm supposed to be the only one running on the machine (the same script on the exact same data might run in 20m and then 40m for no specific reason). So, remembering my distant DBA training, I decided I should flush the postgres cache, using sudo /etc/init.d/postgresql restart, but it's still shaky! My question: maybe I'm missing some caches in my disk/os? I'm using a netapp appliance as my storage. Am I on the right track? Do I even want to make sure I get repeatable performance before I start tuning?

    Read the article

  • Recommended website performance monitoring services? [closed]

    - by Dennis G.
    I'm looking for a good performance monitoring service for websites. I know about some of the available general monitoring services that check for uptime and notify you about unavailable services. But I'm specifically looking for a service with an emphasis on performance. I.e., I would like to see reports with detailed performance statistics from multiple locations world-wide, with a break-down on how long it took to fetch the different website resources, including third-party scripts such as Google Analytics and so on (the report should contain similar details such as the FireBug Net tab). Are there any such services and if so, which one is the best?

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >