Search Results

Search found 20904 results on 837 pages for 'disk performance'.

Page 84/837 | < Previous Page | 80 81 82 83 84 85 86 87 88 89 90 91  | Next Page >

  • When to trash hashmap contents to avoid performance degradation?

    - by Jack
    Hello, I'm woking on Java with a large (millions) hashmap that is actually built with a capacity of 10.000.000 and a load factor of .75 and it's used to cache some values since cached values become useless with time (not accessed anymore) but I can't remove useless ones while on the way I would like to entirely empty the cache when its performance starts to degrade. How can I decide when it's good to do it? For example, with 10 millions capacity and .75 should I empty it when it reaches 7.5 millions of elements? Because I tried various threshold values but I would like to have an analytic one. I've already tested the fact that emping it when it's quite full is a boost for perfomance (first 2-3 algorithm iterations after the wipe just fill it back, then it starts running faster than before the wipe) Thanks

    Read the article

  • What are the performance characteristics of SignalR at scale?

    - by Joel Martinez
    I'm interested in the performance characteristics of SignalR at scale ... particularly, how it behaves at the fringes of capability. When a server is at capacity, what happens? Does it drop messages? Do some clients not get notified? Are messages queued until all are delivered? And if so, will the queue eventually overflow and crash the server? I ask because conducting such a test myself would be impractical, and I'm hoping someone could point me to documentation speaking to this ... or perhaps someone could comment that has seen how SignalR behaves at scale. Thanks! note: I'm familiar with this other stackoverflow question on the stability and scalability of SignalR. But I believe my question is asking a slightly different question in that I'm not concerned with the theoretical scaling limits, I want to know how it behaves when it reaches the limits ... so I know what to be on the lookout for.

    Read the article

  • Is there a linear-time performance guarantee with using an Iterator?

    - by polygenelubricants
    If all that you're doing is a simple one-pass iteration (i.e. only hasNext() and next(), no remove()), are you guaranteed linear time performance and/or amortized constant cost per operation? Is this specified in the Iterator contract anywhere? Are there data structures/Java Collection which cannot be iterated in linear time? java.util.Scanner implements Iterator<String>. A Scanner is hardly a data structure (e.g. remove() makes absolutely no sense). Is this considered a design blunder? Is something like PrimeGenerator implements Iterator<Integer> considered bad design, or is this exactly what Iterator is for? (hasNext() always returns true, next() computes the next number on demand, remove() makes no sense). Similarly, would it have made sense for java.util.Random implements Iterator<Double>?

    Read the article

  • Performance Impact of Generating 100's of Dynamic Methods in Ruby?

    - by viatropos
    What are the performance issues associated with generating 100's of dynamic methods in Ruby? I've been interested in using the Ruby Preferences Gem and noticed that it generates a bunch of helper methods for each preference you set. For instance: class User < ActiveRecord::Base preference :hot_salsa end ...generates something like: user.prefers_hot_salsa? # => false user.prefers_hot_salsa # => false If there are 100's of preferences like this, how does this impact the application? I assume it's not really a big deal but I'm just wondering, theoretically.

    Read the article

  • Why do debug symbols so adversely affect the performance of threaded applications on Linux?

    - by fluffels
    Hi. I'm writing a ray tracer. Recently, I added threading to the program to exploit the additional cores on my i5 Quad Core. In a weird turn of events the debug version of the application is now running slower, but the optimized build is running faster than before I added threading. I'm passing the "-g -pg" flags to gcc for the debug build and the "-O3" flag for the optimized build. Host system: Ubuntu Linux 10.4 AMD64. I know that debug symbols add significant overhead to the program, but the relative performance has always been maintained. I.e. a faster algorithm will always run faster in both debug and optimization builds. Any idea why I'm seeing this behavior?

    Read the article

  • Does INNER JOIN performance depends on order of tables?

    - by Kartic
    A question suddenly came to my mind while I was tuning one stored procedure. Let me ask it - I have two tables, table1 and table2. table1 contains huge data and table2 contains less data. Is there performance-wise any difference between these two queries(I am changing order of the tables)? Query1: SELECT t1.col1, t2.col2 FROM table1 t1 INNER JOIN table2 t2 ON t1.col1=t2.col2 Query2: SELECT t1.col1, t2.col2 FROM table2 t2 INNER JOIN table1 t1 ON t1.col1=t2.col2 We are using Microsoft SQL server 2005.

    Read the article

  • If I take a large datatype. Will it affect performance in sql server

    - by Shantanu Gupta
    If i takes larger datatype where i know i should have taken datatype that was sufficient for possible values that i will insert into a table will affect any performance in sql server in terms of speed or any other way. eg. IsActive (0,1,2,3) not more than 3 in any case. I know i must take tinyint but due to some reasons consider it as compulsion, i am taking every numeric field as bigint and every character field as nVarchar(Max) Please give statistics if possible, to let me try to overcoming that compulsion. I need some solid analysis that can really make someone rethink before taking any datatype.

    Read the article

  • Improving the performance of an nHibernate Data Access Layer.

    - by Amitabh
    I am working on improving the performance of DataAccess Layer of an existing Asp.Net Web Application. The scenerios are. Its a web based application in Asp.Net. DataAccess layer is built using NHibernate 1.2 and exposed as WCF Service. The Entity class is marked with DataContract. Lazy loading is not used and because of the eager-fetching of the relations there is huge no of database objects are loaded in the memory. No of hits to the database is also high. For example I profiled the application using NHProfiler and there were about 50+ sql calls to load one of the Entity object using the primary key. I also can not change code much as its an existing live application with no NUnit test cases at all. Please can I get some suggestions here?

    Read the article

  • Can using non primitive Integer/ Long datatypes too frequently in the application, hurt the performance??

    - by Marcos
    I am using Long/Integer data types very frequently in my application, to build Generic datatypes. I fear that using these wrapper objects instead of primitive data types may be harmful for performance since each time it needs to create objects which is an expensive operation. but also it seems that I have no other choice(when I have to use primtives with generics) rather than just using them. However, still it would be great if you can suggest if there is anything I could do to make it better. or any way if I could just avoid it ?? Also What may be the downsides of this ? Suggestions welcomed!

    Read the article

  • Performance optimization for mssql: decrease stored procedures execution time or unload the server?

    - by tim
    Hello everybody! We have a web service which provides search over hotels. There is a problem with performance: a single request to the service takes around 5000 ms. Almost all of the time is spent in database by executing storing procedures. During the request our server (mssql2008) consumes ~90% of the processor time. When 2 requests are made in parallel the average time grows and is around 7000 ms. When number of request is increasing, the average time of response is increasing as well. We have 20-30 requests per minute. Which kind of optimization is the best in this case having in mind that the goal is to provide stable response time for the service: 1) Try to decrease the stored procedures execution time 2) Try to find the way how to unload the server It is interesting to hear from people who deal with booking sites. Thanks!

    Read the article

  • What's the most performance effective way to have a webbrowser inside a class library ?

    - by Xaqron
    I'm developing a class library. Need some data from internet and this cannot be done with HttpWebRequest in my case so I wanna use WebBrowser component. WebBrowser is used for opening a single page and fetch some data from it, so WebBrowser life-time is very short. Running thread is MTA and no message pump or STA thread is available by default (class library is used by an ASP.NET application). How to create a WebBrowser object, run it with a STA thread, fetch data from a web page and finally dispose it with the least performance impact on the application ? I just need the idea/concept and will find details myself. Thanks guys

    Read the article

  • JavaScript: String Concatenation slow performance? Array.join('')?

    - by NickNick
    I've read that if I have a for loop, I should not use string concation because it's slow. Such as: for (i=0;i<10000000;i++) { str += 'a'; } And instead, I should use Array.join(), since it's much faster: var tmp = []; for (i=0;i<10000000;i++) { tmp.push('a'); } var str = tmp.join(''); However, I have also read that string concatention is ONLY a problem for Internet Explorer and that browsers such as Safari/Chrome, which use Webkit, actually perform FASTER is using string concatention than Array.join(). I've attempting to find a performance comparison between all major browser of string concatenation vs Array.join() and haven't been able to find one. As such, what is faster and more efficient JavaScript code? Using string concatenation or Array.join()?

    Read the article

  • SQL SEVER – Finding Memory Pressure – External and Internal

    - by pinaldave
    Following query will provide details of external and internal memory pressure. It will return the data how much portion in the existing memory is assigned to what kind of memory type. SELECT TYPE, SUM(single_pages_kb) InternalPressure, SUM(multi_pages_kb) ExtermalPressure FROM sys.dm_os_memory_clerks GROUP BY TYPE ORDER BY SUM(single_pages_kb) DESC, SUM(multi_pages_kb) DESC GO What is your method to find memory pressure? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Are SMART goals useful for programmers?

    - by Craig Schwarze
    Several organisations I know use SMART goals for their programmers. SMART is an acronym for Specific, Measurable, Achievable, Relevant and Time-Bound. They are fairly common in large corporations. My own prior experience with SMART goals has not been all that positive. Have other programmers found them an effective way to measure performance? What are some examples of good SMART goals for programmers (if they exist).

    Read the article

  • SQL SERVER – Difference Between DATETIME and DATETIME2

    - by pinaldave
    Yesterday I have written a very quick blog post on SQL SERVER – Difference Between GETDATE and SYSDATETIME and I got tremendous response for the same. I suggest you read that blog post before continuing this blog post today. I had asked people to honestly take part and share their view about above two system function. There are few emails as well few comments on the blog post asking question how did I come to know the difference between the same. The answer is real world issues. I was called in for performance tuning consultancy where I was asked very strange question by one developer. Here is the situation he was facing. System had a single table with two different column of datetime. One column was datelastmodified and second column was datefirstmodified. One of the column was DATETIME and another was DATETIME2. Developer was populating them with SYSDATETIME respectively. He was always thinking that the value inserted in the table will be the same. This table was only accessed by INSERT statement and there was no updates done over it in application.One fine day he ran distinct on both of this column and was in for surprise. He always thought that both of the table will have same data, but in fact they had very different data. He presented this scenario to me. I said this can not be possible but when looked at the resultset, I had to agree with him. Here is the simple script generated to demonstrate the problem he was facing. This is just a sample of original table. DECLARE @Intveral INT SET @Intveral = 10000 CREATE TABLE #TimeTable (FirstDate DATETIME, LastDate DATETIME2) WHILE (@Intveral > 0) BEGIN INSERT #TimeTable (FirstDate, LastDate) VALUES (SYSDATETIME(), SYSDATETIME()) SET @Intveral = @Intveral - 1 END GO SELECT COUNT(DISTINCT FirstDate) D_GETDATE, COUNT(DISTINCT LastDate) D_SYSGETDATE FROM #TimeTable GO SELECT DISTINCT a.FirstDate, b.LastDate FROM #TimeTable a INNER JOIN #TimeTable b ON a.FirstDate = b.LastDate GO SELECT * FROM #TimeTable GO DROP TABLE #TimeTable GO Let us see the resultset. You can clearly see from result that SYSDATETIME() does not populate the same value in the both of the field. In fact the value is either rounded down or rounded up in the field which is DATETIME. Event though we are populating the same value, the values are totally different in both the column resulting the SELF JOIN fail and display different DISTINCT values. The best policy is if you are using DATETIME use GETDATE() and if you are suing DATETIME2 use SYSDATETIME() to populate them with current date and time to accurately address the precision. As DATETIME2 is introduced in SQL Server 2008, above script will only work with SQL SErver 2008 and later versions. I hope I have answered few questions asked yesterday. Reference: Pinal Dave (http://www.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL DateTime, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – When are Statistics Updated – What triggers Statistics to Update

    - by pinaldave
    If you are an SQL Server Consultant/Trainer involved with Performance Tuning and Query Optimization, I am sure you have faced the following questions many times. When is statistics updated? What is the interval of Statistics update? What is the algorithm behind update statistics? These are the puzzling questions and more. I searched the Internet as well many official MS documents in order to find answers. All of them have provided almost similar algorithm. However, at many places, I have seen a bit of variation in algorithm as well. I have finally compiled the list of various algorithms and decided to share what was the most common “factor” in all of them. I would like to ask for your suggestions as whether following the details, when Statistics is updated, are accurate or not. I will update this blog post with accurate information after receiving your ideas. The answer I have found here is when statistics are expired and not when they are automatically updated. I need your help here to answer when they are updated. Permanent table If the table has no rows, statistics is updated when there is a single change in table. If the number of rows in a table is less than 500, statistics is updated for every 500 changes in table. If the number of rows in table is more than 500, statistics is updated for every 500+20% of rows changes in table. Temporary table If the table has no rows, statistics is updated when there is a single change in table. If the number of rows in table is less than 6, statistics is updated for every 6 changes in table. If the number of rows in table is less than 500, statistics is updated for every 500 changes in table. If the number of rows in table is more than 500, statistics is updated for every 500+20% of rows changes in table. Table variable There is no statistics for Table Variables. If you want to read further about statistics, I suggest that you read the white paper Statistics Used by the Query Optimizer in Microsoft SQL Server 2008. Let me know your opinions about statistics, as well as if there is any update in the above algorithm. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, Readers Question, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: SQL Statistics

    Read the article

  • SQL SERVER – Checklist for Analyzing Slow-Running Queries

    - by pinaldave
    I am recently working on upgrading my class Microsoft SQL Server 2005/2008 Query Optimization and & Performance Tuning with additional details and more interesting examples. While working on slide deck I realized that I need to have one solid slide which talks about checklist for analyzing slow running queries. A quick search on my saved [...]

    Read the article

  • Easy Listening = CRM On Demand Podcasts

    - by Anne
    OK, here's my NEW favorite resource for CRM On Demand info -- podcasts! Specifically, the CRM On Demand Podcast site -- signed, sealed, and delivered with humor and know-how. Yes, I admit, I know the cast of characters. But let's face it, sometimes dealing with software is just soooo dry! Not so when discussed by the two main commentators, Louis Peters and Robert Davidson, whom someone once referred to as CRM On Demand's "Click and Clack." (Thought that was too good not to pass along!) Anyhow, another huge plus about the site is the option to listen OR to read. Out walking my dog or doing the dishes? Just turn up the podcast. Listening to music or watching TV? I'll read Louis's entertaining write-ups to glean great info about CRM On Demand in a very short period of time. So that you get a better understanding of why I like this site so much, here's a sampling of what's discussed: Five Things about Books of Business As Louis Peters put it in his entry, when you see "Five Things" in the title, "you'll know you're going to get some concrete advice that you can put to work right away." Well, Louis and Robert do just that, pointing you in the right direction when using Books of Business to segment data. Moving to Indexed Fields - A Rough Guide (only an article, not a podcast) I've read all about performance and even helped develop material around it. But nowhere have I heard indexed custom fields referred to as "super heroes." Louis and Robert use imaginative language to describe the process for moving your data to indexed fields for optimal performance. Data Access QA from the Forums I think that everyone would admit that data access and visibility is the most difficult topic to understand in CRM On Demand. Following up on their previous podcast on the same topic, Louis and Robert answer a few key questions from the many postings on the Oracle CRM On Demand forums. And I bet that the scenarios match many companies' business requirements...maybe even yours! We Need to Talk About Adoption Another expert, Tim Koehler, joins Louis to talk about how to drive user adoption: aligning product usage with business results, communicating why and how to use the product, getting feedback on usability, and so on. Hope I've made my point -- turn to these podcasts to hear knowledgeable folks discuss CRM On Demand tips and tricks in entertaining ways. One podcast is even called "SaaS Talk"!

    Read the article

  • Certificate Revocation checking affecting system performance [migrated]

    - by Colm Clarke
    I have a .NET 3.5 desktop application that had been showing periodic slow downs in functionality whenever the test machine it was on was out of the office. I managed to replicate the error on a machine in the office without an internet connection, but it was only when i used ANTS performance profiler that i got a clearer picture of what was going on. In ANTS I saw a "Waiting for synchronization" taking up to 16 seconds that corresponded to the delay I could see in the application when NHibernate tried to load the System.Data.SqlServerCE.dll assembly. If I tried the action again immediately it would work with no delay but if I left it for 5 minutes then it would be slow to load again the next time I tried it. From my research so far it appears to be because the SqlServerCE dll is signed and so the system is trying to connect to get the certificate revocation lists and timing out. Disabling the "Automatically detect settings" setting in the Internet Options LAN settings makes the problem go away, as does disabling the "Check for publishers certificate revocation". But the admins where this application will be deployed are not going to be happy with the idea of disabling certificate checking on a per machine or per user basis so I really need to get the application level disabling of the CRL check working. There is the well documented bug in .net 2.0 which describes this behaviour, and offers a possible fix with a config file element. <?xml version="1.0" encoding="utf-8"?> <configuration> <runtime> <generatePublisherEvidence enabled="false"/> </runtime> </configuration> This is NOT working for me however even though I am using .net 3.5. The SQLServerCE dll is being loaded dynamically by NHibernate and I wonder if the fact that it's dynamic could somehow be why the setting isn't working, but I don't know how I could check that. Can anyone offer suggestions as to why the config setting might not work? Or is there another way I could disable the check at the application level, perhaps a CAS policy setting that I can use to set an exception for the application when it's installed? Or is there something I can change in the application to up the trust level or something like that? I have also tried using to no advantage ServicePointManager.CheckCertificateRevocationList = false; http://rusanu.com/2009/07/24/fix-slow-application-startup-due-to-code-sign-validation/ I have also tried those registry settings out and unfortunately they didn't help. The dlls that appear to be the cause of the hold up are native SQL Server CE dlls, and looking at the stack traces in ProcMon mscorwks.dll doesn't appear to be involved even though the checks on crypto and cert registry keys are being done under the .NET application. It's definitely still something to do with publisher certificate checking because unticking "Check for publisher revocation certificate" still works but something odd is going on.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • Extreme Performance and Scale Delivered by SOA on Oracle Exalogic

    - by J Swaroop
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Demands to incorporate internet-scale applications, data, and social media traffic with existing IT infrastructure require extreme availability, reliability, and scalability. In this session on industrial-strength SOA, learn how Oracle Exalogic and Oracle Exadata engineered systems address these requirements. Topics covered: (1) how SOA and BPM benefit from “hardware and software engineered for each other,” (2) how Oracle Exadata provides the data tier with unparalleled scalability and performance for SOA and BPM running on Oracle Exalogic (3) customer case studies (4) best practices and topology guidelines (5) information on tools that help operate, manage, provision, and deploy—to help reduce overall TCO. Extreme engineering at its best! Session details: 10/2/12 (Tuesday) 11:45 AM - Moscone South -308

    Read the article

  • Have You Heard About Project Lucy?

    - by KKline
    Lucy, You Got Some 'Splainin to Do!' Quest Software's latest community initiative, Windows Azure-based Project Lucy, has debuted! Project Lucy is part infrastructure analytics, part social media experiment, and part performance data warehouse. The best things about Project Lucy include: It’s Free - just like our SQLServerPedia website, Project Lucy is free to anyone who wants to upload a trace file It’s 1oo% web-based - you don’t have to download or maintain anything and updates roll out seamlessly,...(read more)

    Read the article

  • SQLRally and SQLRally - Session material

    - by Hugo Kornelis
    I had a great week last week. First at SQLRally Nordic , in Stockholm, where I presented a session on how improvements to the OVER clause can help you simplify queries in SQL Server 2012 enormously. And then I continued straight on into SQLRally Amsterdam , where I delivered a session on the performance implications of using user-defined functions in T-SQL. I understand that both events will make my slides and demo code downloadable from their website, but this may take a while. So those who do not...(read more)

    Read the article

  • Trade offs of linking versus skinning geometry

    - by Jeff
    What are the trade offs between inherent in linking geometry to a node versus using skinned geometry? Specifically: What capabilities do you gain / lose from using each method? What are the performance impacts of doing one over the other? What are the specific situations where you would want to do one over the other? In addition, do the answers to these questions tend to be engine specific? If so, how much?

    Read the article

< Previous Page | 80 81 82 83 84 85 86 87 88 89 90 91  | Next Page >