Search Results

Search found 13654 results on 547 pages for 'ssis performance'.

Page 20 of 547

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows: the Sort operator and the Hash Match operator. The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow.

    But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties.

    The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge: I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand.

    Alternatively, if the pile of cards is not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values has been seen before, but this time it needs more than the mere existence of a new set of values – it needs to know how many of them there are.

    A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this.
    In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straightforward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply.

    But let’s take a look at SQL Server Integration Services. SSIS gives us an Aggregate transformation for use in Data Flow Tasks, but it’s described as blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component.

    And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things.) A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice.

    Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, which will have to work with whatever data is passed in.

    I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKey columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script Components are single-use, I was able to handle the data knowing everything about my data flow already.
    As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking vs non-blocking and synchronous vs asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.
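    As a rough sketch of the core check such a custom component would need to make (the method and column names here are illustrative assumptions; a real component would read the SortKeyPosition metadata from the SSIS pipeline rather than plain strings), a streaming aggregate is only safe when the grouping columns are covered by the leading sort keys of the input:

        using System;
        using System.Collections.Generic;
        using System.Linq;

        static class StreamAggregateCheck
        {
            // True when the first N sort keys are exactly the N grouping columns (in any order),
            // which guarantees that all rows of a group arrive adjacently, so the component
            // could run its non-blocking (streaming) code path instead of the blocking one.
            public static bool CanStream(IList<string> sortKeys, IList<string> groupingColumns)
            {
                if (groupingColumns.Count == 0 || groupingColumns.Count > sortKeys.Count)
                    return false;

                var leadingKeys = sortKeys.Take(groupingColumns.Count);
                return new HashSet<string>(leadingKeys, StringComparer.OrdinalIgnoreCase)
                    .SetEquals(groupingColumns);
            }
        }

    For example, CanStream(new[] { "Suit", "Rank" }, new[] { "Suit" }) returns true, while an unsorted input (an empty sortKeys list) returns false and would force the blocking path.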

    Read the article

  • A deadlock was detected while trying to lock variables in SSIS

    Error: 0xC001405C at SQL Log Status: A deadlock was detected while trying to lock variables "User::RowCount" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out.

    Have you ever considered variable locking when building your SSIS packages? I expect many people haven’t, simply because most of the time you never see an error like the one above. I’ll try and explain a few key concepts about variable locking and hopefully you never will see that error.

    First of all, what is all this variable locking about? Put simply, SSIS variables have to be locked before they can be accessed, and then of course unlocked once you have finished with them. This is baked into SSIS, presumably to reduce the risk of race conditions, but with that comes some additional overhead, in that you need to be careful to avoid lock conflicts in some scenarios.

    The most obvious place you will come across any hint of locking (no pun intended) is the Script Task or Script Component with their ReadOnlyVariables and ReadWriteVariables properties. These two properties allow you to enter lists of variables to be used within the task, or to put it another way, lists of variables to be locked so that they are available within the task. During the task pre-execute phase the variables are locked, you then use them during the execute phase when your code is run, and they are unlocked for you during the post-execute phase. So by entering the variable names in one of the two lists, the locking is taken care of for you, and you just read and write to the Dts.Variables collection that is exposed in the task for the purpose. As you can see in the image above, the variable PackageInt is specified, which means when I write the code inside that task I don’t have to worry about locking at all, as shown below.

        public void Main()
        {
            // Set the variable value to something new
            Dts.Variables["PackageInt"].Value = 199;

            // Raise an event so we can play in the event handler
            bool fireAgain = true;
            Dts.Events.FireInformation(0, "Script Task Code", "This is the script task raising an event.", null, 0, ref fireAgain);

            Dts.TaskResult = (int)ScriptResults.Success;
        }

    As you can see, as well as accessing the variable, hassle free, I also raise an event. Now consider a scenario where I have an event handler as well, as shown below. What if my event handler tries to use the same variable too? Well, obviously for the point of this post, it fails with the error quoted previously. The reason why is clearly illustrated if you consider the following sequence of events:

    1. Package execution starts.
    2. The Script Task in Control Flow starts.
    3. The Script Task in Control Flow locks the PackageInt variable as specified in the ReadWriteVariables property.
    4. The Script Task in Control Flow executes its script, and the OnInformation event is raised.
    5. The OnInformation event handler starts.
    6. The Script Task in the OnInformation event handler starts.
    7. The Script Task in the OnInformation event handler attempts to lock the PackageInt variable (for either read or write, it doesn’t matter), but fails because the variable is already locked.

    The problem is caused by the event handler task trying to use a variable that is already locked by the task in Control Flow. Events are always raised synchronously, therefore the task in Control Flow that is raising the event will not regain control until the event handler has completed, so we really do have an un-resolvable locking conflict, better known as a deadlock.
    In this scenario we can easily resolve the problem by managing the variable locking explicitly in code, so there is no need to specify anything for the ReadOnlyVariables and ReadWriteVariables properties.

        public void Main()
        {
            // Set the variable value to something new, with explicit lock control
            Variables lockedVariables = null;
            Dts.VariableDispenser.LockOneForWrite("PackageInt", ref lockedVariables);
            lockedVariables["PackageInt"].Value = 199;
            lockedVariables.Unlock();

            // Raise an event so we can play in the event handler
            bool fireAgain = true;
            Dts.Events.FireInformation(0, "Script Task Code", "This is the script task raising an event.", null, 0, ref fireAgain);

            Dts.TaskResult = (int)ScriptResults.Success;
        }

    Now the package will execute successfully, because the variable lock has already been released by the time the event is raised, so no conflict occurs. For those of you with a SQL engine background this should all sound strangely familiar, and it boils down to getting in and out as fast as you can to reduce the risk of lock contention, be that SQL pages or SSIS variables.

    Unfortunately we cannot always manage the locking ourselves. The Execute SQL Task is very often used in conjunction with variables, either to pass in parameter values or to get results out. Either way, the task will manage the locking for you, and will fail when it cannot lock the variables it requires. The scenario outlined above is a clear-cut deadlock scenario: both parties are waiting on each other, so it is un-resolvable. The mechanism used within SSIS isn’t actually that clever, and whilst the message says it is a deadlock, it really just means it tried a few times and then gave up. The last part of the error message is actually the most accurate in terms of the failure: A lock cannot be acquired after 16 attempts. The locks timed out.

    Now this may come across as a recommendation to always manage locking manually in the Script Task or Script Component yourself, but I think that would be an overreaction. It is more of a reminder to be aware that in high concurrency scenarios, especially when sharing variables across multiple objects, locking is an important design consideration. Update – make sure you don’t try to use explicit locking as well as leaving the variable names in the ReadOnlyVariables and ReadWriteVariables lock lists, otherwise you’ll get the deadlock error – you cannot lock a variable twice!
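    A small defensive variation on the explicit locking shown above (a sketch, assuming the same PackageInt variable) is to wrap the dispenser call in try/finally, so the variable is guaranteed to be unlocked before the event is raised even if something in between throws:

        public void Main()
        {
            Variables lockedVariables = null;
            try
            {
                // Keep the lock window as short as possible
                Dts.VariableDispenser.LockOneForWrite("PackageInt", ref lockedVariables);
                lockedVariables["PackageInt"].Value = 199;
            }
            finally
            {
                // Always release the lock, otherwise the event handler will deadlock again
                if (lockedVariables != null)
                {
                    lockedVariables.Unlock();
                }
            }

            // Raise the event only after the lock has been released
            bool fireAgain = true;
            Dts.Events.FireInformation(0, "Script Task Code", "This is the script task raising an event.", null, 0, ref fireAgain);

            Dts.TaskResult = (int)ScriptResults.Success;
        }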

    Read the article

  • Revisiting ANTS Performance Profiler 7.4

    - by James Michael Hare
    Last year I did a small review of the ANTS Performance Profiler 6.3. Now that it’s a year later and a major version number higher, I thought I’d revisit the review and revise my last post. This post will take the same examples as the original post and update them to show what’s new in version 7.4 of the profiler.

    Background

    A performance profiler’s main job is to keep track of how much time is typically spent in each unit of code. This helps when we have a program that is not running at the performance we expect, and we want to know where the program is experiencing issues. There are many profilers out there of varying capabilities. Red Gate’s typically seem to be very easy to “jump in” to and get started with, with very little training required. So let’s dig into the Performance Profiler.

    I’ve constructed a very crude program with some obvious inefficiencies. It’s a simple program that generates random order numbers (or really, any unique identifier), adds them to a list, sorts the list, then finds the max and min number in the list. Ignore the fact it’s very contrived and obviously inefficient; we just want to use it as an example to show off the tool:

        // our test program
        public static class Program
        {
            // the number of iterations to perform
            private static int _iterations = 1000000;

            // The main method that controls it all
            public static void Main()
            {
                var list = new List<string>();

                for (int i = 0; i < _iterations; i++)
                {
                    var x = GetNextId();

                    AddToList(list, x);

                    var highLow = GetHighLow(list);

                    if ((i % 1000) == 0)
                    {
                        Console.WriteLine("{0} - High: {1}, Low: {2}", i, highLow.Item1, highLow.Item2);
                        Console.Out.Flush();
                    }
                }
            }

            // gets the next order id to process (random for us)
            public static string GetNextId()
            {
                var random = new Random();
                var num = random.Next(1000000, 9999999);
                return num.ToString();
            }

            // add it to our list - very inefficiently!
            public static void AddToList(List<string> list, string item)
            {
                list.Add(item);
                list.Sort();
            }

            // get high and low of order id range - very inefficiently!
            public static Tuple<int,int> GetHighLow(List<string> list)
            {
                return Tuple.Create(list.Max(s => Convert.ToInt32(s)), list.Min(s => Convert.ToInt32(s)));
            }
        }

    So let’s run it through the profiler and see what happens!

    Visual Studio Integration

    First, let’s look at how the ANTS profilers integrate with Visual Studio’s menu system. Once you install the ANTS profilers, you will get an ANTS menu item with several options. Notice that you can either Profile Performance or Launch ANTS Performance Profiler. These sound similar but achieve two slightly different actions:

    - Profile Performance: this immediately launches the profiler with all defaults selected to profile the active project in Visual Studio.
    - Launch ANTS Performance Profiler: this launches the profiler much the same way as starting it from the Start Menu. The profiler will pre-populate the application and path information, but allow you to change the settings before beginning the profile run.

    So really, the main difference is that Profile Performance immediately begins profiling with the default selections, where Launch ANTS Performance Profiler allows you to change the defaults and attach to an already-running application.

    Let’s Fire it Up!
    So when you fire up ANTS, either via the Start Menu or the Launch ANTS Performance Profiler menu in Visual Studio, you are presented with a very simple dialog to get you started. Notice you can choose from many different options for application type. You can profile executables, services, web applications, or just attach to a running process. In fact, in version 7.4 we see two new options added:

    - ASP.NET Web Application (IIS Express)
    - SharePoint web application (IIS)

    So this gives us an additional way to profile ASP.NET applications and the ability to profile SharePoint applications as well.

    You can also choose your level of detail in the Profiling Mode drop down. If you choose line-level and method-level timings detail, you will get a lot more detail on the method durations, but this will also slow down profiling somewhat. If you really need the profiler to be as unintrusive as possible, you can change it to sample method-level timings. This performs very light profiling, where basically the profiler collects timings of a method by examining the call stack at given intervals. Which mode you choose depends a lot on how much detail you need to find the issue and how sensitive your program issues are to timing. So for our example, let’s just go with the line and method timing detail. We check that all the options are correct (if you launch from VS2010, the executable and path are filled in already), and fire it up by clicking the [Start Profiling] button.

    Profiling the Application

    Once you start profiling the application, you will see a real-time graph of CPU usage that will indicate how much your application is using the CPU(s) on your system. During this time, you can select segments of the graph and bookmark them, giving them mnemonic names. This can be useful if you want to compare performance in one part of the run to another part of the run. Notice that once you select a block, it will give you the call tree breakdown for that selection only, and the relative performance of those calls. Once you feel you have collected enough information, you can click [Stop Profiling] to stop the application run and information collection and begin a more thorough analysis.

    Analyzing Method Timings

    So now that we’ve halted the run, we can look around the GUI and see what we can see. By default, the times are shown in terms of percentage of the total run of the application, though you can change it in the View menu to milliseconds, ticks, or seconds as well. This won’t affect the percentages of methods; it only affects the units in which the times are shown. Notice also that the major hotspot seems to be in a method without source; ANTS Profiler filters these out by default, but you can right-click on the line and remove the filter to see more detail. This proves especially handy when a bottleneck is due to a method in the BCL. So now that we’ve removed the filter, we see a bit more detail. In addition, ANTS Performance Profiler gives you the ability to decompile the methods without source so that you can dive even deeper, though typically this isn’t necessary for our purposes.

    When looking at timings, there are generally two types of timings for each method call:

    - Time: the time spent ONLY in this method, not including calls this method makes to other methods.
    - Time With Children: the total of time spent in both this method AND calls this method makes to other methods.
    In other words, the Time tells you how much work is being done exclusively in this method, and the Time With Children tells you how much work is being done inclusively in this method and everything it calls. You can also choose to display the methods in a tree or in a grid. The tree view is the default, and it shows the method calls arranged in terms of the tree representing all method calls and the parent method that called them, etc. This is useful because when you find a hot-spot method, you can see who is calling it to determine if the problem is the method itself, or if it is being called too many times. The grid view represents each method only once with its totals, and is useful for quickly seeing what method is the trouble spot. In addition, you can choose to display Methods with source, which are generally the methods you wrote (as opposed to native or BCL code), or Any Method, which shows not only your methods, but also native calls, JIT overhead, synchronization waits, etc. So these are just two ways of viewing the same data, and you’re free to choose the organization that best suits what information you are after.

    Analyzing Method Source

    If we look at the timings above, we see that our AddToList() method (and in particular, its call to the List<T>.Sort() method in the BCL) is the hot-spot in this analysis. If ANTS sees a method that is consuming the most time, it will flag it as a hot-spot to help call out potential areas of concern. This doesn’t mean the other statistics aren’t meaningful, but that the hot-spot is most likely going to be your biggest bang-for-the-buck to concentrate on. So let’s select the AddToList() method, and see what it shows in the source window below. Notice the source breakout in the bottom pane when you select a method (from either tree or grid view). This shows you the timings in this method per line of code, which gives you a major indicator of where the trouble-spot in this method is. So in this case, we see that performing a Sort() on the List<T> after every Add() is killing our performance! Of course, this was a very contrived, duh moment, but you’d be surprised how many performance issues become duh moments. Note that this one line is taking up 86% of the execution time of this application! If we eliminate this bottleneck, we should see a drastic improvement in performance. So to fix this, if we still wanted to maintain the List<T>, we’d have many options, including: delaying Sort() until after all Add() calls; using a SortedSet, SortedList, or SortedDictionary, depending on which is most appropriate; or forgoing the sorting altogether and using a Dictionary.

    Rinse, Repeat!

    So let’s just change all instances of List<string> to SortedSet<string> and run this again through the profiler. Now we see the AddToList() method is no longer our hot-spot, but the Max() and Min() calls are! This is good because we’ve eliminated one hot-spot and now we can try to correct this one as well. As before, we can then optimize this part of the code (possibly by taking advantage of the fact the list is now sorted and returning the first and last elements). We can then rinse and repeat this process until we have eliminated as many bottlenecks as possible.

    Calls by Web Request

    Another feature that was added recently is the ability to view .NET methods grouped by the HTTP requests that caused them to run. This can be helpful in determining which pages, web services, etc. are causing hot spots in your web applications.
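    To sketch the GetHighLow() optimization suggested under Rinse, Repeat! above (assuming the same 7-digit order ids, so string order matches numeric order), the method can simply read the ends of the SortedSet instead of scanning every element with Max() and Min():

        // get high and low of order id range - using the sorted order instead of scanning
        public static Tuple<int, int> GetHighLow(SortedSet<string> list)
        {
            // SortedSet<T>.Max and Min are the last and first elements in sort order
            return Tuple.Create(Convert.ToInt32(list.Max), Convert.ToInt32(list.Min));
        }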
Summary If you like the other ANTS tools, you’ll like the ANTS Performance Profiler as well. It is extremely easy to use with very little product knowledge required to get up and running. There are profilers built into the higher product lines of Visual Studio, of course, which are also powerful and easy to use. But for quickly jumping in and finding hot spots rapidly, Red Gate’s Performance Profiler 7.4 is an excellent choice. Technorati Tags: Influencers,ANTS,Performance Profiler,Profiler

    Read the article

  • How to measure disk performance?

    - by Jakub Šturc
    I am going to "fix" a friend's computer this weekend. By the symptoms he describes, it looks like he has a disk performance problem with his 5400 rpm disk. I want to be sure that the disk is the problem, so I want to "scientifically" measure the performance. Which tools do you recommend for this job? Is there any standard set of numbers I can compare the results of the measurement with?

    Read the article

  • Question about network topology and routing performance

    - by algorithms
    Hello, I am currently working on a uni project about routing protocols and network performance. One of the criteria I was going to test was what effect LAN topology has, i.e. workstations arranged in mesh, star, ring etc., but I am having doubts as to whether that would have any effect on routing performance, and thus whether it would be useless to test. Rather, I'm thinking it would be better to test the topology of the routers themselves, i.e. routers arranged in star, mesh, ring etc. I would appreciate some feedback on this as I am rather confused. Thank you.

    Read the article

  • My SqlCommand on an SSIS Data Flow OLE DB Command does not seem to work

    - by Angel Escobedo
    Hello, I'm using an OLE DB Source to get rows from a dBase IV file and it works. Then I split the data and perform a group by with an Aggregate component. I obtain a row with two columns holding a "null" value:

        CompanyID | CompanyName | SubTotal | Tax      | TotalRevenue
        Null      | Null        | 145487   | 27642.53 | 173129.53

    This succeeds because all rows have been grouped without taking care of the first columns, just summing the valuable columns, so I need to change those nulls to default values such as CompanyID = "100000000" and CompanyName = "Others". I tried using a SqlCommand on an OLE DB Command component:

        SELECT "10000000" AS RUCCLI , "Otros - Varios" AS RAZCLI FROM RGVCAFAC

    (that same statement is what appears in the component's SqlCommand property in the package XML), but nothing happens. Why? Finally the task finishes when the data is inserted into a SQL Server table. I'm using the same connection manager for extracting the data and transforming it (View Code):

        Data Source=C:\CONTA\Resocen\Agosto\;Provider=Microsoft.Jet.OLEDB.4.0;Persist Security Info=False;Extended Properties=dBASE IV;

    All the work is done in memory; I'm not using cache connection managers.
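    One possible way to get those defaults in (a sketch only, assuming a Script Component transformation placed after the Aggregate, with the output column names from the result shown above) is to replace the nulls in a synchronous Script Component instead of an OLE DB Command:

        // Runs once per row flowing through the Script Component after the Aggregate
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            // Replace the null grouping columns with the required defaults
            if (Row.CompanyID_IsNull)
                Row.CompanyID = "100000000";

            if (Row.CompanyName_IsNull)
                Row.CompanyName = "Others";
        }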

    Read the article

  • Process results of conditional split in SSIS

    - by Robert
    I have a Data Flow Task and am connecting to a database via an OLE DB Source component to extract data. This data feeds into a Conditional Split component to separate the data based on a simple expression. After the evaluation of this expression, the data will end up in either of two locations: LocationA or LocationB. Alright, I have that all set up and working properly. Once the data is separated into these two locations, additional processing is to be done on the records. Here's where I am stuck: I need the processing of records in LocationA to occur before the processing of records in LocationB. Is there a way to set precedence so that some tasks occur before others? If not, what is the best way to handle this? I was thinking I may need to write the data in LocationA and LocationB back out to the database and create a new Data Flow Task in the control flow to handle the order in which these records must be dealt with. Any help is greatly appreciated!

    Read the article

  • VMWare Server - Writing files to virtual hard drive performance

    - by Ardman
    We have just moved our infrastructure from physical servers to virtual machines. Everything is running great and we are happy with the result of the move. We have identified one problem, and that is reading/writing performance. We have an application that compiles files and writes to disk. This is considerably slower on the new virtual machines compared to the physical machines. Is there a performance bottleneck when writing to a virtual hard drive compared to a physical hard drive?

    Read the article

  • Separating Strings from a CSV File in SSIS 2008

    - by David Stein
    I have data which resembles the following:

        "D.STEIN","DS","01","ALTRES","TTTTTTFFTT"
        "D.STEIN","DS","01","APCASH","TTTTTTFFTT"
        "D.STEIN","DS","01","APINH","TTTTTTFFTT"
        "D.STEIN","DS","01","APINV","TTTTTTFFTT"
        "D.STEIN","DS","01","APMISC","TTTTTTFFTT"
        "D.STEIN","DS","01","APPCHK","TTTTTTFFTT"
        "D.STEIN","DS","01","APWLNK","TTTTTTFFTT"
        "D.STEIN","DS","01","ARCOM","TTTTTTFFTT"
        "D.STEIN","DS","01","ARINV","TTTTTTFFTT"

    I need to break out the final string into separate columns for import into a SQL table, one letter per field. Therefore, "TTTTTTFFTT" will be broken out into 10 separate fields, each with a single bit value. I've used a Flat File Source Editor to load the data. How do I accomplish the split?
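    One way to accomplish the split (a sketch only; the input column name Flags and the boolean output columns Flag01 to Flag10 are assumed names, and a Derived Column with ten SUBSTRING expressions would work just as well) is a Script Component transformation between the Flat File Source and the destination:

        // Runs once per row; maps each character of the 10-character flag string
        // to its own boolean output column (Flag01..Flag10 defined on the output).
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            string flags = Row.Flags;   // assumed to always contain 10 characters

            Row.Flag01 = flags[0] == 'T';
            Row.Flag02 = flags[1] == 'T';
            Row.Flag03 = flags[2] == 'T';
            Row.Flag04 = flags[3] == 'T';
            Row.Flag05 = flags[4] == 'T';
            Row.Flag06 = flags[5] == 'T';
            Row.Flag07 = flags[6] == 'T';
            Row.Flag08 = flags[7] == 'T';
            Row.Flag09 = flags[8] == 'T';
            Row.Flag10 = flags[9] == 'T';
        }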

    Read the article

  • SSIS For Each File Loop and File System Task to copy Files

    - by Marlon
    I'm using a File System Task inside a For Each Loop container, just as described here: link text. However, when I execute the package I get this error: [File System Task] Error: An error occurred with the following error message: "The process cannot access the file 'C:\Book1.xlsx' because it is being used by another process.". I do not have the file open, and I assume no one else does, as I am able to copy, open, and overwrite the file. Any suggestions would be appreciated. If you want an example package, please let me know.

    Read the article

  • After writing SQL statements in MySQL, how to measure the speed / performance of them?

    - by Jian Lin
    I saw something in an "execution plan" article: 10 rows fetched in 0.0003s (0.7344s). How come there are two durations shown? And what if I don't have a large data set yet? For example, if I have only 20, 50, or even just 100 records, can I really measure how much faster one SQL statement is than another in a real-life situation? In other words, does there need to be at least hundreds of thousands of records, or even a million, to accurately compare the performance of two different SQL statements?

    Read the article

  • Linux RAID-0 performance doesn't scale up over 1 GB/s

    - by wazoox
    I have trouble getting the maximum throughput out of my setup. The hardware is as follows:

    - dual Quad-Core AMD Opteron(tm) Processor 2376
    - 16 GB DDR2 ECC RAM
    - dual Adaptec 52245 RAID controllers
    - 48 1 TB SATA drives set up as 2 RAID-6 arrays (256KB stripe) + spares

    Software: plain vanilla 2.6.32.25 kernel, compiled for AMD-64, optimized for NUMA; Debian Lenny userland. Benchmarks run: disktest, bonnie++, dd, etc. All give the same results; no discrepancy here. IO scheduler used: noop. Yeah, no trick here.

    Up until now I basically assumed that striping (RAID 0) several physical devices should increase performance roughly linearly. However this is not the case here: each RAID array achieves about 780 MB/s write, sustained, and 1 GB/s read, sustained. Writing to both RAID arrays simultaneously with two different processes gives 750 + 750 MB/s, and reading from both gives 1 + 1 GB/s. However when I stripe both arrays together, using either mdadm or lvm, the performance is about 850 MB/s writing and 1.4 GB/s reading, at least 30% less than expected! Running two parallel writer or reader processes against the striped arrays doesn't improve the figures; in fact it degrades performance even further.

    So what's happening here? Basically I ruled out bus or memory contention, because when I run dd on both drives simultaneously, aggregate write speed actually reaches 1.5 GB/s and reading speed tops 2 GB/s. So it's not the PCIe bus. I suppose it's not the RAM. It's not the filesystem, because I get exactly the same numbers benchmarking against the raw device or using XFS. And I also get exactly the same performance using either LVM striping or md striping. What's wrong? What's preventing a process from going up to the maximum possible throughput? Is Linux striping defective? What other tests could I run?

    Read the article


  • Optimize windows 2008 performance

    - by Giorgi
    Hello, I have Windows Server 2008 SP2 installed as a virtual machine on my personal laptop. I use it only for source control (VisualSVN) and continuous integration (TeamCity). As the virtual machine's resources are limited, I'd like to optimize its performance by disabling services and features that are not necessary for my purposes. Can anyone recommend where to start, or provide tips for getting better performance? Thanks.

    Read the article

  • Building dynamic OLAP data marts on-the-fly

    - by DrJohn
    At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved.

    What is an OLAP data mart?

    In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically created and loaded with the subset of data, and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools.

    Why build OLAP data marts?

    Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel.

    Extension Attributes: The Challenge

    One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge, as SSIS packages tend to break as soon as the database schema changes. There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem with the second approach is that name/value pairs are useless to an OLAP cube, so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of a future blog entry.

    What is involved in building an OLAP data mart?

    There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes). Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order.
    The only permanent database is the metadata database (shown in orange), which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client-specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP data mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart, which is a relational database, and the OLAP Cube, which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together, depending on their reporting requirements.

    So, in broad terms the steps required to fulfil a client order are as follows:

    Step 1: Prepare metadata
    - Create a set of database names unique to the client's order
    - Modify all package connection strings to be used by SSIS to point to the new databases and file locations

    Step 2: Create relational databases
    - Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE, as we do not need the overhead of logging anything
    - Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases

    Step 3: Load staging database
    - Use SSIS to load all data files into the staging database in a parallel operation
    - Load extension files containing name/value pairs; these will provide client-specific attributes in the OLAP cube

    Step 4: Load data mart relational database
    - Load the data from staging into the data mart relational database, again in parallel where possible
    - Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables

    Step 5: Load extension tables & attributes
    - Pivot the extension attributes from their native name/value pairs into proper relational tables
    - Add the extension attributes to the views used by the OLAP cube

    Step 6: Deploy & Process OLAP cube
    - Deploy the OLAP database directly to the server using a C# script task in SSIS
    - Modify the connection string used by the OLAP cube to point to the data mart relational database
    - Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions
    - Remove any standard attributes that are not required
    - Process the OLAP cube (see the sketch after this list)

    Step 7: Backup and drop databases
    - Drop the staging database as it is no longer required
    - Backup the data mart relational and OLAP databases and ship these to the client's infrastructure
    - Drop the data mart relational and OLAP databases from the build server
    - Mark the order complete
    - Start processing the next order, ad infinitum

    So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly, such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.
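    To give a feel for the kind of C# that Step 6 involves (a minimal sketch, assuming the AMO library Microsoft.AnalysisServices and illustrative server and database names), connecting to SSAS and fully processing the freshly deployed cube database might look roughly like this:

        using System;
        using Microsoft.AnalysisServices;  // AMO - Analysis Management Objects

        static void ProcessOlapDataMart(string ssasServer, string olapDatabaseName)
        {
            var server = new Server();
            server.Connect("Data Source=" + ssasServer);
            try
            {
                // Find the OLAP database deployed for this client order
                Database olapDb = server.Databases.FindByName(olapDatabaseName);
                if (olapDb == null)
                    throw new InvalidOperationException("OLAP database not found: " + olapDatabaseName);

                // Fully process dimensions, cubes and partitions in one go
                olapDb.Process(ProcessType.ProcessFull);
            }
            finally
            {
                server.Disconnect();
            }
        }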

    Read the article

  • Randomly poor 2D performance in Linux Mint 11 when using nvidia driver

    - by SDD
    I am using:
    - Linux Mint 11
    - GeForce 560 Ti
    - nVidia driver (installed via helper program, not from the nvidia page)

    The third-party nvidia drivers randomly cause very poor 2D performance. Randomly, because the performance can be great, but after the next reboot or login it becomes very poor. After another reboot or login, this might change again, for better or worse. I have no idea why or how, and I need your help. Thank you.

    Read the article

  • How to measure disk-performance under Windows?

    - by Alphager
    I'm trying to find out why my application is very slow on a certain machine (it runs fine everywhere else). I think I have traced the performance problems to hard-disk reads and writes, and I think it's simply a very slow disk. What tool could I use to measure hard-disk read and write performance under Windows 2003 in a non-destructive way (the partitions on the drives have to remain intact)?

    Read the article

  • The perils of double-dash comments [T-SQL]

    - by jamiet
    I was checking my Twitter feed on my way in to work this morning and was alerted to an interesting blog post by Valentino Vranken that highlights a problem regarding the OLE DB Source in SSIS. In short, using double-dash comments in SQL statements within the OLE DB Source can cause unexpected results. It really is quite an important read if you’re developing SSIS packages, so head over to SSIS OLE DB Source, Parameters And Comments: A Dangerous Mix! and be educated. Note that the problem is solved in SSIS2012, and Valentino explains exactly why. If reading Valentino’s post has switched your brain into “learn mode”, perhaps also check out my post SSIS: SELECT *... or select from a dropdown in an OLE DB Source component? which highlights another issue to be aware of when using the OLE DB Source. As I was reading Valentino’s post I was reminded of a slidedeck by Chris Adkin entitled T-SQL Coding Guidelines, where he recommends never using double-dash comments. That’s good advice! @Jamiet

    Read the article

  • Presenting at Roanoke Code Camp Saturday!

    - by andyleonard
    Introduction

    I am honored to once again be selected to present at Roanoke Code Camp!

    An Introductory Topic

    One of my presentations is titled "I See a Control Flow Tab. Now What?" It's a Level 100 talk for those wishing to learn how to build their very first SSIS package. This highly-interactive, demo-intense presentation is for beginners and developers just getting started with SSIS. Attend and learn how to build SSIS packages from the ground up.

    Designing an SSIS Framework

    I'm also presenting...(read more)

    Read the article
