performance tuning - Page 46

Oracle ZFSSA Hybrid Storage Pool Demo

- by Darius Zanganeh

The ZFS Hybrid Storage Pool (HSP) has been around since the ZFSSA first launched. It is one of the main contributors to the high performance we see on the Oracle ZFSSA both in benchmarks as well as many production environments. Below is a short video I made to show at a high level just how impactful this HSP pool is on storage performance. We squeeze a ton of performance out of our drives with our unique use of cache, write optimized ssd and read optimized ssd. Many have written and blogged about this technology, here it is in action. Demo of the Oracle ZFSSA Hybrid Storage Pool and how it speeds up workloads.

Read the article

"Optimal" game loop for 2D side-scroller

- by MrDatabase

Is it possible to describe an "optimal" (in terms of performance) layout for a 2D side-scroller's game loop? In this context the "game loop" takes user input, updates the states of game objects and draws the game objects. For example having a GameObject base class with a deep inheritance hierarchy could be good for maintenance... you can do something like the following: foreach(GameObject g in gameObjects) g.update(); However I think this approach can create performance issues. On the other hand all game objects' data and functions could be global. Which would be a maintenance headache but might be closer to an optimally performing game loop. Any thoughts? I'm interested in practical applications of near optimal game loop structure... even if I get a maintenance headache in exchange for great performance.

Read the article

Will Check Constraints Improve Database Performance?

[Read Full Article]

Read the article

What criteria would I use SQL Stream Insight vs TPL Dataflow [closed]

- by makerofthings7

There is an add-in to the Task Parallel Library (TPL) called TPL Dataflow that allows a variety of data processing scenarios. It seems that there are some parallels to the SQL Stream Insight product, however since SQL's Stream Insight has some interesting licensing around it, and it has a better performance depending on what license I get... I found myself asking myself should I use TPL Dataflow and not have any licensing issues, and possibly better performance. Can anyone tell me if performance is a valid criteria for comparing SQL Stream Insight vs TPL Dataflow? What other criteria should I be looking at when comparing the two?

Read the article

50 Tips to Boost Performance of an ASP.NET Application - Part 1

Normal 0 ... [Read Full Article]

Read the article

50 Tips to Boost Performance of an ASP.NET Application - Part 2

Normal 0 ... [Read Full Article]

Read the article

Can a call to WaitHandle.SignalAndWait be ignored for performance profiling purposes?

- by Dan Tao

I just downloaded the trial version of ANTS Performance Profiler from Red Gate and am investigating some of my team's code. Immediately I notice that there's a particular section of code that ANTS is reporting as eating up to 99% CPU time. I am completely unfamiliar with ANTS or performance profiling in general (that is, aside from self-profiling using what I'm sure are extremely crude and frowned-upon methods such as double timeToComplete = (endTime - startTime).TotalSeconds), so I'm still fiddling around with the application and figuring out how it's used. But I did call the developer responsible for the code in question and his immediate reaction was "Yeah, that doesn't surprise me that it says that; but that code calls SignalAndWait [which I could see for myself, thanks to ANTS], which doesn't use any CPU, it just sits there waiting for something to do." He advised me to simply ignore that code and look for anything ELSE I could find. My question: is it true that SignalAndWait requires NO CPU overhead (and if so, how is this possible?), and is it reasonable that a performance profiler would view it as taking up 99% CPU time? I find this particularly curious because, if it's at 99%, that would suggest that our application is often idle, wouldn't it? And yet its performance has become rather sluggish lately. Like I said, I really am just a beginner when it comes to this tool, and I don't know anything about the WaitHandle class. So ANY information to help me to understand what's going on here would be appreciated.

Read the article

What is the performance impact of CSS's universal selector?

- by Bungle

I'm trying to find some simple client-side performance tweaks in a page that receives millions of monthly pageviews. One concern that I have is the use of the CSS universal selector (*). As an example, consider a very simple HTML document like the following: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>Example</title> <style type="text/css"> * { margin: 0; padding: 0; } </head> <body> <h1>This is a heading</h1> <p>This is a paragraph of text.</p> </body> </html> The universal selector will apply the above declaration to the body, h1 and p elements, since those are the only ones in the document. In general, would I see better performance from a rule such as: body, h1, p { margin: 0; padding: 0; } Or would this have exactly the same net effect? Essentially, what I'm asking is if these rules are effectively equivalent in this case, or if the universal selector has to perform more unnecessary work that I may not be aware of. I realize that the performance impact in this example may be very small, but I'm hoping to learn something that may lead to more significant performance improvements in real-world situations. Thanks for any help!

Read the article

Is there a IDE/compiler PC benchmark I can use to compare my PCs performance?

- by RickL

I'm looking for a benchmark (and results on other PCs) which would give me an idea of the development performance gain I could get by upgrading my PC, also the benchmark could be used to justify the upgrade to my boss. I use Visual Studio 2008 for my development, so I'd like to get an idea of by what factor the build times would be improved, and also it would be good if the benchmark could incorporate IDE performance (i.e. when editing, using intellisense, opening code files etc) into its result. I currently have an AMD 3800x2, with 2GB RAM on Vista 32. For example, I'd like to know what kind of performance gain I'd see in Visual Studio 2008 with a Q6600, 4GB RAM on Vista 64. And also with other processors, and other RAM sizes... also see whether hard disk performance is a big factor. EDIT: I mentioned Vista 64 because I'm aware that Vista 32 can only use 3GB RAM maximum. So I'd presume that wanting to use more RAM would require Vista 64, but perhaps it could still be slower overall there is a large overhead in using the 32 bit VS 2008 on 64 bit OS.

Read the article

Performance of stored proc when updating columns selectively based on parameters?

- by kprobst

I'm trying to figure out if this is relatively well-performing T-SQL (this is SQL Server 2008). I need to create a stored procedure that updates a table. The proc accepts as many parameters as there are columns in the table, and with the exception of the PK column, they all default to NULL. The body of the procedure looks like this: CREATE PROCEDURE proc_repo_update @object_id bigint ,@object_name varchar(50) = NULL ,@object_type char(2) = NULL ,@object_weight int = NULL ,@owner_id int = NULL -- ...etc AS BEGIN update object_repo set object_name = ISNULL(@object_name, object_name) ,object_type = ISNULL(@object_type, object_type) ,object_weight = ISNULL(@object_weight, object_weight) ,owner_id = ISNULL(@owner_id, owner_id) -- ...etc where object_id = @object_id return @@ROWCOUNT END So basically: Update a column only if its corresponding parameter was provided, and leave the rest alone. This works well enough, but as the ISNULL call will return the value of the column if the received parameter was null, will SQL Server optimize this somehow? This might be a performance bottleneck on the application where the table might be updated heavily (insertion will be uncommon so the performance there is not a problem). So I'm trying to figure out what's the best way to do this. Is there a way to condition the column expressions with something like CASE WHEN or something? The table will be indexed up the wazoo as well for read performance. Is this the best approach? My alternative at this point is to create the UPDATE expression in code (e.g. inline SQL) and execute it against the server. This would solve my doubts about performance, but I'd rather leave this in a stored proc if possible.

Read the article

Recent improvements in Console Performance

- by loren.konkus

Recently, the WebLogic Server development and support organizations have worked with a number of customers to quantify and improve the performance of the Administration Console in large, distributed configurations where there is significant latency in the communications between the administration server and managed servers. These improvements fall into two categories: Constraining the amount of time that the Console stalls waiting for communication Reducing and streamlining the amount of data required for an update A few releases ago, we added support for a configurable domain-wide mbean "Invocation Timeout" value on the Console's configuration: general, advanced section for a domain. The default value for this setting is 0, which means wait indefinitely and was chosen for compatibility with the behavior of previous releases. This configuration setting applies to all mbean communications between the admin server and managed servers, and is the first line of defense against being blocked by a stalled or completely overloaded managed server. Each site should choose an appropriate timeout value for their environment and network latency. In the next release of WebLogic Server, we've added an additional console preference, "Management Operation Timeout", to the Console's shared preference page. This setting further constrains how long certain console pages will wait for slowly responding servers before returning partial results. While not all Console pages support this yet, key pages such as the Servers Configuration and Control table pages and the Deployments Control pages have been updated to support this. For example, if a user requests a Servers Table page and a Management Operation Timeout occurs, the table is displayed with both local configuration and remote runtime information from the responding managed servers and only local configuration information for servers that did not yet respond. This means that a troublesome managed server does not impede your ability to manage your domain using the Console. To support these changes, these Console pages have been re-written to use the Work Management feature of WebLogic Server to interact with each server or deployment concurrently, which further improves the responsiveness of these pages. The basic algorithm for these pages is: For each configuration mbean (ie, Servers) populate rows with configuration attributes from the fast, local mbean server Find a WorkManager For each server, Create a Work instance to obtain runtime mbean attributes for the server Schedule Work instance in the WorkManager Call WorkManager.waitForAll to wait WorkItems to finish, constrained by Management Operation Timeout For each WorkItem, if the runtime information obtained was not complete, add a message indicating which server has incomplete data Display collected data in table In addition to these changes to constrain how long the console waits for communication, a number of other changes have been made to reduce the amount and scope of managed server interactions for key pages. For example, in previous releases the Deployments Control table looked at the status of a deployment on every managed server, even those servers that the deployment was not currently targeted on. (This was done to handle an edge case where a deployment's target configuration was changed while it remained running on previously targeted servers.) We decided supporting that edge case did not warrant the performance impact for all, and instead only look at the status of a deployment on the servers it is targeted to. Comprehensive status continues to be available if a user clicks on the 'status' field for a deployment. Finally, changes have been made to the System Status portlet to reduce its impact on Console page display times. Obtaining health information for this display requires several mbean interactions with managed servers. In previous releases, this mbean interaction occurred with every display, and any delay or impediment in these interactions was reflected in the display time for every page. To reduce this impact, we've made several changes in this portlet: Using Work Management to obtain health concurrently Applying the operation timeout configuration to constrain how long we will wait Caching health information to reduce the cost during rapid navigation from page to page and only obtaining new health information if the previous information is over 30 seconds old. Eliminating heath collection if this portlet is minimized. Together, these Console changes have resulted in significant performance improvements for the customers with large configurations and high latency that we have worked with during their development, and some lesser performance improvements for those with small configurations and very fast networks. These changes will be included in the 11g Rel 1 patch set 2 (10.3.3.0) release of WebLogic Server.

Read the article

How do I find the cause for a huge difference in performance between two identical Ubuntu servers?

- by the.duckman

I am running two Dell R410 servers in the same rack of a data center. Both have the same hardware configuration, run Ubuntu 10.4, have the same packages installed and run the same Java web servers. No other load. One of them is 20-30% faster than the other, very consistently. I used dstat to figure out, if there are more context switches, IO, swapping or anything, but I see no reason for the difference. With the same workload, (no swapping, virtually no IO), the cpu usage and load is higher on one server. So the difference appears to be mainly CPU bound, but while a simple cpu benchmark using sysbench (with all other load turned off) did yield a difference, it was only 6%. So maybe it is not only CPU but also memory performance. I tried to figure out if the BIOS settings differ in some parameter, did a dump using dmidecode, but that yielded no difference. I compared /proc/cpuinfo, no difference. I compared the output of cpufreq-info, no difference. I am lost. What can I do, to figure out, what is going on?

Read the article

mysql - moving to a lower performance server, how small can I go?

- by pedalpete

Read the article

Windows 7 host with Ubuntu Guest and a performance hit, memory locks?

- by Cyrylski

I have a brand new Lenovo T510 with Core i5 and 4GB of RAM with Windows 7 on it. I Installed Ubuntu 10.10 in a Virtualbox. For some reason system gets really slow on this setup which makes me really angry. There's a video card shared with full 3D support enabled and 1GB of RAM allocated for the Ubuntu machine. It may sound stupid, but WHY is the whole memory consumed in an instant when I run Virtualbox? I struggled for like 10 minutes restraining myself from a brutal reset, and now everything runs smooth but memory "in use" in Resource Monitor is 3GB flat with only Chrome running. I'm new to Windows 7, but I'm really disappointed with performance at this point... I used to work in a different environment with much slower hardware and there was no such problem (WinXP over Ubuntu, 1GB out of 2GB allocated for WinXP guest on intel GMA). This is, until I clogged RAM totally there. But I was capable of running Chrome, Firefox and Apache server on a 1GB RAM in Ubuntu there and Photoshop CS4 on Windows XP and it worked. In this case I can't go beyond setting up Ubuntu properly. I bet I'm doing something wrong.

Read the article

Save object states in .data or attr - Performance vs CSS?

- by Neysor

In response to my answer yesterday about rotating an Image, Jamund told me to use .data() instead of .attr() First I thought that he is right, but then I thought about a bigger context... Is it always better to use .data() instead of .attr()? I looked in some other posts like what-is-better-data-or-attr or jquery-data-vs-attrdata The answers were not satisfactory for me... So I moved on and edited the example by adding CSS. I thought it might be useful to make a different Style on each image if it rotates. My style was the following: .rp[data-rotate="0"] { border:10px solid #FF0000; } .rp[data-rotate="90"] { border:10px solid #00FF00; } .rp[data-rotate="180"] { border:10px solid #0000FF; } .rp[data-rotate="270"] { border:10px solid #00FF00; } Because design and coding are often separated, it could be a nice feature to handle this in CSS instead of adding this functionality into JavaScript. Also in my case the data-rotate is like a special state which the image currently has. So in my opinion it make sense to represent it within the DOM. I also thought this could be a case where it is much better to save with .attr() then with .data(). Never mentioned before in one of the posts I read. But then i thought about performance. Which function is faster? I built my own test following: <!DOCTYPE HTML> <html> <head> <title>test</title> <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script> <script type="text/javascript"> function runfirst(dobj,dname){ console.log("runfirst "+dname); console.time(dname+"-attr"); for(i=0;i<10000;i++){ dobj.attr("data-test","a"+i); } console.timeEnd(dname+"-attr"); console.time(dname+"-data"); for(i=0;i<10000;i++){ dobj.data("data-test","a"+i); } console.timeEnd(dname+"-data"); } function runlast(dobj,dname){ console.log("runlast "+dname); console.time(dname+"-data"); for(i=0;i<10000;i++){ dobj.data("data-test","a"+i); } console.timeEnd(dname+"-data"); console.time(dname+"-attr"); for(i=0;i<10000;i++){ dobj.attr("data-test","a"+i); } console.timeEnd(dname+"-attr"); } $().ready(function() { runfirst($("#rp4"),"#rp4"); runfirst($("#rp3"),"#rp3"); runlast($("#rp2"),"#rp2"); runlast($("#rp1"),"#rp1"); }); </script> </head> <body> <div id="rp1">Testdiv 1</div> <div id="rp2" data-test="1">Testdiv 2</div> <div id="rp3">Testdiv 3</div> <div id="rp4" data-test="1">Testdiv 4</div> </body> </html> It should also show if there is a difference with a predefined data-test or not. One result was this: runfirst #rp4 #rp4-attr: 515ms #rp4-data: 268ms runfirst #rp3 #rp3-attr: 505ms #rp3-data: 264ms runlast #rp2 #rp2-data: 260ms #rp2-attr: 521ms runlast #rp1 #rp1-data: 284ms #rp1-attr: 525ms So the .attr() function did always need more time than the .data() function. This is an argument for .data() I thought. Because performance is always an argument! Then I wanted to post my results here with some questions, and in the act of writing I compared with the questions Stack Overflow showed me (similar titles) And true enough, there was one interesting post about performance I read it and run their example. And now I am confused! This test showed that .data() is slower then .attr() !?!! Why is that so? First I thought it is because of a different jQuery library so I edited it and saved the new one. But the result wasn't changing... So now my questions to you: Why are there some differences in the performance in these two examples? Would you prefer to use data- HTML5 attributes instead of data, if it represents a state? Although it wouldn't be needed at the time of coding? Why - Why not? Now depending on the performance: Would performance be an argument for you using .attr() instead of data, if it shows that .attr() is better? Although data is meant to be used for .data()? UPDATE 1: I did see that without overhead .data() is much faster. Misinterpreted the data :) But I'm more interested in my second question. :) Would you prefer to use data- HTML5 attributes instead of data, if it represents a state? Although it wouldn't be needed at the time of coding? Why - Why not? Are there some other reasons you can think of, to use .attr() and not .data()? e.g. interoperability? because .data() is jquery style and HTML Attributes can be read by all... UPDATE 2: As we see from T.J Crowder's speed test in his answer attr is much faster then data! which is again confusing me :) But please! Performance is an argument, but not the highest! So give answers to my other questions please too!

Read the article

General monitoring for SQL Server Analysis Services using Performance Monitor

- by Testas

A recent customer engagement required a setup of a monitoring solution for SSAS, due to the time restrictions placed upon this, native Windows Performance Monitor (Perfmon) and SQL Server Profiler Monitoring Tools was used as using a third party tool would have meant the customer providing an additional monitoring server that was not available.I wanted to outline the performance monitoring counters that was used to monitor the system on which SSAS was running. Due to the slow query performance that was occurring during certain scenarios, perfmon was used to establish if any pressure was being placed on the Disk, CPU or Memory subsystem when concurrent connections access the same query, and Profiler to pinpoint how the query was being managed within SSAS, profiler I will leave for another blogThis guide is not designed to provide a definitive list of what should be used when monitoring SSAS, different situations may require the addition or removal of counters as presented by the situation. However I hope that it serves as a good basis for starting your monitoring of SSAS. I would also like to acknowledge Chris Webb’s awesome chapters from “Expert Cube Development” that also helped shape my monitoring strategy:http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!6657.entrySimulating ConnectionsTo simulate the additional connections to the SSAS server whilst monitoring, I used ascmd to simulate multiple connections to the typical and worse performing queries that were identified by the customer. A similar sript can be downloaded from codeplex at http://www.codeplex.com/SQLSrvAnalysisSrvcs. File name: ASCMD_StressTestingScripts.zip. Performance MonitorWithin performance monitor, a counter log was created that contained the list of counters below. The important point to note when running the counter log is that the RUN AS property within the counter log properties should be changed to an account that has rights to the SSAS instance when monitoring MSAS counters. Failure to do so means that the counter log runs under the system account, no errors or warning are given while running the counter log, and it is not until you need to view the MSAS counters that they will not be displayed if run under the default account that has no right to SSAS. If your connection simulation takes hours, this could prove quite frustrating if not done beforehand JThe counters used…… Object Counter Instance Justification System Processor Queue legnth N/A Indicates how many threads are waiting for execution against the processor. If this counter is consistently higher than around 5 when processor utilization approaches 100%, then this is a good indication that there is more work (active threads) available (ready for execution) than the machine's processors are able to handle. System Context Switches/sec N/A Measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if the Processor\Interrupts/sec\(_Total) counter on your machine shows a similar unexplained increase Process % Processor Time sqlservr Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process % Processor Time msmdsrv Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process Working Set sqlservr If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Process Working Set msmdsrv If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Processor % Processor Time _Total and individual cores measures the total utilization of your processor by all running processes. If multi-proc then be mindful only an average is provided Processor % Privileged Time _Total To see how the OS is handling basic IO requests. If kernel mode utilization is high, your machine is likely underpowered as it's too busy handling basic OS housekeeping functions to be able to effectively run other applications. Processor % User Time _Total To see how the applications is interacting from a processor perspective, a high percentage utilisation determine that the server is dealing with too many apps and may require increasing thje hardware or scaling out Processor Interrupts/sec _Total The average rate, in incidents per second, at which the processor received and serviced hardware interrupts. Shoulr be consistant over time but a sudden unexplained increase could indicate a device malfunction which can be confirmed using the System\Context Switches/sec counter Memory Pages/sec N/A Indicates the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays, this is the primary counter to watch for indication of possible insufficient RAM to meet your server's needs. A good idea here is to configure a perfmon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system. May also want to see the configuration of the page file on the Server Memory Available Mbytes N/A is the amount of physical memory, in bytes, available to processes running on the computer. if this counter is greater than 10% of the actual RAM in your machine then you probably have more than enough RAM. monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM. Physical Disk Disk Transfers/sec for each physical disk If it goes above 10 disk I/Os per second then you've got poor response time for your disk. Physical Disk Idle Time _total If Disk Transfers/sec is above 25 disk I/Os per second use this counter. which measures the percent time that your hard disk is idle during the measurement interval, and if you see this counter fall below 20% then you've likely got read/write requests queuing up for your disk which is unable to service these requests in a timely fashion. Physical Disk Disk queue legnth For the OLAP and SQL physical disk A value that is consistently less than 2 means that the disk system is handling the IO requests against the physical disk Network Interface Bytes Total/sec For the NIC Should be monitored over a period of time to see if there is anb increase/decrease in network utilisation Network Interface Current Bandwidth For the NIC is an estimate of the current bandwidth of the network interface in bits per second (BPS). MSAS 2005: Memory Memory Limit High KB N/A Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Limit Low KB N/A Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Usage KB N/A Displays the memory usage of the server process. MSAS 2005: Memory File Store KB N/A Displays the amount of memory that is reserved for the Cache. Note if total memory limit in the msmdsrv.ini is set to 0, no memory is reserved for the cache MSAS 2005: Storage Engine Query Queries from Cache Direct / sec N/A Displays the rate of queries answered from the cache directly MSAS 2005: Storage Engine Query Queries from Cache Filtered / Sec N/A Displays the Rate of queries answered by filtering existing cache entry. MSAS 2005: Storage Engine Query Queries from File / Sec N/A Displays the Rate of queries answered from files. MSAS 2005: Storage Engine Query Average time /query N/A Displays the average time of a query MSAS 2005: Connection Current connections N/A Displays the number of connections against the SSAS instance MSAS 2005: Connection Requests / sec N/A Displays the rate of query requests per second MSAS 2005: Locks Current Lock Waits N/A Displays thhe number of connections waiting on a lock MSAS 2005: Threads Query Pool job queue Length N/A The number of queries in the job queue MSAS 2005:Proc Aggregations Temp file bytes written/sec N/A Shows the number of bytes of data processed in a temporary file MSAS 2005:Proc Aggregations Temp file rows written/sec N/A Shows the number of bytes of data processed in a temporary file

Read the article

SQL Server Prefetch and Query Performance

Prefetching can make a surprising difference to SQL Server query execution times where there is a high incidence of waiting for disk i/o operations, but the benefits come at a cost. Mostly, the Query Optimizer gets it right, but occasionally there are queries that would benefit from tuning. Get smart with SQL Backup ProGet faster, smaller backups with integrated verification.Quickly and easily DBCC CHECKDB your backups. Learn more.

Read the article

SQL SERVER – Introduction to Wait Stats and Wait Types – Wait Type – Day 1 of 28

- by pinaldave

I have been working a lot on Wait Stats and Wait Types recently. Last Year, I requested blog readers to send me their respective server’s wait stats. I appreciate their kind response as I have received Wait stats from my readers. I took each of the results and carefully analyzed them. I provided necessary feedback to the person who sent me his wait stats and wait types. Based on the feedbacks I got, many of the readers have tuned their server. After a while I got further feedbacks on my recommendations and again, I collected wait stats. I recorded the wait stats and my recommendations and did further research. At some point at time, there were more than 10 different round trips of the recommendations and suggestions. Finally, after six month of working my hands on performance tuning, I have collected some real world wisdom because of this. Now I plan to share my findings with all of you over here. Before anything else, please note that all of these are based on my personal observations and opinions. They may or may not match the theory available at other places. Some of the suggestions may not match your situation. Remember, every server is different and consequently, there is more than one solution to a particular problem. However, this series is written with kept wait stats in mind. While I was working on various performance tuning consultations, I did many more things than just tuning wait stats. Today we will discuss how to capture the wait stats. I use the script diagnostic script created by my friend and SQL Server Expert Glenn Berry to collect wait stats. Here is the script to collect the wait stats: -- Isolate top waits for server instance since last restart or statistics clear WITH Waits AS (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s, 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK' ,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE' ,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT' ,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT' ,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.pct AS DECIMAL(12, 2)) AS pct, CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 99 OPTION (RECOMPILE); -- percentage threshold GO This script uses Dynamic Management View sys.dm_os_wait_stats to collect the wait stats. It omits the system-related wait stats which are not useful to diagnose performance-related bottleneck. Additionally, not OPTION (RECOMPILE) at the end of the DMV will ensure that every time the query runs, it retrieves new data and not the cached data. This dynamic management view collects all the information since the time when the SQL Server services have been restarted. You can also manually clear the wait stats using the following command: DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR); Once the wait stats are collected, we can start analysis them and try to see what is causing any particular wait stats to achieve higher percentages than the others. Many waits stats are related to one another. When the CPU pressure is high, all the CPU-related wait stats show up on top. But when that is fixed, all the wait stats related to the CPU start showing reasonable percentages. It is difficult to have a sure solution, but there are good indications and good suggestions on how to solve this. I will keep this blog post updated as I will post more details about wait stats and how I reduce them. The reference to Book On Line is over here. Of course, I have selected February to run this Wait Stats series. I am already cheating by having the smallest month to run this series. :) Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: DMV, Pinal Dave, PostADay, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

Read the article

Speeding up procedural texture generation

- by FalconNL

Recently I've begun working on a game that takes place in a procedurally generated solar system. After a bit of a learning curve (having neither worked with Scala, OpenGL 2 ES or Libgdx before), I have a basic tech demo going where you spin around a single procedurally textured planet: The problem I'm running into is the performance of the texture generation. A quick overview of what I'm doing: a planet is a cube that has been deformed to a sphere. To each side, a n x n (e.g. 256 x 256) texture is applied, which are bundled in one 8n x n texture that is sent to the fragment shader. The last two spaces are not used, they're only there to make sure the width is a power of 2. The texture is currently generated on the CPU, using the updated 2012 version of the simplex noise algorithm linked to in the paper 'Simplex noise demystified'. The scene I'm using to test the algorithm contains two spheres: the planet and the background. Both use a greyscale texture consisting of six octaves of 3D simplex noise, so for example if we choose 128x128 as the texture size there are 128 x 128 x 6 x 2 x 6 = about 1.2 million calls to the noise function. The closest you will get to the planet is about what's shown in the screenshot and since the game's target resolution is 1280x720 that means I'd prefer to use 512x512 textures. Combine that with the fact the actual textures will of course be more complicated than basic noise (There will be a day and night texture, blended in the fragment shader based on sunlight, and a specular mask. I need noise for continents, terrain color variation, clouds, city lights, etc.) and we're looking at something like 512 x 512 x 6 x 3 x 15 = 70 million noise calls for the planet alone. In the final game, there will be activities when traveling between planets, so a wait of 5 or 10 seconds, possibly 20, would be acceptable since I can calculate the texture in the background while traveling, though obviously the faster the better. Getting back to our test scene, performance on my PC isn't too terrible, though still too slow considering the final result is going to be about 60 times worse: 128x128 : 0.1s 256x256 : 0.4s 512x512 : 1.7s This is after I moved all performance-critical code to Java, since trying to do so in Scala was a lot worse. Running this on my phone (a Samsung Galaxy S3), however, produces a more problematic result: 128x128 : 2s 256x256 : 7s 512x512 : 29s Already far too long, and that's not even factoring in the fact that it'll be minutes instead of seconds in the final version. Clearly something needs to be done. Personally, I see a few potential avenues, though I'm not particularly keen on any of them yet: Don't precalculate the textures, but let the fragment shader calculate everything. Probably not feasible, because at one point I had the background as a fullscreen quad with a pixel shader and I got about 1 fps on my phone. Use the GPU to render the texture once, store it and use the stored texture from then on. Upside: might be faster than doing it on the CPU since the GPU is supposed to be faster at floating point calculations. Downside: effects that cannot (easily) be expressed as functions of simplex noise (e.g. gas planet vortices, moon craters, etc.) are a lot more difficult to code in GLSL than in Scala/Java. Calculate a large amount of noise textures and ship them with the application. I'd like to avoid this if at all possible. Lower the resolution. Buys me a 4x performance gain, which isn't really enough plus I lose a lot of quality. Find a faster noise algorithm. If anyone has one I'm all ears, but simplex is already supposed to be faster than perlin. Adopt a pixel art style, allowing for lower resolution textures and fewer noise octaves. While I originally envisioned the game in this style, I've come to prefer the realistic approach. I'm doing something wrong and the performance should already be one or two orders of magnitude better. If this is the case, please let me know. If anyone has any suggestions, tips, workarounds, or other comments regarding this problem I'd love to hear them.

Read the article

Intel Extreme Tuning utility options are greyed

- by Abhishek Sha

I'm having a ASUS K55VM with Intel Core i7 3610QM (IvyBridge) with a NVIDIA GT630M. I'm trying to operate the Intel XTU, but as you can see in the screenshot, all the options are greyed out. Can you please help with this situation. Another are is the CPU Throttling (Intel SpeedStep) which is always shown as 0%. But in the Intel Turbo Monitor, the Speed keeps dynamically changing. Then why is the CPU Throttling always at 0%?:

Read the article

Why do I see a large performance hit with DRBD?

- by BHS

I see a much larger performance hit with DRBD than their user manual says I should get. I'm using DRBD 8.3.7 (Fedora 13 RPMs). I've setup a DRBD test and measured throughput of disk and network without DRBD: dd if=/dev/zero of=/data.tmp bs=512M count=1 oflag=direct 536870912 bytes (537 MB) copied, 4.62985 s, 116 MB/s / is a logical volume on the disk I'm testing with, mounted without DRBD iperf: [ 4] 0.0-10.0 sec 1.10 GBytes 941 Mbits/sec According to Throughput overhead expectations, the bottleneck would be whichever is slower, the network or the disk and DRBD should have an overhead of 3%. In my case network and I/O seem to be pretty evenly matched. It sounds like I should be able to get around 100 MB/s. So, with the raw drbd device, I get dd if=/dev/zero of=/dev/drbd2 bs=512M count=1 oflag=direct 536870912 bytes (537 MB) copied, 6.61362 s, 81.2 MB/s which is slower than I would expect. Then, once I format the device with ext4, I get dd if=/dev/zero of=/mnt/data.tmp bs=512M count=1 oflag=direct 536870912 bytes (537 MB) copied, 9.60918 s, 55.9 MB/s This doesn't seem right. There must be some other factor playing into this that I'm not aware of. global_common.conf global { usage-count yes; } common { protocol C; } syncer { al-extents 1801; rate 33M; } data_mirror.res resource data_mirror { device /dev/drbd1; disk /dev/sdb1; meta-disk internal; on cluster1 { address 192.168.33.10:7789; } on cluster2 { address 192.168.33.12:7789; } } For the hardware I have two identical machines: 6 GB RAM Quad core AMD Phenom 3.2Ghz Motherboard SATA controller 7200 RPM 64MB cache 1TB WD drive The network is 1Gb connected via a switch. I know that a direct connection is recommended, but could it make this much of a difference? Edited I just tried monitoring the bandwidth used to try to see what's happening. I used ibmonitor and measured average bandwidth while I ran the dd test 10 times. I got: avg ~450Mbits writing to ext4 avg ~800Mbits writing to raw device It looks like with ext4, drbd is using about half the bandwidth it uses with the raw device so there's a bottleneck that is not the network.

Read the article

Tuning Linux IP routing parameters -- secret_interval and tcp_mem

- by Jeff Atwood

We had a little failover problem with one of our HAProxy VMs today. When we dug into it, we found this: Jan 26 07:41:45 haproxy2 kernel: [226818.070059] __ratelimit: 10 callbacks suppressed Jan 26 07:41:45 haproxy2 kernel: [226818.070064] Out of socket memory Jan 26 07:41:47 haproxy2 kernel: [226819.560048] Out of socket memory Jan 26 07:41:49 haproxy2 kernel: [226822.030044] Out of socket memory Which, per this link, apparently has to do with low default settings for net.ipv4.tcp_mem. So we increased them by 4x from their defaults (this is Ubuntu Server, not sure if the Linux flavor matters): current values are: 45984 61312 91968 new values are: 183936 245248 367872 After that, we started seeing a bizarre error message: Jan 26 08:18:49 haproxy1 kernel: [ 2291.579726] Route hash chain too long! Jan 26 08:18:49 haproxy1 kernel: [ 2291.579732] Adjust your secret_interval! Shh.. it's a secret!! This apparently has to do with /proc/sys/net/ipv4/route/secret_interval which defaults to 600 and controls periodic flushing of the route cache The secret_interval instructs the kernel how often to blow away ALL route hash entries regardless of how new/old they are. In our environment this is generally bad. The CPU will be busy rebuilding thousands of entries per second every time the cache is cleared. However we set this to run once a day to keep memory leaks at bay (though we've never had one). While we are happy to reduce this, it seems odd to recommend dropping the entire route cache at regular intervals, rather than simply pushing old values out of the route cache faster. After some investigation, we found /proc/sys/net/ipv4/route/gc_elasticity which seems to be a better option for keeping the route table size in check: gc_elasticity can best be described as the average bucket depth the kernel will accept before it starts expiring route hash entries. This will help maintain the upper limit of active routes. We adjusted elasticity from 8 to 4, in the hopes of the route cache pruning itself more aggressively. The secret_interval does not feel correct to us. But there are a bunch of settings and it's unclear which are really the right way to go here. /proc/sys/net/ipv4/route/gc_elasticity (8) /proc/sys/net/ipv4/route/gc_interval (60) /proc/sys/net/ipv4/route/gc_min_interval (0) /proc/sys/net/ipv4/route/gc_timeout (300) /proc/sys/net/ipv4/route/secret_interval (600) /proc/sys/net/ipv4/route/gc_thresh (?) rhash_entries (kernel parameter, default unknown?) We don't want to make the Linux routing worse, so we're kind of afraid to mess with some of these settings. Can anyone advise which routing parameters are best to tune, for a high traffic HAProxy instance?

Read the article

Solaris TCP stack tuning

- by disserman

We have a large web project (about 2-3k requests per second), using haproxy (http://haproxy.1wt.eu/) as a frontend and load balancer between the java application servers. The frontend (haproxy) is running on Linux but we are going to migrate it to the Solaris 10 as all our other servers are running under Solaris. After switching a traffic I see the two things: a) the web site became loading slower (5-10 seconds with images in comparison to 2-3 seconds on Linux) b) sometimes haproxy fails to perform a "lifecheck" (get a special web page and analyze http response code) due to the socket timeout. After switching traffic back to Linux everything is okay. I've tried to tune all params I found in /dev/tcp but no progress. I believe the problem is in some open socket limitations. If someone can point me to the answer, I would be greatly appreciated. p.s. haproxy is running under Xen DomU on Linux (Kernel 2.6.18, Debian 5), under zone on Solaris (10 u8). the only thing we did on Linux is increasing of ip_conntrack_max (I believe Solaris option tcp_conn_req_max_q is the equivalent).

Read the article

Tuning Windows 7 for use in a VM

- by intuited

I'm running Windows 7 in a VirtualBox Virtual Machine, and would like to make it run in a more streamlined fashion. I'll be using the install primarily for testing web apps, and have no need for it to run quickly. I would like it to run with minimal memory requirements, and with minimal changes to its virtual hard drive's contents. Changes to the hard drive contents, for example the paging file, result in larger snapshot sizes. Another recent post of mine seems to be related to this issue, but does not directly address issues with Windows. One concern that I have is that Windows seems to be using 17% of its paging file even with over 900MB of memory marked "Standby" or "Free". My uneducated guess is that this is being used to store indexes or some other data that helps to speed up the system but is not really necessary. I'm also wondering if it's normal for Windows to use over 500 MB of "In Use" memory with no apps running. Will this amount decrease if I reduce the amount of "installed" memory in the VM? What steps can I take to reduce the system's memory footprint without incurring an increase in paging file usage?

Read the article

postgres memory allocation tuning 2

- by pstanton

i've got a Ubuntu Linux system with 12Gb memory most of which (at least 10Gb) can be allocated solely to postgres. the system also has a 6 disk 15k SCSI RAID 10 setup. The process i'm trying to optimise is twofold. firstly a single threaded, single connection will do many inserts into 2-4 tables linked by foreign key. secondly many different complex queries are run against the resulting data, using group by extensively. this part especially needs to be optimised. i have four of these processes running at once in order to make use of the quad core CPU, therefore there will generally be no more than 5 concurrent connections (1 spare for admin tasks). what configuration changes to the default Postgres config would you recommend? I'm looking for the optimum values for things like work_mem, shared_buffers etc. relevant doco thanks!

Search Results

Search found 13249 results on 530 pages for 'performance tuning'.

Page 46/530 | < Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >

- by Darius Zanganeh

- by MrDatabase

- by makerofthings7

- by Dan Tao

- by Bungle

- by RickL

- by kprobst

- by loren.konkus

- by the.duckman

- by pedalpete

- by Cyrylski

- by Neysor

- by Testas

- by pinaldave

- by FalconNL

- by Abhishek Sha

- by BHS

- by Jeff Atwood

- by disserman

- by intuited

- by pstanton

< Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >