Search Results

Search found 13608 results on 545 pages for 'performance dashboard'.

Page 18/545 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

Baseline / Benchmark Physical and virtual server performance

- by EyeonTech

I am setting up a new server and there are some options. I want to perform some benchmarks and I need your help in determining the best tools and if possible run pre-configured benchmarks designed for SQL servers on Windows Server 2008/2012. Step 1. Run a performance monitor on the current Live SQL server (Windows Server 2008 Virtual machine running on ESXi. New server Hardware rundown: Intel® Server System R1304BTLSHBN - 1U Rack, LGA1155 http://ark.intel.com/products/53559/Intel-Server-System-R1304BTLSHBN Intel Xeon E3-1270V2 2x Intel SSD 330 Series 240GB 2.5in SATA 6Gb/s 25nm 1x WD 2TB WD2002FAEX 2TB 64M SATA3 CAVIAR BLACK 4x 8GB 1333MHz DDR3 ECC CL9 DIMM There are several options for configurations and I want to benchmark some of them and share the results. Option 1. Configure 2x SSDs at RAID 0. Install Windows Server 2008 directly to the 2TB WD Caviar HDD. Store Database files on the RAID 0 Volume. Benchmark the OS direct on the hardware as an SQL Server. Store SQL Backup databases on the 2TB WD Caviar HDD. Option 2. Configure 2x SSDs at RAID 0. Install Windows Server 2012 directly to the 2TB WD Caviar HDD. Install Hyper-V. Install the SQL Server (Server 2008) as a virtual machine. Store the Virtual Hard Disks on the SSDs. Option 3. Configure 2x SSDs at RAID 0. Install VMWare ESXi on a partition of the 2TB WD Caviar HDD. Install the SQL Server (Server 2008) as a virtual machine. Store the Virtual Hard Disks on the SSDs. I have a few tools in mind from http://technet.microsoft.com/en-us/library/cc768530(v=bts.10).aspx. Any tools with pre-configured test would be fantastic. Specifically if there are pre-configured perfmon sets avaliable. Any opinions on the setup to gain the best results is welcome. Thanks in advance.

Read the article
MySQL performance over a (local) network much slower than I would expect

- by user15241

MySQL queries in my production environment are taking much longer than I would expect them too. The site in question is a fairly large Drupal site, with many modules installed. The webserver (Nginx) and database server (mysql) are hosted on separated machines, connected by a 100mbps LAN connection (hosted by Rackspace). I have the exact same site running on my laptop for development. Obviously, on my laptop, the webserver and database server are on the same box. Here are the results of my database query times: Production: Executed 291 queries in 320.33 milliseconds. (homepage) Executed 517 queries in 999.81 milliseconds. (content page) Development: Executed 316 queries in 46.28 milliseconds. (homepage) Executed 586 queries in 79.09 milliseconds. (content page) As can clearly be seen from these results, the time involved with querying the MySQL database is much shorter on my laptop, where the MySQL server is running on the same database as the web server. Why is this?! One factor must be the network latency. On average, a round trip from from the webserver to the database server takes 0.16ms (shown by ping). That must be added to every singe MySQL query. So, taking the content page example above, where there are 517 queries executed. Network latency alone will add 82ms to the total query time. However, that doesn't account for the difference I am seeing (79ms on my laptop vs 999ms on the production boxes). What other factors should I be looking at? I had thought about upgrading the NIC to a gigabit connection, but clearly there is something else involved. I have run the MySQL performance tuning script from http://www.day32.com/MySQL/ and it tells me that my database server is configured well (better than my laptop apparently). The only problem reported is "Of 4394 temp tables, 48% were created on disk". This is true in both environments and in the production environment I have even tried increasing max_heap_table_size and Current tmp_table_size to 1GB, with no change (I think this is because I have some BLOB and TEXT columns).

Read the article
Performance Tuning a High-Load Apache Server

- by futureal

I am looking to understand some server performance problems I am seeing with a (for us) heavily loaded web server. The environment is as follows: Debian Lenny (all stable packages + patched to security updates) Apache 2.2.9 PHP 5.2.6 Amazon EC2 large instance The behavior we're seeing is that the web typically feels responsive, but with a slight delay to begin handling a request -- sometimes a fraction of a second, sometimes 2-3 seconds in our peak usage times. The actual load on the server is being reported as very high -- often 10.xx or 20.xx as reported by top. Further, running other things on the server during these times (even vi) is very slow, so the load is definitely up there. Oddly enough Apache remains very responsive, other than that initial delay. We have Apache configured as follows, using prefork: StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 150 MaxRequestsPerChild 0 And KeepAlive as: KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 5 Looking at the server-status page, even at these times of heavy load we are rarely hitting the client cap, usually serving between 80-100 requests and many of those in the keepalive state. That tells me to rule out the initial request slowness as "waiting for a handler" but I may be wrong. Amazon's CloudWatch monitoring tells me that even when our OS is reporting a load of 15, our instance CPU utilization is between 75-80%. Example output from top: top - 15:47:06 up 31 days, 1:38, 8 users, load average: 11.46, 7.10, 6.56 Tasks: 221 total, 28 running, 193 sleeping, 0 stopped, 0 zombie Cpu(s): 66.9%us, 22.1%sy, 0.0%ni, 2.6%id, 3.1%wa, 0.0%hi, 0.7%si, 4.5%st Mem: 7871900k total, 7850624k used, 21276k free, 68728k buffers Swap: 0k total, 0k used, 0k free, 3750664k cached The majority of the processes look like: 24720 www-data 15 0 202m 26m 4412 S 9 0.3 0:02.97 apache2 24530 www-data 15 0 212m 35m 4544 S 7 0.5 0:03.05 apache2 24846 www-data 15 0 209m 33m 4420 S 7 0.4 0:01.03 apache2 24083 www-data 15 0 211m 35m 4484 S 7 0.5 0:07.14 apache2 24615 www-data 15 0 212m 35m 4404 S 7 0.5 0:02.89 apache2 Example output from vmstat at the same time as the above: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 0 0 215084 68908 3774864 0 0 154 228 5 7 32 12 42 9 6 21 0 198948 68936 3775740 0 0 676 2363 4022 1047 56 16 9 15 23 0 0 169460 68936 3776356 0 0 432 1372 3762 835 76 21 0 0 23 1 0 140412 68936 3776648 0 0 280 0 3157 827 70 25 0 0 20 1 0 115892 68936 3776792 0 0 188 8 2802 532 68 24 0 0 6 1 0 133368 68936 3777780 0 0 752 71 3501 878 67 29 0 1 0 1 0 146656 68944 3778064 0 0 308 2052 3312 850 38 17 19 24 2 0 0 202104 68952 3778140 0 0 28 90 2617 700 44 13 33 5 9 0 0 188960 68956 3778200 0 0 8 0 2226 475 59 17 6 2 3 0 0 166364 68956 3778252 0 0 0 21 2288 386 65 19 1 0 And finally, output from Apache's server-status: Server uptime: 31 days 2 hours 18 minutes 31 seconds Total accesses: 60102946 - Total Traffic: 974.5 GB CPU Usage: u209.62 s75.19 cu0 cs0 - .0106% CPU load 22.4 requests/sec - 380.3 kB/second - 17.0 kB/request 107 requests currently being processed, 6 idle workers C.KKKW..KWWKKWKW.KKKCKK..KKK.KKKK.KK._WK.K.K.KKKKK.K.R.KK..C.C.K K.C.K..WK_K..KKW_CK.WK..W.KKKWKCKCKW.W_KKKKK.KKWKKKW._KKK.CKK... KK_KWKKKWKCKCWKK.KKKCK.......................................... ................................................................ From my limited experience I draw the following conclusions/questions: We may be allowing far too many KeepAlive requests I do see some time spent waiting for IO in the vmstat although not consistently and not a lot (I think?) so I am not sure this is a big concern or not, I am less experienced with vmstat Also in vmstat, I see in some iterations a number of processes waiting to be served, which is what I am attributing the initial page load delay on our web server to, possibly erroneously We serve a mixture of static content (75% or higher) and script content, and the script content is often fairly processor intensive, so finding the right balance between the two is important; long term we want to move statics elsewhere to optimize both servers but our software is not ready for that today I am happy to provide additional information if anybody has any ideas, the other note is that this is a high-availability production installation so I am wary of making tweak after tweak, and is why I haven't played with things like the KeepAlive value myself yet.

Read the article
What is recommended minimum object size for gzip performance benefits?

- by utt73

I'm working on improving page speed display times, and one of the methods is to gzip content from the webserver. Google recommends: Note that gzipping is only beneficial for larger resources. Due to the overhead and latency of compression and decompression, you should only gzip files above a certain size threshold; we recommend a minimum range between 150 and 1000 bytes. Gzipping files below 150 bytes can actually make them larger. We serve our content through Akamai, using their network for a proxy and CDN. What they've told me: Following up on your question regarding what is the minimum size Akamai will compress the requested object when sending it to the end user: The minimum size is 860 bytes. My reply: What is the reason(s) for why Akamai's minimum size is 860 bytes? And why, for example, is this not the case for files Akamai serves for facebook? (see below) Google recommends to gzip more agressively. And that seems appropriate on our site where the most frequent hits, by far, are AJAX calls that are <860 bytes. Akamai's response: The reasons 860 bytes is the minimum size for compression is twofold: (1) The overhead of compressing an object under 860 bytes outweighs performance gain. (2) Objects under 860 bytes can be transmitted via a single packet anyway, so there isn't a compelling reason to compress them. So I'm here for some fact checking. Is the 860 byte limit due to packet size the end of this reasoning? Why would high traffic sites push this down to the 150 byte limit... just to save on bandwidth costs (since CDNs base their charges on bandwith offloaded from origin), or is there a performance gain in doing so?

Read the article
How do I measure performance of a virtual server?

- by Sergey

I've got a VPS running Ubuntu. Being a virtual server, I understand that it shares resources with unknown number of other servers, and I'm noticing that it's considerably slower than my desktop machine. Is there some tool to measure the performance of the virtual machine? I'd be curious to see some approximate measure similar to bogomips, possibly for CPU (operations/sec), memory and disk read/write speed. I'd like to be able to compare those numbers to my desktop machine. I'm not interested in the specs of the actual physical machine my VPS is running on - by doing cat /proc/cpuinfo I can see that it's a nice quad-core Xeon machine, but it doesn't matter to me. I'm basically interested in how fast a program would run in my VPS - how many CPU operations it can make in a second, how many bytes to write to RAM or to disk. I only have ssh access to the machine so the tool need to be command-line. I could write a script which, say, does some calculations in a loop for a second and counts how many loops it was able to do, or something similar to measure disk and RAM performance. But I'm sure something like this already exists.

Read the article
Performing client-side OAuth authorized Twitter API calls versus server side, how much of a difference is there in terms of performance?

- by Terence Ponce

I'm working on a Twitter application in Ruby on Rails. One of the biggest arguments that I have with other people on the project is the method of calling the Twitter API. Before, everything was done on the server: OAuth login, updating the user's Twitter data, and retrieving tweets. Retrieving tweets was the heaviest thing to do since we don't store the tweets in our database, so viewing the tweets means that we have to call the API every time. One of the people in the project suggested that we call the tweets through Javascript instead to lessen the load on the server. We used GET search, which, correct me if I'm wrong, will be removed when v1.0 becomes completely deprecated, but that really isn't a concern now. When the Twitter API has migrated completely to v1.1 (again, correct me if I'm wrong), every calls to the API must be authenticated, so we have to authenticate our Javascript requests to the API. As said here: We don't support or recommend performing OAuth directly through Javascript -- it's insecure and puts your application at risk. The only acceptable way to perform it is if you kept all keys and secrets server-side, computed the OAuth signatures and parameters server side, then issued the request client-side from the server-generated OAuth values. If we do exactly what Twitter suggests, the only difference between this and doing everything server-side is that our server won't have to contact the Twitter API anymore every time the user wants to view tweets. Here's how I would picture what's happening every time the user makes a request: If we do it through Javascript, it would be harder on my part because I would have to create the signatures manually for every request, but I will gladly do it if the boost in performance is worth all the trouble. Doing it through Ruby on Rails would be very easy since the Twitter gem does most of the grunt work already, so I'm really encouraging the other people in the project to agree with me. Is the difference in performance trivial or is it significant enough to switch to Javascript?

Read the article
How to squeeze the maximum performance out of Unity and GNOME 3?

- by melvincv

I see that I do not get good performance with the new Unity desktop, but I should say that Unity has improved a lot since the last edition Ubuntu 11.10. How to squeeze the maximum performance out of 1. Unity 2. GNOME 3 My system specs: -Processors- Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz -Memory- Total Memory : 2049996 kB -PCI Devices- Host bridge : Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 10) PCI bridge : Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 10) (prog-if 00 [Normal decode]) VGA compatible controller : Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10) (prog-if 00 [VGA controller]) USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI]) USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI]) USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI]) USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI]) USB controller : Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI]) PCI bridge : Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode]) ISA bridge : Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) IDE interface : Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP]) IDE interface : Intel Corporation N10/ICH7 Family SATA Controller [IDE mode] (rev 01) (prog-if 8f [Master SecP SecO PriP PriO]) SMBus : Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01) Ethernet controller : Intel Corporation PRO/100 VE Network Connection (rev 01)

Read the article
Premature-Optimization and Performance Anxiety

- by James Michael Hare

While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about. After analyzing the differences of time between a Dictionary with locking versus the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations. Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks. In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer is coding very early for a perceived (whether rightly-or-wrongly) performance gain and sacrificing good design and maintainability in the process. At best, the performance gains are usually negligible and at worst, can either negatively impact performance, or can degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above. I'm not talking about valid performance decisions. There are decisions one should make when designing and writing an application that are valid performance decisions. Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performance algorithms (linear search vs. binary search). But these in my mind are macro optimizations. The error is not in deciding to use a better data structure or algorithm, the anti-pattern as stated above is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking. This would have been a mistake as I would be trading maintainability (ConcurrentDictionary requires no locking which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it. I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer. I began college in in the 90s when C and C++ was king and hardware speed and memory were still relatively priceless commodities and not to be squandered. In those days, using a long instead of a short could waste precious resources, and as such, we were taught to try to minimize space and favor performance. This is why in many cases such early code-bases were very hard to maintain. I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead. Now back then, that may have been a valid concern, but with today's modern hardware even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway) the results of removing method calls to speed up performance are negligible for the great majority of applications. Now, obviously, there are those coding applications where speed is absolutely king (for example drivers, computer games, operating systems) where such sacrifices may be made. But I would strongly advice against such optimization because of it's cost. Many folks that are performing an optimization think it's always a win-win. That they're simply adding speed to the application, what could possibly be wrong with that? What they don't realize is the cost of their choice. For every piece of straight-forward code that you obfuscate with performance enhancements, you risk the introduction of bugs in the long term technical debt of the application. It will become so fragile over time that maintenance will become a nightmare. I've seen such applications in places I have worked. There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system for their application to try to squeeze out every ounce of performance. Unfortunately, the application stability often suffers as a result and it is very difficult for anyone other than the original designer to maintain. I've even seen this recently where I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the 0x standard implementation). To me this was almost a joke. Twice as slow sounds bad, but it almost never as bad as you think -- especially if you're gaining safety. The only time twice is really that much slower is when once was too slow to begin with. Think about it. 2 minutes is slow as a response time because 1 minute is slow. But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial! The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations). To my mind, the added safety makes the extra time worth it. Always favor safety and maintainability when you can. I know it can be a hard habit to break, especially if you started out your career early or in a language such as C where they are very performance conscious. But in reality, these type of micro-optimizations only end up hurting you in the long run. Remember the two laws of optimization. I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. For experts: Do not optimize yet. This is so true. If you're a beginner, resist the urge to optimize at all costs. And if you are an expert, delay that decision. As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient. Chances are it will be network, database, or disk hits that will be your slow-down, not your code. As they say, 98% of your code's bottleneck is in 2% of your code so premature-optimization may add maintenance and safety debt that won't have any measurable impact. Instead, code for maintainability and safety, and then, and only then, when you find a true bottleneck, then you should go back and optimize further.

Read the article
How to avoid Memory "Hard Fault/sec"

- by Flavio Oliveira

i've a problem on my windows 2008 server x64, and i cannot understand how can i solve it. i'm looking to Resource Monitor and see about 100 to 200 hard faults/sec. and generally the machine is slow. As i've readed a bit it is caused by a "memory Page" that is no longer available on physical memory and causes a io operations (disk) and it is a problem. The current hardware is a intel core2duo E8400 (3.0GHz) with 6GB RAM on a Windows Server Web 64-bit. Actually the machine have about 2GB Ram used what having 4Gb available to use, Why is the machine requires that high level of Disk operations? what can i do to increase the performance? Im experiencing a memory issues? what should be my starting point?

Read the article
SQL server peformance, virtual memory usage

- by user45641

Hello, I have a very large DB used mostly for analytics. The performance overall is very sluggish. I just noticed that when running the query below, the amount of virtual memory used greatly exceeds the amount of physical memory available. Currently, physical memory is 10GB (10238 MB) whereas the virtual memory returns significantly more - 8388607 MB. That seems really wrong, but I'm at a bit of a loss on how to proceed. USE [master]; GO select cpu_count , hyperthread_ratio , physical_memory_in_bytes / 1048576 as 'mem_MB' , virtual_memory_in_bytes / 1048576 as 'virtual_mem_MB' , max_workers_count , os_error_mode , os_priority_class from sys.dm_os_sys_info

Read the article
Benchmarking Java programs

- by stefan-ock

For university, I perform bytecode modifications and analyze their influence on performance of Java programs. Therefore, I need Java programs---in best case used in production---and appropriate benchmarks. For instance, I already got HyperSQL and measure its performance by the benchmark program PolePosition. The Java programs running on a JVM without JIT compiler. Thanks for your help! P.S.: I cannot use programs to benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).

Read the article
Can compressing Program Files save space *and* give a significant boost to SSD performance?

- by Christopher Galpin

Considering solid-state disk space is still an expensive resource, compressing large folders has appeal. Thanks to VirtualStore, could Program Files be a case where it might even improve performance? Discovery In particular I have been reading: SSD and NTFS Compression Speed Increase? Does NTFS compression slow SSD/flash performance? Will somebody benchmark whole disk compression (HD,SSD) please? (may have to scroll up) The first link is particularly dreamy, but maybe head a little too far in the clouds. The third link has this sexy semi-log graph (logarithmic scale!). Quote (with notes): Using highly compressable data (IOmeter), you get at most a 30x performance increase [for reads], and at least a 49x performance DECREASE [for writes]. Assuming I interpreted and clarified that sentence correctly, this single user's benchmark has me incredibly interested. Although write performance tanks wretchedly, read performance still soars. It gave me an idea. Idea: VirtualStore It so happens that thanks to sanity saving security features introduced in Windows Vista, write access to certain folders such as Program Files is virtualized for non-administrator processes. Which means, in normal (non-elevated) usage, a program or game's attempt to write data to its install location in Program Files (which is perhaps a poor location) is redirected to %UserProfile%\AppData\Local\VirtualStore, somewhere entirely different. Thus, to my understanding, writes to Program Files should primarily only occur when installing an application. This makes compressing it not only a huge source of space gain, but also a potential candidate for performance gain. Testing The beginning of this post has me a bit timid, it suggests benchmarking NTFS compression on a whole drive is difficult because turning it off "doesn't decompress the objects". However it seems to me the compact command is perfectly capable of doing so for both drives and individual folders. Could it be only marking them for decompression the next time the OS reads from them? I need to find the answer before I begin my own testing.

Read the article
Performance question: Inverting an array of pointers in-place vs array of values

- by Anders

The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repated in an iterative process until convergence. The way I'm doing it now, using a matrix library (MTL4): For every iteration I copy all coeffiecients of A (values) in to the matrix object, then invert. This the easiest and safest option. Using an array of pointers instead: For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place? The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations. So the performance question boils down to this, as I see it: - The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is non-expensive. - The array of pointers does not need the extra memory for matrix A containing values. - The array of pointers option does not have to copy all NxN values of A between each iteration. - The values that are pointed to the array of pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc. Any comments to this? Will the last remark affect performance negatively, thus weighing up for the positive performance effects?

Read the article
Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

- by Linchi Shea

For many years, I have been using a C# program to generate the TPC-C compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated by the program into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do and the data needed to be loaded within a confined time frame. So I was driven to look into the code...(read more)

Read the article
Performance triage

- by Dave

Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general). A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlying your particular layer. On the topic of confounding factors, as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore as you have to remember to turn it off and then back on otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. In general for performance characterization I disable turbo mode and fix the power governor at "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Lets say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem). At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally if there's a key performance issue, you'll spot it with a moderate number of samples. I'll also use OS-level observability tools to lock for the existence of bottlenecks where threads contend for locks; other situations where threads are blocked; and the distribution of threads over the system. On Solaris some good tools are mpstat and too a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. One key measure is the voluntary context switch rate "vctx" or "csw" which reflects threads descheduling themselves. It's also good to look at the user; system; and idle CPU percentages. This can give a broad but useful understanding if your threads are mostly parked or mostly running. For instance if your program makes heavy use of malloc/free, then it might be the case you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as my the 1st step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.

Read the article
Disabling CPU management

- by Tiffany Walker

If I add the following processor.max_cstate=0 to the kernel command line for boot up, does that disable all CPU power management and throttling? I also found: http://www.experts-exchange.com/OS/Linux/Administration/A_3492-Avoiding-CPU-speed-scaling-in-modern-Linux-distributions-Running-CPU-at-full-speed-Tips.html The link talks of Change CPU governor from 'ondemand' to 'performance' for all CPUs/cores and disabling kondemand from kernel. Server is for web hosting UPDATES: 2.6.32-379.1.1.lve1.1.7.6.el6.x86_64 #1 SMP Sat Aug 4 09:56:37 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux . # dmidecode 2.11 SMBIOS 2.6 present. 74 structures occupying 2878 bytes. Table at 0x0009F000. Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: American Megatrends Inc. Version: 1.0c Release Date: 05/27/2010 Address: 0xF0000 Runtime Size: 64 kB ROM Size: 4096 kB Characteristics: ISA is supported PCI is supported PNP is supported BIOS is upgradeable BIOS shadowing is allowed ESCD support is available Boot from CD is supported Selectable boot is supported BIOS ROM is socketed EDD is supported 5.25"/1.2 MB floppy services are supported (int 13h) 3.5"/720 kB floppy services are supported (int 13h) 3.5"/2.88 MB floppy services are supported (int 13h) Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) Serial services are supported (int 14h) Printer services are supported (int 17h) CGA/mono video services are supported (int 10h) ACPI is supported USB legacy is supported LS-120 boot is supported ATAPI Zip drive boot is supported BIOS boot specification is supported Targeted content distribution is supported BIOS Revision: 8.16 Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Supermicro Product Name: X8SIE Version: 0123456789 Serial Number: 0123456789 UUID: 49434D53-0200-9033-2500-33902500D52C Wake-up Type: Power Switch SKU Number: To Be Filled By O.E.M. Family: To Be Filled By O.E.M. Handle 0x0002, DMI type 2, 15 bytes Base Board Information Manufacturer: Supermicro Product Name: X8SIE Version: 0123456789 Serial Number: VM11S61561 Asset Tag: To Be Filled By O.E.M. Features: Board is a hosting board Board is replaceable Location In Chassis: To Be Filled By O.E.M. Chassis Handle: 0x0003 Type: Motherboard Contained Object Handles: 0 Handle 0x0003, DMI type 3, 21 bytes Chassis Information Manufacturer: Supermicro Type: Sealed-case PC Lock: Not Present Version: 0123456789 Serial Number: 0123456789 Asset Tag: To Be Filled By O.E.M. Boot-up State: Safe Power Supply State: Safe Thermal State: Safe Security Status: None OEM Information: 0x00000000 Height: Unspecified Number Of Power Cords: 1 Contained Elements: 0

Read the article
Very long (>300s) request processing time on Apache Server serving static content from particular IP

- by Ron Bieber

We are running an Apache 2.2 server for a very large web site. Over the past few months we have been having some users reporting slow response times, while others (including our resources, both on the internal network and our home networks) do not see any degradation in performance. After a ton of investigation, we finally found a "Deny from none" statement in our configuration that was causing reverse DNS lookups (which were timing out) that solved the bulk of our issues, but we still have some customers that we are seeing in the Apache logs (using %D in the log format) with request processing times of 300s for images, css, javascript and other static content. We've checked all Deny / Allow statements for reoccurrence of "none", as well as all other things we know of that would cause reverse DNS lookups (such as using "REMOTE_HOST" in rewrite rules, using %a instead of %h in our log format configuration) as well as verified that HostnameLookups is set to "Off". As an aside, we've also validated that reverse DNS lookups for folks having this problem do not time out - so I'm fairly certain DNS is not an issue in this case. I've run out of ideas. Are there any Apache configuration scenarios that someone can point me to that I might be missing that would cause request times for static content to take so long only for certain users? Thank you in advance.

Read the article
Understanding RedHats recommended tuned profiles

- by espenfjo

We are going to roll out tuned (and numad) on ~1000 servers, the majority of them being VMware servers either on NetApp or 3Par storage. According to RedHats documentation we should choose the virtual-guestprofile. What it is doing can be seen here: tuned.conf We are changing the IO scheduler to NOOP as both VMware and the NetApp/3Par should do sufficient scheduling for us. However, after investigating a bit I am not sure why they are increasing vm.dirty_ratio and kernel.sched_min_granularity_ns. As far as I have understood increasing increasing vm.dirty_ratio to 40% will mean that for a server with 20GB ram, 8GB can be dirty at any given time unless vm.dirty_writeback_centisecsis hit first. And while flushing these 8GB all IO for the application will be blocked until the dirty pages are freed. Increasing the dirty_ratio would probably mean higher write performance at peaks as we now have a larger cache, but then again when the cache fills IO will be blocked for a considerably longer time (Several seconds). The other is why they are increasing the sched_min_granularity_ns. If I understand it correctly increasing this value will decrease the number of time slices per epoch(sched_latency_ns) meaning that running tasks will get more time to finish their work. I can understand this being a very good thing for applications with very few threads, but for eg. apache or other processes with a lot of threads would this not be counter-productive?

Read the article
F# performance in scientific computing

- by aaa

hello. I am curious as to how F# performance compares to C++ performance? I asked a similar question with regards to Java, and the impression I got was that Java is not suitable for heavy numbercrunching. I have read that F# is supposed to be more scalable and more performant, but how is this real-world performance compares to C++? specific questions about current implementation are: How well does it do floating-point? Does it allow vector instructions how friendly is it towards optimizing compilers? How big a memory foot print does it have? Does it allow fine-grained control over memory locality? does it have capacity for distributed memory processors, for example Cray? what features does it have that may be of interest to computational science where heavy number processing is involved? Are there actual scientific computing implementations that use it? Thanks

Read the article
VS2010 + Resharper 5 performance issues

- by Jeremy Roberts

I have been using VS2010 with Resharper 5 for several weeks and am having a performance issue. Sometimes when typing, the cursor will lag and the keystrokes won't show instantaneously. Also, scrolling will lag at times. There is a forum thread started and JetBrains has been responding. Several people (including myself) have added their voice and uploaded some performance profiles. If anyone here has has this issue, I would encourage you to visit the thread and let JetBrains know about it. Has anyone had this problem and have a suggestion to restore performance?

Read the article
Measuring Web Page Performance on Client vs. Server

- by Yaakov Ellis

I am working with a web page (ASP.net 3.5) that is very complicated and in certain circumstances has major performance issues. It uses Ajax (through the Telerik AjaxManager) for most of its functionality. I would like to be able to measure in some way the amounts of time for the following, for each request: On client submitting request to server Client-to-Server On server initializing request On server processing request Server-to-Client Client rendering, JavaScript processing I have monitored the database traffic and cannot find any obvious culprit. On the other hand, I have a suspicion that some of the Ajax interactions are causing performance issues. However, until I have a way to track the times involved, make a baseline measurement, and measure performance as I tweak, it will be hard to work on the issue. So what is the best way to measure all of these? Is there one tool that can do it? Combination of FireBug and logging inserted into different places in the page life-cycle?

Read the article
Looking for SQL Server Performance Monitor Tools

- by the-locster

I may be approaching this problem from the wrong angle but what I'm thinking of is some kind of performance monitor tool for SQl server that works in a similar way to code performance tools, e.g. I;d like to see an output of how many times each stored procedure was called, average executuion time and possibly various resource usage stats such as cache/index utilisation, resultign disk access and table scans, etc. As far as I can tell the performance monitor that comes with SQL Server just logs the various calls but doesn't report he variosu stats I'm looking for. Potentially I just need a tool to analyze the log output?

Read the article
Performance Counters Registry validation

- by anchandra

I have a C# application that adds some performance counters when it starts up. But if the registry HKEY_LOCAL_MACHINE-SOFTWARE-Microsoft-Windows NT-CurrentVersion-Perflib is corrupted (missing or invalid data), the operation of checking the existence of the performance counters (PerformanceCounterCategory.Exists(category) takes a really long time (around 30 secs) before finally throwing exception (InvalidOperation: Category does not exist). My question is how can i verify the validity of the registry before trying to add the performance counters (and what validity means) or if there is a way i can timeout the perf counter operations, so that it doesn't take 30 seconds to get an exception.

Read the article
Entity Framework associations killing performance

- by Chris

Here is the performance test i am looking at. I have 8 different entities that are table per type. Some of the entities contain over 100 thousand rows. This particular application does several recursive calculations on the client so I think it may be best to preload the data instead of lazy loading. If there are no associations I can load the entire database in about 3 seconds. As I add associations in any way the performance starts to drastically decline. I am loading all the data the same way (just calling toList() on the entity attached to the context). I ran the test with edmx generated classes and self tracking entities and had similar results. I am sure if I were to try and deal with the associations myself, similar to how I would in a dataset, the performance problem would go away. On the other hand I am pretty sure this is not how the entity framework was intended to being used. Any thoughts or ideas?

Read the article
Performance statistics hooks

- by tinny

Lets be honest, most software that developers produce has quite modest performance requirements. E.g. Systems perhaps serving 100's of requests per second, if that. But lets assume for a moment (or even dream) that you where perhaps involved in the "next big thing" (whatever that means) and you wanted to put some sort of performance statistics logging in place to help you out when all those users come flying in. Performance statistics logging, how would you approach this requirement? Perhaps you would use some sort of generic framework for this? Or roll your own solution? What would you log? How granular? Or would you not even bother putting anything in place and rather deal with this issue when it actually became an issue? It would be really interesting to hear your thoughts on this topic.

Read the article

Search Results

Search found 13608 results on 545 pages for 'performance dashboard'.

Page 18/545 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

- by EyeonTech

- by user15241

- by futureal

- by utt73

- by Sergey

- by Terence Ponce

- by melvincv

- by James Michael Hare

- by Flavio Oliveira

- by user45641

- by stefan-ock

- by Christopher Galpin

- by Anders

- by Linchi Shea

- by Dave

- by Tiffany Walker

- by Ron Bieber

- by espenfjo

- by aaa

- by Jeremy Roberts

- by Yaakov Ellis

- by the-locster

- by anchandra

- by Chris

- by tinny

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >