Search Results

Search found 13757 results on 551 pages for 'performance diagnostics'.

Page 91/551 | < Previous Page | 87 88 89 90 91 92 93 94 95 96 97 98  | Next Page >

  • Faster zlib alternatives

    - by BarsMonster
    I wonder, if there are any faster builds of zlib around with more advanced optimizations? If it's possible to optimize it using SSE instructions or Intel C++ compiller, or some trick which were patented earlier (I know patents were a serious limitation during gzip/zlib development), have anyone bothered to implement that? I am especially interested in compression speed, which have a direct impact on high-performance web-services serving static & dynamic content.

    Read the article

  • Echo 404 directly from nginx to improve performance

    - by user64204
    I am in charge of production servers serving static content for a website. Those servers are constantly being crawled by bots looking for potential exploits (which isn't that much of a problem security-wise because no application can be reached behind the web server) but generates thousands of 404 per day, sometimes per hour. I am looking into ways of blocking those requests but it's tricky (you want to make sure you don't block legitimate traffic and these bots are becoming more and more clever at looking like they're legit) and is going to take me a while to find an acceptable solution. In the meantime I would like to reduce the performance impact of serving those 404 pages. Indeed we're using nginx which by default is configured to serve it's 404 page from the disk (This can be changed using the error_page directive but in the end the 404 will either have to be served from disk or from another external source (e.g. upstream application which would be worst)) which isn't ideal. I ran a test with ab on my local machine with a basic configuration: in one case I echo a message directly from nginx so the disk isn't touched at all, in the other case I hit a missing page and nginx serves its 404 from disk. server { # [...] the default nginx stuff location / { } location /this_page_exists { echo "this page was found"; } } Here are the test results (my laptop has Intel(R) Core(TM) i7-2670QM + SSD in case you're wondering why they are so high): $ ab -n 500000 -c 1000 http://localhost/this_page_exists Requests per second: 25609.16 [#/sec] (mean) $ ab -n 500000 -c 1000 http://localhost/this_page_doesnt_exists Requests per second: 22905.72 [#/sec] (mean) As you can see, returning a value with echo is 11% ((25609-22905)÷22905×100) faster than serving the 404 page from disk. Accordingly I would like to echo a simple 404 Page not Found string from nginx. I tried many things so far but they all failed, essentially the idea was this: location / { try_files $uri @not_found; } location @not_found { echo "404 - Page not found"; } The problem is that as soon as the echo directive is used, the http response code is set to 200. I tried changing that by doing error_page 200 = 400 but that breaks the configuration. How can I serve a 404 page directly from nginx? (without hacking the source which may be might next step)

    Read the article

  • Why does Kerberos need Ticket Granting Server?

    - by Narsil
    It's probably something fundamental but I can't find a certain statement. Why can't KDC authenticate then provide the service ticket directly. Is it about security or performance or some other thing? Since users don't log in each time they request a service and assumably they will keep logged in for a long time, AS doesn't seem so busy. Why do they have to be seperated?

    Read the article

  • Computer "Server"

    - by user328379
    so at home we had the idea of instead of buying 3 different pc's we would somehow create a "server" for the computers where a cable would come to our screens and keyboard and mouses, so the actual pc was somewhere else in the house with all the others. Does such a thing exist? And is it possible to have such a thing for high performance workflow? (Compiling, High-End Games, just as if it was a separate pc )

    Read the article

  • AWS Free Usage Tier + Cloudflare... possible?

    - by crashintoty
    If I throw my MySQL/PHP app up on a Amazon EC2 instance (using their AWS Free Usage Tier program) and couple it with CloudFlare (the free plan of course) roughly how many daily visitors can I comfortably handle before performance starts to suffer? Just looking for a rough estimate or educated guess - I understand this setup might be less than ideal but I'm still very curious nonetheless. Thanks in advance

    Read the article

  • Up-to-date Comparison of High-Speed USB Flash Drives

    - by Zoredache
    I am looking for comparison of the performance of USB flash drives. I have found several older comparisons, but I am trying to find a more up-to-date comparisons that apply to the larger storage sizes (32-128GB). I can try looking up the specs of various drives, but vendors have been known to exaggerate, or use numbers that are on accurate in tests that do not reflect actual usage. I was hoping to find 3rd party site which had perform testing.

    Read the article

  • Clean install vs disk image

    - by Thanos
    Once a year I am making a clean install on windows, in order to keep my system fast. After posting a question on making a bootable windows usb with exe programs where I was adviced to make a disk image, a new question rose. What is the difference in making a disk image and performing a clean install on windows? Which is better in terms of speed, general performance, value for time and transfering between different computers?

    Read the article

  • How to measure TCP connection time in Linux

    - by Paul Draper
    I want to measure the overhead in creating a TCP connection. I know of many tools like hping and netperf, but they seem oriented at measuring latency. I want to know how long the 3-way handshake takes, and allocating any buffers, etc., and then closing it. So I want to open a real, legitimate TCP connection, and then close it. Are there any tools that will do that and help me measure performance?

    Read the article

  • Possible reasons for high CPU load of taskmgr.exe process on VM?

    - by mjn
    On a VMware virtual machine which has severe performance problems I can see a constant average of 20+ percent CPU load for the TASKMGR.EXE (task manager) process. The apps running on this server have lower load, around 4 to 10 percent average. The VM is running Windows 2003 Server Standard with 3.75 GB assigned RAM. I suspect that the task manager CPU load has something to do with other VM instances on the VMWare server but could not see a similar value on internal ESXi systems (the problematic VM runs in the customers IT).

    Read the article

  • How to get Ubuntu to perform better on an older computer?

    - by alex
    Ubuntu 9.1 runs quite slugglish on my old laptop from 2004. Slower than Windows XP that was on there. It has 512mb RAM and probably 1.2ghz (can't remember) CPU. I have turned off Visual Effects under Appearance Preferences. Are there any other tricks to get better performance, or do I just need a better computer to try Ubuntu? Thanks

    Read the article

  • Can a SQL Server have a CPU bottleneck when Processor Time is under 30%

    - by Sleepless
    Is it in principle possible for the CPU to be the bottleneck on a SQL Server if the Performance Counter Processor:Processor Time is constantly under 30% on all cores? Or does low Processor Time automatically allow me to rule out the CPU as a potential trouble source? I am asking this because SQL Nexus lists CPU as the top bottleneck on a server with low Processor Time values.

    Read the article

  • Performance in backpropagation algorithm

    - by Taban
    I've written a matlab program for standard backpropagation algorithm, it is my homework and I should not use matlab toolbox, so I write the entire code by myself. This link helped me for backpropagation algorithm. I have a data set of 40 random number and initial weights randomly. As output, I want to see a diagram that shows the performance. I used mse and plot function to see performance for 20 epochs but the result is this: I heard that performance should go up through backpropagation, so I want to know is there any problem with my code or this result is normal because local minimums. This is my code: Hidden_node=inputdlg('Enter the number of Hidden nodes'); a=0.5;%initialize learning rate hiddenn=str2num(Hidden_node{1,1}); randn('seed',0); %creating data set s=2; N=10; m=[5 -5 5 5;-5 -5 5 -5]; S = s*eye(2); [l,c] = size(m); x = []; % Creating the training set for i = 1:c x = [x mvnrnd(m(:,i)',S,N)']; end % target value toutput=[ones(1,N) zeros(1,N) ones(1,N) zeros(1,N)]; for epoch=1:20; %number of epochs for kk=1:40; %number of patterns %initial weights of hidden layer for ii=1 : 2; for jj=1 :hiddenn; whidden{ii,jj}=rand(1); end end initial the wights of output layer for ii=1 : hiddenn; woutput{ii,1}=rand(1); end for ii=1:hiddenn; x1=x(1,kk); x2=x(2,kk); w1=whidden{1,ii}; w2=whidden{2,ii}; activation{1,ii}=(x1(1,1)*w1(1,1))+(x2(1,1)*w2(1,1)); end %calculate output of hidden nodes for ii=1:hiddenn; hidden_to_out{1,ii}=logsig(activation{1,ii}); end activation_O{1,1}=0; for jj=1:hiddenn; activation_O{1,1} = activation_O{1,1}+(hidden_to_out{1,jj}*woutput{jj,1}); end %calculate output out{1,1}=logsig(activation_O{1,1}); out_for_plot(1,kk)= out{1,ii}; %calculate error for output node delta_out{1,1}=(toutput(1,kk)-out{1,1}); %update weight of output node for ii=1:hiddenn; woutput{ii,jj}=woutput{ii,jj}+delta_out{1,jj}*hidden_to_out{1,ii}*dlogsig(activation_O{1,jj},logsig(activation_O{1,jj}))*a; end %calculate error of hidden nodes for ii=1:hiddenn; delta_hidden{1,ii}=woutput{ii,1}*delta_out{1,1}; end %update weight of hidden nodes for ii=1:hiddenn; for jj=1:2; whidden{jj,ii}= whidden{jj,ii}+(delta_hidden{1,ii}*dlogsig(activation{1,ii},logsig(activation{1,ii}))*x(jj,kk)*a); end end a=a/(1.1);%decrease learning rate end %calculate performance e=toutput(1,kk)-out_for_plot(1,1); perf(1,epoch)=mse(e); end plot(perf); Thanks a lot.

    Read the article

  • optimizing iPhone OpenGL ES fill rate

    - by NateS
    I have an Open GL ES game on the iPhone. My framerate is pretty sucky, ~20fps. Using the Xcode OpenGL ES performance tool on an iPhone 3G, it shows: Renderer Utilization: 95% to 99% Tiler Utilization: ~27% I am drawing a lot of pretty large images with a lot of blending. If I reduce the number of images drawn, framerates go from ~20 to ~40, though the performance tool results stay about the same (renderer still maxed). I think I'm being limited by the fill rate of the iPhone 3G, but I'm not sure. My questions are: How can I determine with more granularity where the bottleneck is? That is my biggest problem, I just don't know what is taking all the time. If it is fillrate, is there anything I do to improve it besides just drawing less? I am using texture atlases. I have tried to minimize image binds, though it isn't always possible (drawing order, not everything fits on one 1024x1024 texture, etc). Every frame I do 10 image binds. This seem pretty reasonable, but I could be mistaken. I'm using vertex arrays and glDrawArrays. I don't really have a lot of geometry. I can try to be more precise if needed. Each image is 2 triangles and I try to batch things were possible, though often (maybe half the time) images are drawn with individual glDrawArrays calls. Besides the images, I have ~60 triangles worth of geometry being rendered in ~6 glDrawArrays calls. I often glTranslate before calling glDrawArrays. Would it improve the framerate to switch to VBOs? I don't think it is a huge amount of geometry, but maybe it is faster for other reasons? Are there certain things to watch out for that could reduce performance? Eg, should I avoid glTranslate, glColor4g, etc? I'm using glScissor in a 3 places per frame. Each use consists of 2 glScissor calls, one to set it up, and one to reset it to what it was. I don't know if there is much of a performance impact here. If I used PVRTC would it be able to render faster? Currently all my images are GL_RGBA. I don't have memory issues. Here is a rough idea of what I'm drawing, in this order: 1) Switch to perspective matrix. 2) Draw a full screen background image 3) Draw a full screen image with translucency (this one has a scrolling texture). 4) Draw a few sprites. 5) Switch to ortho matrix. 6) Draw a few sprites. 7) Switch to perspective matrix. 8) Draw sprites and some other textured geometry. 9) Switch to ortho matrix. 10) Draw a few sprites (eg, game HUD). Steps 1-6 draw a bunch of background stuff. 8 draws most of the game content. 10 draws the HUD. As you can see, there are many layers, some of them full screen and some of the sprites are pretty large (1/4 of the screen). The layers use translucency, so I have to draw them in back-to-front order. This is further complicated by needing to draw various layers in ortho and others in perspective. I will gladly provide additional information if reqested. Thanks in advance for any performance tips or general advice on my problem!

    Read the article

  • Performance of Cluster Shared Volume file copy from SAN

    - by Sequenzia
    I am hoping someone can help me out with a strange issue. We are running a Microsoft Failover Cluster with Server 2008 R2 and an Equallogic PS4000 SAN. Our main configuration has 2 Dell Poweredge T710 Servers in the cluster. We have CSV and Quorm setup. The servers each have 10 Broadcom 1Gb NICs. Right now 4 of the NICS are on the iSCSI network for accessing the SAN. They use MPIO and the Dell HIT pack. We have 5 VMs running on each node and everything runs smooth. No noticeable performance issues or anything. From the SAN I can see the 4 iSCSI connections from each server to each volume (CSV and Quorm). Again, it seems to perform great. The problem I am running into is with backups. I have tried a few backup programs like backupchain and Veeam. The problem is both of them are very very slow to backup the VMs. For instance I have a 500GB (fixed disc) VHD that’s running on the cluster. It takes over 18 hours to backup that VHD and that’s with compression and depuping turned off which is supposed to be the fasted. We also have a separate server that is just for backups. It has a lot of directed attached storage. As part of the troubleshooting I decided to bring that server into the cluster as a node. It now has access to the CSV and can read from C:\clusterstorage\volume1 which is where our VHDs live. This backup server only has 2 NICs. 1 NIC is going to the iSCSI network and the other is just on the main network. It has Intel NICS in it without any sort of MPIO or teaming. So with the 3rd server now in the cluster I started doing some benchmarking. I have a test VHD that’s about 7GBs that’s stored in the CSV. I have tested file copying that VHD from all 3 servers to directed attached storage in the respective server. The 2 Dell servers that are the main nodes in the cluster (they house the VMs) are reading that file at about 20Mbs/Sec. Which at that rate is way to slow for the backups. The other server which only has 1 NIC to the SAN is reading at around 100Mbs/Sec. I spent a few hours on the phone with Dell today about this . We went through all kind of tests and he was pretty dumb founded. He really has no idea why that server with only 1 NIC is reading about 5 times as fast as the servers with 4 NICS and MPIO. We looked at the network utilization of the NICs while the file copy was going on. The servers with the 4 NICs had a small increase of activity during the file copy but they only went up to around 8-10% on all 4 NICs. The other server with the 1 NIC jumped up to over 80% during the file copy. I plan on doing some more testing after hours and calling Dell back tomorrow but I really am confused (and so is Dell’s support rep) why I cannot get faster file copy access to the CSV on those servers. Anyone have any input on this? Any feedback would be greatly appreciated. Thanks in advance.

    Read the article

  • Improving SAS multipath to JBOD performance on Linux

    - by user36825
    Hello all I'm trying to optimize a storage setup on some Sun hardware with Linux. Any thoughts would be greatly appreciated. We have the following hardware: Sun Blade X6270 2* LSISAS1068E SAS controllers 2* Sun J4400 JBODs with 1 TB disks (24 disks per JBOD) Fedora Core 12 2.6.33 release kernel from FC13 (also tried with latest 2.6.31 kernel from FC12, same results) Here's the datasheet for the SAS hardware: http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf It's using PCI Express 1.0a, 8x lanes. With a bandwidth of 250 MB/sec per lane, we should be able to do 2000 MB/sec per SAS controller. Each controller can do 3 Gb/sec per port and has two 4 port PHYs. We connect both PHYs from a controller to a JBOD. So between the JBOD and the controller we have 2 PHYs * 4 SAS ports * 3 Gb/sec = 24 Gb/sec of bandwidth, which is more than the PCI Express bandwidth. With write caching enabled and when doing big writes, each disk can sustain about 80 MB/sec (near the start of the disk). With 24 disks, that means we should be able to do 1920 MB/sec per JBOD. multipath { rr_min_io 100 uid 0 path_grouping_policy multibus failback manual path_selector "round-robin 0" rr_weight priorities alias somealias no_path_retry queue mode 0644 gid 0 wwid somewwid } I tried values of 50, 100, 1000 for rr_min_io, but it doesn't seem to make much difference. Along with varying rr_min_io I tried adding some delay between starting the dd's to prevent all of them writing over the same PHY at the same time, but this didn't make any difference, so I think the I/O's are getting properly spread out. According to /proc/interrupts, the SAS controllers are using a "IR-IO-APIC-fasteoi" interrupt scheme. For some reason only core #0 in the machine is handling these interrupts. I can improve performance slightly by assigning a separate core to handle the interrupts for each SAS controller: echo 2 /proc/irq/24/smp_affinity echo 4 /proc/irq/26/smp_affinity Using dd to write to the disk generates "Function call interrupts" (no idea what these are), which are handled by core #4, so I keep other processes off this core too. I run 48 dd's (one for each disk), assigning them to cores not dealing with interrupts like so: taskset -c somecore dd if=/dev/zero of=/dev/mapper/mpathx oflag=direct bs=128M oflag=direct prevents any kind of buffer cache from getting involved. None of my cores seem maxed out. The cores dealing with interrupts are mostly idle and all the other cores are waiting on I/O as one would expect. Cpu0 : 0.0%us, 1.0%sy, 0.0%ni, 91.2%id, 7.5%wa, 0.0%hi, 0.2%si, 0.0%st Cpu1 : 0.0%us, 0.8%sy, 0.0%ni, 93.0%id, 0.2%wa, 0.0%hi, 6.0%si, 0.0%st Cpu2 : 0.0%us, 0.6%sy, 0.0%ni, 94.4%id, 0.1%wa, 0.0%hi, 4.8%si, 0.0%st Cpu3 : 0.0%us, 7.5%sy, 0.0%ni, 36.3%id, 56.1%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 1.3%sy, 0.0%ni, 85.7%id, 4.9%wa, 0.0%hi, 8.1%si, 0.0%st Cpu5 : 0.1%us, 5.5%sy, 0.0%ni, 36.2%id, 58.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 5.0%sy, 0.0%ni, 36.3%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 5.1%sy, 0.0%ni, 36.3%id, 58.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 0.1%us, 8.3%sy, 0.0%ni, 27.2%id, 64.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 0.1%us, 7.9%sy, 0.0%ni, 36.2%id, 55.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 0.0%us, 7.8%sy, 0.0%ni, 36.2%id, 56.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 0.0%us, 7.3%sy, 0.0%ni, 36.3%id, 56.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 0.0%us, 5.6%sy, 0.0%ni, 33.1%id, 61.2%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 0.1%us, 5.3%sy, 0.0%ni, 36.1%id, 58.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 0.0%us, 4.9%sy, 0.0%ni, 36.4%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 0.1%us, 5.4%sy, 0.0%ni, 36.5%id, 58.1%wa, 0.0%hi, 0.0%si, 0.0%st Given all this, the throughput reported by running "dstat 10" is in the range of 2200-2300 MB/sec. Given the math above I would expect something in the range of 2*1920 ~= 3600+ MB/sec. Does anybody have any idea where my missing bandwidth went? Thanks!

    Read the article

  • MySQL Memory usage

    - by Rob Stevenson-Leggett
    Our MySQL server seems to be using a lot of memory. I've tried looking for slow queries and queries with no index and have halved the peak CPU usage and Apache memory usage but the MySQL memory stays constantly at 2.2GB (~51% of available memory on the server). Here's the graph from Plesk. Running top in the SSH window shows the same figures. Does anyone have any ideas on why the memory usage is constant like this and not peaks and troughs with usage of the app? Here's the output of the MySQL Tuning Primer script: -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.0.77-log x86_64 Uptime = 1 days 14 hrs 4 min 21 sec Avg. qps = 22 Total Questions = 3059456 Threads Connected = 13 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1 sec. You have 6 out of 3059477 that take longer than 1 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is NOT enabled. You will not be able to do point in time recovery See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html WORKER THREADS Current thread_cache_size = 0 Current threads_cached = 0 Current threads_per_sec = 2 Historic threads_per_sec = 0 Threads created per/sec are overrunning threads cached You should raise thread_cache_size MAX CONNECTIONS Current max_connections = 100 Current threads_connected = 14 Historic max_used_connections = 20 The number of used connections is 20% of the configured maximum. Your max_connections variable seems to be fine. INNODB STATUS Current InnoDB index space = 6 M Current InnoDB data space = 18 M Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 8 M Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 2.07 G Configured Max Per-thread Buffers : 274 M Configured Max Global Buffers : 2.01 G Configured Max Memory Limit : 2.28 G Physical Memory : 3.84 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 4 M Current key_buffer_size = 7 M Key cache miss rate is 1 : 40 Key buffer free ratio = 81 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is supported but not enabled Perhaps you should set the query_cache_size SORT OPERATIONS Current sort_buffer_size = 2 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 132.00 K You have had 16 queries where a join could not use an index properly You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. If you are unable to optimize your queries you may want to increase your join_buffer_size to accommodate larger joins in one pass. Note! This script will still suggest raising the join_buffer_size when ANY joins not using indexes are found. OPEN FILES LIMIT Current open_files_limit = 1024 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_cache value = 64 tables You have a total of 426 tables You have 64 open tables. Current table_cache hit rate is 1% , while 100% of your table cache is in use You should probably increase your table_cache TEMP TABLES Current max_heap_table_size = 16 M Current tmp_table_size = 32 M Of 15134 temp tables, 9% were created on disk Effective in-memory tmp_table_size is limited to max_heap_table_size. Created disk tmp tables ratio seems fine TABLE SCANS Current read_buffer_size = 128 K Current table scan ratio = 2915 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 142213 Your table locking seems to be fine The app is a facebook game with about 50-100 concurrent users. Thanks, Rob

    Read the article

  • Linux software RAID6: rebuild slow

    - by Ole Tange
    I am trying to find the bottleneck in the rebuilding of a software raid6. ## Pause rebuilding when measuring raw I/O performance # echo 1 > /proc/sys/dev/raid/speed_limit_min # echo 1 > /proc/sys/dev/raid/speed_limit_max ## Drop caches so that does not interfere with measuring # sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null # time parallel -j0 "dd if=/dev/{} bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 7.30336 s, 144 MB/s [... similar for each disk ...] # time parallel -j0 "dd if=/dev/{} skip=15000000 bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 12.7991 s, 81.9 MB/s [... similar for each disk ...] So we can read sequentially at 140 MB/s in the outer tracks and 82 MB/s in the inner tracks on all the drives simultaneously. Sequential write performance is similar. This would lead me to expect a rebuild speed of 82 MB/s or more. # echo 800000 > /proc/sys/dev/raid/speed_limit_min # echo 800000 > /proc/sys/dev/raid/speed_limit_max # cat /proc/mdstat md2 : active raid6 sdbd[10](S) sdbc[9] sdbf[0] sdbm[8] sdbl[7] sdbk[6] sdbe[11] sdbj[4] sdbi[3](F) sdbh[2] sdbg[1] 27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/8] [UUU_UUUUU] [=========>...........] recovery = 47.3% (1849905884/3907017344) finish=855.9min speed=40054K/sec But we only get 40 MB/s. And often this drops to 30 MB/s. # iostat -dkx 1 sdbc 0.00 8023.00 0.00 329.00 0.00 33408.00 203.09 0.70 2.12 1.06 34.80 sdbd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdbe 13.00 0.00 8334.00 0.00 33388.00 0.00 8.01 0.65 0.08 0.06 47.20 sdbf 0.00 0.00 8348.00 0.00 33388.00 0.00 8.00 0.58 0.07 0.06 48.00 sdbg 16.00 0.00 8331.00 0.00 33388.00 0.00 8.02 0.71 0.09 0.06 48.80 sdbh 961.00 0.00 8314.00 0.00 37100.00 0.00 8.92 0.93 0.11 0.07 54.80 sdbj 70.00 0.00 8276.00 0.00 33384.00 0.00 8.07 0.78 0.10 0.06 48.40 sdbk 124.00 0.00 8221.00 0.00 33380.00 0.00 8.12 0.88 0.11 0.06 47.20 sdbl 83.00 0.00 8262.00 0.00 33380.00 0.00 8.08 0.96 0.12 0.06 47.60 sdbm 0.00 0.00 8344.00 0.00 33376.00 0.00 8.00 0.56 0.07 0.06 47.60 iostat says the disks are not 100% busy (but only 40-50%). This fits with the hypothesis that the max is around 80 MB/s. Since this is software raid the limiting factor could be CPU. top says: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 38520 root 20 0 0 0 0 R 64 0.0 2947:50 md2_raid6 6117 root 20 0 0 0 0 D 53 0.0 473:25.96 md2_resync So md2_raid6 and md2_resync are clearly busy taking up 64% and 53% of a CPU respectively, but not near 100%. The chunk size (128k) of the RAID was chosen after measuring which chunksize gave the least CPU penalty. If this speed is normal: What is the limiting factor? Can I measure that? If this speed is not normal: How can I find the limiting factor? Can I change that?

    Read the article

  • Parallel processing slower than sequential?

    - by zebediah49
    EDIT: For anyone who stumbles upon this in the future: Imagemagick uses a MP library. It's faster to use available cores if they're around, but if you have parallel jobs, it's unhelpful. Do one of the following: do your jobs serially (with Imagemagick in parallel mode) set MAGICK_THREAD_LIMIT=1 for your invocation of the imagemagick binary in question. By making Imagemagick use only one thread, it slows down by 20-30% in my test cases, but meant I could run one job per core without issues, for a significant net increase in performance. Original question: While converting some images using ImageMagick, I noticed a somewhat strange effect. Using xargs was significantly slower than a standard for loop. Since xargs limited to a single process should act like a for loop, I tested that, and found it to be about the same. Thus, we have this demonstration. Quad core (AMD Athalon X4, 2.6GHz) Working entirely on a tempfs (16g ram total; no swap) No other major loads Results: /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 0m3.784s user 0m2.240s sys 0m0.230s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 0m9.097s user 0m28.020s sys 0m0.910s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 0m9.844s user 0m33.200s sys 0m1.270s Can anyone think of a reason why running two instances of this program takes more than twice as long in real time, and more than ten times as long in processor time to complete the same task? After that initial hit, more processes do not seem to have as significant of an effect. I thought it might have to do with disk seeking, so I did that test entirely in ram. Could it have something to do with how Convert works, and having more than one copy at once means it cannot use processor cache as efficiently or something? EDIT: When done with 1000x 769KB files, performance is as expected. Interesting. /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.679s user 5m6.980s sys 0m6.340s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.152s user 5m6.140s sys 0m6.530s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 2m7.578s user 5m35.410s sys 0m6.050s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 4 convert -auto-level real 1m36.959s user 5m48.900s sys 0m6.350s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 1m36.392s user 5m54.840s sys 0m5.650s

    Read the article

  • Performance of Silverlight Datagrid in Silverlight 3 vs Silverlight 4 on a mac

    - by Simon
    I'm using Silverlight Beta 4 for a LOB application. After finding out today that I'll have to wait perhaps 4 months to be able to develop with SL4 on Visual Studio 2010 I'm thinking I need to downgrade my application to SL3 but thats another question. The problem is I'm noticing absolutely abismal performance for simple datagrids that work just fine on a PC when I'm running on a Mac. These grids contain only 5-10 columns and maybe 50 rows. Paging up and down takes about 1-2 seconds sometimes. I would appreciate anybody's experience in which of the following is the best solution: reverting to Silverlight 3 and hoping DataGrid is faster switching to 3rd party datagrid such as Telerik forgetting silverlight altogether I was hoping that possibly SL4 runtime might be updated but that won't happen probably for 3-4 months. Just a reminder - this is specifically a mac issue. Performance on my PC while slightly slow to populate the grid initially is fine.

    Read the article

  • RAD/Eclipse Eclipse Test and Performance Tools Platform, export data to text file

    - by Berlin Brown
    I am using the RAD (also on Eclipse) Test and Performance Monitoring. I monitor CPU performance time with it, on particular methods, etc. It is a good tool for my monitoring my applications but I can't copy/paste or export the output to a text file format. So I can send to the others. There has to be a way to export this? Also, I can save the output to file but it is '*.trcxml' binary file? has anyone seen a parser for this file format?

    Read the article

  • iPhone Foundation - performance implications of mutable and xxxWithCapacity:0

    - by Adam Eberbach
    All of the collection classes have two versions - mutable and immutable, such as NSArray and NSMutableArray. Is the distinction merely to promote careful programming by providing a const collection or is there some performance hit when using a mutable object as opposed to immutable? Similarly each of the collection classes has a method xxxxWithCapacity, like [NSMutableArray arrayWithCapacity:0]. I often use zero as the argument because it seems a better choice than guessing wrongly how many objects might be added. Is there some performance advantage to creating a collection with capacity for enough objects in advance? If not why isn't the function something like + (id)emptyArray?

    Read the article

< Previous Page | 87 88 89 90 91 92 93 94 95 96 97 98  | Next Page >