Search Results

Search found 14643 results on 586 pages for 'performance comparison'.

Page 105/586 | < Previous Page | 101 102 103 104 105 106 107 108 109 110 111 112  | Next Page >

  • Possible reasons for high CPU load of taskmgr.exe process on VM?

    - by mjn
    On a VMware virtual machine which has severe performance problems I can see a constant average of 20+ percent CPU load for the TASKMGR.EXE (task manager) process. The apps running on this server have lower load, around 4 to 10 percent average. The VM is running Windows 2003 Server Standard with 3.75 GB assigned RAM. I suspect that the task manager CPU load has something to do with other VM instances on the VMWare server but could not see a similar value on internal ESXi systems (the problematic VM runs in the customers IT).

    Read the article

  • How to get Ubuntu to perform better on an older computer?

    - by alex
    Ubuntu 9.1 runs quite slugglish on my old laptop from 2004. Slower than Windows XP that was on there. It has 512mb RAM and probably 1.2ghz (can't remember) CPU. I have turned off Visual Effects under Appearance Preferences. Are there any other tricks to get better performance, or do I just need a better computer to try Ubuntu? Thanks

    Read the article

  • Can a SQL Server have a CPU bottleneck when Processor Time is under 30%

    - by Sleepless
    Is it in principle possible for the CPU to be the bottleneck on a SQL Server if the Performance Counter Processor:Processor Time is constantly under 30% on all cores? Or does low Processor Time automatically allow me to rule out the CPU as a potential trouble source? I am asking this because SQL Nexus lists CPU as the top bottleneck on a server with low Processor Time values.

    Read the article

  • Performance in backpropagation algorithm

    - by Taban
    I've written a matlab program for standard backpropagation algorithm, it is my homework and I should not use matlab toolbox, so I write the entire code by myself. This link helped me for backpropagation algorithm. I have a data set of 40 random number and initial weights randomly. As output, I want to see a diagram that shows the performance. I used mse and plot function to see performance for 20 epochs but the result is this: I heard that performance should go up through backpropagation, so I want to know is there any problem with my code or this result is normal because local minimums. This is my code: Hidden_node=inputdlg('Enter the number of Hidden nodes'); a=0.5;%initialize learning rate hiddenn=str2num(Hidden_node{1,1}); randn('seed',0); %creating data set s=2; N=10; m=[5 -5 5 5;-5 -5 5 -5]; S = s*eye(2); [l,c] = size(m); x = []; % Creating the training set for i = 1:c x = [x mvnrnd(m(:,i)',S,N)']; end % target value toutput=[ones(1,N) zeros(1,N) ones(1,N) zeros(1,N)]; for epoch=1:20; %number of epochs for kk=1:40; %number of patterns %initial weights of hidden layer for ii=1 : 2; for jj=1 :hiddenn; whidden{ii,jj}=rand(1); end end initial the wights of output layer for ii=1 : hiddenn; woutput{ii,1}=rand(1); end for ii=1:hiddenn; x1=x(1,kk); x2=x(2,kk); w1=whidden{1,ii}; w2=whidden{2,ii}; activation{1,ii}=(x1(1,1)*w1(1,1))+(x2(1,1)*w2(1,1)); end %calculate output of hidden nodes for ii=1:hiddenn; hidden_to_out{1,ii}=logsig(activation{1,ii}); end activation_O{1,1}=0; for jj=1:hiddenn; activation_O{1,1} = activation_O{1,1}+(hidden_to_out{1,jj}*woutput{jj,1}); end %calculate output out{1,1}=logsig(activation_O{1,1}); out_for_plot(1,kk)= out{1,ii}; %calculate error for output node delta_out{1,1}=(toutput(1,kk)-out{1,1}); %update weight of output node for ii=1:hiddenn; woutput{ii,jj}=woutput{ii,jj}+delta_out{1,jj}*hidden_to_out{1,ii}*dlogsig(activation_O{1,jj},logsig(activation_O{1,jj}))*a; end %calculate error of hidden nodes for ii=1:hiddenn; delta_hidden{1,ii}=woutput{ii,1}*delta_out{1,1}; end %update weight of hidden nodes for ii=1:hiddenn; for jj=1:2; whidden{jj,ii}= whidden{jj,ii}+(delta_hidden{1,ii}*dlogsig(activation{1,ii},logsig(activation{1,ii}))*x(jj,kk)*a); end end a=a/(1.1);%decrease learning rate end %calculate performance e=toutput(1,kk)-out_for_plot(1,1); perf(1,epoch)=mse(e); end plot(perf); Thanks a lot.

    Read the article

  • optimizing iPhone OpenGL ES fill rate

    - by NateS
    I have an Open GL ES game on the iPhone. My framerate is pretty sucky, ~20fps. Using the Xcode OpenGL ES performance tool on an iPhone 3G, it shows: Renderer Utilization: 95% to 99% Tiler Utilization: ~27% I am drawing a lot of pretty large images with a lot of blending. If I reduce the number of images drawn, framerates go from ~20 to ~40, though the performance tool results stay about the same (renderer still maxed). I think I'm being limited by the fill rate of the iPhone 3G, but I'm not sure. My questions are: How can I determine with more granularity where the bottleneck is? That is my biggest problem, I just don't know what is taking all the time. If it is fillrate, is there anything I do to improve it besides just drawing less? I am using texture atlases. I have tried to minimize image binds, though it isn't always possible (drawing order, not everything fits on one 1024x1024 texture, etc). Every frame I do 10 image binds. This seem pretty reasonable, but I could be mistaken. I'm using vertex arrays and glDrawArrays. I don't really have a lot of geometry. I can try to be more precise if needed. Each image is 2 triangles and I try to batch things were possible, though often (maybe half the time) images are drawn with individual glDrawArrays calls. Besides the images, I have ~60 triangles worth of geometry being rendered in ~6 glDrawArrays calls. I often glTranslate before calling glDrawArrays. Would it improve the framerate to switch to VBOs? I don't think it is a huge amount of geometry, but maybe it is faster for other reasons? Are there certain things to watch out for that could reduce performance? Eg, should I avoid glTranslate, glColor4g, etc? I'm using glScissor in a 3 places per frame. Each use consists of 2 glScissor calls, one to set it up, and one to reset it to what it was. I don't know if there is much of a performance impact here. If I used PVRTC would it be able to render faster? Currently all my images are GL_RGBA. I don't have memory issues. Here is a rough idea of what I'm drawing, in this order: 1) Switch to perspective matrix. 2) Draw a full screen background image 3) Draw a full screen image with translucency (this one has a scrolling texture). 4) Draw a few sprites. 5) Switch to ortho matrix. 6) Draw a few sprites. 7) Switch to perspective matrix. 8) Draw sprites and some other textured geometry. 9) Switch to ortho matrix. 10) Draw a few sprites (eg, game HUD). Steps 1-6 draw a bunch of background stuff. 8 draws most of the game content. 10 draws the HUD. As you can see, there are many layers, some of them full screen and some of the sprites are pretty large (1/4 of the screen). The layers use translucency, so I have to draw them in back-to-front order. This is further complicated by needing to draw various layers in ortho and others in perspective. I will gladly provide additional information if reqested. Thanks in advance for any performance tips or general advice on my problem!

    Read the article

  • How to determine if two generic type values are equal?

    - by comecme
    I'm trying to figure out how I can successfully determine if two generic type values are equal to each other. Based on Mark Byers' answer on this question I would think I can just use value.Equals() where value is a generic type. My actual problem is in a LinkedList implementation, but the problem can be shown with this simpler example. class GenericOjbect<T> { public T Value { get; private set; } public GenericOjbect(T value) { Value = value; } public bool Equals(T value) { return (Value.Equals(value)); } } Now I define an instance of GenericObject<StringBuilder> containing new StringBuilder("StackOverflow"). I would expect to get true if I call Equals(new StringBuilder("StackOverflow") on this GenericObject instance, but I get false. A sample program showing this: using System; using System.Text; class Program { static void Main() { var sb1 = new StringBuilder("StackOverflow"); var sb2 = new StringBuilder("StackOverflow"); Console.WriteLine("StringBuilder compare"); Console.WriteLine("1. == " + (sb1 == sb2)); Console.WriteLine("2. Object.Equals " + (Object.Equals(sb1, sb2))); Console.WriteLine("3. this.Equals " + (sb1.Equals(sb2))); var go1 = new GenericOjbect<StringBuilder>(sb1); var go2 = new GenericOjbect<StringBuilder>(sb2); Console.WriteLine("\nGenericObject compare"); Console.WriteLine("1. == " + (go1 == go2)); Console.WriteLine("2. Object.Equals " + (Object.Equals(go1, go2))); Console.WriteLine("3. this.Equals " + (go1.Equals(go2))); Console.WriteLine("4. Value.Equals " + (go1.Value.Equals(go2.Value))); } } For the three methods of comparing two StringBuilder objects, only the StringBuilder.Equals instance method (the third line) returns true. This is what I expected. But when comparing the GenericObject objects, its Equals() method (the third line) returns false. Interestingly enough, the fourth compare method does return true. I'd think the third and fourth comparison are actually doing the same thing. I would have expected true. Because in the Equals() method of the GenericObject class, both value and Value are of type T which in this case is a StringBuilder. Based on Mark Byers' answer in this question, I would've expected the Value.Equals() method to be using the StringBuilder's Equals() method. And as I've shown, the StringBuilder's Equal() method does return true. I've even tried public bool Equals(T value) { return EqualityComparer<T>.Default.Equals(Value, value); } but that also returns false. So, two questions here: Why doesn't the code return true? How could I implement the Equals method so it does return true?

    Read the article

  • Performance of Cluster Shared Volume file copy from SAN

    - by Sequenzia
    I am hoping someone can help me out with a strange issue. We are running a Microsoft Failover Cluster with Server 2008 R2 and an Equallogic PS4000 SAN. Our main configuration has 2 Dell Poweredge T710 Servers in the cluster. We have CSV and Quorm setup. The servers each have 10 Broadcom 1Gb NICs. Right now 4 of the NICS are on the iSCSI network for accessing the SAN. They use MPIO and the Dell HIT pack. We have 5 VMs running on each node and everything runs smooth. No noticeable performance issues or anything. From the SAN I can see the 4 iSCSI connections from each server to each volume (CSV and Quorm). Again, it seems to perform great. The problem I am running into is with backups. I have tried a few backup programs like backupchain and Veeam. The problem is both of them are very very slow to backup the VMs. For instance I have a 500GB (fixed disc) VHD that’s running on the cluster. It takes over 18 hours to backup that VHD and that’s with compression and depuping turned off which is supposed to be the fasted. We also have a separate server that is just for backups. It has a lot of directed attached storage. As part of the troubleshooting I decided to bring that server into the cluster as a node. It now has access to the CSV and can read from C:\clusterstorage\volume1 which is where our VHDs live. This backup server only has 2 NICs. 1 NIC is going to the iSCSI network and the other is just on the main network. It has Intel NICS in it without any sort of MPIO or teaming. So with the 3rd server now in the cluster I started doing some benchmarking. I have a test VHD that’s about 7GBs that’s stored in the CSV. I have tested file copying that VHD from all 3 servers to directed attached storage in the respective server. The 2 Dell servers that are the main nodes in the cluster (they house the VMs) are reading that file at about 20Mbs/Sec. Which at that rate is way to slow for the backups. The other server which only has 1 NIC to the SAN is reading at around 100Mbs/Sec. I spent a few hours on the phone with Dell today about this . We went through all kind of tests and he was pretty dumb founded. He really has no idea why that server with only 1 NIC is reading about 5 times as fast as the servers with 4 NICS and MPIO. We looked at the network utilization of the NICs while the file copy was going on. The servers with the 4 NICs had a small increase of activity during the file copy but they only went up to around 8-10% on all 4 NICs. The other server with the 1 NIC jumped up to over 80% during the file copy. I plan on doing some more testing after hours and calling Dell back tomorrow but I really am confused (and so is Dell’s support rep) why I cannot get faster file copy access to the CSV on those servers. Anyone have any input on this? Any feedback would be greatly appreciated. Thanks in advance.

    Read the article

  • Pros/cons of reading connection string from physical file vs Application object (ASP.NET)?

    - by HaterTot
    my ASP.NET application reads an xml file to determine which environment it's currently in (e.g. local, development, production). It checks this file every single time it opens a connection to the database, in order to know which connection string to grab from the Application Settings. I'm entering a phase of development where efficiency is becoming a concern. I don't think it's a good idea to have to read a file on a physical disk ever single time I wish to access the database (very often). I was considering storing the connection string in Application["ConnectionString"]. So the code would be public static string GetConnectionString { if (Application["ConnectionString"] == null) { XmlDocument doc = new XmlDocument(); doc.Load(HttpContext.Current.Request.PhysicalApplicationPath + "bin/ServerEnvironment.xml"); XmlElement xe = (XmlElement) xnl[0]; switch (xe.InnerText.ToString().ToLower()) { case "local": connString = Settings.Default.ConnectionStringLocal; break; case "development": connString = Settings.Default.ConnectionStringDevelopment; break; case "production": connString = Settings.Default.ConnectionStringProduction; break; default: throw new Exception("no connection string defined"); } Application["ConnectionString"] = connString; } return Application["ConnectionString"].ToString(); } I didn't design the application so I figure there must have been a reason for reading the xml file every time (to change settings while the application runs?) I have very little concept of the inner workings here. What are the pros and cons? Do you think I'd see a small performance gain by implementing the function above? THANKS

    Read the article

  • Improving SAS multipath to JBOD performance on Linux

    - by user36825
    Hello all I'm trying to optimize a storage setup on some Sun hardware with Linux. Any thoughts would be greatly appreciated. We have the following hardware: Sun Blade X6270 2* LSISAS1068E SAS controllers 2* Sun J4400 JBODs with 1 TB disks (24 disks per JBOD) Fedora Core 12 2.6.33 release kernel from FC13 (also tried with latest 2.6.31 kernel from FC12, same results) Here's the datasheet for the SAS hardware: http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf It's using PCI Express 1.0a, 8x lanes. With a bandwidth of 250 MB/sec per lane, we should be able to do 2000 MB/sec per SAS controller. Each controller can do 3 Gb/sec per port and has two 4 port PHYs. We connect both PHYs from a controller to a JBOD. So between the JBOD and the controller we have 2 PHYs * 4 SAS ports * 3 Gb/sec = 24 Gb/sec of bandwidth, which is more than the PCI Express bandwidth. With write caching enabled and when doing big writes, each disk can sustain about 80 MB/sec (near the start of the disk). With 24 disks, that means we should be able to do 1920 MB/sec per JBOD. multipath { rr_min_io 100 uid 0 path_grouping_policy multibus failback manual path_selector "round-robin 0" rr_weight priorities alias somealias no_path_retry queue mode 0644 gid 0 wwid somewwid } I tried values of 50, 100, 1000 for rr_min_io, but it doesn't seem to make much difference. Along with varying rr_min_io I tried adding some delay between starting the dd's to prevent all of them writing over the same PHY at the same time, but this didn't make any difference, so I think the I/O's are getting properly spread out. According to /proc/interrupts, the SAS controllers are using a "IR-IO-APIC-fasteoi" interrupt scheme. For some reason only core #0 in the machine is handling these interrupts. I can improve performance slightly by assigning a separate core to handle the interrupts for each SAS controller: echo 2 /proc/irq/24/smp_affinity echo 4 /proc/irq/26/smp_affinity Using dd to write to the disk generates "Function call interrupts" (no idea what these are), which are handled by core #4, so I keep other processes off this core too. I run 48 dd's (one for each disk), assigning them to cores not dealing with interrupts like so: taskset -c somecore dd if=/dev/zero of=/dev/mapper/mpathx oflag=direct bs=128M oflag=direct prevents any kind of buffer cache from getting involved. None of my cores seem maxed out. The cores dealing with interrupts are mostly idle and all the other cores are waiting on I/O as one would expect. Cpu0 : 0.0%us, 1.0%sy, 0.0%ni, 91.2%id, 7.5%wa, 0.0%hi, 0.2%si, 0.0%st Cpu1 : 0.0%us, 0.8%sy, 0.0%ni, 93.0%id, 0.2%wa, 0.0%hi, 6.0%si, 0.0%st Cpu2 : 0.0%us, 0.6%sy, 0.0%ni, 94.4%id, 0.1%wa, 0.0%hi, 4.8%si, 0.0%st Cpu3 : 0.0%us, 7.5%sy, 0.0%ni, 36.3%id, 56.1%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 1.3%sy, 0.0%ni, 85.7%id, 4.9%wa, 0.0%hi, 8.1%si, 0.0%st Cpu5 : 0.1%us, 5.5%sy, 0.0%ni, 36.2%id, 58.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 5.0%sy, 0.0%ni, 36.3%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 5.1%sy, 0.0%ni, 36.3%id, 58.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 0.1%us, 8.3%sy, 0.0%ni, 27.2%id, 64.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 0.1%us, 7.9%sy, 0.0%ni, 36.2%id, 55.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 0.0%us, 7.8%sy, 0.0%ni, 36.2%id, 56.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 0.0%us, 7.3%sy, 0.0%ni, 36.3%id, 56.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 0.0%us, 5.6%sy, 0.0%ni, 33.1%id, 61.2%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 0.1%us, 5.3%sy, 0.0%ni, 36.1%id, 58.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 0.0%us, 4.9%sy, 0.0%ni, 36.4%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 0.1%us, 5.4%sy, 0.0%ni, 36.5%id, 58.1%wa, 0.0%hi, 0.0%si, 0.0%st Given all this, the throughput reported by running "dstat 10" is in the range of 2200-2300 MB/sec. Given the math above I would expect something in the range of 2*1920 ~= 3600+ MB/sec. Does anybody have any idea where my missing bandwidth went? Thanks!

    Read the article

  • IIS7 ASP.NET application - 2 identical apps in 2 identical app pools, 1 is responsive and 1 is not

    - by Ben
    I have an ASP.NET (v4.0) web app that is installed in a virtual directory (as an application) and is hosted in it's own app pool. This is repeated for each instance of the app (i.e. per customer). The app pools are integrated (not classic) mode and LoadUserProfile is set to true. Otherwise, default settings. Each instance currently has it's own copy of the code/config, and it's own data folder (basic file read/writes). 1 instance of this app runs well (operation used for comparison takes ~4 seconds). Every other instance runs slowly (from 10-25 seconds for the same operation). If I move the slower instance to the "fastest" app pool that instance springs to life. If I move the faster instance into the slower app pool that instance slows to a crawl. The app pools were created in the same way initially - manually. I later used the powershell copy routine to ensure an exact copy of the faster app pool and still the same behaviour. Comparing the apppool.config files shows they are identical barring the virtual directory assignments. There are no shared resources that are being blocked, so far as I can tell, and I tested that by shutting down the performant app pool and restarting... slow is still slow, and then when I restart that app pool (so it's loaded last) it's still faster...

    Read the article

  • moving from WinXP to WinServer in VmWare

    - by Alex
    I have a Vmware machine for.Net application testing. Current setup: Host OS: win7 Guest OS: Right now the guest OS is Win Xp Pro x64, which runs great with just 1 gigabyte of RAM and 10 gigs of disk space. * This part can be skipped * As I said, there was a program that I needed to test, but unfortunately, by default, Vmware installs crappy display drivers(called SVGA II) on XP machines and there is NO way to upgrade them! This resulted in my program's error (the program used SlimDX (DirectX wrapper) to do some stuff..). Eventually I found out that display drivers most certainly is the problem. For example, Windows 7 virtual machine uses SVGA 3D drivers and I have NO problems running my SlimDX-based program. Now, regarding Windows Server 2008! Apparently, WDDM driver is supported by WS2008, which means that I'll be able to install SVGA 3D and to test my DX apps. * end of skip * The questions are: Will WS2008 be as smooth with just 1 gig of RAM just like Win XP was? Will 10 gigs of HDD be enough? Or the server requires more? Will I be able to install .Net ver. 4 on WS2008? Are there any limitations that I need to be aware of as a .Net programmer? EDIT: I was hoping that WS2008 is XP-based, not Vista-vased/W7-based. In comparison, W7 virtual machine with 2 gigs of RAM and 2 proc cores nearly kills my Host OS. Whereas, WinXp runs extremely fast even with 1 core and 1 gig of RAM. That's the main reason why I want to try WS2008..

    Read the article

  • MySQL Memory usage

    - by Rob Stevenson-Leggett
    Our MySQL server seems to be using a lot of memory. I've tried looking for slow queries and queries with no index and have halved the peak CPU usage and Apache memory usage but the MySQL memory stays constantly at 2.2GB (~51% of available memory on the server). Here's the graph from Plesk. Running top in the SSH window shows the same figures. Does anyone have any ideas on why the memory usage is constant like this and not peaks and troughs with usage of the app? Here's the output of the MySQL Tuning Primer script: -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.0.77-log x86_64 Uptime = 1 days 14 hrs 4 min 21 sec Avg. qps = 22 Total Questions = 3059456 Threads Connected = 13 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1 sec. You have 6 out of 3059477 that take longer than 1 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is NOT enabled. You will not be able to do point in time recovery See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html WORKER THREADS Current thread_cache_size = 0 Current threads_cached = 0 Current threads_per_sec = 2 Historic threads_per_sec = 0 Threads created per/sec are overrunning threads cached You should raise thread_cache_size MAX CONNECTIONS Current max_connections = 100 Current threads_connected = 14 Historic max_used_connections = 20 The number of used connections is 20% of the configured maximum. Your max_connections variable seems to be fine. INNODB STATUS Current InnoDB index space = 6 M Current InnoDB data space = 18 M Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 8 M Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 2.07 G Configured Max Per-thread Buffers : 274 M Configured Max Global Buffers : 2.01 G Configured Max Memory Limit : 2.28 G Physical Memory : 3.84 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 4 M Current key_buffer_size = 7 M Key cache miss rate is 1 : 40 Key buffer free ratio = 81 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is supported but not enabled Perhaps you should set the query_cache_size SORT OPERATIONS Current sort_buffer_size = 2 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 132.00 K You have had 16 queries where a join could not use an index properly You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. If you are unable to optimize your queries you may want to increase your join_buffer_size to accommodate larger joins in one pass. Note! This script will still suggest raising the join_buffer_size when ANY joins not using indexes are found. OPEN FILES LIMIT Current open_files_limit = 1024 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_cache value = 64 tables You have a total of 426 tables You have 64 open tables. Current table_cache hit rate is 1% , while 100% of your table cache is in use You should probably increase your table_cache TEMP TABLES Current max_heap_table_size = 16 M Current tmp_table_size = 32 M Of 15134 temp tables, 9% were created on disk Effective in-memory tmp_table_size is limited to max_heap_table_size. Created disk tmp tables ratio seems fine TABLE SCANS Current read_buffer_size = 128 K Current table scan ratio = 2915 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 142213 Your table locking seems to be fine The app is a facebook game with about 50-100 concurrent users. Thanks, Rob

    Read the article

  • Linux software RAID6: rebuild slow

    - by Ole Tange
    I am trying to find the bottleneck in the rebuilding of a software raid6. ## Pause rebuilding when measuring raw I/O performance # echo 1 > /proc/sys/dev/raid/speed_limit_min # echo 1 > /proc/sys/dev/raid/speed_limit_max ## Drop caches so that does not interfere with measuring # sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null # time parallel -j0 "dd if=/dev/{} bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 7.30336 s, 144 MB/s [... similar for each disk ...] # time parallel -j0 "dd if=/dev/{} skip=15000000 bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 12.7991 s, 81.9 MB/s [... similar for each disk ...] So we can read sequentially at 140 MB/s in the outer tracks and 82 MB/s in the inner tracks on all the drives simultaneously. Sequential write performance is similar. This would lead me to expect a rebuild speed of 82 MB/s or more. # echo 800000 > /proc/sys/dev/raid/speed_limit_min # echo 800000 > /proc/sys/dev/raid/speed_limit_max # cat /proc/mdstat md2 : active raid6 sdbd[10](S) sdbc[9] sdbf[0] sdbm[8] sdbl[7] sdbk[6] sdbe[11] sdbj[4] sdbi[3](F) sdbh[2] sdbg[1] 27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/8] [UUU_UUUUU] [=========>...........] recovery = 47.3% (1849905884/3907017344) finish=855.9min speed=40054K/sec But we only get 40 MB/s. And often this drops to 30 MB/s. # iostat -dkx 1 sdbc 0.00 8023.00 0.00 329.00 0.00 33408.00 203.09 0.70 2.12 1.06 34.80 sdbd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdbe 13.00 0.00 8334.00 0.00 33388.00 0.00 8.01 0.65 0.08 0.06 47.20 sdbf 0.00 0.00 8348.00 0.00 33388.00 0.00 8.00 0.58 0.07 0.06 48.00 sdbg 16.00 0.00 8331.00 0.00 33388.00 0.00 8.02 0.71 0.09 0.06 48.80 sdbh 961.00 0.00 8314.00 0.00 37100.00 0.00 8.92 0.93 0.11 0.07 54.80 sdbj 70.00 0.00 8276.00 0.00 33384.00 0.00 8.07 0.78 0.10 0.06 48.40 sdbk 124.00 0.00 8221.00 0.00 33380.00 0.00 8.12 0.88 0.11 0.06 47.20 sdbl 83.00 0.00 8262.00 0.00 33380.00 0.00 8.08 0.96 0.12 0.06 47.60 sdbm 0.00 0.00 8344.00 0.00 33376.00 0.00 8.00 0.56 0.07 0.06 47.60 iostat says the disks are not 100% busy (but only 40-50%). This fits with the hypothesis that the max is around 80 MB/s. Since this is software raid the limiting factor could be CPU. top says: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 38520 root 20 0 0 0 0 R 64 0.0 2947:50 md2_raid6 6117 root 20 0 0 0 0 D 53 0.0 473:25.96 md2_resync So md2_raid6 and md2_resync are clearly busy taking up 64% and 53% of a CPU respectively, but not near 100%. The chunk size (128k) of the RAID was chosen after measuring which chunksize gave the least CPU penalty. If this speed is normal: What is the limiting factor? Can I measure that? If this speed is not normal: How can I find the limiting factor? Can I change that?

    Read the article

  • Parallel processing slower than sequential?

    - by zebediah49
    EDIT: For anyone who stumbles upon this in the future: Imagemagick uses a MP library. It's faster to use available cores if they're around, but if you have parallel jobs, it's unhelpful. Do one of the following: do your jobs serially (with Imagemagick in parallel mode) set MAGICK_THREAD_LIMIT=1 for your invocation of the imagemagick binary in question. By making Imagemagick use only one thread, it slows down by 20-30% in my test cases, but meant I could run one job per core without issues, for a significant net increase in performance. Original question: While converting some images using ImageMagick, I noticed a somewhat strange effect. Using xargs was significantly slower than a standard for loop. Since xargs limited to a single process should act like a for loop, I tested that, and found it to be about the same. Thus, we have this demonstration. Quad core (AMD Athalon X4, 2.6GHz) Working entirely on a tempfs (16g ram total; no swap) No other major loads Results: /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 0m3.784s user 0m2.240s sys 0m0.230s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 0m9.097s user 0m28.020s sys 0m0.910s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 0m9.844s user 0m33.200s sys 0m1.270s Can anyone think of a reason why running two instances of this program takes more than twice as long in real time, and more than ten times as long in processor time to complete the same task? After that initial hit, more processes do not seem to have as significant of an effect. I thought it might have to do with disk seeking, so I did that test entirely in ram. Could it have something to do with how Convert works, and having more than one copy at once means it cannot use processor cache as efficiently or something? EDIT: When done with 1000x 769KB files, performance is as expected. Interesting. /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.679s user 5m6.980s sys 0m6.340s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.152s user 5m6.140s sys 0m6.530s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 2m7.578s user 5m35.410s sys 0m6.050s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 4 convert -auto-level real 1m36.959s user 5m48.900s sys 0m6.350s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 1m36.392s user 5m54.840s sys 0m5.650s

    Read the article

  • Performance of Silverlight Datagrid in Silverlight 3 vs Silverlight 4 on a mac

    - by Simon
    I'm using Silverlight Beta 4 for a LOB application. After finding out today that I'll have to wait perhaps 4 months to be able to develop with SL4 on Visual Studio 2010 I'm thinking I need to downgrade my application to SL3 but thats another question. The problem is I'm noticing absolutely abismal performance for simple datagrids that work just fine on a PC when I'm running on a Mac. These grids contain only 5-10 columns and maybe 50 rows. Paging up and down takes about 1-2 seconds sometimes. I would appreciate anybody's experience in which of the following is the best solution: reverting to Silverlight 3 and hoping DataGrid is faster switching to 3rd party datagrid such as Telerik forgetting silverlight altogether I was hoping that possibly SL4 runtime might be updated but that won't happen probably for 3-4 months. Just a reminder - this is specifically a mac issue. Performance on my PC while slightly slow to populate the grid initially is fine.

    Read the article

  • RAD/Eclipse Eclipse Test and Performance Tools Platform, export data to text file

    - by Berlin Brown
    I am using the RAD (also on Eclipse) Test and Performance Monitoring. I monitor CPU performance time with it, on particular methods, etc. It is a good tool for my monitoring my applications but I can't copy/paste or export the output to a text file format. So I can send to the others. There has to be a way to export this? Also, I can save the output to file but it is '*.trcxml' binary file? has anyone seen a parser for this file format?

    Read the article

  • Fact or fiction: Webkit’s rendering performance is much lower when used in Qt

    - by Jen
    Using Qt’s Webkit implementation renders much slower than directly implementing the Webkit engine -- is this true or just a myth? From my own experience, I found the load time of a complex page about twice as long in Qt’s “Fancy Browser” example as it does in Google Chrome (which also incorporates a port of Webkit), but I hardly think that is a fair comparison. Any insights on this?

    Read the article

  • iPhone Foundation - performance implications of mutable and xxxWithCapacity:0

    - by Adam Eberbach
    All of the collection classes have two versions - mutable and immutable, such as NSArray and NSMutableArray. Is the distinction merely to promote careful programming by providing a const collection or is there some performance hit when using a mutable object as opposed to immutable? Similarly each of the collection classes has a method xxxxWithCapacity, like [NSMutableArray arrayWithCapacity:0]. I often use zero as the argument because it seems a better choice than guessing wrongly how many objects might be added. Is there some performance advantage to creating a collection with capacity for enough objects in advance? If not why isn't the function something like + (id)emptyArray?

    Read the article

  • Logging strategy vs. performance

    - by vtortola
    Hi, I'm developing a web application that has to support lots of simultaneous requests, and I'd like to keep it fast enough. I have now to implement a logging strategy, I'm gonna use log4net, but ... what and how should I log? I mean: How logging impacts in performance? is it possible/recomendable logging using async calls? Is better use a text file or a database? Is it possible to do it conditional? for example, default log to the database, and if it fails, the switch to a text file. What about multithreading? should I care about synchronization when I use log4net? or it's thread safe out of the box? In the requirements appear that the application should cache a couple of things per request, and I'm afraid of the performance impact of that. Cheers.

    Read the article

  • .NET or Windows Synchronization Primitives Performance Specifications

    - by ovanes
    Hello *, I am currently writing a scientific article, where I need to be very exact with citation. Can someone point me to either MSDN, MSDN article, some published article source or a book, where I can find performance comparison of Windows or .NET Synchronization primitives. I know that these are in the descending performance order: Interlocked API, Critical Section, .NET lock-statement, Monitor, Mutex, EventWaitHandle, Semaphore. Many Thanks, Ovanes P.S. I found a great book: Concurrent Programming on Windows by Joe Duffy. This book is written by one of the head concurrency developers for .NET Framework and is simply brilliant with lots of explanations, how things work or were implemented.

    Read the article

  • NSURLConnection performance

    - by oksk
    Hi all, I'm using NSURLConnection for downloading some images in my app currently. Before implementing via this, I implemented it by NSData(dataWithContentOfURL) in NSThread. But I wanted to cancel during downloading images, So I changed it to NSURLConnection. But It happens other problem. Performance was very low after changing. For example, There is at least 5seconds for downloading images at NSThread(NSData async) But, There is more than 2 or 3 times than it at NSURLConnection(async) !! Can I enhance performance ?? How?? (* sorry about my question with NSData(dataWithContentOfFile). correct question is dataWithContentOfURL)

    Read the article

< Previous Page | 101 102 103 104 105 106 107 108 109 110 111 112  | Next Page >