Search Results

Search found 69 results on 3 pages for 'iostat'.

Page 1/3 | 1 2 3 | Next Page >

"iostat" command different in two equal machines

- by Oz.

We have several machines on Amazon (ec2) of the type c1.xlarge with 8 cpus, running the Amazon AMI. Details on the machine: 7 GB of memory 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each) 1690 GB of instance storage 64-bit platform I/O Performance: High API name: c1.xlarge One out of the several machines is showing a high load average, since we have run the last yum upgrade a couple of weeks a go. We did not yet update the other machines, and everything looks normal on them. The strange thing is that the top command not showing any hint for the cause of the load. CPUs are - 4.8%us, 1.1%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st. Mem is about 1.5GB free. Any idea what could it be, or where else can we check? iostat command on the proper machine: avg-cpu: %user %nice %system %iowait %steal %idle 8.97 0.03 4.46 0.19 0.14 86.23 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvdap1 1.60 0.69 55.38 587620 47254184 xvdfp2 2.64 1.10 61.04 934786 52091056 xvdfp4 0.86 0.19 41.72 163866 35601920 xvdfp1 4.37 36.59 73.89 31220810 63051504 xvdfp3 8.03 7.08 94.63 6045402 80749184 iostat command on problematic machine: avg-cpu: %user %nice %system %iowait %steal %idle 9.29 0.04 5.55 0.26 0.11 84.74 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvdap1 2.13 3.34 68.85 246244 5077888 xvdfp1 7.60 74.31 104.88 5480362 7734840 xvdfp3 13.22 73.67 125.00 5433386 9218600 xvdfp4 1.11 0.76 65.08 55762 4799248 xvdfp2 4.16 3.31 99.17 243818 7313264 Many thanks for the help.

Read the article
can someone explain IOSTAT ouput?

- by user37197

i'am having IBM server with Redhat 5 ElsmP connected to the IBM Storage over iSCSI (in sdb ) can someone explain this output from iostat command avg-cpu: %user %nice %system %iowait %steal %idle 12.79 0.01 4.53 72.22 0.00 10.45 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 95.63 48.88 240.95 485589164 2393706728 sdb 29.20 350.49 402.08 3481983365 3994494696 move large file to sdb very slowly,it seem normaly?

Read the article
What iowait values are ok?

- by alfish

I am trying to find the bottleneck of a server running a fairly busy php/mysql site. My first culprit was io but iostat shows that on average iowait consumes only %3.60 of cpu time. here is the complete result of issuing iostat: avg-cpu: %user %nice %system %iowait %steal %idle 65.78 0.00 8.52 3.60 0.00 22.10 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 42.36 138.28 1754.70 408630278 5185216128 So I am wondering if the iowait is within acceptable range, and if not, whether switching from SATA to SSD would dramatically reduce it?

Read the article
No apparent reason for high load average

- by Oz.

We have several web servers running on Amazon (ec2) c1.xlarge, over Amazon AMI. The servers are duplicates of each other, running the exact same hardware and software. Each server spec is: 7 GB of memory 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each) 1690 GB of instance storage 64-bit platform I/O Performance: High API name: c1.xlarge A couple of weeks ago we have run a yum upgrade on one of the servers. Starting on this upgrade the upgraded server started showing a high load average. Needless to say, we did not update the other servers and we can not do so until we understand the reason for this behavior. The strange thing is that when we compare the servers using top or iostat, we can not find the reason for the high load. Note that we have moved traffic from the "problematic" server to the others, which have made the "problematic" server less crowded in terms of requests, and still his load is higher. Do you have any idea what could it be, or where else can we check? Many thanks for the help! Oz. # # proper server # w command # 00:42:26 up 2 days, 19:54, 2 users, load average: 0.41, 0.48, 0.49 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT pts/1 82.80.137.29 00:28 14:05 0.01s 0.01s -bash pts/2 82.80.137.29 00:38 0.00s 0.02s 0.00s w # # proper server # iostat command # Linux 3.2.12-3.2.4.amzn1.x86_64 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 9.03 0.02 4.26 0.17 0.13 86.39 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvdap1 1.63 1.50 55.00 367236 13444008 xvdfp1 4.41 45.93 70.48 11227226 17228552 xvdfp2 2.61 2.01 59.81 491890 14620104 xvdfp3 8.16 14.47 94.23 3536522 23034376 xvdfp4 0.98 0.79 45.86 192818 11209784 # # problematic server # w command # 00:43:26 up 2 days, 21:52, 2 users, load average: 1.35, 1.10, 1.17 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT pts/0 82.80.137.29 00:28 15:04 0.02s 0.02s -bash pts/1 82.80.137.29 00:38 0.00s 0.05s 0.00s w # # problematic server # iostat command # Linux 3.2.20-1.29.6.amzn1.x86_64 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 7.97 0.04 3.43 0.19 0.07 88.30 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvdap1 2.10 1.49 76.54 374660 19253592 xvdfp1 5.64 40.98 85.92 10308946 21612112 xvdfp2 3.97 4.32 93.18 1087090 23439488 xvdfp3 10.87 30.30 115.14 7622474 28961720 xvdfp4 1.12 0.28 65.54 71034 16487112

Read the article
Finding throuput of CPU and Hardrive on Solaris

- by Jim

How do i find the throughput of a CPU and the Hardisk on an open solaris machine. Using MPstat or iostat. I'm having a hard time identifying the throughput if it is given at all in the commands output. Eg. in mpstat there is very little explanation as to what the columns mean http://docs.sun.com/app/docs/doc/816-5166/mpstat-1m?l=en&a=view&q=syscl+mpstat I've been using the syscl column divided by time interval to find the throughput but to be honest i have no idea what a system call truelly is. I'm trying to to analyze a hardrive and CPU while writing a file to the hardisk and when at rest Thanks in advance. Jim

Read the article
Finding throuput of CPU and Hardrive on Solaris

- by Jim

How do I find the throughput of a CPU and the hard disk on an OpenSolaris machine? Using mpstat or iostat? I'm having a hard time identifying the throughput if it is given at all in the commands output. For example, in mpstat there is very little explanation as to what the columns mean. I've been using the syscl column divided by time interval to find the throughput but to be honest I have no idea what a system call truly is. I'm trying to to analyze a hardrive and CPU while writing a file to the hardisk and when at rest.

Read the article
How do I know if my disks are being hit with too many I/O reads or writes or both?

- by Mark F

I know a bit about disk I/O and bottlenecks relating to this especially when relating to databases. How do I really know what the max I/O numbers will be for my disks? What metric might be available to me for working out roughly (but needs to be a good approximation) of how much capacity (if you will) have I got left available in I/O. I've seen it before where things are bubbling along nicely and then all of a sudden, everything screams to a halt, and it ends up being an I/O bound problem. Is there a better way to predict when I/O is reaching its limits? This article was interesting but not giving the answer I desire. So, is my best bet surrounding just looking at 'CPU I/O WAIT'? There must be a more reactive method than this.

Read the article
How do I know if my disks are being hit with too much IO reads or writes or both?

- by Mark F

Hi All, So I know a bit about disk I/O and bottlenecks relating to this especially when relating to databases. But how do I really know what the max IO numbers will be for my disks? What metric might be available to me for working out roughly (but needs to be a good approximation) of how much capacity (if you will) have I got left available in I/O. I've seen it before where things are bubbling along nicely and then all of a sudden, everything screams to a halt, and it ends up being an IO bound problem. Is there a better way to predict when IO is reaching its limits? This article was interesting but not giving the answer I desire. "http://serverfault.com/questions/61510/linux-how-can-i-see-whats-waiting-for-disk-io". So is my best bet surrounding just looking at 'CPU IO WAIT'? There must be a more reactive method for this? Best, M

Read the article
Linux disk IO load breakdown, by filesystem path and/or process?

- by Ryan B. Lynch

Does anyone have experience with a tool that can provide an indication of disk IO load by filesystem path. I use to 'iostat' utility, frequently, to learn how much disk activity is taking place on a Linux host. 'iostat' provides a per-device breakdown, so you can see activity on a particular block device. But it doesn't go any deeper than that--you can't, for instance, query the write load generated by 'httpd' in the directory '/var/log/httpd/'.

Read the article
Strange Recurrent Excessive I/O Wait

- by Chris

I know quite well that I/O wait has been discussed multiple times on this site, but all the other topics seem to cover constant I/O latency, while the I/O problem we need to solve on our server occurs at irregular (short) intervals, but is ever-present with massive spikes of up to 20k ms a-wait and service times of 2 seconds. The disk affected is /dev/sdb (Seagate Barracuda, for details see below). A typical iostat -x output would at times look like this, which is an extreme sample but by no means rare: iostat (Oct 6, 2013) tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 16.00 0.00 156.00 9.75 21.89 288.12 36.00 57.60 5.50 0.00 44.00 8.00 48.79 2194.18 181.82 100.00 2.00 0.00 16.00 8.00 46.49 3397.00 500.00 100.00 4.50 0.00 40.00 8.89 43.73 5581.78 222.22 100.00 14.50 0.00 148.00 10.21 13.76 5909.24 68.97 100.00 1.50 0.00 12.00 8.00 8.57 7150.67 666.67 100.00 0.50 0.00 4.00 8.00 6.31 10168.00 2000.00 100.00 2.00 0.00 16.00 8.00 5.27 11001.00 500.00 100.00 0.50 0.00 4.00 8.00 2.96 17080.00 2000.00 100.00 34.00 0.00 1324.00 9.88 1.32 137.84 4.45 59.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.00 44.00 204.00 11.27 0.01 0.27 0.27 0.60 Let me provide you with some more information regarding the hardware. It's a Dell 1950 III box with Debian as OS where uname -a reports the following: Linux xx 2.6.32-5-amd64 #1 SMP Fri Feb 15 15:39:52 UTC 2013 x86_64 GNU/Linux The machine is a dedicated server that hosts an online game without any databases or I/O heavy applications running. The core application consumes about 0.8 of the 8 GBytes RAM, and the average CPU load is relatively low. The game itself, however, reacts rather sensitive towards I/O latency and thus our players experience massive ingame lag, which we would like to address as soon as possible. iostat: avg-cpu: %user %nice %system %iowait %steal %idle 1.77 0.01 1.05 1.59 0.00 95.58 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 13.16 25.42 135.12 504701011 2682640656 sda 1.52 0.74 20.63 14644533 409684488 Uptime is: 19:26:26 up 229 days, 17:26, 4 users, load average: 0.36, 0.37, 0.32 Harddisk controller: 01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) Harddisks: Array 1, RAID-1, 2x Seagate Cheetah 15K.5 73 GB SAS Array 2, RAID-1, 2x Seagate ST3500620SS Barracuda ES.2 500GB 16MB 7200RPM SAS Partition information from df: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb1 480191156 30715200 425083668 7% /home /dev/sda2 7692908 437436 6864692 6% / /dev/sda5 15377820 1398916 13197748 10% /usr /dev/sda6 39159724 19158340 18012140 52% /var Some more data samples generated with iostat -dx sdb 1 (Oct 11, 2013) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 15.00 0.00 70.00 0.00 656.00 9.37 4.50 1.83 4.80 33.60 sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 12.00 836.00 500.00 100.00 sdb 0.00 0.00 0.00 3.00 0.00 32.00 10.67 9.96 1990.67 333.33 100.00 sdb 0.00 0.00 0.00 4.00 0.00 40.00 10.00 6.96 3075.00 250.00 100.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 100.00 sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 2.62 4648.00 500.00 100.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 100.00 sdb 0.00 0.00 0.00 1.00 0.00 16.00 16.00 1.69 7024.00 1000.00 100.00 sdb 0.00 74.00 0.00 124.00 0.00 1584.00 12.77 1.09 67.94 6.94 86.00 Characteristic charts generated with rrdtool can be found here: iostat plot 1, 24 min interval: http://imageshack.us/photo/my-images/600/yqm3.png/ iostat plot 2, 120 min interval: http://imageshack.us/photo/my-images/407/griw.png/ As we have a rather large cache of 5.5 GBytes, we thought it might be a good idea to test if the I/O wait spikes would perhaps be caused by cache miss events. Therefore, we did a sync and then this to flush the cache and buffers: echo 3 > /proc/sys/vm/drop_caches and directly afterwards the I/O wait and service times virtually went through the roof, and everything on the machine felt like slow motion. During the next few hours the latency recovered and everything was as before - small to medium lags in short, unpredictable intervals. Now my question is: does anybody have any idea what might cause this annoying behaviour? Is it the first indication of the disk array or the raid controller dying, or something that can be easily mended by rebooting? (At the moment we're very reluctant to do this, however, because we're afraid that the disks might not come back up again.) Any help is greatly appreciated. Thanks in advance, Chris. Edited to add: we do see one or two processes go to 'D' state in top, one of which seems to be kjournald rather frequently. If I'm not mistaken, however, this does not indicate the processes causing the latency, but rather those affected by it - correct me if I'm wrong. Does the information about uninterruptibly sleeping processes help us in any way to address the problem? @Andy Shinn requested smartctl data, here it is: smartctl -a -d megaraid,2 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:13 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 20 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 1236631092 Blocks received from initiator = 1097862364 Blocks read from cache and sent to initiator = 1383620256 Number of read and write commands whose size <= segment size = 531295338 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36556.93 number of minutes until next internal SMART test = 32 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 509271032 47 0 509271079 509271079 20981.423 0 write: 0 0 0 0 0 5022.039 0 verify: 1870931090 196 0 1870931286 1870931286 100558.708 0 Non-medium error count: 0 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36538 - [- - -] # 2 Background short Completed 16 36514 - [- - -] # 3 Background short Completed 16 36490 - [- - -] # 4 Background short Completed 16 36466 - [- - -] # 5 Background short Completed 16 36442 - [- - -] # 6 Background long Completed 16 36420 - [- - -] # 7 Background short Completed 16 36394 - [- - -] # 8 Background short Completed 16 36370 - [- - -] # 9 Background long Completed 16 36364 - [- - -] #10 Background short Completed 16 36361 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes] smartctl -a -d megaraid,3 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:26 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 19 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 288745640 Blocks received from initiator = 1097848399 Blocks read from cache and sent to initiator = 1304149705 Number of read and write commands whose size <= segment size = 527414694 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36596.83 number of minutes until next internal SMART test = 28 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 610862490 44 0 610862534 610862534 20470.133 0 write: 0 0 0 0 0 5022.480 0 verify: 2861227413 203 0 2861227616 2861227616 100872.443 0 Non-medium error count: 1 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36580 - [- - -] # 2 Background short Completed 16 36556 - [- - -] # 3 Background short Completed 16 36532 - [- - -] # 4 Background short Completed 16 36508 - [- - -] # 5 Background short Completed 16 36484 - [- - -] # 6 Background long Completed 16 36462 - [- - -] # 7 Background short Completed 16 36436 - [- - -] # 8 Background short Completed 16 36412 - [- - -] # 9 Background long Completed 16 36404 - [- - -] #10 Background short Completed 16 36401 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes]

Read the article
How to monitor IO svctm with every 5 mins frequency using nagios?

- by sabya

I want to collect samples of iostat's svctm, await every 5 mins from all of my servers and store them in nagios. I want to get the values for what is happening in every 5 minutes (not since boot time, iostat's first output gives values since boot time). How can I do it in nagios? EDIT The tps should NOT be calculated #of transactions happened since reboot divided by uptime. What I want is # of transferred happened in last X mins divided X*60.

Read the article
Investigate disk writes further to find out which process writes to my SSD

- by zuba

I try to minimize disk writes to my new SSD system drive. I'm stuck with iostat output: ~ > iostat -d 10 /dev/sdb Linux 2.6.32-44-generic (Pluto) 13.11.2012 _i686_ (2 CPU) Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 8,60 212,67 119,45 21010156 11800488 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 3,00 0,00 40,00 0 400 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 1,70 0,00 18,40 0 184 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 1,20 0,00 28,80 0 288 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 2,20 0,00 32,80 0 328 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 1,20 0,00 23,20 0 232 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 3,40 19,20 42,40 192 424 As I see there are writes to sdb. How can I resolve which process writes? I know about iotop, but it doesn't show which filesystem is being accessed.

Read the article
fortran error I/O

- by jpcgandre

I get this error when compiling: forrtl: severe (256): unformatted I/O to unit open for formatted transfers, unit 27, file C:\Abaqus_JOBS\w.txt The error occurs in the beginning of the analysis. At the start, the file w.txt is created but is empty. The error may be related to the fact that I want to read from an empty file. My code is: OPEN(27, FILE = "C:/Abaqus_JOBS/w.txt", status = "UNKNOWN") READ(27, *, iostat=stat) w IF (stat .NE. 0) CALL del_file(27, stat) SUBROUTINE del_file(uFile, stat) IMPLICIT NONE INTEGER uFile, stat C If the unit is not open, stat will be non-zero CLOSE(unit=uFile, status='delete', iostat=stat) END SUBROUTINE Ref: Close multiple files If you agree with my opion about the cause of the error, is there a way to solve it? Thanks

Read the article
Linux IO monitoring per file?

- by MattK

I am interested in a utility or process for monitoring disk IO per file on CentOS. On Win2008, the resmon utility allows this type of drilldown, but none of the Linux utilities I have found do this (iostat, iotop, dstat, nmon). My interest in monitoring IO bottlenecks on database servers. With MSSQL, I have found it an informative diagnostic to know which files / filespaces are getting hit the hardest.

Read the article
NFS I/O monitoring

- by Gordon

I have a NFS mounted directory, and I'd like to monitor the I/O usage on it (MB/s reads and writes). What's the recommended way to do that ? This is the NFS client, I don't have access to the NFS server. I'm not interested in general I/O usage (otherwise I would use vmstat/iostat). It also has multiple NFS mounts, I'm interested in monitoring just one specific mount (or I might have used ethereal). Thanks!

Read the article
Simpler alternatives to commands with complicated options/syntax [closed]

- by oxy

A few I've found myself: HTTPie instead of cURL http PUT example.org name=John [email protected] https://github.com/jkbr/httpie ffind instead of find ffind --type=f make-?dist\.sh$ https://github.com/sjl/friendly-find Still in prototype phase dstat instead of netstat/iostat/vmstat/etc Dstat's output by default is designed for being interpreted by humans in real-time https ://github.com/dagwieers/dstat Silver Searcher better than Ack better than Grep It searches through code about 3x-5x faster than Ack. https ://github.com/ggreer/the_silver_searcher

Read the article
iotop for Linux kernel 2.6.18

- by Lightsauce

So it has to come to my attention that iotop isn't availalbe for 2.6.18 since it's less than 2.6.20 and requires Python 2.6+. I've done some research and came across this article: http://lserinol.blogspot.com/2009/09/io-usage-per-process-on-linux.html According to this, if these process have io stats in /proc/pid#/io (where pid# is the process #) it's doable regardless of the kernel version. So, in reality, I could upgrade Python to 2.6 and test out iotop. However, my flavor of Linux, CentOS release 5.5 (Final), only supports Python 2.4.3-44.el5 currently. If I were to do uninstall from yum, it doesn't look so pretty. It ends up wanting to uninstall 235 packages, most of which are very important! I read in one place, online (I forget the URL from yesterday), that you can install Python 2.6+ parallel to this one, and have the rpm install for iotop use that. Well, I didn't choose that route. I figured, what the heck, lets write iotop (not copying it, but reverse engineering it without actually looking at it's code/it in use) in bash. I thought it would just grab the /proc/pid#/io file and parse stats. So I wrote a script to grab the top 10 rchar, wchar, read_bytes, and write_bytes by collecting all these stats from all the /proc/pid#/io files, sorting them by each metric, then grabbing the top 10 highest values. The conclusion, the data seems completely useless. Does anybody know any resources for advanced Linux where I can figure out how to take these /proc/pid#/ directories and figure out what the heck they are doing with io on the disk? My main goal is to figure out what exactly is causing high load on my disk. I just know it's on the / partition (/dev/sda2 in this case), and I'm not really sure how to narrow it down without the help of iotop. If I run iostat to grab metrics for 1 minute, every second, the first result it gives me shows a high 'kB_read/s', so that makes me think, it's reading mostly. However, if I watch the update it gives me every second, it's actually just showing values for kB_wrtn/s. This makes me think the initial value iostat gives me is misleading.

Read the article
How to get average network load instead of instant

- by Adam Ryczkowski

Welcome, I use conky to see network load statistics with sampling every 8 seconds in order to get somewhat more smooth history chart. Unfortunately, all values i get are not average for this 8 second period, but they are sampled from much smaller time span, so charts are the same choppy, as if they were sampled from 1 second or less. Is there any way to get conky (or at least System Monitor) display system properties averaged over specified amount of time, just like Windows' task manager does? I would like to have conky display hard drive usage from iostat, but there will be little use if it, if conky reports instant values not averaged over time.

Read the article
Determining cause of high NFS/IO utilization without iotop

- by Matt

I have a server that is doing an NFSv4 export for user's home directories. There are roughly 25 users (mostly developers/analysts) and about 40 servers mounting the home directory export. Performance is miserable, with users often seeing multi-second lags for simple commands (like ls, or writing a small text file). Sometimes the home directory mount completely hangs for minutes, with users getting "permission denied" errors. The hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. What I find curious, and suspicious, is that the sda device often has very high utilization, even though it only contains the OS. I would expect this virtual drive to be idle almost all the time. The system is not swapping, according to "free" and "vmstat". Why would there be major load on this device? Here is a 30-second snapshot from iostat: Time: 09:37:28 AM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sdb 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 sdb1 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 dm-0 0.00 0.00 0.03 151.82 0.13 607.26 8.00 1.25 8.23 5.16 78.35 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 dm-3 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 Looks like iotop is the ideal tool to use to sniff out these kinds of issues. But I'm on CentOS 5.6, which doesn't have a new enough kernel to support that program. I looked at Determining which process is causing heavy disk I/O?, and besides iotop, one of the suggestions said to do a "echo 1 /proc/sys/vm/block_dump". I did that (after directing kernel messages to tempfs). In about 13 minutes I had about 700k reads or writes, roughly half from kjournald and the other half from nfsd: # egrep " kernel: .*(READ|WRITE)" messages | wc -l 768439 # egrep " kernel: kjournald.*(READ|WRITE)" messages | wc -l 403615 # egrep " kernel: nfsd.*(READ|WRITE)" messages | wc -l 314028 For what it's worth, for the last hour, utilization has constantly been over 90% for the home directory drive. My 30-second iostat keeps showing output like this: Time: 09:36:30 PM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sdb 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 sdb1 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 dm-0 0.00 0.00 0.20 17.76 0.80 71.04 8.00 0.38 21.21 9.22 16.57 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.81 dm-3 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.82

Read the article
Linux software RAID6: rebuild slow

- by Ole Tange

I am trying to find the bottleneck in the rebuilding of a software raid6. ## Pause rebuilding when measuring raw I/O performance # echo 1 > /proc/sys/dev/raid/speed_limit_min # echo 1 > /proc/sys/dev/raid/speed_limit_max ## Drop caches so that does not interfere with measuring # sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null # time parallel -j0 "dd if=/dev/{} bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 7.30336 s, 144 MB/s [... similar for each disk ...] # time parallel -j0 "dd if=/dev/{} skip=15000000 bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 12.7991 s, 81.9 MB/s [... similar for each disk ...] So we can read sequentially at 140 MB/s in the outer tracks and 82 MB/s in the inner tracks on all the drives simultaneously. Sequential write performance is similar. This would lead me to expect a rebuild speed of 82 MB/s or more. # echo 800000 > /proc/sys/dev/raid/speed_limit_min # echo 800000 > /proc/sys/dev/raid/speed_limit_max # cat /proc/mdstat md2 : active raid6 sdbd[10](S) sdbc[9] sdbf[0] sdbm[8] sdbl[7] sdbk[6] sdbe[11] sdbj[4] sdbi[3](F) sdbh[2] sdbg[1] 27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/8] [UUU_UUUUU] [=========>...........] recovery = 47.3% (1849905884/3907017344) finish=855.9min speed=40054K/sec But we only get 40 MB/s. And often this drops to 30 MB/s. # iostat -dkx 1 sdbc 0.00 8023.00 0.00 329.00 0.00 33408.00 203.09 0.70 2.12 1.06 34.80 sdbd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdbe 13.00 0.00 8334.00 0.00 33388.00 0.00 8.01 0.65 0.08 0.06 47.20 sdbf 0.00 0.00 8348.00 0.00 33388.00 0.00 8.00 0.58 0.07 0.06 48.00 sdbg 16.00 0.00 8331.00 0.00 33388.00 0.00 8.02 0.71 0.09 0.06 48.80 sdbh 961.00 0.00 8314.00 0.00 37100.00 0.00 8.92 0.93 0.11 0.07 54.80 sdbj 70.00 0.00 8276.00 0.00 33384.00 0.00 8.07 0.78 0.10 0.06 48.40 sdbk 124.00 0.00 8221.00 0.00 33380.00 0.00 8.12 0.88 0.11 0.06 47.20 sdbl 83.00 0.00 8262.00 0.00 33380.00 0.00 8.08 0.96 0.12 0.06 47.60 sdbm 0.00 0.00 8344.00 0.00 33376.00 0.00 8.00 0.56 0.07 0.06 47.60 iostat says the disks are not 100% busy (but only 40-50%). This fits with the hypothesis that the max is around 80 MB/s. Since this is software raid the limiting factor could be CPU. top says: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 38520 root 20 0 0 0 0 R 64 0.0 2947:50 md2_raid6 6117 root 20 0 0 0 0 D 53 0.0 473:25.96 md2_resync So md2_raid6 and md2_resync are clearly busy taking up 64% and 53% of a CPU respectively, but not near 100%. The chunk size (128k) of the RAID was chosen after measuring which chunksize gave the least CPU penalty. If this speed is normal: What is the limiting factor? Can I measure that? If this speed is not normal: How can I find the limiting factor? Can I change that?

Read the article
What's going on with my server? High load, lots of idle CPU time, low disk utilization

- by Jonathan

I run a web site and send a legitimate opt-in, daily email newsletter to subscribers. Both the web hosting and email sending are done by the same machine. I have about 100,000 subscribers who have opted in to my daily email newsletter. My PHP script did a pretty good job sending mail to all of them until fairly recently, but as the list has grown I can't keep up. When I run top, I have very high load--usually at least 6 or 7, sometimes as high as 15--even though I only have two CPUs. However, when I run sar, my CPU is idle an average of about 30% of the time. So, it seems I'm not CPU bound. When I run iostat, it seems as though I'm not disk bound because my %util for each device is very low (no more than 5%). Given that I don't seem to be CPU bound or disk bound, why is top reporting such high load? Additionally, since I don't seem to be CPU bound or disk bound, why is my email sending script not able to keep up? Here's what I see when running top: top - 11:33:28 up 74 days, 18:49, 2 users, load average: 7.65, 8.79, 8.28 Tasks: 168 total, 5 running, 162 sleeping, 0 stopped, 1 zombie Cpu(s): 38.9%us, 58.6%sy, 0.8%ni, 0.0%id, 0.7%wa, 0.2%hi, 0.8%si, 0.0%st Mem: 3083012k total, 2144436k used, 938576k free, 281136k buffers Swap: 2048248k total, 39164k used, 2009084k free, 1470412k cached Here's what I see when running iostat -mx: avg-cpu: %user %nice %system %iowait %steal %idle 34.80 1.20 55.24 0.37 0.00 8.38 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.19 71.70 1.59 29.45 0.02 0.07 5.90 0.55 17.82 1.16 3.59 sda1 0.00 0.00 0.00 0.00 0.00 0.00 7.10 0.00 13.80 13.72 0.00 sda2 0.05 50.45 1.13 24.57 0.01 0.29 24.25 0.35 13.43 1.15 2.97 sda3 0.05 10.17 0.20 2.33 0.01 0.05 43.75 0.05 20.96 2.45 0.62 sda4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 70.50 70.50 0.00 sda5 0.07 0.22 0.03 0.07 0.00 0.00 32.84 0.08 856.19 8.03 0.08 sda6 0.02 5.45 0.03 0.72 0.00 0.02 67.55 0.02 26.72 5.26 0.39 sda7 0.00 1.56 0.00 0.42 0.00 0.01 38.04 0.00 8.88 5.84 0.24 sda8 0.01 3.84 0.20 1.35 0.00 0.02 28.55 0.05 31.90 4.08 0.63 Here's what I see when running sar: 09:40:02 AM CPU %user %nice %system %iowait %steal %idle 09:50:01 AM all 30.59 1.01 49.80 0.23 0.00 18.37 10:00:08 AM all 31.73 0.92 51.66 0.13 0.00 15.55 10:10:06 AM all 30.43 0.99 48.94 0.26 0.00 19.38 10:20:01 AM all 29.58 1.00 47.76 0.25 0.00 21.42 10:30:01 AM all 29.37 1.02 47.30 0.18 0.00 22.13 10:40:06 AM all 32.50 1.01 52.94 0.16 0.00 13.39 10:50:01 AM all 30.49 1.00 49.59 0.15 0.00 18.77 11:00:01 AM all 29.43 0.99 47.71 0.17 0.00 21.71 11:10:07 AM all 30.26 0.93 49.48 0.83 0.00 18.50 11:20:02 AM all 29.83 0.81 48.51 1.32 0.00 19.52 11:30:06 AM all 31.18 0.88 51.33 1.15 0.00 15.47 Average: all 26.21 1.15 42.62 0.48 0.00 29.54 Here are the top handful of processes listed at the particular time I happened to run top -c: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 8180 mysql 16 0 57448 19m 2948 S 26.6 0.7 4702:26 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/bristno.pid --skip-external-locking 26956 brristno 17 0 0 0 0 Z 8.0 0.0 0:00.24 [php] <defunct> 26958 brristno 17 0 94408 43m 37m R 5.0 1.4 0:00.15 /usr/bin/php /home/brristno/public_html/dbv.php 22852 nobody 16 0 9628 2900 1524 S 0.7 0.1 0:00.17 /usr/local/apache/bin/httpd -k start -DSSL 8591 brristno 34 19 96896 13m 6652 S 0.3 0.4 0:29.82 /usr/local/bin/php /home/brristno/bin/mailer.php 1qwqyb6 i0gbor 24469 nobody 16 0 9628 2880 1508 S 0.3 0.1 0:00.08 /usr/local/apache/bin/httpd -k start -DSSL 25495 nobody 15 0 9628 2876 1500 S 0.3 0.1 0:00.06 /usr/local/apache/bin/httpd -k start -DSSL 26149 nobody 15 0 9628 2864 1504 S 0.3 0.1 0:00.04 /usr/local/apache/bin/httpd -k start -DSSL

Read the article
Linux halts every few seconds

- by Zeppomedio

We're having an issue where one our Linux boxes (Ubuntu 10.04 LTS, running on EC2 with a quadruple-large size, 68GB of RAM and 8 virtual cores with 3.25GHz each) freezes up every few seconds. Typing in an ssh session will freeze, and running strace on one of the Postgresql processes that's running usually shows: 02:37:41.567990 semop(7831581, {{3, -1, 0}}, 1 for a few seconds before it proceeds (it always gets stuck at that semop). OProfile shows that most of the time is spent in the kernel (60%) versus 37% in Postgresql. The result of these halts (which began suddenly a day ago) is that load on the box has gone from 0.7 to 10+, and causes our entire stack to slow done. Any ideas on how to track down what's going on? iostat doesn't show the disks being particularly slow or overloaded, and top shows user cpu % spike from 8% to about 40% whenever these back-ups happen.

Read the article
Postfix performance

- by Brian G

Running postfix on ubuntu, sending alot of mail ( ~ 1 million messages ) per day. loads are extremly high but not much in terms of cpu and memory load. Anyone in a similiar situation and know how to remove the bottleneck? All mail on this server is outbound. I would have to assume the bottleneck is disk. Just an update, here is what iostat looks like: avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.12 99.88 0.00 0.00 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 12.38 0.00 2.48 0.00 118.81 48.00 0.00 0.00 0.00 0.00 sdb 1.49 22.28 72.28 42.57 629.70 1041.58 14.55 135.56 834.31 8.71 100.00 Are these numbers in line with the performance you would expect from a single disk? sdb is dedicated to postfix. I think it is queue shuffling, from incoming-active-deferred More details from questions: Server: Quad core Xeon(R) CPU E5405 @ 2.00GH with 4 GB ram Load average: 464.88, 489.11, 483.91, 4 cores. but the memory utilization and cpu is minimal Postfix instances between 16 - 32

Read the article
Getting ZFS per dataset IO statistics (or NFS per export IO statistics)

- by jkj

Where do I find statistics about how IO is divided between zfs datasets? (zpool iostat only tells me how much IO a pool is experiencing.) All the relevant datasets are used through NFS, so I'd be happy with per export NFS IO statistics also. We're currently running OpenIndiana [edit] It seems that operation and byte counter are available in kstat kstat -p unix:*:vopstats_??????? ... unix:0:vopstats_2d90002:nputpage 50 unix:0:vopstats_2d90002:nread 12390785 ... unix:0:vopstats_2d90002:read_bytes 22272845340 unix:0:vopstats_2d90002:readdir_bytes 477996168 ... ...but the strange hexadecimal ID numbers have to be resolved from /etc/mnttab (better ideas?) rpool/export/home/jkj /export/home/jkj zfs rw,...,dev=2d90002 1308471917 Now writing a munin plugin to use the data...

Read the article
Ubuntu server very slow out of the blue sky (Rails, passenger, nginx)

- by snitko

I run Ubuntu server 8.04 on Linode with multiple Rails apps under Passenger + nginx. Today I've noticed it takes quite a lot of time to load a page (5-10 secs). And it's not only websites, ssh seems to be affected too. Having no clue why this may be happening, I started to check different things. I checked how the log files are rotated, I checked if there's enough free disk space and memory. I also checked IO rate, here's the output: $ iostat avg-cpu: %user %nice %system %iowait %steal %idle 0.17 0.00 0.02 0.57 0.16 99.07 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvda 2.25 39.50 16.08 147042 59856 xvdb 0.00 0.05 0.00 192 0 xvdc 2.20 25.93 24.93 96530 92808 xvdd 0.01 0.12 0.00 434 16 xvde 0.04 0.23 0.35 858 1304 xvdf 0.37 0.31 4.12 1162 15352 Rebooting didn't help either. Any ideas where should I be looking?

Read the article

1 2 3 | Next Page >