Search Results

Search found 69 results on 3 pages for 'iostat'.

Page 2/3 | < Previous Page | 1 2 3  | Next Page >

  • baffling stat pauses - (reiserfs)

    - by Twirrim
    I've got a SuSE server with reiserfs formatted partitions, all on a RAID1 mirror. I'm noticing odd spikes in iowait, but generally all seems okay. iostat claims not even 3% iowait in general: avg-cpu: %user %nice %system %iowait %steal %idle 6.53 0.03 1.45 2.92 0.00 89.07 After tracking down some odd behaviour when ls'ing, it appears "stat" (ls is aliased to include color which does a stat process on each file) randomly takes 5 seconds on a file in the directory. ReiserFS does a file system sync every 5 seconds, but that's interval between not length of time taken to sync. Out of curiosity I did remount using noatime to see if that would help though as it would reduce sync's workload, but no joy. Anyone got any thoughts what might be causing this pause? Disks appear healthy, RAID controller believes the RAID is healthy, and io stats show the disks aren't working very hard at all.

    Read the article

  • What metric captures why my OSX machine is so slow during XCode indexing

    - by Ben Flynn
    My entire machine OSX Lion machine slows down while XCode 4.4 is indexing. The CPU is less than 10% busy, I've got over 500 MB free memory, plenty of disk space, disk IO rate is not high, network activity is not high. Indexing just a few files can take minutes and builds are extremely slow. While this is going on, even loading a new web page in Chrome can be slow. Knowing how to fix it would be great, but more fundamentally how can I measure what is actually going slowly? What metrics should I be looking at? Nothing in Activity Monitor, iostat, top, or sar betray anything about what's going on to me. Even getting a man page is interminable.

    Read the article

  • What is this dm-0 device?

    - by Jeff Shattock
    While poking around trying to figure out why a Linux - Linux file transfer is running slower than I think it should, I stumbled across something I'm not familiar with. /dev/dm-0 seems to be my bottleneck, but I have no idea what it is. On my destination server, the iostat command shows a device at the bottom, /dev/dm-0, as being 100% utilized. This server has 6 disks in a mdadm raid5 set, with LVM running on top of it. Each of the underlying disks are sitting around 50% util. The transfer is writing to a logical volume located on this raidset. What is this /dev/dm-0 thing? Once I know what it is, maybe I can find how to increase its speed, or at least understand why its the speed that it is.

    Read the article

  • Ubuntu server very slow out of the blue sky (Rails, passenger, nginx)

    - by snitko
    I run Ubuntu server 8.04 on Linode with multiple Rails apps under Passenger + nginx. Today I've noticed it takes quite a lot of time to load a page (5-10 secs). And it's not only websites, ssh seems to be affected too. Having no clue why this may be happening, I started to check different things. I checked how the log files are rotated, I checked if there's enough free disk space and memory. I also checked IO rate, here's the output: $ iostat avg-cpu: %user %nice %system %iowait %steal %idle 0.17 0.00 0.02 0.57 0.16 99.07 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvda 2.25 39.50 16.08 147042 59856 xvdb 0.00 0.05 0.00 192 0 xvdc 2.20 25.93 24.93 96530 92808 xvdd 0.01 0.12 0.00 434 16 xvde 0.04 0.23 0.35 858 1304 xvdf 0.37 0.31 4.12 1162 15352 Rebooting didn't help either. Any ideas where should I be looking?

    Read the article

  • Diagnosing high CPU waiting

    - by Will
    I have a monitoring server that is running icinga/collectd/graphite with about 50 hosts. I have noticed high load/slugging performance on the box. If you take a look at top, you'll see: Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 7.6%id, 23.4%wa, 0.0%hi, 0.2%si, 0.0%st Notice the HUGE %wa value, which as far as I know means a network or disk bottleneck. ifconfig shows no dropping packets and there's not a ton of bandwidth going on, so that leaves disk issues, right? There's not a lot of disk writing going on either...iotop is reporting we're only writing a little over 1 MB per second and the RAID tool reports everything is A-OK and write caching is enabled. How do I go about trying to figure out how to fix this? UPDATE: iostat -x output is: avg-cpu: %user %nice %system %iowait %steal %idle 0.62 0.10 0.31 9.65 0.00 89.31 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.21 33.34 83.55 16.54 1599.94 399.07 19.97 43.21 416.98 3.71 37.13

    Read the article

  • Can someone explain the "use-cases" for the default munin graphs?

    - by exhuma
    When installing munin, it activates a default set of plugins (at least on ubuntu). Alternatively, you can simply run munin-node-configure to figure out which plugins are supported on your system. Most of these plugins plot straight-forward data. My question is not to explain the nature of the data (well... maybe for some) but what is it that you look for in these graphs? It is easy to install munin and see fancy graphs. But having the graphs and not being able to "read" them renders them totally useless. I am going to list standard plugins which are enabled by default on my system. So it's going to be a long list. For completeness, I am also going to list plugins which I think to understand and give a short explanation as to what I think it's used for. Pleas correct if I am wrong with any of them. So let me split this questions in three parts: Plugins where I don't even understand the data Plugins where I understand the data but don't know what I should look out for Plugins which I think to understand Plugins where I don't even understand the data These may contain questions that are not necessarily aimed at munin alone. Not understanding the data usually mean a gap in fundamental knowledge on operating systems/hardware.... ;) Feel free to respond with a "giyf" answer. These are plugins where I can only guess what's going on... I hardly want to look at these "guessing"... Disk IOs per device (IOs/second)What's an IO. I know it stands for input/output. But that's as far as it goes. Disk latency per device (Average IO wait)Not a clue what an "IO wait" is... IO Service TimeThis one is a huge mess, and it's near impossible to see something in the graph at all. Plugins where I understand the data but don't know what I should look out for IOStat (blocks/second read/written)I assume, the thing to look out for in here are spikes? Which would mean that the device is in heavy use? Available entropy (bytes)I assume that this is important for random number generation? Why would I graph this? So far the value has always been near constant. VMStat (running/I/O sleep processes)What's the difference between this one and the "processes" graph? Both show running/sleeping processes, whereas the "Processes" graph seems to have more details. Disk throughput per device (bytes/second read/written) What's thedifference between this one and the "IOStat" graph? inode table usageWhat should I look for in this graph? Plugins which I think to understand I'll be guessing some things here... correct me if I am wrong. Disk usage in percent (percent)How much disk space is used/remaining. As this is approaching 100%, you should consider cleaning up or extend the partition. This is extremely important for the root partition. Firewall Throughput (packets/second)The number of packets passing through the firewall. If this is spiking for a longer period of time, it could be a sign of a DOS attack (or we are simply recieving a large file). It can also give you an idea about your firewall performance. If it's levelling out and you need more "power" you should consider load balancing. If it's levelling out and see a correlation with your CPU load, it could also mean that your hardware is not fast enough. Correlations with disk usage could point to excessive LOG targets in you FW config. eth0 errors (packets in/out)Network errors. If this value is increasing, it could be a sign of faulty hardware. eth0 traffic (bits/second in/out)Raw network traffic. This should correlate with Firewall throughput. number of threadsAn ever-increasing value might point to a process not properly closing threads. Investigate! processesBreakdown of active processes (including sleeping). A quick spike in here might point to a fork-bomb. A slowly, but ever-increasing value might point to an application spawning sub-processes but not properly closing them. Investigate using ps faux. process priorityThis shows the distribution of process priorities. Having only high-priority processes is not of much use. Consider de-prioritizing some. cpu usageFairly straight-forward. If this is spiking, you may have an attack going on, or a process is hogging the CPU. Idf it's slowly increasing and approaching max in normal operations, you should consider upgrading your hardware (or load-balancing). file table usageNumber of actively open files. If this is reaching max, you may have a process opening, but not properly releasing files. load averageShows an summarized value for the system load. Should correlate with CPU usage. Increasing values can come from a number of sources. Look for correlations with other graphs. memory usageA graphical representation of you memory. As long as you have a lot of unused+cache+buffers you are fine. swap in/outShows the activity on your swap partition. This should always be 0. If you see activity on this, you should add more memory to your machine!

    Read the article

  • Throughput = BS * IOPS?

    - by Marvin
    I've seen in many places that throughput = bs * iops should be true. For example writing at 128k block size to a SAS disk that can support 190 IOPS should give a throughput of ~23 MBps - 23.75(MBs) = 128(BS)*190(SAS-15 IOPS)/1024. Now when I tested it in a VM against a monster NetApp filer I got theses results: # dd if=/dev/zero of=/tmp/dd.out bs=4k count=2097152 8589934592 bytes (8.6 GB) copied, 61.5996 seconds, 139 MB/s To view the IO rate of the VM I used iostat and esxtop, and they both showed around 250 IOPS. So to my understanding the throughput was supposed to be ~1000k: 1000(KBs) = 4(BS)*250(IOPS). dd of 8GB is twice the size of RAM of course, so no page caching here. What am I missing? Thanks!

    Read the article

  • Solaris disk I/O spike... which monitoring tools do I need?

    - by bonez05
    Hi all, My Solaris 10 websever / database server disk io keeps spiking at intermittent times. Using iostat -xtc 5 the reads / sec will jump from 3.0 to 1450.0 and %busy will jump to 98% The apache access log doesn't indicate anything out of the ordinary. In other words, requests aren't any higher than usual. top doesn't generate anything useful. The cpu usage is fine with mysql using about 20% and nothing else to speak of really. What monitoring tool should I be using to see what process is using excessive disk I/O? or if there are any other suggestions I'm all ears. Thanks

    Read the article

  • How do I log file system read/writes by filename in Linux?

    - by Casey
    I'm looking for a simple method that will log file system operations. It should display the name of the file being accessed or modified. I'm familiar with powertop, and it appears this works to an extent, in so much that it show the user files that were written to. Is there any other utilities that support this feature. Some of my findings: powertop: best for write access logging, but more focused on CPU activity iotop: shows real time disk access by process, but not file name lsof: shows the open files per process, but not real time file access iostat: shows the real time I/O performance of disk/arrays but does not indicate file or process

    Read the article

  • Various problems with software raid1 array built with Samsung 840 Pro SSDs

    - by Andy B
    I am bringing to ServerFault a problem that is tormenting me for 6+ months. I have a CentOS 6 (64bit) server with an md software raid-1 array with 2 x Samsung 840 Pro SSDs (512GB). Problems: Serious write speed problems: root [~]# time dd if=arch.tar.gz of=test4 bs=2M oflag=sync 146+1 records in 146+1 records out 307191761 bytes (307 MB) copied, 23.6788 s, 13.0 MB/s real 0m23.680s user 0m0.000s sys 0m0.932s When doing the above (or any other larger copy) the load spikes to unbelievable values (even over 100) going up from ~ 1. When doing the above I've also noticed very weird iostat results: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1589.50 0.00 54.00 0.00 13148.00 243.48 0.60 11.17 0.46 2.50 sdb 0.00 1627.50 0.00 16.50 0.00 9524.00 577.21 144.25 1439.33 60.61 100.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 1602.00 0.00 12816.00 8.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 And it keeps it this way until it actually writes the file to the device (out from swap/cache/memory). The problem is that the second SSD in the array has svctm and await roughly 100 times larger than the second. For some reason the wear is different between the 2 members of the array root [~]# smartctl --attributes /dev/sda | grep -i wear 177 Wear_Leveling_Count 0x0013 094% 094 000 Pre-fail Always - 180 root [~]# smartctl --attributes /dev/sdb | grep -i wear 177 Wear_Leveling_Count 0x0013 070% 070 000 Pre-fail Always - 1005 The first SSD has a wear of 6% while the second SSD has a wear of 30%!! It's like the second SSD in the array works at least 5 times as hard as the first one as proven by the first iteration of iostat (the averages since reboot): Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 10.44 51.06 790.39 125.41 8803.98 1633.11 11.40 0.33 0.37 0.06 5.64 sdb 9.53 58.35 322.37 118.11 4835.59 1633.11 14.69 0.33 0.76 0.29 12.97 md1 0.00 0.00 1.88 1.33 15.07 10.68 8.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 1109.02 173.12 10881.59 1620.39 9.75 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.41 0.01 3.10 0.02 7.42 0.00 0.00 0.00 0.00 What I've tried: I've updated the firmware to DXM05B0Q (following reports of dramatic improvements for 840Ps after this update). I have looked for "hard resetting link" in dmesg to check for cable/backplane issues but nothing. I have checked the alignment and I believe they are aligned correctly (1MB boundary, listing below) I have checked /proc/mdstat and the array is Optimal (second listing below). root [~]# fdisk -ul /dev/sda Disk /dev/sda: 512.1 GB, 512110190592 bytes 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00026d59 Device Boot Start End Blocks Id System /dev/sda1 2048 4196351 2097152 fd Linux raid autodetect Partition 1 does not end on cylinder boundary. /dev/sda2 * 4196352 4605951 204800 fd Linux raid autodetect Partition 2 does not end on cylinder boundary. /dev/sda3 4605952 814106623 404750336 fd Linux raid autodetect root [~]# fdisk -ul /dev/sdb Disk /dev/sdb: 512.1 GB, 512110190592 bytes 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0003dede Device Boot Start End Blocks Id System /dev/sdb1 2048 4196351 2097152 fd Linux raid autodetect Partition 1 does not end on cylinder boundary. /dev/sdb2 * 4196352 4605951 204800 fd Linux raid autodetect Partition 2 does not end on cylinder boundary. /dev/sdb3 4605952 814106623 404750336 fd Linux raid autodetect /proc/mdstat root # cat /proc/mdstat Personalities : [raid1] md0 : active raid1 sdb2[1] sda2[0] 204736 blocks super 1.0 [2/2] [UU] md2 : active raid1 sdb3[1] sda3[0] 404750144 blocks super 1.0 [2/2] [UU] md1 : active raid1 sdb1[1] sda1[0] 2096064 blocks super 1.1 [2/2] [UU] unused devices: Running a read test with hdparm root [~]# hdparm -t /dev/sda /dev/sda: Timing buffered disk reads: 664 MB in 3.00 seconds = 221.33 MB/sec root [~]# hdparm -t /dev/sdb /dev/sdb: Timing buffered disk reads: 288 MB in 3.01 seconds = 95.77 MB/sec But look what happens if I add --direct root [~]# hdparm --direct -t /dev/sda /dev/sda: Timing O_DIRECT disk reads: 788 MB in 3.01 seconds = 262.08 MB/sec root [~]# hdparm --direct -t /dev/sdb /dev/sdb: Timing O_DIRECT disk reads: 534 MB in 3.02 seconds = 176.90 MB/sec Both tests increase but /dev/sdb doubles while /dev/sda increases maybe 20%. I just don't know what to make of this. As suggested by Mr. Wagner I've done another read test with dd this time and it confirms the hdparm test: root [/home2]# dd if=/dev/sda of=/dev/null bs=1G count=10 10+0 records in 10+0 records out 10737418240 bytes (11 GB) copied, 38.0855 s, 282 MB/s root [/home2]# dd if=/dev/sdb of=/dev/null bs=1G count=10 10+0 records in 10+0 records out 10737418240 bytes (11 GB) copied, 115.24 s, 93.2 MB/s So sda is 3 times faster than sdb. Or maybe sdb is doing also something else besides what sda does. Is there some way to find out if sdb is doing more than what sda does? UPDATE Again, as suggested by Mr. Wagner, I have swapped the 2 SSDs. And as he thought it would happen, the problem moved from sdb to sda. So I guess I'll RMA one of the SSDs. I wonder if the cage might be problematic. What is wrong with this array? Please help!

    Read the article

  • How to tell if linux disk IO is causing excessive (> 1 second) application stalls

    - by noahz
    I have a Java application performing a large volume (hundreds of MB) of continuous output (streaming plain text) to about a dozen files a ext3 SAN filesystem. Occasionally, this application pauses for several seconds at a time. I suspect that something related to ext3 vsfs (Veritas Filesystem) functionality (and/or how it interacts with the OS) is the culprit. What steps can I take to confirm or refute this theory? I am aware of iostat and /proc/diskstats as starting points. Revised title to de-emphasize journaling and emphasize "stalls" I have done some googling and found at least one article that seems to describe behavior like I am observing: Solving the ext3 latency problem Additional Information Red Hat Enterprise Linux Server release 5.3 (Tikanga) Kernel: 2.6.18-194.32.1.el5 Primary application disk is fiber-channel SAN: lspci | grep -i fibre 14:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03) Mount info: type vxfs (rw,tmplog,largefiles,mincache=tmpcache,ioerror=mwdisable) 0 0 cat /sys/block/VxVM123456/queue/scheduler noop anticipatory [deadline] cfq

    Read the article

  • Per process I/O accounting on AIX

    - by ipozgaj
    Is there a way of getting per process I/O statistics on AIX, i.e. to get current disk I/O rate of a process? Commands like iostat, nmon, topas etc. can't display such data. Filemon also doesn't help. Actually, what I would need is something much like iotop(1) command on Linux. Update: it seems there is no builtin command(s) to do this. I will most probably make my own by using the SPMI API.

    Read the article

  • Linux freezes every few seconds

    - by Zeppomedio
    We're having an issue where one our Linux boxes (Ubuntu 10.04 LTS, running on EC2 with a quadruple-large size, 68GB of RAM and 8 virtual cores with 3.25GHz each) freezes up every few seconds. Typing in an ssh session will freeze, and running strace on one of the Postgresql processes that's running usually shows: 02:37:41.567990 semop(7831581, {{3, -1, 0}}, 1 for a few seconds before it proceeds (it always gets stuck at that semop). OProfile shows that most of the time is spent in the kernel (60%) versus 37% in Postgresql. The result of these halts (which began suddenly a day ago) is that load on the box has gone from 0.7 to 10+, and causes our entire stack to slow done. Any ideas on how to track down what's going on? iostat doesn't show the disks being particularly slow or overloaded, and top shows user cpu % spike from 8% to about 40% whenever these back-ups happen.

    Read the article

  • How do I log file system read/writes by filename in Linux?

    - by Casey
    I'm looking for a simple method that will log file system operations. It should display the name of the file being accessed or modified. I'm familiar with powertop, and it appears this works to an extent, in so much that it show the user files that were written to. Is there any other utilities that support this feature. Some of my findings: powertop: best for write access logging, but more focused on CPU activity iotop: shows real time disk access by process, but not file name lsof: shows the open files per process, but not real time file access iostat: shows the real time I/O performance of disk/arrays but does not indicate file or process

    Read the article

  • Why would Linux VM in vSphere ESXi 5.5 show dramatically increased disk i/o latency?

    - by mhucka
    I'm stumped and I hope someone else will recognize the symptoms of this problem. Hardware: new Dell T110 II, dual-core Pentium G860 2.9 GHz, onboard SATA controller, one new 500 GB 7200 RPM cabled hard drive inside the box, other drives inside but not mounted yet. No RAID. Software: fresh CentOS 6.5 virtual machine under VMware ESXi 5.5.0 (build 174 + vSphere Client). 2.5 GB RAM allocated. The disk is how CentOS offered to set it up, namely as a volume inside an LVM Volume Group, except that I skipped having a separate /home and simply have / and /boot. CentOS is patched up, ESXi patched up, latest VMware tools installed in the VM. No users on the system, no services running, no files on the disk but the OS installation. I'm interacting with the VM via the VM virtual console in vSphere Client. Before going further, I wanted to check that I configured things more or less reasonably. I ran the following command as root in a shell on the VM: for i in 1 2 3 4 5 6 7 8 9 10; do dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync done I.e., just repeat the dd command 10 times, which results in printing the transfer rate each time. The results are disturbing. It starts off well: 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 20.451 s, 105 MB/s 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 20.4202 s, 105 MB/s ... but after 7-8 of these, it then prints 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GG) copied, 82.9779 s, 25.9 MB/s 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 84.0396 s, 25.6 MB/s 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 103.42 s, 20.8 MB/s If I wait a significant amount of time, say 30-45 minutes, and run it again, it again goes back to 105 MB/s, and after several rounds (sometimes a few, sometimes 10+), it drops to ~20-25 MB/s again. Plotting the disk latency in vSphere's interface, it shows periods of high disk latency hitting 1.2-1.5 seconds during the times that dd reports the low throughput. (And yes, things get pretty unresponsive while that's happening.) What could be causing this? I'm comfortable that it is not due to the disk failing, because I also had configured two other disks as an additional volume in the same system. At first I thought I did something wrong with that volume, but after commenting the volume out from /etc/fstab and rebooting, and trying the tests on / as shown above, it became clear that the problem is elsewhere. It is probably an ESXi configuration problem, but I'm not very experienced with ESXi. It's probably something stupid, but after trying to figure this out for many hours over multiple days, I can't find the problem, so I hope someone can point me in the right direction. (P.S.: yes, I know this hardware combo won't win any speed awards as a server, and I have reasons for using this low-end hardware and running a single VM, but I think that's besides the point for this question [unless it's actually a hardware problem].) ADDENDUM #1: Reading other answers such as this one made me try adding oflag=direct to dd. However, it makes no difference in the pattern of results: initially the numbers are higher for many rounds, then they drop to 20-25 MB/s. (The initial absolute numbers are in the 50 MB/s range.) ADDENDUM #2: Adding sync ; echo 3 > /proc/sys/vm/drop_caches into the loop does not make a difference at all. ADDENDUM #3: To take out further variables, I now run dd such that the file it creates is larger than the amount of RAM on the system. The new command is dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct. Initial throughput numbers with this version of the command are ~50 MB/s. They drop to 20-25 MB/s when things go south. ADDENDUM #4: Here is the output of iostat -d -m -x 1 running in another terminal window while performance is "good" and then again when it's "bad". (While this is going on, I'm running dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct.) First, when things are "good", it shows this: When things go "bad", iostat -d -m -x 1 shows this:

    Read the article

  • Utility to record IO statistics (random/sequential, block sizes, read/write ratio) in Unix

    - by Michael Pearson
    As part of provisioning our new server (see other SF) I'd like to find out the following: ratio of random to sequential reads & writes amount of data read & written at a time (pref in histogram form) I can already figure out our reads/writes on a per-operation and overall data level using iostat & dstat, but I'd like to know more. For example, I'd like to know that we're mostly random 16kb reads, or a lot of sequential 64kb reads with random writes. We're (currently) on an Ubuntu 10.04 VM. Is there a utility that I can run that will record and present this information for me?

    Read the article

  • Why is my HDD going back from standy?

    - by Pablo
    My hard drives, connected to Ubuntu server are producing the following log every exactly 5 minutes. Nov 1 14:10:50 localhost kernel: [ 1602.884936] ata2.00: hard resetting link Nov 1 14:10:51 localhost kernel: [ 1603.226804] ata2.01: hard resetting link Nov 1 14:10:52 localhost kernel: [ 1604.274533] ata2.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Nov 1 14:10:52 localhost kernel: [ 1604.274548] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Nov 1 14:10:52 localhost kernel: [ 1604.356669] ata2.00: configured for UDMA/133 Nov 1 14:10:52 localhost kernel: [ 1604.375247] ata2.01: configured for UDMA/133 Nov 1 14:10:52 localhost kernel: [ 1604.375265] ata2: EH complete I don't think this is related to hard drive failure, because it happens for ALL hard drives connected and ONLY when I write spindown_time = 12 in /etc/hdparm.conf. The reason I add this value is to put disks into standby mode after 60 seconds, which is happening after that period (checked with hdparm -C). The first clue I thought that smartd was running and spinning the drive. However, I couldn't find it in ps -aux | grep smart. Additionally, iostat does show that nobody accessed those drives, since Blk_read, Blk_wrtn remain unchanged. I also killed all processes that may be doing something with hdd(eg SAMBA). So I guess the problem is solely with hdparm... I have no more clue where that 5 minute value hides.

    Read the article

  • High Server Load cannot figure out why

    - by Tim Bolton
    My server is currently running CentOS 5.2, with WHM 11.34. Currently, we're at 6.43 to 12 for a load average. The sites that we're hosting are taking a lot time to respond and resolve. top doesn't show anything out of the ordinary and iftop doesn't show a lot of traffic. We have many resellers, and some not so good at writing code, how can we find the culprit? vmstat output: vmstat procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 2 84 78684 154916 1021080 0 0 72 274 0 14 6 3 80 12 0 top output (ordered by %CPU) top - 21:44:43 up 5 days, 10:39, 3 users, load average: 3.36, 4.18, 4.73 Tasks: 222 total, 3 running, 219 sleeping, 0 stopped, 0 zombie Cpu(s): 5.8%us, 2.3%sy, 0.2%ni, 79.6%id, 11.8%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 2074580k total, 1863044k used, 211536k free, 174828k buffers Swap: 2040212k total, 84k used, 2040128k free, 987604k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15930 mysql 15 0 138m 46m 4380 S 4 2.3 1:45.87 mysqld 21772 igniteth 17 0 23200 7152 3932 R 4 0.3 0:00.02 php 1586 root 10 -5 0 0 0 S 2 0.0 11:45.19 kjournald 21759 root 15 0 2416 1024 732 R 2 0.0 0:00.01 top 1 root 15 0 2156 648 560 S 0 0.0 0:26.31 init 2 root RT 0 0 0 0 S 0 0.0 0:00.35 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:00.32 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 5 root RT 0 0 0 0 S 0 0.0 0:02.00 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.11 ksoftirqd/1 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 8 root RT 0 0 0 0 S 0 0.0 0:01.29 migration/2 9 root 34 19 0 0 0 S 0 0.0 0:00.26 ksoftirqd/2 10 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/2 11 root RT 0 0 0 0 S 0 0.0 0:00.90 migration/3 12 root 34 19 0 0 0 R 0 0.0 0:00.20 ksoftirqd/3 13 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/3 top output (ordered by CPU time) top - 21:46:12 up 5 days, 10:41, 3 users, load average: 2.88, 3.82, 4.55 Tasks: 217 total, 1 running, 216 sleeping, 0 stopped, 0 zombie Cpu(s): 3.7%us, 2.0%sy, 2.0%ni, 67.2%id, 25.0%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 2074580k total, 1959516k used, 115064k free, 183116k buffers Swap: 2040212k total, 84k used, 2040128k free, 1090308k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 32367 root 16 0 215m 212m 1548 S 0 10.5 62:03.63 62:03 tailwatchd 1586 root 10 -5 0 0 0 S 0 0.0 11:45.27 11:45 kjournald 1576 root 10 -5 0 0 0 S 0 0.0 2:37.86 2:37 kjournald 27722 root 16 0 2556 1184 800 S 0 0.1 1:48.94 1:48 top 15930 mysql 15 0 138m 46m 4380 S 4 2.3 1:48.63 1:48 mysqld 2932 root 34 19 0 0 0 S 0 0.0 1:41.05 1:41 kipmi0 226 root 10 -5 0 0 0 S 0 0.0 1:34.33 1:34 kswapd0 2671 named 25 0 74688 7400 2116 S 0 0.4 1:23.58 1:23 named 3229 root 15 0 10300 3348 2724 S 0 0.2 0:40.85 0:40 sshd 1580 root 10 -5 0 0 0 S 0 0.0 0:30.62 0:30 kjournald 1 root 17 0 2156 648 560 S 0 0.0 0:26.32 0:26 init 2616 root 15 0 1816 576 480 S 0 0.0 0:23.50 0:23 syslogd 1584 root 10 -5 0 0 0 S 0 0.0 0:18.67 0:18 kjournald 4342 root 34 19 27692 11m 2116 S 0 0.5 0:18.23 0:18 yum-updatesd 8044 bollingp 15 0 3456 2036 740 S 1 0.1 0:15.56 0:15 imapd 26 root 10 -5 0 0 0 S 0 0.0 0:14.18 0:14 kblockd/1 7989 gmailsit 16 0 3196 1748 736 S 0 0.1 0:10.43 0:10 imapd iostat -xtk 1 10 output [root@server1 tmp]# iostat -xtk 1 10 Linux 2.6.18-53.el5 12/18/2012 Time: 09:51:06 PM avg-cpu: %user %nice %system %iowait %steal %idle 5.83 0.19 2.53 11.85 0.00 79.60 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 1.37 118.83 18.70 54.27 131.47 692.72 22.59 4.90 67.19 3.10 22.59 sdb 0.35 39.33 20.33 61.43 158.79 403.22 13.75 5.23 63.93 3.77 30.80 Time: 09:51:07 PM avg-cpu: %user %nice %system %iowait %steal %idle 1.50 0.00 0.50 24.00 0.00 74.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 25.00 2.00 2.00 128.00 108.00 118.00 0.03 7.25 4.00 1.60 sdb 0.00 16.00 41.00 145.00 200.00 668.00 9.33 107.92 272.72 5.38 100.10 Time: 09:51:08 PM avg-cpu: %user %nice %system %iowait %steal %idle 2.00 0.00 1.50 29.50 0.00 67.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 95.00 3.00 33.00 12.00 480.00 27.33 0.07 1.72 1.31 4.70 sdb 0.00 14.00 1.00 228.00 4.00 960.00 8.42 143.49 568.01 4.37 100.10 Time: 09:51:09 PM avg-cpu: %user %nice %system %iowait %steal %idle 13.28 0.00 2.76 21.30 0.00 62.66 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 21.00 1.00 19.00 16.00 192.00 20.80 0.06 3.55 1.30 2.60 sdb 0.00 36.00 28.00 181.00 124.00 884.00 9.65 121.16 617.31 4.79 100.10 Time: 09:51:10 PM avg-cpu: %user %nice %system %iowait %steal %idle 4.74 0.00 1.50 25.19 0.00 68.58 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 20.00 3.00 15.00 12.00 136.00 16.44 0.17 7.11 3.11 5.60 sdb 0.00 0.00 103.00 60.00 544.00 248.00 9.72 52.35 545.23 6.14 100.10 Time: 09:51:11 PM avg-cpu: %user %nice %system %iowait %steal %idle 1.24 0.00 1.24 25.31 0.00 72.21 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 75.00 4.00 28.00 16.00 416.00 27.00 0.08 3.72 2.03 6.50 sdb 2.00 9.00 124.00 17.00 616.00 104.00 10.21 3.73 213.73 7.10 100.10 Time: 09:51:12 PM avg-cpu: %user %nice %system %iowait %steal %idle 1.00 0.00 0.75 24.31 0.00 73.93 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 24.00 1.00 9.00 4.00 132.00 27.20 0.01 1.20 1.10 1.10 sdb 4.00 40.00 103.00 48.00 528.00 212.00 9.80 105.21 104.32 6.64 100.20 Time: 09:51:13 PM avg-cpu: %user %nice %system %iowait %steal %idle 2.50 0.00 1.75 23.25 0.00 72.50 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 125.74 3.96 46.53 15.84 689.11 27.92 0.20 4.06 2.41 12.18 sdb 2.97 0.00 91.09 84.16 419.80 471.29 10.17 85.85 590.78 5.66 99.11 Time: 09:51:14 PM avg-cpu: %user %nice %system %iowait %steal %idle 0.75 0.00 0.50 24.94 0.00 73.82 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 88.00 1.00 7.00 4.00 380.00 96.00 0.04 4.38 3.00 2.40 sdb 3.00 7.00 111.00 44.00 540.00 208.00 9.65 18.58 581.79 6.46 100.10 Time: 09:51:15 PM avg-cpu: %user %nice %system %iowait %steal %idle 11.03 0.00 3.26 26.57 0.00 59.15 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 145.00 7.00 53.00 28.00 792.00 27.33 0.15 2.50 1.55 9.30 sdb 1.00 0.00 155.00 0.00 800.00 0.00 10.32 2.85 18.63 6.46 100.10 [root@server1 tmp]# MySQL Show Full Processlist mysql> show full processlist; +------+---------------+-----------+-----------------------+----------------+------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+---------------+-----------+-----------------------+----------------+------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | DB_USER_ONE | localhost | DB_ONE | Query | 3 | waiting for handler insert | INSERT DELAYED INTO defers (mailtime,msgid,email,transport_method,message,host,ip,router,deliveryuser,deliverydomain) VALUES(FROM_UNIXTIME('1355879748'),'1TivwL-0003y8-8l','[email protected]','remote_smtp','SMTP error from remote mail server after initial connection: host mx1.mail.tw.yahoo.com [203.188.197.119]: 421 4.7.0 [TS01] Messages from 75.125.90.146 temporarily deferred due to user complaints - 4.16.55.1; see http://postmaster.yahoo.com/421-ts01.html','mx1.mail.tw.yahoo.com','203.188.197.119','lookuphost','','') | | 2 | DELAYED | localhost | DB_ONE | Delayed insert | 52 | insert | | | 3 | DELAYED | localhost | DB_ONE | Delayed insert | 68 | insert | | | 911 | DELAYED | localhost | DB_ONE | Delayed insert | 99 | Waiting for INSERT | | | 993 | DB_USER_TWO | localhost | DB_TWO | Sleep | 832 | | NULL | | 994 | DB_USER_ONE | localhost | DB_ONE | Query | 185 | Locked | delete from failures where FROM_UNIXTIME(UNIX_TIMESTAMP(NOW())-1296000) > mailtime | | 1102 | DB_USER_THREE | localhost | DB_THREE | Query | 29 | NULL | commit | | 1249 | DB_USER_FOUR | localhost | DB_FOUR | Query | 13 | NULL | commit | | 1263 | root | localhost | DB_FIVE | Query | 0 | NULL | show full processlist | | 1264 | DB_USER_SIX | localhost | DB_SIX | Query | 3 | NULL | commit | +------+---------------+-----------+-----------------------+----------------+------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set (0.00 sec)

    Read the article

  • Brendan Gregg's "Systems Performance: Enterprise and the Cloud"

    - by user12608550
    Long ago, the prerequisite UNIX performance book was Adrian Cockcroft's 1994 classic, Sun Performance and Tuning: Sparc & Solaris, later updated in 1998 as Java and the Internet. As Solaris evolved to include the invaluable DTrace observability features, new essential performance references have been published, such as Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris (2006)  by McDougal, Mauro, and Gregg, and DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD (2011), also by Mauro and Gregg. Much has occurred in Solaris Land since those books appeared, notably Oracle's acquisition of Sun Microsystems in 2010 and the demise of the OpenSolaris community. But operating system technologies have continued to improve markedly in recent years, driven by stunning advances in multicore processor architecture, virtualization, and the massive scalability requirements of cloud computing. A new performance reference was needed, and I eagerly waited for something that thoroughly covered modern, distributed computing performance issues from the ground up. Well, there's a new classic now, authored yet again by Brendan Gregg, former Solaris kernel engineer at Sun and now Lead Performance Engineer at Joyent. Systems Performance: Enterprise and the Cloud is a modern, very comprehensive guide to general system performance principles and practices, as well as a highly detailed reference for specific UNIX and Linux observability tools used to examine and diagnose operating system behaviour.  It provides thorough definitions of terms, explains performance diagnostic Best Practices and "Worst Practices" (called "anti-methods"), and covers key observability tools including DTrace, SystemTap, and all the traditional UNIX utilities like vmstat, ps, iostat, and many others. The book focuses on operating system performance principles and expands on these with respect to Linux (Ubuntu, Fedora, and CentOS are cited), and to Solaris and its derivatives [1]; it is not directed at any one OS so it is extremely useful as a broad performance reference. The author goes beyond the intricacies of performance analysis and shows how to interpret and visualize statistical information gathered from the observability tools.  It's often difficult to extract understanding from voluminous rows of text output, and techniques are provided to assist with summarizing, visualizing, and interpreting the performance data. Gregg includes myriad useful references from the system performance literature, including a "Who's Who" of contributors to this great body of diagnostic tools and methods. This outstanding book should be required reading for UNIX and Linux system administrators as well as anyone charged with diagnosing OS performance issues.  Moreover, the book can easily serve as a textbook for a graduate level course in operating systems [2]. [1] Solaris 11, of course, and Joyent's SmartOS (developed from OpenSolaris) [2] Gregg has taught system performance seminars for many years; I have also taught such courses...this book would be perfect for the OS component of an advanced CS curriculum.

    Read the article

  • ??????????

    - by Allen Gao
    ???????RAC??????????????????10gR2?11gR1.??????????????CRS???????1.ocssd : ???????????(Node Monitoring)????(Group Management),??CRS????????????????????????,????????????(network heartbeat)?????(disk heartbeat)???,?????????????????????,????????????,??????????????????,?????node kill escalation(???11gR1????????),??????????????????????????(reboot time,???3?)????????:ocssd.bin??????????????????????????,??????????????????????????????,misscount(???30?,??????????????600?),????????????,???????????????????,?????????????2???,??????,??????????????,?????????????????????:ocssd.bin?????????????(Voting File)??????????,?????????????????????????????,disk timeou(???200?),?????????????????????,CRS???[N/2]+1????????,??N??????,??????2.oclsomon:????????ocssd????,????ocssd.bin??????,???????3.oprocd:??????Linux?Unix??,????????????????????????????????,?????????:????????????init.cssd?????????????????????????1.??????2.<crs???>/log/<????>/cssd/ocssd.log3.oprocd.log(/etc/oracle/oprocd/*.log.* ? /var/opt/oracle/oprocd/*.log.*)4.<crs???>/log/<????>/cssd/oclsomon/oclsomon.log5. Oracle OSWatcher ????????????????????1.?ocssd???????????ocssd.log???????,????????????????????????????????,???????,OSW??(traceroute???),???????(cluster interconnect)??????,?????????[ CSSD]2012-03-02 23:56:18.749 [3086] >WARNING: clssnmPollingThread: node <node_name> at 50% heartbeat fatal, eviction in 14.494 seconds[ CSSD]2012-03-02 23:56:25.749 [3086] >WARNING: clssnmPollingThread: node <node_name> at 75% heartbeat fatal, eviction in 7.494 seconds[ CSSD]2012-03-02 23:56:32.749 [3086] >WARNING: clssnmPollingThread: node <node_name>at 90% heartbeat fatal, eviction in 0.494 seconds[CSSD]2012-03-02 23:56:33.243 [3086] >TRACE:   clssnmPollingThread: Eviction started for node <node_name>, flags 0x040d, state 3, wt4c 0[CSSD]2012-03-02 23:56:33.243 [3086] >TRACE:   clssnmDiscHelper: <node_name>, node(4) connection failed, con (1128a5530), probe(0)[CSSD]2012-03-02 23:56:33.243 [3086] >TRACE:   clssnmDiscHelper: node 4 clean up, con (1128a5530), init state 5, cur state 5[CSSD]2012-03-02 23:56:33.243 [3600] >TRACE:   clssnmDoSyncUpdate: Initiating sync 196446491[CSSD]2012-03-02 23:56:33.243 [3600] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (27000)ms??:???????ocssd.log?????????????????????,?????????????????????ocssd.log???????,??????????????????????????????,OSWatcher??(iostat???),???i/o????????,?????????2010-08-13 18:34:37.423: [    CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb82010-08-13 18:34:37.423: [    CLSF][150477728]Opened hdl:0xf4336530 for dev:/dev/sdb8:2010-08-13 18:34:37.429: [   SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output errorAdditional information: 4Additional information: 720913Additional information: -1))2010-08-13 18:34:37.429: [    CSSD][150477728](:CSSNM00060: )clssnmvReadBlocks: read failed at offset 17 of /dev/sdb82010-08-13 18:34:38.205: [    CSSD][4110736288](:CSSNM00058: )clssnmvDiskCheck: No I/O completions for 200880 ms for voting file /dev/sdb8)2010-08-13 18:34:38.206: [    CSSD][4110736288](:CSSNM00018: )clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 12010-08-13 18:34:38.206: [    CSSD][4110736288]###################################2010-08-13 18:34:38.206: [    CSSD][4110736288]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread 2010-08-13 18:34:38.206: [    CSSD][4110736288]###################################2. ?oclsomon???????????oclsomon.log ?????,??????????ocssd????,??ocssd??????(RT)???,?????????????(?cpu)??,?????????????,OSW??(vmstat,top???),?????????3.?oprocd???????????oprocd?????????,?????????oprocd????? Dec 21 16:15:30.369857 | LASTGASP | AlarmHandler:  timeout(2312 msec) exceeds interval(1000 msec)+margin(500 msec).   Rebooting NOW.??oprocd?????????????????????,?????ntp(?????????),??diagwait=13 ????????,??,?????????????,??????CRS,???????????????,??????????????oprocd????,??,?????OSWatcher??(vmstat,top???),??????????????????????????????,????????????????? ???????,??????MOS ???Note 265769.1 :Troubleshooting 10g and 11.1 Clusterware RebootsNote 1050693.1 :Troubleshooting 11.2 Clusterware Node Evictions (Reboots)

    Read the article

  • Munin node not listing any plugins on new Fedora 14 installation

    - by Dave Forgac
    I have just installed munin-node from the base repo on Fedora 14 and then started it. I found that my munin server is not able to collect data from this node so I tried connecting via telnet to test. When connecting via telnet I see that no plugins are listed: [dave@host ~]# telnet localhost 4949 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. # munin node at host.example.com list quit Connection closed by foreign host. [dave@host ~]# I did not modify anything after the installation. The munin-node.conf is allowing connections from 127.0.0.1 and the default set of plugins in /etc/munin/plugins/ are symlinked to the plugins in /usr/share/munin/plugins/. Here is the working output of the telnet test of the 'list' command should look like (this is on a Fedora 13 host): [dave@www ~]$ telnet localhost 4949 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. # munin node at www.example.com list apache_accesses apache_processes apache_volume cpu df df_inode entropy forks fw_packets if_err_eth0 if_err_eth1 if_eth0 if_eth1 interrupts iostat iostat_ios irqstats load memory munin_stats mysql_ mysql_bytes mysql_innodb mysql_queries mysql_slowqueries mysql_threads netstat open_files open_inodes postfix_mailqueue postfix_mailvolume proc_pri processes swap threads uptime users vmstat yum quit Connection closed by foreign host. [dave@www ~]$ Edited to show output of munin-node-configure: [root@host ~]# munin-node-configure Plugin | Used | Extra information ------ | ---- | ----------------- acpi | no | amavis | no | ... http_loadtime | no | if_ | yes | eth1 eth0 if_err_ | yes | eth0 eth1 ifx_concurrent_sessions_ | no | interrupts | yes | ... uptime | yes | users | yes | varnish_ | no | vserver_resources | no | yum | yes | zimbra_ | no | Any suggestions on what to check next?

    Read the article

  • no disk io but iowait very high

    - by Dan
    there is no disk io going results of iotop Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO< COMMAND 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init [3] 1930 be/4 named 0.00 B/s 0.00 B/s 0.00 % 0.00 % named -u ~d/run-root 1931 be/4 named 0.00 B/s 0.00 B/s 0.00 % 0.00 % named -u ~d/run-root 1932 be/4 named 0.00 B/s 0.00 B/s 0.00 % 0.00 % named -u ~d/run-root 1933 be/4 named 0.00 B/s 0.00 B/s 0.00 % 0.00 % named -u ~d/run-root 1810 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % sh /usr/b~user=mysql 9795 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 0.00 % httpd 8004 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 0.00 % httpd 3226 be/4 postfix 0.00 B/s 0.00 B/s 0.00 % 0.00 % tlsmgr -l -t unix -u 8154 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 0.00 % httpd 9759 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % find -name php.ini 9249 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 0.00 % httpd 1756 be/4 postfix 0.00 B/s 0.00 B/s 0.00 % 0.00 % psa-pc-re~@localhost 1863 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 0.00 % mysqld --~mysql.sock 3123 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % crond 1758 be/4 postfix 0.00 B/s 0.00 B/s 0.00 % 0.00 % psa-pc-re~@localhost 1865 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 0.00 % mysqld --~mysql.sock 1592 be/4 sw-cp-se 0.00 B/s 0.00 B/s 0.00 % 0.00 % sw-cp-ser~ver/config 7612 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % sshd: root@pts/0 7614 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % sftp-server 7615 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % -bash 1602 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % sshd 8003 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % httpd but iowait very high ? iostat report avg-cpu: %user %nice %system %iowait %steal %idle 0.83 0.00 0.13 13.83 0.00 85.20 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn server runs like a snail what could be wrong here ? thanks

    Read the article

  • Simple end-to-end load and bottleneck monitoring for DB-based web sites

    - by T.J. Crowder
    What tools do you use / would you recommend for monitoring a Linux-based, DB-based website's servers for bottlenecks and load? The obvious goal being to know when growth has gotten to the point where it's necessary to scale up (or out) one or more of the bits and pieces because the current system won't be managing the load if an observed trend continues. I'm looking for general recommendations based on standard Linux load metrics, disk I/O metrics, network I/O metrics, etc., but if specifics are helpful: It'll be Tomcat6 using APR (possibly with a Varnish or similar caching and balancing front-end), MySQL, and either Ubuntu 8.04 LTS or 10.04 LTS depending on timing. I know about top, vmstat, iostat, bwmon and the like that collect and parse info from the /proc file system (et. al.); and obviously MySQL provides a lot of queriable performance information. I could use those directly, probably automating periodic monitoring logs with scripts and such. But I have a suspicion that I'd be reinventing a wheel... For example, Hyperic HQ seems to be along the lines of what I'm looking for. Others? Meta: I tend to think of "recommendation" questions as needing to be CW because there's no one right answer, but I see a lot of these here that aren't CWs, so I haven't marked it as one. I'll happily do so if enough people think I should.

    Read the article

  • Can enabling a RAID controller's writeback cache harm overall performance?

    - by Nathan O'Sullivan
    I have an 8 drive RAID 10 setup connected to an Adaptec 5805Z, running Centos 5.5 and deadline scheduler. A basic dd read test shows 400mb/sec, and a basic dd write test shows about the same. When I run the two simultaneously, I see the read speed drop to ~5mb/sec while the write speed stays at more or less the same 400mb/sec. The output of iostat -x as you would expect, shows that very few read transactions are being executed while the disk is bombarded with writes. If i turn the controller's writeback cache off, I dont see a 50:50 split but I do see a marked improvement, somewhere around 100mb/s reads and 300mb/s writes. I've also found if I lower the nr_requests setting on the drive's queue (somewhere around 8 seems optimal) I can end up with 150mb/sec reads and 150mb/sec writes; ie. a reduction in total throughput but certainly more suitable for my workload. Is this a real phenomenon? Or is my synthetic test too simplistic? The reason this could happen seems clear enough, when the scheduler switches from reads to writes, it can run heaps of write requests because they all just land in the controllers cache but must be carried out at some point. I would guess the actual disk writes are occuring when the scheduler starts trying to perform reads again, resulting in very few read requests being executed. This seems a reasonable explanation, but it also seems like a massive drawback to using writeback cache on an system with non-trivial write loads. I've been searching for discussions around this all afternoon and found nothing. What am I missing?

    Read the article

  • Server slowdown

    - by Clinton Bosch
    I have a GWT application running on Tomcat on a cloud linux(Ubuntu) server, recently I released a new version of the application and suddenly my server response times have gone from 500ms average to 15s average. I have run every monitoring tool I know. iostat says my disks are 0.03% utilised mysqltuner.pl says I am OK other see below top says my processor is 99% idle and load average: 0.20, 0.31, 0.33 memory usage is 50% (-/+ buffers/cache: 3997 3974) mysqltuner output [OK] Logged in using credentials from debian maintenance account. -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.63-0ubuntu0.10.04.1-log [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 370M (Tables: 52) [--] Data in InnoDB tables: 697M (Tables: 1749) [!!] Total fragmented tables: 1754 -------- Security Recommendations ------------------------------------------- [OK] All database users have passwords assigned -------- Performance Metrics ------------------------------------------------- [--] Up for: 19h 25m 41s (1M q [28.122 qps], 1K conn, TX: 2B, RX: 1B) [--] Reads / Writes: 98% / 2% [--] Total buffers: 1.0G global + 2.7M per thread (500 max threads) [OK] Maximum possible memory usage: 2.4G (30% of installed RAM) [OK] Slow queries: 0% (1/1M) [OK] Highest usage of available connections: 34% (173/500) [OK] Key buffer size / total MyISAM indexes: 16.0M/279.0K [OK] Key buffer hit rate: 99.9% (50K cached / 40 reads) [OK] Query cache efficiency: 61.4% (844K cached / 1M selects) [!!] Query cache prunes per day: 553779 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 34K sorts) [OK] Temporary tables created on disk: 4% (4K on disk / 102K total) [OK] Thread cache hit rate: 84% (185 created / 1K connections) [!!] Table cache hit rate: 0% (256 open / 27K opened) [OK] Open file limit used: 0% (20/2K) [OK] Table locks acquired immediately: 100% (692K immediate / 692K locks) [OK] InnoDB data size / buffer pool: 697.2M/1.0G -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Enable the slow query log to troubleshoot bad queries Increase table_cache gradually to avoid file descriptor limits Variables to adjust: query_cache_size (> 16M) table_cache (> 256)

    Read the article

< Previous Page | 1 2 3  | Next Page >