vmstat - Page 2 - Developer IT

Brendan Gregg's "Systems Performance: Enterprise and the Cloud"

- by user12608550

Long ago, the prerequisite UNIX performance book was Adrian Cockcroft's 1994 classic, Sun Performance and Tuning: Sparc & Solaris, later updated in 1998 as Java and the Internet. As Solaris evolved to include the invaluable DTrace observability features, new essential performance references have been published, such as Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris (2006) by McDougal, Mauro, and Gregg, and DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD (2011), also by Mauro and Gregg. Much has occurred in Solaris Land since those books appeared, notably Oracle's acquisition of Sun Microsystems in 2010 and the demise of the OpenSolaris community. But operating system technologies have continued to improve markedly in recent years, driven by stunning advances in multicore processor architecture, virtualization, and the massive scalability requirements of cloud computing. A new performance reference was needed, and I eagerly waited for something that thoroughly covered modern, distributed computing performance issues from the ground up. Well, there's a new classic now, authored yet again by Brendan Gregg, former Solaris kernel engineer at Sun and now Lead Performance Engineer at Joyent. Systems Performance: Enterprise and the Cloud is a modern, very comprehensive guide to general system performance principles and practices, as well as a highly detailed reference for specific UNIX and Linux observability tools used to examine and diagnose operating system behaviour. It provides thorough definitions of terms, explains performance diagnostic Best Practices and "Worst Practices" (called "anti-methods"), and covers key observability tools including DTrace, SystemTap, and all the traditional UNIX utilities like vmstat, ps, iostat, and many others. The book focuses on operating system performance principles and expands on these with respect to Linux (Ubuntu, Fedora, and CentOS are cited), and to Solaris and its derivatives [1]; it is not directed at any one OS so it is extremely useful as a broad performance reference. The author goes beyond the intricacies of performance analysis and shows how to interpret and visualize statistical information gathered from the observability tools. It's often difficult to extract understanding from voluminous rows of text output, and techniques are provided to assist with summarizing, visualizing, and interpreting the performance data. Gregg includes myriad useful references from the system performance literature, including a "Who's Who" of contributors to this great body of diagnostic tools and methods. This outstanding book should be required reading for UNIX and Linux system administrators as well as anyone charged with diagnosing OS performance issues. Moreover, the book can easily serve as a textbook for a graduate level course in operating systems [2]. [1] Solaris 11, of course, and Joyent's SmartOS (developed from OpenSolaris) [2] Gregg has taught system performance seminars for many years; I have also taught such courses...this book would be perfect for the OS component of an advanced CS curriculum.

Read the article

rkhunter 1.4 different results than version before?

- by dschinn1001

with rkhunter version before ubuntu-update from 12.04 to 12.10 I had NOT these warnings like listed here: Performing file properties checks Checking for prerequisites [ Warning ] /usr/sbin/adduser [ Warning ] /usr/sbin/chroot [ Warning ] /usr/sbin/cron [ Warning ] /usr/sbin/groupadd [ Warning ] /usr/sbin/groupdel [ Warning ] /usr/sbin/groupmod [ Warning ] /usr/sbin/grpck [ Warning ] /usr/sbin/nologin [ Warning ] /usr/sbin/pwck [ Warning ] /usr/sbin/rsyslogd [ Warning ] /usr/sbin/tcpd [ Warning ] /usr/sbin/useradd [ Warning ] /usr/sbin/userdel [ Warning ] /usr/sbin/usermod [ Warning ] /usr/sbin/vipw [ Warning ] /usr/bin/awk [ Warning ] /usr/bin/basename [ Warning ] /usr/bin/chattr [ Warning ] /usr/bin/curl [ Warning ] /usr/bin/cut [ Warning ] /usr/bin/diff [ Warning ] /usr/bin/dirname [ Warning ] /usr/bin/dpkg [ Warning ] /usr/bin/dpkg-query [ Warning ] /usr/bin/du [ Warning ] /usr/bin/env [ Warning ] /usr/bin/file [ Warning ] /usr/bin/find [ Warning ] /usr/bin/GET [ Warning ] /usr/bin/groups [ Warning ] /usr/bin/head [ Warning ] /usr/bin/id [ Warning ] /usr/bin/killall [ Warning ] /usr/bin/last [ Warning ] /usr/bin/lastlog [ Warning ] /usr/bin/ldd [ Warning ] /usr/bin/less [ Warning ] /usr/bin/locate [ Warning ] /usr/bin/logger [ Warning ] /usr/bin/lsattr [ Warning ] /usr/bin/lsof [ Warning ] /usr/bin/lynx [ Warning ] /usr/bin/mail [ Warning ] /usr/bin/md5sum [ Warning ] /usr/bin/mlocate [ Warning ] /usr/bin/newgrp [ Warning ] /usr/bin/passwd [ Warning ] /usr/bin/perl [ Warning ] /usr/bin/pgrep [ Warning ] /usr/bin/pkill [ Warning ] /usr/bin/pstree [ Warning ] /usr/bin/rkhunter [ Warning ] /usr/bin/rpm [ Warning ] /usr/bin/runcon [ Warning ] /usr/bin/sha1sum [ Warning ] /usr/bin/sha224sum [ Warning ] /usr/bin/sha256sum [ Warning ] /usr/bin/sha384sum [ Warning ] /usr/bin/sha512sum [ Warning ] /usr/bin/size [ Warning ] /usr/bin/sort [ Warning ] /usr/bin/stat [ Warning ] /usr/bin/strace [ Warning ] /usr/bin/strings [ Warning ] /usr/bin/sudo [ Warning ] /usr/bin/tail [ Warning ] /usr/bin/test [ Warning ] /usr/bin/top [ Warning ] /usr/bin/touch [ Warning ] /usr/bin/tr [ Warning ] /usr/bin/uniq [ Warning ] /usr/bin/users [ Warning ] /usr/bin/vmstat [ Warning ] /usr/bin/w [ Warning ] /usr/bin/watch [ Warning ] /usr/bin/wc [ Warning ] /usr/bin/wget [ Warning ] /usr/bin/whatis [ Warning ] /usr/bin/whereis [ Warning ] /usr/bin/which [ Warning ] /usr/bin/who [ Warning ] /usr/bin/whoami [ Warning ] /usr/bin/unhide.rb [ Warning ] /usr/bin/gawk [ Warning ] /usr/bin/lwp-request [ Warning ] /usr/bin/heirloom-mailx [ Warning ] /usr/bin/w.procps [ Warning ] /sbin/depmod [ Warning ] /sbin/fsck [ Warning ] /sbin/ifconfig [ Warning ] /sbin/ifdown [ Warning ] /sbin/ifup [ Warning ] /sbin/init [ Warning ] /sbin/insmod [ Warning ] /sbin/ip [ Warning ] /sbin/lsmod [ Warning ] /sbin/modinfo [ Warning ] /sbin/modprobe [ Warning ] /sbin/rmmod [ Warning ] /sbin/route [ Warning ] /sbin/runlevel [ Warning ] /sbin/sulogin [ Warning ] /sbin/sysctl [ Warning ] /bin/bash [ Warning ] /bin/cat [ Warning ] /bin/chmod [ Warning ] /bin/chown [ Warning ] /bin/cp [ Warning ] /bin/date [ Warning ] /bin/df [ Warning ] /bin/dmesg [ Warning ] /bin/echo [ Warning ] /bin/ed [ Warning ] /bin/egrep [ Warning ] /bin/fgrep [ Warning ] /bin/fuser [ Warning ] /bin/grep [ Warning ] /bin/ip [ Warning ] /bin/kill [ Warning ] /bin/less [ Warning ] /bin/login [ Warning ] /bin/ls [ Warning ] /bin/lsmod [ Warning ] /bin/mktemp [ Warning ] /bin/more [ Warning ] /bin/mount [ Warning ] /bin/mv [ Warning ] /bin/netstat [ Warning ] /bin/ping [ Warning ] /bin/ps [ Warning ] /bin/pwd [ Warning ] /bin/readlink [ Warning ] /bin/sed [ Warning ] /bin/sh [ Warning ] /bin/su [ Warning ] /bin/touch [ Warning ] /bin/uname [ Warning ] /bin/which [ Warning ] /bin/dash [ Warning ] It seems that rkhunter 1.4 is oversensitive somehow about changed bin-files ? chkrootkit finds nothing and no warnings too.

Read the article

ZFS for Database Log Files

- by user12620111

I've been troubled by drop outs in CPU usage in my application server, characterized by the CPUs suddenly going from close to 90% CPU busy to almost completely CPU idle for a few seconds. Here is an example of a drop out as shown by a snippet of vmstat data taken while the application server is under a heavy workload. # vmstat 1 kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr s3 s4 s5 s6 in sy cs us sy id 1 0 0 130160176 116381952 0 16 0 0 0 0 0 0 0 0 0 207377 117715 203884 70 21 9 12 0 0 130160160 116381936 0 25 0 0 0 0 0 0 0 0 0 200413 117162 197250 70 20 9 11 0 0 130160176 116381920 0 16 0 0 0 0 0 0 1 0 0 203150 119365 200249 72 21 7 8 0 0 130160176 116377808 0 19 0 0 0 0 0 0 0 0 0 169826 96144 165194 56 17 27 0 0 0 130160176 116377800 0 16 0 0 0 0 0 0 0 0 1 10245 9376 9164 2 1 97 0 0 0 130160176 116377792 0 16 0 0 0 0 0 0 0 0 2 15742 12401 14784 4 1 95 0 0 0 130160176 116377776 2 16 0 0 0 0 0 0 1 0 0 19972 17703 19612 6 2 92 14 0 0 130160176 116377696 0 16 0 0 0 0 0 0 0 0 0 202794 116793 199807 71 21 8 9 0 0 130160160 116373584 0 30 0 0 0 0 0 0 18 0 0 203123 117857 198825 69 20 11 This behavior occurred consistently while the application server was processing synthetic transactions: HTTP requests from JMeter running on an external machine. I explored many theories trying to explain the drop outs, including: Unexpected JMeter behavior Network contention Java Garbage Collection Application Server thread pool problems Connection pool problems Database transaction processing Database I/O contention Graphing the CPU %idle led to a breakthrough: Several of the drop outs were 30 seconds apart. With that insight, I went digging through the data again and looking for other outliers that were 30 seconds apart. In the database server statistics, I found spikes in the iostat "asvc_t" (average response time of disk transactions, in milliseconds) for the disk drive that was being used for the database log files. Here is an example: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 2053.6 0.0 8234.3 0.0 0.2 0.0 0.1 0 24 c3t60080E5...F4F6d0s0 0.0 2162.2 0.0 8652.8 0.0 0.3 0.0 0.1 0 28 c3t60080E5...F4F6d0s0 0.0 1102.5 0.0 10012.8 0.0 4.5 0.0 4.1 0 69 c3t60080E5...F4F6d0s0 0.0 74.0 0.0 7920.6 0.0 10.0 0.0 135.1 0 100 c3t60080E5...F4F6d0s0 0.0 568.7 0.0 6674.0 0.0 6.4 0.0 11.2 0 90 c3t60080E5...F4F6d0s0 0.0 1358.0 0.0 5456.0 0.0 0.6 0.0 0.4 0 55 c3t60080E5...F4F6d0s0 0.0 1314.3 0.0 5285.2 0.0 0.7 0.0 0.5 0 70 c3t60080E5...F4F6d0s0 Here is a little more information about my database configuration: The database and application server were running on two different SPARC servers. Storage for the database was on a storage array connected via 8 gigabit Fibre Channel Data storage and log file were on different physical disk drives Reliable low latency I/O is provided by battery backed NVRAM Highly available: Two Fibre Channel links accessed via MPxIO Two Mirrored cache controllers The log file physical disks were mirrored in the storage device Database log files on a ZFS Filesystem with cutting-edge technologies, such as copy-on-write and end-to-end checksumming Why would I be getting service time spikes in my high-end storage? First, I wanted to verify that the database log disk service time spikes aligned with the application server CPU drop outs, and they did: At first, I guessed that the disk service time spikes might be related to flushing the write through cache on the storage device, but I was unable to validate that theory. After searching the WWW for a while, I decided to try using a separate log device: # zpool add ZFS-db-41 log c3t60080E500017D55C000015C150A9F8A7d0 The ZFS log device is configured in a similar manner as described above: two physical disks mirrored in the storage array. This change to the database storage configuration eliminated the application server CPU drop outs: Here is the zpool configuration: # zpool status ZFS-db-41 pool: ZFS-db-41 state: ONLINE scan: none requested config: NAME STATE ZFS-db-41 ONLINE c3t60080E5...F4F6d0 ONLINE logs c3t60080E5...F8A7d0 ONLINE Now, the I/O spikes look like this: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1053.5 0.0 4234.1 0.0 0.8 0.0 0.7 0 75 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1131.8 0.0 4555.3 0.0 0.8 0.0 0.7 0 76 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1167.6 0.0 4682.2 0.0 0.7 0.0 0.6 0 74 c3t60080E5...F8A7d0s0 0.0 162.2 0.0 19153.9 0.0 0.7 0.0 4.2 0 12 c3t60080E5...F4F6d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1247.2 0.0 4992.6 0.0 0.7 0.0 0.6 0 71 c3t60080E5...F8A7d0s0 0.0 41.0 0.0 70.0 0.0 0.1 0.0 1.6 0 2 c3t60080E5...F4F6d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1241.3 0.0 4989.3 0.0 0.8 0.0 0.6 0 75 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1193.2 0.0 4772.9 0.0 0.7 0.0 0.6 0 71 c3t60080E5...F8A7d0s0 We can see the steady flow of 4k writes to the ZIL device from O_SYNC database log file writes. The spikes are from flushing the transaction group. Like almost all problems that I run into, once I thoroughly understand the problem, I find that other people have documented similar experiences. Thanks to all of you who have documented alternative approaches. Saved for another day: now that the problem is obvious, I should try "zfs:zfs_immediate_write_sz" as recommended in the ZFS Evil Tuning Guide. References: The ZFS Intent Log Solaris ZFS, Synchronous Writes and the ZIL Explained ZFS Evil Tuning Guide: Cache Flushes ZFS Evil Tuning Guide: Tuning ZFS for Database Performance

Read the article

Problem: Munin Graph

- by Pablo

I've been trying to install Munin for 15 days, I looked for information, analized logs, I even deleted and reinstalled Munin using YUM. I'm hosted at Media Temple on a VPS with CentOS. The problem is still there and It's driving me nuts. Graphics are shown as following: http://imageshack.us/photo/my-images/833/capturadepantalla201106u.png/ This is the configuration of my munin.conf file dbdir /var/lib/munin htmldir /var/www/munin logdir /var/log/munin rundir /var/run/munin [localhost] address **.**.***.*** #IP VPS This is the configuration of my munin-node.conf file log_level 4 log_file /var/log/munin/munin-node.log port 4949 pid_file /var/run/munin/munin-node.pid background 1 setseid 1 # Which port to bind to; host * user root group root setsid yes # Regexps for files to ignore ignore_file ~$ ignore_file \.bak$ ignore_file %$ ignore_file \.dpkg-(tmp|new|old|dist)$ ignore_file \.rpm(save|new)$ allow ^127\.0\.0\.1$ Thanks so much, I appreciate all the answers UPDATE munin-graph.log Jun 22 16:30:02 - Starting munin-graph Jun 22 16:30:02 - Processing domain: localhost Jun 22 16:30:02 - Graphed service : open_inodes (0.14 sec * 4) Jun 22 16:30:02 - Graphed service : sendmail_mailtraffic (0.10 sec * 4) Jun 22 16:30:02 - Graphed service : apache_processes (0.12 sec * 4) Jun 22 16:30:02 - Graphed service : entropy (0.10 sec * 4) Jun 22 16:30:02 - Graphed service : sendmail_mailstats (0.14 sec * 4) Jun 22 16:30:02 - Graphed service : processes (0.14 sec * 4) Jun 22 16:30:03 - Graphed service : apache_accesses (0.27 sec * 4) Jun 22 16:30:03 - Graphed service : apache_volume (0.15 sec * 4) Jun 22 16:30:03 - Graphed service : df (0.21 sec * 4) Jun 22 16:30:03 - Graphed service : netstat (0.19 sec * 4) Jun 22 16:30:03 - Graphed service : interrupts (0.14 sec * 4) Jun 22 16:30:03 - Graphed service : swap (0.14 sec * 4) Jun 22 16:30:04 - Graphed service : load (0.11 sec * 4) Jun 22 16:30:04 - Graphed service : sendmail_mailqueue (0.13 sec * 4) Jun 22 16:30:04 - Graphed service : cpu (0.21 sec * 4) Jun 22 16:30:04 - Graphed service : df_inode (0.16 sec * 4) Jun 22 16:30:04 - Graphed service : open_files (0.16 sec * 4) Jun 22 16:30:04 - Graphed service : forks (0.13 sec * 4) Jun 22 16:30:05 - Graphed service : memory (0.26 sec * 4) Jun 22 16:30:05 - Graphed service : nfs_client (0.36 sec * 4) Jun 22 16:30:05 - Graphed service : vmstat (0.10 sec * 4) Jun 22 16:30:05 - Processed node: localhost (3.45 sec) Jun 22 16:30:05 - Processed domain: localhost (3.45 sec) Jun 22 16:30:05 - Munin-graph finished (3.46 sec)

Read the article

High load average, low CPU and IO (Centos 5.7)

- by Ben

A Drupal 7 site with CiviCRM, after running smoothly for a year on a 1&1 VPS suddenly became unresponsive. Now pages eventually load, but can take more than a minute. Looking at resource use in Virtuozzo, the load average carries a warning, and has remained above 1. While I understand this isn't particularly high, this is a change from when the site was working. Here is a typical snapshot of top: top - 03:10:32 up 3:21, 1 user, load average: 1.16, 1.22, 1.30 Tasks: 43 total, 1 running, 42 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.1%sy, 0.1%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2097152k total, 1015112k used, 1082040k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached CPU idle level never seems to go much below 70%. wa is virtually always at 0. There appears to be lots of free memory. And here is some vmstat output, again showign wa at 0, plenty of free memory, and an idle CPU: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 1100872 0 0 0 0 2783 23672 0 538 1 0 99 0 0 1 0 0 1100872 0 0 0 0 0 16 0 101754 0 0 100 0 0 0 0 0 1100872 0 0 0 0 0 17 0 103133 0 0 100 0 0 0 0 0 1100872 0 0 0 0 0 1 0 102080 0 0 100 0 0 1 0 0 1100872 0 0 0 0 0 6 0 99881 0 0 100 0 0 0 0 0 1100872 0 0 0 0 0 1 0 105187 0 0 100 0 0 I've spoken to 1&1 but they don't have any ideas as to what could be causing the high load average. Instead they suggested an upgrade :) I've looked for processes that might be causing this, examined MySQL showprocesslist, and restarted the container with no result. Does anyone have more troubleshooting suggestions or insights?

Read the article

Munin node not listing any plugins on new Fedora 14 installation

- by Dave Forgac

I have just installed munin-node from the base repo on Fedora 14 and then started it. I found that my munin server is not able to collect data from this node so I tried connecting via telnet to test. When connecting via telnet I see that no plugins are listed: [dave@host ~]# telnet localhost 4949 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. # munin node at host.example.com list quit Connection closed by foreign host. [dave@host ~]# I did not modify anything after the installation. The munin-node.conf is allowing connections from 127.0.0.1 and the default set of plugins in /etc/munin/plugins/ are symlinked to the plugins in /usr/share/munin/plugins/. Here is the working output of the telnet test of the 'list' command should look like (this is on a Fedora 13 host): [dave@www ~]$ telnet localhost 4949 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. # munin node at www.example.com list apache_accesses apache_processes apache_volume cpu df df_inode entropy forks fw_packets if_err_eth0 if_err_eth1 if_eth0 if_eth1 interrupts iostat iostat_ios irqstats load memory munin_stats mysql_ mysql_bytes mysql_innodb mysql_queries mysql_slowqueries mysql_threads netstat open_files open_inodes postfix_mailqueue postfix_mailvolume proc_pri processes swap threads uptime users vmstat yum quit Connection closed by foreign host. [dave@www ~]$ Edited to show output of munin-node-configure: [root@host ~]# munin-node-configure Plugin | Used | Extra information ------ | ---- | ----------------- acpi | no | amavis | no | ... http_loadtime | no | if_ | yes | eth1 eth0 if_err_ | yes | eth0 eth1 ifx_concurrent_sessions_ | no | interrupts | yes | ... uptime | yes | users | yes | varnish_ | no | vserver_resources | no | yum | yes | zimbra_ | no | Any suggestions on what to check next?

Read the article

MySQL is killing the server IO.

- by OneOfOne

I manage a fairly large/busy vBulletin forums (running on gigenet cloud), the database is ~ 10 GB (~9 milion posts, ~60 queries per second), lately MySQL have been grinding the disk like there's no tomorrow according to iotop and slowing the site. The last idea I can think of is using replication, but I'm not sure how much that would help and worried about database sync. I'm out of ideas, any tips on how to improve the situation would be highly appreciated. Specs : Debian Lenny 64bit ~12Ghz (6x2GHz) CPU, 7520gb RAM, 160gb disk. Kernel : 2.6.32-4-amd64 mysqld Ver 5.1.54-0.dotdeb.0 for debian-linux-gnu on x86_64 ((Debian)) Other software: vBulletin 3.8.4 memcached 1.2.2 PHP 5.3.5-0.dotdeb.0 (fpm-fcgi) (built: Jan 7 2011 00:07:27) lighttpd/1.4.28 (ssl) - a light and fast webserver PHP and vBulletin are configured to use memcached. MySQL Settings : [mysqld] key_buffer = 128M max_allowed_packet = 16M thread_cache_size = 8 myisam-recover = BACKUP max_connections = 1024 query_cache_limit = 2M query_cache_size = 128M expire_logs_days = 10 max_binlog_size = 100M key_buffer_size = 128M join_buffer_size = 8M tmp_table_size = 16M max_heap_table_size = 16M table_cache = 96 Other : From the cloud's IO chart, we're averaging 100mb/s read. > vmstat procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 9 0 73140 36336 8968 1859160 0 0 42 15 3 2 6 1 89 5 > /etc/init.d/mysql status Threads: 49 Questions: 252139 Slow queries: 164 Opens: 53573 Flush tables: 1 Open tables: 337 Queries per second avg: 61.302. moved from superuser

Read the article

Simple end-to-end load and bottleneck monitoring for DB-based web sites

- by T.J. Crowder

What tools do you use / would you recommend for monitoring a Linux-based, DB-based website's servers for bottlenecks and load? The obvious goal being to know when growth has gotten to the point where it's necessary to scale up (or out) one or more of the bits and pieces because the current system won't be managing the load if an observed trend continues. I'm looking for general recommendations based on standard Linux load metrics, disk I/O metrics, network I/O metrics, etc., but if specifics are helpful: It'll be Tomcat6 using APR (possibly with a Varnish or similar caching and balancing front-end), MySQL, and either Ubuntu 8.04 LTS or 10.04 LTS depending on timing. I know about top, vmstat, iostat, bwmon and the like that collect and parse info from the /proc file system (et. al.); and obviously MySQL provides a lot of queriable performance information. I could use those directly, probably automating periodic monitoring logs with scripts and such. But I have a suspicion that I'd be reinventing a wheel... For example, Hyperic HQ seems to be along the lines of what I'm looking for. Others? Meta: I tend to think of "recommendation" questions as needing to be CW because there's no one right answer, but I see a lot of these here that aren't CWs, so I haven't marked it as one. I'll happily do so if enough people think I should.

Read the article

Surprising corruption and never-ending fsck after resizing a filesystem.

- by Steve Kemp

System in question has Debian Lenny installed, running a 2.65.27.38 kernel. System has 16Gb memory, and 8x1Tb drives running behind a 3Ware RAID card. The storage is managed via LVM. Short version: Running a KVM guest which had 1.7Tb storage allocated to it. The guest was reaching a full-disk. So we decided to resize the disk that it was running upon We're pretty familiar with LVM, and KVM, so we figured this would be a painless operation: Stop the KVM guest. Extend the size of the LVM partition: "lvextend -L+500Gb ..." Check the filesystem : "e2fsck -f /dev/mapper/..." Resize the filesystem: "resize2fs /dev/mapper/" Start the guest. The guest booted successfully, and running "df" showed the extra space, however a short time later the system decided to remount the filesystem read-only, without any explicit indication of error. Being paranoid we shut the guest down and ran the filesystem check again, given the new size of the filesystem we expected this to take a while, however it has now been running for 24 hours and there is no indication of how long it will take. Using strace I can see the fsck is "doing stuff", similarly running "vmstat 1" I can see that there are a lot of block input/output operations occurring. So now my question is threefold: Has anybody come across a similar situation? Generally we've done this kind of resize in the past with zero issues. What is the most likely cause? (3Ware card shows the RAID arrays of the backing stores as being A-OK, the host system hasn't rebooted and nothing in dmesg looks important/unusual) Ignoring brtfs + ext3 (not mature enough to trust) should we make our larger partitions in a different filesystem in the future to avoid either this corruption (whatever the cause) or reduce the fsck time? xfs seems like the obvious candidate?

Read the article

How should I monitor memory usage/performance in SunOS/Solaris?

- by exhuma

Last week we decided to add some SunOS (uname -a = SunOS bbs-sam-belair 5.10 Generic_127128-11 i86pc i386 i86pc) machines into our running munin instance. First off, the machines are pre-configured appliances, so, I want to avoid touching the system too much without supervision of the service provider. But adding it to munin was fairly easy by writing a small socket-service (if anyone is interested, I put it up on github: https://github.com/munin-monitoring/contrib/tree/master/tools/pypmmn) Yesterday, I implemented/adapted the required plugins for our machines. And here the questions start: First, I have not found a way to determine detailed memory usage values. I get the total memory by running prtconf | grep Memory, and the free memory using vmstat. Fiddling together a munin-plugin, gives me the following graph: This is pretty much uninformative. Compare this to the default plugin for linux nodes which has a lot more detail: Most importantly, this shows me how much memory is actually used by applications. So, first question: Is it possible to get detailed memory information on SunOS with the default system tools (i.e. not using top)? Onto the next puzzle: Seeing the graphs, I noticed activity in the "Paging in/out" graphs, even though the memory graph still has unused memory: Upon further investigation, I found out that df reports that /tmp is mounted on swap. Drilling around on the web, I understood that df will display swap, but in fact, it's mounted as a tmpfs. Now I don't know if this explains the swap activity. The default munin-plugin for solaris uses kstat -p -c misc -m cpu_stat to get these values. I find it already strange that this is using the cpu_stat module. So maybe I simply misinterpret the "paging" graphs? Second question: Do the paging graphs indicate that parts of the memory are paged to disk? Or is the activity caused by file operations in /tmp?

Read the article

Unexplained cache RAM drops on Linux machine

- by FunkyChicken

I run a CentOS 5.7 64 machine with 24gb ram and running kernel 2.6.18-274.12.1.el5. This machine runs only Nginx, php-fpm and Xcache as extra applications. Since about 3 weeks my memory behavior on this machine has changed and I cannot explain why. There are no crons running which flush anything like this. There are also no large numbers of files being deleted/changed during these drops. The 'cached' memory gets dropped about every few hours, but it's never a set gap between flushes, this indicates to me that some bottleneck gets reached instead. It also always seems to be when total memory usages gets to about 18GB, but again, not always exactly 18GB. This is a graph of my memory usage: As you can see in the graph the 'buffers' always stay more or less the same, it is mainly the 'cache' that gets dropped. Running vmstat -m I have outputted the memory usage just before and just after a memory drop. The output is here: http://pastebin.com/diff.php?i=hJqZqztm 'old version' being before, 'new version' being after a drop. About 3 weeks ago my server crashed during a heavy DDOS attack, after I rebooted the machine this odd behavior started. I have checked a bunch of logs, restarted the machine again, and cannot find any indication what changed. During these 'cache' memory drops, my iNode usage drops at the same time. Does anyone have any idea what might be causing this behavior? Clearly my RAM isn't full, so I am curious why this could be happening.

Read the article

Website has become slower on a VPS, was much fast on a shared host. What's wrong?

- by Arpit Tambi

My shared host suspended my website stating system overload, so I moved my website to a VPS which has 4GB RAM. But for some reason the website has become very slow. This is the vmstat output - procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 0 3050500 0 0 0 0 0 1 0 0 0 0 100 0 0 Here's the Apache Benchmark output for a STATIC html page I ran on the server itself - Benchmarking www.ask-oracle.com (be patient)...apr_poll: The timeout specified has expired (70007) Total of 20 requests completed Update: Server Config: List item Centos 5.6 4 cores cpu 4 GB RAM LAMP stack with APC Wordpress Only one website It takes almost double time to load now, same website was much fast on shared hosting. I know I need to tweak some settings but have no clue where to start from? I have already tried to optimize apache, mysql etc. Update 2: CPU usage is low, see uptime output: 11:09:02 up 7 days, 21:26, 1 user, load average: 0.09, 0.11, 0.09 Update 3: When I load any webpage, browser shows "Waiting" for a long time and then page loads quickly. So I suspect server can accept only limited connections and holds extra connections in a waiting state. How to check this? Update 4: Following is the output on executing netperf TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain (127.0.0.1) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 9615.40 [root@ip-118-139-177-244 j3ngn5ri6r01t3]# Here are the Apache MPM settings from httpd.conf, do they look okay? <IfModule worker.c> StartServers 5 MaxClients 100 MinSpareThreads 50 MaxSpareThreads 250 ThreadsPerChild 125 MaxRequestsPerChild 10000 ServerLimit 100 </IfModule>

Read the article

MySQL is killing the server IO.

- by OneOfOne

I manage a fairly large/busy vBulletin forums (running on gigenet cloud), the database is ~ 10 GB (~9 milion posts, ~60 queries per second), lately MySQL have been grinding the disk like there's no tomorrow according to iotop and slowing the site. The last idea I can think of is using replication, but I'm not sure how much that would help and worried about database sync. I'm out of ideas, any tips on how to improve the situation would be highly appreciated. Specs : Debian Lenny 64bit ~12Ghz (6 cores) CPU, 7520gb RAM, 160gb disk. Kernel : 2.6.32-4-amd64 mysqld Ver 5.1.54-0.dotdeb.0 for debian-linux-gnu on x86_64 ((Debian)) Other software: vBulletin 3.8.4 memcached 1.2.2 PHP 5.3.5-0.dotdeb.0 (fpm-fcgi) (built: Jan 7 2011 00:07:27) lighttpd/1.4.28 (ssl) - a light and fast webserver PHP and vBulletin are configured to use memcached. MySQL Settings : [mysqld] key_buffer = 128M max_allowed_packet = 16M thread_cache_size = 8 myisam-recover = BACKUP max_connections = 1024 query_cache_limit = 2M query_cache_size = 128M expire_logs_days = 10 max_binlog_size = 100M key_buffer_size = 128M join_buffer_size = 8M tmp_table_size = 16M max_heap_table_size = 16M table_cache = 96 Other : > vmstat procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 9 0 73140 36336 8968 1859160 0 0 42 15 3 2 6 1 89 5 > /etc/init.d/mysql status Threads: 49 Questions: 252139 Slow queries: 164 Opens: 53573 Flush tables: 1 Open tables: 337 Queries per second avg: 61.302. Edit Additional info.

Read the article

Massive number of context switches on ksoftirqd

- by Pace

We have two servers that are grinding to a halt. One is a VM and the other is bare metal. Neither of them are running similar code but they are on the same network. It appears that an incredible number of context switches are arising from ksoftirqd (which is taking up a lot of CPU). vmstat output procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 0 605092 182496 2637556 0 0 0 0 4177 519187 8 19 73 0 0 2 0 0 605092 182496 2637556 0 0 0 0 4792 520980 8 19 74 0 0 3 0 0 605092 182496 2637552 0 0 0 0 2137 659640 18 26 56 0 0 ... pidstat output TCK4-BM-06A:~ # pidstat -w -I 5 Linux 2.6.32.12-0.7-default (TCK4-BM-06A) 07/02/2012 _x86_64_ 03:03:01 PM PID cswch/s nvcswch/s Command 03:03:06 PM 1 0.20 0.00 init 03:03:06 PM 4 386666.27 0.00 ksoftirqd/0 03:03:06 PM 6 0.60 0.00 ksoftirqd/1 03:03:06 PM 8 378213.17 0.00 ksoftirqd/2 03:03:06 PM 10 0.20 0.00 ksoftirqd/3 03:03:06 PM 12 0.20 0.00 ksoftirqd/4 03:03:06 PM 26 377115.37 0.00 ksoftirqd/11 03:03:06 PM 27 1.80 0.00 events/0 03:03:06 PM 28 1.00 0.00 events/1 03:03:06 PM 29 1.00 0.00 events/2 03:03:06 PM 30 1.00 0.00 events/3 03:03:06 PM 31 0.80 0.00 events/4 03:03:06 PM 32 0.80 0.00 events/5 ... My initial thought is that, since both are on the same network, something is flooding the network. Is this consistent with the data?

Read the article

How can I measure actual memory usage from my running processes?

- by NullUser

I have two servers, server1 and server2. Both of them are identical HP blades, running the exact same OS (RHEL 5.5). Here's the output of free for both of them: ### server1: total used free shared buffers cached Mem: 8017848 2746596 5271252 0 212772 1768800 -/+ buffers/cache: 765024 7252824 Swap: 14188536 0 14188536 ### server2: total used free shared buffers cached Mem: 8017848 4494836 3523012 0 212724 3136568 -/+ buffers/cache: 1145544 6872304 Swap: 14188536 0 14188536 If I understand correctly, server2 is using significantly more memory for disk I/O caching, which still counts as memory used. But both are running the same OS and if I remember correctly, I configured both with the same parameters when they were installed. I did a diff on /etc/sysctl.conf and they are identical. The problem is, I am collecting memory usage and other metrics over a period of time, (eg: vmstat, iostat, etc.) while a load is generated on the system. The memory used for caching is throwing off my calculations on the results. How can I measure actual memory usage from my running processes, rather than system usage? Is used - (buffers + cached) a valid way to measure this?

Read the article

How to find out which process is hogging the linux server?

- by user1149518

We have a RHEL server. Today it suddenly became slow. Symptoms - It was responding slow to ping queries from other server. When I try to login using ssh, it was taking about 10 seconds to login. I was able to resolve the problem by doing some guess work. I killed one process which I thought was culprit. Which resolved the problem. Though I would like to know what's proper approach to detect the culprit in such kind of "slow server" situations. Le me know proper way to resolving such slowness issues and decting the process causing the slowness. These were the conditions when the server was slow - # vmstat 3 3 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 1 176 6730868 285052 4899676 0 0 3 4 0 0 1 1 97 1 0 0 0 176 6751576 285064 4899704 0 0 0 115 15307 37171 1 1 96 3 0 0 0 176 6751948 285068 4899700 0 0 0 23 14813 39559 1 1 98 1 0 # top top - 16:38:18 up 150 days, 19:36, 64 users, load average: 1.68, 1.46, 1.44 Tasks: 1287 total, 2 running, 1284 sleeping, 1 stopped, 0 zombie Cpu(s): 1.3%us, 1.7%sy, 0.1%ni, 95.9%id, 0.7%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 16620824k total, 9867124k used, 6753700k free, 287424k buffers Swap: 8193140k total, 176k used, 8192964k free, 4898996k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 26258 khk 34 19 130m 47m 7088 S 11.2 0.3 385:32.42 edm Though I would like to know what's proper approach to detect the culprit in such kind of "slow server" situations. Le me know proper way to resolving such slowness issues and decting the process causing the slowness.

Read the article

unexplainable packet drops with 5 ethernet NICs and low traffic on Ubuntu

- by jon

I'm stuck on problem where my machine started to drops packets with no sign of ANY system load or high interrupt usage after an upgrade to Ubuntu 12.04. My server is a network monitoring sensor, running Ubuntu LTS 12.04, it passively collects packets from 5 interfaces doing network intrusion type stuff. Before the upgrade I managed to collect 200+GB of packets a day while writing them to disk with around 0% packet loss depending on the day with the help of CPU affinity and NIC IRQ to CPU bindings. Now I lose a great deal of packets with none of my applications running and at very low PPS rate which a modern workstation NIC would have no trouble with. Specs: x64 Xeon 4 cores 3.2 Ghz 16 GB RAM NICs: 5 Intel Pro NICs using the e1000 driver (NAPI). [1] eth0 and eth1 are integrated NICs (in the motherboard) There are 2 other PCI-X network cards, each with 2 Ethernet ports. 3 of the interfaces are running at Gigabit Ethernet, the others are not because they're attached to hubs. Specs: [2] http://support.dell.com/support/edocs/systems/pe2850/en/ug/t1390aa.htm uptime 17:36:00 up 1:43, 2 users, load average: 0.00, 0.01, 0.05 # uname -a Linux nms 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux I also have the CPU governor set to performance mode and irqbalance off. The problem still occurs with them on. # lspci -t -vv -[0000:00]-+-00.0 Intel Corporation E7520 Memory Controller Hub +-02.0-[01-03]--+-00.0-[02]----0e.0 Dell PowerEdge Expandable RAID controller 4 | \-00.2-[03]-- +-04.0-[04]-- +-05.0-[05-07]--+-00.0-[06]----07.0 Intel Corporation 82541GI Gigabit Ethernet Controller | \-00.2-[07]----08.0 Intel Corporation 82541GI Gigabit Ethernet Controller +-06.0-[08-0a]--+-00.0-[09]--+-04.0 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) | | \-04.1 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) | \-00.2-[0a]--+-02.0 Digium, Inc. Wildcard TE210P/TE212P dual-span T1/E1/J1 card 3.3V | +-03.0 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) | \-03.1 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) +-1d.0 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 +-1d.1 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 +-1d.2 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 +-1d.7 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller +-1e.0-[0b]----0d.0 Advanced Micro Devices [AMD] nee ATI RV100 QY [Radeon 7000/VE] +-1f.0 Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge \-1f.1 Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller I believe the NIC nor the NIC drivers are dropping the packets because ethtool reports 0 under rx_missed_errors and rx_no_buffer_count for each interface. On the old system, if it couldn't keep up this is where the drops would be. I drop packets on multiple interfaces just about every second, usually in small increments of 2-4. I tried all these sysctl values, I'm currently using the uncommented ones. # cat /etc/sysctl.conf # high net.core.netdev_max_backlog = 3000000 net.core.rmem_max = 16000000 net.core.rmem_default = 8000000 # defaults #net.core.netdev_max_backlog = 1000 #net.core.rmem_max = 131071 #net.core.rmem_default = 163480 # moderate #net.core.netdev_max_backlog = 10000 #net.core.rmem_max = 33554432 #net.core.rmem_default = 33554432 Here's an example of an interface stats report with ethtool. They are all the same, nothing is out of the ordinary ( I think ), so I'm only going to show one: ethtool -S eth2 NIC statistics: rx_packets: 7498 tx_packets: 0 rx_bytes: 2722585 tx_bytes: 0 rx_broadcast: 327 tx_broadcast: 0 rx_multicast: 1504 tx_multicast: 0 rx_errors: 0 tx_errors: 0 tx_dropped: 0 multicast: 1504 collisions: 0 rx_length_errors: 0 rx_over_errors: 0 rx_crc_errors: 0 rx_frame_errors: 0 rx_no_buffer_count: 0 rx_missed_errors: 0 tx_aborted_errors: 0 tx_carrier_errors: 0 tx_fifo_errors: 0 tx_heartbeat_errors: 0 tx_window_errors: 0 tx_abort_late_coll: 0 tx_deferred_ok: 0 tx_single_coll_ok: 0 tx_multi_coll_ok: 0 tx_timeout_count: 0 tx_restart_queue: 0 rx_long_length_errors: 0 rx_short_length_errors: 0 rx_align_errors: 0 tx_tcp_seg_good: 0 tx_tcp_seg_failed: 0 rx_flow_control_xon: 0 rx_flow_control_xoff: 0 tx_flow_control_xon: 0 tx_flow_control_xoff: 0 rx_long_byte_count: 2722585 rx_csum_offload_good: 0 rx_csum_offload_errors: 0 alloc_rx_buff_failed: 0 tx_smbus: 0 rx_smbus: 0 dropped_smbus: 01 # ifconfig eth0 Link encap:Ethernet HWaddr 00:11:43:e0:e2:8c UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:373348 errors:16 dropped:95 overruns:0 frame:16 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:356830572 (356.8 MB) TX bytes:0 (0.0 B) eth1 Link encap:Ethernet HWaddr 00:11:43:e0:e2:8d UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:13616 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:8690528 (8.6 MB) TX bytes:0 (0.0 B) eth2 Link encap:Ethernet HWaddr 00:04:23:e1:77:6a UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:7750 errors:0 dropped:471 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2780935 (2.7 MB) TX bytes:0 (0.0 B) eth3 Link encap:Ethernet HWaddr 00:04:23:e1:77:6b UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:5112 errors:0 dropped:206 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:639472 (639.4 KB) TX bytes:0 (0.0 B) eth4 Link encap:Ethernet HWaddr 00:04:23:b6:35:6c UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:961467 errors:0 dropped:935 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:958561305 (958.5 MB) TX bytes:0 (0.0 B) eth5 Link encap:Ethernet HWaddr 00:04:23:b6:35:6d inet addr:192.168.1.6 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:4264 errors:0 dropped:16 overruns:0 frame:0 TX packets:699 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:572228 (572.2 KB) TX bytes:124456 (124.4 KB) I tried the defaults, then started to play around with settings. I wasn't using any flow control and I increased the RxDescriptor count to 4096 before the upgrade as well without any problems. # cat /etc/modprobe.d/e1000.conf options e1000 XsumRX=0,0,0,0,0 RxDescriptors=4096,4096,4096,4096,4096 FlowControl=0,0,0,0,0 debug=16 Here's my network configuration file, I turned off checksumming and various offloading mechanisms along with setting CPU affinity with heavy use interfaces getting an entire CPU and light use interfaces sharing a CPU. I used these settings prior to the upgrade without problems. # cat /etc/network/interfaces # The loopback network interface auto lo iface lo inet loopback # The primary network interface auto eth0 iface eth0 inet manual pre-up /sbin/ethtool -G eth0 rx 4096 tx 0 pre-up /sbin/ethtool -K eth0 gro off gso off rx off pre-up /sbin/ethtool -A eth0 rx off autoneg off up ifconfig eth0 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "4" > /proc/irq/48/smp_affinity down ifconfig eth0 down post-down /sbin/ethtool -G eth0 rx 256 tx 256 post-down /sbin/ethtool -K eth0 gro on gso on rx on post-down /sbin/ethtool -A eth0 rx on autoneg on auto eth1 iface eth1 inet manual pre-up /sbin/ethtool -G eth1 rx 4096 tx 0 pre-up /sbin/ethtool -K eth1 gro off gso off rx off pre-up /sbin/ethtool -A eth1 rx off autoneg off up ifconfig eth1 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "4" > /proc/irq/49/smp_affinity down ifconfig eth1 down post-down /sbin/ethtool -G eth1 rx 256 tx 256 post-down /sbin/ethtool -K eth1 gro on gso on rx on post-down /sbin/ethtool -A eth1 rx on autoneg on auto eth2 iface eth2 inet manual pre-up /sbin/ethtool -G eth2 rx 4096 tx 0 pre-up /sbin/ethtool -K eth2 gro off gso off rx off pre-up /sbin/ethtool -A eth2 rx off autoneg off up ifconfig eth2 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "1" > /proc/irq/82/smp_affinity down ifconfig eth2 down post-down /sbin/ethtool -G eth2 rx 256 tx 256 post-down /sbin/ethtool -K eth2 gro on gso on rx on post-down /sbin/ethtool -A eth2 rx on autoneg on auto eth3 iface eth3 inet manual pre-up /sbin/ethtool -G eth3 rx 4096 tx 0 pre-up /sbin/ethtool -K eth3 gro off gso off rx off pre-up /sbin/ethtool -A eth3 rx off autoneg off up ifconfig eth3 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "2" > /proc/irq/83/smp_affinity down ifconfig eth3 down post-down /sbin/ethtool -G eth3 rx 256 tx 256 post-down /sbin/ethtool -K eth3 gro on gso on rx on post-down /sbin/ethtool -A eth3 rx on autoneg on auto eth4 iface eth4 inet manual pre-up /sbin/ethtool -G eth4 rx 4096 tx 0 pre-up /sbin/ethtool -K eth4 gro off gso off rx off pre-up /sbin/ethtool -A eth4 rx off autoneg off up ifconfig eth4 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "4" > /proc/irq/77/smp_affinity down ifconfig eth4 down post-down /sbin/ethtool -G eth4 rx 256 tx 256 post-down /sbin/ethtool -K eth4 gro on gso on rx on post-down /sbin/ethtool -A eth4 rx on autoneg on auto eth5 iface eth5 inet static pre-up /etc/fw.conf address 192.168.1.1 netmask 255.255.255.0 broadcast 192.168.1.255 gateway 192.168.1.1 dns-nameservers 192.168.1.2 192.168.1.3 up ifconfig eth5 up post-up echo "8" > /proc/irq/77/smp_affinity down ifconfig eth5 down Here's a few examples of packet drops, i ran one after another, probabling totaling 3 or 4 seconds. You can see increases in the drops from the 1st and 3rd. This was a non-busy time, very little traffic. # awk '{ print $1,$5 }' /proc/net/dev Inter-| face drop eth3: 225 lo: 0 eth2: 505 eth1: 0 eth5: 17 eth0: 105 eth4: 1034 # awk '{ print $1,$5 }' /proc/net/dev Inter-| face drop eth3: 225 lo: 0 eth2: 507 eth1: 0 eth5: 17 eth0: 105 eth4: 1034 # awk '{ print $1,$5 }' /proc/net/dev Inter-| face drop eth3: 227 lo: 0 eth2: 512 eth1: 0 eth5: 17 eth0: 105 eth4: 1039 I tried the pci=noacpi options. With and without, it's the same. This is what my interrupt stats looked like before the upgrade, after, with ACPI on PCI it showed multiple NICs bound to an interrupt and shared with other devices such as USB drives which I didn't like so I think i'm going to keep it with ACPI off as it's easier to designate sole purpose interrupts. Is there any advantage I would have using the default i.e. ACPI w/ PCI. ? # cat /etc/default/grub | grep CMD_LINE GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 noacpi pci=noacpi" GRUB_CMDLINE_LINUX="" # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 0: 45 0 0 16 IO-APIC-edge timer 1: 1 0 0 7936 IO-APIC-edge i8042 2: 0 0 0 0 XT-PIC-XT-PIC cascade 6: 0 0 0 3 IO-APIC-edge floppy 8: 0 0 0 1 IO-APIC-edge rtc0 9: 0 0 0 0 IO-APIC-edge acpi 12: 0 0 0 1809 IO-APIC-edge i8042 14: 1 0 0 4498 IO-APIC-edge ata_piix 15: 0 0 0 0 IO-APIC-edge ata_piix 16: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2 18: 0 0 0 1350 IO-APIC-fasteoi uhci_hcd:usb4, radeon 19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 23: 0 0 0 4099 IO-APIC-fasteoi ehci_hcd:usb1 38: 0 0 0 61963 IO-APIC-fasteoi megaraid 48: 0 0 1002319 4 IO-APIC-fasteoi eth0 49: 0 0 38772 3 IO-APIC-fasteoi eth1 77: 0 0 130076 432159 IO-APIC-fasteoi eth4 78: 0 0 0 23917 IO-APIC-fasteoi eth5 82: 1329033 0 0 4 IO-APIC-fasteoi eth2 83: 0 4886525 0 6 IO-APIC-fasteoi eth3 NMI: 5 6 4 5 Non-maskable interrupts LOC: 61409 57076 64257 114764 Local timer interrupts SPU: 0 0 0 0 Spurious interrupts IWI: 0 0 0 0 IRQ work interrupts RES: 17956 25333 13436 14789 Rescheduling interrupts CAL: 22436 607 539 478 Function call interrupts TLB: 1525 1458 4600 4151 TLB shootdowns TRM: 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 Machine check exceptions MCP: 16 16 16 16 Machine check polls ERR: 0 MIS: 0 Here's sample output of vmstat, showing the system. Barebones system right now. root@nms:~# vmstat -S m 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 14992 192 1029 0 0 56 2 419 29 1 0 99 0 0 0 0 14992 192 1029 0 0 0 0 922 27 0 0 100 0 0 0 0 14991 192 1029 0 0 0 36 763 50 0 0 100 0 0 0 0 14991 192 1029 0 0 0 0 646 35 0 0 100 0 0 0 0 14991 192 1029 0 0 0 0 722 54 0 0 100 0 0 0 0 14991 192 1029 0 0 0 0 793 27 0 0 100 0 ^C Here's dmesg output. I can't figure out why my PCI-X slots are negotiated as PCI. The network cards are all PCI-X with the exception of the integrated NICs that came with the server. In the output below it looks as if eth3 and eth2 negotiated at PCI-X speeds rather than PCI:66Mhz. Wouldn't they all drop to PCI:66Mhz? If your integrated NICs are PCI, as labeled below (eth0,eth1), then wouldn't all devices on your bus speed drop down to that slower bus speed? If not, I still don't know why only one of my NICs ( each has two ethernet ports) is labeled as PCI-X in the output below. Does that mean it is running at PCI-X speeds are is it showing that it's capable? # dmesg | grep e1000 [ 3678.349337] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI [ 3678.349342] e1000: Copyright (c) 1999-2006 Intel Corporation. [ 3678.349394] e1000 0000:06:07.0: PCI->APIC IRQ transform: INT A -> IRQ 48 [ 3678.409725] e1000 0000:06:07.0: Receive Descriptors set to 4096 [ 3678.409730] e1000 0000:06:07.0: Checksum Offload Disabled [ 3678.409734] e1000 0000:06:07.0: Flow Control Disabled [ 3678.586409] e1000 0000:06:07.0: eth0: (PCI:66MHz:32-bit) 00:11:43:e0:e2:8c [ 3678.586419] e1000 0000:06:07.0: eth0: Intel(R) PRO/1000 Network Connection [ 3678.586642] e1000 0000:07:08.0: PCI->APIC IRQ transform: INT A -> IRQ 49 [ 3678.649854] e1000 0000:07:08.0: Receive Descriptors set to 4096 [ 3678.649859] e1000 0000:07:08.0: Checksum Offload Disabled [ 3678.649863] e1000 0000:07:08.0: Flow Control Disabled [ 3678.826436] e1000 0000:07:08.0: eth1: (PCI:66MHz:32-bit) 00:11:43:e0:e2:8d [ 3678.826444] e1000 0000:07:08.0: eth1: Intel(R) PRO/1000 Network Connection [ 3678.826627] e1000 0000:09:04.0: PCI->APIC IRQ transform: INT A -> IRQ 82 [ 3679.093266] e1000 0000:09:04.0: Receive Descriptors set to 4096 [ 3679.093271] e1000 0000:09:04.0: Checksum Offload Disabled [ 3679.093275] e1000 0000:09:04.0: Flow Control Disabled [ 3679.130239] e1000 0000:09:04.0: eth2: (PCI-X:133MHz:64-bit) 00:04:23:e1:77:6a [ 3679.130246] e1000 0000:09:04.0: eth2: Intel(R) PRO/1000 Network Connection [ 3679.130449] e1000 0000:09:04.1: PCI->APIC IRQ transform: INT B -> IRQ 83 [ 3679.397312] e1000 0000:09:04.1: Receive Descriptors set to 4096 [ 3679.397318] e1000 0000:09:04.1: Checksum Offload Disabled [ 3679.397321] e1000 0000:09:04.1: Flow Control Disabled [ 3679.434350] e1000 0000:09:04.1: eth3: (PCI-X:133MHz:64-bit) 00:04:23:e1:77:6b [ 3679.434360] e1000 0000:09:04.1: eth3: Intel(R) PRO/1000 Network Connection [ 3679.434553] e1000 0000:0a:03.0: PCI->APIC IRQ transform: INT A -> IRQ 77 [ 3679.704072] e1000 0000:0a:03.0: Receive Descriptors set to 4096 [ 3679.704077] e1000 0000:0a:03.0: Checksum Offload Disabled [ 3679.704081] e1000 0000:0a:03.0: Flow Control Disabled [ 3679.738364] e1000 0000:0a:03.0: eth4: (PCI:33MHz:64-bit) 00:04:23:b6:35:6c [ 3679.738371] e1000 0000:0a:03.0: eth4: Intel(R) PRO/1000 Network Connection [ 3679.738538] e1000 0000:0a:03.1: PCI->APIC IRQ transform: INT B -> IRQ 78 [ 3680.046060] e1000 0000:0a:03.1: eth5: (PCI:33MHz:64-bit) 00:04:23:b6:35:6d [ 3680.046067] e1000 0000:0a:03.1: eth5: Intel(R) PRO/1000 Network Connection [ 3682.132415] e1000: eth0 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None [ 3682.224423] e1000: eth1 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None [ 3682.316385] e1000: eth2 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None [ 3682.408391] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None [ 3682.500396] e1000: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None [ 3682.708401] e1000: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX At first I thought it was the NIC drivers but I'm not so sure. I really have no idea where else to look at the moment. Any help is greatly appreciated as I'm struggling with this. If you need more information just ask. Thanks! [1]http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/Documentation/networking/e1000.txt?v=2.6.11.8 [2] http://support.dell.com/support/edocs/systems/pe2850/en/ug/t1390aa.htm

Read the article

Free RAM disappears - Memory leak?

- by Izzy

On a fresh started system, free reports about 1.5G used RAM (8G RAM alltogether, Ubuntu 12.04 with lightdm and plasma desktop, one konsole window started). Having the apps running I use, it still consumes not more than 2G. However, having the system running for a couple of days, more and more of my free RAM disappears -- without showing up in the list of used apps: while smem --pie=name reports less than 20% used (and 80% being available), everything else says differently. free -m for example reports on about day 7: total used free shared buffers cached Mem: 7459 7013 446 0 178 997 -/+ buffers/cache: 5836 1623 Swap: 9536 296 9240 (so you can see, it's not the buffers or the cache). Today this finally ended with the system crashing completely: the windows manager being gone, apps "hanging in the air" (frameless) -- and a popup notifying me about "too many open files". Syslog reports: kernel: [856738.020829] VFS: file-max limit 752838 reached So I closed those applications I was able to close, and killed X using Ctrl-Alt-backspace. X tried to come up again after that with failsafeX, but was unable to do so as it could no longer detect its configuration. So I switched to a console using Ctrl-Alt-F2, captured all information I could think of (vmstat, free, smem, proc/meminfo, lsof, ps aux), and finally rebooted. X again came up with failsafeX; this time I told it to "recover from my backed-up configuration", then switched to a console and successfully used startx to bring up the graphical environment. I have no real clue to what is causing this issue -- though it must have to do either with X itself, or with some user processes running on X -- as after killing X, free -m output looked like this: total used free shared buffers cached Mem: 7459 2677 4781 0 62 419 -/+ buffers/cache: 2195 5263 Swap: 9536 59 9477 (~3.5GB being freed) -- to compare with the output after a fresh start: total used free shared buffers cached Mem: 7459 1483 5975 0 63 730 -/+ buffers/cache: 689 6769 Swap: 9536 0 9536 Two more helpful outputs are provided by memstat -u. Shortly before the crash: User Count Swap USS PSS RSS mail 1 0 200 207 616 whoopsie 1 764 740 817 2300 colord 1 3200 836 894 2156 root 62 70404 352996 382260 569920 izzy 80 177508 1465416 1519266 1851840 After having X killed: User Count Swap USS PSS RSS mail 1 0 184 188 356 izzy 1 1400 708 739 1080 whoopsie 1 848 668 826 1772 colord 1 3204 804 888 1728 root 62 54876 131708 149950 267860 And after a restart, back in X: User Count Swap USS PSS RSS mail 1 0 212 217 628 whoopsie 1 0 1536 1880 5096 colord 1 0 3740 4217 7936 root 54 0 148668 180911 345132 izzy 47 0 370928 437562 915056 Edit: Just added two graphs from my monitoring system. Interesting to see: everytime when there's a "jump" in memory consumption, CPU peaks as well. Just found this right now -- and it reminds me of another indicator pointing to X itself: Often when returning to my machine and unlocking the screen, I found something doing heavvy work on my CPU. Checking with top, it always turned out to be /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch -background none. So after this long explanation, finally my questions: What could be the possible causes? How can I better identify involved processes/applications? What steps could be taken to avoid this behaviour -- short from rebooting the machine all X days? I was running 8.04 (Hardy) for about 5 years on my old machine, never having experienced the like (always more than 100 days uptime, before rebooting for e.g. kernel updates). This now is a complete new machine with a fresh install of 8.04. In case it matters, some specs: AMD A4-3400 APU with Radeon(tm) HD Graphics, using the open-source ati/radeon driver (so no fglrx installed), 8GB RAM, WDC WD1002FAEX-0 hdd (1TB), Asus F1A75-V Evo mainboard. Ubuntu 12.04 64-bit with KDE4/Plasma. Apps usually open more or less permanently include Evolution, Firefox, konsole (with Midnight Commander running inside, about 4 tabs), and LibreOffice -- plus occasionally Calibre, Gimp and Moneyplex (banking software I'm already using for almost 20 years now, in a version which did fine on Hardy).

Read the article

Apache and MySQL taking all the memory? Maximum connections?

- by lpfavreau

I've a had one of our servers going down (network wise) but keeping its uptime (so looks the server is not losing its power) recently. I've asked my hosting company to investigate and I've been told, after investigation, that Apache and MySQL were at all time using 80% of the memory and peaking at 95% and that I might be needing to add some more RAM to the server. One of their justifications to adding more RAM was that I was using the default max connections setting (125 for MySQL and 150 for Apache) and that for handling those 150 simultaneous connections, I would need at least 3Gb of memory instead of the 1Gb I have at the moment. Now, I understand that tweaking the max connections might be better than me leaving the default setting although I didn't feel it was a concern at the moment, having had servers with the same configuration handle more traffic than the current 1 or 2 visitors before the lunch, telling myself I'd tweak it depending on the visits pattern later. I've also always known Apache was more memory hungry under default settings than its competitor such as nginx and lighttpd. Nonetheless, looking at the stats of my machine, I'm trying to see how my hosting company got those numbers. I'm getting: # free -m total used free shared buffers cached Mem: 1000 944 56 0 148 725 -/+ buffers/cache: 71 929 Swap: 1953 0 1953 Which I guess means that yes, the server is reserving around 95% of its memory at the moment but I also thought it meant that only 71 out of the 1000 total were really used by the applications at the moment looking a the buffers/cache row. Also I don't see any swapping: # vmstat 60 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 57612 151704 742596 0 0 1 1 3 11 0 0 100 0 0 0 0 57604 151704 742596 0 0 0 1 1 24 0 0 100 0 0 0 0 57604 151704 742596 0 0 0 2 1 18 0 0 100 0 0 0 0 57604 151704 742596 0 0 0 0 1 13 0 0 100 0 And finally, while requesting a page: top - 08:33:19 up 3 days, 13:11, 2 users, load average: 0.06, 0.02, 0.00 Tasks: 81 total, 1 running, 80 sleeping, 0 stopped, 0 zombie Cpu(s): 1.3%us, 0.3%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1024616k total, 976744k used, 47872k free, 151716k buffers Swap: 2000052k total, 0k used, 2000052k free, 742596k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24914 www-data 20 0 26296 8640 3724 S 2 0.8 0:00.06 apache2 23785 mysql 20 0 125m 18m 5268 S 1 1.9 0:04.54 mysqld 24491 www-data 20 0 25828 7488 3180 S 1 0.7 0:00.02 apache2 1 root 20 0 2844 1688 544 S 0 0.2 0:01.30 init ... So, I'd like to know, experts of serverfault: Do I really need more RAM at the moment? How do they calculate that for 150 simultaneous connections I'd need 3Gb? Thanks for your help!

Read the article

Ubuntu Server 12.04 CPU Load

- by zertux

I have a Server (2x Hexa-Core Xeon E5649 2.53GHz w/HT with 32GB RAM and 20000 GB Bandwidth) running Ubuntu Server 12.04 LTS. The server runs LAMP and serves one website only, the estimated number of users is to be ~ 15,000 at the same time. At the moment i have around 2000 users online each of them runs 50 MySQL queries (small values mostly select and insert) from the beginning until the end of the session. Server CPU Load is high at this number of connections while the RAM usage is almost 1GB out of 32GB its worth mentioning that the server was running very fast with no problems at all but am concerned about the load average. http://s12.postimage.org/z7hi6mz3h/photo.png top - 03:02:43 up 9 min, 2 users, load average: 50.83, 30.14, 12.83 Tasks: 432 total, 1 running, 430 sleeping, 0 stopped, 1 zombie Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 66.5%id, 33.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 32939992k total, 3111604k used, 29828388k free, 84108k buffers Swap: 2048280k total, 0k used, 2048280k free, 1621640k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2860 root 20 0 25820 2288 1420 S 3 0.0 0:11.18 htop 1182 root 20 0 0 0 0 D 2 0.0 0:01.46 kjournald 1935 mysql 20 0 12.3g 161m 7924 S 1 0.5 102:31.45 mysqld 11 root 20 0 0 0 0 S 0 0.0 0:00.38 kworker/0:1 1822 www-data 20 0 247m 25m 4188 D 0 0.1 0:01.81 apache2 2920 www-data 20 0 0 0 0 Z 0 0.0 0:01.20 apache2 <defunct> 2942 www-data 20 0 247m 23m 3056 D 0 0.1 0:00.20 apache2 3516 www-data 20 0 247m 23m 3028 D 0 0.1 0:00.06 apache2 3521 www-data 20 0 247m 23m 3020 D 0 0.1 0:00.09 apache2 3664 www-data 20 0 247m 23m 3132 D 0 0.1 0:00.09 apache2 3674 www-data 20 0 247m 23m 3252 D 0 0.1 0:00.06 apache2 3713 www-data 20 0 247m 23m 3040 D 0 0.1 0:00.09 apache2 1 root 20 0 24328 2284 1344 S 0 0.0 0:03.09 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0 root@server:~/codes# vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 19 0 0 29684012 86112 1689844 0 0 19 590 254 231 48 0 47 5 23 0 0 29704812 86128 1697672 0 0 4 320 11100 8121 77 1 22 0 33 0 0 29671044 86156 1705308 0 0 0 5440 13190 9140 95 1 4 0 33 3 0 29670088 86160 1706288 0 0 0 32932 12275 7297 99 0 1 0 35 0 0 29693456 86188 1710724 0 0 4 676 12701 7867 98 1 1 0 ^C I have not changed any of the default configurations that comes with Ubuntu. Is this load normal for such powerful server ? is there any optimization i can make to Apache/MySQL to minimize the load ? What do you recommend ?

Read the article

Why is my concurrency capacity so low for my web app on a LAMP EC2 instance?

- by AMF

I come from a web developer background and have been humming along building my PHP app, using the CakePHP framework. The problem arose when I began the ab (Apache Bench) testing on the Amazon EC2 instance in which the app resides. I'm getting pretty horrendous average page load times, even though I'm running a c1.medium instance (2 cores, 2GB RAM), and I think I'm doing everything right. I would run: ab -n 200 -c 20 http://localhost/heavy-but-view-cached-page.php Here are the results: Concurrency Level: 20 Time taken for tests: 48.197 seconds Complete requests: 200 Failed requests: 0 Write errors: 0 Total transferred: 392111200 bytes HTML transferred: 392047600 bytes Requests per second: 4.15 [#/sec] (mean) Time per request: 4819.723 [ms] (mean) Time per request: 240.986 [ms] (mean, across all concurrent requests) Transfer rate: 7944.88 [Kbytes/sec] received While the ab test is running, I run VMStat, which shows that Swap stays at 0, CPU is constantly at 80-100% (although I'm not sure I can trust this on a VM), RAM utilization ramps up to about 1.6G (leaving 400M free). Load goes up to about 8 and site slows to a crawl. Here's what I think I'm doing right on the code side: In Chrome browser uncached pages typically load in 800-1000ms, and cached pages load in 300-500ms. Not stunning, but not terrible either. Thanks to view caching, there might be at most one DB query per page-load to write session data. So we can rule out a DB bottleneck. I have APC on. I am using Memcached to serve the view cache and other site caches. xhprof code profiler shows that cached pages take up 10MB-40MB in memory and 100ms - 1000ms in wall time. Pages that would be the worst offenders would look something like this in xhprof: Total Incl. Wall Time (microsec): 330,143 microsecs Total Incl. CPU (microsecs): 320,019 microsecs Total Incl. MemUse (bytes): 36,786,192 bytes Total Incl. PeakMemUse (bytes): 46,667,008 bytes Number of Function Calls: 5,195 My Apache config: KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 3 <IfModule mpm_prefork_module> StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 120 MaxRequestsPerChild 1000 </IfModule> Is there something wrong with the server? Some gotcha with the EC2? Or is it my code? Some obvious setting I should look into? Too many DNS lookups? What am I missing? I really want to get to 1,000 concurrency capacity, but at this rate, it ain't gonna happen.

Read the article

Rkhunter 122 suspect files; do I have a problem?

- by user276166

I am new to ubuntu. I am using Xfce Ubuntu 14.04 LTS. I have ran rkhunter a few weeks age and only got a few warnings. The forum said that they were normal. But, this time rkhunter reported 122 warnings. Please advise. casey@Shaman:~$ sudo rkhunter -c [ Rootkit Hunter version 1.4.0 ] Checking system commands... Performing 'strings' command checks Checking 'strings' command [ OK ] Performing 'shared libraries' checks Checking for preloading variables [ None found ] Checking for preloaded libraries [ None found ] Checking LD_LIBRARY_PATH variable [ Not found ] Performing file properties checks Checking for prerequisites [ Warning ] /usr/sbin/adduser [ Warning ] /usr/sbin/chroot [ Warning ] /usr/sbin/cron [ OK ] /usr/sbin/groupadd [ Warning ] /usr/sbin/groupdel [ Warning ] /usr/sbin/groupmod [ Warning ] /usr/sbin/grpck [ Warning ] /usr/sbin/nologin [ Warning ] /usr/sbin/pwck [ Warning ] /usr/sbin/rsyslogd [ Warning ] /usr/sbin/useradd [ Warning ] /usr/sbin/userdel [ Warning ] /usr/sbin/usermod [ Warning ] /usr/sbin/vipw [ Warning ] /usr/bin/awk [ Warning ] /usr/bin/basename [ Warning ] /usr/bin/chattr [ Warning ] /usr/bin/cut [ Warning ] /usr/bin/diff [ Warning ] /usr/bin/dirname [ Warning ] /usr/bin/dpkg [ Warning ] /usr/bin/dpkg-query [ Warning ] /usr/bin/du [ Warning ] /usr/bin/env [ Warning ] /usr/bin/file [ Warning ] /usr/bin/find [ Warning ] /usr/bin/GET [ Warning ] /usr/bin/groups [ Warning ] /usr/bin/head [ Warning ] /usr/bin/id [ Warning ] /usr/bin/killall [ OK ] /usr/bin/last [ Warning ] /usr/bin/lastlog [ Warning ] /usr/bin/ldd [ Warning ] /usr/bin/less [ OK ] /usr/bin/locate [ OK ] /usr/bin/logger [ Warning ] /usr/bin/lsattr [ Warning ] /usr/bin/lsof [ OK ] /usr/bin/mail [ OK ] /usr/bin/md5sum [ Warning ] /usr/bin/mlocate [ OK ] /usr/bin/newgrp [ Warning ] /usr/bin/passwd [ Warning ] /usr/bin/perl [ Warning ] /usr/bin/pgrep [ Warning ] /usr/bin/pkill [ Warning ] /usr/bin/pstree [ OK ] /usr/bin/rkhunter [ OK ] /usr/bin/rpm [ Warning ] /usr/bin/runcon [ Warning ] /usr/bin/sha1sum [ Warning ] /usr/bin/sha224sum [ Warning ] /usr/bin/sha256sum [ Warning ] /usr/bin/sha384sum [ Warning ] /usr/bin/sha512sum [ Warning ] /usr/bin/size [ Warning ] /usr/bin/sort [ Warning ] /usr/bin/stat [ Warning ] /usr/bin/strace [ Warning ] /usr/bin/strings [ Warning ] /usr/bin/sudo [ Warning ] /usr/bin/tail [ Warning ] /usr/bin/test [ Warning ] /usr/bin/top [ Warning ] /usr/bin/touch [ Warning ] /usr/bin/tr [ Warning ] /usr/bin/uniq [ Warning ] /usr/bin/users [ Warning ] /usr/bin/vmstat [ Warning ] /usr/bin/w [ Warning ] /usr/bin/watch [ Warning ] /usr/bin/wc [ Warning ] /usr/bin/wget [ Warning ] /usr/bin/whatis [ Warning ] /usr/bin/whereis [ Warning ] /usr/bin/which [ OK ] /usr/bin/who [ Warning ] /usr/bin/whoami [ Warning ] /usr/bin/unhide.rb [ Warning ] /usr/bin/mawk [ Warning ] /usr/bin/lwp-request [ Warning ] /usr/bin/heirloom-mailx [ OK ] /usr/bin/w.procps [ Warning ] /sbin/depmod [ Warning ] /sbin/fsck [ Warning ] /sbin/ifconfig [ Warning ] /sbin/ifdown [ Warning ] /sbin/ifup [ Warning ] /sbin/init [ Warning ] /sbin/insmod [ Warning ] /sbin/ip [ Warning ] /sbin/lsmod [ Warning ] /sbin/modinfo [ Warning ] /sbin/modprobe [ Warning ] /sbin/rmmod [ Warning ] /sbin/route [ Warning ] /sbin/runlevel [ Warning ] /sbin/sulogin [ Warning ] /sbin/sysctl [ Warning ] /bin/bash [ Warning ] /bin/cat [ Warning ] /bin/chmod [ Warning ] /bin/chown [ Warning ] /bin/cp [ Warning ] /bin/date [ Warning ] /bin/df [ Warning ] /bin/dmesg [ Warning ] /bin/echo [ Warning ] /bin/ed [ OK ] /bin/egrep [ Warning ] /bin/fgrep [ Warning ] /bin/fuser [ OK ] /bin/grep [ Warning ] /bin/ip [ Warning ] /bin/kill [ Warning ] /bin/less [ OK ] /bin/login [ Warning ] /bin/ls [ Warning ] /bin/lsmod [ Warning ] /bin/mktemp [ Warning ] /bin/more [ Warning ] /bin/mount [ Warning ] /bin/mv [ Warning ] /bin/netstat [ Warning ] /bin/ping [ Warning ] /bin/ps [ Warning ] /bin/pwd [ Warning ] /bin/readlink [ Warning ] /bin/sed [ Warning ] /bin/sh [ Warning ] /bin/su [ Warning ] /bin/touch [ Warning ] /bin/uname [ Warning ] /bin/which [ OK ] /bin/kmod [ Warning ] /bin/dash [ Warning ] [Press <ENTER> to continue] Checking for rootkits... Performing check of known rootkit files and directories 55808 Trojan - Variant A [ Not found ] ADM Worm [ Not found ] AjaKit Rootkit [ Not found ] Adore Rootkit [ Not found ] aPa Kit [ Not found ] Apache Worm [ Not found ] Ambient (ark) Rootkit [ Not found ] Balaur Rootkit [ Not found ] BeastKit Rootkit [ Not found ] beX2 Rootkit [ Not found ] BOBKit Rootkit [ Not found ] cb Rootkit [ Not found ] CiNIK Worm (Slapper.B variant) [ Not found ] Danny-Boy's Abuse Kit [ Not found ] Devil RootKit [ Not found ] Dica-Kit Rootkit [ Not found ] Dreams Rootkit [ Not found ] Duarawkz Rootkit [ Not found ] Enye LKM [ Not found ] Flea Linux Rootkit [ Not found ] Fu Rootkit [ Not found ] Fuck`it Rootkit [ Not found ] GasKit Rootkit [ Not found ] Heroin LKM [ Not found ] HjC Kit [ Not found ] ignoKit Rootkit [ Not found ] IntoXonia-NG Rootkit [ Not found ] Irix Rootkit [ Not found ] Jynx Rootkit [ Not found ] KBeast Rootkit [ Not found ] Kitko Rootkit [ Not found ] Knark Rootkit [ Not found ] ld-linuxv.so Rootkit [ Not found ] Li0n Worm [ Not found ] Lockit / LJK2 Rootkit [ Not found ] Mood-NT Rootkit [ Not found ] MRK Rootkit [ Not found ] Ni0 Rootkit [ Not found ] Ohhara Rootkit [ Not found ] Optic Kit (Tux) Worm [ Not found ] Oz Rootkit [ Not found ] Phalanx Rootkit [ Not found ] Phalanx2 Rootkit [ Not found ] Phalanx2 Rootkit (extended tests) [ Not found ] Portacelo Rootkit [ Not found ] R3dstorm Toolkit [ Not found ] RH-Sharpe's Rootkit [ Not found ] RSHA's Rootkit [ Not found ] Scalper Worm [ Not found ] Sebek LKM [ Not found ] Shutdown Rootkit [ Not found ] SHV4 Rootkit [ Not found ] SHV5 Rootkit [ Not found ] Sin Rootkit [ Not found ] Slapper Worm [ Not found ] Sneakin Rootkit [ Not found ] 'Spanish' Rootkit [ Not found ] Suckit Rootkit [ Not found ] Superkit Rootkit [ Not found ] TBD (Telnet BackDoor) [ Not found ] TeLeKiT Rootkit [ Not found ] T0rn Rootkit [ Not found ] trNkit Rootkit [ Not found ] Trojanit Kit [ Not found ] Tuxtendo Rootkit [ Not found ] URK Rootkit [ Not found ] Vampire Rootkit [ Not found ] VcKit Rootkit [ Not found ] Volc Rootkit [ Not found ] Xzibit Rootkit [ Not found ] zaRwT.KiT Rootkit [ Not found ] ZK Rootkit [ Not found ] [Press <ENTER> to continue] Performing additional rootkit checks Suckit Rookit additional checks [ OK ] Checking for possible rootkit files and directories [ None found ] Checking for possible rootkit strings [ None found ] Performing malware checks Checking running processes for suspicious files [ None found ] Checking for login backdoors [ None found ] Checking for suspicious directories [ None found ] Checking for sniffer log files [ None found ] Performing Linux specific checks Checking loaded kernel modules [ OK ] Checking kernel module names [ OK ] [Press <ENTER> to continue] Checking the network... Performing checks on the network ports Checking for backdoor ports [ None found ] Checking for hidden ports [ Skipped ] Performing checks on the network interfaces Checking for promiscuous interfaces [ None found ] Checking the local host... Performing system boot checks Checking for local host name [ Found ] Checking for system startup files [ Found ] Checking system startup files for malware [ None found ] Performing group and account checks Checking for passwd file [ Found ] Checking for root equivalent (UID 0) accounts [ None found ] Checking for passwordless accounts [ None found ] Checking for passwd file changes [ Warning ] Checking for group file changes [ Warning ] Checking root account shell history files [ None found ] Performing system configuration file checks Checking for SSH configuration file [ Not found ] Checking for running syslog daemon [ Found ] Checking for syslog configuration file [ Found ] Checking if syslog remote logging is allowed [ Not allowed ] Performing filesystem checks Checking /dev for suspicious file types [ Warning ] Checking for hidden files and directories [ Warning ] [Press <ENTER> to continue] System checks summary ===================== File properties checks... Required commands check failed Files checked: 137 Suspect files: 122 Rootkit checks... Rootkits checked : 291 Possible rootkits: 0 Applications checks... All checks skipped The system checks took: 5 minutes and 11 seconds All results have been written to the log file (/var/log/rkhunter.log)

Read the article

Performance triage

- by Dave

Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general). A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlying your particular layer. On the topic of confounding factors, as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore as you have to remember to turn it off and then back on otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. In general for performance characterization I disable turbo mode and fix the power governor at "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Lets say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem). At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally if there's a key performance issue, you'll spot it with a moderate number of samples. I'll also use OS-level observability tools to lock for the existence of bottlenecks where threads contend for locks; other situations where threads are blocked; and the distribution of threads over the system. On Solaris some good tools are mpstat and too a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. One key measure is the voluntary context switch rate "vctx" or "csw" which reflects threads descheduling themselves. It's also good to look at the user; system; and idle CPU percentages. This can give a broad but useful understanding if your threads are mostly parked or mostly running. For instance if your program makes heavy use of malloc/free, then it might be the case you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as my the 1st step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.

Read the article

Determining cause of high NFS/IO utilization without iotop

- by Matt

I have a server that is doing an NFSv4 export for user's home directories. There are roughly 25 users (mostly developers/analysts) and about 40 servers mounting the home directory export. Performance is miserable, with users often seeing multi-second lags for simple commands (like ls, or writing a small text file). Sometimes the home directory mount completely hangs for minutes, with users getting "permission denied" errors. The hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. What I find curious, and suspicious, is that the sda device often has very high utilization, even though it only contains the OS. I would expect this virtual drive to be idle almost all the time. The system is not swapping, according to "free" and "vmstat". Why would there be major load on this device? Here is a 30-second snapshot from iostat: Time: 09:37:28 AM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sdb 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 sdb1 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 dm-0 0.00 0.00 0.03 151.82 0.13 607.26 8.00 1.25 8.23 5.16 78.35 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 dm-3 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 Looks like iotop is the ideal tool to use to sniff out these kinds of issues. But I'm on CentOS 5.6, which doesn't have a new enough kernel to support that program. I looked at Determining which process is causing heavy disk I/O?, and besides iotop, one of the suggestions said to do a "echo 1 /proc/sys/vm/block_dump". I did that (after directing kernel messages to tempfs). In about 13 minutes I had about 700k reads or writes, roughly half from kjournald and the other half from nfsd: # egrep " kernel: .*(READ|WRITE)" messages | wc -l 768439 # egrep " kernel: kjournald.*(READ|WRITE)" messages | wc -l 403615 # egrep " kernel: nfsd.*(READ|WRITE)" messages | wc -l 314028 For what it's worth, for the last hour, utilization has constantly been over 90% for the home directory drive. My 30-second iostat keeps showing output like this: Time: 09:36:30 PM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sdb 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 sdb1 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 dm-0 0.00 0.00 0.20 17.76 0.80 71.04 8.00 0.38 21.21 9.22 16.57 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.81 dm-3 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.82

Read the article

Munin not creating HTML files in Ubuntu Server 14.04

- by lepe

I have used munin in several servers and this is the first time is taking me so much time to set it up. When I telnet munin directly, I can list the services, there is no error at the logs and munin its being updated every 5 minutes. However no html files are created. I'm using the default location (/var/cache/munin/www) and I can confirm the permissions of that directory are set to munin.munin (IP and domain has been changed) munin.conf: dbdir /var/lib/munin htmldir /var/cache/munin/www logdir /var/log/munin rundir /var/run/munin [example.com;] address 100.100.50.200 munin-node.conf: log_level 4 log_file /var/log/munin/munin-node.log pid_file /var/run/munin/munin-node.pid background 1 setsid 1 user root group root host_name example.com allow ^127\.0\.0\.1$ allow ^100\.100\.50\.200$ allow ^::1$ /etc/hosts : 100.100.50.200 example.com 127.0.0.1 localhost $ telnet example.com 4949 Trying 100.100.50.200... Connected to example.com. Escape character is '^]'. # munin node at example.com list apache_accesses apache_processes apache_volume cpu cpuspeed df df_inode entropy fail2ban forks fw_packets if_err_eth0 if_err_eth1 if_eth0 if_eth1 interrupts ipmi_fans ipmi_power ipmi_temp irqstats load memory munin_stats mysql_bin_relay_log mysql_commands mysql_connections mysql_files_tables mysql_innodb_bpool mysql_innodb_bpool_act mysql_innodb_insert_buf mysql_innodb_io mysql_innodb_io_pend mysql_innodb_log mysql_innodb_rows mysql_innodb_semaphores mysql_innodb_tnx mysql_myisam_indexes mysql_network_traffic mysql_qcache mysql_qcache_mem mysql_replication mysql_select_types mysql_slow mysql_sorts mysql_table_locks mysql_tmp_tables ntp_2001:e40:100:208::123 ntp_91.189.94.4 ntp_kernel_err ntp_kernel_pll_freq ntp_kernel_pll_off ntp_offset ntp_states open_files open_inodes postfix_mailqueue postfix_mailvolume proc_pri processes swap threads uptime users vmstat fetch df _dev_sda3.value 2.1762874086869 _sys_fs_cgroup.value 0 _run.value 0.0503536980635825 _run_lock.value 0 _run_shm.value 0 _run_user.value 0 _dev_sda5.value 0.0176986285727571 _dev_sda8.value 1.08464646179852 _dev_sda7.value 0.0346633563514803 _dev_sda9.value 6.81031810822797 _dev_sda6.value 9.0932802215469 . /var/log/munin/munin-node.log Process Backgrounded 2014/08/16-14:13:36 Munin::Node::Server (type Net::Server::Fork) starting! pid(19610) Binding to TCP port 4949 on host 100.100.50.200 with IPv4 2014/08/16-14:23:11 CONNECT TCP Peer: "[100.100.50.200]:55949" Local: "[100.100.50.200]:4949" 2014/08/16-14:36:16 CONNECT TCP Peer: "[100.100.50.200]:56209" Local: "[100.100.50.200]:4949" /var/log/munin/munin-update.log ... 2014/08/16 14:30:01 [INFO]: Starting munin-update 2014/08/16 14:30:01 [INFO]: Munin-update finished (0.00 sec) 2014/08/16 14:35:02 [INFO]: Starting munin-update 2014/08/16 14:35:02 [INFO]: Munin-update finished (0.00 sec) 2014/08/16 14:40:01 [INFO]: Starting munin-update 2014/08/16 14:40:01 [INFO]: Munin-update finished (0.00 sec) $ ls -la /var/cache/munin/www/ drwxr-xr-x 3 munin munin 19 Aug 16 13:55 . drwxr-xr-x 3 root root 16 Aug 16 13:54 .. drwxr-xr-x 2 munin munin 4096 Aug 16 13:55 static Any ideas on why it is not working? EDIT This is how /var/log/munin/ log looks like after some days: -rw-r----- 1 www-data 0 Aug 16 13:54 munin-cgi-graph.log -rw-r----- 1 www-data 0 Aug 16 13:54 munin-cgi-html.log -rw-rw-r-- 1 munin 0 Aug 16 13:55 munin-html.log -rw-r----- 1 munin 0 Aug 19 06:18 munin-limits.log -rw-r----- 1 munin 15K Aug 18 14:10 munin-limits.log.1 -rw-r----- 1 munin 1.8K Aug 18 06:15 munin-limits.log.2.gz -rw-rw-r-- 1 munin 1.3K Aug 17 06:15 munin-limits.log.3.gz -rw-r--r-- 1 root 6.5K Aug 16 13:55 munin-node-configure.log -rw-r--r-- 1 root 0 Aug 17 06:18 munin-node.log -rw-r--r-- 1 root 420 Aug 16 14:52 munin-node.log.1.gz -rw-r----- 1 munin 0 Aug 19 06:18 munin-update.log -rw-r----- 1 munin 11K Aug 18 14:10 munin-update.log.1 -rw-r----- 1 munin 1.6K Aug 18 06:15 munin-update.log.2.gz -rw-rw-r-- 1 munin 1.5K Aug 17 06:15 munin-update.log.3.gz

Search Results

Search found 68 results on 3 pages for 'vmstat'.

Page 2/3 | < Previous Page | 1 2 3 | Next Page >

- by user12608550

- by dschinn1001

- by user12620111

- by Pablo

- by Ben

- by Dave Forgac

- by OneOfOne

- by T.J. Crowder

- by Steve Kemp

- by exhuma

- by FunkyChicken

- by Arpit Tambi

- by OneOfOne

- by Pace

- by NullUser

- by user1149518

- by jon

- by Izzy

- by lpfavreau

- by zertux

- by AMF

- by user276166

- by Dave

- by Matt

- by lepe

< Previous Page | 1 2 3 | Next Page >