zombie colonel - Page 9

apache eat up too many ram per child

- by mrc4r7m4n

Hello to everyone. I've got fallowing problem: Apache eat to many ram per child. The fallowing comments shows: cat /etc/redhat-release -- Fedora release 8 (Werewolf) free -m: total used free shared buffers cached Mem: 3566 3136 429 0 339 1907 -/+ buffers/cache: 889 2676 Swap: 4322 0 4322 I know that you will say that there is nothing to worry about because swap is not use, but i think it's not use for now. 3.httpd -v: Server version: Apache/2.2.14 (Unix) 4.httpd -l: Compiled in modules: core.c mod_authn_file.c mod_authn_default.c mod_authz_host.c mod_authz_groupfile.c mod_authz_user.c mod_authz_default.c mod_auth_basic.c mod_include.c mod_filter.c mod_log_config.c mod_env.c mod_setenvif.c mod_version.c mod_ssl.c prefork.c http_core.c mod_mime.c mod_status.c mod_autoindex.c mod_asis.c mod_cgi.c mod_negotiation.c mod_dir.c mod_actions.c mod_userdir.c mod_alias.c mod_rewrite.c mod_so.c 5.List of loaded dynamic modules: LoadModule authz_host_module modules/mod_authz_host.so LoadModule include_module modules/mod_include.so LoadModule log_config_module modules/mod_log_config.so LoadModule setenvif_module modules/mod_setenvif.so LoadModule mime_module modules/mod_mime.so LoadModule autoindex_module modules/mod_autoindex.so LoadModule vhost_alias_module modules/mod_vhost_alias.so LoadModule negotiation_module modules/mod_negotiation.so LoadModule dir_module modules/mod_dir.so LoadModule alias_module modules/mod_alias.so LoadModule rewrite_module modules/mod_rewrite.so LoadModule proxy_module modules/mod_proxy.so LoadModule cgi_module modules/mod_cgi.so 6.My prefrok directive <IfModule prefork.c> StartServers 8 MinSpareServers 5 MaxSpareServers 25 ServerLimit 80 MaxClients 80 MaxRequestsPerChild 4000 </IfModule> KeepAliveTimeout 6 MaxKeepAliveRequests 100 KeepAlive On 7.top -u apache: ctrl+ M top - 09:19:42 up 2 days, 19 min, 2 users, load average: 0.85, 0.87, 0.80 Tasks: 113 total, 1 running, 112 sleeping, 0 stopped, 0 zombie Cpu(s): 7.3%us, 15.7%sy, 0.0%ni, 75.7%id, 0.0%wa, 0.7%hi, 0.7%si, 0.0%st Mem: 3652120k total, 3149964k used, 502156k free, 348048k buffers Swap: 4425896k total, 0k used, 4425896k free, 1944952k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16956 apache 20 0 700m 135m 100m S 0.0 3.8 2:16.78 httpd 16953 apache 20 0 565m 130m 96m S 0.0 3.7 1:57.26 httpd 16957 apache 20 0 587m 129m 102m S 0.0 3.6 1:47.41 httpd 16955 apache 20 0 567m 126m 93m S 0.0 3.6 1:43.60 httpd 17494 apache 20 0 626m 125m 96m S 0.0 3.5 1:58.77 httpd 17515 apache 20 0 540m 120m 88m S 0.0 3.4 1:45.57 httpd 17516 apache 20 0 573m 120m 88m S 0.0 3.4 1:50.51 httpd 16954 apache 20 0 551m 120m 88m S 0.0 3.4 1:52.47 httpd 17493 apache 20 0 586m 120m 94m S 0.0 3.4 1:51.02 httpd 17279 apache 20 0 568m 117m 87m S 16.0 3.3 1:51.87 httpd 17302 apache 20 0 560m 116m 90m S 0.3 3.3 1:59.06 httpd 17495 apache 20 0 551m 116m 89m S 0.0 3.3 1:47.51 httpd 17277 apache 20 0 476m 114m 81m S 0.0 3.2 1:37.14 httpd 30097 apache 20 0 536m 113m 83m S 0.0 3.2 1:47.38 httpd 30112 apache 20 0 530m 112m 81m S 0.0 3.2 1:40.15 httpd 17513 apache 20 0 516m 112m 85m S 0.0 3.1 1:43.92 httpd 16958 apache 20 0 554m 111m 82m S 0.0 3.1 1:44.18 httpd 1617 apache 20 0 487m 111m 85m S 0.0 3.1 1:31.67 httpd 16952 apache 20 0 461m 107m 75m S 0.0 3.0 1:13.71 httpd 16951 apache 20 0 462m 103m 76m S 0.0 2.9 1:28.05 httpd 17278 apache 20 0 497m 103m 76m S 0.0 2.9 1:31.25 httpd 17403 apache 20 0 537m 102m 79m S 0.0 2.9 1:52.24 httpd 25081 apache 20 0 412m 101m 70m S 0.0 2.8 1:01.74 httpd I guess thats all information needed to help me solve this problem. I think the virt memory is to big, the same res. The consumption of ram is increasing all the time. Maybe it's memory leak because i see there is so many static modules compiled. Could someone help me with this issue? Thank you in advance.

Read the article

What is consuming so much memory?

- by Christopher

Hi, I am having a few problems with my server. It is throwing up intermittant errors and running quite slow. Here is the output from top: top - 07:33:33 up 18:57, 1 user, load average: 0.00, 0.00, 0.00 Tasks: 90 total, 1 running, 82 sleeping, 7 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 1048576k used, 0k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached Ordered by %MEM: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9597 root 16 0 276m 91m 15m S 0.0 8.9 0:29.38 java 9564 tomcat 15 0 249m 34m 11m S 0.0 3.4 0:11.79 java 9636 root 18 0 54804 24m 9784 S 0.0 2.4 0:02.58 httpd 26139 apache 15 0 57520 23m 5996 S 0.0 2.3 0:00.15 httpd 16264 apache 18 0 56984 23m 6104 S 0.0 2.2 0:00.21 httpd 24294 apache 15 0 57512 22m 5864 S 0.0 2.2 0:00.17 httpd 30231 apache 15 0 57272 22m 5748 S 0.0 2.2 0:00.97 httpd 32257 apache 15 0 57512 22m 5416 S 0.0 2.2 0:00.46 httpd 19947 apache 15 0 57512 22m 5320 S 0.0 2.2 0:00.19 httpd 26148 apache 15 0 56688 22m 5992 S 0.0 2.2 0:00.40 httpd 14039 apache 18 0 57000 22m 5492 S 0.0 2.2 0:00.33 httpd 6051 apache 15 0 57736 22m 5128 S 0.0 2.2 0:00.07 httpd 19937 apache 15 0 56992 22m 5400 S 0.0 2.2 0:00.14 httpd 5200 apache 15 0 56984 22m 5376 S 0.0 2.2 0:00.23 httpd 10001 apache 15 0 55636 21m 5636 S 0.0 2.1 0:01.05 httpd 11734 apache 15 0 56712 21m 4548 S 0.0 2.1 0:00.46 httpd 18193 apache 15 0 55100 20m 5508 S 0.0 2.0 0:00.24 httpd 14036 apache 15 0 55128 20m 5412 S 0.0 2.0 0:00.10 httpd 3981 apache 15 0 55128 19m 4860 S 0.0 1.9 0:00.16 httpd 7588 apache 18 0 55112 19m 4848 S 0.0 1.9 0:00.04 httpd 19768 apache 16 0 55112 19m 4844 S 0.0 1.9 0:00.02 httpd 5827 apache 15 0 55112 19m 4828 S 0.0 1.9 0:00.05 httpd 29774 apache 15 0 55112 19m 4544 S 0.0 1.9 0:00.11 httpd 6064 apache 15 0 55112 19m 4536 S 0.0 1.9 0:00.02 httpd 16253 apache 17 0 55116 19m 4532 S 0.0 1.9 0:00.01 httpd 19922 apache 15 0 55112 19m 4540 S 0.0 1.9 0:00.02 httpd 10010 apache 15 0 55100 19m 4524 S 0.0 1.9 0:00.01 httpd 18195 apache 18 0 55104 18m 3872 S 0.0 1.8 0:00.02 httpd 7361 mysql 15 0 134m 18m 6400 S 0.0 1.8 0:10.18 mysqld 19921 apache 15 0 55088 18m 3588 S 0.0 1.8 0:00.02 httpd 11967 apache 15 0 55080 18m 3584 S 0.0 1.8 0:00.00 httpd 13813 apache 15 0 55088 18m 3576 S 0.0 1.8 0:00.14 httpd 23898 apache 18 0 54968 17m 3212 S 0.0 1.7 0:00.00 httpd 13792 apache 15 0 54968 17m 3088 S 0.0 1.7 0:00.00 httpd 14083 apache 15 0 54968 17m 3088 S 0.0 1.7 0:00.00 httpd 32547 apache 15 0 54944 17m 2924 S 0.0 1.7 0:00.00 httpd 13787 apache 15 0 54944 17m 2908 S 0.0 1.7 0:00.00 httpd 3623 apache 17 0 54944 17m 2908 S 0.0 1.7 0:00.00 httpd 16024 apache 19 0 54944 17m 2860 S 0.0 1.7 0:00.00 httpd 13791 apache 15 0 54944 17m 2864 S 0.0 1.7 0:00.00 httpd 20090 named 19 0 110m 4244 2056 S 0.0 0.4 0:01.55 named 9369 cyrus 15 0 15904 3048 1720 S 0.0 0.3 0:00.24 cyrus-master 32735 root 15 0 8852 2888 2116 T 0.0 0.3 0:00.00 mysql The intermittant error I get using Firefox is: Server not found Firefox can't find the server at XXXXXXX.co. * Check the address for typing errors such as ww.example.com instead of www.example.com * If you are unable to load any pages, check your computer's network connection. * If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web. And on other browsers, the page just loads for about 10 minutes but never appears. The only way to resolve it is to close the browser completely as the error appears to be saved in the cache. Has anyone got any ideas? Many Thanks.

Read the article

Low load average with plenty of cpu-intersive processes

- by sds

I see loadavg at about 1 with at least 3 processes running at full tile. How can that be? top - 11:48:32 up 147 days, 5:38, 8 users, load average: 1.08, 1.11, 1.05 Tasks: 416 total, 4 running, 410 sleeping, 2 stopped, 0 zombie Cpu0 : 43.3%us, 13.7%sy, 0.0%ni, 43.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 48.8%us, 12.4%sy, 0.0%ni, 38.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.7%us, 0.7%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu3 : 99.3%us, 0.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 5.7%us, 0.7%sy, 0.0%ni, 93.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 2.3%us, 0.3%sy, 0.0%ni, 97.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.3%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu8 : 38.4%us, 17.4%sy, 0.0%ni, 44.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 43.4%us, 13.5%sy, 0.0%ni, 43.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 1.0%us, 0.7%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132145404k total, 88125080k used, 44020324k free, 516476k buffers Swap: 8388600k total, 620232k used, 7768368k free, 55729064k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 25424 jonathan 20 0 4404m 4.1g 3268 R 99.7 3.3 212:58.17 python2.7 20939 sam 20 0 908m 733m 3376 R 81.2 0.6 603:08.07 python2.7 20987 sam 20 0 908m 732m 3376 R 79.8 0.6 598:49.18 python2.7 25428 jonathan 20 0 774m 164m 15m S 14.2 0.1 24:22.60 java 20996 sam 20 0 98.4m 7780 1880 S 4.3 0.0 17:48.15 vw 20941 sam 20 0 161m 70m 1880 S 3.0 0.1 18:10.03 vw 20940 sam 20 0 98.4m 8068 1880 S 2.6 0.0 18:06.28 vw 20942 sam 20 0 98.4m 8080 1880 S 2.6 0.0 17:39.45 vw 20944 sam 20 0 161m 71m 1880 S 2.6 0.1 17:29.29 vw 20947 sam 20 0 161m 71m 1880 S 2.6 0.1 17:25.58 vw 20959 sam 20 0 161m 70m 1880 S 2.6 0.1 17:28.00 vw 20962 sam 20 0 161m 70m 1880 S 2.6 0.1 17:26.96 vw 20963 sam 20 0 98.4m 8076 1880 S 2.6 0.0 18:07.19 vw 20965 sam 20 0 161m 71m 1880 S 2.6 0.1 18:08.13 vw 20995 sam 20 0 161m 71m 1880 S 2.6 0.1 17:38.67 vw 6399 root 20 0 558m 19m 5028 S 2.3 0.0 4329:56 BESClient 20945 sam 20 0 98.4m 8068 1880 S 2.3 0.0 17:35.38 vw 20948 sam 20 0 98.4m 8068 1880 S 2.3 0.0 17:26.01 vw 20950 sam 20 0 161m 70m 1880 S 2.3 0.1 17:25.79 vw 20952 sam 20 0 98.4m 8076 1880 S 2.3 0.0 17:32.94 vw 20955 sam 20 0 161m 70m 1880 S 2.3 0.1 17:26.61 vw 20956 sam 20 0 98.4m 8072 1880 S 2.3 0.0 17:34.76 vw 20960 sam 20 0 98.4m 8072 1880 S 2.3 0.0 17:34.04 vw Adding up CPU loads gives about 300%. The top process list also adds up to about 300%. Why is load average about 1?

Read the article

Have I pushed the limits of my current VPS or is there room for optimization?

- by JRameau

I am currently on a mediatemple DV server (basic) 512mb dedicated ram, this is a CentOS based VPS with Plesk and Virtuozzo. My experience with it from day 1 has been bad and I only could sooth my server issues with several caching "Band-aids," but my sites are not as small as they were a year ago either so the issues have worsen. I have 3 Drupal installs running on separate (plesk) domains, 1 of those drupal installs is a multisite, that consists of 5-6 sites 2 of those sites are bringing in actual traffic. Those caching "Band-aids" I mentioned are APC, which seemed to help alot initially, and Drupal's Boost, which is considered a poorman's Varnish, it makes all my pages static for anonymous users. Last 30day combined estimate on Google Ananlytics: 90k visitors 260k pageviews. Issue: alot of downtime, I am continually checking if my sites are up, and lately I have been finding it down more than 3 times daily. Restarting Apache will bring it back up, for some time. I have google search every error message and looked up ways to optimize my DV server, and I am beyond stump what is my next move. Is this server bad, have I hit a impossibly low restriction such as the 12mb kernel memory barrier (kmemsize), is it on my end, do I need to optimize some more? *I have provided as much information as I can below, any help or suggestions given will be appreciated Common Error messages I see in the log: [error] (12)Cannot allocate memory: fork: Unable to fork new process [error] make_obcallback: could not import mod_python.apache.\n Traceback (most recent call last): File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 21, in ? import traceback File "/usr/lib/python2.4/traceback.py", line 3, in ? import linecache ImportError: No module named linecache [error] python_handler: no interpreter callback found. [warn-phpd] mmap cache can't open /var/www/vhosts/***/httpdocs/*** - Too many open files in system (pid ***) [alert] Child 8125 returned a Fatal error... Apache is exiting! [emerg] (43)Identifier removed: couldn't grab the accept mutex [emerg] (22)Invalid argument: couldn't release the accept mutex cat /proc/user_beancounters: Version: 2.5 uid resource held maxheld barrier limit failcnt 41548: kmemsize 4582652 5306699 12288832 13517715 21105036 lockedpages 0 0 600 600 0 privvmpages 38151 42676 229036 249036 0 shmpages 16274 16274 17237 17237 2 dummy 0 0 0 0 0 numproc 43 46 300 300 0 physpages 27260 29528 0 2147483647 0 vmguarpages 0 0 131072 2147483647 0 oomguarpages 27270 29538 131072 2147483647 0 numtcpsock 21 29 300 300 0 numflock 8 8 480 528 0 numpty 1 1 30 30 0 numsiginfo 0 1 1024 1024 0 tcpsndbuf 648440 675272 2867477 4096277 1711499 tcprcvbuf 301620 359716 2867477 4096277 0 othersockbuf 4472 4472 1433738 2662538 0 dgramrcvbuf 0 0 1433738 1433738 0 numothersock 12 12 300 300 0 dcachesize 0 0 2684271 2764800 0 numfile 3447 3496 6300 6300 3872 dummy 0 0 0 0 0 dummy 0 0 0 0 0 dummy 0 0 0 0 0 numiptent 14 14 200 200 0 TOP: (In January the load avg was really high 3-10, I was able to bring it down where it is currently is by giving APC more memory play around with) top - 16:46:07 up 2:13, 1 user, load average: 0.34, 0.20, 0.20 Tasks: 40 total, 2 running, 37 sleeping, 0 stopped, 1 zombie Cpu(s): 0.3% us, 0.1% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 916144k total, 156668k used, 759476k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached MySQLTuner: (after optimizing every table and repairing any table with overage I got the fragmented count down to 86) [--] Data in MyISAM tables: 285M (Tables: 1105) [!!] Total fragmented tables: 86 [--] Up for: 2h 44m 38s (409K q [41.421 qps], 6K conn, TX: 1B, RX: 174M) [--] Reads / Writes: 79% / 21% [--] Total buffers: 58.0M global + 2.7M per thread (100 max threads) [!!] Query cache prunes per day: 675307 [!!] Temporary tables created on disk: 35% (7K on disk / 20K total)

Read the article

Where's the Swap File/Partition?

- by chrisbunney

I'm investigating the virtual memory configuration of a Debian based Amazon EC2 instance, and as my background isn't in system admin, I'm slightly confused by what I'm seeing. We're using MongoDB, and the monitoring server we have indicates that the Mongo process is using about 20GB of swap space, however I can't figure out where this is located on the server. As far as I can tell from using the various suggested methods from Google, there is either a much smaller amount, or none at all. top indicates that there is 1.8GB of swap memory: top - 15:35:21 up 6 days, 3:23, 1 user, load average: 1.60, 1.43, 1.37 Tasks: 47 total, 2 running, 45 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 1.3%sy, 0.0%ni, 14.7%id, 83.8%wa, 0.0%hi, 0.0%si, 0.1%st Mem: 3928924k total, 2855572k used, 1073352k free, 640564k buffers Swap: 0k total, 0k used, 0k free, 1887788k cached swapon -s doesn't seem to think there's any swap space: Filename Type Size Used Priority free -m doesn't think there's any swap either: total used free shared buffers cached Mem: 3836 3663 172 0 626 2701 -/+ buffers/cache: 336 3500 Swap: 0 0 0 And neither does vmstat: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 3 0 66224 641372 2874744 0 0 21 5012 21 33 2 2 76 19 But cat /etc/fstab thinks there is a swap partition: /dev/xvda1 / ext3 defaults 1 1 /dev/xvda2 /mnt ext3 defaults 0 0 /dev/xvda3 swap swap defaults 0 0 none /proc proc defaults 0 0 none /sys sysfs defaults 0 0 However df -k gives no indication of the xvda3 partition: Filesystem 1K-blocks Used Available Use% Mounted on /dev/xvda1 16513960 15675324 0 100% / tmpfs 1964460 8 1964452 1% /lib/init/rw udev 1914148 28 1914120 1% /dev tmpfs 1964460 4 1964456 1% /dev/shm So I really don't know what to make of this, because I appear to have a process using about 10 times more virtual memory than what might be available, and I have no idea where this virtual memory is on the system. I'm probably misinterpreting the output of the tools, so I'd be grateful if someone would be able to set me straight: What have I got wrong, what's the right interpretation, and how do you reach that interpretation? EDIT0: We use 10gen's MMS for monitoring the database, the relevant section for memory from the last data point is: "mem": { "virtual": 20749, "bits": 64, "supported": true, "mappedWithJournal": 20376, "mapped": 10188, "resident": 1219 }, This JSON is specific to the database process (I believe) rather than the system as a whole. fdisk -l /dev/xvda outputs... nothing? I tried each of the 3 xvda entries in /etc/fstab as well: root@ip:~# fdisk -l /dev/xvda1 Disk /dev/xvda1: 34.4 GB, 34359738368 bytes 255 heads, 63 sectors/track, 4177 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvda1 doesn't contain a valid partition table root@ip:~# fdisk -l /dev/xvda2 root@ip:~# fdisk -l /dev/xvda3 root@ip:~# Edit1: Output of cat /proc/meminfo for the sake of completeness: MemTotal: 3928924 kB MemFree: 726600 kB Buffers: 648368 kB Cached: 2216556 kB SwapCached: 0 kB Active: 1945100 kB Inactive: 994016 kB Active(anon): 60476 kB Inactive(anon): 12952 kB Active(file): 1884624 kB Inactive(file): 981064 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 387180 kB Writeback: 0 kB AnonPages: 73380 kB Mapped: 1188260 kB Shmem: 48 kB Slab: 149768 kB SReclaimable: 146076 kB SUnreclaim: 3692 kB KernelStack: 1104 kB PageTables: 16096 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 1964460 kB Committed_AS: 305572 kB VmallocTotal: 34359738367 kB VmallocUsed: 16760 kB VmallocChunk: 34359721448 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 3932160 kB DirectMap2M: 0 kB

Read the article

Tomcat dying silently on regular basis

- by Hendrik

My tomcat (6.0.32, Java Sun 1.6.0_22-b04 on Ubuntu 10.04) keeps crashing multiple times daily without any specific output in catalina.out. This usually happens on high load (see top output). Update: The pid-file is properly removed when this happens. Update 2: No CATALINA_OPTS set, _JAVA_OPTS are: export _JAVA_OPTIONS="-Xms128m -Xmx1024m -XX:MaxPermSize=512m \ -XX:MinHeapFreeRatio=20 \ -XX:MaxHeapFreeRatio=40 \ -XX:NewSize=10m \ -XX:MaxNewSize=10m \ -XX:SurvivorRatio=6 \ -XX:TargetSurvivorRatio=80 \ -XX:+CMSClassUnloadingEnabled \ -Djava.awt.headless=true \ -Dcom.sun.management.jmxremote \ -Dcom.sun.management.jmxremote.port=37331 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.authenticate=true \ -Djava.rmi.server.hostname=(myhostname) \ -Dcom.sun.management.jmxremote.password.file=/etc/java-6-sun/management/jmxremote.password \ -Dcom.sun.management.jmxremote.access.file=/etc/java-6-sun/management/jmxremote.access" Top: top - 12:40:03 up 9 days, 12:15, 3 users, load average: 30.00, 22.39, 21.91 Tasks: 89 total, 4 running, 85 sleeping, 0 stopped, 0 zombie Cpu(s): 53.2%us, 9.7%sy, 0.0%ni, 34.7%id, 1.5%wa, 0.0%hi, 0.8%si, 0.0%st Mem: 4194304k total, 3311304k used, 883000k free, 0k buffers Swap: 4194304k total, 0k used, 4194304k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 25850 tomcat6 20 0 1981m 1.2g 11m S 161 29.6 11:41.56 java 12632 mysql 20 0 393m 97m 4452 S 141 2.4 1690:05 mysqld 14932 nobody 20 0 253m 44m 9152 R 56 1.1 3:26.57 php-cgi 7011 nobody 20 0 241m 31m 9124 S 30 0.8 1:35.96 php-cgi 10093 nobody 20 0 228m 18m 8520 S 25 0.5 2:29.97 php-cgi 27071 nobody 20 0 237m 28m 8640 S 11 0.7 3:13.72 php-cgi 3306 nobody 20 0 227m 16m 6736 R 7 0.4 2:29.83 php-cgi 7756 nobody 20 0 261m 58m 15m R 5 1.4 2:22.33 php-cgi 7129 www-data 20 0 3646m 7228 1896 S 2 0.2 0:36.65 nginx 2657 nobody 20 0 228m 18m 8540 S 1 0.5 1:59.51 php-cgi 7131 www-data 20 0 3645m 6464 1960 S 1 0.2 0:34.13 nginx 7140 www-data 20 0 3652m 12m 1896 S 1 0.3 0:35.80 nginx 619 nobody 20 0 231m 29m 15m S 0 0.7 2:33.46 php-cgi 16552 nobody 20 0 250m 41m 8784 S 0 1.0 2:48.12 php-cgi 17134 nobody 20 0 239m 37m 16m S 0 0.9 2:32.86 php-cgi 21004 nobody 20 0 243m 34m 8700 S 0 0.8 1:19.85 php-cgi 26105 root 20 0 19220 1392 1060 R 0 0.0 0:00.82 top 32430 nobody 20 0 256m 47m 9196 S 0 1.2 2:19.01 php-cgi 314 nobody 20 0 256m 47m 8804 S 0 1.1 1:46.00 php-cgi 2111 nobody 20 0 253m 44m 9196 S 0 1.1 3:01.14 php-cgi 2142 root 20 0 26452 2564 868 S 0 0.1 0:00.56 screen 2144 root 20 0 19484 2012 1368 S 0 0.0 0:00.00 bash 2333 nobody 20 0 249m 41m 9160 S 0 1.0 1:10.33 php-cgi 2552 root 20 0 19484 2260 1620 S 0 0.1 0:00.01 bash 2587 nobody 20 0 258m 49m 9192 S 0 1.2 2:04.50 php-cgi 2684 root 20 0 4092 652 540 S 0 0.0 0:00.00 xvfb-run 2696 root 20 0 60720 13m 2352 S 0 0.3 0:09.12 Xvfb 2759 root 20 0 617m 12m 4676 S 0 0.3 0:00.66 node 3514 nobody 20 0 270m 61m 9216 S 0 1.5 3:13.69 php-cgi 5270 root 20 0 25164 1324 1036 S 0 0.0 0:00.01 screen 5402 nobody 20 0 227m 16m 8032 S 0 0.4 1:33.61 php-cgi 5765 root 20 0 81180 3820 3028 S 0 0.1 0:00.31 sshd 5798 nobody 20 0 242m 32m 9124 S 0 0.8 1:52.08 php-cgi 5856 root 20 0 19496 2292 1636 S 0 0.1 0:00.03 bash 6442 root 20 0 62332 20m 1960 S 0 0.5 0:30.58 mrtg 7082 root 20 0 88992 1916 1636 S 0 0.0 0:00.00 PassengerWatchd I can't find any concrete reason for it, no Exceptions or messages of a shutdown in catalina.out (and no other logs in tomcat's log dir). I can start up the service and it will run for a few days or just minutes before dying again. Is there somewhere else i could look for output? Could the kernel start killing threads due to a lack of ressources and by that bring the VM down?

Read the article

Troubleshooting High Load on Plesk LAMP Dedicated Server

- by Callmeed

I have 2 nearly identical dedicated servers with the same provider. They also run a nearly identical software stack: RedHat 5 64-bit, Plesk, PHP, Apache, & MySQL. We use them for hosting custom sites we build. The problem is, while our 1st server has a load average (in top) of around 0.3, the 2nd server consistently has a load average of around 4.0 or higher. Basic functions in Plesk are delayed and there is a bit of latency when executing shell commands. Anyone have ideas why it would be so high? And why it would differ from our other server so much? Here is my current top output (sorted by %MEM) ... Any help is much appreciated ... top - 21:48:04 up 100 days, 4:28, 1 user, load average: 3.74, 4.20, 4.23 Tasks: 336 total, 1 running, 335 sleeping, 0 stopped, 0 zombie Cpu(s): 0.8%us, 0.4%sy, 0.0%ni, 91.3%id, 7.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 12290884k total, 11886452k used, 404432k free, 2920212k buffers Swap: 2096472k total, 244k used, 2096228k free, 6560692k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 22536 apache 15 0 860m 547m 6484 S 0.0 4.6 0:10.96 httpd 26467 apache 15 0 859m 546m 6408 S 0.0 4.5 0:07.67 httpd 3620 apache 15 0 859m 545m 5552 S 0.0 4.5 0:06.15 httpd 1895 apache 15 0 858m 544m 6356 S 0.0 4.5 0:08.25 httpd 16933 apache 15 0 858m 544m 5488 S 0.0 4.5 0:01.57 httpd 6431 apache 15 0 856m 542m 6076 S 10.6 4.5 0:05.32 httpd 14417 apache 15 0 856m 542m 5568 S 0.0 4.5 0:03.88 httpd 15403 apache 15 0 855m 541m 5616 S 0.0 4.5 0:03.73 httpd 19165 apache 15 0 853m 539m 6252 S 0.0 4.5 0:12.40 httpd 15898 apache 15 0 852m 539m 5376 S 0.0 4.5 0:02.68 httpd 14401 apache 15 0 851m 538m 5460 S 0.0 4.5 0:02.97 httpd 15393 apache 15 0 851m 538m 5404 S 0.0 4.5 0:03.12 httpd 15427 apache 15 0 851m 538m 5496 S 0.0 4.5 0:02.44 httpd 14412 apache 15 0 851m 538m 5324 S 0.0 4.5 0:02.15 httpd 18330 apache 15 0 851m 537m 5136 S 0.0 4.5 0:01.30 httpd 18303 apache 15 0 848m 535m 5140 S 0.0 4.5 0:00.47 httpd 21190 apache 15 0 845m 533m 3988 S 0.0 4.4 0:00.33 httpd 15923 root 18 0 822m 521m 9928 S 0.0 4.3 10:04.81 httpd 22021 apache 15 0 828m 520m 4964 S 0.0 4.3 0:00.16 httpd 22146 apache 15 0 823m 515m 3016 S 0.0 4.3 0:00.02 httpd 22345 apache 15 0 822m 514m 2408 S 0.0 4.3 0:00.00 httpd 14721 apache 15 0 733m 510m 488 S 0.0 4.3 0:00.00 httpd 5094 root 15 0 1452m 122m 15m S 1.0 1.0 852:24.24 java 4636 mysql 15 0 532m 57m 6440 S 1.0 0.5 488:05.84 mysqld 4799 popuser 15 0 166m 53m 2368 S 0.0 0.4 0:36.64 spamd 16761 popuser 15 0 159m 46m 2312 S 0.0 0.4 0:00.38 spamd 4797 root 15 0 158m 45m 2448 S 0.0 0.4 0:01.27 spamd 5074 root 34 19 255m 20m 2144 S 0.0 0.2 1:37.53 yum-updatesd 9917 named 15 0 366m 9804 1980 S 0.0 0.1 0:10.26 named 4332 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.06 sw-engine-cgi 4341 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.07 sw-engine-cgi 4350 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.09 sw-engine-cgi 4352 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.11 sw-engine-cgi 4376 ntp 15 0 23388 5020 3896 S 0.0 0.0 0:00.58 ntpd 4331 sw-cp-se 15 0 61336 4572 1480 S 0.0 0.0 5:53.22 sw-cp-serverd 4213 haldaemo 15 0 31252 4460 1684 S 0.0 0.0 0:01.52 hald 4778 postgres 18 0 117m 4164 3484 S 0.0 0.0 0:00.11 postmaster 18555 root 16 0 98.3m 3716 2852 S 0.0 0.0 0:00.01 sshd 4488 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4489 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4492 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4493 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4490 sso 18 0 119m 3040 220 S 0.0 0.0 0:00.00 sw-engine-cgi

Read the article

Performance Tuning a High-Load Apache Server

- by futureal

I am looking to understand some server performance problems I am seeing with a (for us) heavily loaded web server. The environment is as follows: Debian Lenny (all stable packages + patched to security updates) Apache 2.2.9 PHP 5.2.6 Amazon EC2 large instance The behavior we're seeing is that the web typically feels responsive, but with a slight delay to begin handling a request -- sometimes a fraction of a second, sometimes 2-3 seconds in our peak usage times. The actual load on the server is being reported as very high -- often 10.xx or 20.xx as reported by top. Further, running other things on the server during these times (even vi) is very slow, so the load is definitely up there. Oddly enough Apache remains very responsive, other than that initial delay. We have Apache configured as follows, using prefork: StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 150 MaxRequestsPerChild 0 And KeepAlive as: KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 5 Looking at the server-status page, even at these times of heavy load we are rarely hitting the client cap, usually serving between 80-100 requests and many of those in the keepalive state. That tells me to rule out the initial request slowness as "waiting for a handler" but I may be wrong. Amazon's CloudWatch monitoring tells me that even when our OS is reporting a load of 15, our instance CPU utilization is between 75-80%. Example output from top: top - 15:47:06 up 31 days, 1:38, 8 users, load average: 11.46, 7.10, 6.56 Tasks: 221 total, 28 running, 193 sleeping, 0 stopped, 0 zombie Cpu(s): 66.9%us, 22.1%sy, 0.0%ni, 2.6%id, 3.1%wa, 0.0%hi, 0.7%si, 4.5%st Mem: 7871900k total, 7850624k used, 21276k free, 68728k buffers Swap: 0k total, 0k used, 0k free, 3750664k cached The majority of the processes look like: 24720 www-data 15 0 202m 26m 4412 S 9 0.3 0:02.97 apache2 24530 www-data 15 0 212m 35m 4544 S 7 0.5 0:03.05 apache2 24846 www-data 15 0 209m 33m 4420 S 7 0.4 0:01.03 apache2 24083 www-data 15 0 211m 35m 4484 S 7 0.5 0:07.14 apache2 24615 www-data 15 0 212m 35m 4404 S 7 0.5 0:02.89 apache2 Example output from vmstat at the same time as the above: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 0 0 215084 68908 3774864 0 0 154 228 5 7 32 12 42 9 6 21 0 198948 68936 3775740 0 0 676 2363 4022 1047 56 16 9 15 23 0 0 169460 68936 3776356 0 0 432 1372 3762 835 76 21 0 0 23 1 0 140412 68936 3776648 0 0 280 0 3157 827 70 25 0 0 20 1 0 115892 68936 3776792 0 0 188 8 2802 532 68 24 0 0 6 1 0 133368 68936 3777780 0 0 752 71 3501 878 67 29 0 1 0 1 0 146656 68944 3778064 0 0 308 2052 3312 850 38 17 19 24 2 0 0 202104 68952 3778140 0 0 28 90 2617 700 44 13 33 5 9 0 0 188960 68956 3778200 0 0 8 0 2226 475 59 17 6 2 3 0 0 166364 68956 3778252 0 0 0 21 2288 386 65 19 1 0 And finally, output from Apache's server-status: Server uptime: 31 days 2 hours 18 minutes 31 seconds Total accesses: 60102946 - Total Traffic: 974.5 GB CPU Usage: u209.62 s75.19 cu0 cs0 - .0106% CPU load 22.4 requests/sec - 380.3 kB/second - 17.0 kB/request 107 requests currently being processed, 6 idle workers C.KKKW..KWWKKWKW.KKKCKK..KKK.KKKK.KK._WK.K.K.KKKKK.K.R.KK..C.C.K K.C.K..WK_K..KKW_CK.WK..W.KKKWKCKCKW.W_KKKKK.KKWKKKW._KKK.CKK... KK_KWKKKWKCKCWKK.KKKCK.......................................... ................................................................ From my limited experience I draw the following conclusions/questions: We may be allowing far too many KeepAlive requests I do see some time spent waiting for IO in the vmstat although not consistently and not a lot (I think?) so I am not sure this is a big concern or not, I am less experienced with vmstat Also in vmstat, I see in some iterations a number of processes waiting to be served, which is what I am attributing the initial page load delay on our web server to, possibly erroneously We serve a mixture of static content (75% or higher) and script content, and the script content is often fairly processor intensive, so finding the right balance between the two is important; long term we want to move statics elsewhere to optimize both servers but our software is not ready for that today I am happy to provide additional information if anybody has any ideas, the other note is that this is a high-availability production installation so I am wary of making tweak after tweak, and is why I haven't played with things like the KeepAlive value myself yet.

Read the article

How to understand the memory usage and load average in linux server

- by Tim

Hi, I am using a linux server which has 128GB of memory and 24 cores. I use top to see how much it is used. Its output is pasted at the end of the post. Here are two questions: (1) I see that each of the running processes occupies a very small percentage of memory (%MEM no more than 0.2%, and most just 0.0%), but how the total memory is almost used as in the fourth line of output ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")? The sum of used percentage of memory over all processes seems unlikely to achieve almost 100%, doesn't it? (2) how to understand the load average on the first line ("load average: 14.04, 14.02, 14.00")? Thanks and regards! Edit: Thanks! I also really like to hear some rough numbers based on used percentage of memory to determine if a server is heavily loaded, since I once became the one who cramed the server without understanding the current load. Is swap regarded as almost the same as memory? For example, when memory and swap are almost of same size, if the memory is almost running out but the swap is still largely free, may I just view it as if the used percentage of memory + swap is still not high and run other new processes? How would you consider together CPU or memory (or memory + swap) usage? Do you become worried if either of them reaches too high or both? Output of top: $ top top - 12:45:33 up 19 days, 23:11, 18 users, load average: 14.04, 14.02, 14.00 Tasks: 484 total, 12 running, 472 sleeping, 0 stopped, 0 zombie Cpu(s): 36.7%us, 19.7%sy, 0.0%ni, 43.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers Swap: 63111312k total, 500556k used, 62610756k free, 124437752k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 6529 sanchez 18 -2 1075m 219m 13m S 100 0.2 13760:23 MATLAB 13210 timothy 18 -2 48336 37m 1216 R 100 0.0 3:56.75 absurdity 13888 timothy 18 -2 48336 37m 1204 R 100 0.0 2:04.89 absurdity 14542 timothy 18 -2 48336 37m 1196 R 100 0.0 1:08.34 absurdity 14544 timothy 18 -2 2888 2076 400 R 100 0.0 1:06.14 gatherData 6183 sanchez 18 -2 1133m 195m 13m S 100 0.2 13676:04 MATLAB 6795 sanchez 18 -2 1079m 210m 13m S 100 0.2 13734:26 MATLAB 10178 timothy 18 -2 48336 37m 1204 R 100 0.0 11:33.93 absurdity 12438 timothy 18 -2 48336 37m 1216 R 100 0.0 5:38.17 absurdity 13661 timothy 18 -2 48336 37m 1216 R 100 0.0 2:44.13 absurdity 14098 timothy 18 -2 48336 37m 1204 R 100 0.0 1:58.31 absurdity 14335 timothy 18 -2 48336 37m 1196 R 100 0.0 1:08.93 absurdity 14765 timothy 18 -2 48336 37m 1196 R 99 0.0 0:32.57 absurdity 13445 timothy 18 -2 48336 37m 1216 R 99 0.0 3:01.37 absurdity 28990 root 20 0 0 0 0 S 2 0.0 65:50.21 pdflush 12141 tim 18 -2 19380 1660 1024 R 1 0.0 0:04.04 top 1240 root 15 -5 0 0 0 S 0 0.0 16:07.11 kjournald 9019 root 20 0 296m 4460 2616 S 0 0.0 82:19.51 kdm_greet 1 root 20 0 4028 728 592 S 0 0.0 0:03.11 init 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:01.01 migration/0 4 root 15 -5 0 0 0 S 0 0.0 0:08.13 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0 6 root RT -5 0 0 0 S 0 0.0 17:27.31 migration/1 7 root 15 -5 0 0 0 S 0 0.0 0:01.21 ksoftirqd/1 8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1 9 root RT -5 0 0 0 S 0 0.0 10:02.56 migration/2 10 root 15 -5 0 0 0 S 0 0.0 0:00.34 ksoftirqd/2 11 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/2 12 root RT -5 0 0 0 S 0 0.0 4:29.53 migration/3 13 root 15 -5 0 0 0 S 0 0.0 0:00.34 ksoftirqd/3

Read the article

Screen -X exec commands not working until manually attached

- by James Watt

I have a batch script that starts a java server application inside of a screen. The command looks like this: cd /dir/ && screen -A -m -d -S javascreen java -Xms640M -Xmx1024M -jar javaserverapp.jar nogui After I run the batch script, it starts the server and puts it inside the correct screen. If I list my screens after, I see something like this: user@gtwy /dir $ screen -list There is a screen on: 16180.javascreen (Detached) 1 Socket in /var/run/screen/S-user. However, I have a second batch script that sends automated commands to this server and runs on a different crontab interval. Because of the way the application works, I send commands to it like this (this command tells it to alert connected users "testing 123"): screen -X exec .\!\! echo say testing 123 I've also tried: screen -R -X exec .\!\! echo say testing 123 screen -S javascreen -X exec .\!\! echo say testing 123 Unfortunately, these commands DO NOT WORK. They don't even give me an error message, they just do nothing. HOWEVER - If I manually attach to the screen first (with the below command) and then detach, now I can run any of the above commands flawlessly. I can demonstrate this with a video, if I wasn't clear enough here. screen -r -d Thanks in advance. Update: here is the important parts of /etc/screenrc. It should be totally vanilla, I've never edited this file. # VARIABLES # =============================================================== # No annoying audible bell, using "visual bell" # vbell on # default: off # vbell_msg " -- Bell,Bell!! -- " # default: "Wuff,Wuff!!" # Automatically detach on hangup. autodetach on # default: on # Don't display the copyright page startup_message off # default: on # Uses nethack-style messages # nethack on # default: off # Affects the copying of text regions crlf off # default: off # Enable/disable multiuser mode. Standard screen operation is singleuser. # In multiuser mode the commands acladd, aclchg, aclgrp and acldel can be used # to enable (and disable) other user accessing this screen session. # Requires suid-root. multiuser off # Change default scrollback value for new windows defscrollback 1000 # default: 100 # Define the time that all windows monitored for silence should # wait before displaying a message. Default 30 seconds. silencewait 15 # default: 30 # bufferfile: The file to use for commands # "readbuf" ('<') and "writebuf" ('>'): bufferfile $HOME/.screen_exchange # # hardcopydir: The directory which contains all hardcopies. # hardcopydir ~/.hardcopy # hardcopydir ~/.screen # # shell: Default process started in screen's windows. # Makes it possible to use a different shell inside screen # than is set as the default login shell. # If begins with a '-' character, the shell will be started as a login shell. # shell zsh # shell bash # shell ksh shell -$SHELL # shellaka '> |tcsh' # shelltitle '$ |bash' # emulate .logout message pow_detach_msg "Screen session of \$LOGNAME \$:cr:\$:nl:ended." # caption always " %w --- %c:%s" # caption always "%3n %t%? @%u%?%? [%h]%?%=%c" # advertise hardstatus support to $TERMCAP # termcapinfo * '' 'hs:ts=\E_:fs=\E\\:ds=\E_\E\\' # set every new windows hardstatus line to somenthing descriptive # defhstatus "screen: ^En (^Et)" # don't kill window after the process died # zombie "^["

Read the article

Can't sync filesystem without reboot

- by Fabio

I'm having an issue with a linux server. Once a week the running mysql instance hangs and there is no way to fully stop it. If I kill it, it remains in zombie status and init does not reap its pid. The server is used for staging deployments and some internal tools, so it's not under heavy load. The only process constantly used id mysql and for this I think that it's the only process which suffer of this issue. I've searched system logs for errors and the only thing I found is this error (repeated a couple of times) in dmesg output: [706560.640085] INFO: task mysqld:31965 blocked for more than 120 seconds. [706560.640198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [706560.640312] mysqld D ffff88032fd93f40 0 31965 1 0x00000000 [706560.640317] ffff880242a27d18 0000000000000086 ffff88031a50dd00 ffff880242a27fd8 [706560.640321] ffff880242a27fd8 ffff880242a27fd8 ffff88031e549740 ffff88031a50dd00 [706560.640325] ffff88031a50dd00 ffff88032fd947f8 0000000000000002 ffffffff8112f250 [706560.640328] Call Trace: [706560.640338] [<ffffffff8112f250>] ? __lock_page+0x70/0x70 [706560.640344] [<ffffffff816cb1b9>] schedule+0x29/0x70 [706560.640347] [<ffffffff816cb28f>] io_schedule+0x8f/0xd0 [706560.640350] [<ffffffff8112f25e>] sleep_on_page+0xe/0x20 [706560.640353] [<ffffffff816c9900>] __wait_on_bit+0x60/0x90 [706560.640356] [<ffffffff8112f390>] wait_on_page_bit+0x80/0x90 [706560.640360] [<ffffffff8107dce0>] ? autoremove_wake_function+0x40/0x40 [706560.640363] [<ffffffff8112f891>] filemap_fdatawait_range+0x101/0x190 [706560.640366] [<ffffffff81130975>] filemap_write_and_wait_range+0x65/0x70 [706560.640371] [<ffffffff8122e441>] ext4_sync_file+0x71/0x320 [706560.640376] [<ffffffff811c3e6d>] do_fsync+0x5d/0x90 [706560.640379] [<ffffffff811c40d0>] sys_fsync+0x10/0x20 [706560.640383] [<ffffffff816d495d>] system_call_fastpath+0x1a/0x1f When this happens the only way to make everything working again is a full reboot, but in order to do that I'm forced to use this command after I've manually stopped all running processes echo b > /proc/sysrq-trigger otherwise normal reboot process hangs forever. I've tracked reboots script and I've found out that also the reboot process hangs on a sync call, this one in /etc/init.d/sendsigs (I'm on ubuntu) # Flush the kernel I/O buffer before we start to kill # processes, to make sure the IO of already stopped services to # not slow down the remaining processes to a point where they # are accidentily killed with SIGKILL because they did not # manage to shut down in time. sync I'm almost sure that the cause of this is an hardware issue (the RAID controller???) also because I've other two machines with the same hardware and software configuration and they don't suffer of this, but I can't find any hint in syslog or dmesg. I've also installed smartmontools and mcelog packages but none of them did report any issue. What can I do to track the cause of this issue? Today is happened again, here is the status of system after triggering a reboot init---console-kit-dae---64*[{console-kit-dae}] +-dbus-daemon +-mcelog +-mysqld---{mysqld} +-newrelic-daemon---newrelic-daemon---11*[{newrelic-daemon}] +-ntpd +-polkitd---{polkitd} +-python3 +-rpc.idmapd +-rpc.statd +-rpcbind +-sh---rc---S20sendsigs---sync +-smartd +-snmpd +-sshd---sshd---zsh---sudo---zsh---pstree +-sshd---sshd---zsh---sudo---zsh And here is the status of sync process # ps aux | grep sync root 3637 0.1 0.0 4352 372 ? D 05:53 0:00 sync i.e. Uninterruptible sleep... Hardware specs as reported by lshw I think the raid controller is a fake raid. I usually don't deal with hardware (and for the record I don't have physical access to it) description: Computer product: X7DBP () vendor: Supermicro version: 0123456789 serial: 0123456789 width: 64 bits capabilities: smbios-2.4 dmi-2.4 vsyscall32 configuration: administrator_password=disabled boot=normal frontpanel_password=unknown keyboard_password=unknown power-on_password=disabled uuid=53D19F64-D663-A017-8922-0030487C1FEE *-core description: Motherboard product: X7DBP vendor: Supermicro physical id: 0 version: PCB Version serial: 0123456789 *-firmware description: BIOS vendor: Phoenix Technologies LTD physical id: 0 version: 6.00 date: 05/29/2007 size: 106KiB capacity: 960KiB capabilities: pci pnp upgrade shadowing escd cdboot bootselect edd int13floppy2880 acpi usb ls120boot zipboot biosbootspecification *-storage description: RAID bus controller product: 631xESB/632xESB SATA RAID Controller vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:00:1f.2 version: 09 width: 32 bits clock: 66MHz capabilities: storage pm bus_master cap_list configuration: driver=ahci latency=0 resources: irq:19 ioport:18a0(size=8) ioport:1874(size=4) ioport:1878(size=8) ioport:1870(size=4) ioport:1880(size=32) memory:d8500400-d85007ff

Read the article

Load average is have been high over some period

- by user111196

We have a dedicated MySQL server and below is the a snapshot of the top. The load average has been staying at nearly 100 for an hour plus ready. top - 20:54:28 up 7:31, 2 users, load average: 83.08, 96.88, 106.23 Tasks: 278 total, 2 running, 274 sleeping, 2 stopped, 0 zombie Cpu0 : 18.8%us, 10.2%sy, 0.0%ni, 70.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 51.2%us, 4.3%sy, 0.0%ni, 44.2%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu2 : 9.0%us, 10.3%sy, 0.0%ni, 80.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 18.8%us, 7.4%sy, 0.0%ni, 73.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 7.8%us, 8.8%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 10.3%us, 8.4%sy, 0.0%ni, 81.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 6.2%us, 7.5%sy, 0.0%ni, 86.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 6.2%us, 6.2%sy, 0.0%ni, 87.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu8 : 8.8%us, 10.4%sy, 0.0%ni, 80.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu9 : 63.7%us, 4.6%sy, 0.0%ni, 12.2%id, 0.0%wa, 4.3%hi, 15.2%si, 0.0%st Cpu10 : 9.2%us, 10.2%sy, 0.0%ni, 80.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 17.3%us, 5.9%sy, 0.0%ni, 76.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 8.0%us, 8.7%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 10.9%us, 7.4%sy, 0.0%ni, 81.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 6.2%us, 6.9%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 4.8%us, 6.1%sy, 0.0%ni, 89.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 33009800k total, 23174396k used, 9835404k free, 120604k buffers Swap: 35061752k total, 0k used, 35061752k free, 16459540k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3341 mysql 20 0 14.3g 4.6g 4240 S 417.8 14.5 1673:51 mysqld 24406 root 20 0 15008 1292 876 R 0.3 0.0 0:00.19 top 1 root 20 0 4080 852 608 S 0.0 0.0 0:01.92 init 2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0.0 0.0 0:00.32 migration/0 4 root 15 -5 0 0 0 S 0.0 0.0 0:00.29 ksoftirqd/0 5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 6 root RT -5 0 0 0 S 0.0 0.0 0:03.21 migration/1 7 root 15 -5 0 0 0 S 0.0 0.0 0:00.07 ksoftirqd/1 8 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1 9 root RT -5 0 0 0 S 0.0 0.0 0:00.17 migration/2 10 root 15 -5 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/2 11 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2 12 root RT -5 0 0 0 S 0.0 0.0 0:00.32 migration/3 13 root 15 -5 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/3 14 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3 15 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/4 16 root 15 -5 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/4 17 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4 18 root RT -5 0 0 0 S 0.0 0.0 0:00.35 migration/5 We have also tried to run this command. What else command can help us diagnose the exact problem of this high load? netstat -nat |grep 3306 | awk '{print $6}' | sort | uniq -c | sort -n 1 LISTEN 1 SYN_RECV 410 ESTABLISHED 964 TIME_WAIT Output of vmstat 1: ---------------memory--------------- --swap-- --io-- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 12978936 30944 15172360 0 0 259 3 184 265 6 6 77 12 0

Read the article

CPU Utilization LAMP stack

- by Max

We've got an ec2 m2.4xlarge running Magento (centos 5.6, httpd 2.2, php 5.2.17 with eaccelerator 0.9.5.3, mysql 5.1.52). Right now we're getting a large traffic spike, and our top looks like this: top - 09:41:29 up 31 days, 1:12, 1 user, load average: 120.01, 129.03, 113.23 Tasks: 1190 total, 18 running, 1172 sleeping, 0 stopped, 0 zombie Cpu(s): 97.3%us, 1.8%sy, 0.0%ni, 0.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.4%st Mem: 71687720k total, 36898928k used, 34788792k free, 49692k buffers Swap: 880737784k total, 0k used, 880737784k free, 1586524k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2433 mysql 15 0 23.6g 4.5g 7112 S 564.7 6.6 33607:34 mysqld 24046 apache 16 0 411m 65m 28m S 26.4 0.1 0:09.05 httpd 24360 apache 15 0 410m 60m 25m S 26.4 0.1 0:03.65 httpd 24993 apache 16 0 410m 57m 21m S 26.1 0.1 0:01.41 httpd 24838 apache 16 0 428m 74m 20m S 24.8 0.1 0:02.37 httpd 24359 apache 16 0 411m 62m 26m R 22.3 0.1 0:08.12 httpd 23850 apache 15 0 411m 64m 27m S 16.8 0.1 0:14.54 httpd 25229 apache 16 0 404m 46m 17m R 10.2 0.1 0:00.71 httpd 14594 apache 15 0 404m 63m 34m S 8.4 0.1 1:10.26 httpd 24955 apache 16 0 404m 50m 21m R 8.4 0.1 0:01.66 httpd 24313 apache 16 0 399m 46m 22m R 8.1 0.1 0:02.30 httpd 25119 apache 16 0 411m 59m 23m S 6.8 0.1 0:01.45 httpd Questions: Would giving msyqld more memory help it cache queries and react faster? If so, how? Other than splitting mysql and php to separate servers (which we're about to do) is there anything else we could/should be doing? Thanks! UPDATE: Here's our my.cnf along with the output of mysqltuner. It looks like a cache problem. Thanks again! # cat /etc/my.cnf [client] port = **** socket = /var/lib/mysql/mysql.sock [mysqld] datadir=/mnt/persistent/mysql port=**** socket=/var/lib/mysql/mysql.sock key_buffer = 512M max_allowed_packet = 64M table_cache = 1024 sort_buffer_size = 8M read_buffer_size = 4M read_rnd_buffer_size = 2M myisam_sort_buffer_size = 64M thread_cache_size = 128M tmp_table_size = 128M join_buffer_size = 1M query_cache_limit = 2M query_cache_size= 64M query_cache_type = 1 max_connections = 1000 thread_stack = 128K thread_concurrency = 48 log-bin=mysql-bin server-id = 1 wait_timeout = 300 innodb_data_home_dir = /mnt/persistent/mysql/ innodb_data_file_path = ibdata1:10M:autoextend innodb_buffer_pool_size = 20G innodb_additional_mem_pool_size = 20M innodb_log_file_size = 64M innodb_log_buffer_size = 8M innodb_flush_log_at_trx_commit = 1 innodb_lock_wait_timeout = 50 innodb_thread_concurrency = 48 ft_min_word_len=3 [myisamchk] ft_min_word_len=3 key_buffer = 128M sort_buffer_size = 128M read_buffer = 2M write_buffer = 2M # ./mysqltuner.pl >> MySQLTuner 1.2.0 - Major Hayden <[email protected]> >> Bug reports, feature requests, and downloads at http://mysqltuner.com/ >> Run with '--help' for additional options and output filtering -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.52-log [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: +Archive -BDB +Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 2G (Tables: 26) [--] Data in InnoDB tables: 749M (Tables: 250) [!!] Total fragmented tables: 262 -------- Security Recommendations ------------------------------------------- -------- Performance Metrics ------------------------------------------------- [--] Up for: 31d 2h 30m 38s (680M q [253.371 qps], 2M conn, TX: 4825B, RX: 236B) [--] Reads / Writes: 89% / 11% [--] Total buffers: 20.6G global + 15.1M per thread (1000 max threads) [OK] Maximum possible memory usage: 35.4G (51% of installed RAM) [OK] Slow queries: 0% (35K/680M) [OK] Highest usage of available connections: 53% (537/1000) [OK] Key buffer size / total MyISAM indexes: 512.0M/457.2M [OK] Key buffer hit rate: 100.0% (9B cached / 264K reads) [OK] Query cache efficiency: 42.3% (260M cached / 615M selects) [!!] Query cache prunes per day: 4384652 [OK] Sorts requiring temporary tables: 0% (1K temp sorts / 38M sorts) [!!] Joins performed without indexes: 100404 [OK] Temporary tables created on disk: 17% (7M on disk / 45M total) [OK] Thread cache hit rate: 99% (537 created / 2M connections) [!!] Table cache hit rate: 0% (1K open / 946K opened) [OK] Open file limit used: 9% (453/5K) [OK] Table locks acquired immediately: 99% (758M immediate / 758M locks) [OK] InnoDB data size / buffer pool: 749.3M/20.0G -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance Enable the slow query log to troubleshoot bad queries Adjust your join queries to always utilize indexes Increase table_cache gradually to avoid file descriptor limits Variables to adjust: query_cache_size (> 64M) join_buffer_size (> 1.0M, or always use indexes with joins) table_cache (> 1024)

Read the article

Debian virtual memory reaching limit

- by Gregor

As a relative newbie to systems, I inherited a Debian server and I've noticed that virtual memory is very high (around 95%!). The server has been running slow for around 6 months, and I was wondering if any of you had any tips on things I could try, particularly on freeing up memory. The server hosts various websites and also a Postit email server. Here are the details: Operating system Debian Linux 5.0 Webmin version 1.580 Time on system Thu Apr 12 11:12:21 2012 Kernel and CPU Linux 2.6.18-6-amd64 on x86_64 Processor information Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz, 2 cores System uptime 229 days, 12 hours, 50 minutes Running processes 138 CPU load averages 0.10 (1 min) 0.28 (5 mins) 0.36 (15 mins) CPU usage 14% user, 1% kernel, 0% IO, 85% idle Real memory 2.94 GB total, 1.69 GB used Virtual memory 3.93 GB total, 3.84 GB used Local disk space 142.84 GB total, 116.13 GB used Free m output: free -m total used free shared buffers cached Mem: 3010 2517 492 0 107 996 -/+ buffers/cache: 1413 1596 Swap: 4024 3930 93 Top output: top - 11:59:57 up 229 days, 13:38, 1 user, load average: 0.26, 0.24, 0.26 Tasks: 136 total, 2 running, 134 sleeping, 0 stopped, 0 zombie Cpu(s): 3.8%us, 0.5%sy, 0.0%ni, 95.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3082544k total, 2773160k used, 309384k free, 111496k buffers Swap: 4120632k total, 4024712k used, 95920k free, 1036136k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28796 www-data 16 0 304m 68m 6188 S 8 2.3 0:03.13 apache2 1 root 15 0 10304 592 564 S 0 0.0 0:00.76 init 2 root RT 0 0 0 0 S 0 0.0 0:04.06 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:05.67 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 5 root RT 0 0 0 0 S 0 0.0 0:00.06 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:01.26 ksoftirqd/1 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 8 root 10 -5 0 0 0 S 0 0.0 0:00.12 events/0 9 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1 10 root 10 -5 0 0 0 S 0 0.0 0:00.00 khelper 11 root 10 -5 0 0 0 S 0 0.0 0:00.02 kthread 16 root 10 -5 0 0 0 S 0 0.0 0:15.51 kblockd/0 17 root 10 -5 0 0 0 S 0 0.0 0:01.32 kblockd/1 18 root 15 -5 0 0 0 S 0 0.0 0:00.00 kacpid 127 root 10 -5 0 0 0 S 0 0.0 0:00.00 khubd 129 root 10 -5 0 0 0 S 0 0.0 0:00.00 kseriod 180 root 10 -5 0 0 0 S 0 0.0 70:09.05 kswapd0 181 root 17 -5 0 0 0 S 0 0.0 0:00.00 aio/0 182 root 17 -5 0 0 0 S 0 0.0 0:00.00 aio/1 780 root 16 -5 0 0 0 S 0 0.0 0:00.00 ata/0 782 root 16 -5 0 0 0 S 0 0.0 0:00.00 ata/1 783 root 16 -5 0 0 0 S 0 0.0 0:00.00 ata_aux 802 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_0 803 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_1 804 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_2 805 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_3 1013 root 10 -5 0 0 0 S 0 0.0 49:27.78 kjournald 1181 root 15 -4 16912 452 448 S 0 0.0 0:00.05 udevd 1544 root 14 -5 0 0 0 S 0 0.0 0:00.00 kpsmoused 1706 root 13 -5 0 0 0 S 0 0.0 0:00.00 kmirrord 1995 root 18 0 193m 3324 1688 S 0 0.1 8:52.77 rsyslogd 2031 root 15 0 48856 732 608 S 0 0.0 0:01.86 sshd 2071 root 25 0 17316 1072 1068 S 0 0.0 0:00.00 mysqld_safe 2108 mysql 15 0 320m 72m 4368 S 0 2.4 1923:25 mysqld 2109 root 18 0 3776 500 496 S 0 0.0 0:00.00 logger 2180 postgres 15 0 99504 3016 2880 S 0 0.1 1:24.15 postgres 2184 postgres 15 0 99504 3596 3420 S 0 0.1 0:02.08 postgres 2185 postgres 15 0 99504 696 628 S 0 0.0 0:00.65 postgres 2186 postgres 15 0 99640 892 648 S 0 0.0 0:01.18 postgres

Read the article

Where's my memory?! Nginx + PHP-FPM front end webserver slows to a crawl...

- by incredimike

I'm not sure if I have a problem with a memory leak (as my hosting company suggests), or if we both need to read http://linuxatemyram.com. Maybe you clever people can help us out? This is a front-end webserver VM running essentially only nginx & php-fpm on RHEL 5.5. This server is powering Magento, a PHP eCommerce thinggy. The server is running in a shared environment, but we're changing that soon. Anyway.. after a reboot the server runs just fine, but within a day it will grind itself into nothingness. Pages will take literally 2 minutes to load, CPU spikes like crazy, etc.. The console is even sluggish when I SSH in. It's like my whole server is being brought to its knees. I've also been monitoring the DB server via top and tcpdumping incoming traffic. The DB stays idle for a good portion of that "slow" load time. When i start seeing queries coming from the front-end server, the page loads soon afterward. Here are some stats after me logging in during a slow-down, after restarting php-fpm: [mike@front01 ~]$ free -m total used free shared buffers cached Mem: 5963 5217 745 0 192 314 -/+ buffers/cache: 4711 1252 Swap: 4047 4 4042 [mike@front01 ~]$ top top - 11:38:55 up 2 days, 1:01, 3 users, load average: 0.06, 0.17, 0.21 Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.3%sy, 0.0%ni, 99.3%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 6106800k total, 5361288k used, 745512k free, 199960k buffers Swap: 4144728k total, 4976k used, 4139752k free, 328480k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 31806 apache 15 0 601m 120m 37m S 0.0 2.0 0:22.23 php-fpm 31805 apache 15 0 549m 66m 31m S 0.0 1.1 0:14.54 php-fpm 31809 apache 16 0 547m 65m 32m S 0.0 1.1 0:12.84 php-fpm 32285 apache 15 0 546m 63m 33m S 0.0 1.1 0:09.22 php-fpm 32373 apache 15 0 546m 62m 32m S 0.0 1.1 0:09.66 php-fpm 31808 apache 16 0 543m 60m 35m S 0.0 1.0 0:18.93 php-fpm 31807 apache 16 0 533m 49m 30m S 0.0 0.8 0:08.93 php-fpm 32092 apache 15 0 535m 48m 27m S 0.0 0.8 0:06.67 php-fpm 4392 root 18 0 194m 10m 7184 S 0.0 0.2 0:06.96 cvd 4064 root 15 0 154m 8304 4220 S 0.0 0.1 3:55.57 snmpd 4394 root 15 0 119m 5660 2944 S 0.0 0.1 0:02.84 EvMgrC 31804 root 15 0 519m 5180 932 S 0.0 0.1 0:00.46 php-fpm 4138 ntp 15 0 23396 5032 3904 S 0.0 0.1 0:02.38 ntpd 643 nginx 15 0 95276 4408 1524 S 0.0 0.1 0:01.15 nginx 5131 root 16 0 90128 3340 2600 S 0.0 0.1 0:01.41 sshd 28467 root 15 0 90128 3340 2600 S 0.0 0.1 0:00.35 sshd 32602 root 16 0 90128 3332 2600 S 0.0 0.1 0:00.36 sshd 1614 root 16 0 90128 3308 2588 S 0.0 0.1 0:00.02 sshd 2817 root 5 -10 7216 3140 1724 S 0.0 0.1 0:03.80 iscsid 4161 root 15 0 66948 2340 800 S 0.0 0.0 0:10.35 sendmail 1617 nicole 17 0 53876 2000 1516 S 0.0 0.0 0:00.02 sftp-server ... Is there anything else I should be looking at, or any more information that might be useful? I'm just a developer, but the slowdowns on this system worry me and make it hard to do my work.. Help me out, ServerFault!

Read the article

What's going on with my server? High load, lots of idle CPU time, low disk utilization

- by Jonathan

I run a web site and send a legitimate opt-in, daily email newsletter to subscribers. Both the web hosting and email sending are done by the same machine. I have about 100,000 subscribers who have opted in to my daily email newsletter. My PHP script did a pretty good job sending mail to all of them until fairly recently, but as the list has grown I can't keep up. When I run top, I have very high load--usually at least 6 or 7, sometimes as high as 15--even though I only have two CPUs. However, when I run sar, my CPU is idle an average of about 30% of the time. So, it seems I'm not CPU bound. When I run iostat, it seems as though I'm not disk bound because my %util for each device is very low (no more than 5%). Given that I don't seem to be CPU bound or disk bound, why is top reporting such high load? Additionally, since I don't seem to be CPU bound or disk bound, why is my email sending script not able to keep up? Here's what I see when running top: top - 11:33:28 up 74 days, 18:49, 2 users, load average: 7.65, 8.79, 8.28 Tasks: 168 total, 5 running, 162 sleeping, 0 stopped, 1 zombie Cpu(s): 38.9%us, 58.6%sy, 0.8%ni, 0.0%id, 0.7%wa, 0.2%hi, 0.8%si, 0.0%st Mem: 3083012k total, 2144436k used, 938576k free, 281136k buffers Swap: 2048248k total, 39164k used, 2009084k free, 1470412k cached Here's what I see when running iostat -mx: avg-cpu: %user %nice %system %iowait %steal %idle 34.80 1.20 55.24 0.37 0.00 8.38 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.19 71.70 1.59 29.45 0.02 0.07 5.90 0.55 17.82 1.16 3.59 sda1 0.00 0.00 0.00 0.00 0.00 0.00 7.10 0.00 13.80 13.72 0.00 sda2 0.05 50.45 1.13 24.57 0.01 0.29 24.25 0.35 13.43 1.15 2.97 sda3 0.05 10.17 0.20 2.33 0.01 0.05 43.75 0.05 20.96 2.45 0.62 sda4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 70.50 70.50 0.00 sda5 0.07 0.22 0.03 0.07 0.00 0.00 32.84 0.08 856.19 8.03 0.08 sda6 0.02 5.45 0.03 0.72 0.00 0.02 67.55 0.02 26.72 5.26 0.39 sda7 0.00 1.56 0.00 0.42 0.00 0.01 38.04 0.00 8.88 5.84 0.24 sda8 0.01 3.84 0.20 1.35 0.00 0.02 28.55 0.05 31.90 4.08 0.63 Here's what I see when running sar: 09:40:02 AM CPU %user %nice %system %iowait %steal %idle 09:50:01 AM all 30.59 1.01 49.80 0.23 0.00 18.37 10:00:08 AM all 31.73 0.92 51.66 0.13 0.00 15.55 10:10:06 AM all 30.43 0.99 48.94 0.26 0.00 19.38 10:20:01 AM all 29.58 1.00 47.76 0.25 0.00 21.42 10:30:01 AM all 29.37 1.02 47.30 0.18 0.00 22.13 10:40:06 AM all 32.50 1.01 52.94 0.16 0.00 13.39 10:50:01 AM all 30.49 1.00 49.59 0.15 0.00 18.77 11:00:01 AM all 29.43 0.99 47.71 0.17 0.00 21.71 11:10:07 AM all 30.26 0.93 49.48 0.83 0.00 18.50 11:20:02 AM all 29.83 0.81 48.51 1.32 0.00 19.52 11:30:06 AM all 31.18 0.88 51.33 1.15 0.00 15.47 Average: all 26.21 1.15 42.62 0.48 0.00 29.54 Here are the top handful of processes listed at the particular time I happened to run top -c: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 8180 mysql 16 0 57448 19m 2948 S 26.6 0.7 4702:26 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/bristno.pid --skip-external-locking 26956 brristno 17 0 0 0 0 Z 8.0 0.0 0:00.24 [php] <defunct> 26958 brristno 17 0 94408 43m 37m R 5.0 1.4 0:00.15 /usr/bin/php /home/brristno/public_html/dbv.php 22852 nobody 16 0 9628 2900 1524 S 0.7 0.1 0:00.17 /usr/local/apache/bin/httpd -k start -DSSL 8591 brristno 34 19 96896 13m 6652 S 0.3 0.4 0:29.82 /usr/local/bin/php /home/brristno/bin/mailer.php 1qwqyb6 i0gbor 24469 nobody 16 0 9628 2880 1508 S 0.3 0.1 0:00.08 /usr/local/apache/bin/httpd -k start -DSSL 25495 nobody 15 0 9628 2876 1500 S 0.3 0.1 0:00.06 /usr/local/apache/bin/httpd -k start -DSSL 26149 nobody 15 0 9628 2864 1504 S 0.3 0.1 0:00.04 /usr/local/apache/bin/httpd -k start -DSSL

Read the article

1600+ 'postfix-queue' processes - OK to have this many?

- by atomicguava

I have a Plesk 9.5.4 CentOS server running Postfix. I had been having massive problems with the mailq being full of 'double-bounce' email messages containing errors relating to 'Queue File Write Error', but I believe these are now fixed thanks to this thread. My new problem is that when I run top, I can see lots of processes called 'postfix-queue' and have fairly high load: top - 13:59:44 up 6 days, 21:14, 1 user, load average: 2.33, 2.19, 1.96 Tasks: 1743 total, 1 running, 1742 sleeping, 0 stopped, 0 zombie Cpu(s): 5.1%us, 8.8%sy, 0.0%ni, 85.3%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3145728k total, 1950640k used, 1195088k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1324 apache 16 0 344m 33m 5664 S 21.7 1.1 0:03.17 httpd 32443 apache 15 0 350m 36m 6864 S 14.4 1.2 0:13.83 httpd 1678 root 15 0 13948 2568 952 R 2.0 0.1 0:00.37 top 1890 mysql 15 0 689m 318m 7600 S 1.0 10.4 219:45.23 mysqld 1394 apache 15 0 352m 41m 5972 S 0.7 1.3 0:03.91 httpd 1369 apache 15 0 344m 33m 5444 S 0.3 1.1 0:02.03 httpd 1592 apache 15 0 349m 37m 5912 S 0.3 1.2 0:02.52 httpd 1633 apache 15 0 336m 20m 1828 S 0.3 0.7 0:00.01 httpd 1952 root 19 0 335m 28m 10m S 0.3 0.9 1:35.41 httpd 1 root 15 0 10304 732 612 S 0.0 0.0 0:04.41 init 1034 mhandler 15 0 11520 1160 884 S 0.0 0.0 0:00.00 postfix-queue 1036 mhandler 15 0 11516 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1041 mhandler 17 0 11516 1156 884 S 0.0 0.0 0:00.00 postfix-queue 1043 mhandler 15 0 11512 1116 860 S 0.0 0.0 0:00.00 postfix-queue 1063 mhandler 16 0 11516 1160 884 S 0.0 0.0 0:00.00 postfix-queue 1068 mhandler 15 0 11516 1128 860 S 0.0 0.0 0:00.00 postfix-queue 1071 mhandler 17 0 11512 1152 884 S 0.0 0.0 0:00.00 postfix-queue 1072 mhandler 15 0 11512 1116 860 S 0.0 0.0 0:00.00 postfix-queue 1081 mhandler 16 0 11516 1156 884 S 0.0 0.0 0:00.00 postfix-queue 1082 mhandler 15 0 11512 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1089 popuser 15 0 33892 1972 1200 S 0.0 0.1 0:00.02 pop3d 1116 mhandler 16 0 11516 1164 884 S 0.0 0.0 0:00.00 postfix-queue 1117 mhandler 15 0 11516 1124 860 S 0.0 0.0 0:00.00 postfix-queue 1120 mhandler 16 0 11516 1160 884 S 0.0 0.0 0:00.00 postfix-queue 1121 mhandler 15 0 11512 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1130 mhandler 17 0 11516 1160 884 S 0.0 0.0 0:00.00 postfix-queue 1131 mhandler 15 0 11516 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1149 root 17 -4 12572 680 356 S 0.0 0.0 0:00.00 udevd 1181 mhandler 16 0 11516 1160 884 S 0.0 0.0 0:00.00 postfix-queue 1183 mhandler 15 0 11512 1116 860 S 0.0 0.0 0:00.00 postfix-queue 1224 mhandler 16 0 11516 1160 884 S 0.0 0.0 0:00.00 postfix-queue 1225 mhandler 15 0 11516 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1228 apache 15 0 345m 34m 5472 S 0.0 1.1 0:04.64 httpd 1241 mhandler 16 0 11516 1156 884 S 0.0 0.0 0:00.00 postfix-queue 1242 mhandler 15 0 11512 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1251 mhandler 17 0 11516 1156 884 S 0.0 0.0 0:00.00 postfix-queue 1252 mhandler 15 0 11516 1120 860 S 0.0 0.0 0:00.00 postfix-queue 1258 apache 15 0 349m 37m 5444 S 0.0 1.2 0:01.28 httpd When I run ps -Al | grep -c postfix-queue it returns 1618! My question is this: is this normal or is there something else going wrong with Postfix? Right now, if I run mailq it is empty, and qshape deferred / qshape active are empty too. Thanks in advance for your help.

Read the article

RedHat 5.5 server does not show per processor memory utilization

- by Mike S

I have been searching all over internet but not finding any leads. I have a system with a memory leak that I am trying to troubleshoot. Unfortunately I am not able to see per processor memory utilization. Here are the outputs of TOP and PS commands. Linux SERVER_NAME 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux top - 09:17:13 up 18:43, 3 users, load average: 0.00, 0.00, 0.00 Tasks: 375 total, 1 running, 373 sleeping, 0 stopped, 1 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 32922828k total, 32776712k used, 146116k free, 267128k buffers Swap: 5245212k total, 0k used, 5245212k free, 32141044k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 10348 744 620 S 0.0 0.0 0:05.65 init 2 root RT -5 0 0 0 S 0.0 0.0 0:00.05 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/1 6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1 8 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/2 9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2 10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2 11 root RT -5 0 0 0 S 0.0 0.0 0:00.01 migration/3 12 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/3 13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3 14 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/4 15 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/4 16 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4 17 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/5 18 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/5 19 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/5 20 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/6 % ps -auxf | sort -nr -k 4 | head -10 Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ xfs 6205 0.0 0.0 23316 3892 ? Ss Aug19 0:00 xfs -droppriv -daemon uuidd 6101 0.0 0.0 60976 224 ? Ss Aug19 0:00 /usr/sbin/uuidd USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND smmsp 6130 0.0 0.0 57900 1784 ? Ss Aug19 0:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue rpc 5126 0.0 0.0 8052 632 ? Ss Aug19 0:00 portmap root 99 0.0 0.0 0 0 ? S< Aug19 0:00 [events/1] root 98 0.0 0.0 0 0 ? S< Aug19 0:00 [events/0] root 97 0.0 0.0 0 0 ? S< Aug19 0:00 [watchdog/31] root 96 0.0 0.0 0 0 ? SN Aug19 0:00 [ksoftirqd/31] root 95 0.0 0.0 0 0 ? S< Aug19 0:00 [migration/31] Any help with this is appretiate.

Read the article

I’m sorry RPGs, it’s not you, it’s me: The birth of my game idea

- by George Clingerman

One of the things I’ve had to give up in order to have some development time at night is gaming. It’s something I refused to admit for years but I’ve just had to face the facts. I’m no longer a gamer. I just don’t have hours and hours of free time to pour into gaming and when I do have hours and hours of free time I want to pour them into game development. That doesn’t mean I don’t game at all! I play games pretty much every day. It just means I’ve moved more into the casual game realm. It’s all I have time for when juggling priorities in my life. That means that games like Gears of War 2 sit shrink wrapped on my shelf and although I popped Dragon Age into my Xbox 360 one time, I barely made it through the opening sequence and haven’t had time to sit down and play again. Instead I’m playing short games like Jamestown, Atom Zombie Smasher, Fortix or if I have time to jump in and play a few rounds maybe some Monday Night Combat or Team Fortress 2. These are games I can instantly get into and play for just a short period of time and then walk away. Breath of Death VII saved my life: Back in the day (way, way back in the day) I used to be a pretty big RPG fan. Not big by a lot of RPG gamers' standards (most of the RPGs RPG fans about I’ve never heard of) but I used to LOVE to play them on the NES, SNES and Genesis and considered that my genre. Final Fantasy, Shining in the Darkness, Bard’s Tale, Faxanadu, Shadowrun, Ultima, Dragon Warrior, Chrono Trigger, Phantasy Star, Shining Force and well the list could go on but those are the ones I remember off the top of my head. I loved playing RPGs and they were my games of choice. After my first son was born (this was just about 12 years ago), I tried to continue playing RPGs and purchased games like Baldur’s Gate I & II, Neverwinter Nights, Fable, then a few of the Final Fantasy’s then Kingdom Hearts. I kept buying these games and then only playing for about fifteen minutes and never getting back to them. I still loved RPGs but they just no longer fit into my life (I still haven’t accepted that since I still purchased Dragon Age II for some reason and convinced myself I’d find the time). Adding three more sons to the mix (that’s 4 total) didn’t help much to finding more RPG time (except for Breath of Death VII and other XBLIG RPG titles, thanks guys!) All work and no RPG: A few months ago as I was sitting thinking about the lack of RPGs in my life and talking to my wife about why I wish RPGs were different and easier for a dad like me to get into. She seemed like she was listening, so I started listing all the things that made them impossible for me to play. Here’s a short list I came up with. They take 15 billion hours to complete I have a few minutes at a time I can grab to play them if I want to have time to code. At that rate it would take me 9 trillion years to beat just one RPG. There’s such long spans of times between when I can play them I forget what I was even doing so I have to spend most of the playtime I have just figuring that out and then my play time is over. Repeat. I’ll never finish one and since it takes so long to get to the fun part in an RPG, I’m never having fun. RPGs aren’t fun if you don’t have hours to play them at a time. As you can see based on my science and math, RPGs aren’t fun for me any more. From there my brain started toying around with ideas of RPGs that would work for me. They would have to be a short RPG, you know one you could beat in a single play session. A dad sized play session. I started thinking, wouldn’t it be awesome if there was a fifteen minute RPG? That got me laughing and I took that as a good sign that it sounded fun and so I thought about it a little more. I immediately discarded the idea of doing a real RPG. I’m sure a short RPG like that could be done but it wasn’t the vibe that I had in my head. No this was going to be something that just had the core essence of an RPG. In reality what I’d be making would be more of an arcade style game. One with high scores and lots of crazy action on the screen. And that’s when it hit me. It would be a speed run RPG. That’s the basics of the game I’m working on. The Elevator Pitch: It’s a 2D top down RPG themed arcade game focused on speed. It sounds like an RPG, smells like an RPG but it’s merely emulating an RPG. The game is focused on fun and mayhem in RPG form with players leveling up in seconds instead of hours and rushing to finish quests as quickly as possible because they’ve only got fifteen minutes before EVIL overtakes the world. If the player takes longer than fifteen minutes, it’s game over man. One to four player co-operative play to really see just how fast players can level up and beat the game. Gamers will compete on leaderboards for bragging rights for fastest 1, 2, 3, and 4 player speed runs, lowest leveled characters to beat the game, highest leveled characters to beat the game and so on. Times will be tracked for everything from how long a player sat distributing stats, equipping items, talking to NPCs to running around the level. These stats will be shown at the end of each quest/level so the players can work on improving their speed run for that part of the game next time around. It’s the perfect RPG for those of us who only have fifteen minutes of game time! Where I’m at: I’m still at the prototyping stage attempting to but all the basic framework pieces in place that will at minimum give me one level to rush through. I’ve been working on this prototype for about a month now though so I’m going to have to step it up a bit or I’m not going to get finished in time (remember I’ve only got 85 days left!) Lots of the game code is in place (although pretty sloppy) but I still can’t play through that first quest/level just yet. That’s my goal to finish up by the end of next Sunday (3/25/2012). You can all hold me to that and cheer me on or heckle me throughout the week. Either way that should help me stay a bit more motivated and focused. In my head this feels like it’s going to be a fun game so I’m looking forward to seeing how it actually plays!

Read the article

Struct Method for Loops Problem

- by Annalyne

I have tried numerous times how to make a do-while loop using the float constructor for my code but it seems it does not work properly as I wanted. For summary, I am making a TBRPG in C++ and I encountered few problems. But before that, let me post my code. #include <iostream> #include <string> #include <ctime> #include <cstdlib> using namespace std; int char_level = 1; //the starting level of the character. string town; //town string town_name; //the name of the town the character is in. string charname; //holds the character's name upon the start of the game int gems = 0; //holds the value of the games the character has. const int MAX_ITEMS = 15; //max items the character can carry string inventory [MAX_ITEMS]; //the inventory of the character in game int itemnum = 0; //number of items that the character has. bool GameOver = false; //boolean intended for the game over scr. string monsterTroop [] = {"Slime", "Zombie", "Imp", "Sahaguin, Hounds, Vampire"}; //monster name float monsterTroopHealth [] = {5.0f, 10.0f, 15.0f, 20.0f, 25.0f}; // the health of the monsters int monLifeBox; //life carrier of the game's enemy troops int enemNumber; //enemy number //inventory[itemnum++] = "Sword"; class RPG_Game_Enemy { public: void enemyAppear () { srand(time(0)); enemNumber = 1+(rand()%3); if (enemNumber == 1) cout << monsterTroop[1]; //monster troop 1 else if (enemNumber == 2) cout << monsterTroop[2]; //monster troop 2 else if (enemNumber == 3) cout << monsterTroop[3]; //monster troop 3 else if (enemNumber == 4) cout << monsterTroop[4]; //monster troop 4 } void enemDefeat () { cout << "The foe has been defeated. You are victorious." << endl; } void enemyDies() { //if the enemy dies: //collapse declaration cout << "The foe vanished and you are victorious!" << endl; } }; class RPG_Scene_Battle { public: RPG_Scene_Battle(float ini_health) : health (ini_health){}; float getHealth() { return health; } void setHealth(float rpg_val){ health = rpg_val;}; private: float health; }; //---------------------------------------------------------------// // Conduct Damage for the Scene Battle's Damage //---------------------------------------------------------------// float conductDamage(RPG_Scene_Battle rpg_tr, float damage) { rpg_tr.setHealth(rpg_tr.getHealth() - damage); return rpg_tr.getHealth(); }; // ------------------------------------------------------------- // void RPG_Scene_DisplayItem () { cout << "Items: \n"; for (int i=0; i < itemnum; ++i) cout << inventory[i] <<endl; }; In this code I have so far, the problem I have is the battle scene. For example, the player battles a Ghost with 10 HP, when I use a do while loop to subtract the HP of the character and the enemy, it only deducts once in the do while. Some people said I should use a struct, but I have no idea how to make it. Is there a way someone can display a code how to implement it on my game? Edit: I made the do-while by far like this: do RPG_Scene_Battle (player, 20.0f); RPG_Scene_Battle (enemy, 10.0f); cout << "Battle starts!" <<endl; cout << "You used a blade skill and deducted 2 hit points to the enemy!" conductDamage (enemy, 2.0f); while (enemy!=0) also, I made something like this: #include <iostream> using namespace std; int gems = 0; class Entity { public: Entity(float startingHealth) : health(startingHealth){}; // initialize health float getHealth(){return health;} void setHealth(float value){ health = value;}; private: float health; }; float subtractHealthFrom(Entity& ent, float damage) { ent.setHealth(ent.getHealth() - damage); return ent.getHealth(); }; int main () { Entity character(10.0f); Entity enemy(10.0f); cout << "Hero Life: "; cout << subtractHealthFrom(character, 2.0f) <<endl; cout << "Monster Life: "; cout << subtractHealthFrom(enemy, 2.0f) <<endl; cout << "Hero Life: "; cout << subtractHealthFrom(character, 2.0f) <<endl; cout << "Monster Life: "; cout << subtractHealthFrom(enemy, 2.0f) <<endl; }; Struct method, they say, should solve this problem. How can I continously deduct hp from the enemy? Whenever I deduct something, it would return to its original value -_-

Read the article

Need help trouble shooting high CPU usage by PHP-fpm

- by user432506

There is a problem that is driving me crazy. After the day I tried to fix the CPU usage problem of my VPS, the CPU load has grown from 60% to 150%, and I have no idea what causes the problem. Please help me. I had installed a copy of mediawiki on a Linode 1024. The wiki is running on Niginx + PHP-fpm + MySql. The wiki doesn't have much traffic, only around 4000 requests/day, mostly from Google and Bing bots. It had been using around 60% (of total 400% on the Linode) of the CPU before. I thought it was a bit high, so two day ago, I was trying to fix the problem (not knowing what was waiting for me). I did nothing but added a new empty line to wiki's configure file, which would change the modified time of the configure file, and then all the cached page files would be set invalidated. I had done that before, and that would cause high CPU usage, but normally it would take only hours to let things back to normal again. Not this time, my CPU usage has been around 150% for more than two days. It is php-fpm using most of CPU reassures. Using 100% of three cores is not rare. I hadn't seen that before. There are other sites on the Linode, but it should be the wiki. Because if I offline the wiki, CPU usage will drop back to around 40% soon. The day I also duplicated php-fpm.conf, and opened it, but didn't changed it. I have no idea what I did wrong. I here ask for help to save myself from being crazy!!! It is php-fpm. Is there a way to find out what is it doing? I mean like which scripts are related and what function codes are running? top: top - 06:34:33 up 10 days, 4:23, 2 users, load average: 1.10, 1.24, 1.37 Tasks: 76 total, 4 running, 72 sleeping, 0 stopped, 0 zombie Cpu(s): 61.1%us, 3.1%sy, 0.0%ni, 32.8%id, 2.9%wa, 0.0%hi, 0.0%si, 0.1%st Mem: 1028684k total, 945192k used, 83492k free, 89580k buffers Swap: 524284k total, 18084k used, 506200k free, 530380k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 26721 www-data 20 0 208m 54m 34m R 99 5.4 0:09.07 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 26592 www-data 20 0 207m 45m 26m R 91 4.5 0:12.77 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 26706 www-data 20 0 196m 43m 34m S 47 4.3 0:15.19 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 26583 www-data 20 0 197m 45m 35m S 33 4.5 0:19.08 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 26787 www-data 20 0 206m 36m 18m R 25 3.7 0:00.41 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 26661 www-data 20 0 207m 46m 26m S 13 4.6 0:19.87 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 1971 mysql 20 0 155m 57m 3952 S 8 5.7 383:57.81 /usr/sbin/mysqld 242 root 20 0 0 0 0 S 1 0.0 0:51.36 [kworker/3:1] 5711 root 20 0 139m 95m 580 S 1 9.5 0:41.30 /usr/local/bin/memcached -d -u root -m 128 -p 11211 19463 root 20 0 190m 3984 1284 S 1 0.4 0:02.66 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf 29100 www-data 20 0 10928 5540 820 S 1 0.5 4:49.05 nginx: worker process vmstat 30 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 16912 81456 90784 554172 0 0 4 6 3 2 11 1 87 1 0 0 16912 78036 91000 555356 0 0 38 34 1397 375 12 1 87 0 4 0 16912 31776 91528 557508 0 0 78 42 3197 487 45 1 52 1 1 0 16912 83356 91768 558576 0 0 35 56 2608 449 32 1 67 1 1 0 16912 81548 92040 559720 0 0 41 31 1243 432 8 1 91 1 2 0 16912 53056 92332 562744 0 0 105 33 2013 581 17 1 81 1 2 0 16912 73236 92552 564844 0 0 68 36 1968 615 16 1 82 1 0 0 16912 91612 92904 566676 0 0 69 35 1845 692 13 1 85 1 1 0 16912 71248 93180 568428 0 0 58 33 1952 604 15 1 82 1 1 0 16868 55952 93516 572660 1 0 144 42 1801 637 12 1 86 1 2 0 16868 48324 94416 577844 0 0 189 66 2058 702 17 1 80 2 1 0 16928 58644 94592 578184 0 2 160 49 2578 723 25 1 70 3 5 0 16928 22600 94980 580568 0 0 89 32 1496 361 13 0 85 1 0 0 16988 49256 94500 576396 0 2 41 37 1601 426 14 1 85 0 5 0 18084 24336 86032 502748 0 37 83 68 2989 562 42 1 56 0 1 0 18084 123604 86376 506996 0 0 118 41 2201 573 22 1 76 1 2 0 18084 126984 86752 508876 0 0 64 53 1620 490 13 1 85 1 2 0 18084 103104 87148 510768 0 0 71 37 2757 602 33 1 64 1

Read the article

Server Memory with Magento

- by Mohamed Elgharabawy

I have a cloud server with the following specifications: 2vCPUs 4G RAM 160GB Disk Space Network 400Mb/s System Image: Ubuntu 12.04 LTS I am only running Magento CE 1.7.0.2 on this server. Nothing else. Usually, the server has a loading time of 4-5 seconds. Recently, this has dropped to over 30 seconds and sometimes the server just goes away and I get HTTP error reports to my email stating that HTTP requests took more than 20000ms. Running top command and sorting them returns the following: top - 15:29:07 up 3:40, 1 user, load average: 28.59, 25.95, 22.91 Tasks: 112 total, 30 running, 82 sleeping, 0 stopped, 0 zombie Cpu(s): 90.2%us, 9.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 31901 www-data 20 0 360m 71m 5840 R 7 1.8 1:39.51 apache2 32084 www-data 20 0 362m 72m 5548 R 7 1.8 1:31.56 apache2 32089 www-data 20 0 348m 59m 5660 R 7 1.5 1:41.74 apache2 32295 www-data 20 0 343m 54m 5532 R 7 1.4 2:00.78 apache2 32303 www-data 20 0 354m 65m 5260 R 7 1.6 1:38.76 apache2 32304 www-data 20 0 346m 56m 5544 R 7 1.4 1:41.26 apache2 32305 www-data 20 0 348m 59m 5640 R 7 1.5 1:50.11 apache2 32291 www-data 20 0 358m 69m 5256 R 6 1.7 1:44.26 apache2 32517 www-data 20 0 345m 56m 5532 R 6 1.4 1:45.56 apache2 30473 www-data 20 0 355m 66m 5680 R 6 1.7 2:00.05 apache2 32093 www-data 20 0 352m 63m 5848 R 6 1.6 1:53.23 apache2 32302 www-data 20 0 345m 56m 5512 R 6 1.4 1:55.87 apache2 32433 www-data 20 0 346m 57m 5500 S 6 1.4 1:31.58 apache2 32638 www-data 20 0 354m 65m 5508 R 6 1.6 1:36.59 apache2 32230 www-data 20 0 347m 57m 5524 R 6 1.4 1:33.96 apache2 32231 www-data 20 0 355m 66m 5512 R 6 1.7 1:37.47 apache2 32233 www-data 20 0 354m 64m 6032 R 6 1.6 1:59.74 apache2 32300 www-data 20 0 355m 66m 5672 R 6 1.7 1:43.76 apache2 32510 www-data 20 0 347m 58m 5512 R 6 1.5 1:42.54 apache2 32521 www-data 20 0 348m 59m 5508 R 6 1.5 1:47.99 apache2 32639 www-data 20 0 344m 55m 5512 R 6 1.4 1:34.25 apache2 32083 www-data 20 0 345m 56m 5696 R 5 1.4 1:59.42 apache2 32085 www-data 20 0 347m 58m 5692 R 5 1.5 1:42.29 apache2 32293 www-data 20 0 353m 64m 5676 R 5 1.6 1:52.73 apache2 32301 www-data 20 0 348m 59m 5564 R 5 1.5 1:49.63 apache2 32528 www-data 20 0 351m 62m 5520 R 5 1.6 1:36.11 apache2 31523 mysql 20 0 3460m 576m 8288 S 5 14.4 2:06.91 mysqld 32002 www-data 20 0 345m 55m 5512 R 5 1.4 2:01.88 apache2 32080 www-data 20 0 357m 68m 5512 S 5 1.7 1:31.30 apache2 32163 www-data 20 0 347m 58m 5512 S 5 1.5 1:58.68 apache2 32509 www-data 20 0 345m 56m 5504 R 5 1.4 1:49.54 apache2 32306 www-data 20 0 358m 68m 5504 S 4 1.7 1:53.29 apache2 32165 www-data 20 0 344m 55m 5524 S 4 1.4 1:40.71 apache2 32640 www-data 20 0 345m 56m 5528 R 4 1.4 1:36.49 apache2 31888 www-data 20 0 359m 70m 5664 R 4 1.8 1:57.07 apache2 32511 www-data 20 0 357m 67m 5512 S 3 1.7 1:47.00 apache2 32054 www-data 20 0 357m 68m 5660 S 2 1.7 1:53.10 apache2 1 root 20 0 24452 2276 1232 S 0 0.1 0:01.58 init Moreover, running free -m returns the following: total used free shared buffers cached Mem: 4003 3919 83 0 118 901 -/+ buffers/cache: 2899 1103 Swap: 0 0 0 To investigate this further, I have installed apache buddy, it recommeneded that I need to reduce the maxclient connections. Which I did. I also installed MysqlTuner and it suggests that I need to set my innodb_buffer_pool_size to = 3.0G. However, I cannot do that, since the whole memory is 4G. Here is the output from apache buddy: ### GENERAL REPORT ### Settings considered for this report: Your server's physical RAM: 4003MB Apache's MaxClients directive: 40 Apache MPM Model: prefork Largest Apache process (by memory): 73.77MB [ OK ] Your MaxClients setting is within an acceptable range. Max potential memory usage: 2950.8 MB Percentage of RAM allocated to Apache 73.72 % And this is the output of MySQLTuner: -------- Performance Metrics ------------------------------------------------- [--] Up for: 47m 22s (675K q [237.552 qps], 12K conn, TX: 1B, RX: 300M) [--] Reads / Writes: 45% / 55% [--] Total buffers: 2.1G global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 2.5G (64% of installed RAM) [OK] Slow queries: 0% (0/675K) [OK] Highest usage of available connections: 26% (40/151) [OK] Key buffer size / total MyISAM indexes: 36.0M/18.7M [OK] Key buffer hit rate: 100.0% (245K cached / 105 reads) [OK] Query cache efficiency: 92.5% (500K cached / 541K selects) [!!] Query cache prunes per day: 302886 [OK] Sorts requiring temporary tables: 0% (1 temp sorts / 15K sorts) [!!] Joins performed without indexes: 12135 [OK] Temporary tables created on disk: 25% (8K on disk / 32K total) [OK] Thread cache hit rate: 90% (1K created / 12K connections) [!!] Table cache hit rate: 17% (400 open / 2K opened) [OK] Open file limit used: 12% (123/1K) [OK] Table locks acquired immediately: 100% (196K immediate / 196K locks) [!!] InnoDB buffer pool / data size: 2.0G/3.5G [OK] InnoDB log waits: 0 -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Enable the slow query log to troubleshoot bad queries Adjust your join queries to always utilize indexes Increase table_cache gradually to avoid file descriptor limits Read this before increasing table_cache over 64: http://bit.ly/1mi7c4C Variables to adjust: query_cache_size ( 64M) join_buffer_size ( 128.0K, or always use indexes with joins) table_cache ( 400) innodb_buffer_pool_size (= 3G) Last but not least, the server still has more than 60% of free disk space. Now, based on the above, I have few questions: Are these numbers normal? Do they make sense? Do I need to upgrade the server? If I don't need to upgrade and my configuration is not correct, how do I optimize it?

Read the article

Linux server apache httpd processes take i/o wait to close to 100% and lock down server

- by user3682065

For about 5 days now, and seemingly out of the blue, my linux server has started locking up from time to time. The pattern is always the same as far as I can tell from top and iotop commands around the time it starts happening: One or more httpd processes (usually one) hang and start using up 100% of CPU power, the %wa goes close to 100% and in the iotop I see several httpd processes with 99.99% in the IO column. I'm also running an SVN server on this machine through apache and the one way that I've been consistently able to reproduce this is to do an SVN commit of new files or an SVN update from the repository on this server (I am the only one using this SVN repository). This will always reproduce this scenario successfully, but until very recently I had no problems at all checking in/out of SVN. But sometimes it just happens for no detectable reason at all it seems. So it seems like there is some issue with my Apache that leads it to have processes use up a lot of read/write upon certain triggers. I was wondering if anyone could help me uncover that issue. EDIT: OK now it's happening again: This is top: [root@server ~]# top top - 10:56:54 up 2:59, 5 users, load average: 171.46, 70.35, 27.01 Tasks: 328 total, 2 running, 326 sleeping, 0 stopped, 0 zombie Cpu(s): 1.9%us, 2.0%sy, 0.0%ni, 0.0%id, 96.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2021144k total, 1968192k used, 52952k free, 2500k buffers Swap: 4194288k total, 2938584k used, 1255704k free, 39008k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10390 apache 20 0 2774m 936m 6200 D 2.0 47.4 1:52.27 httpd 2149 root 20 0 927m 13m 1040 S 0.7 0.7 1:50.46 namecoind 11 root 20 0 0 0 0 R 0.3 0.0 0:30.10 events/0 23 root 20 0 0 0 0 S 0.3 0.0 0:17.88 kblockd/1 2049 root 20 0 382m 4932 2880 D 0.3 0.2 0:03.67 httpd 2144 root 20 0 1702m 69m 1164 S 0.3 3.5 5:19.68 bitcoind 6325 root 20 0 15164 1100 656 R 0.3 0.1 0:11.09 top 10311 apache 20 0 387m 9496 7320 D 0.3 0.5 0:01.89 httpd 10313 apache 20 0 391m 10m 7364 D 0.3 0.5 0:02.40 httpd 10466 apache 20 0 399m 12m 7392 D 0.3 0.7 0:02.41 httpd 10599 apache 20 0 391m 9324 7340 D 0.3 0.5 0:00.15 httpd 10628 apache 20 0 384m 7620 4052 D 0.3 0.4 0:00.01 httpd 10633 apache 20 0 384m 7048 3504 D 0.3 0.3 0:00.01 httpd 10634 apache 20 0 384m 8012 4048 D 0.3 0.4 0:00.02 httpd 10638 apache 20 0 400m 22m 9.8m D 0.3 1.1 0:01.93 httpd 10640 apache 20 0 385m 8288 4028 D 0.3 0.4 0:00.03 httpd 10641 apache 20 0 401m 21m 6376 D 0.3 1.1 0:01.45 httpd 10759 apache 20 0 385m 8816 3480 D 0.3 0.4 0:01.45 httpd 10773 apache 20 0 384m 8044 3464 D 0.3 0.4 0:00.02 httpd This is an iotop snapshot: Total DISK READ: 5.93 M/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 10732 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 58.48 % httpd 876 be/3 root 0.00 B/s 52.68 K/s 0.00 % 52.98 % [jbd2/dm-1-8] 10906 be/4 root 124.17 K/s 0.00 B/s 0.00 % 23.03 % sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1 2156 be/4 root 206.94 K/s 0.00 B/s 0.00 % 21.15 % bitcoind 10904 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 18.94 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock 10773 be/4 apache 7.53 K/s 0.00 B/s 0.00 % 14.77 % httpd 10641 be/4 apache 15.05 K/s 0.00 B/s 0.00 % 11.57 % httpd 10399 be/4 apache 1057.29 K/s 0.00 B/s 43.16 % 10.56 % httpd 10682 be/4 sw-cp-se 158.03 K/s 0.00 B/s 0.00 % 7.45 % sw-engine-cgi -c /usr/local/psa/admin/conf/php.ini -d auto_prepend_file=auth.php3 -u psaadm 10774 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 6.53 % httpd 10624 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 5.53 % httpd 10356 be/4 apache 899.26 K/s 0.00 B/s 35.52 % 4.01 % httpd 10795 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 3.93 % httpd 10804 be/4 apache 7.53 K/s 0.00 B/s 0.00 % 3.08 % httpd 4379 be/4 root 2.89 M/s 0.00 B/s 99.99 % 0.00 % namecoind 10619 be/4 apache 462.80 K/s 0.00 B/s 7.80 % 0.00 % httpd 10636 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 0.00 % httpd 10716 be/4 mysql 105.35 K/s 0.00 B/s 5.92 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock 1988 be/4 root 18.81 K/s 0.00 B/s 0.00 % 0.00 % spamd_full.sock I also ran lsof -p for pid 10390 which was way up top under the top command and this is the bottom line where I can sort of see what request this was and it says CLOSE_WAIT: httpd 10390 apache 34u IPv6 315879 0t0 TCP default-domain.com:https->crawl-66-249-65-91.googlebot.com:42907 (CLOSE_WAIT) I'm still not sure what exactly is causing this all to happen though? I killed that service but %wa and load average remain high, I also stopped mysqld and other services. It really only goes down once I stop httpd altogether, and even then I can't start it without finding remaining hanging httpd processes via "netstat -tulpn", killing those or doing "killall -9 httpd" and after waiting a while for it to cycle through all those then doing /etc/init.d/httpd start

Read the article

my webserver with 16GB ram shows all RAM as used, but is it really, see the 'top'

- by Alex

I have some questions about my web server. Its a LAMP web server running centos 5.5 and php5, mysql5. The server gets hundreds (maybe thousand) of concurrent users during peak hours. I'm trying to optimize a little and understand "top". From what I can see: all 16GB of my ram have been used up? does that mean that my server needs more memory? My swap is only 2GB, should it be increased? usually during peak hours my server load average first number is about 2.5-3. What could I do to optimize the server so that the load average even during peak doesn't go above 1? In the past I was told a good working server should stay under 1 load, is this still true? Although even during load of 2.5-3, server pages and applications seem to load with pretty good speed. what should the memory size in php.ini be set to? top - 14:30:18 up 2 days, 12:41, 5 users, load average: 1.25, 1.74, 2.92 Tasks: 305 total, 2 running, 302 sleeping, 0 stopped, 1 zombie Cpu(s): 6.3%us, 0.9%sy, 0.0%ni, 92.5%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 16427200k total, 16111472k used, 315728k free, 3120316k buffers Swap: 2104496k total, 268k used, 2104228k free, 6216756k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 29080 apache 15 0 358m 36m 5192 S 20.2 0.2 2:08.40 httpd 29093 apache 18 0 357m 36m 5192 S 18.2 0.2 2:02.52 httpd 29079 apache 15 0 370m 49m 5832 S 10.0 0.3 2:32.14 httpd 1812 apache 15 0 370m 49m 5196 S 7.3 0.3 2:25.30 httpd 5204 apache 15 0 358m 36m 5168 S 5.3 0.2 0:59.28 httpd 29075 apache 15 0 370m 48m 5184 S 3.3 0.3 2:15.93 httpd 9712 apache 15 0 360m 38m 5180 S 3.0 0.2 0:54.81 httpd 29072 apache 16 0 358m 36m 5192 S 2.7 0.2 2:24.43 httpd 6310 apache 17 0 388m 67m 5180 S 2.3 0.4 0:58.85 httpd 8674 apache 15 0 343m 21m 4980 S 2.0 0.1 0:07.91 httpd 29085 apache 15 0 371m 49m 5224 S 2.0 0.3 2:16.86 httpd 29083 apache 15 0 370m 48m 5196 S 1.7 0.3 2:10.64 httpd 5575 apache 15 0 357m 36m 5228 S 1.3 0.2 0:53.78 httpd 29066 apache 15 0 379m 59m 5860 R 1.3 0.4 2:11.93 httpd 29078 apache 15 0 370m 48m 5188 S 1.3 0.3 2:14.52 httpd 29084 apache 15 0 370m 48m 5208 S 1.0 0.3 2:02.49 httpd 29089 apache 15 0 370m 48m 5188 S 1.0 0.3 2:27.61 httpd 29082 apache 15 0 390m 68m 5188 S 0.7 0.4 2:32.48 httpd 29984 apache 15 0 358m 36m 5228 S 0.7 0.2 2:08.32 httpd 3571 root 16 0 13400 1792 848 S 0.3 0.0 2:37.89 top 4419 mysql 15 0 668m 175m 7204 S 0.3 1.1 3:32.25 mysqld 28181 root 15 0 90460 3624 2680 S 0.3 0.0 0:17.60 sshd 29091 apache 15 0 390m 69m 5196 S 0.3 0.4 2:29.99 httpd 32476 root 15 0 12900 1320 848 R 0.3 0.0 0:06.46 top 1 root 15 0 10372 680 572 S 0.0 0.0 0:02.01 init 2 root RT -5 0 0 0 S 0.0 0.0 0:00.51 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.07 ksoftirqd/0 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root RT -5 0 0 0 S 0.0 0.0 0:00.12 migration/1 6 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/1 7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1 8 root RT -5 0 0 0 S 0.0 0.0 0:00.06 migration/2 9 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/2 10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2 11 root RT -5 0 0 0 S 0.0 0.0 0:00.06 migration/3 12 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/3 13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3 14 root RT -5 0 0 0 S 0.0 0.0 0:01.45 migration/4 15 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/4 16 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4 17 root RT -5 0 0 0 S 0.0 0.0 0:00.22 migration/5 18 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/5 19 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/5 20 root RT -5 0 0 0 S 0.0 0.0 0:00.15 migration/6 21 root 34 19 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/6 22 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/6 23 root RT -5 0 0 0 S 0.0 0.0 0:00.15 migration/7 24 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/7 25 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/7 26 root RT -5 0 0 0 S 0.0 0.0 0:00.19 migration/8 27 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/8 28 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/8 29 root RT -5 0 0 0 S 0.0 0.0 0:00.34 migration/9 30 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/9 31 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/9 32 root RT -5 0 0 0 S 0.0 0.0 0:00.16 migration/10 33 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/10 34 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/10 35 root RT -5 0 0 0 S 0.0 0.0 0:00.12 migration/11 36 root 34 19 0 0 0 S 0.0 0.0 0:00.05 ksoftirqd/11 37 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/11 38 root RT -5 0 0 0 S 0.0 0.0 0:00.35 migration/12

Read the article

Why is my mdadm raid-1 recovery so slow?

- by dimmer

On a system I'm running Ubuntu 10.04. My raid-1 restore started out fast but quickly became ridiculously slow (at this rate the restore will take 150 days!): dimmer@paimon:~$ cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid1 sdc1[2] sdb1[1] 1953513408 blocks [2/1] [_U] [====>................] recovery = 24.4% (477497344/1953513408) finish=217368.0min speed=113K/sec unused devices: <none> Eventhough I have set the kernel variables to reasonably quick values: dimmer@paimon:~$ cat /proc/sys/dev/raid/speed_limit_min 1000000 dimmer@paimon:~$ cat /proc/sys/dev/raid/speed_limit_max 100000000 I am using 2 2.0TB Western Digital Hard Disks, WDC WD20EARS-00M and WDC WD20EARS-00J. I believe they have been partitioned such that their sectors are aligned. dimmer@paimon:/sys$ sudo parted /dev/sdb GNU Parted 2.2 Using /dev/sdb Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ATA WDC WD20EARS-00M (scsi) Disk /dev/sdb: 2000GB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 1049kB 2000GB 2000GB ext4 (parted) unit s (parted) p Number Start End Size File system Name Flags 1 2048s 3907028991s 3907026944s ext4 (parted) q dimmer@paimon:/sys$ sudo parted /dev/sdc GNU Parted 2.2 Using /dev/sdc Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ATA WDC WD20EARS-00J (scsi) Disk /dev/sdc: 2000GB Sector size (logical/physical): 512B/4096B Partition Table: gpt Number Start End Size File system Name Flags 1 1049kB 2000GB 2000GB ext4 I am beginning to think that I have a hardware problem, otherwise I can't imagine why the mdadm restore should be so slow. I have done a benchmark on /dev/sdc using Ubuntu's disk utility GUI app, and the results looked normal so I know that sdc has the capability to write faster than this. I also had the same problem on a similar WD drive that I RMAd because of bad sectors. I suppose it's possible they sent me a replacement with bad sectors too, although there are no SMART values showing them yet. Any ideas? Thanks. As requested, output of top sorted by cpu usage (notice there is ~0 cpu usage). iowait is also zero which seems strange: top - 11:35:13 up 2 days, 9:40, 3 users, load average: 2.87, 2.58, 2.30 Tasks: 142 total, 1 running, 141 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3096304k total, 1482164k used, 1614140k free, 617672k buffers Swap: 1526132k total, 0k used, 1526132k free, 535416k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 45 root 20 0 0 0 0 S 0 0.0 2:17.02 scsi_eh_0 1 root 20 0 2808 1752 1204 S 0 0.1 0:00.46 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT 0 0 0 0 S 0 0.0 0:00.02 migration/0 4 root 20 0 0 0 0 S 0 0.0 0:00.17 ksoftirqd/0 5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 6 root RT 0 0 0 0 S 0 0.0 0:00.02 migration/1 ... dmesg errors, definitely looking like hardware: [202884.000157] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [202884.007015] ata5.00: failed command: FLUSH CACHE EXT [202884.013728] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 [202884.013730] res 40/00:00:ff:59:2e/00:00:35:00:00/e0 Emask 0x4 (timeout) [202884.033667] ata5.00: status: { DRDY } [202884.040329] ata5: hard resetting link [202889.400050] ata5: link is slow to respond, please be patient (ready=0) [202894.048087] ata5: COMRESET failed (errno=-16) [202894.054663] ata5: hard resetting link [202899.412049] ata5: link is slow to respond, please be patient (ready=0) [202904.060107] ata5: COMRESET failed (errno=-16) [202904.066646] ata5: hard resetting link [202905.840056] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [202905.849178] ata5.00: configured for UDMA/133 [202905.849188] ata5: EH complete [203899.000292] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [203899.007096] ata5.00: failed command: IDENTIFY DEVICE [203899.013841] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [203899.013843] res 40/00:00:ff:f9:f6/00:00:38:00:00/e0 Emask 0x4 (timeout) [203899.041232] ata5.00: status: { DRDY } [203899.048133] ata5: hard resetting link [203899.816134] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [203899.826062] ata5.00: configured for UDMA/133 [203899.826079] ata5: EH complete [204375.000200] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [204375.007421] ata5.00: failed command: IDENTIFY DEVICE [204375.014799] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [204375.014800] res 40/00:00:ff:0c:0f/00:00:39:00:00/e0 Emask 0x4 (timeout) [204375.044374] ata5.00: status: { DRDY } [204375.051842] ata5: hard resetting link [204380.408049] ata5: link is slow to respond, please be patient (ready=0) [204384.440076] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [204384.449938] ata5.00: configured for UDMA/133 [204384.449955] ata5: EH complete [204395.988135] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [204395.988140] ata5.00: failed command: IDENTIFY DEVICE [204395.988147] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [204395.988149] res 40/00:00:ff:0c:0f/00:00:39:00:00/e0 Emask 0x4 (timeout) [204395.988151] ata5.00: status: { DRDY } [204395.988156] ata5: hard resetting link [204399.320075] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [204399.330487] ata5.00: configured for UDMA/133 [204399.330503] ata5: EH complete

Search Results

Search found 250 results on 10 pages for 'zombie colonel'.

Page 9/10 | < Previous Page | 5 6 7 8 9 10 | Next Page >

- by mrc4r7m4n

- by Christopher

- by sds

- by JRameau

- by chrisbunney

- by Hendrik

- by Callmeed

- by futureal

- by Tim

- by James Watt

- by Fabio

- by user111196

- by Max

- by Gregor

- by incredimike

- by Jonathan

- by atomicguava

- by Mike S

- by George Clingerman

- by Annalyne

- by user432506

- by Mohamed Elgharabawy

- by user3682065

- by Alex

- by dimmer

< Previous Page | 5 6 7 8 9 10 | Next Page >