Search Results

Search found 7660 results on 307 pages for 'high'.


  • Netgear CG3000D new Modem/Router - Random High Ping

    - by justin.chmura
    Cox just recently came out and looked at my internet and decided that the modem I had was causing a high latency issue. The speed was fine, but the ping would spike to around 100 or more when gaming or putting any load heavier than browsing on the line. After they replaced it, it seems like I get better latency, but when it spikes I get upwards of 300 ping with around 500 jitter. I figured I would hit the serverfault universe before sending another email to Cox. I opted not to do the Cox setup, as it was an extra $20 which I thought would have just set up the wireless (which I can handle). Is it a setting or something that I missed that needs to be set up? The firmware for the CG3000D is awful and not fun to use. I did change some hidden settings on the RgServices.asp page (I'll attach a screenshot). I've also heard that the router/modem combos are awful and that I should go back and just ask for a stand-alone modem. Any input is helpful. All screenshots: http://imgur.com/a/JX6qu#0

    Read the article

  • High data on recv-q buffer and thread lock on java.io.BufferedInputStream in Linux

    - by Sagar Patel
    We have a Java application running on Linux (Ubuntu Server). We have been facing a high recv-q problem for quite some time: every few hours the application hangs and stops reading data from the socket. In a thread dump, we found the stack trace below.
        "Receiver-146" daemon prio=10 tid=0x00007fb3fc010000 nid=0x7642 runnable [0x00007fb5906c5000]
        java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x00000007688f1ff0> (a java.io.BufferedInputStream)
        at org.smpp.TCPIPConnection.receive(TCPIPConnection.java:413)
        at org.smpp.ReceiverBase.receivePDUFromConnection(ReceiverBase.java:197)
        at org.smpp.Receiver.receiveAsync(Receiver.java:351)
        at org.smpp.ReceiverBase.process(ReceiverBase.java:96)
        at org.smpp.util.ProcessingThread.run(ProcessingThread.java:199)
        at java.lang.Thread.run(Thread.java:722)
    We are not able to trace the exact reason behind this; kindly help. We are using a 16-core machine and the load on the system is around 30-40 at the time of the issue. We use the command ss dst <ip> to find the recv-q. Recently the receive buffer has been getting stuck at some point: the recv-q size does not decrease, and as a result we lose a lot of hits from the other side because our application stops accepting data.
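    A minimal monitoring sketch for this kind of situation (the peer IP, Java PID and threshold below are placeholders, not values from this system): sample the socket's Recv-Q with ss and grab a thread dump the moment the backlog stops draining, so the blocked read can be compared against a healthy one.
        #!/bin/bash
        PEER_IP=10.0.0.5        # hypothetical peer/SMSC address
        JAVA_PID=7642           # hypothetical PID of the Java receiver
        THRESHOLD=1048576       # bytes of unread data considered "stuck"
        while sleep 10; do
            # Column 2 of `ss -tn` is Recv-Q: bytes the kernel has received but the application has not read yet.
            recvq=$(ss -tn dst "$PEER_IP" | awk 'NR>1 {sum+=$2} END {print sum+0}')
            echo "$(date '+%F %T') recv-q=${recvq}"
            if [ "$recvq" -gt "$THRESHOLD" ]; then
                # Capture a thread dump while the reader is blocked, for later comparison.
                jstack "$JAVA_PID" > "/tmp/jstack-$(date +%s).txt"
            fi
        done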

    Read the article

  • MySQL server high traffic makes websites really slow or unable to load

    - by Holapress
    Lately we have been having a lot of problems with our MySQL server, with websites being really slow or even failing to load at all. The server is a dedicated server that only runs our MySQL database. I have been running some tests using a profiler (JetProfiler) and a stress-testing tool (loadUI). If I use loadUI to open 50 simultaneous connections to one of our websites that runs a fairly big query, the website already becomes unable to load. One of the things that worries me is that JetProfiler always shows a Threads_connected of 1.00, and it seems that when it hits around 2.00 I'm unable to connect. The 3 big peaks are when I run a test with loadUI: the first was 15 simultaneous connections, which still let me load the website but really slowly; the second was 40 simultaneous connections, which already made it impossible to load; and the third was 100 connections, which also didn't load anymore. Another thing that worries me is that JetProfiler says all the queries that get used are full table scans; could this be the problem? The website I run as a test runs 3 queries: one for a menu that outputs around 1000 rows, one for the ads that has around 560 rows, and a big one to get posts that has around 7000 rows (see screenshot below). I have also monitored the CPU of the server and there seems to be no problem there; even when I make a lot of connections with loadUI the CPU stays low. I can't seem to figure out the main cause of the websites being unable to load when there is a high amount of traffic, so if anyone has suggestions for testing or ideas about what might cause the problem, please let me know.
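    To confirm the full-table-scan suspicion from the shell, something like the sketch below can help (the query, credentials and log path are placeholders, and SET GLOBAL slow_query_log assumes MySQL 5.1 or later). If EXPLAIN shows type: ALL on the 7000-row posts query, adding an index on the columns used in its WHERE/JOIN/ORDER BY clauses is usually the first thing to try.
        # Ask MySQL how it executes the suspect query; "type: ALL" means a full table scan.
        mysql -u root -p mydatabase -e "EXPLAIN SELECT * FROM posts WHERE category_id = 1\G"

        # Log every statement that runs without an index, then watch the log while loadUI is running.
        mysql -u root -p -e "SET GLOBAL slow_query_log = ON; SET GLOBAL log_queries_not_using_indexes = ON;"
        tail -f /var/lib/mysql/slow-query.log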

    Read the article

  • Linux 2.6.24-gentoo-r3-comtrance on x86_64: high usage for unknown reasons

    - by Dorjan
    Hello everyone, I'm a complete rookie when it comes to all things Linux related, so please treat me as such and assume I know nothing. That being said, my top says this:
        top - 12:08:03 up 11 days, 15:36, 0 users, load average: 5.47, 5.53, 5.46
        Tasks: 296 total, 2 running, 294 sleeping, 0 stopped, 0 zombie
        Cpu(s): 6.3%us, 1.4%sy, 0.0%ni, 71.3%id, 20.6%wa, 0.0%hi, 0.3%si, 0.0%st
        Mem: 8176880k total, 8118236k used, 58644k free, 89312k buffers
        Swap: 1004052k total, 0k used, 1004052k free, 7235652k cached
        PID    USER     PR  NI  VIRT  RES   SHR   S %CPU %MEM  TIME+     COMMAND
        1229   root     15  -5     0     0     0  D    1  0.0  199:28.63 kjournald
        2946   root     20   0  1716   676   552  D    1  0.0  145:02.94 syslogd
        14553  root     20   0  2644  1268   876  R    1  0.0  0:00.34   top
        14609  postfix  20   0  7896  1884  1460  D    1  0.0  0:00.02   bounce
        14630  postfix  20   0  7896  1876  1452  R    0  0.0  0:00.00   bounce
    And my hard drives say:
        > df -k
        Filesystem   1K-blocks      Used  Available Use% Mounted on
        /dev/sda3      4925556   4474836     200508  96% /
        /dev/sda5       489992     36090     428602   8% /tmp
        /dev/sda6    377951852 236171160  122581816  66% /var
        none           4088440         0    4088440   0% /dev/shm
    It has been like this for a few days now... I don't know what is causing the high server load (normally around 1.3). Can anyone give any tips on how to track down the culprit? Many thanks,
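    With 20.6%wa in Cpu(s), kjournald and syslogd sitting in D (uninterruptible) state, and / at 96% full, the load is almost certainly I/O wait rather than CPU. A generic first pass at tracking it down (iostat comes from the sysstat package; nothing below is specific to this box):
        # Per-device utilisation and await times, refreshed every 2 seconds.
        iostat -x 2

        # Processes currently stuck waiting on I/O (state D), and what they are waiting in.
        ps -eo state,pid,user,wchan:30,cmd | awk '$1 == "D"'

        # The root filesystem is at 96% -- find what is filling it (-x stays on one filesystem).
        du -xk --max-depth=1 / | sort -n

        # Deleted-but-still-open files that keep eating space (common with badly rotated logs).
        lsof +L1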

    Read the article

  • Developing a high-performance and scalable Zend Framework website

    - by Daniel
    We are going to develop an ads website like http://www.gumtree.com/ (it will not be exactly like it, just to give you an idea), and we have some concerns regarding performance and scalability. We are planning on using Zend Framework for this project, but that is all I'm sure of at this point. I don't think a classic approach like Zend Framework (PHP) + MySQL + Memcache + jQuery (and I would throw Doctrine 2 in there too) will result in a high-performance application. I was thinking of making this a RESTful application (with Zend Framework) + NGINX (or maybe MongoDB) + Memcache (or eAccelerator -- I understand this will create problems with scalability across multiple servers) + jQuery, plus a CDN for static content, a server for images, and a scalable server for the requests and the rest. My questions are: What do you think about my approach? What solutions would you recommend in terms of the server approach (MySQL, NGINX, MongoDB or pgsql) for a scalable application expected to have a lot of traffic using PHP? I would be interested in your approach. Note: I'm a Zend Framework developer and don't have too much experience with the servers part (to determine what the best solution for my scalable application would be).

    Read the article

  • MySQL Extremely High Disk Activity (Read Operations)

    - by Jake Schoermer
    I have 1GB Linode VPS with a standard LAMP stack. Apache is tuned fine but for some reason MySQL's disk usage is high. This is causing really slow site load times. RAM and CPU usage are fine. Can anyone give me any pointers on tuning mysql's disk performance? I'm using InnoDB. iotop output is below. Total DISK READ: 38.50 M/s | Total DISK WRITE: 27.20 K/s TID PRIO USER DISK READ> DISK WRITE SWAPIN IO COMMAND 9808 be/4 mysql 22.40 M/s 0.00 B/s 0.00 % 63.75 % mysqld 10045 be/4 mysql 2.06 M/s 0.00 B/s 0.00 % 26.65 % mysqld 9987 be/4 mysql 1694.38 K/s 0.00 B/s 0.00 % 18.33 % mysqld 10015 be/4 mysql 1554.47 K/s 0.00 B/s 0.00 % 12.71 % mysqld 10019 be/4 mysql 1461.21 K/s 0.00 B/s 0.00 % 5.58 % mysqld 9839 be/4 mysql 1383.48 K/s 0.00 B/s 0.00 % 25.69 % mysqld 10031 be/4 mysql 1243.58 K/s 0.00 B/s 0.00 % 5.68 % mysqld 10023 be/4 mysql 1057.04 K/s 0.00 B/s 0.00 % 2.02 % mysqld 10020 be/4 mysql 1025.95 K/s 0.00 B/s 0.00 % 7.05 % mysqld 10001 be/4 mysql 808.33 K/s 683.97 K/s 0.00 % 1.16 % mysqld 10025 be/4 mysql 746.15 K/s 0.00 B/s 0.00 % 3.28 % mysqld 10043 be/4 mysql 715.06 K/s 0.00 B/s 0.00 % 0.48 % mysqld 10044 be/4 mysql 672.31 K/s 0.00 B/s 0.00 % 5.25 % mysqld 10034 be/4 mysql 668.42 K/s 1989.73 K/s 0.00 % 5.31 % mysqld 9985 be/4 mysql 450.80 K/s 124.36 K/s 0.00 % 8.83 % mysqld 9989 be/4 mysql 357.53 K/s 0.00 B/s 0.00 % 5.21 % mysqld 10033 be/4 mysql 186.54 K/s 0.00 B/s 0.00 % 1.59 % mysqld 10021 be/4 mysql 155.45 K/s 435.25 K/s 0.00 % 1.23 % mysqld 10007 be/4 mysql 124.36 K/s 0.00 B/s 0.00 % 0.53 % mysqld 9763 be/4 www-data 38.86 K/s 0.00 B/s 0.00 % 4.56 % apache2 -k start 10027 be/4 mysql 31.09 K/s 0.00 B/s 0.00 % 4.24 % mysqld 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] 3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0] 4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0] 5 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0] 6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0] 7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
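    Sustained read I/O from InnoDB at this rate usually means the working set does not fit in the buffer pool, so pages are re-read from disk over and over. A quick way to check, sketched with generic commands (the 256M figure is only an illustrative value for a 1GB VPS, not a recommendation for this particular box):
        # Current buffer pool size versus total InnoDB data+index size.
        mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
        mysql -e "SELECT ROUND(SUM(data_length+index_length)/1024/1024) AS innodb_mb FROM information_schema.tables WHERE engine='InnoDB';"

        # Hit rate: Innodb_buffer_pool_reads (went to disk) vs Innodb_buffer_pool_read_requests (served from memory).
        mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"

        # If the pool is still at the old 8M default, raise it in my.cnf under [mysqld] and restart, e.g.:
        #   innodb_buffer_pool_size = 256M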

    Read the article

  • HIGH CPU USAGE + low memory usage

    - by hadi
    as you can see in below , there are high cpu usage by httpd request. please help me to decrease them. thanks. 28577 apache 15 0 99676 53m 3488 S 21 0.2 1:13.67 httpd 28568 apache 15 0 99676 53m 3496 S 19 0.2 1:14.92 httpd 28608 apache 15 0 99676 53m 3428 R 19 0.2 0:28.28 httpd 28615 apache 15 0 99676 53m 3436 R 19 0.2 0:25.33 httpd 28616 apache 15 0 99676 53m 3440 S 19 0.2 0:25.83 httpd 28619 apache 15 0 99676 53m 3436 R 19 0.2 0:26.12 httpd 28635 apache 15 0 97.9m 54m 3416 S 19 0.2 0:24.86 httpd 28558 apache 15 0 97.9m 54m 3432 R 17 0.2 1:40.75 httpd 28560 apache 15 0 97.9m 54m 3496 R 17 0.2 1:40.02 httpd 28621 apache 15 0 97.9m 54m 3420 S 17 0.2 0:25.61 httpd 28641 apache 16 0 97.9m 54m 3428 R 17 0.2 0:21.52 httpd 28642 apache 15 0 99756 53m 3424 R 15 0.2 0:21.46 httpd 28643 apache 15 0 99676 53m 3424 S 15 0.2 0:21.59 httpd 28594 apache 15 0 99756 53m 3428 R 13 0.2 0:44.41 httpd 28618 apache 15 0 99676 53m 3420 S 13 0.2 0:26.15 httpd 28654 apache 15 0 99676 53m 3472 S 13 0.2 0:04.27 httpd 28575 apache 15 0 99756 53m 3436 R 11 0.2 1:14.02 httpd 28576 apache 15 0 99676 53m 3496 S 11 0.2 1:16.79 httpd 28634 apache 15 0 99676 53m 3436 S 11 0.2 0:25.36 httpd 28653 apache 15 0 99676 53m 3424 S 11 0.2 0:04.35 httpd 28574 apache 15 0 99676 53m 3440 S 10 0.2 1:13.05 httpd 28592 apache 15 0 99676 53m 3492 R 10 0.2 0:45.78 httpd 28595 apache 15 0 99676 53m 3432 R 10 0.2 0:47.02 httpd 28617 apache 16 0 99676 53m 3436 S 10 0.2 0:25.32 httpd 28620 apache 15 0 99676 53m 3432 S 10 0.2 0:25.35 httpd 28597 apache 15 0 99676 53m 3428 S 8 0.2 0:43.56 httpd 11345 mysql 15 0 2927m 198m 4472 R 4 0.6 1624:43 mysqld 1 root 15 0 2036 648 552 S 0 0.0 0:16.97 init 2 root RT 0 0 0 0 S 0 0.0 0:48.50 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:26.72 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 5 root RT 0 0 0 0 S 0 0.0 0:04.98 migration/1 6 root 34 19 0 0 0 R 0 0.0 0:27.51 ksoftirqd/1 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 8 root RT 0 0 0 0 S 0 0.0 0:15.42 migration/2 9 root 34 19 0 0 0 S 0 0.0 0:26.50 ksoftirqd/2 10 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/2
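    Since the CPU is spread evenly across many httpd workers, the usual next step is to find out which requests they are serving; a generic sketch (the access-log path and mod_status availability are assumptions about this setup):
        # Most-requested URLs -- a hot PHP script or a crawler loop usually stands out.
        awk '{print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head -20

        # Top client IPs, to rule out a single abusive client.
        awk '{print $1}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head -20

        # With mod_status and "ExtendedStatus On", this shows the URL each busy worker is handling right now.
        curl -s http://localhost/server-status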

    Read the article

  • Getting started with web server clustering

    - by Ernie
    I work for a small ISP, and we host about 250 domains and all the stuff that goes along with that: DNS, mail, spam filtering, and backups. Currently, we have separate DNS servers (two of them) and mail servers (outgoing mail is actually on the secondary DNS server, but was previously on its own server). In the past, this was done as an insurance measure. The last thing we need is for some doofus (usually yours truly) to hose a server, taking out DNS and mail right along with it, or for spammers to jam our incoming SMTP server, preventing outgoing mail from being sent too. In the past, this was a problem, and our servers were set up the way they are now to combat it. However, clustering solutions like Sun's Cobalt RAQ (in days of olde) and Virtualmin appear to cater to an all-in-one approach, then deal with failures through redundant servers. I have avoided this thus far, but we've been using Virtualmin on our web server for a while now, and I'd like to expand into using it for a high availability cluster. Our networking partner has recently built a datacenter that has eliminated all of our other bugaboos like network, cooling, and power issues, so now the only thing left to go wrong is me hosing a server, which happened earlier this month. One of the bigger reasons we've avoided going this route is because our hardware requirements aren't particularly high. One server easily handles all the sites we host (most of them are flat sites). Also, load-balancing routers tend to be expensive and complicated. All that I'm really expecting to do is building a two-node cluster for redundancy so that when I hose a server (however rare that might be), we're not out for 8-12 hours while I rebuild it. What I need to know is how to get started, and if I'm really in a position to bother with this kind of thing at all.

    Read the article

  • DNS failover in a two datacenter scenario

    - by wanson
    I'm trying to implement a low-cost solution for website high availability. I'm looking for the downsides of the following scenario: I have two servers with the same configuration, content, and MySQL replication (dual-master). They are in different datacenters - let's call them serverA and serverB. Users use serverA - serverB is more like a backup. Now, I want to use DNS failover to switch users from serverA to serverB when serverA goes down. My idea is that I set up DNS servers (bind/powerdns) on serverA and serverB - let's call them ns1.website.com and ns2.website.com (assuming I own website.com). Then I configure my domain to use them as its nameservers. Both DNS servers will return the serverA IP as my website's IP. If serverA goes down, I can (either manually or automatically from serverB) change the configuration of serverB's DNS to return the IP of serverB as the website's IP. Of course the TTL will be low, as it's supposed to be in DNS failovers. I know that it may take some time to switch to serverB (DNS TTL, time to detect serverA failure, serverB DNS reconfiguration, etc.), and that some small part of users won't use serverB anyway. And I'm OK with that. But what are the other downsides of such an approach? An alternative scenario is that ns1.website.com will return the serverA IP as the website's IP, and ns2.website.com will return the serverB IP as the website's IP. But AFAIK clients do not always use the primary nameserver and sometimes use the secondary one. So some small part of users would use serverB instead of serverA, which is not quite what I'd like. Can you confirm that DNS clients behave like that, and can you tell what percentage of clients would possibly use serverB instead of serverA (statistically)? This one also has the downside that when serverA goes back up, it will automatically be used as the website's primary server again, which is also a bad situation (cold cache, MySQL replication could fail in the meantime, etc.). So I'm adding it only as a theoretical alternative. I was thinking about using some professional DNS failover companies, but they charge by the number of DNS requests and the fees are very high (why?)
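    For the "manually or automatically from serverB" switch described above, a minimal failover sketch run on serverB could look like this (website.com, the record name, the IPs and the TSIG key path are placeholders, and it assumes the zone on serverB permits dynamic updates via nsupdate):
        #!/bin/bash
        PRIMARY_IP=203.0.113.10     # serverA (placeholder)
        BACKUP_IP=203.0.113.20      # serverB (placeholder)
        RECORD=www.website.com
        fails=0
        while sleep 30; do
            # Only fail over after several consecutive failed HTTP checks against serverA.
            if curl -fsm 10 "http://$PRIMARY_IP/" > /dev/null; then
                fails=0
            else
                fails=$((fails + 1))
            fi
            if [ "$fails" -ge 3 ]; then
                # Repoint the A record at serverB with a 60-second TTL via a dynamic DNS update.
                printf 'server 127.0.0.1\nzone website.com\nupdate delete %s A\nupdate add %s 60 A %s\nsend\n' \
                    "$RECORD" "$RECORD" "$BACKUP_IP" | nsupdate -k /etc/bind/failover.key
                break
            fi
        done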

    Read the article

  • "iostat" command different in two equal machines

    - by Oz.
    We have several machines on Amazon (EC2) of the type c1.xlarge with 8 CPUs, running the Amazon AMI. Details on the machine:
        7 GB of memory
        20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
        1690 GB of instance storage
        64-bit platform
        I/O Performance: High
        API name: c1.xlarge
    One of these machines has been showing a high load average since we ran the last yum upgrade a couple of weeks ago. We have not yet updated the other machines, and everything looks normal on them. The strange thing is that the top command shows no hint of the cause of the load. CPUs are: 4.8%us, 1.1%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st. Mem is about 1.5GB free. Any idea what it could be, or where else we can check? iostat on a proper machine:
        avg-cpu:  %user %nice %system %iowait %steal  %idle
                   8.97  0.03    4.46    0.19   0.14  86.23
        Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
        xvdap1   1.60        0.69       55.38    587620  47254184
        xvdfp2   2.64        1.10       61.04    934786  52091056
        xvdfp4   0.86        0.19       41.72    163866  35601920
        xvdfp1   4.37       36.59       73.89  31220810  63051504
        xvdfp3   8.03        7.08       94.63   6045402  80749184
    iostat on the problematic machine:
        avg-cpu:  %user %nice %system %iowait %steal  %idle
                   9.29  0.04    5.55    0.26   0.11  84.74
        Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
        xvdap1   2.13        3.34       68.85    246244   5077888
        xvdfp1   7.60       74.31      104.88   5480362   7734840
        xvdfp3  13.22       73.67      125.00   5433386   9218600
        xvdfp4   1.11        0.76       65.08     55762   4799248
        xvdfp2   4.16        3.31       99.17    243818   7313264
    Many thanks for the help.

    Read the article

  • Difference between all servers in one cluster and more than one cluster with servers?

    - by silla
    Not sure I understand what the difference is, or how it works, when servers are running in one cluster versus when there is more than one cluster with servers in it - regarding high availability & load balancing. For me they are somehow the same; there is not really a big difference. Let's make a simple example: 2 servers in 1 cluster versus 2 clusters with 1 server each. 1. If one server fails, the other one is able to continue the work. The same for load balancing: these two servers are able to balance the work together. 2. The same thing! If one server fails... The only thing that could be a problem with point 1 is if the cluster fails (then both of the servers are dead). But is this even possible? I was reading stuff about clustering and high availability, but I think I don't really get it. Probably I did not really understand how a cluster works. Are these two setups, 1 cluster and 2 clusters, somehow the same, or are there really some big differences? What should I know about it? Thank you

    Read the article

  • Load Balance WCF and Share a Remote MSMQ for High Throughput

    - by BarDev
    After a ton of reading in books and on the web, I have noticed hints of information that WCF and MSMQ can be used to achieve high throughput. The information I have seen mentions using multiple WCF services in a farm that read from a single MSMQ queue. The problem is that I have found paragraphs here and there that mention that high throughput can be done, but I cannot seem to find a document on how to implement it. The following is an excerpt from the MSDN article Best Practices for Queued Communication, http://msdn.microsoft.com/en-us/library/ms731093.aspx: To achieve higher throughput and availability, use a farm of WCF services that read from the queue. This requires that all of these services expose the same contract on the same endpoint. The farm approach works best for applications that have high production rates of messages because it enables a number of services to all read from the same queue. This is what I'm trying to solve. I have an intranet application where a client sends a request to a WCF service, but I want the ability to load balance the WCF services on multiple servers in a farm. I also want these WCF services in the farm to do transactional reads from a remote MSMQ when an item is available in the queue. If this is possible, an issue I have is that I do not understand the activation process of WCF to retrieve messages from a remote queue. Does anyone know of any articles or webcasts that would explain it in detail? BarDev

    Read the article

  • High apache load but little traffic logged

    - by nrambeck
    I recently installed Varnish to sit in front of Apache on a dedicated server running a single site. It appears to be working well, but the load on Apache is still very high. What doesn't make sense is that the Apache access log shows almost no traffic getting past Varnish. When I tail the apache log I see about 1-3 hits per second come through. Here is what the load on Apache looks like : USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 13834 8.1 1.0 107716 34164 ? S 08:24 0:02 /usr/sbin/httpd apache 13835 8.1 1.0 107716 33856 ? S 08:24 0:02 /usr/sbin/httpd apache 11483 7.9 0.9 105916 30788 ? S 08:23 0:06 /usr/sbin/httpd apache 12255 7.5 1.0 107476 33312 ? S 08:24 0:04 /usr/sbin/httpd apache 9340 7.2 1.1 107732 34916 ? R 08:23 0:09 /usr/sbin/httpd apache 12029 6.8 0.9 106908 30416 ? S 08:24 0:04 /usr/sbin/httpd apache 11577 6.7 1.0 107192 34180 ? S 08:24 0:05 /usr/sbin/httpd apache 11486 6.6 1.0 106176 33112 ? S 08:23 0:05 /usr/sbin/httpd apache 11796 6.4 1.0 106936 31916 ? S 08:24 0:04 /usr/sbin/httpd apache 13815 6.3 1.0 107988 34464 ? S 08:24 0:02 /usr/sbin/httpd apache 18089 6.3 1.3 107444 43212 ? S 08:11 0:52 /usr/sbin/httpd apache 11797 5.9 1.0 107716 34580 ? S 08:24 0:04 /usr/sbin/httpd apache 7655 5.9 0.0 0 0 ? Z 08:22 0:09 [httpd] <defunct> mysql 8033 5.9 6.2 318240 199512 ? Sl May14 352:34 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/va apache 11488 5.8 0.9 106924 31632 ? S 08:23 0:04 /usr/sbin/httpd apache 9375 5.7 1.1 106956 35552 ? S 08:23 0:07 /usr/sbin/httpd apache 3551 5.6 1.1 106956 36140 ? S 08:20 0:14 /usr/sbin/httpd apache 7657 5.6 1.0 106968 32472 ? S 08:22 0:09 /usr/sbin/httpd apache 11433 5.6 1.0 107716 34396 ? S 08:23 0:04 /usr/sbin/httpd apache 5505 5.5 1.1 106944 34924 ? S 08:21 0:12 /usr/sbin/httpd apache 7172 5.3 1.1 106972 35368 ? S 08:22 0:09 /usr/sbin/httpd apache 10088 5.2 0.9 106160 31240 ? S 08:23 0:04 /usr/sbin/httpd apache 7656 5.1 1.0 106436 34388 ? S 08:22 0:08 /usr/sbin/httpd apache 3468 5.0 1.1 107716 35968 ? S 08:20 0:13 /usr/sbin/httpd apache 14242 4.8 1.0 107728 33032 ? S 08:25 0:00 /usr/sbin/httpd apache 3578 4.8 1.1 107988 35964 ? S 08:20 0:12 /usr/sbin/httpd apache 28192 4.8 1.2 106944 38060 ? S 08:17 0:23 /usr/sbin/httpd apache 3277 4.6 1.1 106956 35688 ? S 08:20 0:13 /usr/sbin/httpd apache 15434 3.7 0.7 106908 24684 ? S 08:25 0:00 /usr/sbin/httpd There is a default apache log and then one other VirtualHost log setup. I'm concerned that Apache is handling some kind of traffic that is not being logged. Is that possible? And is there anything I can do to capture that traffic?
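    Two things worth checking, sketched below: whether Varnish is really absorbing most requests, and whether some traffic reaches Apache without going through Varnish at all (the backend port 8080 and the exact counter/tag names are assumptions; they differ between Varnish versions):
        # Hit/miss and backend-request counters -- a low hit rate means most requests still reach Apache.
        varnishstat -1 | egrep 'cache_hit|cache_miss|backend_req'

        # Which URLs Varnish forwards to the backend most often.
        varnishtop -b -i TxURL

        # Connections talking to Apache's backend port directly, bypassing Varnish.
        netstat -tn | awk '$4 ~ /:8080$/ {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head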

    Read the article

  • Determining cause of high NFS/IO utilization without iotop

    - by Matt
    I have a server that is doing an NFSv4 export for user's home directories. There are roughly 25 users (mostly developers/analysts) and about 40 servers mounting the home directory export. Performance is miserable, with users often seeing multi-second lags for simple commands (like ls, or writing a small text file). Sometimes the home directory mount completely hangs for minutes, with users getting "permission denied" errors. The hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. What I find curious, and suspicious, is that the sda device often has very high utilization, even though it only contains the OS. I would expect this virtual drive to be idle almost all the time. The system is not swapping, according to "free" and "vmstat". Why would there be major load on this device? Here is a 30-second snapshot from iostat: Time: 09:37:28 AM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sdb 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 sdb1 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 dm-0 0.00 0.00 0.03 151.82 0.13 607.26 8.00 1.25 8.23 5.16 78.35 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 dm-3 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 Looks like iotop is the ideal tool to use to sniff out these kinds of issues. But I'm on CentOS 5.6, which doesn't have a new enough kernel to support that program. I looked at Determining which process is causing heavy disk I/O?, and besides iotop, one of the suggestions said to do a "echo 1 /proc/sys/vm/block_dump". I did that (after directing kernel messages to tempfs). In about 13 minutes I had about 700k reads or writes, roughly half from kjournald and the other half from nfsd: # egrep " kernel: .*(READ|WRITE)" messages | wc -l 768439 # egrep " kernel: kjournald.*(READ|WRITE)" messages | wc -l 403615 # egrep " kernel: nfsd.*(READ|WRITE)" messages | wc -l 314028 For what it's worth, for the last hour, utilization has constantly been over 90% for the home directory drive. My 30-second iostat keeps showing output like this: Time: 09:36:30 PM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sdb 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 sdb1 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 dm-0 0.00 0.00 0.20 17.76 0.80 71.04 8.00 0.38 21.21 9.22 16.57 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.81 dm-3 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.82
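    One more pass over the block_dump data already collected can show exactly which process hits which device; a small aggregation sketch (it assumes the usual "process(pid): READ/WRITE block N on device" message format):
        # Tally kernel block_dump lines per process and per device.
        egrep " kernel: .*(READ|WRITE)" messages \
          | sed -e 's/.* kernel: //' -e 's/([0-9]*)://' \
          | awk '{count[$1 " " $NF]++} END {for (k in count) print count[k], k}' \
          | sort -rn | head -20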

    Read the article

  • Having problems with high CPU usage and apparent memory leak of Exim

    - by Dancrumb
    I'm having problems with my server and am hoping you can help. The culprit appears to be exim. The CPU usage is consistently high and the memory usage trends up and up and up for no apparent reason (this is not a heavily used server). To demonstrate the issue, I ran the following: root@server [/var/log]# service exim restart; for iter in `seq 0 9`; do date; top -n1 | grep exim; sleep 10; done Shutting down exim: [ OK ] Shutting down spamd: [ OK ] Starting exim: [ OK ] Sun Jun 6 18:12:07 CDT 2010 62592 root 25 0 11400 6572 2356 R 51.5 1.3 0:00.92 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim Sun Jun 6 18:12:18 CDT 2010 62592 root 25 0 28768 23m 2356 R 57.4 4.6 0:06.75 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:12:28 CDT 2010 62592 root 25 0 36408 30m 2356 R 55.5 6.0 0:12.59 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:12:39 CDT 2010 62592 root 25 0 41396 35m 2356 R 53.5 7.0 0:18.35 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:12:49 CDT 2010 62592 root 25 0 45868 40m 2356 R 47.5 7.8 0:24.06 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:13:00 CDT 2010 62592 root 25 0 50056 44m 2356 R 55.3 8.6 0:29.84 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:13:10 CDT 2010 62592 root 25 0 53888 47m 2356 R 55.2 9.4 0:35.63 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:13:21 CDT 2010 62592 root 20 0 56920 50m 2356 R 55.3 9.9 0:41.15 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:13:31 CDT 2010 62592 root 25 0 60380 54m 2356 R 53.4 10.6 0:46.98 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim Sun Jun 6 18:13:42 CDT 2010 62592 root 22 0 63400 57m 2356 R 49.5 11.2 0:52.74 exim 62587 mailnull 18 0 7548 1212 792 S 0.0 0.2 0:00.00 exim 62588 root 18 0 7536 2052 1648 S 0.0 0.4 0:00.00 exim After some time, it gets to a rate of picking up an extra MB every 10s. I've checked the exim logs and there are no messages coming in there. exim -bV shows: Exim version 4.69 #1 built 16-Mar-2009 14:44:43 Copyright (c) University of Cambridge 2006 Berkeley DB: Sleepycat Software: Berkeley DB 4.2.52: (February 22, 2005) Support for: crypteq iconv() IPv6 PAM Perl OpenSSL Content_Scanning Old_Demime Experimental_SPF Experimental_SRS Experimental_DomainKeys Lookups: lsearch wildlsearch nwildlsearch iplsearch dbm dbmnz passwd Authenticators: cram_md5 dovecot plaintext spa Routers: accept dnslookup ipliteral manualroute queryprogram redirect Transports: appendfile/maildir autoreply pipe smtp Size of off_t: 8 Configuration file is /etc/exim.conf I'm at something of a loss as to how to proceed. Any recommendations would be well received!
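    Before going deeper, it is worth seeing what that one growing Exim process is actually doing; a few standard Exim and tracing commands (the PID is the one from the top output above and will obviously change after each restart):
        # What is every running Exim process doing right now?
        exiwhat

        # Queue size and contents -- a stuck or looping message can keep a delivery process spinning.
        exim -bpc
        exim -bp | head -50

        # Attach to the growing process and summarise its system calls; press Ctrl-C after ~30s for the summary.
        strace -c -p 62592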

    Read the article

  • Diagnosing packet loss / high latency in Ubuntu

    - by Sam Gammon
    We have a Linux box (Ubuntu 12.04) running Nginx (1.5.2), which acts as a reverse proxy/load balancer to some Tornado and Apache hosts. The upstream servers are physically and logically close (same DC, sometimes same-rack) and show sub-millisecond latency between them: PING appserver (10.xx.xx.112) 56(84) bytes of data. 64 bytes from appserver (10.xx.xx.112): icmp_req=1 ttl=64 time=0.180 ms 64 bytes from appserver (10.xx.xx.112): icmp_req=2 ttl=64 time=0.165 ms 64 bytes from appserver (10.xx.xx.112): icmp_req=3 ttl=64 time=0.153 ms We receive a sustained load of about 500 requests per second, and are currently seeing regular packet loss / latency spikes from the Internet, even from basic pings: sam@AM-KEEN ~> ping -c 1000 loadbalancer PING 50.xx.xx.16 (50.xx.xx.16): 56 data bytes 64 bytes from loadbalancer: icmp_seq=0 ttl=56 time=11.624 ms 64 bytes from loadbalancer: icmp_seq=1 ttl=56 time=10.494 ms ... many packets later ... Request timeout for icmp_seq 2 64 bytes from loadbalancer: icmp_seq=2 ttl=56 time=1536.516 ms 64 bytes from loadbalancer: icmp_seq=3 ttl=56 time=536.907 ms 64 bytes from loadbalancer: icmp_seq=4 ttl=56 time=9.389 ms ... many packets later ... Request timeout for icmp_seq 919 64 bytes from loadbalancer: icmp_seq=918 ttl=56 time=2932.571 ms 64 bytes from loadbalancer: icmp_seq=919 ttl=56 time=1932.174 ms 64 bytes from loadbalancer: icmp_seq=920 ttl=56 time=932.018 ms 64 bytes from loadbalancer: icmp_seq=921 ttl=56 time=6.157 ms --- 50.xx.xx.16 ping statistics --- 1000 packets transmitted, 997 packets received, 0.3% packet loss round-trip min/avg/max/stddev = 5.119/52.712/2932.571/224.629 ms The pattern is always the same: things operate fine for a while (<20ms), then a ping drops completely, then three or four high-latency pings (1000ms), then it settles down again. Traffic comes in through a bonded public interface (we will call it bond0) configured as such: bond0 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5d inet addr:50.xx.xx.16 Bcast:50.xx.xx.31 Mask:255.255.255.224 inet6 addr: <ipv6 address> Scope:Global inet6 addr: <ipv6 address> Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:527181270 errors:1 dropped:4 overruns:0 frame:1 TX packets:413335045 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:240016223540 (240.0 GB) TX bytes:104301759647 (104.3 GB) Requests are then submitted via HTTP to upstream servers on the private network (we can call it bond1), which is configured like so: bond1 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5c inet addr:10.xx.xx.70 Bcast:10.xx.xx.127 Mask:255.255.255.192 inet6 addr: <ipv6 address> Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:430293342 errors:1 dropped:2 overruns:0 frame:1 TX packets:466983986 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:77714410892 (77.7 GB) TX bytes:227349392334 (227.3 GB) Output of uname -a: Linux <hostname> 3.5.0-42-generic #65~precise1-Ubuntu SMP Wed Oct 2 20:57:18 UTC 2013 x86_64 GNU/Linux We have customized sysctl.conf in an attempt to fix the problem, with no success. 
Output of /etc/sysctl.conf (with irrelevant configs omitted): # net: core net.core.netdev_max_backlog = 10000 # net: ipv4 stack net.ipv4.tcp_ecn = 2 net.ipv4.tcp_sack = 1 net.ipv4.tcp_fack = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 0 net.ipv4.tcp_timestamps = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_no_metrics_save = 1 net.ipv4.tcp_max_syn_backlog = 10000 net.ipv4.tcp_congestion_control = cubic net.ipv4.ip_local_port_range = 8000 65535 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_synack_retries = 2 net.ipv4.tcp_thin_dupack = 1 net.ipv4.tcp_thin_linear_timeouts = 1 net.netfilter.nf_conntrack_max = 99999999 net.netfilter.nf_conntrack_tcp_timeout_established = 300 Output of dmesg -d, with non-ICMP UFW messages suppressed: [508315.349295 < 19.852453>] [UFW BLOCK] IN=bond1 OUT= MAC=<mac addresses> SRC=118.xx.xx.143 DST=50.xx.xx.16 LEN=68 TOS=0x00 PREC=0x00 TTL=51 ID=43221 PROTO=ICMP TYPE=3 CODE=1 [SRC=50.xx.xx.16 DST=118.xx.xx.143 LEN=40 TOS=0x00 PREC=0x00 TTL=249 ID=10220 DF PROTO=TCP SPT=80 DPT=53817 WINDOW=8190 RES=0x00 ACK FIN URGP=0 ] [517787.732242 < 0.443127>] Peer 190.xx.xx.131:59705/80 unexpectedly shrunk window 1155488866:1155489425 (repaired) How can I go about diagnosing the cause of this problem, on a Debian-family Linux box?
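    A few generic checks that help separate a network-path problem from a host problem in a case like this (replace eth0 with the bond's actual slave interfaces; mtr and ethtool are standard Ubuntu packages):
        # Per-hop loss and latency over many cycles -- shows whether the spikes appear at an upstream hop
        # or only at the load balancer itself.
        mtr --report --report-cycles 300 loadbalancer

        # NIC error/drop counters on the slaves and the bonding status; rising numbers point at the host or switch ports.
        ethtool -S eth0 | egrep -i 'err|drop|fifo'
        cat /proc/net/bonding/bond0

        # Conntrack table pressure -- worth checking given the firewall (UFW) and the raised nf_conntrack_max.
        cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max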

    Read the article

  • Troubleshooting High Load on Plesk LAMP Dedicated Server

    - by Callmeed
    I have 2 nearly identical dedicated servers with the same provider. They also run a nearly identical software stack: RedHat 5 64-bit, Plesk, PHP, Apache, & MySQL. We use them for hosting custom sites we build. The problem is, while our 1st server has a load average (in top) of around 0.3, the 2nd server consistently has a load average of around 4.0 or higher. Basic functions in Plesk are delayed and there is a bit of latency when executing shell commands. Anyone have ideas why it would be so high? And why it would differ from our other server so much? Here is my current top output (sorted by %MEM) ... Any help is much appreciated ... top - 21:48:04 up 100 days, 4:28, 1 user, load average: 3.74, 4.20, 4.23 Tasks: 336 total, 1 running, 335 sleeping, 0 stopped, 0 zombie Cpu(s): 0.8%us, 0.4%sy, 0.0%ni, 91.3%id, 7.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 12290884k total, 11886452k used, 404432k free, 2920212k buffers Swap: 2096472k total, 244k used, 2096228k free, 6560692k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 22536 apache 15 0 860m 547m 6484 S 0.0 4.6 0:10.96 httpd 26467 apache 15 0 859m 546m 6408 S 0.0 4.5 0:07.67 httpd 3620 apache 15 0 859m 545m 5552 S 0.0 4.5 0:06.15 httpd 1895 apache 15 0 858m 544m 6356 S 0.0 4.5 0:08.25 httpd 16933 apache 15 0 858m 544m 5488 S 0.0 4.5 0:01.57 httpd 6431 apache 15 0 856m 542m 6076 S 10.6 4.5 0:05.32 httpd 14417 apache 15 0 856m 542m 5568 S 0.0 4.5 0:03.88 httpd 15403 apache 15 0 855m 541m 5616 S 0.0 4.5 0:03.73 httpd 19165 apache 15 0 853m 539m 6252 S 0.0 4.5 0:12.40 httpd 15898 apache 15 0 852m 539m 5376 S 0.0 4.5 0:02.68 httpd 14401 apache 15 0 851m 538m 5460 S 0.0 4.5 0:02.97 httpd 15393 apache 15 0 851m 538m 5404 S 0.0 4.5 0:03.12 httpd 15427 apache 15 0 851m 538m 5496 S 0.0 4.5 0:02.44 httpd 14412 apache 15 0 851m 538m 5324 S 0.0 4.5 0:02.15 httpd 18330 apache 15 0 851m 537m 5136 S 0.0 4.5 0:01.30 httpd 18303 apache 15 0 848m 535m 5140 S 0.0 4.5 0:00.47 httpd 21190 apache 15 0 845m 533m 3988 S 0.0 4.4 0:00.33 httpd 15923 root 18 0 822m 521m 9928 S 0.0 4.3 10:04.81 httpd 22021 apache 15 0 828m 520m 4964 S 0.0 4.3 0:00.16 httpd 22146 apache 15 0 823m 515m 3016 S 0.0 4.3 0:00.02 httpd 22345 apache 15 0 822m 514m 2408 S 0.0 4.3 0:00.00 httpd 14721 apache 15 0 733m 510m 488 S 0.0 4.3 0:00.00 httpd 5094 root 15 0 1452m 122m 15m S 1.0 1.0 852:24.24 java 4636 mysql 15 0 532m 57m 6440 S 1.0 0.5 488:05.84 mysqld 4799 popuser 15 0 166m 53m 2368 S 0.0 0.4 0:36.64 spamd 16761 popuser 15 0 159m 46m 2312 S 0.0 0.4 0:00.38 spamd 4797 root 15 0 158m 45m 2448 S 0.0 0.4 0:01.27 spamd 5074 root 34 19 255m 20m 2144 S 0.0 0.2 1:37.53 yum-updatesd 9917 named 15 0 366m 9804 1980 S 0.0 0.1 0:10.26 named 4332 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.06 sw-engine-cgi 4341 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.07 sw-engine-cgi 4350 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.09 sw-engine-cgi 4352 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.11 sw-engine-cgi 4376 ntp 15 0 23388 5020 3896 S 0.0 0.0 0:00.58 ntpd 4331 sw-cp-se 15 0 61336 4572 1480 S 0.0 0.0 5:53.22 sw-cp-serverd 4213 haldaemo 15 0 31252 4460 1684 S 0.0 0.0 0:01.52 hald 4778 postgres 18 0 117m 4164 3484 S 0.0 0.0 0:00.11 postmaster 18555 root 16 0 98.3m 3716 2852 S 0.0 0.0 0:00.01 sshd 4488 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4489 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4492 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4493 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi 4490 sso 18 0 119m 3040 220 S 0.0 0.0 0:00.00 sw-engine-cgi

    Read the article

  • How to resolve high CPU + excessive stat("/etc/localtime") and clock_gettime(CLOCK_REALTIME) calls

    - by Yemster
    I've been experiencing really high CPU on a ruby on rails app (see stack below) and have been trying to diagnose the possible causes to no avail. Stack: ruby 1.9.3 rails 3.2.6 Apache/2.2.21 (Debian) Phusion Passenger 3.0.11 Whenever I run strace against the spiking Rack process PID (see Top excerpt below), I am seeing a tonne of stat("/etc/localtime") and clock_gettime(CLOCK_REALTIME) calls and have no idea how to stop these. Excerpt from Top showin running PID: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11674 www-user 20 0 313m 182m 5076 R 99 2.3 63:04.60 Rack: /var/www/my_rails_app/current 11634 www-user 20 0 411m 216m 5144 S 10 2.7 197:55.63 Rack: /var/www/my_rails_app/current Strace snippet below: [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 141474018}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 141577456}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 143073982}) = 0 [pid 11674] poll([{fd=15, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 11674] write(15, "b\0\0\0\3SELECT `images`.* FROM `ima"..., 102) = 102 [pid 11674] read(15, "\1\0\0\1\0229\0\0\2\3def\23myappy_productio"..., 16384) = 2063 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 144138035}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 ... [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 154076443}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 154189429}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 157185700}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 157298770}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 165076003}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 165212572}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 167542679}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 167683436}) = 0 .... [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 62052248}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 62182486}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 62919948}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 63057266}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 63751707}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 73730686}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 75874687}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 76077133}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 78205019}) = 0 ... 
[pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 89370879}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 89583247}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 91637614}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 91782149}) = 0 Have Google'd around and came across a number of suggestions which I've tried with no success. Things tried so far: Have tried setting time zone as recommended here Made no difference and issue still persists. Content of my /etc/localtime: TZif2UTCTZif2UTC UTC0 Have tried the recommended fix for the leapsecond bug: date -s 'date' No joy so far. I'm fresh out of ideas so any help/advice on how to diagnose or resolve would be greatly appreciated.
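    One low-risk experiment (this is generic glibc/Ruby behaviour, not something confirmed for this exact stack): when the TZ environment variable is unset, many time lookups re-stat /etc/localtime on every call, so setting TZ for the application processes and comparing syscall counts shows quickly whether the timezone lookups are the problem. How the variable is injected depends on how Passenger/Apache is started on this box.
        # Baseline: summarise syscalls of the hot Rack process (PID from the top output above); Ctrl-C after ~30s.
        strace -c -p 11674

        # Experiment: export TZ in the environment the application inherits, then restart the app, e.g.
        export TZ=:/etc/localtime

        # Re-run `strace -c` and compare the stat() counts between the two runs.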

    Read the article

  • Load average has been high over some period

    - by user111196
    We have a dedicated MySQL server and below is the a snapshot of the top. The load average has been staying at nearly 100 for an hour plus ready. top - 20:54:28 up 7:31, 2 users, load average: 83.08, 96.88, 106.23 Tasks: 278 total, 2 running, 274 sleeping, 2 stopped, 0 zombie Cpu0 : 18.8%us, 10.2%sy, 0.0%ni, 70.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 51.2%us, 4.3%sy, 0.0%ni, 44.2%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu2 : 9.0%us, 10.3%sy, 0.0%ni, 80.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 18.8%us, 7.4%sy, 0.0%ni, 73.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 7.8%us, 8.8%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 10.3%us, 8.4%sy, 0.0%ni, 81.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 6.2%us, 7.5%sy, 0.0%ni, 86.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 6.2%us, 6.2%sy, 0.0%ni, 87.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu8 : 8.8%us, 10.4%sy, 0.0%ni, 80.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu9 : 63.7%us, 4.6%sy, 0.0%ni, 12.2%id, 0.0%wa, 4.3%hi, 15.2%si, 0.0%st Cpu10 : 9.2%us, 10.2%sy, 0.0%ni, 80.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 17.3%us, 5.9%sy, 0.0%ni, 76.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 8.0%us, 8.7%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 10.9%us, 7.4%sy, 0.0%ni, 81.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 6.2%us, 6.9%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 4.8%us, 6.1%sy, 0.0%ni, 89.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 33009800k total, 23174396k used, 9835404k free, 120604k buffers Swap: 35061752k total, 0k used, 35061752k free, 16459540k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3341 mysql 20 0 14.3g 4.6g 4240 S 417.8 14.5 1673:51 mysqld 24406 root 20 0 15008 1292 876 R 0.3 0.0 0:00.19 top 1 root 20 0 4080 852 608 S 0.0 0.0 0:01.92 init 2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0.0 0.0 0:00.32 migration/0 4 root 15 -5 0 0 0 S 0.0 0.0 0:00.29 ksoftirqd/0 5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 6 root RT -5 0 0 0 S 0.0 0.0 0:03.21 migration/1 7 root 15 -5 0 0 0 S 0.0 0.0 0:00.07 ksoftirqd/1 8 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1 9 root RT -5 0 0 0 S 0.0 0.0 0:00.17 migration/2 10 root 15 -5 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/2 11 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2 12 root RT -5 0 0 0 S 0.0 0.0 0:00.32 migration/3 13 root 15 -5 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/3 14 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3 15 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/4 16 root 15 -5 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/4 17 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4 18 root RT -5 0 0 0 S 0.0 0.0 0:00.35 migration/5 We have also tried to run this command. What else command can help us diagnose the exact problem of this high load? netstat -nat |grep 3306 | awk '{print $6}' | sort | uniq -c | sort -n 1 LISTEN 1 SYN_RECV 410 ESTABLISHED 964 TIME_WAIT Output of vmstat 1: ---------------memory--------------- --swap-- --io-- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 12978936 30944 15172360 0 0 259 3 184 265 6 6 77 12 0

    Read the article

  • BizTalk host throttling – Singleton pattern and High database size

    - by S.E.R.
    Originally posted on: http://geekswithblogs.net/SERivas/archive/2013/06/30/biztalk-host-throttling-ndash-singleton-pattern-and-high-database-size.aspx
    I have worked for some days around the singleton pattern (for those unfamiliar with it, read this post by Victor Fehlberg) and have come across a few very interesting posts, among which one dealt with performance issues (here, also by Victor Fehlberg). Simply put: if you have an orchestration which implements the singleton pattern, then performance will continuously decrease as the orchestration receives and consumes messages, and that behavior is more obvious when the orchestration never ends (i.e. it keeps looping and never terminates or completes). As I experienced the same kind of problem (actually I was alerted by SCOM, which told me that the host was being throttled because of High database size), I thought it would be a good idea to dig a little and see what happens deep inside BizTalk, and thus understand the reasons for this behavior. NOTE: in this article I will focus on this High database size throttling condition. I will try and work on the other conditions in some not too distant future…
    Test conditions
    The singleton orchestration: for the purpose of this study, I have created an orchestration which is a very basic implementation of a singleton that piles up incoming messages, then does something else when a certain timeout has been reached without receiving another message.
    Throttling settings: I have two distinct hosts, one that hosts the receive port (basic FILE port), Ports_ReceiveHost, and one that hosts the orchestration, ProcessingHost. In order to emphasize the throttling mechanism, I have modified the throttling settings for each of these hosts as follows (all other parameters are set to the default value): [Throttling thresholds] Message count in database: 500 (default value: 50000).
    Evolution of performance counters when submitting messages
    Since we are investigating the High database size throttling condition, here are the performance counters that we should take a look at (all of them are in the BizTalk:Message Agent performance object):
        Database size
        High database size
        Message delivery throttling state
        Message publishing throttling state
        Message delivery delay (ms)
        Message publishing delay (ms)
        Message delivery throttling state duration
        Message publishing throttling state duration
    (If you are not used to Perfmon, I strongly recommend that you start using it right now: it is a wonderful tool that allows you to open the hood and see what is going on inside BizTalk – and other systems.)
    Database size
    It is quite obvious that we will start by watching the database size and high database size counters, just to see when the first reaches the configured threshold (500) and when the second rings the alarm. NOTE: during this test I submitted 600 messages, one message at a time every 10ms, to see the evolution of the counters we have previously selected. It might not show very well on this screenshot, but here is what happened: from 15:46:50 to 15:47:50, the database size for the Ports_ReceiveHost host (blue line) kept growing until it reached a maximum of 504. At 15:47:50, the high database size alert fires. At first I was surprised by this result: why is it the database size of the receiving host that keeps growing, since it is the processing host that piles up messages? Actually, it makes total sense. This counter measures the size of the database queue that is being filled by the host, not consumed. Therefore, the high database size alert is raised on the host that fills the queue: Ports_ReceiveHost. More information is available on the Public MPWiki page. Now, looking at the Message publishing throttling state for the receiving host (green line), we can see that a throttling condition has been reached at 15:47:50. We can also see that the Message publishing delay (ms) (blue line) has begun growing slowly from this point. All of this explains why performance keeps decreasing when a singleton keeps processing new messages: the database size grows, and once it has exceeded the Message count in database threshold, the host is throttled and the publishing delay keeps increasing.
    Digging further
    So, what happens to the database queue then? Is it flushed some day, or does it keep growing and growing indefinitely? The real question being: will the host be throttled forever because of this singleton? To answer this question, I set the Message count in database threshold to 20 (this value is very low in order not to wait for too long, otherwise I certainly would have fallen asleep in front of my screen) and I submitted 30 messages. The test was started at 18:26. At 18:56 (i.e. exactly 30 min later) the throttling was stopped and the database size was divided by 2. 30 min later again, the database size had dropped to almost zero. I guess I'll have to find some documentation and do some more testing before I sort this out! My guess is that some maintenance job is at work here, though I cannot tell which one.
    Digging even further
    If we take a look at the Message delivery throttling state counter for the processing host, we can see that this host was also throttled during the submission of the 600 documents: the value for the counter was 1, meaning that the Message delivery incoming rate for the host instance exceeds the Message delivery outgoing rate * the specified Rate overdrive factor (percent) value. We will see this another day… :)
    A last word
    Let's end this article with a warning: DO NOT CHANGE THE THROTTLING SETTINGS LIGHTLY! The temptation can be great to just bypass throttling by setting very high values for each parameter (or zero in some cases, which simply disables throttling). Nevertheless, always keep in mind that this mechanism is here for a very good reason: to prevent your BizTalk infrastructure from exploding! So whatever you do with those settings, do a lot of testing and benchmarking!
    Read the article

  • What are some reputable merchant account providers for high risk payment web sites?

    - by GregH
    I am helping to set up an online cigar web site. However, it has become a real pain to take payments online since tobacco is considered a "high-risk" item and nobody will provide a merchant account to process the payments. It looks like there are companies that specialize in high-risk merchant accounts. I was wondering if anybody could recommend a high-risk merchant account and payment processing provider?

    Read the article

  • How Do Top Performing High Tech Companies Measure Online Marketing Success?

    - by Charles Knapp
    You might expect a focus on Net Promoter scores, open rates, and click metrics. The real answers from top performers may surprise you. I've been working for a few months with Aberdeen Group and colleagues from IBM and Oracle to survey high technology firms worldwide on best practices in marketing and channel sales effectiveness.  Now, we will share the results of our original customer research in a new white paper and webcast. Register today to learn how leading High Tech companies are increasing their Return on Marketing Investment (ROMI) and growing channel sales revenue. Discover how top performing high tech companies manage and use customer data, measure marketing spend effectiveness, and support internal and channel sales. Learn how best in class high tech companies use enterprise data throughout their customer lifecycle -- messaging to leads, selling to prospects, and serving customers. Our speakers will be: Peter Ostrow, Research Director - Sales Effectiveness, Aberdeen Group David Lasher, Global Business Services Partner, IBM Jonathan Oomrigar, Vice President, Global High Technology Business Unit, Oracle Reserve your place now! This global webinar is on Tuesday, November 15, 10-11 am PST / 1-2 pm EST / 6-7 GMT / 7-8 CET

    Read the article

  • What is causing the unusually high load average?

    - by James
    I noticed on Tuesday night of last week, the load average went up sharply and it seemed abnormal since the traffic is small. Usually, the numbers usually average around .40 or lower and my server stuff (mysql, php and apache) are optimized. I noticed that the IOWait is unusually high even though the processes is barely using any CPU. top - 01:44:39 up 1 day, 21:13, 1 user, load average: 1.41, 1.09, 0.86 Tasks: 60 total, 1 running, 59 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 91.5%id, 8.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 331944k used, 716632k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 2468 1376 1140 S 0 0.1 0:00.92 init 1656 root 15 0 13652 5212 664 S 0 0.5 0:00.00 apache2 9323 root 18 0 13652 5212 664 S 0 0.5 0:00.00 apache2 10079 root 18 0 3972 1248 972 S 0 0.1 0:00.00 su 10080 root 15 0 4612 1956 1448 S 0 0.2 0:00.01 bash 11298 root 15 0 13652 5212 664 S 0 0.5 0:00.00 apache2 11778 chikorit 15 0 2344 1092 884 S 0 0.1 0:00.05 top 15384 root 18 0 17544 13m 1568 S 0 1.3 0:02.28 miniserv.pl 15585 root 15 0 8280 2736 2168 S 0 0.3 0:00.02 sshd 15608 chikorit 15 0 8280 1436 860 S 0 0.1 0:00.02 sshd Here is the VMStat procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 0 0 768644 0 0 0 0 14 23 0 10 1 0 99 0 IOStat - Nothing unusal Total DISK READ: 67.13 K/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND 19496 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19501 be/4 mysql 3.95 K/s 0.00 B/s 0.00 % 0.00 % mysqld 19568 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19569 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19570 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19571 be/4 chikorit 7.90 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19573 be/4 chikorit 7.90 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 11778 be/4 chikorit 0.00 B/s 0.00 B/s 0.00 % 0.00 % top 19470 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 0.00 % mysqld Load Average Chart - http://i.stack.imgur.com/kYsD0.png I want to be sure if this is not a MySQL problem before making sure. Also, this is a Ubuntu 10.04 LTS Server on OpenVZ. 
Edit: This will probably give a good picture on the IO Wait top - 22:12:22 up 17:41, 1 user, load average: 1.10, 1.09, 0.93 Tasks: 33 total, 1 running, 32 sleeping, 0 stopped, 0 zombie Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 89.0%id, 10.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 260708k used, 787868k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 2468 1376 1140 S 0 0.1 0:00.88 init 5849 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 8063 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 9732 root 16 0 8280 2728 2168 S 0 0.3 0:00.02 sshd 9746 chikorit 18 0 8412 1444 864 S 0 0.1 0:01.10 sshd 9747 chikorit 18 0 4576 1960 1488 S 0 0.2 0:00.24 bash 13706 chikorit 15 0 2344 1088 884 R 0 0.1 0:00.03 top 15745 chikorit 15 0 12968 5108 1280 S 0 0.5 0:00.00 apache2 15751 chikorit 15 0 72184 25m 18m S 0 2.5 0:00.37 php5-fpm 15790 chikorit 18 0 12472 4640 1192 S 0 0.4 0:00.00 apache2 15797 chikorit 15 0 72888 23m 16m S 0 2.3 0:00.06 php5-fpm 16038 root 15 0 67772 2848 592 D 0 0.3 0:00.00 php5-fpm 16309 syslog 18 0 24084 1316 992 S 0 0.1 0:00.07 rsyslogd 16316 root 15 0 5472 908 500 S 0 0.1 0:00.00 sshd 16326 root 15 0 2304 908 712 S 0 0.1 0:00.02 cron 17464 root 15 0 10252 7560 856 D 0 0.7 0:01.88 psad 17466 root 18 0 1684 276 208 S 0 0.0 0:00.31 psadwatchd 17559 root 18 0 11444 2020 732 S 0 0.2 0:00.47 sendmail-mta 17688 root 15 0 10252 5388 1136 S 0 0.5 0:03.81 python 17752 teamspea 19 0 44648 7308 4676 S 0 0.7 1:09.70 ts3server_linux 18098 root 15 0 12336 6380 3032 S 0 0.6 0:00.47 apache2 18099 chikorit 18 0 10368 2536 464 S 0 0.2 0:00.00 apache2 18120 ntp 15 0 4336 1316 984 S 0 0.1 0:00.87 ntpd 18379 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 18387 mysql 15 0 62796 36m 5864 S 0 3.6 1:43.26 mysqld 19584 root 15 0 12336 4028 668 S 0 0.4 0:00.02 apache2 22498 root 16 0 12336 4028 668 S 0 0.4 0:00.00 apache2 24260 root 15 0 67772 3612 1356 S 0 0.3 0:00.22 php5-fpm 27712 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 27730 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 30343 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 30366 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 This is the free ram as of today. total used free shared buffers cached Mem: 1024 302 721 0 0 0 -/+ buffers/cache: 302 721 Swap: 0 0 0 Update: Looking into the logs, particularly the PHP5-FPM, which is causing the CPU spike. I found that its segment faulting for some apparent reason. 
    [03-Jun-2012 06:11:20] NOTICE: [pool www] child 14132 started
    [03-Jun-2012 06:11:25] WARNING: [pool www] child 13664 exited on signal 11 (SIGSEGV) after 53.686322 seconds from start
    [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14328 started
    [03-Jun-2012 06:11:25] WARNING: [pool www] child 14132 exited on signal 11 (SIGSEGV) after 4.708681 seconds from start
    [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14329 started
    [03-Jun-2012 06:11:58] WARNING: [pool www] child 14328 exited on signal 11 (SIGSEGV) after 32.981228 seconds from start
    [03-Jun-2012 06:11:58] NOTICE: [pool www] child 15745 started
    [03-Jun-2012 06:12:25] WARNING: [pool www] child 15745 exited on signal 11 (SIGSEGV) after 27.442864 seconds from start
    [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17446 started
    [03-Jun-2012 06:12:25] WARNING: [pool www] child 14329 exited on signal 11 (SIGSEGV) after 60.411278 seconds from start
    [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17447 started
    [03-Jun-2012 06:13:02] WARNING: [pool www] child 17446 exited on signal 11 (SIGSEGV) after 36.746793 seconds from start
    [03-Jun-2012 06:13:02] NOTICE: [pool www] child 18133 started
    [03-Jun-2012 06:13:48] WARNING: [pool www] child 17447 exited on signal 11 (SIGSEGV) after 82.710107 seconds from start

    I'm thinking this might be causing the problem. If it is, switching from PHP-FPM to FastCGI/fcgid would probably resolve it... but I still want to check whether something else is behind it.
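    Before switching anything over to fcgid, I am planning to grab a core dump from one of the crashing workers so the segfault itself can be inspected. A rough sketch of what I have in mind (rlimit_core is a stock php-fpm pool directive; the config path, binary path and service name are only what I expect them to be on this box, so treat them as placeholders):

        # in the pool config (e.g. /etc/php5/fpm/pool.d/www.conf):
        #   rlimit_core = unlimited

        # write core files somewhere world-writable, then restart the pool
        sysctl -w kernel.core_pattern=/tmp/core.%e.%p
        service php5-fpm restart

        # after the next SIGSEGV, pull a backtrace (needs gdb; PID is a placeholder)
        gdb -batch -ex bt /usr/sbin/php5-fpm /tmp/core.php5-fpm.PID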

    Read the article

  • High memory usage on the server - can't determine the process

    - by HTF
    I've noticed high memory usage on the server. Details: OS: CentOS 6.3 - x86_64 Web server: Nginx with PHP-FPM The server is generating PDF documents so the traffic is minimum. top: # top -b -n 1 -a top - 10:04:51 up 21 days, 18:57, 1 user, load average: 0.00, 0.00, 0.00 Tasks: 92 total, 1 running, 91 sleeping, 0 stopped, 0 zombie Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3923092k total, 3720380k used, 202712k free, 133904k buffers Swap: 4194296k total, 12k used, 4194284k free, 147404k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15855 www-data 20 0 199m 4952 2128 S 0.0 0.1 0:00.06 php-fpm 15853 www-data 20 0 199m 4940 2028 S 0.0 0.1 0:00.06 php-fpm 15850 www-data 20 0 199m 4928 2020 S 0.0 0.1 0:00.05 php-fpm 15851 www-data 20 0 199m 4888 2020 S 0.0 0.1 0:00.06 php-fpm 15852 www-data 20 0 199m 4852 2020 S 0.0 0.1 0:00.06 php-fpm 15857 www-data 20 0 198m 4716 2020 S 0.0 0.1 0:00.06 php-fpm 17553 root 20 0 97816 3860 2924 S 0.0 0.1 0:00.03 sshd 15849 root 20 0 198m 3460 1072 S 0.0 0.1 0:00.12 php-fpm 13441 nginx 20 0 65608 2968 1604 S 0.0 0.1 0:02.06 nginx 13440 nginx 20 0 65608 2964 1600 S 0.0 0.1 0:01.87 nginx 17561 root 20 0 105m 1944 1488 S 0.0 0.0 0:00.01 bash 1150 xfs 20 0 20980 1784 704 S 0.0 0.0 0:00.13 xfs 15863 root 20 0 179m 1424 1028 S 0.0 0.0 0:00.00 rsyslogd 1 root 20 0 19224 1360 1088 S 0.0 0.0 0:17.96 init 1201 nrpe 20 0 40928 1288 704 S 0.0 0.0 3:57.64 nrpe 13226 root 20 0 114m 1216 612 S 0.0 0.0 0:00.01 crond 6691 root 20 0 64068 1156 488 S 0.0 0.0 0:09.59 sshd 13439 root 20 0 65104 1128 292 S 0.0 0.0 0:00.00 nginx 19026 root 20 0 15040 1116 844 R 0.0 0.0 0:00.00 top 451 root 16 -4 11052 1096 316 S 0.0 0.0 0:00.02 udevd 1174 root 18 -2 11048 1064 288 S 0.0 0.0 0:00.00 udevd 1175 root 18 -2 11048 1064 288 S 0.0 0.0 0:00.00 udevd 1065 root 16 -4 93168 824 560 S 0.0 0.0 0:16.00 auditd 1165 root 20 0 4056 564 480 S 0.0 0.0 0:00.00 mingetty 1167 root 20 0 4056 564 480 S 0.0 0.0 0:00.00 mingetty 1169 root 20 0 4056 564 480 S 0.0 0.0 0:00.00 mingetty 1171 root 20 0 4056 564 480 S 0.0 0.0 0:00.00 mingetty 1163 root 20 0 4056 560 480 S 0.0 0.0 0:00.00 mingetty 1176 root 20 0 4056 560 480 S 0.0 0.0 0:00.00 mingetty 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root RT 0 0 0 0 S 0.0 0.0 0:11.75 migration/0 4 root 20 0 0 0 0 S 0.0 0.0 44:30.28 ksoftirqd/0 5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0 6 root RT 0 0 0 0 S 0.0 0.0 0:03.51 watchdog/0 7 root RT 0 0 0 0 S 0.0 0.0 0:11.63 migration/1 8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1 9 root 20 0 0 0 0 S 0.0 0.0 11:35.50 ksoftirqd/1 10 root RT 0 0 0 0 S 0.0 0.0 0:03.34 watchdog/1 11 root 20 0 0 0 0 S 0.0 0.0 1:36.68 events/0 12 root 20 0 0 0 0 S 0.0 0.0 1:50.57 events/1 13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup 14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper 15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns 16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr 17 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pm 18 root 20 0 0 0 0 S 0.0 0.0 0:07.86 sync_supers 19 root 20 0 0 0 0 S 0.0 0.0 0:10.38 bdi-default 20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/0 21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/1 22 root 20 0 0 0 0 S 0.0 0.0 0:04.35 kblockd/0 23 root 20 0 0 0 0 S 0.0 0.0 0:04.18 kblockd/1 24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpid 25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify 26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_hotplug 27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata/0 28 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata/1 29 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_aux 30 root 20 0 0 0 0 S 0.0 0.0 
0:00.00 ksuspend_usbd 31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd 32 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kseriod 33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md/0 34 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md/1 35 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md_misc/0 36 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md_misc/1 37 root 20 0 0 0 0 S 0.0 0.0 0:00.48 khungtaskd 38 root 20 0 0 0 0 S 0.0 0.0 1:07.52 kswapd0 39 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd 40 root 39 19 0 0 0 S 0.0 0.0 0:22.00 khugepaged 41 root 20 0 0 0 0 S 0.0 0.0 0:00.00 aio/0 42 root 20 0 0 0 0 S 0.0 0.0 0:00.00 aio/1 43 root 20 0 0 0 0 S 0.0 0.0 0:00.00 crypto/0 44 root 20 0 0 0 0 S 0.0 0.0 0:00.00 crypto/1 49 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthrotld/0 50 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthrotld/1 52 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kpsmoused 53 root 20 0 0 0 0 S 0.0 0.0 0:00.00 usbhid_resumer 83 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kstriped 233 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0 234 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1 321 root 20 0 0 0 0 S 0.0 0.0 0:00.00 virtio-blk 359 root 20 0 0 0 0 S 0.0 0.0 0:03.24 kdmflush 360 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdmflush 380 root 20 0 0 0 0 S 0.0 0.0 0:20.64 jbd2/dm-0-8 381 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit 382 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit 694 root 20 0 0 0 0 S 0.0 0.0 0:00.00 vballoon 697 root 20 0 0 0 0 S 0.0 0.0 0:00.00 virtio-net 818 root 20 0 0 0 0 S 0.0 0.0 0:00.00 jbd2/vda1-8 819 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit 820 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit 851 root 20 0 0 0 0 S 0.0 0.0 0:06.96 kauditd 1013 root 20 0 0 0 0 S 0.0 0.0 0:15.45 flush-253:0 ps: # ps aux --sort -vsz | head USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND www-data 13213 0.0 0.1 204416 4772 ? S 08:28 0:00 php-fpm: pool default www-data 13214 0.0 0.1 204416 4776 ? S 08:28 0:00 php-fpm: pool default www-data 13215 0.0 0.1 204416 4832 ? S 08:28 0:00 php-fpm: pool default www-data 13216 0.0 0.1 204416 4776 ? S 08:28 0:00 php-fpm: pool default www-data 13218 0.0 0.1 204416 4956 ? S 08:28 0:00 php-fpm: pool default free: #free -m total used free shared buffers cached Mem: 3831 3530 300 0 130 143 -/+ buffers/cache: 3256 574 Swap: 4095 0 4095 When I stooped Nginx, PHP-FPM the memory usage was still the same. Could you help me to investigate what is consuming the memory on the system? Regards
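    Update: for reference, these are the checks I plan to run next to see whether the "missing" memory is held by the kernel (slab, page tables, shared memory) rather than by any process. This is only a sketch with standard tools; the slabtop call needs root to read /proc/slabinfo:

        # total resident memory of all userland processes, in MB
        ps --no-headers -eo rss | awk '{ sum += $1 } END { printf "%.0f MB\n", sum / 1024 }'

        # what the kernel itself is holding
        grep -E '^(Slab|SReclaimable|SUnreclaim|PageTables|Shmem|Mapped)' /proc/meminfo

        # if Slab turns out to be large, the caches responsible, sorted by size
        slabtop -o -s c | head -20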

    Read the article

  • Unusually high dentry cache usage

    - by Wolfgang Stengel
    Problem

    A CentOS machine with kernel 2.6.32 and 128 GB physical RAM ran into trouble a few days ago. The responsible system administrator tells me that the PHP-FPM application stopped responding to requests in a timely manner due to swapping, and having seen in free that almost no memory was left, he chose to reboot the machine. I know that free memory can be a confusing concept on Linux and a reboot perhaps was the wrong thing to do. However, the administrator blames the PHP application (which I am responsible for) and refuses to investigate further.

    What I could find out on my own is this:

    - Before the restart, the free memory (incl. buffers and cache) was only a couple of hundred MB.
    - Before the restart, /proc/meminfo reported a Slab memory usage of around 90 GB (yes, GB).
    - After the restart, the free memory was 119 GB, going down to around 100 GB within an hour as the PHP-FPM workers (about 600 of them) came back to life, each of them showing between 30 and 40 MB in the RES column in top (which has been this way for months and is perfectly reasonable given the nature of the PHP application). There is nothing else in the process list that consumes an unusual or noteworthy amount of RAM.
    - After the restart, Slab memory was around 300 MB.

    I have been monitoring the system ever since, and most notably the Slab memory is increasing in a straight line at a rate of about 5 GB per day. Free memory as reported by free and /proc/meminfo decreases at the same rate. Slab is currently at 46 GB. According to slabtop, most of it is used for dentry entries.

    Free memory:

    free -m
    total used free shared buffers cached
    Mem: 129048 76435 52612 0 144 7675
    -/+ buffers/cache: 68615 60432
    Swap: 8191 0 8191

    Meminfo:

    cat /proc/meminfo
    MemTotal: 132145324 kB
    MemFree: 53620068 kB
    Buffers: 147760 kB
    Cached: 8239072 kB
    SwapCached: 0 kB
    Active: 20300940 kB
    Inactive: 6512716 kB
    Active(anon): 18408460 kB
    Inactive(anon): 24736 kB
    Active(file): 1892480 kB
    Inactive(file): 6487980 kB
    Unevictable: 8608 kB
    Mlocked: 8608 kB
    SwapTotal: 8388600 kB
    SwapFree: 8388600 kB
    Dirty: 11416 kB
    Writeback: 0 kB
    AnonPages: 18436224 kB
    Mapped: 94536 kB
    Shmem: 6364 kB
    Slab: 46240380 kB
    SReclaimable: 44561644 kB
    SUnreclaim: 1678736 kB
    KernelStack: 9336 kB
    PageTables: 457516 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 72364108 kB
    Committed_AS: 22305444 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 480164 kB
    VmallocChunk: 34290830848 kB
    HardwareCorrupted: 0 kB
    AnonHugePages: 12216320 kB
    HugePages_Total: 2048
    HugePages_Free: 2048
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    DirectMap4k: 5604 kB
    DirectMap2M: 2078720 kB
    DirectMap1G: 132120576 kB

    Slabtop:

    slabtop --once
    Active / Total Objects (% used) : 225920064 / 226193412 (99.9%)
    Active / Total Slabs (% used) : 11556364 / 11556415 (100.0%)
    Active / Total Caches (% used) : 110 / 194 (56.7%)
    Active / Total Size (% used) : 43278793.73K / 43315465.42K (99.9%)
    Minimum / Average / Maximum Object : 0.02K / 0.19K / 4096.00K
    OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
    221416340 221416039 3% 0.19K 11070817 20 44283268K dentry
    1123443 1122739 99% 0.41K 124827 9 499308K fuse_request
    1122320 1122180 99% 0.75K 224464 5 897856K fuse_inode
    761539 754272 99% 0.20K 40081 19 160324K vm_area_struct
    437858 223259 50% 0.10K 11834 37 47336K buffer_head
    353353 347519 98% 0.05K 4589 77 18356K anon_vma_chain
    325090 324190 99% 0.06K 5510 59 22040K size-64
    146272 145422 99% 0.03K 1306 112 5224K size-32
    137625 137614 99% 1.02K 45875 3 183500K nfs_inode_cache
    128800 118407 91% 0.04K 1400 92 5600K anon_vma
    59101 46853 79% 0.55K 8443 7 33772K radix_tree_node
    52620 52009 98% 0.12K 1754 30 7016K size-128
    19359 19253 99% 0.14K 717 27 2868K sysfs_dir_cache
    10240 7746 75% 0.19K 512 20 2048K filp

    VFS cache pressure:

    cat /proc/sys/vm/vfs_cache_pressure
    125

    Swappiness:

    cat /proc/sys/vm/swappiness
    0

    I know that unused memory is wasted memory, so this should not necessarily be a bad thing (especially given that 44 GB are shown as SReclaimable). However, the machine apparently experienced problems nonetheless, and I'm afraid the same will happen again in a few days when Slab surpasses 90 GB.

    Questions

    I have these questions:

    1. Am I correct in thinking that the Slab memory is always physical RAM, and that the number is already subtracted from the MemFree value?
    2. Is such a high number of dentry entries normal? The PHP application has access to around 1.5 M files, but most of them are archives and are not accessed at all by regular web traffic.
    3. What could explain the fact that the number of cached inodes is much lower than the number of cached dentries? Should they not be related somehow?
    4. If the system runs into memory trouble, should the kernel not free some of the dentries automatically? What could be a reason that this does not happen?
    5. Is there any way to "look into" the dentry cache to see what all this memory is (i.e. what are the paths that are being cached)? Perhaps this points to some kind of memory leak, a symlink loop, or indeed to something the PHP application is doing wrong.
    6. The PHP application code as well as all asset files are mounted via the GlusterFS network file system; could that have something to do with it?

    Please keep in mind that I cannot investigate as root, only as a regular user, and that the administrator refuses to help. He won't even run the typical echo 2 > /proc/sys/vm/drop_caches test to see if the Slab memory is indeed reclaimable.

    Any insights into what could be going on and how I can investigate further would be greatly appreciated.
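    Since I can only watch the box as a regular user, I have started logging the slab numbers once a minute so the growth can be lined up with cron jobs and traffic peaks. Nothing sophisticated, just a sketch that reads the world-readable /proc/meminfo:

        # append a timestamped slab snapshot to a log file every minute
        while true; do
            printf '%s ' "$(date '+%F %T')"
            awk '/^(Slab|SReclaimable|SUnreclaim):/ { printf "%s %s kB  ", $1, $2 }' /proc/meminfo
            echo
            sleep 60
        done >> ~/slab-growth.log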
Updates Some further diagnostic information: Mounts: cat /proc/self/mounts rootfs / rootfs rw 0 0 proc /proc proc rw,relatime 0 0 sysfs /sys sysfs rw,relatime 0 0 devtmpfs /dev devtmpfs rw,relatime,size=66063000k,nr_inodes=16515750,mode=755 0 0 devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /dev/shm tmpfs rw,relatime 0 0 /dev/mapper/sysvg-lv_root / ext4 rw,relatime,barrier=1,data=ordered 0 0 /proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0 /dev/sda1 /boot ext4 rw,relatime,barrier=1,data=ordered 0 0 tmpfs /phptmp tmpfs rw,noatime,size=1048576k,nr_inodes=15728640,mode=777 0 0 tmpfs /wsdltmp tmpfs rw,noatime,size=1048576k,nr_inodes=15728640,mode=777 0 0 none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0 cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0 cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0 cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 cgroup /cgroup/memory cgroup rw,relatime,memory 0 0 cgroup /cgroup/devices cgroup rw,relatime,devices 0 0 cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0 cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0 cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0 /etc/glusterfs/glusterfs-www.vol /var/www fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0 /etc/glusterfs/glusterfs-upload.vol /var/upload fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0 sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0 172.17.39.78:/www /data/www nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=38467,timeo=600,retrans=2,sec=sys,mountaddr=172.17.39.78,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=172.17.39.78 0 0 Mount info: cat /proc/self/mountinfo 16 21 0:3 / /proc rw,relatime - proc proc rw 17 21 0:0 / /sys rw,relatime - sysfs sysfs rw 18 21 0:5 / /dev rw,relatime - devtmpfs devtmpfs rw,size=66063000k,nr_inodes=16515750,mode=755 19 18 0:11 / /dev/pts rw,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 20 18 0:16 / /dev/shm rw,relatime - tmpfs tmpfs rw 21 1 253:1 / / rw,relatime - ext4 /dev/mapper/sysvg-lv_root rw,barrier=1,data=ordered 22 16 0:15 / /proc/bus/usb rw,relatime - usbfs /proc/bus/usb rw 23 21 8:1 / /boot rw,relatime - ext4 /dev/sda1 rw,barrier=1,data=ordered 24 21 0:17 / /phptmp rw,noatime - tmpfs tmpfs rw,size=1048576k,nr_inodes=15728640,mode=777 25 21 0:18 / /wsdltmp rw,noatime - tmpfs tmpfs rw,size=1048576k,nr_inodes=15728640,mode=777 26 16 0:19 / /proc/sys/fs/binfmt_misc rw,relatime - binfmt_misc none rw 27 21 0:20 / /cgroup/cpuset rw,relatime - cgroup cgroup rw,cpuset 28 21 0:21 / /cgroup/cpu rw,relatime - cgroup cgroup rw,cpu 29 21 0:22 / /cgroup/cpuacct rw,relatime - cgroup cgroup rw,cpuacct 30 21 0:23 / /cgroup/memory rw,relatime - cgroup cgroup rw,memory 31 21 0:24 / /cgroup/devices rw,relatime - cgroup cgroup rw,devices 32 21 0:25 / /cgroup/freezer rw,relatime - cgroup cgroup rw,freezer 33 21 0:26 / /cgroup/net_cls rw,relatime - cgroup cgroup rw,net_cls 34 21 0:27 / /cgroup/blkio rw,relatime - cgroup cgroup rw,blkio 35 21 0:28 / /var/www rw,relatime - fuse.glusterfs /etc/glusterfs/glusterfs-www.vol rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 36 21 0:29 / /var/upload rw,relatime - fuse.glusterfs /etc/glusterfs/glusterfs-upload.vol rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 37 21 0:30 / /var/lib/nfs/rpc_pipefs rw,relatime - rpc_pipefs sunrpc rw 39 21 0:31 / /data/www rw,relatime - nfs 
172.17.39.78:/www rw,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=38467,timeo=600,retrans=2,sec=sys,mountaddr=172.17.39.78,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=172.17.39.78 GlusterFS config: cat /etc/glusterfs/glusterfs-www.vol volume remote1 type protocol/client option transport-type tcp option remote-host 172.17.39.71 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume remote2 type protocol/client option transport-type tcp option remote-host 172.17.39.72 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume remote3 type protocol/client option transport-type tcp option remote-host 172.17.39.73 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume remote4 type protocol/client option transport-type tcp option remote-host 172.17.39.74 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume replicate1 type cluster/replicate option lookup-unhashed off # off will reduce cpu usage, and network option local-volume-name 'hostname' subvolumes remote1 remote2 end-volume volume replicate2 type cluster/replicate option lookup-unhashed off # off will reduce cpu usage, and network option local-volume-name 'hostname' subvolumes remote3 remote4 end-volume volume distribute type cluster/distribute subvolumes replicate1 replicate2 end-volume volume iocache type performance/io-cache option cache-size 8192MB # default is 32MB subvolumes distribute end-volume volume writeback type performance/write-behind option cache-size 1024MB option window-size 1MB subvolumes iocache end-volume ### Add io-threads for parallel requisitions volume iothreads type performance/io-threads option thread-count 64 # default is 16 subvolumes writeback end-volume volume ra type performance/read-ahead option page-size 2MB option page-count 16 option force-atime-update no subvolumes iothreads end-volume
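    One more note: I have not found a direct way to see which paths the dentries belong to, but two indirect checks that work without root are watching /proc/sys/fs/dentry-state and tracing a single request of the application to count failed path lookups, which end up as negative dentries. This is a rough sketch only; myscript.php stands in for whatever entry point the app has, and running it under the CLI is only an approximation of what the FPM workers do:

        # total vs. unused dentries, world-readable
        cat /proc/sys/fs/dentry-state

        # run one request under strace and rank the paths whose lookups fail
        strace -f -e trace=open,stat,lstat,access php myscript.php 2>&1 \
            | grep ENOENT | awk -F'"' '{ print $2 }' | sort | uniq -c | sort -rn | head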

    Read the article

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25  | Next Page >