High CPU usage - symptoms moving from server to server after bouncing

Posted by grt3kl on Server Fault
Published on 2010-03-01T19:13:57Z

First off, I apologize if I didn't include enough information to properly troubleshoot this issue. This sort of thing isn't my specialty, so it is a learning process. If there's something I need to provide, please let me know and I'll be happy to do what I can. The images associated with my question are at the bottom of this post.

We are dealing with a clustered environment of four WebLogic 9.2 Java application servers. The cluster uses a round-robin load-balancing algorithm. Other details include:

  • Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
  • BEA JRockit(R) (build R27.4.0-90_CR352234-91983-1.5.0_12-20071115-1605-linux-x86_64, compiled mode)

Basically, I started looking at the servers' performance because our customers are seeing lots of lag at various times of the day. Our servers should easily handle the loads they are given, so it's not clear what's going on. Using HP Performance Manager, I generated some graphs showing that CPU usage is uneven across the cluster: at any given point, one or more of the servers has a CPU utilization of over 50%. I know this isn't particularly high in itself, but it looks like a red flag compared with the CPU utilization of the other servers in the WebLogic cluster.

Interesting things to note:

  • The high CPU utilization was occurring only on server02 for several weeks. The server crashed (extremely rare; we are not sure if it's related to this) and upon starting it back up, the CPU utilization was normal on all 4 servers.
  • We restarted all 4 managed servers and the application server (on server01) yesterday, on 2/28. As you can see, server03 and server04 picked up the behavior that was seen on server02 before.
  • The CPU usage comes from a Java process owned by the application user (appown).
  • The number of transactions is consistent across all servers. It doesn't seem like any one server is actually handling more than another.
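
Since the load comes from a single Java process, one way to narrow it down further would be to see which threads inside that JVM are accumulating CPU time. Below is a minimal sketch using the standard java.lang.management API (available since Java 5). It assumes the code runs inside the JVM being inspected, and that the JVM supports and has enabled per-thread CPU timing; the class name ThreadCpuReport and its Entry helper are hypothetical names chosen for illustration, not part of WebLogic or JRockit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: lists live threads of the current JVM by CPU time used.
class ThreadCpuReport {

    /** One thread's name and its accumulated CPU time in nanoseconds. */
    public static final class Entry {
        public final String name;
        public final long cpuNanos;

        Entry(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns live threads sorted by CPU time, busiest first. */
    public static List<Entry> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Entry> entries = new ArrayList<Entry>();

        // Per-thread CPU timing is optional; return an empty list if unavailable.
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return entries;
        }

        long[] ids = mx.getAllThreadIds();
        for (int i = 0; i < ids.length; i++) {
            long cpu = mx.getThreadCpuTime(ids[i]);      // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(ids[i]);  // null if the thread has died
            if (cpu < 0 || info == null) {
                continue;
            }
            entries.add(new Entry(info.getThreadName(), cpu));
        }

        // Sort in descending order of CPU time.
        Collections.sort(entries, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) {
                return a.cpuNanos < b.cpuNanos ? 1 : (a.cpuNanos > b.cpuNanos ? -1 : 0);
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : snapshot()) {
            System.out.println((e.cpuNanos / 1000000L) + " ms  " + e.name);
        }
    }
}
```

Taking two snapshots a few seconds apart and comparing them would show which threads are busy right now rather than over the whole lifetime of the process. Running this against a managed server would need some way to execute it in that server's JVM (or to read the same MBean remotely over JMX), which is not shown here.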

If anyone has any ideas or can at least point me in the right direction, that would be great. Again, please let me know if there is any additional information I should post. Thanks!

[Images: CPU utilization graphs for server03, server02, server01, and server04]

© Server Fault or respective owner
