Postgres 9.0 locking up, 100% CPU

Posted by Jake on Server Fault
Published on 2012-07-05T01:29:33Z

We are having a problem where our Postgres 9.0 server occasionally locks up and kills our webapp. Restarting Postgres fixes the problem.

Here's what I've been able to observe:

  • First, usage of one CPU jumps to 100% for a few minutes
    • Disk operations drop to ~0 during this time
    • Database operations drop to 0 (blocks and tuples per sec)
    • During this time the logs show:
      • WARNING: worker took too long to start; cancelled (repeated)
      • No queries (only queries taking longer than 200ms are logged)
    • No unusually long-running queries logged before or during
  • Then the second CPU jumps to 100%
    • The number of postgres processes jumps from the usual 8-10 to ~20
    • Matched by a spike in Postgres Blocks per second (about twice normal)
    • Logs show:
      • LOG: could not accept SSL connection: EOF detected
      • Queries are running but slow
  • Restarting postgres returns everything to normal
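
To line the log messages above up against the CPU graphs, it can help to count them per minute. The following is a minimal Python 3 sketch and nothing more. It assumes that `log_line_prefix` begins with `%t `, so that every log line starts with a timestamp such as `2012-07-05 01:23:45 UTC`. `bucket_by_minute` and `LINE_RE` are hypothetical names, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: log_line_prefix starts with '%t ', so each line begins with a
# timestamp like "2012-07-05 01:23:45 UTC", optionally followed by more
# space-free prefix tokens such as "[1234]:". Adjust LINE_RE for other prefixes.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}(?:\.\d+)? \S+ "  # minute, seconds, time zone
    r"(?:\S+ )*?(WARNING|LOG|ERROR|FATAL|PANIC):\s+(.*)$"      # severity and message
)


def bucket_by_minute(lines):
    """Count identical (minute, severity, message) entries in a log excerpt."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that don't match (other severities, wrapped text) are skipped
            minute, level, message = match.groups()
            counts[(minute, level, message.strip())] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "2012-07-05 01:10:02 UTC WARNING:  worker took too long to start; cancelled",
        "2012-07-05 01:10:59 UTC WARNING:  worker took too long to start; cancelled",
        "2012-07-05 01:14:30 UTC LOG:  could not accept SSL connection: EOF detected",
    ]
    for (minute, level, message), n in sorted(bucket_by_minute(sample).items()):
        print(f"{minute}  {level:<7} x{n}  {message}")
```

Against a real log you would pass an open file object, for example `bucket_by_minute(open(path))`, because the function accepts any iterable of lines.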

Setup:

Server: Amazon EC2 Large
Ubuntu 10.04.2 LTS
Postgres 9.0.3
Dedicated DB server

Does anyone have any idea what's causing this? Or any suggestions about what else I should be checking out?
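
Another thing worth capturing is which postgres process is using the CPU when the spike starts. The busy process could be a client backend or a background process. Below is a minimal Python 3.7+ sketch that assumes Linux procps `ps`. `parse_ps` and `snapshot` are hypothetical helper names and do not belong to any existing tool.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -C postgres -o pid=,pcpu=,args=` output into
    (pid, pcpu, process_title) tuples, highest %CPU first."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 2)  # pid, %cpu, then the full process title
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        pid, pcpu, title = fields
        rows.append((int(pid), float(pcpu), title))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def snapshot():
    """Run ps once and return the parsed rows (Linux procps ps assumed).
    Note: procps %CPU is averaged over each process's lifetime, not instantaneous."""
    result = subprocess.run(
        ["ps", "-C", "postgres", "-o", "pid=,pcpu=,args="],
        capture_output=True,
        text=True,
    )
    return parse_ps(result.stdout)


if __name__ == "__main__":
    for pid, pcpu, title in snapshot():
        print(f"{pid:>7} {pcpu:5.1f}%  {title}")
```

You could run this every few seconds during an incident. With `update_process_title` on (the default), a client backend's title shows its user, database, client address, and current command, while background processes show titles such as `postgres: writer process`. Because `ps` averages `%CPU`, `top` may show a newly spinning process more sharply.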
