We're having an issue where one our Linux boxes (Ubuntu 10.04 LTS, running on EC2 with a quadruple-large size, 68GB of RAM and 8 virtual cores with 3.25GHz each) freezes up every few seconds. Typing in an ssh session will freeze, and running strace on one of the Postgresql processes that's running usually shows:
02:37:41.567990 semop(7831581, {{3, -1, 0}}, 1
for a few seconds before it proceeds (it always gets stuck at that semop).
OProfile shows that most of the time is spent in the kernel (60%) versus 37% in Postgresql.
The result of these halts (which began suddenly a day ago) is that load on the box has gone from 0.7 to 10+, and causes our entire stack to slow done.
Any ideas on how to track down what's going on? iostat doesn't show the disks being particularly slow or overloaded, and top shows user cpu % spike from 8% to about 40% whenever these back-ups happen.