We have a VPS running Ubuntu, on Xen. The problem is this, about once a day, for about 20-50 minutes, at a random time, the server becomes completely unresponsive to the outside world. After this period, it becomes responsive again, as if nothing had happened, it doesn't lose uptime, it doesn't restart. It just starts responding again as if it had been in suspended animation.
These outages occur under conditions of non-exceptional memory and cpu, for example 70% mem, 5% cpu. I have stopped all non-essential services so the usage is very even. These outages don't particularly occur during times of increased memory/cpu (during daily tasks), they sometimes occur at times of very low cpu use (<2%), but in the past also occured during swapping.
These blackouts have been occurring both under Ubuntu 12.04 LTS, and Ubuntu 14.04 LTS - no change at all (I upgraded Ubuntu specifically to see if it helped this problem).
It is possible to log into our webhosts site, and use their administration console to see error messages from during this time. Presumably, these messages are from the Xen virtualization, the main message goes like this:
BUG: soft lockp - CPU#0 stuck for 22s! [ksoftireqd/0:3] (repeats many times)
SysRq : Emergency Sync (Sometimes this is the only message in the console)
Others seen previously under different load situations include:
BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
(repeated many times) or:
INFO: rcu_sched detected stall on CPU 0 (t=15000 jiffies)
(repeated many times with t getting bigger)
From googling around I've tried various kernel parameters such as nohz=off and acpi=off to no avail. All tech support has said is that other Ubuntu installations are not suffering the same problem.
Anyone got any ideas or experience with this problem?