Anyone else experiencing high rates of linux server crashes today?
- by Bron Gondwana
Just today, Sat June 30th - starting soon after the start of the day GMT. We've had a handful of blades in different datacentres as managed by different teams all go dark - not responding to pings, screen blank.
They're all running Debian Squeeze - with everything from stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I've also just lost a Dell R510 and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I'm wondering.
The one crash which I did get a screen dump from said:
[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001] lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0
Unfortunately the blades all supposedly had kdump configured, but they died so hard that kdump didn't trigger - and they had console blanking turned on. I've disabled console blanking now, so fingers crossed I'll have more information after the next crash.
Just want to know if it's a common thread or "just us". It's really odd that they're different units in different datacentres bought at different times and run by different admins (I run the FastMail.FM ones)... and now even different vendor hardware. Most of the machines which crashed had been up for weeks/months and were running 3.1 or 3.2 series kernels.
The most recent crash was a machine which had only been up about 6 hours running 3.2.21.