Anyone else experiencing high rates of Linux server crashes today?

Posted by Bron Gondwana on Server Fault
Published on 2012-06-30T16:15:09Z Indexed on 2012/06/30 21:18 UTC


Just today, Saturday June 30th, starting soon after the start of the day GMT, we've had a handful of blades in different datacentres, managed by different teams, all go dark: not responding to pings, screen blank.

They're all running Debian Squeeze, with kernels ranging from the stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I've also just lost a Dell R510, and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I'm wondering.

The one crash which I did get a screen dump from said:

[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001]  lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0
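
For anyone comparing notes across machines, here is a minimal Python sketch that pulls the fields out of a report shaped like the two lines above. The regular expressions are written against this one sample only, so other kernel versions may format the message differently. The function name `parse_spinlock_lockup` is hypothetical, and reading the bracketed timestamp as seconds since boot is an assumption.

```python
import re

# Patterns written against the single sample in this post; other kernels
# may word the message differently (assumption).
HEADER_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: spinlock lockup on CPU#(?P<cpu>\d+), "
    r"(?P<comm>[^/]+)/(?P<pid>\d+)"
)
LOCK_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+lock: (?P<lock>[0-9a-f]+), "
    r"\.magic: (?P<magic>[0-9a-f]+), "
    r"\.owner: (?P<owner_comm>[^/]+)/(?P<owner_pid>\d+), "
    r"\.owner_cpu: (?P<owner_cpu>-?\d+)"
)


def parse_spinlock_lockup(text):
    """Return the fields of a spinlock lockup report, or None if absent."""
    header = HEADER_RE.search(text)
    lock = LOCK_RE.search(text)
    if not header or not lock:
        return None
    return {
        # Assumes the bracketed timestamp is seconds since boot.
        "uptime_days": float(header["ts"]) / 86400,
        # (process name, pid, cpu) of the task that was stuck waiting
        "waiter": (header["comm"], int(header["pid"]), int(header["cpu"])),
        # (process name, pid, cpu) recorded as holding the lock
        "owner": (
            lock["owner_comm"],
            int(lock["owner_pid"]),
            int(lock["owner_cpu"]),
        ),
        "lock": lock["lock"],
        "magic": lock["magic"],
    }


sample = """\
[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001]  lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0
"""

report = parse_spinlock_lockup(sample)
print(report)
```

Under that timestamp assumption, the sample works out to roughly 36.6 days of uptime, with ntpd on CPU 1 waiting on a lock recorded as held by imapd on CPU 0.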

Unfortunately the blades all supposedly had kdump configured, but they died so hard that kdump didn't trigger, and they had console blanking turned on, so there was nothing left on screen. I've disabled console blanking now, so fingers crossed I'll have more information after the next crash.
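
To check that console blanking really is off on each machine, something like the following Python sketch could be run. It assumes the kernel exposes the `consoleblank` parameter under `/sys/module/kernel/parameters/`, which may not hold for every kernel build, and that the value is a timeout in seconds with 0 meaning "never blank". The helper name `console_blank_timeout` is hypothetical.

```python
from pathlib import Path

# Assumed location of the console blank timeout; not guaranteed to exist
# on every kernel build.
CONSOLEBLANK = Path("/sys/module/kernel/parameters/consoleblank")


def console_blank_timeout(path=CONSOLEBLANK):
    """Return the blank timeout in seconds, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission problem, or unexpected contents.
        return None


timeout = console_blank_timeout()
if timeout is None:
    print("could not read the console blank timeout")
elif timeout == 0:
    print("console blanking is disabled")
else:
    print(f"console blanks after {timeout} seconds")
```

This only reports the current setting. It does not change it, so it is safe to run on a production box.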

I just want to know whether this is a common thread or "just us". It's really odd that they're different units in different datacentres, bought at different times and run by different admins (I run the FastMail.FM ones)... and now even hardware from different vendors. Most of the machines which crashed had been up for weeks or months and were running 3.1 or 3.2 series kernels.

The most recent crash was a machine which had only been up about 6 hours running 3.2.21.

