I'm seeing infrequent hard crashes (so far about once a month) on our Ubuntu Server 10.04 LTS box. The box itself is quite old (a Dell PowerEdge 750 from 2004 with a 2.8 GHz Pentium 4). After it crashed twice last Thursday, I set up netconsole and was able to capture the following output:
[ 9354.062473] invalid opcode: 0000 [#1] SMP
[ 9354.062516] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-2/2-2:1.0/uevent
[ 9354.062555] Modules linked in: ppdev adm1026 hwmon_vid i2c_i801 bridge stp dcdbas psmouse serio_raw netconsole configfs shpchp lp parport usbhid hid e1000
[ 9354.062685]
[ 9354.062704] Pid: 3988, comm: rsync Not tainted 2.6.38-12-generic-pae #51~lucid1-Ubuntu Dell Computer Corporation PowerEdge 750 /0R1479
[ 9354.062773] EIP: 0060:[<c104fef1>] EFLAGS: 00010046 CPU: 1
[ 9354.062802] EIP is at check_preempt_wakeup+0x181/0x250
[ 9354.062826] EAX: 00000002 EBX: f2a10ccc ECX: 00000000 EDX: 00000002
[ 9354.062850] ESI: f1db71cc EDI: f1db71a0 EBP: f1dbdea8 ESP: f1dbde8c
[ 9354.062875] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 9354.062900] Process rsync (pid: 3988, ti=f1dbc000 task=f1db71a0 task.ti=f1dbc000)
[ 9354.062933] Stack:
[ 9354.062951] 0053ea60 f7907680 f28da840 f2a10ca0 c153ea60 f7907680 c153ea60 f1dbdebc
[ 9354.063019] c103f98a f2a10ca0 f7907680 00000001 f1dbdef8 c104f97f 00000000 f2f0bacc
[ 9354.063088] f7904338 00000001 00000003 00000000 f2f0bacc 00000001 00000001 00000086
[ 9354.063157] Call Trace:
[ 9354.063183] [<c103f98a>] check_preempt_curr+0x6a/0x80
[ 9354.063210] [<c104f97f>] try_to_wake_up+0x5f/0x3f0
[ 9354.063236] [<c1077a00>] ? hrtimer_wakeup+0x0/0x30
[ 9354.063261] [<c104fd64>] wake_up_process+0x14/0x20
[ 9354.063286] [<c1077a1d>] hrtimer_wakeup+0x1d/0x30
[ 9354.063310] [<c1077f4a>] __run_hrtimer+0x7a/0x1c0
[ 9354.063336] [<c107dbad>] ? ktime_get+0x6d/0x110
[ 9354.063360] [<c1078310>] hrtimer_interrupt+0x120/0x2b0
[ 9354.063390] [<c1535c36>] smp_apic_timer_interrupt+0x56/0x8a
[ 9354.063418] [<c152f459>] apic_timer_interrupt+0x31/0x38
[ 9354.063446] [<c1520000>] ? mca_attach_bus+0x5/0xc0
[ 9354.063469] Code: 8b 9b 20 01 00 00 8b 86 24 01 00 00 3b 83 24 01 00 00 75 e6 85 db 0f 84 a3 00 00 00 89 da 89 f0 e8 75 f6 fe ff 83 f8 01 0f 85 00 <fe> ff ff 89 f8 e8 95 f9 fe ff 8b 5e 1c 85 db 0f 84 e4 fe ff ff
[ 9354.063804] EIP: [<c104fef1>] check_preempt_wakeup+0x181/0x250 SS:ESP 0068:f1dbde8c
[ 9354.064231] ---[ end trace 290689cea65aea7f ]---
[ 9354.064290] Kernel panic - not syncing: Fatal exception in interrupt
[ 9354.064352] Pid: 3988, comm: rsync Tainted: G D 2.6.38-12-generic-pae #51~lucid1-Ubuntu
[ 9354.064424] Call Trace:
[ 9354.064481] [<c152c057>] ? panic+0x5c/0x15b
[ 9354.064539] [<c15302bd>] ? oops_end+0xcd/0xd0
[ 9354.064539] [<c100d9e4>] ? die+0x54/0x80
[ 9354.064539] [<c152f926>] ? do_trap+0x96/0xc0
[ 9354.064539] [<c100ba00>] ? do_invalid_op+0x0/0xa0
[ 9354.064539] [<c100ba8b>] ? do_invalid_op+0x8b/0xa0
[ 9354.064539] [<c104fef1>] ? check_preempt_wakeup+0x181/0x250
[ 9354.064539] [<c144884d>] ? __kfree_skb+0x3d/0x90
[ 9354.064539] [<c1042ae7>] ? update_curr+0x247/0x2a0
[ 9354.064539] [<c10447bb>] ? update_cfs_load+0x11b/0x2d0
[ 9354.064539] [<c1042a25>] ? update_curr+0x185/0x2a0
[ 9354.064539] [<c152f6bf>] ? error_code+0x67/0x6c
[ 9354.064539] [<c104fef1>] ? check_preempt_wakeup+0x181/0x250
[ 9354.064539] [<c103f98a>] ? check_preempt_curr+0x6a/0x80
[ 9354.064539] [<c104f97f>] ? try_to_wake_up+0x5f/0x3f0
[ 9354.064539] [<c1077a00>] ? hrtimer_wakeup+0x0/0x30
[ 9354.064539] [<c104fd64>] ? wake_up_process+0x14/0x20
[ 9354.064539] [<c1077a1d>] ? hrtimer_wakeup+0x1d/0x30
[ 9354.064539] [<c1077f4a>] ? __run_hrtimer+0x7a/0x1c0
[ 9354.064539] [<c107dbad>] ? ktime_get+0x6d/0x110
[ 9354.064539] [<c1078310>] ? hrtimer_interrupt+0x120/0x2b0
[ 9354.064539] [<c1535c36>] ? smp_apic_timer_interrupt+0x56/0x8a
[ 9354.064539] [<c152f459>] ? apic_timer_interrupt+0x31/0x38
[ 9354.064539] [<c1520000>] ? mca_attach_bus+0x5/0xc0
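In case it helps anyone pick the trace apart, here is a minimal Python sketch that pulls the faulting symbol out of the `EIP is at` line and finds the byte the kernel marks with `<..>` in the `Code:` line (that marked byte is the one at EIP). The helper names `faulting_symbol` and `marked_code_byte` are my own and not part of any kernel tooling, and the sample text is just three lines copied from the capture above.

```python
import re

# Three lines copied verbatim from the netconsole capture above.
OOPS = """\
[ 9354.062473] invalid opcode: 0000 [#1] SMP
[ 9354.062802] EIP is at check_preempt_wakeup+0x181/0x250
[ 9354.063469] Code: 8b 9b 20 01 00 00 8b 86 24 01 00 00 3b 83 24 01 00 00 75 e6 85 db 0f 84 a3 00 00 00 89 da 89 f0 e8 75 f6 fe ff 83 f8 01 0f 85 00 <fe> ff ff 89 f8 e8 95 f9 fe ff 8b 5e 1c 85 db 0f 84 e4 fe ff ff
"""


def faulting_symbol(text):
    """Return (symbol, offset, size) from the 'EIP is at' line, or None."""
    m = re.search(r"EIP is at ([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        return None
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def marked_code_byte(text):
    """Return (index, value) of the byte wrapped in <..> on the 'Code:' line.

    The index counts how many dumped bytes precede the marked one.
    Returns None if there is no 'Code:' line or no marked byte.
    """
    m = re.search(r"Code: (.*)", text)
    if m is None:
        return None
    for index, token in enumerate(m.group(1).split()):
        if token.startswith("<") and token.endswith(">"):
            return index, int(token[1:-1], 16)
    return None


if __name__ == "__main__":
    print(faulting_symbol(OOPS))
    print(marked_code_byte(OOPS))
```

The symbol plus offset can then be looked up in a disassembly of the matching kernel image, assuming debug symbols for exactly this kernel build are available.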
Googling for this issue didn't turn up anything useful. Most of what I found was related to btrfs, which I don't use (although the module exists and is sometimes loaded). From what I've seen so far, the crashes might be related to relatively heavy I/O: two of the panics happened during a backup run, and the trace above was captured while rsync was running.
The kernel is 2.6.38-12-generic-pae, but I'm pretty sure I also saw panics on 2.6.32. In the meantime I've upgraded to 3.0.0-17-generic-pae and am waiting for the next crash ;-)
I'm at a loss here, so any pointers on where to look for the cause, or what it could be, would be great :-) Thanks!