Xen domUs crash or become unavailable
- by Rush
I have a Xen server with 8 domUs.
The server is a Xeon E31270 with 16 GB of RAM, which I think is enough for 8 machines.
Sometimes a domU crashes and I can't figure out the reason.
After a crash I can connect to the console, and there is something like this:
Oct 8 22:20:49 server kernel: [30892.320780] lowmem_reserve[]: 0 0 0 0
Oct 8 22:20:49 server kernel: [30892.320790] Node 0 DMA: 10*4kB 3*8kB 13*16kB 10*32kB 7*64kB 3*128kB 2*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8080kB
Oct 8 22:20:49 server kernel: [30892.320817] Node 0 DMA32: 648*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 5760kB
Oct 8 22:20:49 server kernel: [30892.320842] 1491 total pagecache pages
Oct 8 22:20:49 server kernel: [30892.320847] 0 pages in swap cache
Oct 8 22:20:49 server kernel: [30892.320852] Swap cache stats: add 0, delete 0, find 0/0
Oct 8 22:20:49 server kernel: [30892.320858] Free swap = 0kB
Oct 8 22:20:49 server kernel: [30892.320862] Total swap = 0kB
Oct 8 22:20:49 server kernel: [30892.324024] 524288 pages RAM
Oct 8 22:20:49 server kernel: [30892.324024] 11010 pages reserved
Oct 8 22:20:49 server kernel: [30892.324024] 424467 pages shared
Oct 8 22:20:49 server kernel: [30892.324024] 503538 pages non-shared
Oct 8 22:20:49 server kernel: [30892.330308] apache2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Oct 8 22:20:49 server kernel: [30892.330322] apache2 cpuset=/ mems_allowed=0
Oct 8 22:20:49 server kernel: [30892.330330] Pid: 23938, comm: apache2 Not tainted 2.6.32-5-xen-amd64 #1
Oct 8 22:20:49 server kernel: [30892.330337] Call Trace:
Oct 8 22:20:49 server kernel: [30892.330349] [<ffffffff810b7180>] ? oom_kill_process+0x7f/0x23f
Oct 8 22:20:49 server kernel: [30892.330358] [<ffffffff810b76a4>] ? __out_of_memory+0x12a/0x141
Oct 8 22:20:49 server kernel: [30892.330367] [<ffffffff810b77fb>] ? out_of_memory+0x140/0x172
Oct 8 22:20:49 server kernel: [30892.330376] [<ffffffff810bb59c>] ? __alloc_pages_nodemask+0x4e5/0x5f5
Oct 8 22:20:49 server kernel: [30892.330385] [<ffffffff810cc224>] ? do_wp_page+0x386/0x707
Oct 8 22:20:49 server kernel: [30892.330395] [<ffffffff8100c3a5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
Oct 8 22:20:49 server kernel: [30892.330404] [<ffffffff8100c369>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
Oct 8 22:20:49 server kernel: [30892.330412] [<ffffffff810cdfc7>] ? handle_mm_fault+0x7aa/0x80f
Oct 8 22:20:49 server kernel: [30892.330422] [<ffffffff8130f906>] ? do_page_fault+0x2e0/0x2fc
Oct 8 22:20:49 server kernel: [30892.330433] [<ffffffff8130d7a5>] ? page_fault+0x25/0x30
Oct 8 22:20:49 server kernel: [30892.330439] Mem-Info:
Oct 8 22:20:49 server kernel: [30892.330443] Node 0 DMA per-cpu:
Oct 8 22:20:49 server kernel: [30892.330450] CPU 0: hi: 0, btch: 1 usd: 0
Oct 8 22:20:49 server kernel: [30892.330463] CPU 1: hi: 0, btch: 1 usd: 0
Oct 8 22:20:49 server kernel: [30892.330466] Node 0 DMA32 per-cpu:
Oct 8 22:20:49 server kernel: [30892.330469] CPU 0: hi: 186, btch: 31 usd: 0
Oct 8 22:20:49 server kernel: [30892.330472] CPU 1: hi: 186, btch: 31 usd: 60
Oct 8 22:20:49 server kernel: [30892.330476] active_anon:342076 inactive_anon:115398 isolated_anon:0
Oct 8 22:20:49 server kernel: [30892.330477] active_file:268 inactive_file:481 isolated_file:0
Oct 8 22:20:49 server kernel: [30892.330477] unevictable:1125 dirty:2 writeback:13 unstable:0
Oct 8 22:20:49 server kernel: [30892.330478] free:3410 slab_reclaimable:1718 slab_unreclaimable:6946
Oct 8 22:20:49 server kernel: [30892.330478] mapped:899 shmem:113 pagetables:35697 bounce:0
Oct 8 22:20:49 server kernel: [30892.330502] Node 0 DMA free:8036kB min:32kB low:40kB high:48kB active_anon:1144kB inactive_anon:1268kB active_file:8kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:11792kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:224kB kernel_stack:16kB pagetables:1228kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 8 22:20:49 server kernel: [30892.330518] lowmem_reserve[]: 0 2004 2004 2004
Oct 8 22:20:49 server kernel: [30892.330523] Node 0 DMA32 free:5604kB min:5708kB low:7132kB high:8560kB active_anon:1367160kB inactive_anon:460324kB active_file:1064kB inactive_file:1916kB unevictable:4500kB isolated(anon):0kB isolated(file):0kB present:2052320kB mlocked:4500kB dirty:8kB writeback:52kB mapped:3600kB shmem:452kB slab_reclaimable:6872kB slab_unreclaimable:27560kB kernel_stack:3528kB pagetables:141560kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:992 all_unreclaimable? no
Oct 8 22:20:49 server kernel: [30892.330539] lowmem_reserve[]: 0 0 0 0
Oct 8 22:20:49 server kernel: [30892.330544] Node 0 DMA: 1*4kB 2*8kB 13*16kB 10*32kB 7*64kB 3*128kB 2*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8036kB
Oct 8 22:20:49 server kernel: [30892.330579] Node 0 DMA32: 609*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 5604kB
Oct 8 22:20:49 server kernel: [30892.330605] 1522 total pagecache pages
Oct 8 22:20:49 server kernel: [30892.330610] 0 pages in swap cache
Oct 8 22:20:49 server kernel: [30892.330615] Swap cache stats: add 0, delete 0, find 0/0
Oct 8 22:20:49 server kernel: [30892.330621] Free swap = 0kB
Oct 8 22:20:49 server kernel: [30892.330625] Total swap = 0kB
Oct 8 22:20:49 server kernel: [30892.333018] 524288 pages RAM
Oct 8 22:20:49 server kernel: [30892.333018] 11010 pages reserved
Oct 8 22:20:49 server kernel: [30892.333018] 424367 pages shared
Oct 8 22:20:49 server kernel: [30892.333018] 503658 pages non-shared
It seems there isn't enough memory for this domU. But munin monitoring doesn't report any memory problems:
As you can see, the system uses around 0.2G and 1G is available.
So my questions are:
Is this a Xen-specific problem, where the real memory usage and the memory usage that munin shows are different (I've never seen such problems on real hardware machines)?
Or is it just a monitoring problem, where munin can't catch the moment when there is an unusually high load and the domU goes down?
And how can I fix this problem? It is really annoying to get e-mails saying that a domU went down.
By the way, this happened when the domU had 2G of memory.
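
To sanity-check the log against the munin numbers, the page counts from the second Mem-Info dump can be converted to MiB. This is only a rough sketch, assuming 4 KiB pages (the per-zone kB figures in the log are consistent with that); the function name and the dictionary labels are made up for illustration and are not kernel names.

```python
# Convert page counts from the OOM report above into MiB.
# Assumption: 4 KiB pages (typical for x86-64; matches the kB values in the log).
PAGE_KIB = 4


def pages_to_mib(pages: int) -> float:
    """Return the size in MiB of the given number of pages."""
    return pages * PAGE_KIB / 1024


# Values copied from the second Mem-Info dump in the log.
oom_report = {
    "total RAM": 524288,                               # "524288 pages RAM"
    "anonymous (active + inactive)": 342076 + 115398,  # active_anon + inactive_anon
    "page tables": 35697,                              # pagetables
    "page cache": 1522,                                # "1522 total pagecache pages"
    "free": 3410,                                      # free
}

for label, pages in oom_report.items():
    print(f"{label:32s} {pages_to_mib(pages):8.1f} MiB")
```

Under that assumption, anonymous memory alone comes to roughly 1787 MiB of the 2048 MiB the domU has, and the log shows no swap configured (Total swap = 0kB). That is a very different picture from the 0.2G that munin shows.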