What can cause kernel out_of_memory error?
- by nbolton
I'm running Debian GNU/Linux 5.0 and I'm experiencing intermittent out_of_memory errors coming from the kernel. The server stops responding to all but pings, and I have to reboot the server.
# uname -a
Linux xxx 2.6.18-164.9.1.el5xen #1 SMP Tue Dec 15 21:31:37 EST 2009 x86_64
GNU/Linux
This seems to be the important bit from /var/log/messages
Dec 28 20:16:25 slarti kernel: Call Trace:
Dec 28 20:16:25 slarti kernel: [<ffffffff802bedff>] out_of_memory+0x8b/0x203
Dec 28 20:16:25 slarti kernel: [<ffffffff8020f825>] __alloc_pages+0x245/0x2ce
Dec 28 20:16:25 slarti kernel: [<ffffffff8021377f>] __do_page_cache_readahead+0xc6/0x1ab
Dec 28 20:16:25 slarti kernel: [<ffffffff80214015>] filemap_nopage+0x14c/0x360
Dec 28 20:16:25 slarti kernel: [<ffffffff80208ebc>] __handle_mm_fault+0x443/0x1337
Dec 28 20:16:25 slarti kernel: [<ffffffff8026766a>] do_page_fault+0xf7b/0x12e0
Dec 28 20:16:25 slarti kernel: [<ffffffff8026ef17>] monotonic_clock+0x35/0x7b
Dec 28 20:16:25 slarti kernel: [<ffffffff80262da3>] thread_return+0x6c/0x113
Dec 28 20:16:25 slarti kernel: [<ffffffff8021afef>] remove_vma+0x4c/0x53
Dec 28 20:16:25 slarti kernel: [<ffffffff80264901>] _spin_lock_irqsave+0x9/0x14
Dec 28 20:16:25 slarti kernel: [<ffffffff8026082b>] error_exit+0x0/0x6e
Full snippet here: http://pastebin.com/a7eWf7VZ
I thought that perhaps the server was actually running out of memory (it has 1GB physical memory), but my Cacti memory graph looks OK to me...
But strangely the load graph goes through the roof shortly before the kernel crashes:
What logs can I look at for more info?
Update:
Maybe noteworthy - the CPU percentage and network traffic graphs were both normal at the time of the crash. The only abnormality was the average load graph.