"postgres blocked for more than 120 seconds" - is my db still consistent?

Posted by nn4l on Server Fault See other posts from Server Fault or by nn4l
Published on 2012-11-03T23:30:31Z Indexed on 2012/11/04 5:05 UTC
Read the original article Hit count: 505

I am using an iscsi volume on an Open-E storage system for several virtual machines running on a XenServer host. Occasionally, when there is a very high disk I/O load on the virtual machines (and therefore also on the storage system), I got this error message on the vm consoles:

[2594520.161701] INFO: task kjournald:117 blocked for more than 120 seconds.
[2594520.161787] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2594520.162194] INFO: task flush-202:0:229 blocked for more than 120 seconds.
[2594520.162274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2594520.162801] INFO: task postgres:1567 blocked for more than 120 seconds.
[2594520.162882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I understand this error message is caused by the kernel to inform that these processes haven't been run for 120 seconds, most likely because a disk access to the storage system has not yet been processed.

But what is the effect on the processes. For example, will the postgres process eventually write its data when the storage system is idle again after a few minutes, so that all data is still consistent? Or will it abort the write, leaving some tables in an inconsistent state?

I certainly expect that the former should be the case - if the disk access is slow, postgres (or any other affected process) should just wait as long as it takes. I can live with the application hanging for a few minutes. But if there is a chance for data corruption then any of these errors is really bad news.

Please advise what to do here.

© Server Fault or respective owner

Related posts about linux

Related posts about virtualization