HDFS datanode startup fails when disks are full
Posted by mbac on Server Fault, 2013-10-29
Our HDFS cluster is only 90% full overall, but some datanodes have individual disks that are 100% full. That means when we mass-reboot the entire cluster, some datanodes completely fail to start with a message like this:
2013-10-26 03:58:27,295 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Mkdirs failed to create /mnt/local/sda1/hadoop/dfsdata/blocksBeingWritten
Only three datanodes have to fail this way before we start experiencing real data loss.
Currently we work around it by decreasing the amount of space the filesystem reserves for the root user, but we'll eventually run out of reserved space to give back. We also run the rebalancer pretty much constantly, but some disks stay stuck at 100% anyway.
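For reference, those two workarounds look roughly like this (assuming the data disks are ext3/ext4; the device name and threshold are only examples):

# shrink the filesystem blocks reserved for root from the default 5% to 1%
# (ext3/ext4 only; /dev/sda1 is just an example device)
tune2fs -m 1 /dev/sda1

# run the HDFS balancer with a tighter threshold (percent of total capacity)
hadoop balancer -threshold 5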
Changing the dfs.datanode.failed.volumes.tolerated setting is not the solution here, since the volumes haven't actually failed; they're just full.
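For reference, the closest related knob seems to be dfs.datanode.du.reserved, which keeps some headroom on each data volume for non-HDFS use so new blocks stop landing on a nearly full disk. A minimal hdfs-site.xml sketch of what that would look like (the 10 GB value is only an illustration, not something we have tested):

<property>
  <!-- bytes reserved per volume for non-HDFS use; 10 GB is only an example -->
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>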
Any ideas?