Best practice for administering a (Hadoop) cluster
Posted by Alex on Server Fault, 2011-03-08
Dear all,
I've recently been playing with Hadoop. I have a six-node cluster up and running, with HDFS working, and I have run a number of MapReduce jobs. So far, so good. However, I'm now looking to do this more systematically and with a larger number of nodes. Our base system is Ubuntu, and the current setup has been administered using apt (to install the correct Java runtime) and ssh/scp (to propagate the various conf files to each node). This clearly won't scale over time.
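To be concrete, the propagation step today is essentially the loop below (a minimal sketch; the hostnames and paths are placeholders rather than my actual layout, and it assumes passwordless ssh to each node):

    #!/usr/bin/env python
    # Minimal sketch of the current (manual) approach: push one conf directory
    # to every node over scp. Hostnames and paths are placeholders.
    import subprocess

    NODES = ["node01", "node02", "node03", "node04", "node05", "node06"]
    CONF_DIR = "/usr/local/hadoop/conf"  # assumed install location

    for node in NODES:
        # Copy the whole conf directory to the node; relies on passwordless ssh.
        subprocess.check_call(["scp", "-r", CONF_DIR, "%s:%s" % (node, CONF_DIR)])

This works for six nodes, but every node gets an identical conf directory, and the loop has to be re-run by hand whenever anything changes.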
Does anyone have experience with good systems for administering (possibly slightly heterogeneous: different disk sizes, different numbers of CPUs per node) Hadoop clusters automagically? I would consider diskless boot, but imagine that with a large cluster, getting it up and running might be bottlenecked on the machine serving the OS. Or perhaps some form of distributed Debian apt to keep each machine's native environment synchronised? And how do people successfully manage the conf files across a number of (potentially heterogeneous) machines?
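To illustrate the heterogeneity problem I have in mind, here is a rough sketch of the kind of per-node templating I imagine wanting (the inventory and values are made up, not my real cluster; dfs.data.dir and mapred.tasktracker.map.tasks.maximum are just examples of properties that would differ per node):

    #!/usr/bin/env python
    # Sketch: render per-node hdfs-site.xml / mapred-site.xml from a simple
    # inventory, so nodes with different disks/CPUs get different settings.
    # The inventory, paths and values are illustrative only.
    import os

    INVENTORY = {
        "node01": {"disks": ["/data/1", "/data/2"], "cpus": 8},
        "node02": {"disks": ["/data/1"], "cpus": 4},
    }

    PROPERTY = (
        "  <property>\n"
        "    <name>%s</name>\n"
        "    <value>%s</value>\n"
        "  </property>\n"
    )

    def render(props):
        # Wrap a list of (name, value) pairs in a Hadoop-style configuration file.
        body = "".join(PROPERTY % (name, value) for name, value in props)
        return '<?xml version="1.0"?>\n<configuration>\n%s</configuration>\n' % body

    for node, spec in INVENTORY.items():
        outdir = os.path.join("rendered-conf", node)
        if not os.path.isdir(outdir):
            os.makedirs(outdir)
        with open(os.path.join(outdir, "hdfs-site.xml"), "w") as f:
            f.write(render([("dfs.data.dir", ",".join(spec["disks"]))]))
        with open(os.path.join(outdir, "mapred-site.xml"), "w") as f:
            f.write(render([("mapred.tasktracker.map.tasks.maximum", str(spec["cpus"]))]))
        # The rendered directories would still need to be pushed (or pulled) per node.

I could grow something like this myself, but I suspect existing tools already solve the inventory, templating and distribution parts properly, which is what I'm hoping to hear about.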
Thanks very much in advance,
Alex