Implications of Multiple JobTracker nodes in a Hadoop cluster?
- by Jim Dennis
I get the impression that one can, potentially, have multiple JobTracker nodes configured to share the same set of MR (TaskTracker) nodes. I know that, conventionally, all the nodes in a Hadoop cluster should have the same set of configuration files (conventionally under /etc/hadoop/conf/ --- at least for the Cloudera Distribution of Hadoop (CDH). Can we define multiple Job Trackers in mapred-site.xml? Something like:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>jt01.mydomain.not:8021</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>jt02.mydomain.not:8021</value>
</property>
...
</configuration>
Or is there some other allowed syntax for this?
What are the implications of doing this. Does each JobTracker get information about the load on each TaskTracker node. In other words can the two JobTracker co-ordinated their scheduling across the TT nodes only based on the gossip information from the TTs or would they need to talk to one another?
Is this documented anywhere?