FairScheduling Conventions in Hadoop
- by dan.mcclary
While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't
discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things:
- Organizations are still determining what their dataflow and analysis workloads will comprise.
- Small deployments under test aren't likely to show the signs of strain that would send someone looking for resource allocation options.
- The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation.
However, for production deployments, it's wise to start with at
least the foundations of scheduling in place so that you can tune
the cluster as workloads emerge. To do that, we have to ask what the
off-the-rack scheduling options are. We have some choices:
- The FairScheduler, which will work to ensure resource allocations are enforced on a per-job basis.
- The CapacityScheduler, which will ensure resource allocations are enforced on a per-queue basis.
- Writing your own implementation of the abstract class org.apache.hadoop.mapred.TaskScheduler, which is usually overkill (see the sketch just below).
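For a sense of what rolling your own entails, here is a skeletal, do-nothing scheduler sketched against the Hadoop 1.x API (in early 0.20 releases assignTasks took a TaskTrackerStatus instead, so check the signatures for your version). The scheduler classes live in the org.apache.hadoop.mapred package, which is where the contrib schedulers -- and this hypothetical sketch -- sit as well:

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// A hypothetical scheduler that never assigns work. Every decision
// the FairScheduler makes for you (locality, preemption, fair shares)
// would have to be implemented in assignTasks().
public class DoNothingScheduler extends TaskScheduler {

  @Override
  public List<Task> assignTasks(TaskTracker taskTracker) throws IOException {
    // Called on each TaskTracker heartbeat; returns the tasks the
    // tracker should launch next. An empty list assigns nothing.
    return Collections.emptyList();
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    // Reports the jobs associated with a given queue name.
    return Collections.emptyList();
  }
}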
If you're going to have several concurrent users and leverage the more
interactive aspects of the Hadoop environment
(e.g. Pig and Hive scripting), the FairScheduler is definitely the way
to go.
In particular, we can define user-specific pools so that default users
get their fair share, while specific users are given the resources their
workloads require.
To enable fair scheduling, we're going to need to do a couple of things.
First, we need to tell the JobTracker that we want to use the FairScheduler and where we're going to be defining our allocations.
We do this by adding the following to the
mapred-site.xml file in HADOOP_HOME/conf:
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/path/to/allocations.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
<property>
  <name>pool.name</name>
  <value>${user.name}</value>
</property>
What we've done here is simply tell the JobTracker that we'd like
task scheduling to use the FairScheduler class rather than a single FIFO
queue.
Moreover, we're going to be defining our resource pools and
allocations in a file called allocations.xml.
For reference, the allocation file is read every 15s or so, which
allows for tuning allocations without having to take down the
JobTracker.
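The pool.name property deserves a quick note: Hadoop's Configuration class expands ${...} variables, consulting JVM system properties like user.name first, which is how each submitting user lands in a pool of their own by default. A minimal sketch of that expansion, assuming the Hadoop core jar is on the classpath (PoolNameExpansion is just an illustrative name):

import org.apache.hadoop.conf.Configuration;

public class PoolNameExpansion {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("pool.name", "${user.name}");

    // Configuration.get() substitutes ${user.name} from the JVM's
    // system properties, so this prints the submitting user's name
    // (e.g. "dan") -- which becomes that user's pool.
    System.out.println(conf.get("pool.name"));
  }
}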
Our allocation file is now going to look a little like this:
<?xml version="1.0"?>
<allocations>
  <pool name="dan">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
    <maxMaps>25</maxMaps>
    <maxReduces>25</maxReduces>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
  </pool>
  <user name="dan">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
In this case, I've explicitly given my own pool upper and lower
bounds on the maps and reduces, and allotted myself double the default
number of running jobs.
Now, if I run Hive or Pig jobs, either from the console or via the
Hue web interface, I'll be treated "fairly" by the JobTracker.
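Jobs don't have to rely on the per-user default, either. Because we pointed mapred.fairscheduler.poolnameproperty at pool.name, any job can route itself to a particular pool by setting that property at submission time. A sketch using the old-style JobConf API (PoolSubmitExample is a hypothetical class name):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class PoolSubmitExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(PoolSubmitExample.class);
    conf.setJobName("pool-submit-example");

    // Route this job to the "dan" pool defined in allocations.xml,
    // overriding the default ${user.name}-based pool assignment.
    conf.set("pool.name", "dan");

    // No mapper or reducer set: the identity defaults simply copy
    // input to output, which is enough to watch the pools at work.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Jobs launched through ToolRunner can do the same thing from the command line with -D pool.name=dan.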
There's a lot more tweaking that can be done to the allocations
file, so it's best to dig into the Fair Scheduler documentation
and start trying out allocations that might fit your workload.