[In-depth Explanation of Yarn Architecture and Implementation] 5-2: Yarn's Three Schedulers

This article gives an in-depth introduction to Yarn's three schedulers. Yarn itself is a resource management and scheduling service, and resource scheduling is its most important module. The following sections introduce the scheduler implementations in Yarn and their internal execution logic.

1. Introduction

The main function of Yarn is resource management and allocation. This article introduces the scheduler, the core component of resource allocation.
The ideal scheduler would satisfy every resource request immediately. Since physical resources are limited, however, the question becomes how to allocate them. Across different resource demands, priorities, and resource types, no single strategy fits all application scenarios, so Yarn provides several schedulers and configurable policies to choose from.
All Yarn resource schedulers implement the ResourceScheduler interface and are pluggable components: users can switch schedulers through configuration, or write a new scheduler against the interface specification. Yarn ships three scheduler implementations by default: FIFO Scheduler, Capacity Scheduler, and Fair Scheduler.
The official documentation illustrates the three schedulers as shown below. Treat it as a rough overview: as the schedulers have been updated and iterated on, this diagram no longer reflects their current behavior.
[figure: official overview diagram of the three schedulers]

2. FIFO

FIFO is the simplest strategy and is intended mainly for testing.
A single queue holds the tasks waiting after submission. The task submitted first is allocated resources first, and any remaining resources go to the tasks waiting behind it. If no resources are left, subsequent tasks wait until earlier tasks release theirs.
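The head-of-line behavior described above can be sketched as a toy model (this is not YARN code; the class, method names, and resource numbers are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of FIFO scheduling: apps are served strictly in submission
// order; an app at the head that cannot be satisfied blocks all later apps.
public class FifoSketch {
    // Returns the names of the apps that receive resources, in order.
    // Each app is a pair: {name, memory demand in GB}.
    static List<String> allocate(int clusterMemGb, String[][] apps) {
        Queue<String[]> pending = new ArrayDeque<>();
        for (String[] app : apps) pending.add(app);
        List<String> served = new ArrayList<>();
        int free = clusterMemGb;
        // Head-of-line blocking: stop as soon as the oldest app does not fit,
        // even if a later, smaller app would.
        while (!pending.isEmpty()
                && Integer.parseInt(pending.peek()[1]) <= free) {
            String[] app = pending.poll();
            free -= Integer.parseInt(app[1]);
            served.add(app[0]);
        }
        return served;
    }

    public static void main(String[] args) {
        // 10 GB cluster: C (6 GB) blocks D (2 GB) even though D would fit.
        System.out.println(allocate(10, new String[][]{
            {"A", "4"}, {"B", "4"}, {"C", "6"}, {"D", "2"}}));
    }
}
```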
Benefits:
Simple; works out of the box with no additional configuration. Earlier versions of Yarn used FIFO as the default scheduling strategy; the default was later changed to CapacityScheduler.
Cons:
Everything beyond its simplicity is a drawback: none of the scheduling policies you might want (resource limits, per-user limits, preemption, and so on) can be configured.

3. CapacityScheduler

1) Introduction to CS

Capacity Scheduler (abbreviated CS below) divides resources by queue. Each queue is configured with a minimum guaranteed capacity and a maximum capacity. The minimum guarantee ensures the queue can always obtain at least that many resources; while it is idle, that capacity can be shared with other queues. The maximum caps the resources the queue can use, preventing over-consumption.
Queues can be nested to form a hierarchy. Within a queue, resources are allocated in FIFO order by default, as shown below.
[figure: hierarchical queue structure in CS]

Advantages:

  • The minimum resource guarantee per queue prevents small applications from starving;
  • Idle capacity sharing: when a queue's configured resources are free, they can be shared with other queues.

Disadvantages:

  • Queue configuration is cumbersome. Parent and child queues must each be configured with priority, maximum and minimum resources, per-user maximum and minimum resources, user permissions, and so on. In our project, a program was written to generate the configuration automatically;

2) CS features

  • Hierarchical Queues: Supports queue hierarchy, and child queues can allocate resources available to parent queues.
  • Capacity Guarantees: Each queue will be configured with a minimum capacity guarantee. When cluster resources are tight, at least the resources that each queue can allocate will be guaranteed.
  • Elasticity: when a queue's configured resources are idle, they can be allocated to other queues that need them, and preempted back when the owning queue needs them again.
  • Security: Each queue has strict ACLs that control which users can submit applications to which queues.
  • Multi-tenancy: Provides comprehensive constraints to prevent individual applications, users, and queues from monopolizing the resources of the queue or cluster as a whole.
  • Priority Scheduling: applications can be submitted and scheduled at different priorities; priority configuration between queues is also supported (since 2.9.0).
  • Absolute Resource Configuration: Administrators can specify absolute resources for queues instead of providing percentage-based values (supported since 3.1.0).
  • Resource pool configuration: NodeManager can be divided into different resource pools, and queues are configured in resource pools for resource isolation. At the same time, the resource pool has two modes: shared and independent. In the case of sharing, excess resources will be shared to the default resource pool.

3) CS Configuration

Suppose the queue hierarchy is as follows:

root
├── prod
└── dev
    ├── eng
    └── science

It can be achieved by configuring capacity-scheduler.xml:

<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>

In addition to capacity, you can also configure the maximum resources a single user or application may use, how many applications may run concurrently, ACL-based permissions, and more. These are not the focus of this article and will not be expanded on here. Refer to: Cloudera – Capacity Scheduler; Hadoop docs – Capacity Scheduler.
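As a hedged illustration, per-user and per-application limits for the prod queue above might look like the following in capacity-scheduler.xml (the property names follow the Hadoop Capacity Scheduler documentation; the values and the user/group names are arbitrary):

```xml
<configuration>
  <!-- Let a single user exceed their even share in prod by up to 2x. -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
    <value>2</value>
  </property>

  <!-- Limit the number of pending + running applications in prod. -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.maximum-applications</name>
    <value>100</value>
  </property>

  <!-- Only these (hypothetical) users/groups may submit to prod. -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
    <value>etl_user etl_group</value>
  </property>
</configuration>
```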

4) CS Implementation

Here we focus only on the CS resource-allocation process.
CS allocates the idle resources on each NM node. For NM resource reporting, see the previous article “4-3 RM Management NodeManager”.

1. Resource request description

The AM reports its resource requests through the heartbeat; each request contains the following information.

message ResourceRequestProto {
  optional PriorityProto priority = 1; // priority
  optional string resource_name = 2; // The node or rack where the expected resource is located
  optional ResourceProto capability = 3; // resource amount
  optional int32 num_containers = 4; // Number of Containers
  optional bool relax_locality = 5 [default = true]; // Whether to relax locality
  optional string node_label_expression = 6; // resource pool
}

2. Resource update entry

After the NM heartbeats to the RM, the RM dispatches a NODE_UPDATE event, which is handled by the CapacityScheduler:

 case NODE_UPDATE:
    {
      NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent) event;
      RMNode node = nodeUpdatedEvent.getRMNode();
      setLastNodeUpdateTime(Time.now());
      nodeUpdate(node);
      if (!scheduleAsynchronously) {
        // focus
        allocateContainersToNode(getNode(node.getNodeID()));
      }
    }

The key call is allocateContainersToNode(); its internal logic is:

  • Starting from the root queue, find the most “under-served” queue (the one with the smallest ratio of allocated to configured resources);
  • Containers that already have reserved resources (RESERVED) are satisfied first;
  • Then unreserved requests are processed; if resources are insufficient, the request is RESERVED and waits for the next allocation round.
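The “most under-served queue” selection can be illustrated with a small sketch (simplified: the real CS walks the whole queue tree and also checks limits, node labels, and reservations; the class and queue names here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of queue selection during NODE_UPDATE: among child queues, prefer
// the most "under-served" one, i.e. the smallest ratio of used capacity to
// configured capacity.
public class QueuePick {
    // usedVsConfigured maps queue name -> {usedGb, configuredGb}.
    static String mostUnderServed(Map<String, int[]> usedVsConfigured) {
        String best = null;
        double bestRatio = Double.MAX_VALUE;
        for (Map.Entry<String, int[]> e : usedVsConfigured.entrySet()) {
            double ratio = (double) e.getValue()[0] / e.getValue()[1];
            if (ratio < bestRatio) {
                bestRatio = ratio;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, int[]> queues = new LinkedHashMap<>();
        queues.put("prod", new int[]{30, 40}); // 75% of its capacity used
        queues.put("dev", new int[]{20, 60});  // ~33% used -> scheduled first
        System.out.println(mostUnderServed(queues)); // prints "dev"
    }
}
```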

This introduces the concept of reservation (a later article will cover the reserve mechanism in detail):

  • RESERVED exists to prevent container starvation;
  • Under naive scheduling, with a mix of 1 GB and 2 GB container requests and a cluster fully occupied by 1 GB containers, every time a 1 GB container finishes, the freed 1 GB is immediately given to the next 1 GB request, so the 2 GB request never accumulates enough resources;
  • Reservation prevents this: the freed resource is set aside and withheld from everyone else, and subsequently freed resources are added to it until the reserved container's request is satisfied.
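The starvation scenario above can be shown with a toy simulation (illustrative only, not YARN code):

```java
// Toy illustration of RESERVED: a pending 2 GB request on a full node.
// Without reservation, each freed 1 GB is grabbed by a waiting 1 GB request
// and the 2 GB request starves; with reservation, freed memory is withheld
// until the 2 GB request is satisfied.
public class ReserveSketch {
    // Simulate `releases` successive 1 GB releases on a full node and
    // return how many GB the pending 2 GB request has accumulated.
    static int reservedAfter(int releases, boolean reserve) {
        int held = 0; // GB set aside for the 2 GB request
        for (int i = 0; i < releases; i++) {
            if (reserve) {
                held++; // withhold the freed GB for the big request
            }
            // else: the freed GB immediately satisfies a waiting 1 GB
            // request, and the 2 GB request accumulates nothing
            if (held >= 2) break; // 2 GB request satisfied
        }
        return held;
    }

    public static void main(String[] args) {
        System.out.println(reservedAfter(2, false)); // 0: the 2 GB request starves
        System.out.println(reservedAfter(2, true));  // 2: the container can launch
    }
}
```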

4. FairScheduler

1) Introduction to Fair

Similar to the Capacity Scheduler, the Fair Scheduler is a multi-user scheduler. It likewise offers multi-level resource constraints, such as queue resource limits and per-user application limits, so that many users can better share one Hadoop cluster.
With the Fair scheduler, there is no need to reserve system resources in advance: it dynamically rebalances resources across all running jobs. As shown in the figure below, when the first (large) job is submitted and is the only one running, it receives all cluster resources; when a second (small) job is submitted, the Fair scheduler allocates half of the resources to it, so the two jobs share the cluster fairly.
[figure: Fair Scheduler dynamically rebalancing resources between two jobs]

The design goal of the Fair scheduler is fair resource allocation across all applications (where “fair” is definable through parameters).
Benefits:

  • The resources allocated to each application depend on its priority;
  • Concurrently running tasks can be limited within a specific pool or queue.

2) Fair features

  • The Fair scheduler shares the resources of the entire cluster;
  • No resources need to be pre-reserved; every job shares them;
  • A newly submitted job can take all available resources; when another job is submitted, the first job releases part of its resources to the second, and likewise for later jobs, so every job gets a chance to obtain resources soon after submission;
  • Each queue has a weight attribute, which is the basis of fair scheduling. If two queues have weights 2 and 3, allocation is considered fair when the scheduler gives them 40% and 60% of the cluster's resources respectively;
  • Each queue can still use its own internal scheduling policy. The default policy for queues can be configured through the top-level defaultQueueSchedulingPolicy element; if it is not set, fair scheduling is used.

3) Fair configuration

In the FairScheduler, “fairness” is achieved by configuring queue weights in fair-scheduler.xml.
Each queue's share of the cluster is computed as (queue weight / total weight).
For more detailed parameter configuration, refer to: Detailed Explanation of the Yarn Scheduler.

<queue name="first">
  <minResources>512 mb, 4 vcores</minResources>
  <maxResources>30720 mb, 30 vcores</maxResources>
  <maxRunningApps>100</maxRunningApps>
  <schedulingMode>fair</schedulingMode>
  <weight>2.0</weight>
</queue>

<queue name="second">
  <minResources>512 mb, 4 vcores</minResources>
  <maxResources>30720 mb, 30 vcores</maxResources>
  <maxRunningApps>100</maxRunningApps>
  <schedulingMode>fair</schedulingMode>
  <weight>1.0</weight>
</queue>
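Given the weights in the config above (2.0 and 1.0), a quick sketch of the share arithmetic (the 100 GB cluster size is hypothetical):

```java
// Fair-share arithmetic: each queue's share of the cluster is
// weight / totalWeight. The 100 GB cluster size is illustrative.
public class FairShare {
    static double shareGb(double weight, double totalWeight, double clusterGb) {
        return clusterGb * weight / totalWeight;
    }

    public static void main(String[] args) {
        double cluster = 100.0; // hypothetical 100 GB cluster
        // weights 2.0 ("first") and 1.0 ("second"): total weight 3.0
        System.out.println(shareGb(2.0, 3.0, cluster)); // ~66.67 GB
        System.out.println(shareGb(1.0, 3.0, cluster)); // ~33.33 GB
    }
}
```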

5. The difference between Fair Scheduler and Capacity Scheduler

[figure: comparison of Fair Scheduler and Capacity Scheduler]

Similarities

  • Both support multiple users and multiple queues, i.e., they suit multi-user shared-cluster environments;
  • Both support hierarchical queues;
  • Both support dynamic configuration changes, which helps keep the cluster running stably;
  • Both support resource sharing: when a queue has surplus resources, they can be shared with queues that are short of resources;
  • Within a single queue, both support priority-based and FIFO scheduling.

Differences

  • The Capacity Scheduler first selects the queue with the lowest resource utilization, then schedules within that queue via FIFO or DRF;
  • The Fair Scheduler selects the queue using fair-share ordering, then schedules within that queue via Fair (default), FIFO, or DRF.

6. Summary

This article introduced ResourceScheduler, Yarn's key resource scheduling module. It is a pluggable component with three default implementations: FifoScheduler, CapacityScheduler, and FairScheduler.
We analyzed the function, features, configuration, and implementation of each scheduler in turn. If you are interested in the implementation details, dig into the source code and explore further.