Apache Hadoop YARN


Thanks for the likes and follows; we make a little progress every day. Keep going!

Table of Contents

1. Overview of YARN

2. YARN architecture

2.1 ResourceManager (RM)

2.1.1 Scheduler

2.1.2 ApplicationManager

2.2 ApplicationMaster (AM)

2.3 NodeManager (NM)

2.4 Container

3. YARN job submission process

4. YARN common commands and resource configuration parameters

4.1 Common YARN commands

4.2 yarn-site.xml


1. Overview of YARN


Apache Hadoop YARN (Yet Another Resource Negotiator) is the newer Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling; its introduction brought major benefits to the cluster in terms of utilization, unified resource management, and data sharing.

  • Resource management system: manages the cluster's CPU and memory. YARN does not manage disks, because disks are managed by HDFS.
  • Scheduling platform: allocates resources sensibly among the applications that request them.
  • Versatility: supports a variety of computing frameworks. YARN does not care what your application does; it only cares about the resources it needs.

As can be seen from the figure above, the bottom layer of the cluster is HDFS, YARN sits above it, and the various computing frameworks run on top of YARN. YARN's resource scheduling supports not only MapReduce but also many other frameworks, such as Hive, Spark, and Flink, and it lets all of these frameworks read data from HDFS.


2. YARN architecture


2.1 ResourceManager (RM)


ResourceManager (RM): the RM is the global resource manager, responsible for resource management and allocation across the entire system. It consists mainly of two components: the Scheduler and the Applications Manager (ASM).

The Scheduler allocates the cluster's resources to each running application subject to constraints such as capacity and queues (for example, each queue is given a certain share of resources and may run at most a certain number of jobs). It schedules purely according to each application's resource requirements, and the unit of allocation is an abstract concept called the Resource Container (Container for short). A Container is a dynamic resource allocation unit that encapsulates memory, CPU, disk, network, and other resources, thereby limiting the amount of resources each task can use. The Scheduler is also a pluggable component: users can implement their own schedulers to suit their needs, and YARN ships with several ready-to-use ones, such as the Fair Scheduler and the Capacity Scheduler.

2.1.1 Scheduler


The Scheduler is a pluggable component responsible for allocating resources to each running application, subject to resource capacity, queues, and other constraints. It is a pure scheduler: it does not monitor applications or track their status, and it does not guarantee that tasks will be restarted after application or hardware failures. It simply schedules according to each application's resource requirements, using the resource container abstraction, which covers resources such as CPU, memory, disk, and network. YARN provides three main schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.

  • FIFO Scheduler: first in, first out, with no regard for job priority or size; suitable only for lightly loaded clusters.
  • Capacity Scheduler: divides resources into multiple queues so the cluster can be shared, while guaranteeing each queue a minimum capacity (a sample queue configuration follows this list).
  • Fair Scheduler: allocates resources to applications fairly, so that over time all applications receive, on average, an equal share.
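
As an illustration of the Capacity Scheduler, here is a minimal capacity-scheduler.xml sketch that splits the cluster between two hypothetical queues, prod and dev; the queue names and percentages are examples, not values from this cluster:

<configuration>
  <!-- define two child queues under root (queue names are examples) -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- guaranteed share of cluster resources for each queue -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <!-- ceiling up to which dev may borrow idle resources -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>

With this configuration, dev is guaranteed 30% of the cluster but can grow to at most 50% when prod is idle.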

2.1.2 ApplicationManager


The ApplicationManager is mainly responsible for accepting job submissions, allocating the first Container in which the application's ApplicationMaster runs, monitoring the ApplicationMaster, and restarting its Container if it fails.

2.2 ApplicationMaster (AM)


ApplicationMaster (AM): Each application submitted by a user contains an AM. Its main functions include:

  • Negotiate with the ResourceManager's scheduler to obtain resources (represented as Containers);
  • Further distribute the obtained resources to its internal tasks (a second-level allocation of resources);
  • Communicate with NMs to start or stop tasks;
  • Monitor the running status of all tasks, and re-request resources to restart a task when it fails.

When an ApplicationMaster starts, it periodically sends heartbeats to the ResourceManager to confirm that it is healthy and to state the resources it needs. In building its demand model, the ApplicationMaster encapsulates its preferences and constraints in the heartbeat messages it sends to the ResourceManager. In subsequent heartbeats, the ApplicationMaster receives leases on Containers bound to particular resources on specific nodes in the cluster. Based on the Containers returned by the ResourceManager, the ApplicationMaster can update its execution plan to cope with having too few or too many resources; Containers can be allocated and released dynamically.

2.3 NodeManager (NM)


NodeManager (NM): the NM is the resource and task manager on each node. On the one hand, it regularly reports the node's resource usage and the running status of each Container to the RM; on the other hand, it receives and handles requests from AMs, such as starting or stopping Containers.

The NodeManager is the per-node "worker" agent of YARN and manages an individual compute node in the Hadoop cluster. It is mainly responsible for communicating with the ResourceManager, starting application Containers and managing their life cycle, monitoring their resource usage (CPU and memory), tracking node health, managing logs, and reporting all of this to the RM.

When a NodeManager starts, it registers with the ResourceManager and then sends heartbeats while waiting for instructions. Its main purpose is to manage the application Containers assigned to it by the ResourceManager. A NodeManager is responsible only for its own Containers; it knows nothing about the applications running inside them. At runtime, the NodeManager and ResourceManager cooperate to keep this information up to date and to keep the whole cluster running at its best.

Major Responsibilities:
1. Receive requests from the ResourceManager and allocate Containers to the application's tasks
2. Exchange information with the ResourceManager to keep the whole cluster running smoothly; the ResourceManager tracks the health of the entire cluster from the reports of every NodeManager, while each NodeManager monitors its own health
3. Manage the life cycle of each Container
4. Manage the logs on its node
5. Run auxiliary services used by applications on YARN, such as the MapReduce shuffle service (see the configuration snippet after this list)
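
The auxiliary service in item 5 is enabled in yarn-site.xml; the following is the standard snippet for the MapReduce shuffle service:

  <!-- run the MapReduce shuffle as a NodeManager auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>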

2.4 Container


Container: the Container is YARN's resource abstraction. It encapsulates multi-dimensional resources on a particular node, such as memory and CPU. When an AM requests resources from the RM, the resources the RM returns are represented as Containers. YARN assigns each task a Container, and a task may only use the resources described by its Container. YARN manages only two resources, CPU and memory, and can use the lightweight cgroups mechanism for resource isolation.
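
Cgroups isolation is not on by default; it is enabled on the NodeManager through the LinuxContainerExecutor. Below is a minimal yarn-site.xml sketch, assuming cgroups are already mounted on the node and a hadoop group exists; the group name and hierarchy path are illustrative:

  <!-- use the LinuxContainerExecutor so containers can be placed in cgroups -->
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
  </property>
  <!-- enforce CPU limits through cgroups -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>org.apache.hadoop.yarn.util.CgroupsLCEResourcesHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
    <value>/hadoop-yarn</value>
  </property>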

YARN's resource management and execution framework follow the master/slave paradigm: the NodeManager (the slave) runs on and monitors each node and reports resource availability to the cluster's ResourceManager (the master), which ultimately allocates the system's resources among all applications.

The execution of a specific application is controlled by its ApplicationMaster, which is responsible for splitting the application into multiple tasks and negotiating the resources needed to run them with the ResourceManager. Once resources are allocated, the ApplicationMaster works with the NodeManagers to schedule, execute, and monitor the individual application tasks.


3. YARN job submission process


1. The client program submits the application to the ResourceManager and requests an ApplicationMaster instance. The ResourceManager returns an ApplicationId together with resource-capacity information that helps the client frame its resource requests (a command-line walkthrough of a submission follows these steps).

2. The ResourceManager finds a NodeManager that can run a Container and starts the ApplicationMaster instance in that Container.

As part of the submission, the client sends an Application Submission Context, which contains the ApplicationId, user name, queue, and other information needed to start the ApplicationMaster.

A Container Launch Context (CLC) is also sent to the ResourceManager; the CLC provides the resource requirements, job files, security tokens, and other information needed to launch the ApplicationMaster on a node.

When the ResourceManager receives the context submitted by the client, it schedules an available Container (usually called container 0) for the ApplicationMaster, then contacts the NodeManager to start the ApplicationMaster, and records the ApplicationMaster's RPC port and tracking URL so that the application's status can be monitored.

3. The ApplicationMaster registers with the ResourceManager. After registration, the client can query the ResourceManager for details about its ApplicationMaster and then interact with the ApplicationMaster directly. In the registration response, the ResourceManager sends the cluster's maximum and minimum resource capacities.

4. The ApplicationMaster sends resource requests to the ResourceManager following the resource-request protocol. The ResourceManager allocates Container resources to the ApplicationMaster as optimally as the scheduling policy allows and returns them as the response to the resource request.

5. When Containers are successfully allocated, the ApplicationMaster starts them by sending container-launch-specification information to the NodeManager; this specification contains what the Container needs in order to communicate with the ApplicationMaster. Once a Container starts successfully, the ApplicationMaster can check its status. The ResourceManager no longer takes part in executing the application; it only handles scheduling and monitoring of other resources, and it can order a NodeManager to kill a Container.

6. The application code runs inside the started Containers and reports progress, status, and other information to the ApplicationMaster over an application-specific protocol. As the job executes, the ApplicationMaster sends heartbeats and progress information to the ResourceManager; in these heartbeats it can also request and release Containers.

7. While the application is running, the client that submitted it communicates directly with the ApplicationMaster to obtain its running status, progress updates, and other information, again over an application-specific protocol.

8. Once the application has finished and all related work is complete, the ApplicationMaster deregisters from the ResourceManager and shuts down, and all the Containers it used are returned to the system. When a Container is killed or reclaimed, the ResourceManager notifies the NodeManager to aggregate the logs and clean up the Container-specific files.
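
To watch this flow from the command line, you can submit one of the example jobs shipped with Hadoop and follow it with the commands from section 4.1; the jar path and version below depend on your installation and are only illustrative.

# submit the bundled pi example to YARN
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100

# in another terminal, watch the application move through ACCEPTED -> RUNNING -> FINISHED
yarn application -list -appStates ALL

# after it finishes, fetch the aggregated logs by application id
yarn logs -applicationId <ApplicationId>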


4. YARN common commands and resource configuration parameters


4.1 Common YARN commands


List all Applications

yarn application -list

Filter tasks based on Application status

yarn application -list -appStates XXX (XXX - ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED)

Task status

yarn application -status <ApplicationId>

Kill task

yarn application -kill <ApplicationId>

Query Application Log

yarn logs -applicationId <ApplicationId>

List all NM nodes

yarn node -list -all

Print queue information

yarn queue -status kangll

Resource usage and the running status of YARN applications can also be viewed in the web UI at http://<ResourceManager IP>:8088.

The memory and CPU-core usage on each NM can be seen there as well.
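
The same information is exposed by the ResourceManager REST API, which is handy for scripts and monitoring; replace the host with your ResourceManager address:

# cluster-wide metrics: total and used memory, vcores, node counts
curl http://<ResourceManager-IP>:8088/ws/v1/cluster/metrics

# list applications, optionally filtered by state
curl "http://<ResourceManager-IP>:8088/ws/v1/cluster/apps?states=RUNNING"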

4.2 yarn-site.xml


The following are commonly used YARN resource configuration parameters

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">


  <!-- Maximum number of ApplicationMaster attempts -->
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>2</value>
  </property>
  <!-- Capacity Scheduling -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <!-- Maximum memory (MB) that can be allocated to a single container -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>20480</value>
  </property>
  <!-- Maximum number of vcores that can be allocated to a single container -->
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>16</value>
  </property>
  <!-- Minimum memory allocation (MB) per container -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <!-- Minimum vcore allocation per container -->
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <!-- Number of vcores the NodeManager makes available to containers -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>
  <!-- Memory (MB) the NodeManager makes available to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>
  <!-- Percentage of physical CPU the NodeManager may use -->
  <property>
    <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
    <value>80</value>
  </property>


</configuration>
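
A note on how these values interact: with the settings above, each NodeManager advertises 20480 MB and 16 vcores to the scheduler, and no single container may exceed those maxima. Under the Capacity Scheduler's default normalization, container requests are also rounded up to a multiple of the minimum allocation, so, for example, a request for 1000 MB would typically be granted a 1024 MB container (2 x 512 MB).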
