Spark on YARN configuration, deployment mode (DeployMode)

Table of Contents 1. The essence of Spark on YARN 2. Spark on YARN configuration 1. spark-env.sh file 2. yarn-site.xml file 3. spark-defaults.conf file 4. Upload the Spark dependency JAR packages 5. Modify the spark-defaults.conf file 6. YARN resource check configuration (to prevent failures caused by insufficient virtual machine memory) 7. Start the cluster […]

How to enable and configure YARN

How to enable and configure YARN Apache Hadoop’s YARN (Yet Another Resource Negotiator) is a cluster resource manager used to allocate and manage resources in the cluster. This article will describe how to enable and configure YARN. 1. YARN configuration file Before you start configuring YARN, you need to create the configuration files required by […]

CentOS 7: installing nodejs, npm, cnpm, pm2, and yarn with yum

1. Environment preparation 1.1 Check the system environment [root@localhost ~]# cat /etc/redhat-release CentOS Linux release 7.5.1804 (Core) [root@localhost ~]# uname -m x86_64 [root@localhost ~]# uname -r 3.10.0-862.el7.x86_64 1.2 Turn off the firewall and SELinux 1.2.1 Turn off the firewall [root@localhost ~]# /bin/systemctl stop firewalld [root@localhost ~]# /bin/systemctl disable firewalld 1.2.2 Turn off SELinux [root@localhost ~]# […]

Deployment of MapReduce & YARN

1. Deployment instructions For the Hadoop HDFS distributed file system, we start: the NameNode process as the management node, the DataNode process as the worker node, and the SecondaryNameNode as the auxiliary node. In the same way, Hadoop YARN distributed resource scheduling starts: the ResourceManager process as the management node, the NodeManager process as the worker node, and two auxiliary roles, ProxyServer and JobHistoryServer. MapReduce runs […]

Node Label configuration for YARN

Basic introduction The YARN Node Labels feature supports partition management of YARN NodeManager nodes. Because a node can belong to only one Node Label, Node Labels can be used to divide the entire YARN cluster into disjoint node sets. By default, a node belongs to the DEFAULT partition (partition="", empty string). […]

MapReduce on YARN in action: Development framework and practice using MapReduce

Author: Zen and the Art of Computer Programming 1. Introduction Hadoop is an open-source distributed computing platform from the Apache Software Foundation. It is a highly fault-tolerant and highly reliable storage system that can support real-time analysis and processing on extremely large-scale data sets. It is also one of the most popular big data frameworks […]

The third major component of Hadoop: the YARN framework

1. Basic concepts of YARN 2. Basic architecture of YARN 1. ResourceManager: Manager of YARN cluster 2. NodeManager 3. Container 4. Application Master 3. Detailed workflow of YARN – running MapReduce 4. YARN resource scheduler issues 5. YARN web website problems 1. Basic concepts of YARN YARN is a […]

Flink–2, Flink deployment (cluster construction; session mode, per-job mode, and application mode deployment on YARN)

You must win before you can say you don’t care about winning or losing Article directory 1. Flink deployment 1.1 Cluster role 1.2 Flink cluster construction 1.2.1 Cluster startup 1.2.2 Submit jobs to the cluster 1.3 Deployment mode 1.3.1 Session Mode 1.3.2 Per-Job Mode 1.3.3 Application Mode 1.4 Standalone operating mode 1.4.1 Session mode deployment […]

Hadoop: A Beginner’s Guide to Hadoop YARN (Part I)

Author: Zen and the Art of Computer Programming 1. Introduction Hadoop is an open-source distributed computing framework that was originally built around MapReduce. YARN (Yet Another Resource Negotiator) is one of its sub-projects. This series of articles will explain the underlying mechanism of Hadoop from scratch and go into the three modules of […]