The latest version of high-availability hadoop cluster construction and troubleshooting


Note: This article covers only the deployment of a highly available hadoop cluster. For a systematic introduction to hadoop, see the Gitee document: Big Data Getting Started Guide.

Cluster Architecture

(cluster architecture diagram)

Cluster construction

Preliminary steps

All nodes execute

Configure hosts:
vim /etc/hosts
172.16.2.242 hadoop001
172.16.2.243 hadoop002
172.16.2.244 hadoop003
172.16.2.245 hadoop004
172.16.2.246 hadoop005

Configure passwordless SSH:
ssh-keygen   //press Enter at every prompt to accept the defaults

ssh-copy-id hadoop002
ssh-copy-id hadoop003
ssh-copy-id hadoop004
ssh-copy-id hadoop005
//hadoop001 is shown here; repeat the same commands on each of the other nodes
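
As a quick sanity check (a minimal sketch, assuming the keys were copied to all four peers as above), each remote host should now be reachable without a password prompt:

for host in hadoop002 hadoop003 hadoop004 hadoop005; do
    ssh -o BatchMode=yes "$host" hostname   # should print the remote hostname without asking for a password
done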

1. jdk installation

All nodes execute:

Create jdk folder:
mkdir -p /opt/jdk && cd /opt/jdk/

Download the jdk package:
wget https://bobo.bcebos.cloud.geely.com/jdk-11.0.13_linux-x64_bin.tar.gz

Unzip the jdk package:
tar zxvf jdk-11.0.13_linux-x64_bin.tar.gz

Add the jdk environment variables:
vim /etc/profile
..... //Add the following lines at the end (JDK 11 no longer ships a separate JRE, so JRE_HOME is not needed)
export JAVA_HOME=/opt/jdk/jdk-11.0.13
export PATH=${JAVA_HOME}/bin:$PATH

Reload environment variables:
source /etc/profile

Verify the jdk installation:
java -version

2. Zookeeper cluster construction

hadoop001 node execution:

Create the zk folder:
mkdir -p /opt/zk && cd /opt/zk/

Download the zk binary package:
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.1/apache-zookeeper-3.8.1-bin.tar.gz

Unzip the zk package:
tar zxvf apache-zookeeper-3.8.1-bin.tar.gz

Add the zk environment variable:
vim /etc/profile
..... //Finally add the following two lines
export ZOOKEEPER_HOME=/opt/zk/apache-zookeeper-3.8.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Reload environment variables:
source /etc/profile

Copy zoo_sample.cfg to zoo.cfg:
cd /opt/zk/apache-zookeeper-3.8.1-bin/conf/ && cp zoo_sample.cfg zoo.cfg

Modify the configuration file:
vim zoo.cfg
//Modify the following configuration
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper/data/
dataLogDir=/tmp/zookeeper/log/
clientPort=2181
# In server.N, the N is the server id: it can be any valid integer, it identifies this node within the ensemble, and it must match the value written to the myid file under dataDir on that node
# The two ports are the follower-to-leader communication port and the leader-election port
server.1=hadoop001:2287:3387
server.2=hadoop002:2287:3387
server.3=hadoop003:2287:3387

Copy zk configuration to hadoop002:
scp -r /opt/zk/ root@hadoop002:/opt/

Copy zk configuration to hadoop003:
scp -r /opt/zk/ root@hadoop003:/opt/

Create the zk data folder:
mkdir -vp /tmp/zookeeper/data/

Configure the zk node id:
echo "1" > /tmp/zookeeper/data/myid

hadoop002 node execution:

Create zk data folder:
mkdir -vp /tmp/zookeeper/data/

Configure the zk node id:
echo "2" > /tmp/zookeeper/data/myid

Add the zk environment variable:
vim /etc/profile
..... //Finally add the following two lines
export ZOOKEEPER_HOME=/opt/zk/apache-zookeeper-3.8.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Reload environment variables:
source /etc/profile

hadoop003 node execution:

Create zk data folder:
mkdir -vp /tmp/zookeeper/data/

Configure the zk node id:
echo "3" > /tmp/zookeeper/data/myid

Add the zk environment variable:
vim /etc/profile
..... //Finally add the following two lines
export ZOOKEEPER_HOME=/opt/zk/apache-zookeeper-3.8.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Reload environment variables:
source /etc/profile


hadoop001, hadoop002, hadoop003 node execution:

Start zk:
/opt/zk/apache-zookeeper-3.8.1-bin/bin/zkServer.sh start

View the status of each node:
/opt/zk/apache-zookeeper-3.8.1-bin/bin/zkServer.sh status
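
On a healthy three-node ensemble, exactly one node reports itself as leader and the other two as followers; the tail of the status output looks roughly like this (output varies per node):

Mode: follower    # exactly one of the three nodes should instead show: Mode: leader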

3. Hadoop cluster construction

hadoop001 node execution:

Create hadoop directory:
mkdir -p /opt/hadoop && cd /opt/hadoop

Download hadoop binary package:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz

Unzip the hadoop package:
tar zxvf hadoop-3.3.5.tar.gz

All nodes execute:

Add hadoop environment variables:
vim /etc/profile
....//add at the end
export HADOOP_HOME=/opt/hadoop/hadoop-3.3.5
export PATH=${HADOOP_HOME}/bin:$PATH

Reload environment variables:
source /etc/profile

hadoop001 node execution:

Enter hadoop configuration file directory:
cd ${HADOOP_HOME}/etc/hadoop

Modify the core-site.xml configuration file:
vim core-site.xml
<configuration>
    <property>
        <!-- Specify the communication address of the namenode's hdfs protocol file system -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Specify the directory where the hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <property>
        <!-- The address of the ZooKeeper cluster -->
        <name>ha.zookeeper.quorum</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <!-- ZKFC connects to ZooKeeper timeout -->
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>10000</value>
    </property>
</configuration>
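
Since the Hadoop environment variables are already loaded on this node, the values that will actually take effect can be double-checked from the command line (an optional sanity check):

hdfs getconf -confKey fs.defaultFS          # expected: hdfs://hadoop001:8020
hdfs getconf -confKey ha.zookeeper.quorum   # expected: the three ZooKeeper addresses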

Modify the hdfs-site.xml configuration file:
vim hdfs-site.xml
<configuration>
    <property>
        <!-- Specify the number of HDFS replicas -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <!-- The storage location of namenode node data (metadata), you can specify multiple directories to achieve fault tolerance, and multiple directories are separated by commas -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/namenode/data</value>
    </property>
    <property>
        <!-- datanode node data (ie data block) storage location -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/datanode/data</value>
    </property>
    <property>
        <!-- The logical name of the cluster service -->
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <!-- List of NameNode IDs -->
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <!-- RPC communication address of nn1 -->
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop001:8020</value>
    </property>
    <property>
        <!-- RPC communication address of nn2 -->
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop002:8020</value>
    </property>
    <property>
        <!-- http communication address of nn1 -->
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <property>
        <!-- http communication address of nn2 -->
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <property>
        <!-- Shared storage directory of NameNode metadata on JournalNode -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/mycluster</value>
    </property>
    <property>
        <!-- Storage directory for Journal Edit Files -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/journalnode/data</value>
    </property>
    <property>
        <!-- Configure the isolation mechanism to ensure that only one NameNode is active at any given time -->
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <!-- ssh password-free login is required when using the sshfence mechanism -->
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <!-- SSH timeout -->
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <!-- Access agent class, used to determine the NameNode that is currently in the Active state -->
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <!-- Enable automatic failover -->
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>

Modify the yarn-site.xml configuration file:
vim yarn-site.xml
<configuration>
    <property>
        <!--Configure ancillary services running on NodeManager. The MapReduce program can be run on Yarn only after it is configured as mapreduce_shuffle. -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Whether to enable log aggregation (optional) -->
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <!-- Storage time of aggregated logs (optional) -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <!-- Enable RM HA -->
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- RM cluster ID -->
        <name>yarn.resourcemanager.cluster-id</name>
        <value>my-yarn-cluster</value>
    </property>
    <property>
        <!-- List of logical IDs for RM -->
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <!-- Service address of RM1 -->
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop002</value>
    </property>
    <property>
        <!-- RM2 service address -->
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop003</value>
    </property>
    <property>
        <!-- Address of RM1 Web Application -->
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop002:8088</value>
    </property>
    <property>
        <!-- Address of RM2 web application -->
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop003:8088</value>
    </property>
    <property>
        <!-- The address of the ZooKeeper cluster -->
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <!-- Enable auto recovery -->
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- Classes for persistent storage -->
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/hadoop/hadoop-3.3.5/etc/hadoop:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/*</value>
    </property>
</configuration>

//The value of yarn.application.classpath can be obtained by running the hadoop classpath command
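
For example, the string can be generated on the node itself and then pasted into the <value> element above:

hadoop classpath    # prints the colon-separated classpath of this installation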

Modify the mapred-site.xml configuration file:
vim mapred-site.xml
<configuration>
    <property>
        <!--Specify the mapreduce job to run on yarn-->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>/opt/hadoop/hadoop-3.3.5/etc/hadoop:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/*</value>
    </property>
</configuration>

//The value of mapreduce.application.classpath can be obtained by running the hadoop classpath command

Copy hadoop configuration files to other nodes:
scp -r /opt/hadoop/hadoop-3.3.5 hadoop002:/opt/hadoop/
scp -r /opt/hadoop/hadoop-3.3.5 hadoop003:/opt/hadoop/
scp -r /opt/hadoop/hadoop-3.3.5 hadoop004:/opt/hadoop/
scp -r /opt/hadoop/hadoop-3.3.5 hadoop005:/opt/hadoop/
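
The same copy can also be scripted as a single loop (a sketch that creates /opt/hadoop on the target nodes first, in case it does not exist yet):

for host in hadoop002 hadoop003 hadoop004 hadoop005; do
    ssh "$host" "mkdir -p /opt/hadoop"
    scp -r /opt/hadoop/hadoop-3.3.5 "$host":/opt/hadoop/
done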


hadoop001, hadoop002, hadoop003 node execution:

Start journalnode:
cd ${HADOOP_HOME}/sbin && ./hadoop-daemon.sh start journalnode
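
(On Hadoop 3.x the non-deprecated equivalent is hdfs --daemon start journalnode.) Before formatting the namenode, confirm that a JournalNode process is actually running on each of the three nodes:

jps | grep JournalNode    # should show one JournalNode process on hadoop001, hadoop002 and hadoop003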

hadoop001 node execution:

Initialize namenode:
hdfs namenode -format

Copy the formatted namenode metadata to the other namenode node:
scp -r /home/hadoop/namenode/data hadoop002:/home/hadoop/namenode/
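
As a side note, the standby namenode can also be initialized with the built-in bootstrap command instead of copying the directory by hand; this variant requires the freshly formatted namenode on hadoop001 to be running at the time:

hdfs namenode -bootstrapStandby    # run on hadoop002, with the namenode on hadoop001 already started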

All nodes execute:

Add the environment variables required for Hadoop to run:
vim /etc/profile
.....//Add at the end
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
#export HADOOP_SECURE_DN_USER=root

Reload environment variables:
source /etc/profile
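
Because start-yarn.sh is also run as root below, Hadoop 3.x expects the corresponding YARN user variables as well; without them the script refuses to start the ResourceManager and NodeManager. A minimal addition to the same /etc/profile block (re-run source /etc/profile afterwards):

export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root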

Any namenode node execution:

Initialize HA state:
hdfs zkfc -formatZK

Start hdfs:
cd /opt/hadoop/hadoop-3.3.5/sbin && ./start-dfs.sh

Start yarn:
cd /opt/hadoop/hadoop-3.3.5/sbin && ./start-yarn.sh
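
Once both scripts report the daemons started, a quick health check (output will vary with your cluster):

hdfs dfsadmin -report | head -n 20    # configured capacity and the list of live datanodes
yarn rmadmin -getServiceState rm1     # one ResourceManager should report active, the other standby
yarn rmadmin -getServiceState rm2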

On each machine, run jps to check that the running daemons match the planned architecture:
jps
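
Based on the configuration above, the expected layout is roughly the following; DataNode and NodeManager placement depends on the architecture diagram and on the hosts listed in ${HADOOP_HOME}/etc/hadoop/workers, so check that file if the worker daemons do not come up:

hadoop001: NameNode, DFSZKFailoverController, JournalNode, QuorumPeerMain
hadoop002: NameNode, DFSZKFailoverController, JournalNode, QuorumPeerMain, ResourceManager
hadoop003: JournalNode, QuorumPeerMain, ResourceManager
hadoop004, hadoop005: DataNode, NodeManager (worker nodes, assuming they are listed in the workers file)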

4. Test the hadoop cluster

hadoop jar /opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 3 3

Troubleshooting

Error report:

(error screenshot unavailable: the original image link is broken)

namenode node execution:

View the current namenode node status
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn1
standby
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn2
active

Swap the active and standby roles of the namenode nodes:
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -transitionToStandby --forcemanual nn2
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -transitionToActive --forcemanual nn1

Check the namenode node status again:
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn1
active
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn2
standby

Execute the task again:
hadoop jar /opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 3 3
//execution succeeds
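
If the recurring symptom is that jobs only succeed while nn1 is active, the likely root cause is that fs.defaultFS in core-site.xml points at a single namenode (hdfs://hadoop001:8020) rather than at the nameservice, so clients never fail over on their own. One possible fix, to be copied to every node and followed by a restart of HDFS and YARN, is to reference the logical name defined in hdfs-site.xml instead:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>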