Building and troubleshooting a high-availability hadoop cluster (latest version)
Note: this article covers only the deployment of a highly available hadoop cluster. For a systematic introduction to hadoop, see the gitee document: Big Data Getting Started Guide
Cluster Architecture
Cluster construction
Pre-steps
All nodes execute
Configure hosts: vim /etc/hosts

172.16.2.242 hadoop001
172.16.2.243 hadoop002
172.16.2.244 hadoop003
172.16.2.245 hadoop004
172.16.2.246 hadoop005

Configure passwordless SSH (shown for hadoop001; run the same on the remaining nodes in turn):

ssh-keygen    //press Enter at every prompt
ssh-copy-id hadoop002
ssh-copy-id hadoop003
ssh-copy-id hadoop004
ssh-copy-id hadoop005
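The hosts block above follows a fixed pattern (consecutive IPs, zero-padded hostnames), so it can be generated instead of typed by hand. A minimal sketch using the IP range from this article (the helper name gen_hosts is mine):

```shell
#!/bin/sh
# Sketch: emit the /etc/hosts block for hadoop001..hadoop005 from the base IP
# 172.16.2.242 used in this article. Append the output to /etc/hosts on every node.
gen_hosts() {
  i=1
  while [ "$i" -le 5 ]; do
    printf '172.16.2.%d hadoop%03d\n' $((241 + i)) "$i"
    i=$((i + 1))
  done
}
gen_hosts
```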
1. JDK installation
All nodes execute:
Create the jdk folder: mkdir -p /opt/jdk && cd /opt/jdk/
Download the jdk package: wget https://bobo.bcebos.cloud.geely.com/jdk-11.0.13_linux-x64_bin.tar.gz
Unpack the jdk package: tar zxvf jdk-11.0.13_linux-x64_bin.tar.gz
Add the jdk environment variables: vim /etc/profile

..... //add the following lines at the end
export JAVA_HOME=/opt/jdk/jdk-11.0.13
export CLASSPATH=.:${JAVA_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

//Note: JDK 11 no longer ships a separate jre/ directory, so no JRE_HOME is needed.
Reload environment variables: source /etc/profile
Verify the jdk: java -version
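When the same profile edit is repeated across five nodes, it is easy to end up with the wrong java on PATH somewhere. A sketch of a check that parses the major version out of `java -version` output (the helper name jdk_major is mine; note `java -version` writes to stderr):

```shell
#!/bin/sh
# Sketch: extract the major version from "java -version"-style output so a
# script can fail fast when a node does not have the expected JDK 11 on PATH.
jdk_major() {
  sed -n 's/.*version "\([0-9][0-9]*\)[."].*/\1/p' | head -n 1
}
# e.g. on a node:  java -version 2>&1 | jdk_major    (expect 11)
```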
2. Zookeeper cluster construction
hadoop001 node execution:
Create the zk folder: mkdir -p /opt/zk && cd /opt/zk/
Download the zk binary package: wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.1/apache-zookeeper-3.8.1-bin.tar.gz
Unpack the zk package: tar zxvf apache-zookeeper-3.8.1-bin.tar.gz
Add the zk environment variables: vim /etc/profile

..... //add the following two lines at the end
export ZOOKEEPER_HOME=/opt/zk/apache-zookeeper-3.8.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Reload environment variables: source /etc/profile
Copy zoo_sample.cfg to zoo.cfg: cd /opt/zk/apache-zookeeper-3.8.1-bin/conf/ && cp zoo_sample.cfg zoo.cfg
Modify the configuration file: vim zoo.cfg

//modify the following configuration
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper/data/
dataLogDir=/tmp/zookeeper/log/
clientPort=2181
# In server.1, the 1 is the server's identifier (any valid number) indicating which
# node this is; the same identifier must be written to the myid file under dataDir.
# The two ports are the inter-cluster communication port and the election port.
server.1=hadoop001:2287:3387
server.2=hadoop002:2287:3387
server.3=hadoop003:2287:3387

Copy the zk setup to hadoop002: scp -r /opt/zk/ root@hadoop002:/opt/
Copy the zk setup to hadoop003: scp -r /opt/zk/ root@hadoop003:/opt/
Create the zk data folder: mkdir -vp /tmp/zookeeper/data/
Configure the zk node id: echo "1" > /tmp/zookeeper/data/myid
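The myid value on each node must match its server.N line in zoo.cfg, and writing the ids by hand is where they drift apart. A sketch that derives the id from the config instead (the helper name myid_for is mine):

```shell
#!/bin/sh
# Sketch: read this node's id out of zoo.cfg by matching its hostname against
# the server.N entries, so myid always agrees with the configuration.
myid_for() {
  cfg="$1"; host="$2"
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}
# e.g. on each node:
#   myid_for /opt/zk/apache-zookeeper-3.8.1-bin/conf/zoo.cfg "$(hostname)" > /tmp/zookeeper/data/myid
```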
hadoop002 node execution:
Create the zk data folder: mkdir -vp /tmp/zookeeper/data/
Configure the zk node id: echo "2" > /tmp/zookeeper/data/myid
Add the zk environment variables: vim /etc/profile

..... //add the following two lines at the end
export ZOOKEEPER_HOME=/opt/zk/apache-zookeeper-3.8.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Reload environment variables: source /etc/profile
hadoop003 node execution:
Create the zk data folder: mkdir -vp /tmp/zookeeper/data/
Configure the zk node id: echo "3" > /tmp/zookeeper/data/myid
Add the zk environment variables: vim /etc/profile

..... //add the following two lines at the end
export ZOOKEEPER_HOME=/opt/zk/apache-zookeeper-3.8.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Reload environment variables: source /etc/profile
hadoop001, hadoop002, hadoop003 node execution:
Start zk: /opt/zk/apache-zookeeper-3.8.1-bin/bin/zkServer.sh start
View the status of each node: /opt/zk/apache-zookeeper-3.8.1-bin/bin/zkServer.sh status
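zkServer.sh status reports the node's role on a "Mode:" line; in a healthy three-node ensemble, one node should report leader and two should report follower. A sketch of a helper that extracts that field, e.g. for checking all three nodes in a loop (the helper name zk_mode is mine):

```shell
#!/bin/sh
# Sketch: pull the "Mode:" field out of zkServer.sh status output.
zk_mode() {
  sed -n 's/^Mode: //p'
}
# e.g.: /opt/zk/apache-zookeeper-3.8.1-bin/bin/zkServer.sh status 2>/dev/null | zk_mode
```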
3. Hadoop cluster construction
hadoop001 node execution:
Create the hadoop directory: mkdir -p /opt/hadoop && cd /opt/hadoop
Download the hadoop binary package: wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
Unpack the hadoop package: tar zxvf hadoop-3.3.5.tar.gz
All nodes execute:
Add the hadoop environment variables: vim /etc/profile

....//add at the end
export HADOOP_HOME=/opt/hadoop/hadoop-3.3.5
export PATH=${HADOOP_HOME}/bin:$PATH

Reload environment variables: source /etc/profile
hadoop001 node execution:
Enter the hadoop configuration file directory: cd ${HADOOP_HOME}/etc/hadoop

Modify the core-site.xml configuration file: vim core-site.xml

<configuration>
    <property>
        <!-- Communication address of the namenode's hdfs protocol file system -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Directory where the hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <property>
        <!-- Address of the ZooKeeper cluster -->
        <name>ha.zookeeper.quorum</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <!-- ZKFC-to-ZooKeeper connection timeout -->
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>10000</value>
    </property>
</configuration>

Modify the hdfs-site.xml configuration file: vim hdfs-site.xml

<configuration>
    <property>
        <!-- Number of HDFS replicas -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <!-- Storage location of namenode metadata; multiple comma-separated directories may be given for fault tolerance -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/namenode/data</value>
    </property>
    <property>
        <!-- Storage location of datanode data blocks -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/datanode/data</value>
    </property>
    <property>
        <!-- Logical name of the cluster service -->
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <!-- List of NameNode IDs -->
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <!-- RPC communication address of nn1 -->
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop001:8020</value>
    </property>
    <property>
        <!-- RPC communication address of nn2 -->
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop002:8020</value>
    </property>
    <property>
        <!-- http communication address of nn1 -->
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <property>
        <!-- http communication address of nn2 -->
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <property>
        <!-- Shared storage directory of NameNode metadata on the JournalNodes -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/mycluster</value>
    </property>
    <property>
        <!-- Storage directory for journal edit files -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/journalnode/data</value>
    </property>
    <property>
        <!-- Fencing mechanism ensuring only one NameNode is active at any given time -->
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <!-- The sshfence mechanism requires passwordless ssh login -->
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <!-- SSH timeout -->
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <!-- Client proxy class used to determine which NameNode is currently active -->
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <!-- Enable automatic failover -->
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>

Modify the yarn-site.xml configuration file: vim yarn-site.xml

<configuration>
    <property>
        <!-- Auxiliary service running on the NodeManager; MapReduce programs can run on Yarn only after this is set to mapreduce_shuffle -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Whether to enable log aggregation (optional) -->
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <!-- Retention time of aggregated logs (optional) -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <!-- Enable RM HA -->
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- RM cluster ID -->
        <name>yarn.resourcemanager.cluster-id</name>
        <value>my-yarn-cluster</value>
    </property>
    <property>
        <!-- List of logical IDs for the RMs -->
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <!-- Service address of RM1 -->
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop002</value>
    </property>
    <property>
        <!-- Service address of RM2 -->
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop003</value>
    </property>
    <property>
        <!-- Web application address of RM1 -->
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop002:8088</value>
    </property>
    <property>
        <!-- Web application address of RM2 -->
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop003:8088</value>
    </property>
    <property>
        <!-- Address of the ZooKeeper cluster -->
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <!-- Enable automatic recovery -->
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- Class used for persistent state storage -->
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/hadoop/hadoop-3.3.5/etc/hadoop:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/*</value>
    </property>
</configuration>

//The yarn.application.classpath value can be obtained with the hadoop classpath command

Modify the mapred-site.xml configuration file: vim mapred-site.xml

<configuration>
    <property>
        <!-- Run mapreduce jobs on yarn -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>/opt/hadoop/hadoop-3.3.5/etc/hadoop:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/common/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/hdfs/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-3.3.5/share/hadoop/yarn/*</value>
    </property>
</configuration>

//The mapreduce.application.classpath value can be obtained with the hadoop classpath command

Copy the hadoop installation to the other nodes:
scp -r /opt/hadoop/hadoop-3.3.5 hadoop002:/opt/hadoop/
scp -r /opt/hadoop/hadoop-3.3.5 hadoop003:/opt/hadoop/
scp -r /opt/hadoop/hadoop-3.3.5 hadoop004:/opt/hadoop/
scp -r /opt/hadoop/hadoop-3.3.5 hadoop005:/opt/hadoop/
hadoop001, hadoop002, hadoop003 node execution:
Start the journalnode: cd ${HADOOP_HOME}/sbin && ./hadoop-daemon.sh start journalnode
hadoop001 node execution:
Initialize the namenode: hdfs namenode -format
Copy the namenode metadata to the other namenode node: scp -r /home/hadoop/namenode/data hadoop002:/home/hadoop/namenode/
//Alternatively, running hdfs namenode -bootstrapStandby on hadoop002 (with the formatted namenode started) copies the metadata for you.
All nodes execute:
Add the environment variables required for Hadoop to run: vim /etc/profile

.....//add at the end
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
#export HADOOP_SECURE_DN_USER=root

Reload environment variables: source /etc/profile
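Since the identical block has to land in /etc/profile on every node, it can be appended with a single heredoc rather than edited by hand on each machine. A sketch (the function name is mine; it takes the target file as an argument so it can be dry-run against a temp file first):

```shell
#!/bin/sh
# Sketch: append the run-as-root role variables to a profile file in one block.
append_role_vars() {
  cat >> "$1" <<'EOF'
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
EOF
}
# e.g.: append_role_vars /etc/profile && source /etc/profile
```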
Execute on either namenode node:
Initialize the HA state in ZooKeeper: hdfs zkfc -formatZK
Start hdfs: cd /opt/hadoop/hadoop-3.3.5/sbin && ./start-dfs.sh
Start yarn: cd /opt/hadoop/hadoop-3.3.5/sbin && ./start-yarn.sh
On each machine, check that the running processes match the architecture: jps
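The jps output on each node can be compared against the intended architecture mechanically rather than by eye. A sketch (the function name and the example daemon list are mine; adjust the expected set per node):

```shell
#!/bin/sh
# Sketch: given jps output on stdin, verify every expected daemon name appears;
# prints "ok" on success, names the first missing daemon and fails otherwise.
check_daemons() {
  out=$(cat)
  for d in "$@"; do
    printf '%s\n' "$out" | grep -qw "$d" || { echo "missing: $d" >&2; return 1; }
  done
  echo ok
}
# e.g. on hadoop001:
#   jps | check_daemons NameNode DFSZKFailoverController JournalNode DataNode QuorumPeerMain
```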
4. Test the hadoop cluster
Run the bundled pi example job: hadoop jar /opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 3 3
Troubleshooting
Error report: the pi example job fails to submit; checking shows that nn1, the namenode the client addresses, is in the standby state.
namenode node execution:
View the current namenode node states:
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn1
standby
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn2
active

Swap the active and standby namenodes:
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -transitionToStandby --forcemanual nn2
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -transitionToActive --forcemanual nn1

Check the namenode node states again:
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn1
active
/opt/hadoop/hadoop-3.3.5/bin/hdfs haadmin -getServiceState nn2
standby

Execute the task again: hadoop jar /opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 3 3
//execution succeeds
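Looping `hdfs haadmin -getServiceState` over both ids and filtering the result makes this check scriptable. A sketch (the helper name is mine):

```shell
#!/bin/sh
# Sketch: given "id state" lines, print the id(s) currently in the active state;
# an empty result means no namenode is active and failover needs attention.
active_ids() {
  awk '$2 == "active" { print $1 }'
}
# e.g.: for nn in nn1 nn2; do echo "$nn $(hdfs haadmin -getServiceState $nn)"; done | active_ids
```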