1. Prepare the environment
-
Prepare three machines.
-
Modify the hostname: vim /etc/hostname, setting each machine to hadoop01, hadoop02, or hadoop03 respectively.
-
Modify the hosts file: vim /etc/hosts
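For reference, the hosts file on every machine should map all three hostnames to their addresses. The IP addresses below are placeholders; substitute the real ones for your network:

```
192.168.1.101   hadoop01
192.168.1.102   hadoop02
192.168.1.103   hadoop03
```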
2. Install JDK
a. vim /etc/profile
b. export JAVA_HOME=xxx
c. export PATH=$PATH:$JAVA_HOME/bin
d. source /etc/profile
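As a quick sanity check, the exports can be verified in the current shell. The JDK path below matches the one used later in hadoop-env.sh; substitute your actual install location:

```shell
# Set JAVA_HOME to the JDK install directory and put its bin on PATH.
export JAVA_HOME=/opt/training/jdk1.8.0_311
export PATH=$PATH:$JAVA_HOME/bin
# Verify the variables took effect in the current shell.
echo "JAVA_HOME=$JAVA_HOME"
```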
3. Configure passwordless SSH login
a. ssh-keygen -t rsa
b. ssh-copy-id root@ip (run once for each of the three machines)
4. Distribution script
#!/bin/bash
# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Traverse all machines in the cluster
for host in hadoop02 hadoop03
do
    echo ===================== $host =====================
    # 3. Traverse all files/directories and send them one by one
    for file in $@
    do
        # 4. Check whether the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the name of the current file
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done

Save the script, make it executable with chmod +x, and invoke it with the files or directories to distribute as arguments.
5. Install zookeeper
-
First download the zookeeper package from the official website
-
Configure environment variables
export ZK_HOME=/opt/training/zookeeper-3.4.5
export PATH=$PATH:$ZK_HOME/bin
-
Create data and log directories under the zookeeper installation directory to store snapshots and logs.
mkdir data
mkdir log
-
Create a myid file in the data directory containing this server's id:
echo 1 > myid
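The myid value must match the server.N entry for this host in zoo.cfg. A local sketch, with ./data standing in for the real data directory:

```shell
# Write this node's ZooKeeper id: use 1 on hadoop01, 2 on hadoop02, 3 on hadoop03.
mkdir -p ./data
echo 1 > ./data/myid
cat ./data/myid   # prints 1
```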
-
Configure the zoo.cfg file (it does not exist by default; copy it from the template file).
Go to the conf directory of the zookeeper installation and copy zoo_sample.cfg to zoo.cfg:
cp zoo_sample.cfg zoo.cfg
-
Modify the configuration file:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgment
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/training/zookeeper-3.4.5/data
# the port at which the clients will connect
clientPort=2181
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
-
Distribute the configured zookeeper directory to the other virtual machines, then modify myid: set it to 2 on hadoop02 and 3 on hadoop03, and so on.
-
Execute the start command: on every node, enter the bin directory and run ./zkServer.sh start (verify with ./zkServer.sh status).
6. Build hadoop
-
Download the package from the official website, upload and decompress it.
-
Configure environment variables
#hadoop
export HADOOP_HOME=/opt/training/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
-
Distribute Hadoop to the remaining machines; the environment variables must be configured on each of them as well.
-
Modify hadoop-env.sh
vim hadoop-env.sh
Add export JAVA_HOME=/opt/training/jdk1.8.0_311
-
Modify core-site.xml
-
Enter the etc/hadoop directory and run vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://HAhadoop01</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/training/hadoop-3.1.3/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
</configuration>
-
Modify hdfs-site.xml
<property>
    <name>dfs.nameservices</name>
    <value>HAhadoop01</value>
</property>
<property>
    <name>dfs.ha.namenodes.HAhadoop01</name>
    <value>HAhadoop02,HAhadoop03</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.HAhadoop01.HAhadoop02</name>
    <value>hadoop01:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.HAhadoop01.HAhadoop02</name>
    <value>hadoop01:9870</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.HAhadoop01.HAhadoop03</name>
    <value>hadoop02:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.HAhadoop01.HAhadoop03</name>
    <value>hadoop02:9870</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/HAhadoop01</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/training/hadoop-3.1.3/journal</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled.HAhadoop01</name>
    <value>true</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.HAhadoop01</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/bin/true)
    </value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/training/hadoop-3.1.3/data</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/training/hadoop-3.1.3/name</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
-
Enter the Hadoop installation directory and create the tmp, journal, and logs directories.
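The same step as a runnable sketch, with a local path standing in for /opt/training/hadoop-3.1.3:

```shell
# Create the working directories referenced by core-site.xml and hdfs-site.xml.
HADOOP_DIR=./hadoop-3.1.3
mkdir -p "$HADOOP_DIR/tmp" "$HADOOP_DIR/journal" "$HADOOP_DIR/logs"
ls "$HADOOP_DIR"
```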
-
Modify mapred-site.xml
<!-- Run MapReduce programs on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/training/hadoop-3.1.3</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/training/hadoop-3.1.3</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/training/hadoop-3.1.3</value>
</property>
<property>
    <name>mapreduce.application.classpath</name>
    <value>/opt/training/hadoop-3.1.3/etc/hadoop:/opt/training/hadoop-3.1.3/share/hadoop/common/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/common/*:/opt/training/hadoop-3.1.3/share/hadoop/hdfs:/opt/training/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/hdfs/*:/opt/training/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/mapreduce/*:/opt/training/hadoop-3.1.3/share/hadoop/yarn:/opt/training/hadoop-3.1.3/share/hadoop/yarn/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/yarn/*</value>
</property>
-
Modify yarn-site.xml
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
</property>
g. Modify workers: enter etc/hadoop, vim workers, and add hadoop02 and hadoop03, one per line.
k. Modify yarn-env.sh: add export JAVA_HOME=/opt/training/jdk1.8.0_311, then distribute to the other machines.
l. Verification. Start zookeeper: enter the zookeeper bin directory and execute
zkServer.sh start
on all three nodes.
m. Start the journalnode on every node. Execute this once only, not multiple times.
hadoop-daemon.sh start journalnode
n. Format HDFS. Execute on hadoop01 once only, not multiple times.
hdfs namenode -format
o. Copy the hadoop-3.1.3/tmp directory to the same location on hadoop02:
scp -r /opt/training/hadoop-3.1.3/tmp/ root@hadoop02:/opt/training/hadoop-3.1.3/
p. Format the failover state in zookeeper. Execute once only, not multiple times.
hdfs zkfc -formatZK
On success the log will contain: Successfully created /hadoop-ha/HAhadoop01 in ZK
q. Stop the journalnode on all nodes. Execute once only, not multiple times.
hadoop-daemon.sh stop journalnode
r. Start zkfc on hadoop01 and hadoop02
hadoop-daemon.sh start zkfc
s. Configure environment variables so the cluster can be started as the root user:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/profile
t. Start the Hadoop cluster on hadoop01:
start-all.sh
The processes after startup are shown in the figure below.
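Since the figure is not reproduced here, a rough sketch of what jps should report on hadoop01 once everything is up, given this configuration (exact pids will differ; hadoop02 additionally runs DataNode and NodeManager since it is listed in workers):

```
# jps on hadoop01 (expected process names, pids omitted)
QuorumPeerMain
NameNode
JournalNode
DFSZKFailoverController
ResourceManager
Jps
```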