1. Prepare the environment
-
Prepare three machines.
-
Modify the hostname: vim /etc/hostname, setting each machine to hadoop01, hadoop02, or hadoop03 respectively.
-
Modify the hosts file: vim /etc/hosts
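For reference, the hosts file on every machine should map all three hostnames to their addresses. The IP addresses below are placeholders; substitute the real ones for your network:

```
192.168.1.101   hadoop01
192.168.1.102   hadoop02
192.168.1.103   hadoop03
```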
2. Install JDK
a. vim /etc/profile
b. export JAVA_HOME=xxx
c. export PATH=$PATH:$JAVA_HOME/bin
d. source /etc/profile
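As a quick sanity check, the exports can be verified in the current shell. The JDK path below matches the one used later in hadoop-env.sh; substitute your actual install location:

```shell
# Set JAVA_HOME to the JDK install directory and put its bin on PATH.
export JAVA_HOME=/opt/training/jdk1.8.0_311
export PATH=$PATH:$JAVA_HOME/bin
# Verify the variables took effect in the current shell.
echo "JAVA_HOME=$JAVA_HOME"
```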
3. Configure passwordless SSH login
a. ssh-keygen -t rsa
b. ssh-copy-id root@ip (run once for each of the three machines)
4. Distribution script
#!/bin/bash
# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Traverse all machines in the cluster
for host in hadoop02 hadoop03
do
    echo ===================== $host =====================
    # 3. Traverse all files/directories and send them one by one
    for file in $@
    do
        # 4. Check whether the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the name of the current file
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done

Save the script, make it executable with chmod +x, and invoke it with the files or directories to distribute as arguments.
5. Install zookeeper
-
First download the zookeeper package from the official website
-
Configure environment variables
export ZK_HOME=/opt/training/zookeeper-3.4.5
export PATH=$PATH:$ZK_HOME/bin
-
Create data and log directories under the zookeeper installation directory to store snapshots and logs.
mkdir data
mkdir log
-
Create a myid file in the data directory containing this server's id:
echo 1 > myid
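The myid value must match the server.N entry for this host in zoo.cfg. A local sketch, with ./data standing in for the real data directory:

```shell
# Write this node's ZooKeeper id: use 1 on hadoop01, 2 on hadoop02, 3 on hadoop03.
mkdir -p ./data
echo 1 > ./data/myid
cat ./data/myid   # prints 1
```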
-
Configure the zoo.cfg file (it does not exist by default; copy it from the template file).
Go to the conf directory of the zookeeper installation and copy zoo_sample.cfg to zoo.cfg:
cp zoo_sample.cfg zoo.cfg
-
Modify the configuration file:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgment
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/training/zookeeper-3.4.5/data
# the port at which the clients will connect
clientPort=2181
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
-
Distribute the configured zookeeper directory to the other virtual machines, then modify myid: set it to 2 on hadoop02 and 3 on hadoop03, and so on.
-
Execute the start command: on every node, enter the bin directory and run ./zkServer.sh start (verify with ./zkServer.sh status).
6. Build hadoop
-
Download the package from the official website, upload and decompress it.
-
Configure environment variables
#hadoop
export HADOOP_HOME=/opt/training/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
-
Distribute Hadoop to the remaining machines; the environment variables must be configured on each of them as well.
-
Modify hadoop-env.sh
vim hadoop-env.sh
Add export JAVA_HOME=/opt/training/jdk1.8.0_311
-
Modify core-site.xml
-
Enter the etc/hadoop directory and run vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://HAhadoop01</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/training/hadoop-3.1.3/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
</configuration>
-
Modify hdfs-site.xml
<property>
    <name>dfs.nameservices</name>
    <value>HAhadoop01</value>
</property>
<property>
    <name>dfs.ha.namenodes.HAhadoop01</name>
    <value>HAhadoop02,HAhadoop03</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.HAhadoop01.HAhadoop02</name>
    <value>hadoop01:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.HAhadoop01.HAhadoop02</name>
    <value>hadoop01:9870</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.HAhadoop01.HAhadoop03</name>
    <value>hadoop02:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.HAhadoop01.HAhadoop03</name>
    <value>hadoop02:9870</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/HAhadoop01</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/training/hadoop-3.1.3/journal</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled.HAhadoop01</name>
    <value>true</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.HAhadoop01</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/bin/true)
    </value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/training/hadoop-3.1.3/data</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/training/hadoop-3.1.3/name</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
-
Enter the Hadoop installation directory and create the tmp, journal, and logs directories.
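The same step as a runnable sketch, with a local path standing in for /opt/training/hadoop-3.1.3:

```shell
# Create the working directories referenced by core-site.xml and hdfs-site.xml.
HADOOP_DIR=./hadoop-3.1.3
mkdir -p "$HADOOP_DIR/tmp" "$HADOOP_DIR/journal" "$HADOOP_DIR/logs"
ls "$HADOOP_DIR"
```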
-
Modify mapred-site.xml
<!-- Run MapReduce programs on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/training/hadoop-3.1.3</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/training/hadoop-3.1.3</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/training/hadoop-3.1.3</value>
</property>
<property>
    <name>mapreduce.application.classpath</name>
    <value>/opt/training/hadoop-3.1.3/etc/hadoop:/opt/training/hadoop-3.1.3/share/hadoop/common/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/common/*:/opt/training/hadoop-3.1.3/share/hadoop/hdfs:/opt/training/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/hdfs/*:/opt/training/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/mapreduce/*:/opt/training/hadoop-3.1.3/share/hadoop/yarn:/opt/training/hadoop-3.1.3/share/hadoop/yarn/lib/*:/opt/training/hadoop-3.1.3/share/hadoop/yarn/*</value>
</property>
-
Modify yarn-site.xml
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
</property>
g. Modify workers: enter etc/hadoop, vim workers, and add hadoop02 and hadoop03, one per line.
k. Modify yarn-env.sh: add export JAVA_HOME=/opt/training/jdk1.8.0_311, then distribute to the other machines.
l. Verification. Start zookeeper: enter the zookeeper bin directory and execute
zkServer.sh start
on all three nodes.
m. Start the journalnode on every node. Execute this once only, not multiple times.
hadoop-daemon.sh start journalnode
n. Format HDFS. Execute on hadoop01 once only, not multiple times.
hdfs namenode -format
o. Copy the hadoop-3.1.3/tmp directory to the same location on hadoop02:
scp -r /opt/training/hadoop-3.1.3/tmp/ root@hadoop02:/opt/training/hadoop-3.1.3/
p. Format the failover state in zookeeper. Execute once only, not multiple times.
hdfs zkfc -formatZK
On success the log will contain: Successfully created /hadoop-ha/HAhadoop01 in ZK
q. Stop the journalnode on all nodes. Execute once only, not multiple times.
hadoop-daemon.sh stop journalnode
r. Start zkfc on hadoop01 and hadoop02
hadoop-daemon.sh start zkfc
s. Configure environment variables so the cluster can be started as the root user:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/profile
t. Start the Hadoop cluster on hadoop01:
start-all.sh
The processes after startup are shown in the figure below.
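Since the figure is not reproduced here, a rough sketch of what jps should report on hadoop01 once everything is up, given this configuration (exact pids will differ; hadoop02 additionally runs DataNode and NodeManager since it is listed in workers):

```
# jps on hadoop01 (expected process names, pids omitted)
QuorumPeerMain
NameNode
JournalNode
DFSZKFailoverController
ResourceManager
Jps
```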