1. Cluster planning
| | hadoop01 (209.2) | hadoop02 (209.3) | hadoop03 (209.4) |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
Note: do not place the NameNode and the SecondaryNameNode on the same server.
2. Create user
useradd atguigu
passwd atguigu
Configure atguigu user permissions
vim /etc/sudoers
## Allow root to run any commands anywhere
root    ALL=(ALL)     ALL
## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)     ALL
atguigu ALL=(ALL)     NOPASSWD:ALL
Note: put the atguigu line after the %wheel line; later entries in sudoers take precedence.
3. Create the module and software directories under /opt
mkdir /opt/module
mkdir /opt/software
chown atguigu:atguigu /opt/module
chown atguigu:atguigu /opt/software
4. Reinstall JDK
1. Uninstall the original JDK
2. Upload the JDK tarball and decompress it to /opt/module
3. Configure JDK environment variables
Create a new /etc/profile.d/my_env.sh file
vim /etc/profile.d/my_env.sh
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
5. Install Hadoop on hadoop01
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
1. Unzip to /opt/module
2. Environment variable settings
vim /etc/profile.d/my_env.sh
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
3. SSH password-free login
Enter /home/atguigu/.ssh and run:
ssh-keygen -t rsa
Then press Enter three times; two files will be generated: id_rsa (private key) and id_rsa.pub (public key).
Copy the public key to each target machine you want to log in to without a password:
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
Note: Each server needs to use the atguigu account to configure passwordless login
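Since every server must repeat the ssh-copy-id step, the calls can be wrapped in a small helper run once per node. This is just a sketch: the `copy_keys` name is made up, and it assumes a key pair already exists under ~/.ssh.

```shell
# Hypothetical helper (not in the original notes): push the local public
# key to each host given as an argument, so every node trusts this one.
copy_keys() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: copy_keys host1 [host2 ...]"
        return 1
    fi
    for host in "$@"; do
        echo "copying public key to $host"
        ssh-copy-id "$host"
    done
}

# Example (run as atguigu on each of the three servers):
# copy_keys hadoop01 hadoop02 hadoop03
```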
6. Cluster configuration
Custom configuration files are stored under $HADOOP_HOME/etc/hadoop
1. Core file configuration
Configure core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Specify the address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:8020</value>
    </property>
    <!-- Specify the storage directory for hadoop data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Configure the static user for HDFS web login as atguigu -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>atguigu</value>
    </property>
</configuration>
Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode web access address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:9870</value>
    </property>
    <!-- SecondaryNameNode web access address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop03:9868</value>
    </property>
</configuration>
Configure yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Specify shuffle for MR -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop02</value>
    </property>
    <!-- Environment variable inheritance -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Configure mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce programs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2. Configure the history server
Add the following to mapred-site.xml:
<!-- History server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
</property>
<!-- History server web address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
</property>
3. Configure the workers file (for cluster-wide start/stop)
vim /opt/module/hadoop-3.1.3/etc/hadoop/workers
Add the following (no trailing spaces and no blank lines allowed in this file):
hadoop01
hadoop02
hadoop03
4. Configure log aggregation
Log aggregation concept: after an application finishes, its container logs are uploaded to HDFS so they can be viewed centrally.
Note: To enable the log aggregation function, you need to restart NodeManager, ResourceManager and HistoryServer.
Configure yarn-site.xml:
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Set the log server URL (points at the history server, which runs on hadoop01) -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop01:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
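The retention value 604800 in the config above is simply 7 days expressed in seconds, which is easy to verify:

```shell
# 7 days x 24 hours x 3600 seconds per hour
retain_seconds=$((7 * 24 * 3600))
echo "$retain_seconds"   # prints 604800
```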
Distribute all modified configuration files to the other nodes, for example:
rsync -av $HADOOP_HOME/etc/hadoop/yarn-site.xml atguigu@hadoop02:$HADOOP_HOME/etc/hadoop/yarn-site.xml
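To avoid typing one rsync per file and per host, the call can be wrapped in a small loop. This is only a sketch: the `distribute` name is made up, and it assumes the same directory layout and the atguigu user exist on every node.

```shell
# Hypothetical helper: rsync one file to the same absolute path on each host.
distribute() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: distribute <file> host1 [host2 ...]"
        return 1
    fi
    local file=$1
    shift
    for host in "$@"; do
        echo "syncing $file to $host"
        rsync -av "$file" "atguigu@$host:$file"
    done
}

# Example:
# distribute "$HADOOP_HOME/etc/hadoop/yarn-site.xml" hadoop02 hadoop03
```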
7. Start the cluster
If the cluster is started for the first time, the NameNode must be formatted on the hadoop01 node. (Note: formatting the NameNode generates a new cluster ID; if the DataNodes still carry the old ID, the NameNode and DataNode cluster IDs become inconsistent and the cluster cannot find its past data. If the running cluster reports errors and the NameNode must be reformatted, first stop the namenode and datanode processes and delete the data and logs directories on all machines, then format.)
hdfs namenode -format
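The cleanup described in the note above can be scripted. This is a hedged sketch, not part of the original notes: the function name and the --force guard are made up, and it assumes passwordless SSH and the /opt/module/hadoop-3.1.3 layout on all three nodes.

```shell
# Wipe data/ and logs/ on every node before re-formatting the NameNode.
# Destructive, so it refuses to run without an explicit --force flag.
wipe_hadoop_data() {
    if [ "$1" != "--force" ]; then
        echo "refusing to delete data; rerun with --force"
        return 1
    fi
    for host in hadoop01 hadoop02 hadoop03; do
        echo "cleaning $host"
        ssh "$host" "rm -rf /opt/module/hadoop-3.1.3/data /opt/module/hadoop-3.1.3/logs"
    done
}
```

Remember to stop the namenode and datanode processes first, then run the wipe, then format.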
Start HDFS
sbin/start-dfs.sh
On hadoop02, the node where the ResourceManager is configured, start YARN:
sbin/start-yarn.sh
View the HDFS NameNode web UI: http://hadoop01:9870
View the YARN ResourceManager web UI: http://hadoop02:8088
Start the history server
mapred --daemon start historyserver
View JobHistory: http://hadoop01:19888/jobhistory
Run jps on each node to check that the started services match the plan in section 1.
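As a quick reference, the expected jps output per node, derived from the cluster plan in section 1 plus the history server configured on hadoop01, can be encoded in a small lookup (purely illustrative):

```shell
# Expected Java processes per node (from the cluster plan; JobHistoryServer
# appears on hadoop01 because the history server was configured there).
expected_jps() {
    case "$1" in
        hadoop01) echo "NameNode DataNode NodeManager JobHistoryServer" ;;
        hadoop02) echo "ResourceManager DataNode NodeManager" ;;
        hadoop03) echo "SecondaryNameNode DataNode NodeManager" ;;
        *) echo "unknown host"; return 1 ;;
    esac
}
```

(jps also lists its own `Jps` process on every node; that is normal.)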
8. Summary of cluster start/stop methods
Make sure the required ports are open between nodes, or disable the firewall entirely.
1. Start/stop each module as a whole (passwordless SSH is a prerequisite); this is the common approach
(1) Start/stop HDFS overall
start-dfs.sh/stop-dfs.sh
(2) Start/stop YARN overall
start-yarn.sh/stop-yarn.sh
2. Start/stop each service component one by one
(1) Start/stop HDFS components respectively
hdfs --daemon start/stop namenode/datanode/secondarynamenode
(2) Start/stop YARN
yarn --daemon start/stop resourcemanager/nodemanager
3. Hadoop cluster startup and shutdown script (including HDFS, Yarn, Historyserver): myhadoop.sh
cd /opt/module/hadoop-3.1.3/sbin
vim myhadoop.sh
#!/bin/bash

if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit
fi

case $1 in
"start")
    echo " =================== Start hadoop cluster ==================="
    echo " --------------- Start hdfs ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
    echo " --------------- Start yarn ---------------"
    ssh hadoop02 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
    echo " --------------- Start historyserver ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== Shut down hadoop cluster ==================="
    echo " --------------- Stop historyserver ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
    echo " --------------- Stop yarn ---------------"
    ssh hadoop02 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
    echo " --------------- Stop hdfs ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
chmod +x myhadoop.sh
Script to view the Java processes on all three servers: jpsall
cd /opt/module/hadoop-3.1.3/sbin
vim jpsall
#!/bin/bash

for host in hadoop01 hadoop02 hadoop03
do
    echo =============== $host ===============
    ssh $host jps
done
chmod +x jpsall