1. Cluster planning
| | hadoop01 (209.2) | hadoop02 (209.3) | hadoop03 (209.4) |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
Note: do not place the NameNode and the SecondaryNameNode on the same server.
2. Create user
useradd atguigu
passwd atguigu
Configure atguigu user permissions
vim /etc/sudoers
## Allow root to run any commands anywhere
root    ALL=(ALL)     ALL
## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)     ALL
atguigu ALL=(ALL)     NOPASSWD:ALL
Note: put the atguigu line after the %wheel line; later entries in sudoers take precedence.
3. Create the module and software directories under /opt
mkdir /opt/module
mkdir /opt/software
chown atguigu:atguigu /opt/module
chown atguigu:atguigu /opt/software
4. Reinstall JDK
1. Uninstall the original JDK
2. Upload the JDK tarball and decompress it to /opt/module
3. Configure JDK environment variables
Create a new /etc/profile.d/my_env.sh file
vim /etc/profile.d/my_env.sh
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
5. Install Hadoop on hadoop01
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
1. Unzip to /opt/module
2. Environment variable settings
vim /etc/profile.d/my_env.sh
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
3. SSH password-free login
Enter /home/atguigu/.ssh and run:
ssh-keygen -t rsa
Then press Enter three times; two files will be generated: id_rsa (private key) and id_rsa.pub (public key).
Copy the public key to each target machine you want to log in to without a password:
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
Note: Each server needs to use the atguigu account to configure passwordless login
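Since every server must repeat the ssh-copy-id step, the calls can be wrapped in a small helper run once per node. This is just a sketch: the `copy_keys` name is made up, and it assumes a key pair already exists under ~/.ssh.

```shell
# Hypothetical helper (not in the original notes): push the local public
# key to each host given as an argument, so every node trusts this one.
copy_keys() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: copy_keys host1 [host2 ...]"
        return 1
    fi
    for host in "$@"; do
        echo "copying public key to $host"
        ssh-copy-id "$host"
    done
}

# Example (run as atguigu on each of the three servers):
# copy_keys hadoop01 hadoop02 hadoop03
```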
6. Cluster configuration
Custom configuration files are stored under $HADOOP_HOME/etc/hadoop
1. Core file configuration
Configure core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Specify the address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:8020</value>
    </property>
    <!-- Specify the storage directory for hadoop data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Configure the static user for HDFS web login as atguigu -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>atguigu</value>
    </property>
</configuration>
Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode web access address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:9870</value>
    </property>
    <!-- SecondaryNameNode web access address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop03:9868</value>
    </property>
</configuration>
Configure yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Specify shuffle for MR -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop02</value>
    </property>
    <!-- Environment variable inheritance -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Configure mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce programs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2. Configure the history server
Add the following to mapred-site.xml:
<!-- History server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
</property>
<!-- History server web address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
</property>
3. Configure the workers file (for cluster-wide start/stop)
vim /opt/module/hadoop-3.1.3/etc/hadoop/workers
Add the following (no trailing spaces and no blank lines allowed in this file):
hadoop01
hadoop02
hadoop03
4. Configure log aggregation
Log aggregation concept: after an application finishes, its container logs are uploaded to HDFS so they can be viewed centrally.
Note: To enable the log aggregation function, you need to restart NodeManager, ResourceManager and HistoryServer.
Configure yarn-site.xml:
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Set the log server URL (points at the history server, which runs on hadoop01) -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop01:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
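The retention value 604800 in the config above is simply 7 days expressed in seconds, which is easy to verify:

```shell
# 7 days x 24 hours x 3600 seconds per hour
retain_seconds=$((7 * 24 * 3600))
echo "$retain_seconds"   # prints 604800
```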
Distribute all modified configuration files to the other nodes, for example:
rsync -av $HADOOP_HOME/etc/hadoop/yarn-site.xml atguigu@hadoop02:$HADOOP_HOME/etc/hadoop/yarn-site.xml
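To avoid typing one rsync per file and per host, the call can be wrapped in a small loop. This is only a sketch: the `distribute` name is made up, and it assumes the same directory layout and the atguigu user exist on every node.

```shell
# Hypothetical helper: rsync one file to the same absolute path on each host.
distribute() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: distribute <file> host1 [host2 ...]"
        return 1
    fi
    local file=$1
    shift
    for host in "$@"; do
        echo "syncing $file to $host"
        rsync -av "$file" "atguigu@$host:$file"
    done
}

# Example:
# distribute "$HADOOP_HOME/etc/hadoop/yarn-site.xml" hadoop02 hadoop03
```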
7. Start the cluster
If the cluster is started for the first time, the NameNode must be formatted on the hadoop01 node. (Note: formatting the NameNode generates a new cluster ID; if the DataNodes still carry the old ID, the NameNode and DataNode cluster IDs become inconsistent and the cluster cannot find its past data. If the running cluster reports errors and the NameNode must be reformatted, first stop the namenode and datanode processes and delete the data and logs directories on all machines, then format.)
hdfs namenode -format
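The cleanup described in the note above can be scripted. This is a hedged sketch, not part of the original notes: the function name and the --force guard are made up, and it assumes passwordless SSH and the /opt/module/hadoop-3.1.3 layout on all three nodes.

```shell
# Wipe data/ and logs/ on every node before re-formatting the NameNode.
# Destructive, so it refuses to run without an explicit --force flag.
wipe_hadoop_data() {
    if [ "$1" != "--force" ]; then
        echo "refusing to delete data; rerun with --force"
        return 1
    fi
    for host in hadoop01 hadoop02 hadoop03; do
        echo "cleaning $host"
        ssh "$host" "rm -rf /opt/module/hadoop-3.1.3/data /opt/module/hadoop-3.1.3/logs"
    done
}
```

Remember to stop the namenode and datanode processes first, then run the wipe, then format.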
Start HDFS
sbin/start-dfs.sh
On hadoop02, the node where the ResourceManager is configured, start YARN:
sbin/start-yarn.sh
View the HDFS NameNode web UI: http://hadoop01:9870
View the YARN ResourceManager web UI: http://hadoop02:8088
Start the history server
mapred --daemon start historyserver
View JobHistory: http://hadoop01:19888/jobhistory
Run jps on each node to check that the started services match the plan in section 1.
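As a quick reference, the expected jps output per node, derived from the cluster plan in section 1 plus the history server configured on hadoop01, can be encoded in a small lookup (purely illustrative):

```shell
# Expected Java processes per node (from the cluster plan; JobHistoryServer
# appears on hadoop01 because the history server was configured there).
expected_jps() {
    case "$1" in
        hadoop01) echo "NameNode DataNode NodeManager JobHistoryServer" ;;
        hadoop02) echo "ResourceManager DataNode NodeManager" ;;
        hadoop03) echo "SecondaryNameNode DataNode NodeManager" ;;
        *) echo "unknown host"; return 1 ;;
    esac
}
```

(jps also lists its own `Jps` process on every node; that is normal.)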
8. Summary of cluster start/stop methods
Make sure the required ports are open between nodes, or disable the firewall entirely.
1. Start/stop each module as a whole (passwordless SSH is a prerequisite); this is the common approach
(1) Start/stop HDFS overall
start-dfs.sh/stop-dfs.sh
(2) Start/stop YARN overall
start-yarn.sh/stop-yarn.sh
2. Start/stop each service component one by one
(1) Start/stop HDFS components respectively
hdfs --daemon start/stop namenode/datanode/secondarynamenode
(2) Start/stop YARN
yarn --daemon start/stop resourcemanager/nodemanager
3. Hadoop cluster startup and shutdown script (including HDFS, Yarn, Historyserver): myhadoop.sh
cd /opt/module/hadoop-3.1.3/sbin
vim myhadoop.sh
#!/bin/bash

if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit
fi

case $1 in
"start")
    echo " =================== Start hadoop cluster ==================="
    echo " --------------- Start hdfs ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
    echo " --------------- Start yarn ---------------"
    ssh hadoop02 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
    echo " --------------- Start historyserver ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== Shut down hadoop cluster ==================="
    echo " --------------- Stop historyserver ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
    echo " --------------- Stop yarn ---------------"
    ssh hadoop02 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
    echo " --------------- Stop hdfs ---------------"
    ssh hadoop01 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
chmod +x myhadoop.sh
Script to view the Java processes on all three servers: jpsall
cd /opt/module/hadoop-3.1.3/sbin
vim jpsall
#!/bin/bash

for host in hadoop01 hadoop02 hadoop03
do
    echo =============== $host ===============
    ssh $host jps
done
chmod +x jpsall