Planning
Node | CPU | Memory | Hard Disk |
---|---|---|---|
node1 | 1 | 4G | 20G |
node2 | 1 | 2G | 20G |
node3 | 1 | 2G | 20G |
Create a new virtual machine
I use the CentOS 7 (1810) image
After the setup is complete, start the installation
After logging in successfully, run the init 0 command to shut the machine down, then clone it
Clone virtual machine
One is named node2 and the other is named node3
The memory of node2 and node3 is set to 2G
1. Preparation
1.1 Modify host name
```
# On node1
[root@localhost ~]# hostnamectl set-hostname node1

# On node2
[root@localhost ~]# hostnamectl set-hostname node2

# On node3
[root@localhost ~]# hostnamectl set-hostname node3
```
1.2 Modify IP
```
# node1: change the IP to 192.168.59.101
# node2: change the IP to 192.168.59.102
# node3: change the IP to 192.168.59.103
```
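How exactly you set the IP depends on your installer choices. A minimal sketch for a CentOS 7 minimal install, assuming the NIC is named ens33 and the VMware NAT gateway is 192.168.59.2 (both are assumptions; check your own VM network settings):

```
# On node1 (repeat on node2/node3 with .102/.103)
[root@node1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
# Set or adjust these lines:
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.59.101
NETMASK=255.255.255.0
GATEWAY=192.168.59.2   # assumption: adjust to your NAT gateway
DNS1=192.168.59.2      # assumption: adjust to your DNS

# Apply the change
[root@node1 ~]# systemctl restart network
```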
1.3 Modify the hosts file on your computer
The file is in the C:\Windows\System32\drivers\etc directory
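For reference, the three entries to append to that file (it must be saved with administrator rights) are:

```
192.168.59.101 node1
192.168.59.102 node2
192.168.59.103 node3
```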
1.4 Write the /etc/hosts file on each node
```
# node1, node2, and node3 must all be written. Only node1 is shown below.
[root@node1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.59.101 node1
192.168.59.102 node2
192.168.59.103 node3
```
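An optional quick check that the names now resolve, run from any node:

```
[root@node1 ~]# ping -c 1 node2
[root@node1 ~]# ping -c 1 node3
```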
1.5 Configure SSH password-free login
```
# node1, node2, and node3 must all execute the following commands. Only node1 is shown below.
[root@node1 ~]# ssh-keygen -t rsa -b 4096   # press Enter through every prompt
[root@node1 ~]# ssh-copy-id node1
[root@node1 ~]# ssh-copy-id node2
[root@node1 ~]# ssh-copy-id node3

# Create a hadoop user and configure SSH password-free login for it as well.
# node1, node2, and node3 must all execute the following commands. Only node1 is shown below.
[root@node1 ~]# useradd hadoop
[root@node1 ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@node1 ~]# su - hadoop   # switch to the hadoop user

# Execute the following as the hadoop user on node1, node2, and node3. Only node1 is shown below.
# Prerequisite: the hadoop user has been created on node1, node2, and node3.
[hadoop@node1 ~]$ ssh-keygen -t rsa -b 4096   # press Enter through every prompt
[hadoop@node1 ~]$ ssh-copy-id node1
[hadoop@node1 ~]$ ssh-copy-id node2
[hadoop@node1 ~]$ ssh-copy-id node3
```
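To confirm password-free login really works, a loop like the following (run it as both root and hadoop) should print the three hostnames without ever asking for a password:

```
[hadoop@node1 ~]$ for h in node1 node2 node3; do ssh $h hostname; done
node1
node2
node3
```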
1.6 Configuring JDK environment
Upload JDK files
Configure the environment
Remember to switch back to root user
```
[hadoop@node1 ~]$ su - root

# The following is only executed on node1
[root@node1 ~]# ls
anaconda-ks.cfg  jdk-8u381-linux-x64.tar.gz

[root@node1 ~]# mkdir -p /export/server
[root@node1 ~]# tar -zxvf jdk-8u381-linux-x64.tar.gz -C /export/server/

# Create a soft link
[root@node1 ~]# ln -s /export/server/jdk1.8.0_381 /export/server/jdk

[root@node1 ~]# ll /export/server/
total 4
lrwxrwxrwx. 1 root root   27 Oct 22 10:49 jdk -> /export/server/jdk1.8.0_381
drwxr-xr-x. 8 root root 4096 Oct 22 10:49 jdk1.8.0_381

[root@node1 ~]# vi /etc/profile
# Add at the bottom of the file:
export JAVA_HOME=/export/server/jdk
export PATH=$PATH:$JAVA_HOME/bin

# Refresh the /etc/profile file
[root@node1 ~]# source /etc/profile

# Test the java environment
[root@node1 ~]# java -version
java version "1.8.0_381"
Java(TM) SE Runtime Environment (build 1.8.0_381-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode)

[root@node1 ~]# javac -version
javac 1.8.0_381
```
Synchronize files to node2, node3
```
# Before synchronizing, /export/server must exist on node2 and node3; if not, run mkdir -p /export/server there.
[root@node1 ~]# scp -r /export/server/jdk1.8.0_381 node2:/export/server/
[root@node1 ~]# scp -r /export/server/jdk1.8.0_381 node3:/export/server/

# After synchronizing, check that jdk1.8.0_381 exists on node2 and node3. Only node2 is shown below.
[root@node2 ~]# ls /export/server/
jdk1.8.0_381

# Synchronize the /etc/profile file to node2 and node3, from node1.
[root@node1 ~]# scp /etc/profile node2:/etc/profile
profile                                   100% 1819     1.0MB/s   00:00
[root@node1 ~]# scp /etc/profile node3:/etc/profile
profile                                   100% 1819   943.9KB/s   00:00

# After synchronizing, create soft links on node2 and node3. Only node2 is shown below.
[root@node2 ~]# ln -s /export/server/jdk1.8.0_381 /export/server/jdk
[root@node2 ~]# ll /export/server/
total 4
lrwxrwxrwx. 1 root root   27 Oct 22 10:58 jdk -> /export/server/jdk1.8.0_381
drwxr-xr-x. 8 root root 4096 Oct 22 10:53 jdk1.8.0_381

# Finally, refresh /etc/profile and test the java environment. Only node2 is shown below.
[root@node2 ~]# source /etc/profile
[root@node2 ~]# java -version
java version "1.8.0_381"
Java(TM) SE Runtime Environment (build 1.8.0_381-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode)
[root@node2 ~]# javac -version
javac 1.8.0_381
```
1.7 Turn off the firewall and SELinux
```
# node1, node2, and node3 must all execute the following. Only node1 is shown below.
[root@node1 ~]# systemctl stop firewalld
[root@node1 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

[root@node1 ~]# vi /etc/selinux/config
# Change SELINUX=enforcing to SELINUX=disabled

# Make SELinux permissive for the current session too (the config change only takes effect on reboot)
[root@node1 ~]# setenforce 0
```
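An optional check that both changes took effect:

```
[root@node1 ~]# systemctl is-active firewalld
inactive
[root@node1 ~]# getenforce
Permissive
```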
1.8 Create a snapshot
Node1, node2, and node3 all need to create snapshots
2. Hadoop planning
NameNode: master node (the manager)
DataNode: slave nodes (the workers)
SecondaryNameNode: auxiliary to the master node
Node | Services |
---|---|
node1 | NameNode, DataNode, SecondaryNameNode |
node2 | DataNode |
node3 | DataNode |
3. Deploy HDFS cluster
3.1 Upload & decompress Hadoop to node1
```
# The following is only executed on node1
[root@node1 ~]# ls
anaconda-ks.cfg  hadoop-3.3.4.tar.gz  jdk-8u381-linux-x64.tar.gz

[root@node1 ~]# tar -zxvf hadoop-3.3.4.tar.gz -C /export/server/

# Create a soft link
[root@node1 ~]# ln -s /export/server/hadoop-3.3.4 /export/server/hadoop
[root@node1 ~]# ll /export/server/
total 4
lrwxrwxrwx  1 root root   27 Oct 23 06:22 hadoop -> /export/server/hadoop-3.3.4
drwxr-xr-x 10 1024 1024  215 Jul 29  2022 hadoop-3.3.4
lrwxrwxrwx. 1 root root   27 Oct 22 10:49 jdk -> /export/server/jdk1.8.0_381
drwxr-xr-x. 8 root root 4096 Oct 22 10:49 jdk1.8.0_381
```
3.2 Configuration File
File | Purpose |
---|---|
workers | Configures which hosts are slave nodes (DataNodes) |
hadoop-env.sh | Configures Hadoop-related environment variables |
core-site.xml | Hadoop core configuration file |
hdfs-site.xml | HDFS core configuration file |
These files are in /export/server/hadoop/etc/hadoop
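A quick way to confirm the four files are present (the egrep filter is only for readability):

```
[root@node1 ~]# ls /export/server/hadoop/etc/hadoop/ | egrep 'workers|hadoop-env.sh|core-site.xml|hdfs-site.xml'
core-site.xml
hadoop-env.sh
hdfs-site.xml
workers
```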
3.3 Configuring workers
```
# The following is only executed on node1
[root@node1 ~]# cd /export/server/hadoop/etc/hadoop/
[root@node1 hadoop]# vi workers
# Delete the localhost line, then add the hostnames of the three hosts:
node1
node2
node3
```
3.4 Configuring hadoop-env.sh
```
# The following is only executed on node1
[root@node1 hadoop]# vi hadoop-env.sh
# Add the following:
export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
```
3.5 Configure core-site.xml
```
# The following is only executed on node1
[root@node1 hadoop]# vi core-site.xml
# Add the following between <configuration> and </configuration>:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://node1:8020</value>
</property>

<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
```

Here fs.defaultFS makes node1 the NameNode, listening on port 8020, and io.file.buffer.size sets the I/O buffer to 131072 bytes (128 KB).
3.6 Configure hdfs-site.xml
```
# The following is only executed on node1
[root@node1 hadoop]# vi hdfs-site.xml
# Add the following between <configuration> and </configuration>:
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/nn</value>
</property>

<property>
  <name>dfs.namenode.hosts</name>
  <value>node1,node2,node3</value>
</property>

<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/dn</value>
</property>
```

/data/nn holds the NameNode metadata and /data/dn the DataNode block storage (both directories are created in the next step); dfs.blocksize of 268435456 bytes is a 256 MB block size.
4. Post-configuration operations
```
# On node1:
[root@node1 hadoop]# mkdir -p /data/nn
[root@node1 hadoop]# mkdir /data/dn

# On node2 and node3:
[root@node2 ~]# mkdir -p /data/dn

[root@node3 ~]# mkdir -p /data/dn

# On node1:
[root@node1 hadoop]# cd /export/server/
[root@node1 server]# scp -r hadoop-3.3.4 node2:`pwd`/
[root@node1 server]# scp -r hadoop-3.3.4 node3:`pwd`/

# Check whether node2 and node3 have hadoop-3.3.4. Only node2 is shown below.
[root@node2 ~]# ll /export/server/
total 4
drwxr-xr-x 10 root root  215 Oct 23 06:50 hadoop-3.3.4
lrwxrwxrwx. 1 root root   27 Oct 22 10:58 jdk -> /export/server/jdk1.8.0_381
drwxr-xr-x. 8 root root 4096 Oct 22 10:53 jdk1.8.0_381

# Create hadoop soft links on node2 and node3. Only node2 is shown below.
[root@node2 ~]# ln -s /export/server/hadoop-3.3.4 /export/server/hadoop
[root@node2 ~]# ll /export/server/
total 4
lrwxrwxrwx  1 root root   27 Oct 23 07:21 hadoop -> /export/server/hadoop-3.3.4
drwxr-xr-x 10 root root  215 Oct 23 06:50 hadoop-3.3.4
lrwxrwxrwx. 1 root root   27 Oct 22 10:58 jdk -> /export/server/jdk1.8.0_381
drwxr-xr-x. 8 root root 4096 Oct 22 10:53 jdk1.8.0_381

# node1, node2, and node3 must all execute the following. Only node1 is shown below.
[root@node1 ~]# vi /etc/profile
# Add at the bottom of the file:
export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@node1 ~]# source /etc/profile

# Check whether hadoop is available
[root@node1 ~]# hadoop version
Hadoop 3.3.4
Source code repository https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb
Compiled by stevel on 2022-07-29T12:32Z
Compiled with protoc 3.7.1
From source with checksum fb9dd8918a7b8a5b430d61af858f6ec
This command was run using /export/server/hadoop-3.3.4/share/hadoop/common/hadoop-common-3.3.4.jar

# Execute on node1, node2, and node3 as root, so the hadoop user owns the data and install directories
[root@node1 ~]# chown -R hadoop:hadoop /data
[root@node1 ~]# chown -R hadoop:hadoop /export
[root@node1 ~]# ll /
drwxr-xr-x   4 hadoop hadoop   26 Oct 23 06:47 data
drwxr-xr-x.  3 hadoop hadoop   20 Oct 22 10:48 export
```
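With HADOOP_HOME on the PATH, an optional sanity check is to ask HDFS which NameNode address it picked up from core-site.xml:

```
[root@node1 ~]# hdfs getconf -confKey fs.defaultFS
hdfs://node1:8020
```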
5. Format the NameNode
```
# Switch to the hadoop user
[root@node1 ~]# su - hadoop
Last login: Sun Oct 22 10:38:44 EDT 2023 on pts/0
[hadoop@node1 ~]$ hadoop namenode -format

# Start the HDFS cluster
[hadoop@node1 ~]$ start-dfs.sh

# node1
[hadoop@node1 ~]$ jps
9089 NameNode
9622 Jps
9193 DataNode
9356 SecondaryNameNode

# node2 and node3
[root@node2 ~]# jps
9092 Jps
9031 DataNode

[root@node3 ~]# jps
8933 Jps
8871 DataNode

# To shut down the HDFS cluster, execute the following command
[hadoop@node1 ~]$ stop-dfs.sh
```
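Beyond jps, you can ask the NameNode directly how many DataNodes have registered; with all three nodes healthy the report should show three live DataNodes (output abridged here):

```
[hadoop@node1 ~]$ hdfs dfsadmin -report
...
Live datanodes (3):
...
```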
After HDFS starts, open a browser and enter http://node1:9870 or http://192.168.59.101:9870 in the address bar.
Note: using http://node1:9870 requires the hosts file on your computer to be modified first (see 1.3 for details); otherwise the page cannot be opened.
After successfully opening the website, it will look like this
If the page opens, everything so far is basically in order
6. Create HDFS cluster snapshot
7. Deploy YARN cluster
7.1 Review & Understanding
For Hadoop HDFS, the distributed file system, we started:

- the NameNode process as the management node
- the DataNode processes as workers
- the SecondaryNameNode as an auxiliary
In the same way, Hadoop YARN, the distributed resource scheduler, will start:

- the ResourceManager process as the management node
- the NodeManager processes as worker nodes
- ProxyServer and JobHistoryServer as two auxiliary roles
MapReduce runs inside YARN containers, so it does not need a separate process of its own
Process | Role |
---|---|
ResourceManager | Cluster-level resource manager |
NodeManager | Per-node resource manager |
ProxyServer | Web proxy server, provides security |
JobHistoryServer | Records job history information and logs |
Node | Services |
---|---|
node1 | ResourceManager, NodeManager, ProxyServer, JobHistoryServer |
node2 | NodeManager |
node3 | NodeManager |
7.2 Configuring mapred-env.sh
```
# The following is only executed on node1
# Remember to switch back to the root user before doing this
[hadoop@node1 hadoop]$ su - root
[root@node1 ~]# cd /export/server/hadoop/etc/hadoop/
[root@node1 hadoop]# vi mapred-env.sh
# Add the following:
export JAVA_HOME=/export/server/jdk
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
```
7.3 Configure mapred-site.xml
```
# The following is only executed on node1
[root@node1 hadoop]# vi mapred-site.xml
# Add the following between <configuration> and </configuration>:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>node1:10020</value>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>node1:19888</value>
</property>

<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/data/mr-history/tmp</value>
</property>

<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/data/mr-history/done</value>
</property>

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>

<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>

<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
```
7.4 Configure yarn-env.sh
```
# The following is only executed on node1
[root@node1 hadoop]# vi yarn-env.sh
# Add the following:
export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
```
7.5 Configure yarn-site.xml
```
# The following is only executed on node1
[root@node1 hadoop]# vi yarn-site.xml
# Add the following between <configuration> and </configuration>:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>node1</value>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/nm-local</value>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/nm-log</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<property>
  <name>yarn.log.server.url</name>
  <value>http://node1:19888/jobhistory/logs</value>
</property>

<property>
  <name>yarn.web-proxy.address</name>
  <value>node1:8089</value>
</property>

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

This makes node1 the ResourceManager, enables log aggregation with logs served through the JobHistoryServer web UI (port 19888), puts the web proxy on port 8089, and selects the FairScheduler.
8. Synchronize files to node2 and node3 nodes
```
# Enter the /export/server/hadoop/etc/hadoop/ directory
[root@node1 hadoop]# cd /export/server/hadoop/etc/hadoop/
# Synchronize all files in this directory to node2 and node3
[root@node1 hadoop]# scp * node2:`pwd`/
[root@node1 hadoop]# scp * node3:`pwd`/
```
9. Start the YARN cluster
```
# Remember to switch to the hadoop user before starting
[root@node1 hadoop]# su - hadoop
[hadoop@node1 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

# To shut down the YARN cluster: stop-yarn.sh

# node1
[hadoop@node1 ~]$ jps
9089 NameNode
10389 WebAppProxyServer
9193 DataNode
10233 NodeManager
9356 SecondaryNameNode
10126 ResourceManager
10671 Jps

# node2
[root@node2 ~]# jps
9361 Jps
9268 NodeManager
9031 DataNode

# node3
[root@node3 ~]# jps
9108 NodeManager
8871 DataNode
9199 Jps

# Because HDFS was not stopped, the HDFS processes still appear on node1, node2, and node3.

# The history server needs to be started separately
[hadoop@node1 ~]$ mapred --daemon start historyserver
# To shut down the history server: mapred --daemon stop historyserver
[hadoop@node1 ~]$ jps
9089 NameNode
10755 Jps
10389 WebAppProxyServer
10728 JobHistoryServer
9193 DataNode
10233 NodeManager
9356 SecondaryNameNode
10126 ResourceManager
```
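As with HDFS, you can also confirm registration from the command line: yarn node -list asks the ResourceManager how many NodeManagers have joined, and with all three nodes healthy it should report three (output abridged here):

```
[hadoop@node1 ~]$ yarn node -list
...
Total Nodes:3
...
```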
After YARN starts correctly, enter http://node1:8088 or http://192.168.59.101:8088 in the browser's address bar
At this point, the YARN cluster deployment is basically complete.
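As an optional end-to-end test that goes beyond the steps above, you can submit the example job that ships with Hadoop 3.3.4; the pi estimator exercises HDFS, YARN, and the JobHistoryServer in one run:

```
# Run as the hadoop user on node1: 3 map tasks, 100 samples per map
[hadoop@node1 ~]$ hadoop jar /export/server/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 3 100
```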