Build a Hadoop fully distributed cluster

Table of Contents

Preparations required for the build:

Building process:

1. Install the virtual machine

2. Configure the network

3. Modify the host name

4. Bind the host name and IP

5. Configure password-free login

6. Use remote connection tools to upload jdk and hadoop

7. Install jdk and Hadoop

1. Unzip jdk and hadoop

2. Configure jdk and hadoop environment variables

3. Add jdk environment

4. core-site.xml

5. hdfs-site.xml

6. yarn-site.xml

7. mapred-site.xml

8. workers

8. Copy to slave node

9. Format the NameNode file system

10. Start the Hadoop cluster

11. Test whether you can connect to the Hadoop platform


Preparations required for the build:

  • Virtual machine: VMware Workstation Pro 17
  • ISO image file: CentOS-6.5-x86_64-bin-DVD1.iso
  • JDK version: jdk-8u171-linux-x64.tar.gz
  • Hadoop version: hadoop-3.3.0.tar.gz
  • Remote connection tool: MobaXterm

Building process:

1. Install virtual machine

One virtual machine, HadoopMaster, is required as the master node (the host name can be chosen freely; it does not have to match the names used here).

Two or more virtual machines are required as slave nodes (here they are named HadoopSlave1 and HadoopSlave2).

2. Configure the network

  • Open a command prompt on the host and run ipconfig to check the host's IP address and gateway.
  • Bridged networking (automatic) is used here. If you switch WiFi networks, make sure the network segment stays the same; otherwise the virtual machines will not be able to ping the gateway.
  • In this example the network segment is 43, the WiFi IP address is 192.168.43.146, and the gateway is 192.168.43.1.
  • When configuring a virtual machine's network card, make sure its IP address does not conflict with the WiFi address above, but the gateway must match.

Open the virtual machine and edit the file ifcfg-eth0 as the root user to configure the network; an ordinary user does not have sufficient permissions.

Switch to the root user: su -

Edit ifcfg-eth0 to configure the network: vim /etc/sysconfig/network-scripts/ifcfg-eth0

  • Keep DEVICE=eth0, ONBOOT, and BOOTPROTO
  • Comment out the rest with #
  • Enable start at boot: change ONBOOT=no to ONBOOT=yes
  • Set a static IP: change BOOTPROTO=dhcp to BOOTPROTO=static

Note: Except for the different IP addresses, the other configurations of HadoopMaster, HadoopSlave1, and HadoopSlave2 are the same.
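
For reference, a minimal sketch of what ifcfg-eth0 on HadoopMaster might look like after editing; the IP and gateway are the example values from this guide, while NETMASK and DNS1 are assumed values that you should adapt to your own network:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.43.110
NETMASK=255.255.255.0
GATEWAY=192.168.43.1
DNS1=192.168.43.1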

After ifcfg-eth0 configuration is completed:

  • Disable the firewall at startup: chkconfig iptables off (you can also stop it immediately with service iptables stop)
  • Restart the network service: service network restart

Note: All three virtual machines need to execute these commands.

Ensure that the IP and gateway of each virtual machine can be pinged.
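
For example, from any of the virtual machines, using the addresses mentioned above:

ping -c 3 192.168.43.1      # the gateway
ping -c 3 192.168.43.146    # the host's WiFi address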

3. Modify the host name

The host name needs to be modified in the configuration file.

Use command: vim /etc/sysconfig/network

  • Change HOSTNAME to your own hostname

Note: All three virtual machines must perform this step.
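
On HadoopMaster, for example, the file would end up looking roughly like this (NETWORKING=yes is the usual CentOS 6 default and is left unchanged):

NETWORKING=yes
HOSTNAME=HadoopMaster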

4. Bind the host name and IP

Use command: vim /etc/hosts

Map each machine's IP address to its host name:

  • 192.168.43.110 HadoopMaster
  • 192.168.43.111 HadoopSlave1
  • 192.168.43.112 HadoopSlave2

Note: These entries must be added on each virtual machine.

Make sure you can ping the host names:
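
For example, from HadoopMaster, using the names configured above:

ping -c 3 HadoopSlave1
ping -c 3 HadoopSlave2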

5. Configure password-free login

  • Generate key pair:
ssh-keygen -t rsa
  • Press Enter at every prompt to accept the defaults
  • Copy the public key into the key file:
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Note: Need to be executed in each virtual machine

  • Remotely copy the key file to the slave nodes:
scp ~/.ssh/authorized_keys zxa@HadoopSlave1:~/.ssh/
scp ~/.ssh/authorized_keys zxa@HadoopSlave2:~/.ssh/

Verify by logging in from the master node to the slave nodes with ssh; no password should be required:
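
For example, from HadoopMaster (the zxa user matches the scp commands above); each login should succeed without prompting for a password:

ssh zxa@HadoopSlave1
exit
ssh zxa@HadoopSlave2
exit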

6. Use the remote connection tool to upload jdk and hadoop

You can use Xshell, MobaXterm, or WinSCP to upload files. MobaXterm is used here.

Create an ordinary user zxa, create a folder software in the home directory, and then upload the compressed packages of Hadoop and jdk to the folder software.

  1. useradd zxa
  2. su - zxa
  3. mkdir software
  4. Upload jdk and hadoop compressed packages to software

Note: Creating the ordinary user zxa and directory software needs to be done for each virtual machine.

You can connect to each virtual machine in turn and upload the files to software, or upload them to the master node HadoopMaster and then copy them remotely to the slave nodes HadoopSlave1 and HadoopSlave2:

scp /home/zxa/software/hadoop-3.3.0.tar.gz zxa@HadoopSlave1:/home/zxa/software/
scp /home/zxa/software/hadoop-3.3.0.tar.gz zxa@HadoopSlave2:/home/zxa/software/
scp /home/zxa/software/jdk-8u171-linux-x64.tar.gz zxa@HadoopSlave1:/home/zxa/software/
scp /home/zxa/software/jdk-8u171-linux-x64.tar.gz zxa@HadoopSlave2:/home/zxa/software/

7. Install jdk and Hadoop

1. Decompress jdk and hadoop
  1. Decompress the jdk compressed package: tar -zxvf jdk-8u171-linux-x64.tar.gz
  2. Decompress the hadoop compressed package: tar -zxvf hadoop-3.3.0.tar.gz
  3. Create the directory hadooptmp under software: mkdir hadooptmp

Note: Each virtual machine needs to execute these steps.

2. Configure jdk and hadoop environment variables
  1. vim /home/zxa/.bash_profile
#jdk
export JAVA_HOME=/home/zxa/software/jdk1.8.0_171
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/zxa/software/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

  2. source /home/zxa/.bash_profile
  3. java -version (check whether the jdk configuration succeeded)

Note: source must be run, otherwise the configuration will not take effect. Each virtual machine needs this configuration.
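
A quick way to confirm that both sets of variables took effect; each command should print its version information:

java -version
hadoop version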

3. Add jdk environment

Switch to the Hadoop configuration directory; the full path is: /home/zxa/software/hadoop-3.3.0/etc/hadoop/

Add the following line to both hadoop-env.sh and yarn-env.sh:

export JAVA_HOME=/home/zxa/software/jdk1.8.0_171
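
As a shortcut, the line can also be appended from the shell instead of editing each file by hand (a small sketch that assumes the current directory is the configuration directory above):

echo 'export JAVA_HOME=/home/zxa/software/jdk1.8.0_171' >> hadoop-env.sh
echo 'export JAVA_HOME=/home/zxa/software/jdk1.8.0_171' >> yarn-env.sh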

4. core-site.xml

content:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://HadoopMaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/zxa/software/hadooptmp</value>
  </property>
</configuration>

Note: This only needs to be done on HadoopMaster.

5. hdfs-site.xml

content:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Note: This only needs to be done on the master node HadoopMaster.

6. yarn-site.xml

content:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>HadoopMaster:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>HadoopMaster:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>HadoopMaster:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>HadoopMaster:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>HadoopMaster:8088</value>
  </property>
</configuration>

Note: This only needs to be done on HadoopMaster.

7. mapred-site.xml

content:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Note: This only needs to be done on the master node HadoopMaster.

8. workers

vim /home/zxa/software/hadoop-3.3.0/etc/hadoop/workers

Replace the contents of the workers file with:

HadoopSlave1

HadoopSlave2
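
Equivalently, the file can be written in a single command (a small sketch using the same path as above):

printf 'HadoopSlave1\nHadoopSlave2\n' > /home/zxa/software/hadoop-3.3.0/etc/hadoop/workers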

8. Copy to slave node

Use the following commands to copy the configured Hadoop directory to the slave nodes:

scp -r hadoop-3.3.0 zxa@HadoopSlave1:~/software/
scp -r hadoop-3.3.0 zxa@HadoopSlave2:~/software/

Note: This only needs to be executed on the master node HadoopMaster.

9. Format the NameNode file system

The formatting command is as follows. This operation only needs to be executed on the HadoopMaster node:

hdfs namenode -format
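
If formatting succeeds, the NameNode metadata is created under hadoop.tmp.dir; a quick check might look like this (the dfs/name/current layout is the usual default, so treat the exact path as an assumption):

ls /home/zxa/software/hadooptmp/dfs/name/current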

10. Start the Hadoop cluster

Start command: start-all.sh

View started processes: jps

Note: The startup command only needs to be entered on the master node HadoopMaster. jps should show four processes on the master node (typically NameNode, SecondaryNameNode, ResourceManager, and Jps itself) and three on each slave node (typically DataNode, NodeManager, and Jps).

11. Test whether you can connect to the Hadoop platform

In a browser on the host, enter the IP address of the master node HadoopMaster, followed by a colon and the port number:

192.168.43.110:9870

192.168.43.110:8088

Port 9870 is the HDFS NameNode web UI, and port 8088 is the YARN ResourceManager web UI.
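
If a browser is not handy, the same ports can be probed from the command line (assuming curl is installed on the host or on one of the nodes):

curl -I http://192.168.43.110:9870
curl -I http://192.168.43.110:8088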

At this point, the Hadoop fully distributed cluster has been successfully built!