Build a Hadoop fully distributed cluster

Table of Contents

Preparations required for the build:

Building process:

1. Install the virtual machine

2. Configure the network

3. Modify the host name

4. Bind the host name and IP

5. Configure password-free login

6. Use remote connection tools to upload jdk and hadoop

7. Install jdk and Hadoop

1. Unzip jdk and hadoop

2. Configure jdk and hadoop environment variables

3. Add jdk environment

4. core-site.xml

5. hdfs-site.xml

6. yarn-site.xml

7. mapred-site.xml

8. workers

8. Copy to slave node

9. Format the NameNode file system

10. Start the Hadoop cluster

11. Test whether you can connect to the Hadoop platform


Preparations required for the build:

  • Virtual machine: VMware Workstation Pro 17
  • ISO image file: CentOS-6.5-x86_64-bin-DVD1.iso
  • JDK version: jdk-8u171-linux-x64.tar.gz
  • Hadoop version: hadoop-3.3.0.tar.gz
  • Remote connection tool: MobaXterm

Building process:

1. Install virtual machine

One virtual machine, HadoopMaster, is required as the master node (the host name can be chosen freely; it does not have to match the names used here).

Two or more virtual machines are required as slave nodes (here they are named HadoopSlave1 and HadoopSlave2).

2. Configure the network

  • Open a command prompt on the host and run ipconfig to check the host's IP address and gateway.
  • Bridged networking (automatic) is used here. If you switch WiFi networks, make sure the network segment stays the same; otherwise the virtual machines will not be able to ping the gateway.
  • In this example the network segment is 43, the WiFi IP address is 192.168.43.146, and the gateway is 192.168.43.1.
  • When configuring a virtual machine's network card, make sure its IP address does not conflict with the WiFi address above, but the gateway must match.

Open the virtual machine and edit the file ifcfg-eth0 as the root user to configure the network; an ordinary user does not have sufficient permissions.

Switch to the root user: su -

Edit ifcfg-eth0 to configure the network: vim /etc/sysconfig/network-scripts/ifcfg-eth0

  • Keep DEVICE=eth0, ONBOOT, and BOOTPROTO
  • Comment out the rest with #
  • Enable start at boot: change ONBOOT=no to ONBOOT=yes
  • Set a static IP: change BOOTPROTO=dhcp to BOOTPROTO=static

Note: Except for the different IP addresses, the other configurations of HadoopMaster, HadoopSlave1, and HadoopSlave2 are the same.
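
For reference, a minimal sketch of what ifcfg-eth0 on HadoopMaster might look like after editing; the IP and gateway are the example values from this guide, while NETMASK and DNS1 are assumed values that you should adapt to your own network:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.43.110
NETMASK=255.255.255.0
GATEWAY=192.168.43.1
DNS1=192.168.43.1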

After ifcfg-eth0 configuration is completed:

  • Disable the firewall at startup: chkconfig iptables off (you can also stop it immediately with service iptables stop)
  • Restart the network service: service network restart

Note: All three virtual machines need to execute these commands.

Ensure that the IP and gateway of each virtual machine can be pinged.
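
For example, from any of the virtual machines, using the addresses mentioned above:

ping -c 3 192.168.43.1      # the gateway
ping -c 3 192.168.43.146    # the host's WiFi address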

3. Modify the host name

The host name needs to be modified in the configuration file.

Use command: vim /etc/sysconfig/network

  • Change HOSTNAME to your own hostname

Note: All three virtual machines must perform this step.
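
On HadoopMaster, for example, the file would end up looking roughly like this (NETWORKING=yes is the usual CentOS 6 default and is left unchanged):

NETWORKING=yes
HOSTNAME=HadoopMaster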

4. Bind the host name and IP

Use command: vim /etc/hosts

Map each machine's IP address to its host name:

  • 192.168.43.110 HadoopMaster
  • 192.168.43.111 HadoopSlave1
  • 192.168.43.112 HadoopSlave2

Note: These entries must be added on each virtual machine.

Make sure you can ping the host names:
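
For example, from HadoopMaster, using the names configured above:

ping -c 3 HadoopSlave1
ping -c 3 HadoopSlave2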

5. Configure password-free login

  • Generate key pair:
ssh-keygen -t rsa
  • Press Enter at every prompt to accept the defaults
  • Copy the public key into the key file:
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Note: Need to be executed in each virtual machine

  • Remotely copy the key file to the slave nodes:
scp ~/.ssh/authorized_keys zxa@HadoopSlave1:~/.ssh/
scp ~/.ssh/authorized_keys zxa@HadoopSlave2:~/.ssh/

Verify by logging in from the master node to the slave nodes with ssh; no password should be required:
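
For example, from HadoopMaster (the zxa user matches the scp commands above); each login should succeed without prompting for a password:

ssh zxa@HadoopSlave1
exit
ssh zxa@HadoopSlave2
exit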

6. Use the remote connection tool to upload jdk and hadoop

You can use Xshell, MobaXterm, or WinSCP to upload files. MobaXterm is used here.

Create an ordinary user zxa, create a folder software in the home directory, and then upload the compressed packages of Hadoop and jdk to the folder software.

  1. useradd zxa
  2. su - zxa
  3. mkdir software
  4. Upload jdk and hadoop compressed packages to software

Note: Creating the ordinary user zxa and directory software needs to be done for each virtual machine.

You can connect to each virtual machine in turn and upload the files to software, or upload them to the master node HadoopMaster and then copy them remotely to the slave nodes HadoopSlave1 and HadoopSlave2:

scp /home/zxa/software/hadoop-3.3.0.tar.gz zxa@HadoopSlave1:/home/zxa/software/
scp /home/zxa/software/hadoop-3.3.0.tar.gz zxa@HadoopSlave2:/home/zxa/software/
scp /home/zxa/software/jdk-8u171-linux-x64.tar.gz zxa@HadoopSlave1:/home/zxa/software/
scp /home/zxa/software/jdk-8u171-linux-x64.tar.gz zxa@HadoopSlave2:/home/zxa/software/

7. Install jdk and Hadoop

1. Decompress jdk and hadoop
  1. Decompress the jdk compressed package: tar -zxvf jdk-8u171-linux-x64.tar.gz
  2. Decompress the hadoop compressed package: tar -zxvf hadoop-3.3.0.tar.gz
  3. Create the directory hadooptmp under software: mkdir hadooptmp

Note: Each virtual machine needs to execute these steps.

2. Configure jdk and hadoop environment variables
  1. vim /home/zxa/.bash_profile
#jdk
export JAVA_HOME=/home/zxa/software/jdk1.8.0_171
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/zxa/software/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

  2. source /home/zxa/.bash_profile
  3. java -version (check whether the jdk configuration succeeded)

Note: source must be run, otherwise the configuration will not take effect. Each virtual machine needs this configuration.
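
A quick way to confirm that both sets of variables took effect; each command should print its version information:

java -version
hadoop version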

3. Add jdk environment

Switch to the Hadoop configuration directory; the full path is: /home/zxa/software/hadoop-3.3.0/etc/hadoop/

Add the following line to both hadoop-env.sh and yarn-env.sh:

export JAVA_HOME=/home/zxa/software/jdk1.8.0_171
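
As a shortcut, the line can also be appended from the shell instead of editing each file by hand (a small sketch that assumes the current directory is the configuration directory above):

echo 'export JAVA_HOME=/home/zxa/software/jdk1.8.0_171' >> hadoop-env.sh
echo 'export JAVA_HOME=/home/zxa/software/jdk1.8.0_171' >> yarn-env.sh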

4. core-site.xml

content:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://HadoopMaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/zxa/software/hadooptmp</value>
  </property>
</configuration>

Note: This only needs to be done on HadoopMaster.

5. hdfs-site.xml

content:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Note: This only needs to be done on the master node HadoopMaster.

6. yarn-site.xml

content:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>HadoopMaster:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>HadoopMaster:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>HadoopMaster:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>HadoopMaster:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>HadoopMaster:8088</value>
  </property>
</configuration>

Note: This only needs to be done on HadoopMaster.

7. mapred-site.xml

content:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Note: This only needs to be done on the master node HadoopMaster.

8. workers

vim /home/zxa/software/hadoop-3.3.0/etc/hadoop/workers

Replace the contents of the workers file with:

HadoopSlave1

HadoopSlave2
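
Equivalently, the file can be written in a single command (a small sketch using the same path as above):

printf 'HadoopSlave1\nHadoopSlave2\n' > /home/zxa/software/hadoop-3.3.0/etc/hadoop/workers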

8. Copy to slave node

Use the following commands to copy the configured Hadoop directory to the slave nodes:

scp -r hadoop-3.3.0 zxa@HadoopSlave1:~/software/
scp -r hadoop-3.3.0 zxa@HadoopSlave2:~/software/

Note: This only needs to be executed on the master node HadoopMaster.

9. Format the NameNode file system

The formatting command is as follows. This operation only needs to be executed on the HadoopMaster node:

hdfs namenode -format
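
If formatting succeeds, the NameNode metadata is created under hadoop.tmp.dir; a quick check might look like this (the dfs/name/current layout is the usual default, so treat the exact path as an assumption):

ls /home/zxa/software/hadooptmp/dfs/name/current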

10. Start the Hadoop cluster

Start command: start-all.sh

View started processes: jps

Note: The startup command only needs to be entered on the master node HadoopMaster. jps should show four processes on the master node (typically NameNode, SecondaryNameNode, ResourceManager, and Jps itself) and three on each slave node (typically DataNode, NodeManager, and Jps).

11. Test whether you can connect to the Hadoop platform

In a browser on the host, enter the IP address of the master node HadoopMaster, followed by a colon and the port number:

192.168.43.110:9870

192.168.43.110:8088

Port 9870 is the HDFS NameNode web UI, and port 8088 is the YARN ResourceManager web UI.
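
If a browser is not handy, the same ports can be probed from the command line (assuming curl is installed on the host or on one of the nodes):

curl -I http://192.168.43.110:9870
curl -I http://192.168.43.110:8088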

At this point, the Hadoop fully distributed cluster has been successfully built!