Hadoop fully distributed cluster setup

Table of Contents

Environment preparation

Install hadoop

Usage example


Note

Unless otherwise specified, all operations in this article are performed as the root user on all nodes.

The files needed in this article are placed in the /usr/local/software directory (you need to create this directory yourself).

Environment preparation

The three machines used in this article are as follows:

IP address      Role
192.168.88.144  master
192.168.88.145  slave1
192.168.88.146  slave2
  1. Turn off the firewall
    systemctl stop firewalld # Temporary shutdown
    systemctl disable firewalld # Permanently shut down

  2. Shut down selinux
    setenforce 0 # Temporarily shut down
    vim /etc/selinux/config # Change the value of SELINUX to disabled in order to shut it down permanently

  3. Modify yum source
    wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
    yum makecache # Refresh configuration

    Make sure the yum metadata cache has been created successfully

  4. Install jdk
    cd /usr/local/software
    tar -zxvf jdk-8u291-linux-x64.tar.gz -C /usr/local # Place the decompressed folder under /usr/local

    Rename the jdk directory

    cd /usr/local
    mv jdk1.8.0_291 jdk

  5. Modify environment variables
    vim ~/.bashrc
    source ~/.bashrc # Refresh configuration
    
    # Below are the environment variables that need to be added to the .bashrc file
    export JAVA_HOME=/usr/local/jdk
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=$PATH:${JAVA_HOME}/bin

    Check java version

    java -version

  6. Install database
    Here we use the mariadb that comes with CentOS
    yum install -y mariadb-server 

    Initialize database

    systemctl start mariadb # Start mariadb
    systemctl enable mariadb #Set auto-start at boot
    mysql_secure_installation # Initialize data, follow the prompts after using this command

    Check if the installation is successful

    mysql -u root -p # Log in as root user
    show databases; # View all databases, there should be three

    Modify the configuration to ensure that the outside world can connect to the database

    use mysql;
    update user set host='%' where user='root' and host='localhost';
    flush privileges; # Refresh configuration
    select host,user from user where user='root'; # Confirm that the root user now has a row with host '%' (see the sketch below)
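    If the update succeeded, the query output should include a row with host '%' for root, roughly like this (other root rows such as 127.0.0.1 or ::1 may also appear):

    +------+------+
    | host | user |
    +------+------+
    | %    | root |
    +------+------+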
    

  7. Set static address
    Modify the ifcfg-ens33 file and change the dynamic ip to a static ip
    vim /etc/sysconfig/network-scripts/ifcfg-ens33
    
    # Change the value of BOOTPROTO to static, and add four fields: IPADDR (IP address), NETMASK (subnet mask), GATEWAY (gateway) and DNS1 (DNS server). A sketch of the resulting file follows this step.

    Please note that the gateway, IP address and subnet mask must be filled in according to your actual network.

    systemctl stop network # Restart the network to apply the new interface configuration
    systemctl start network
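    A minimal sketch of /etc/sysconfig/network-scripts/ifcfg-ens33 for the master node, assuming a /24 network; the GATEWAY and DNS1 values below are placeholders (the original configuration may differ), so fill in your own:

    TYPE=Ethernet
    NAME=ens33
    DEVICE=ens33
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.88.144
    NETMASK=255.255.255.0
    GATEWAY=192.168.88.2
    DNS1=192.168.88.2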
  8. Modify the host name
    hostnamectl set-hostname master # master is the modified host name, and the two sub-nodes are modified to slave1 and slave2 respectively.
    reboot # The terminal prompt may not update immediately after the change; the new hostname is shown after a reboot
  9. Modify the hosts file
    vim /etc/hosts # Ensure that three nodes on the same LAN can communicate with each other
    
    # Add the following lines to the hosts file, separating the IP address and host name with whitespace (a tab or spaces both work)
    192.168.88.144 master
    192.168.88.145 slave1
    192.168.88.146 slave2
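    To confirm that name resolution works, you can test connectivity between the nodes, for example:

    ping -c 3 slave1 # Should resolve to 192.168.88.145 and receive replies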

  10. Configure password-free login between nodes
    First, perform the following operations as the ordinary user chenxl on all three nodes (if you want to build hadoop as the root user, perform them as root instead)
    ssh localhost # Connect to the local machine following the prompts, then run exit to log out
    
    ssh-keygen -t dsa -f ~/.ssh/id_dsa # Generate key
    cd ~/.ssh
    cp id_dsa.pub authorized_keys
    sudo chmod 600 ./authorized_keys # Modify permissions
    
    ssh localhost # Verify that you can log in without a password on this machine

    The first ssh localhost connection asks you to confirm the host and enter the login password; after the key pair is generated and added to authorized_keys, the second ssh localhost should succeed without a password.

    Then copy the master's public key to the two child nodes (still operating as the ordinary user chenxl)

    scp ~/.ssh/id_dsa.pub chenxl@slave1:~/.ssh/master.pub
    scp ~/.ssh/id_dsa.pub chenxl@slave2:~/.ssh/master.pub

    Append the contents of the master.pub file to the authorized_keys file on slave1 and slave2 (as the ordinary user chenxl)

    cd ~/.ssh
    cat master.pub >> authorized_keys

    Then verify on the master that you can log in to the two child nodes without a password.

    ssh slave1 # Log in to slave1 node
    ssh slave2 # Log in to slave2 node

    If no password is requested, the master can now log in to both child nodes without a password.

  11. Synchronize the time across nodes
    ntpdate ntp1.aliyun.com # If there is no ntpdate command, you can use yum install to install it.
    hwclock # Check whether the time on the three nodes is consistent

Install hadoop

  1. Unzip
    Use the root user on the master node to decompress hadoop
    tar -zxvf hadoop-2.7.7.tar.gz -C /usr/local # Decompress
    cd /usr/local
    mv hadoop-2.7.7 hadoop # Rename
  2. Modify permissions
    Give ownership of the hadoop directory to the ordinary user chenxl (if you are building hadoop as the root user, you can skip this step)
    cd /usr/local
    sudo chown -R chenxl:chenxl ./hadoop

  3. Modify environment variables
    Use the ordinary user chenxl on the master node to add the hadoop environment variables to ~/.bashrc and refresh it with source ~/.bashrc (typical entries are sketched below)

    Check if hadoop is installed successfully

    hadoop version # Check the hadoop version
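    The hadoop entries to append to ~/.bashrc; a minimal sketch, assuming hadoop is unpacked to /usr/local/hadoop as in the previous steps (the exact entries in the original configuration may differ):

    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin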

  4. Modify child node environment variables
    As the ordinary user chenxl on slave1 and slave2, add the same hadoop environment variables as on the master (see the sketch in the previous step)
    vim ~/.bashrc
    source ~/.bashrc # Refresh environment variables

  5. Modify hadoop configuration file
    Five files need to be modified for fully distributed mode. They are located in the etc/hadoop directory under the hadoop installation directory: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and slaves. The mapred-site.xml file may not exist yet; if so, simply rename mapred-site.xml.template to mapred-site.xml. Perform these edits as the ordinary user chenxl on the master node. A minimal sketch of each file follows (the exact values in the original configuration may differ):
    core-site.xml file:
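    A minimal sketch, assuming master is the NameNode host and /usr/local/hadoop/tmp is used as the temporary/data directory:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
      </property>
    </configuration>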

    hdfs-site.xml file:
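    A minimal sketch, assuming two datanodes (so a replication factor of 2) and the SecondaryNameNode running on master, which matches the jps results described later:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
      </property>
    </configuration>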

    mapred-site.xml file:
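    A minimal sketch that runs MapReduce jobs on YARN:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>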

    yarn-site.xml file:
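    A minimal sketch, assuming the ResourceManager runs on master:

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>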

    slaves file:
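    List one datanode host per line; for this cluster that is:

    slave1
    slave2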

  6. Copy the hadoop folder
    Copy all the contents of the hadoop folder in the master node to the two child nodes slave1 and slave2
    scp -r /usr/local/hadoop root@slave1:/usr/local
    scp -r /usr/local/hadoop root@slave2:/usr/local

    As the root user on slave1 and slave2, give ownership of the /usr/local/hadoop folder to the ordinary user chenxl

    sudo chown -R chenxl:chenxl /usr/local/hadoop

  7. Format namenode

    Use the ordinary user chenxl on the master node to perform operations:

    hdfs namenode -format # Format namenode

  8. Start the cluster
    start-all.sh # Start the cluster

  9. Verify
    jps

    View the processes on the three nodes; the results should be as follows

    There should be ResourceManager, NameNode and SecondaryNameNode processes on the master node

    There should be DataNode and NodeManager processes on the slave1 and slave2 nodes.
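    A rough sketch of what jps typically prints (the process IDs will differ):

    # On master
    2321 NameNode
    2520 SecondaryNameNode
    2673 ResourceManager
    2941 Jps

    # On slave1 and slave2
    1876 DataNode
    1988 NodeManager
    2154 Jps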

    Then open the NameNode web UI at the master node's IP on port 50070, for example 192.168.88.144:50070

    The overview page should show two live datanodes, which means HDFS has started normally.
    Then open the master node's IP on port 8088, for example 192.168.88.144:8088

    If this page loads normally, YARN is running correctly as well.

Usage Example

  1. Try to create/delete a folder on hdfs to see if it can be used normally
    hdfs dfs -mkdir /input # Create a folder

    hdfs dfs -rm -r /input # Delete the folder /input

    Upload a local file to hdfs, as shown below
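    For example, assuming the file to upload is a local test.txt (the word count example below expects it in /input):

    hdfs dfs -put ./test.txt /input # Upload the local test.txt into the HDFS /input directory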

  2. Count word occurrences
    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input/test.txt /output # Count the number of occurrences of each word in the previously uploaded test.txt file
    hdfs dfs -cat /output/part-r-00000 # Query results

    stop-all.sh # Stop the cluster

At this point, the fully distributed Hadoop cluster is complete. From now on, just use start-all.sh to start it.
