Hadoop cluster environment construction

1. Environment preparation

Virtual machine: vm15pro


Linux system: centos7.6

To view the IP address, run the command ifconfig. If the command is not available, install the net-tools package first: yum install net-tools
jdk1.8.0_221:

hadoop-2.9.2:

zookeeper-3.4.14:

hbase-2.0.5:
Install the CentOS 7.6 system on all three cluster hosts (after setting up one virtual machine, you can simply clone the other two).
Assign each host its own IP address and hostname:
master ip (master)
slave1 ip (slave1)
slave2 ip (slave2)

2. Modify the hosts file

Mapping hostnames and ip addresses
vi /etc/hosts
– Add the following content (for example)
192.168.207.129 master
192.168.207.130 slave1
192.168.207.131 slave2
Note: Each host needs to be modified
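As an optional sanity check, ping each hostname from every node to confirm the mapping works:
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2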
screenshot:

3. Install jdk

– First check whether the system has integrated openjdk
java -version
If the jdk version is displayed, uninstall openjdk first
rpm -qa | grep java (list the installed OpenJDK packages)
– Display the following content
java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
– Uninstall
rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
rpm -e --nodeps java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
– install jdk
– Decompress the JDK installation package (first upload the JDK tarball to the virtual machine, for example over sftp or with MobaXterm)
mkdir /usr/java
tar -zxvf jdk-8u221-linux-x64.tar.gz -C /usr/java
– Copy jdk to slave1 and slave2
scp -r /usr/java slave1:/usr
scp -r /usr/java slave2:/usr
– Set the JDK environment variables (all three nodes must be modified)
vi /etc/environment
JAVA_HOME=/usr/java/jdk1.8.0_221
JRE_HOME=/usr/java/jdk1.8.0_221/jre

vi /etc/profile (all three nodes must be modified)
– add the following content
export JAVA_HOME=/usr/java/jdk1.8.0_221
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
The screenshot shows the complete configuration.

source /etc/profile to make the changes take effect
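To confirm the JDK is picked up correctly on each node (it should report version 1.8.0_221 with the paths above):
java -version
echo $JAVA_HOME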

4. Set passwordless access between clusters

On slave1:
– use rsa encryption
ssh-keygen -t rsa and then keep pressing Enter
– copy the public key
cp ~/.ssh/id_rsa.pub ~/.ssh/slave1_id_rsa.pub
– send to master
scp ~/.ssh/slave1_id_rsa.pub master:~/.ssh/
Do the same on slave2
On the master:
– use rsa encryption
ssh-keygen -t rsa
– Copy the master public key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
– Copy slave1 public key to authorized_keys
cat ~/.ssh/slave1_id_rsa.pub >> ~/.ssh/authorized_keys
– Copy slave2 public key to authorized_keys
cat ~/.ssh/slave2_id_rsa.pub >> ~/.ssh/authorized_keys
At this point, the master's authorized_keys contains the public keys of all three machines
– Send authorized_keys to slave1 and slave2
scp ~/.ssh/authorized_keys slave1:~/.ssh
scp ~/.ssh/authorized_keys slave2:~/.ssh
If the transfer is successful, the command line will display the successfully transferred file
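A quick check that passwordless login now works in every direction; if a password prompt still appears, the permissions of ~/.ssh (700) and ~/.ssh/authorized_keys (600) are a common cause:
ssh slave1 hostname
ssh slave2 hostname
ssh master hostname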

5. Turn off firewall and SELINUX

– turn off the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
– close SELINUX
vi /etc/selinux/config
Comment out the following lines:
#SELINUX=enforcing
#SELINUXTYPE=targeted
Then add:
SELINUX=disabled
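To verify, and to turn SELinux off for the current session without rebooting (the config file change above only takes effect after a reboot):
systemctl status firewalld
getenforce
setenforce 0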

6. Hadoop installation and configuration

– Unzip the installation package and create the basic directories (here a bigData folder is created under /usr and Hadoop is unzipped into it; upload the Hadoop tarball to that folder before unzipping)
#mkdir /usr/bigData
#cd /usr/bigData
#tar -zxvf hadoop-2.9.2.tar.gz
#cd /usr/bigData/hadoop-2.9.2
#mkdir hdfs
#mkdir tmp
#cd hdfs
#mkdir data
#mkdir name
The hdfs and tmp directories created here are referenced by the configuration files below.

– modify the configuration file
– Modify the slaves file
vi /usr/bigData/hadoop-2.9.2/etc/hadoop/slaves
Remove localhost and add:
slave1
slave2

Next, modify a series of configuration files; they are all located in /usr/bigData/hadoop-2.9.2/etc/hadoop

Modify the hadoop-env.sh file (write JAVA_HOME as the full path; the daemons may fail to resolve ${JAVA_HOME} at startup)

To avoid the problem that stop-all.sh can no longer stop the cluster after it has been running for a while, also configure the pid directory here.

The cause of the problem: when Hadoop stops, it relies on the pid files of the mapred and dfs daemons on each node. By default these pid files are stored in /tmp, and Linux periodically cleans that directory (typically every week or so, up to a month). Once files such as hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the namenode can no longer find those processes on the datanodes.
To avoid this, point the pid directory at a persistent location; the pids folder must be created beforehand (a sketch follows).
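A minimal sketch of the relevant hadoop-env.sh lines, assuming the pids directory is created at /usr/bigData/hadoop-2.9.2/pids (that exact path is an assumption; use whichever directory you created):
export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_PID_DIR=/usr/bigData/hadoop-2.9.2/pids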
– Modify the core-site.xml file

#vi core-site.xml

Add the following to the configuration node

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/usr/bigData/hadoop-2.9.2/tmp</value>
</property>

–Modify the hdfs-site.xml file

#vi hdfs-site.xml

Add the following to the configuration node

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/bigData/hadoop-2.9.2/hdfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/bigData/hadoop-2.9.2/hdfs/data</value>
  <final>true</final>
</property>

– Modify the mapred-site.xml file. This file does not exist by default; copy mapred-site.xml.template and rename it to mapred-site.xml

#cp mapred-site.xml.template mapred-site.xml

Add the following to the configuration node

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>

– Modify the yarn-site.xml file

#vi yarn-site.xml

– Add the following to the configuration node

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>

– Copy hadoop to slave1 and slave2 nodes
#scp -r /usr/bigData/hadoop-2.9.2 slave1:/usr/bigData

#scp -r /usr/bigData/hadoop-2.9.2 slave2:/usr/bigData

– Configure the Hadoop environment variables (every node needs this)
#vi /etc/profile

Add the Hadoop environment variables as follows (a sketch is given below)
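The original refers to a screenshot for these lines; a typical /etc/profile addition, assuming Hadoop is installed at /usr/bigData/hadoop-2.9.2, looks like this:
export HADOOP_HOME=/usr/bigData/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin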

Execute to make the configuration file take effect
#source /etc/profile
#vi ~/.bashrc

Add the following content

export HADOOP_PREFIX=/usr/bigData/hadoop-2.9.2/

Each node has to be modified
Check if hadoop is successfully installed command
#hadoop version

Command to check whether Hadoop is the 32-bit or 64-bit version (see the sketch below)
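The command itself is missing in the original; one common approach is to inspect Hadoop's native library with file (the path assumes the install location used above), which reports ELF 32-bit or ELF 64-bit:
file /usr/bigData/hadoop-2.9.2/lib/native/libhadoop.so.1.0.0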

– Format the namenode on the master node (only needed the first time)
/usr/bigData/hadoop-2.9.2/bin/hdfs namenode -format

The Hadoop environment was built successfully!
Start the cluster on the master node
#/usr/bigData/hadoop-2.9.2/sbin/start-all.sh
Check whether Hadoop started successfully with the jps command
#jps (run on all three nodes)


Tested through the browser as follows:
http://ip:50070/ The following page appears, which means success

Check the status of YARN and you can see that there are two nodes running
http://ip:8088/

HDFS file operations may require changing permissions.

To change the DFS permissions: hadoop fs -chmod 777 /

7. Zookeeper environment construction

– Unzip the zookeeper installation package and create a basic directory
#tar -zxvf zookeeper-3.4.14.tar.gz
#mkdir /usr/bigData/zookeeper-3.4.14/data
#mkdir /usr/bigData/zookeeper-3.4.14/log
– Configure the environment variables
#vi /etc/profile
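The exact lines are not shown in the original; a typical addition to /etc/profile, assuming ZooKeeper is unpacked at /usr/bigData/zookeeper-3.4.14, would be:
export ZOOKEEPER_HOME=/usr/bigData/zookeeper-3.4.14
export PATH=$PATH:$ZOOKEEPER_HOME/bin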

and then make it work
#source /etc/profile
– modify the configuration file
– Copy configuration file template
#cd /usr/bigData/zookeeper-3.4.14/conf/
#cp zoo_sample.cfg zoo.cfg
– Modify the configuration file
#vim /usr/bigData/zookeeper-3.4.14/conf/zoo.cfg
Add the following content
dataDir=/usr/bigData/zookeeper-3.4.14/data
dataLogDir=/usr/bigData/zookeeper-3.4.14/log
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
–Create myid file (need to be created for each node)
#cd /usr/bigData/zookeeper-3.4.14/data/
#touch myid

#vi myid

Add the following content respectively

1 (on the master node)

2 (on the slave1 node)

3 (on the slave2 node)
ZooKeeper is now set up.
Start, check the status, and stop as shown below
#cd /usr/bigData/zookeeper-3.4.14/bin/
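From the bin directory above, each node is started, checked, and stopped with the standard zkServer.sh subcommands:
./zkServer.sh start
./zkServer.sh status
./zkServer.sh stop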

Use a shell command to check the ZooKeeper status (including the version)
echo stat | nc localhost 2181
Start ZooKeeper and check that it is running; each node must be started separately.
Since starting every node by hand is cumbersome, the shell scripts below start, stop, and check the status of the whole ensemble with a single command run on the master.

– Create a new shell folder
mkdir /usr/shell
– startup script
vim /usr/shell/startzk.sh

Enter the following

#!/bin/bash
echo "start zookeeper server..."
hosts="master slave1 slave2"
for host in $hosts
do
  ssh $host "source /etc/profile;/usr/bigData/zookeeper-3.4.14/bin/zkServer.sh start"
done

save!

– shutdown script

vim /usr/shell/stopzk.sh

Enter the following

#!/bin/bash
echo "stop zookeeper server..."
hosts="master slave1 slave2"
for host in $hosts
do
  ssh $host "source /etc/profile;/usr/bigData/zookeeper-3.4.14/bin/zkServer.sh stop"
done

save!

– view status

vim /usr/shell/statuszk.sh

Enter the following

#!/bin/bash
echo "status zookeeper server..."
hosts="master slave1 slave2"
for host in $hosts
do
  ssh $host "source /etc/profile;/usr/bigData/zookeeper-3.4.14/bin/zkServer.sh status"
done

Note: the hosts="" variable should list exactly the nodes in your cluster; there are only three here, so three are listed.

In the ssh $host "..." line, double-check the ZooKeeper installation path.

Finally, make the scripts executable

#chmod 777 ./startzk.sh

#chmod 777 ./stopzk.sh

#chmod 777 ./statuszk.sh

eclipse configuration

(1) Copy the hadoop-eclipse-plugin-2.6.0.jar package to the plugins directory under the eclipse directory
(2) Open eclipse and configure hadoop path

(3) Open the Map/Reduce Locations view

(4) Connect to the hadoop cluster (need to start the cluster in advance)
New hadoop location

The configuration is as follows

(5) Create several folders in the hadoop cluster. If you can see the corresponding directory structure in DFS Locations in the Project Explorer view, the configuration is successful.
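As an illustration (the directory names below are arbitrary examples), folders can be created from the command line on the master and should then show up under DFS Locations after a refresh:
hadoop fs -mkdir -p /user/hadoop/input
hadoop fs -ls /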

8. hbase environment construction

– Unzip the hbase installation package
#cd /usr/bigData/
#tar -zxvf hbase-2.0.5-bin.tar.gz
#mkdir /usr/bigData/hbase-2.0.5/logs
– modify the configuration file
vi /usr/bigData/hbase-2.0.5/conf/hbase-env.sh
Add the following content
export JAVA_HOME=/usr/java/jdk1.8.0_221
export HBASE_LOG_DIR=/usr/bigData/hbase-2.0.5/logs
– Modify the regionservers file
#vi /usr/bigData/hbase-2.0.5/conf/regionservers
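Part of the original configuration for this step was lost to formatting damage. As a hedged sketch only, not necessarily the author's exact settings: for a cluster laid out as in the earlier sections, the regionservers file would list slave1 and slave2, hbase-env.sh would typically also set HBASE_MANAGES_ZK=false so the external ZooKeeper built above is used, and hbase-site.xml would carry properties along these lines (the values are assumptions based on the HDFS and ZooKeeper settings used earlier):
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>
After copying the HBase directory to slave1 and slave2 with scp (as was done for Hadoop), HBase is started on the master:
#/usr/bigData/hbase-2.0.5/bin/start-hbase.sh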
On bgs-5p173-wangwenting (the master node), jps output looks like this:
22898 ResourceManager
20739 Jps
24383 JobHistoryServer
20286 HMaster
22722 SecondaryNameNode
22488 NameNode
[hadoop@bgs-5p174-wangwenting opt]$ jps
2141 NodeManager
3257 HRegionServer
25283 Jps
1841 DataNode
[hadoop@bgs-5p175-wangwenting opt]$ jps
2141 NodeManager
3257 HRegionServer
25283 Jps
1841 DataNode
If HMaster and HRegionServer are displayed, the startup is successful
8). Use the hbase shell command to test the installation result:
[hadoop@bgs-5p173-wangwenting opt]$ /opt/hbase/bin/hbase shell
a. Create table test:
hbase(main):002:0> create "test", "cf"
0 row(s) in 2.5840 seconds
=> Hbase::Table - test
b. List all tables:
hbase(main):003:0> list
TABLE
test
1 row(s) in 0.0310 seconds
=> ["test"]
Sometimes, even though the HMaster process already shows up in jps after starting Hadoop and HBase, entering the HBase shell and running a command such as list produces the following error:

Solution:
Go to the logs directory and view the master's log; the following content is displayed repeatedly:
vim hbase-hadoop-master-s1.log

2017-03-13 17:13:17,374 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode…
2017-03-13 17:13:27,377 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode…
2017-03-13 17:13:37,386 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode…
2017-03-13 17:13:47,393 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode…
2017-03-13 17:13:57,395 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode…
2017-03-13 17:14:07,409 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode…
It turns out that Hadoop was still in safe mode because it had just been started. Manually exit Hadoop's safe mode, then restart the HBase service.
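The safe-mode command itself is not shown in the original; the standard way to leave safe mode is:
hdfs dfsadmin -safemode leave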

After restarting, enter list, and the error will no longer be reported.

c. Insert data into the test table:
hbase(main):001:0> put "test", "row", "cf:a", "value"
0 row(s) in 0.4150 seconds
d. Check the test table information:
hbase(main):002:0> scan 'test'
ROW                  COLUMN+CELL
 row                 column=cf:a, timestamp=1447246157917, value=value
1 row(s) in 0.0270 seconds
If the hbase shell test is successful, open the following URL in a browser: http://172.24.5.173:16010/

If the page displays normally, the HBase cluster installation is successful. Using the IP address is the safest option; to access it by hostname instead, first add the mapping to the hosts file on your own computer.

9). Start the thriftserver2 service
[hadoop@bgs-5p173-wangwenting opt]$ nohup /opt/hbase/bin/hbase-daemon.sh start thrift2 &