Hadoop Fully Distributed Cluster Setup

Host settings

Turn off the firewall on all three hosts

Disable SELinux on all three hosts

Set the hostname on each host

Configure address mappings for the three hosts

Set up password-free login

Install JDK

Unzip and install

Configure JDK environment variables

Verify the installation

Distribute the JDK

Install and configure Hadoop

Unzip and install

Hadoop environment configuration file – hadoop-env.sh

Modify the Hadoop configuration file core-site.xml

HDFS configuration file hdfs-site.xml

YARN configuration file yarn-site.xml

MapReduce configuration file mapred-site.xml

Configure workers

Initialize



Host settings

Turn off the firewall on all three hosts

# Stop the firewall service
systemctl stop firewalld
# Disable it from starting on boot
systemctl disable firewalld
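
As an optional check, confirm that the firewall service is no longer active:

systemctl status firewalld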

Disable SELinux on all three hosts

vim /etc/sysconfig/selinux

Change SELINUX=enforcing to SELINUX=disabled
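
This file change only takes effect after a reboot; to stop SELinux enforcement immediately for the current session, you can also run:

setenforce 0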

Set the hostname on each host

Set the hostnames to master, slave1, and slave2 respectively

hostnamectl set-hostname <hostname>
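
For example, run the matching command on each host:

hostnamectl set-hostname master     # on the first host
hostnamectl set-hostname slave1     # on the second host
hostnamectl set-hostname slave2     # on the third host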

Configure address mappings for the three hosts

# Back up the original file first (optional)
cp /etc/hosts /etc/hosts.bak
vim /etc/hosts
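
Add one mapping line per host on all three machines; the IP addresses below are placeholders, so replace them with the actual addresses of your hosts:

192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2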

Set up password-free login

cd /root/.ssh

Enable password-free login from master to master, slave1, and slave2

Generate a key pair

ssh-keygen -t rsa

Press Enter three times to accept the defaults

Copy the public key to all three hosts

ssh-copy-id root@<hostname>
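
For example, on master:

ssh-copy-id root@master
ssh-copy-id root@slave1
ssh-copy-id root@slave2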

Test the password-free login

# Log in
ssh <hostname>
# Log out
exit

Install JDK

Unzip and install

Unzip the installation package into the /opt directory

tar -zxvf jdk-8u162-linux-x64.tar.gz -C /opt/

Rename the directory

cd /opt
mv jdk1.8.0_162 jdk1.8

Configure JDK environment variables

vim /etc/profile
# Append the following lines:
export JAVA_HOME=/opt/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin

Apply the configuration

source /etc/profile

Verify the installation

java -version

Distribute the JDK

scp -r $JAVA_HOME root@slave1:/opt
scp -r $JAVA_HOME root@slave2:/opt

Install and configure Hadoop

Hadoop official download: Apache Hadoop

Unzip and install

Unzip the installation package into the /opt directory

tar -zxvf hadoop-3.1.3.tar.gz -C /opt

Add Hadoop environment variables

vim /etc/profile

# Append the following lines:
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root

Note: the same environment variables must also be set on the other two hosts
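
One way to do that, assuming all of the exports above live in /etc/profile, is to copy the file to the other two hosts and source it there:

scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
# Then on each slave:
source /etc/profile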

Apply the configuration

source /etc/profile

Verify the installation

hadoop version

Hadoop environment configuration file – hadoop-env.sh

cd $HADOOP_HOME/etc/hadoop
vim hadoop-env.sh
# Append the following lines:
export JAVA_HOME=/opt/jdk1.8
export HADOOP_HOME=/opt/hadoop-3.1.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Modify the Hadoop configuration file core-site.xml

cd $HADOOP_HOME/etc/hadoop
vim core-site.xml

Add the following content:

<configuration>
    <!-- Specify the HDFS NameNode address (the default file system) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Specify the directory for files generated while Hadoop is running -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-3.1.3/tmp</value>
    </property>
    <!-- Allow the root user to act as a proxy user from any host and group -->
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>

HDFS configuration file hdfs-site.xml

vim hdfs-site.xml
<configuration>
    <!-- Directory for NameNode metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop-3.1.3/tmp/namenode</value>
    </property>
    <!-- Directory for DataNode blocks -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop-3.1.3/tmp/datanode</value>
    </property>
    <!-- Address of the secondary NameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
    <!-- NameNode web UI address; note that Hadoop 2 uses port 50070 by default -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:9870</value>
    </property>
    <!-- Whether HDFS permission checking is enabled; false disables it -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
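
The NameNode and DataNode directories above are normally created automatically when the NameNode is formatted and the DataNodes start; if you prefer, you can pre-create them on each node (optional):

mkdir -p /opt/hadoop-3.1.3/tmp/namenode /opt/hadoop-3.1.3/tmp/datanode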

YARN configuration file yarn-site.xml

vim yarn-site.xml
<configuration>
    <!-- Configure the ResourceManager host: the cluster master -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <!-- Configure the auxiliary service that runs on the NodeManagers -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Disable virtual memory checking; leaving it enabled often causes errors in virtual machine environments -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>

MapReduce configuration file mapred-site.xml

vim mapred-site.xml
<configuration>
    <!-- Use YARN as the MapReduce resource scheduling framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Make HADOOP_MAPRED_HOME available to the ApplicationMaster, map tasks, and reduce tasks -->
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

Configure workers

vim workers
master
slave1
slave2

Distribute the configured Hadoop

scp -r $HADOOP_HOME root@slave1:/opt
scp -r $HADOOP_HOME root@slave2:/opt

Initialize

Format the NameNode on master (this only needs to be done once):

hdfs namenode -format

Start Hadoop from master with the start script

start-all.sh

Master node processes:
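
Since master is also listed in workers, it runs a DataNode and a NodeManager in addition to the master daemons. Running jps on master should show roughly the following (PIDs omitted):

NameNode
SecondaryNameNode
ResourceManager
DataNode
NodeManager
Jps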

Slave node processes:
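
Running jps on slave1 and slave2 should show roughly the following (PIDs omitted):

DataNode
NodeManager
Jps

The NameNode web UI should also be reachable at http://master:9870, the port configured in hdfs-site.xml. As an end-to-end check, you can submit the pi example that ships with Hadoop; the jar path below assumes the default layout of the Hadoop 3.1.3 distribution:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 2 10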