Building a Hadoop 3.1.3 cluster on Ubuntu 18.04

1. Install Ubuntu in VMware and prepare the environment

When creating the virtual machine, name it master.

When installing Ubuntu, use master for both your name and the computer name, and hadoop for the user name.

After installation, check whether you can drag files directly into the virtual machine. If not, install VMware Tools and restart, then copy jdk-8u162-linux-x64.tar.gz and hadoop-3.1.3.tar.gz into the home directory.

Set the screen not to turn off automatically

2. Configure the IP address

Give the master a static address (192.168.33.10, matching the hosts file in step 6) and verify it with ip addr.
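As a sketch of one way to do this: Ubuntu 18.04 configures networking through netplan, so the static address can be set in a YAML file under /etc/netplan/. The file name, the interface name ens33, the gateway 192.168.33.2, and the networkd renderer below are all assumptions that depend on your VMware network; on a desktop install the same values can also be set in the GUI network settings.

# /etc/netplan/01-netcfg.yaml  (actual file name varies)
network:
  version: 2
  renderer: networkd
  ethernets:
    ens33:                          # interface name is an assumption; check with ip addr
      dhcp4: no
      addresses: [192.168.33.10/24]
      gateway4: 192.168.33.2        # assumed VMware NAT gateway
      nameservers:
        addresses: [192.168.33.2]

# apply the configuration
sudo netplan apply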

3. Install openssh-server and vim (personal preference)

sudo apt-get update
sudo apt-get install openssh-server
sudo apt-get install vim

4. Install Java and Hadoop

cd ~
sudo tar -zxvf jdk-8u162-linux-x64.tar.gz -C /usr/local/
sudo tar -zxvf hadoop-3.1.3.tar.gz -C /usr/local

Change the ownership of these two directories to the hadoop user:

cd /usr/local
sudo chown -R hadoop hadoop-3.1.3/
sudo chown -R hadoop jdk1.8.0_162

Check the ownership with ll.

Configure environment variables

cd ~
vim .bashrc

Add the following at the bottom of the .bashrc file:

## java environment variable
export JAVA_HOME=/usr/local/jdk1.8.0_162/
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

# hadoop variables
export HADOOP_HOME=/usr/local/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Apply the changes:

source .bashrc

Check that java and hadoop are now on the PATH.
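For example, with the version commands that ship with the JDK and with Hadoop:

java -version
hadoop version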

5. Configure Hadoop

cd /usr/local/hadoop-3.1.3/etc/hadoop/
vim core-site.xml

# Add the following between <configuration> and </configuration> (the same applies to the XML files edited below)
        <!-- Specify NameNode address -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:8020</value>
        </property>

        <!-- Specify the storage directory of hadoop data -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/hadoop-3.1.3/data</value>
        </property>
vim hdfs-site.xml

        <!-- NameNode web address -->
        <property>
                <name>dfs.namenode.http-address</name>
                <value>master:9870</value>
        </property>

        <!-- SecondaryNameNode web address -->
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>slave2:9868</value>
        </property>
vim mapred-site.xml

        <!-- Run MapReduce on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
vim yarn-site.xml

        <!-- MapReduce shuffle service -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>

        <!-- ResourceManager addr-->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>slave1</value>
        </property>

        <property>
                <name>yarn.nodemanager.env-whitelist</name>
                <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
vim workers

master
slave1
slave2
vim hadoop-env.sh

# run the HDFS and YARN daemons as the hadoop user
HDFS_NAMENODE_USER=hadoop
HDFS_DATANODE_USER=hadoop
HDFS_SECONDARYNAMENODE_USER=hadoop
HDFS_ZKFC_USER=hadoop
HDFS_JOURNALNODE_USER=hadoop
YARN_RESOURCEMANAGER_USER=hadoop
YARN_NODEMANAGER_USER=hadoop
JAVA_HOME=/usr/local/jdk1.8.0_162/

6. Modify the hosts file and close the firewall

sudo vim /etc/hosts

192.168.33.10 master
192.168.33.11 slave1
192.168.33.12 slave2
# Turn off the firewall (these commands apply only if firewalld is installed)
sudo systemctl stop firewalld.service
# keep it from starting at boot
sudo systemctl disable firewalld.service
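Note that Ubuntu 18.04 ships ufw rather than firewalld, so on a stock install the equivalent step looks roughly like this (a sketch; ufw is usually inactive by default anyway):

# check the current state
sudo ufw status
# make sure the firewall stays off
sudo ufw disable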

7. Clone the virtual machine

Shut down the virtual machine first, then right-click the virtual machine > Manage > Clone, click Next, choose Create a full clone, name the clone slave1, and pick a location as you prefer.

Clone slave2 in the same way

Start all three virtual machines after cloning.

8. Modify the IP address and hostname of slave1 and slave2

# on slave1
sudo hostnamectl set-hostname slave1
# log in again as the hadoop user so the new hostname takes effect
sudo login

# on slave2
sudo hostnamectl set-hostname slave2
sudo login
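The static address is changed the same way as in step 2; a sketch assuming the netplan file from that step (the file and interface names are assumptions):

# on slave1: set addresses to [192.168.33.11/24] in the netplan file, then
sudo netplan apply

# on slave2: set addresses to [192.168.33.12/24] in the netplan file, then
sudo netplan apply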

9. Configure ssh password-free login

Generate an SSH key pair (private and public key) on each of the three machines:

ssh-keygen -t rsa
#under master
cd ~/.ssh
touch authorized_keys
cat id_rsa.pub >> authorized_keys

#under slave1
scp ~/.ssh/id_rsa.pub hadoop@master:~/

#under master
cd ~
cat id_rsa.pub >> .ssh/authorized_keys

#slave2
scp ~/.ssh/id_rsa.pub hadoop@master:~/

#master
cd ~
cat id_rsa.pub >> .ssh/authorized_keys

# Finally, check that all three public keys have been appended; on master:
cat .ssh/authorized_keys

# Then use scp to copy the authorized_keys file from the master node to the .ssh/ directory of slave1 and slave2; on master:
scp /home/hadoop/.ssh/authorized_keys hadoop@slave1:~/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@slave2:~/.ssh/
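If ssh still asks for a password afterwards, the usual cause is file permissions; these are the standard OpenSSH requirements rather than anything specific to this setup:

# on each node
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys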


# Verify password-free login by ssh-ing to slave1; to log in to another node, replace slave1 with its hostname
ssh slave1
# exit command
exit


10. Start the cluster

Format HDFS before the first start (on master)

hdfs namenode -format

Start HDFS (on master)

start-dfs.sh
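To confirm the daemons came up, run jps (shipped with the JDK) on each node; with the workers file and hdfs-site.xml above, roughly these processes are expected:

# on master
jps   # NameNode, DataNode, Jps
# on slave1
jps   # DataNode, Jps
# on slave2
jps   # DataNode, SecondaryNameNode, Jps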

Open the NameNode web UI in a browser:

http://192.168.33.10:9870/
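Only HDFS is started above. Since yarn-site.xml places the ResourceManager on slave1, YARN would be started there; a sketch using the standard sbin scripts (8088 is the default ResourceManager web port):

# on slave1
start-yarn.sh

# ResourceManager web UI
# http://192.168.33.11:8088/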

Stop the cluster (on the master)

stop-all.sh

References:

hadoop-3.1.3 fully distributed cluster construction – Zhihu (zhihu.com)

Using Ubuntu to build a Hadoop fully distributed cluster – Ordinary Netizen's Blog, CSDN