Table of Contents
Environment preparation
Install hadoop
Usage example
Note
Unless otherwise specified, all operations in this article are performed as the root user on all nodes.
The files needed in this article are placed in the /usr/local/software directory (create this directory yourself).
Environment preparation
First, the three machines used in this article are as follows:
| IP | Role |
| --- | --- |
| 192.168.88.144 | master |
| 192.168.88.145 | slave1 |
| 192.168.88.146 | slave2 |
- Turn off the firewall
systemctl stop firewalld       # Temporary shutdown
systemctl disable firewalld    # Permanently shut down
- Shut down selinux
setenforce 0               # Temporarily shut down
vim /etc/selinux/config    # Change the value of SELINUX to disabled in order to shut it down permanently
- Modify yum source
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
yum makecache    # Refresh configuration
Make sure the metadata cache has been created successfully.
- Install jdk
cd /usr/local/software
tar -zxvf jdk-8u291-linux-x64.tar.gz -C /usr/local    # Place the decompressed folder under /usr/local
Rename the jdk directory
cd /usr/local
mv jdk1.8.0_291 jdk
- Modify environment variables
vim ~/.bashrc
source ~/.bashrc    # Refresh configuration

# Below are the environment variables that need to be added to the .bashrc file
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASS_PATH=${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$PATH:${JAVA_HOME}/bin
Check java version
java -version
- Install database
Here we use the mariadb that comes with CentOS.
yum install -y mariadb-server
Initialize database
systemctl start mariadb        # Start mariadb
systemctl enable mariadb       # Set auto-start at boot
mysql_secure_installation      # Initialize the database; follow the prompts after running this command
Check if the installation is successful
mysql -u root -p    # Log in as the root user
show databases;     # View all databases; there should be three
Modify the configuration so that external hosts can connect to the database
use mysql;
update user set host='%' where host='localhost';
flush privileges;    # Refresh privileges
select host,user from user where user='root';    # Make sure a row with host '%' and user 'root' appears
- Set a static IP address
Modify the ifcfg-ens33 file to change the dynamic IP to a static IP:
vim /etc/sysconfig/network-scripts/ifcfg-ens33
Change the value of BOOTPROTO to static, and add four fields: IPADDR (IP address), NETMASK (subnet mask), GATEWAY (gateway), and DNS (domain name resolution server); a sketch is shown after this step. Please note that the gateway, IP, and subnet mask should be filled in according to the actual situation.
systemctl stop network     # Restart the network to update the network card configuration
systemctl start network
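For reference, a minimal sketch of what the edited ifcfg-ens33 might contain on the master node. The IPADDR matches the example IP used in this article, while ONBOOT, GATEWAY, and DNS1 are assumptions that must be replaced with the values of your own network; the other existing lines in the file (DEVICE, UUID, etc.) are left unchanged.
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.88.144
NETMASK=255.255.255.0
GATEWAY=192.168.88.2      # assumption: replace with your actual gateway
DNS1=114.114.114.114      # assumption: replace with your actual DNS server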
- Modify the host name
hostnamectl set-hostname master    # master is the modified host name; the two child nodes are modified to slave1 and slave2 respectively
reboot                             # The terminal prompt may not be updated after modification; it will be displayed after restarting
- Modify the hosts file
vim /etc/hosts    # Ensure that the three nodes on the same LAN can resolve each other

# Add the following content to the hosts file. The IP address and host name must be separated by the TAB key; spaces cannot be used.
192.168.88.144	master
192.168.88.145	slave1
192.168.88.146	slave2
- Configure password-free login between nodes
First, perform the following operations as the ordinary user chenxl on all three nodes (if you want to build hadoop as the root user, perform the following operations as root instead):
ssh localhost                         # Connect to the local machine according to the prompts, then use exit to exit
ssh-keygen -t dsa -f ~/.ssh/id_dsa    # Generate a key pair
cd ~/.ssh
cp id_dsa.pub authorized_keys
sudo chmod 600 ./authorized_keys      # Modify permissions
ssh localhost                         # Verify that you can log in without a password on this machine
Note that you need to enter the login password when logging in for the first time; after that, the generated key is used for password-free login.
Then save the master’s key to two child nodes (still use the ordinary user chenxl to operate)
scp ~/.ssh/id_dsa.pub chenxl@slave1:~/.ssh/master.pub
scp ~/.ssh/id_dsa.pub chenxl@slave2:~/.ssh/master.pub
Append the contents of the master.pub file to the authorized_keys file in slave1 and slave2 (use ordinary user chenxl to operate)
cd ~/.ssh
cat master.pub >> authorized_keys
Then verify on the master whether you can log in to the two child nodes without a password.
ssh slave1    # Log in to the slave1 node
ssh slave2    # Log in to the slave2 node
If no password is requested, the master can now log in to both child nodes without a password.
- Synchronize the time across nodes
ntpdate ntp1.aliyun.com    # If there is no ntpdate command, you can install it with yum install
hwclock                    # Check whether the time on the three nodes is consistent
Install hadoop
- Unzip
Use the root user on the master node to decompress hadoop:
tar -zxvf hadoop-2.7.7.tar.gz -C /usr/local    # Decompress
cd /usr/local
mv hadoop-2.7.7 hadoop    # Rename
- Modify permissions
Give ownership of the hadoop directory to the ordinary user chenxl (if you use the root user to build hadoop, you do not need to perform this step):
cd /usr/local
sudo chown -R chenxl:chenxl ./hadoop
- Modify environment variables
Use the ordinary user chenxl on the master node to add the environment variables needed by hadoop to ~/.bashrc; a sketch is shown below.
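The exact variables used in the original setup were shown in an image that is not preserved, so the following is a minimal sketch, assuming hadoop is unpacked to /usr/local/hadoop as in the earlier step:
# Append to ~/.bashrc, then run: source ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin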
Check if hadoop is installed successfully:
hadoop version    # Check the hadoop version
- Modify child node environment variables
Modify the environment variables as the ordinary user chenxl on slave1 and slave2; the content to add is the same as the hadoop variables added on the master (see the sketch above):
vim ~/.bashrc
source ~/.bashrc    # Refresh environment variables
- Modify hadoop configuration file
There are five files that need to be modified for the fully distributed setup. These files are in the ./etc/hadoop directory under the hadoop installation directory: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file. The mapred-site.xml file may not exist; you can simply rename the mapred-site.xml.template file to mapred-site.xml. The specific configuration of these files can refer to the following (operate as the ordinary user chenxl on the master node):
core-site.xml and hdfs-site.xml files:
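The original configuration was shown in images that are not preserved, so the following are minimal sketches. They assume the namenode runs on the host master with the default HDFS port 9000, data stored under /usr/local/hadoop/tmp, and a replication factor of 2 for the two datanodes; adjust these values to your own cluster.
core-site.xml:
<configuration>
    <!-- Default file system: HDFS on the master node -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Base directory for temporary and metadata files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
hdfs-site.xml:
<configuration>
    <!-- Two copies of each block, matching the two datanodes -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Namenode and datanode storage directories -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>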
mapred-site.xml file:
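The original content is not preserved; a minimal sketch, assuming MapReduce runs on YARN:
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>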
yarn-site.xml file:
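The original content is not preserved; a minimal sketch, assuming the ResourceManager runs on the master node:
<configuration>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <!-- Shuffle service needed by MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>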
slaves file:
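The slaves file lists the datanode host names, one per line; with the two child nodes used in this article it is:
slave1
slave2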
- Copy the hadoop folder
Copy all the contents of the hadoop folder on the master node to the two child nodes slave1 and slave2:
scp -r /usr/local/hadoop root@slave1:/usr/local
scp -r /usr/local/hadoop root@slave2:/usr/local
Use the root user on slave1 and slave2 to give ownership of the /usr/local/hadoop folder to the ordinary user chenxl:
sudo chown -R chenxl:chenxl /usr/local/hadoop
- Format namenode
Use the ordinary user chenxl on the master node to perform operations:
hdfs namenode -format # Format namenode
- Start the cluster
start-all.sh # Start the cluster
- Verify
jps
View the processes on the three nodes; the results should be as follows:
There should be ResourceManager, NameNode and SecondaryNameNode on the master node.
There should be DataNode and NodeManager on the slave1 and slave2 nodes.
Then access port 50070 of the master node IP in a browser, such as 192.168.88.144:50070
You can see that the two datanodes in the hadoop cluster have started normally.
Then access port 8088 of the master node IP, such as 192.168.88.144:8088
If the web page can be accessed normally, it means that YARN is also working normally.
Usage Example
- Try to create/delete a folder on hdfs to see if it can be used normally
hdfs dfs -mkdir /input # Create a folder
hdfs dfs -rm -r /input # Delete the folder /input
- Upload local files (a sketch is shown below)
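A minimal example, assuming a local text file named test.txt exists in the current directory; this matches the test.txt counted in the next step:
hdfs dfs -mkdir -p /input        # Re-create the /input directory if it was deleted above
hdfs dfs -put test.txt /input    # Upload the local test.txt to /input on hdfs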
- Count word occurrences
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input/test.txt /output    # Count the number of occurrences of each word in the previously uploaded test.txt file
hdfs dfs -cat /output/part-r-00000    # Query the results
stop-all.sh # Stop the cluster
At this point, the fully distributed setup is complete. From now on, just use start-all.sh to start it.