Hadoop fully distributed cluster setup

Table of Contents

Environment preparation

Install hadoop

Usage example


Note

Unless otherwise specified, all operations in this article are performed as the root user on all nodes.

The files needed in this article are placed in the /usr/local/software directory (you need to create this directory yourself).

Environment preparation

The three machines used in this article are as follows:

IP address      Role
192.168.88.144  master
192.168.88.145  slave1
192.168.88.146  slave2
  1. Turn off the firewall
    systemctl stop firewalld # Temporary shutdown
    systemctl disable firewalld # Permanently shut down

  2. Shut down selinux
    setenforce 0 # Temporarily shut down
    vim /etc/selinux/config # Change the value of SELINUX to disabled in order to shut it down permanently

  3. Modify yum source
    wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
    yum makecache # Refresh configuration

    Make sure the yum metadata cache has been created successfully

  4. Install jdk
    cd /usr/local/software
    tar -zxvf jdk-8u291-linux-x64.tar.gz -C /usr/local # Place the decompressed folder under /usr/local

    Rename the jdk directory

    cd /usr/local
    mv jdk1.8.0_291 jdk

  5. Modify environment variables
    vim ~/.bashrc
    source ~/.bashrc # Refresh configuration
    
    # Below are the environment variables that need to be added to the .bashrc file
    export JAVA_HOME=/usr/local/jdk
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=$PATH:${JAVA_HOME}/bin

    Check java version

    java -version

  6. Install database
    Here we use the mariadb that comes with CentOS
    yum install -y mariadb-server 

    Initialize database

    systemctl start mariadb # Start mariadb
    systemctl enable mariadb #Set auto-start at boot
    mysql_secure_installation # Initialize data, follow the prompts after using this command

    Check if the installation is successful

    mysql -u root -p # Log in as root user
    show databases; # View all databases, there should be three

    Modify the configuration to ensure that the outside world can connect to the database

    use mysql;
    update user set host='%' where user='root' and host='localhost';
    flush privileges; # Refresh configuration
    select host,user from user where user='root'; # Confirm that the root user now has a row with host '%' (see the sketch below)
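    If the update succeeded, the query output should include a row with host '%' for root, roughly like this (other root rows such as 127.0.0.1 or ::1 may also appear):

    +------+------+
    | host | user |
    +------+------+
    | %    | root |
    +------+------+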
    

  7. Set static address
    Modify the ifcfg-ens33 file and change the dynamic ip to a static ip
    vim /etc/sysconfig/network-scripts/ifcfg-ens33
    
    # Change the value of BOOTPROTO to static, and add four fields: IPADDR (IP address), NETMASK (subnet mask), GATEWAY (gateway) and DNS1 (DNS server). A sketch of the resulting file follows this step.

    Please note that the gateway, IP address and subnet mask must be filled in according to your actual network.

    systemctl stop network # Restart the network to apply the new interface configuration
    systemctl start network
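    A minimal sketch of /etc/sysconfig/network-scripts/ifcfg-ens33 for the master node, assuming a /24 network; the GATEWAY and DNS1 values below are placeholders (the original configuration may differ), so fill in your own:

    TYPE=Ethernet
    NAME=ens33
    DEVICE=ens33
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.88.144
    NETMASK=255.255.255.0
    GATEWAY=192.168.88.2
    DNS1=192.168.88.2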
  8. Modify the host name
    hostnamectl set-hostname master # master is the modified host name, and the two sub-nodes are modified to slave1 and slave2 respectively.
    reboot # The terminal prompt may not update immediately after the change; the new hostname is shown after a reboot
  9. Modify the hosts file
    vim /etc/hosts # Ensure that three nodes on the same LAN can communicate with each other
    
    # Add the following lines to the hosts file, separating the IP address and host name with whitespace (a tab or spaces both work)
    192.168.88.144 master
    192.168.88.145 slave1
    192.168.88.146 slave2
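    To confirm that name resolution works, you can test connectivity between the nodes, for example:

    ping -c 3 slave1 # Should resolve to 192.168.88.145 and receive replies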

  10. Configure password-free login between nodes
    First, perform the following operations as the ordinary user chenxl on all three nodes (if you want to build hadoop as the root user, perform them as root instead)
    ssh localhost # Connect to the local machine following the prompts, then run exit to log out
    
    ssh-keygen -t dsa -f ~/.ssh/id_dsa # Generate key
    cd ~/.ssh
    cp id_dsa.pub authorized_keys
    sudo chmod 600 ./authorized_keys # Modify permissions
    
    ssh localhost # Verify that you can log in without a password on this machine

    The first ssh localhost connection asks you to confirm the host and enter the login password; after the key pair is generated and added to authorized_keys, the second ssh localhost should succeed without a password.

    Then copy the master's public key to the two child nodes (still operating as the ordinary user chenxl)

    scp ~/.ssh/id_dsa.pub chenxl@slave1:~/.ssh/master.pub
    scp ~/.ssh/id_dsa.pub chenxl@slave2:~/.ssh/master.pub

    Append the contents of the master.pub file to the authorized_keys file on slave1 and slave2 (as the ordinary user chenxl)

    cd ~/.ssh
    cat master.pub >> authorized_keys

    Then verify on the master that you can log in to the two child nodes without a password.

    ssh slave1 # Log in to slave1 node
    ssh slave2 # Log in to slave2 node

    If no password is requested, the master can now log in to both child nodes without a password.

  11. Synchronize the time across nodes
    ntpdate ntp1.aliyun.com # If there is no ntpdate command, you can use yum install to install it.
    hwclock # Check whether the time on the three nodes is consistent

Install hadoop

  1. Unzip
    Use the root user on the master node to decompress hadoop
    tar -zxvf hadoop-2.7.7.tar.gz -C /usr/local # Decompress
    cd /usr/local
    mv hadoop-2.7.7 hadoop # Rename
  2. Modify permissions
    Give ownership of the hadoop directory to the ordinary user chenxl (if you are building hadoop as the root user, you can skip this step)
    cd /usr/local
    sudo chown -R chenxl:chenxl ./hadoop

  3. Modify environment variables
    Use the ordinary user chenxl on the master node to add the hadoop environment variables to ~/.bashrc and refresh it with source ~/.bashrc (typical entries are sketched below)

    Check if hadoop is installed successfully

    hadoop version # Check the hadoop version
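    The hadoop entries to append to ~/.bashrc; a minimal sketch, assuming hadoop is unpacked to /usr/local/hadoop as in the previous steps (the exact entries in the original configuration may differ):

    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin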

  4. Modify child node environment variables
    As the ordinary user chenxl on slave1 and slave2, add the same hadoop environment variables as on the master (see the sketch in the previous step)
    vim ~/.bashrc
    source ~/.bashrc # Refresh environment variables

  5. Modify hadoop configuration file
    Five files need to be modified for fully distributed mode. They are located in the etc/hadoop directory under the hadoop installation directory: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and slaves. The mapred-site.xml file may not exist yet; if so, simply rename mapred-site.xml.template to mapred-site.xml. Perform these edits as the ordinary user chenxl on the master node. A minimal sketch of each file follows (the exact values in the original configuration may differ):
    core-site.xml file:
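    A minimal sketch, assuming master is the NameNode host and /usr/local/hadoop/tmp is used as the temporary/data directory:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
      </property>
    </configuration>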

    hdfs-site.xml file:
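    A minimal sketch, assuming two datanodes (so a replication factor of 2) and the SecondaryNameNode running on master, which matches the jps results described later:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
      </property>
    </configuration>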

    mapred-site.xml file:
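    A minimal sketch that runs MapReduce jobs on YARN:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>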

    yarn-site.xml file:
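    A minimal sketch, assuming the ResourceManager runs on master:

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>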

    slaves file:
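    List one datanode host per line; for this cluster that is:

    slave1
    slave2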

  6. Copy the hadoop folder
    Copy all the contents of the hadoop folder in the master node to the two child nodes slave1 and slave2
    scp -r /usr/local/hadoop root@slave1:/usr/local
    scp -r /usr/local/hadoop root@slave2:/usr/local

    As the root user on slave1 and slave2, give ownership of the /usr/local/hadoop folder to the ordinary user chenxl

    sudo chown -R chenxl:chenxl /usr/local/hadoop

  7. Format namenode

    Use the ordinary user chenxl on the master node to perform operations:

    hdfs namenode -format # Format namenode

  8. Start the cluster
    start-all.sh # Start the cluster

  9. Verify
    jps

    View the processes on the three nodes; the results should be as follows

    There should be ResourceManager, NameNode and SecondaryNameNode processes on the master node

    There should be DataNode and NodeManager processes on the slave1 and slave2 nodes.
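    A rough sketch of what jps typically prints (the process IDs will differ):

    # On master
    2321 NameNode
    2520 SecondaryNameNode
    2673 ResourceManager
    2941 Jps

    # On slave1 and slave2
    1876 DataNode
    1988 NodeManager
    2154 Jps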

    Then open the NameNode web UI at the master node's IP on port 50070, for example 192.168.88.144:50070

    The overview page should show two live datanodes, which means HDFS has started normally.
    Then open the master node's IP on port 8088, for example 192.168.88.144:8088

    If this page loads normally, YARN is running correctly as well.

Usage Example

  1. Try to create/delete a folder on hdfs to see if it can be used normally
    hdfs dfs -mkdir /input # Create a folder

    hdfs dfs -rm -r /input # Delete the folder /input

    Upload a local file to hdfs, as shown below
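    For example, assuming the file to upload is a local test.txt (the word count example below expects it in /input):

    hdfs dfs -put ./test.txt /input # Upload the local test.txt into the HDFS /input directory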

  2. Count word occurrences
    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input/test.txt /output # Count the number of occurrences of each word in the previously uploaded test.txt file
    hdfs dfs -cat /output/part-r-00000 # Query results

    stop-all.sh # Stop the cluster

At this point, the fully distributed Hadoop cluster is complete. From now on, just use start-all.sh to start it.
