CentOS 7 Hadoop 3.x Pseudo-Distributed Environment Setup

Build environment: Windows 10, VMware 16.2.3, CentOS 7.9, jdk-8u162-linux-x64.tar.gz, Hadoop 3.1.0

Hadoop download link (all versions): Hadoop

JDK 8 download link: jdk

1. Basic configuration preparations

1.1 Static IP address and other configurations

Note: If you are prompted for insufficient permissions, you need to use root permissions.

1.1.1 Configure a static IP address on a non-cloned machine

1.1.2 Configure a static IP address for the cloned machine

MAC address modification: in VMware, Network Adapter -> Advanced Settings -> Generate

The UUID can be generated with the following command:

uuidgen

Run (as root) vim /etc/sysconfig/network-scripts/ifcfg-ens32, replace the MAC address and UUID with the newly generated values, and set a static IP address that does not duplicate any other virtual machine's, as follows:

Note: These instructions assume a cloned virtual machine, which inherits the clone source's configuration; if any parameter differs on your machine, adjust the corresponding variable. A reference sketch follows below. For non-cloned virtual machines, standard static-IP configuration guides found online apply.
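For reference, a minimal ifcfg-ens32 sketch. Only the IP 192.168.138.135 is the one used throughout this article; the MAC, UUID, gateway, and DNS values are placeholders that must match your own VMware network:

TYPE=Ethernet
BOOTPROTO=static          # static address instead of DHCP
NAME=ens32
DEVICE=ens32
ONBOOT=yes                # bring the interface up at boot
HWADDR=00:0C:29:XX:XX:XX  # placeholder: the MAC generated in VMware
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  # placeholder: output of uuidgen
IPADDR=192.168.138.135    # the static IP used in this article
NETMASK=255.255.255.0     # placeholder: match your VMware subnet
GATEWAY=192.168.138.2     # placeholder: match your VMware NAT gateway
DNS1=192.168.138.2        # placeholder: match your network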

1.1.3 Host name modification

How to modify the host name: vim /etc/hostname (effective after reboot)

On CentOS 7, the following command changes the host name directly:

hostnamectl set-hostname hadoop  # the host name here is changed to hadoop
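You can confirm the change took effect with:

hostname            # should print hadoop
hostnamectl status  # shows the static host name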

1) Modify the mapping relationship between IP address and host name

vim /etc/hosts

2) Add a new line of mapping, then save and exit

192.168.138.135 hadoop  # the configured static IP address and the new host name

3) Verification

ping hadoop

If there is no packet loss, the mapping is successful.

1.1.4 Turn off the firewall

Check the running status of the firewall

firewall-cmd --state

Turn off firewall command

systemctl stop firewalld     # stop the firewalld service
systemctl disable firewalld  # keep the firewall service from starting at boot

1.2 Create hadoop user

Create hadoop user

su                               # switch to the root user
useradd -m hadoop -s /bin/bash   # create a new user hadoop with /bin/bash as its shell

Set password for hadoop user

passwd hadoop

Grant the hadoop user administrator rights to simplify deployment. Run visudo,

find the line root ALL=(ALL) ALL, and add a line after it: hadoop ALL=(ALL) ALL
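The relevant part of the file should then look like this:

root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL    # newly added line granting hadoop sudo rights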

Note: Although the hadoop user now has administrator rights, modifying core system files still requires sudo or switching to the root user.

2. Install ssh and configure ssh passwordless login

Note: Configure ssh under hadoop user

CentOS 7 has SSH installed by default. Run the following command to verify:

rpm -qa | grep ssh

If it is not installed, use yum to install it:

sudo yum install openssh-clients
sudo yum install openssh-server

1) Test login

ssh localhost or ssh hadoop


2) Log out with the exit command

3) Configure ssh passwordless login

exit # Exit the ssh localhost just now
cd ~/.ssh/ # If there is no such directory, please execute ssh localhost first.
ssh-keygen -t rsa # There will be a prompt, just press Enter.
cat id_rsa.pub >> authorized_keys # Add authorization
chmod 600 ./authorized_keys # Modify file permissions
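As an equivalent shortcut, the standard ssh-copy-id tool performs the append and chmod steps in one go:

ssh-copy-id hadoop@localhost   # copies ~/.ssh/id_rsa.pub into authorized_keys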

4) Test

At this time, use the ssh localhost or ssh hadoop command to log in directly without entering a password.

3. Install java

Hadoop 3.x requires Java 8 or later; JDK 8 is recommended.

Use Xftp (or a similar tool) to upload the installation package to the hadoop user's home directory.

3.1 Uninstall the jdk that comes with the system

1) Check whether the system comes with jdk (usually it does)

rpm -qa |grep java
rpm -qa |grep jdk
rpm -qa |grep gcj

2) If these commands produce output, batch-uninstall the bundled packages:

rpm -qa | grep java | xargs rpm -e --nodeps

3.2 Unzip the jdk installation package

Note: Uninstalling the bundled CentOS JDK before installing JDK 8 is optional; read this section and decide for yourself.

1) Unzip the installation package in the specified directory

tar -zxvf jdk-8u162-linux-x64.tar.gz

2) Move the unpacked folder jdk1.8.0_162 to /usr/local/java (installed software is conventionally placed under /usr/local/):

mv jdk1.8.0_162/ /usr/local/java  # move into /usr/local/ and rename the folder to java

Note: Do not create the /usr/local/java directory in advance; if it already exists, mv will move jdk1.8.0_162 inside it instead of renaming it.
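Before editing the environment variables, a quick sanity check that the JDK landed in the right place:

/usr/local/java/bin/java -version   # should report java version "1.8.0_162"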

3.3 Modify environment variables

vim /etc/profile

Note: Editing system environment variables requires root permissions; even with the sudo rights granted earlier, a plain vim /etc/profile cannot save the file.

Use sudo vim /etc/profile to temporarily obtain root privileges.

1) Add JAVA_HOME at the end of the file and add the bin directory under JAVA_HOME to PATH, save and exit after modification.

export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin

2) Then reload the environment variables and execute the following command

source /etc/profile

3) Test whether the configuration is successful

java -version

javac -version

Note: If java -version and javac -version report different versions, the exact consequences are unclear; in this build, HDFS and YARN ran without problems after Hadoop was configured.

If the two versions differ and you want to make them match, one option is to repoint the system's java and javac with the alternatives tool, as sketched below.

Note: If you run into permission errors, switch to the root user (or use sudo).
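A minimal sketch using the CentOS alternatives tool, assuming the JDK was installed to /usr/local/java as above (the priority value 2 is arbitrary):

sudo alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
sudo alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
sudo alternatives --config java    # interactively select /usr/local/java/bin/java
sudo alternatives --config javac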

4. Install Hadoop

Upload hadoop-3.1.0.tar.gz to the hadoop user's home directory.

4.1 Unzip the Hadoop installation package

Create a /bigdata directory under the root directory to hold the big-data software installed from here on, then extract the Hadoop installation package into it:

mkdir /bigdata
tar -zxvf hadoop-3.1.0.tar.gz -C /bigdata/

4.2 Modify Hadoop configuration file

Go to the hadoop installation directory and view the contents of the directory

cd /bigdata/hadoop-3.1.0/etc/hadoop/

ll

Note: The files to modify, in order, are hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml (sections 4.2.1 through 4.2.5 below).

4.2.1 Modify hadoop-env.sh

vim hadoop-env.sh

Add the following line:

export JAVA_HOME=/usr/local/java

4.2.2 Modify core-site.xml

core-site.xml is the core configuration file of Hadoop, which can configure the address and data storage directory of the HDFS NameNode.

vim core-site.xml

The modifications are as follows:

Note: the host name hadoop in the value below must be replaced with the host name you set earlier.

<configuration>
    <!-- Set the HDFS NameNode address: host name hadoop, port 9000 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <!-- Specify the directory where hadoop stores data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/bigdata/hadoop-3.1.0/data</value>
    </property>
</configuration>

4.2.3 Modify hdfs-site.xml

hdfs-site.xml is the HDFS configuration file. Since this pseudo-distributed setup runs on a single machine, the HDFS replication factor is set to 1, i.e. only one copy of each block is kept.

vim hdfs-site.xml

The modifications are as follows:

<configuration>
    <!-- The copy of HDFS is 1, that is, only one copy of the data is saved -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4.2.4 Modify mapred-site.xml

In mapred-site.xml, you can configure the MapReduce framework to run on the YARN resource scheduling system.

vim mapred-site.xml

The modifications are as follows:

Note: Running the MapReduce wordcount example later fails with a classpath error unless YARN knows where the Hadoop jars live. The fix: run

hadoop classpath

and add its output as the value of a yarn.application.classpath property. The full configuration, with that property already included, looks like this:

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- Output of hadoop classpath, so YARN containers can find the Hadoop jars -->
    <property>
        <name>yarn.application.classpath</name>
        <value>/bigdata/hadoop-3.1.0/etc/hadoop:/bigdata/hadoop-3.1.0/share/hadoop/common/lib/*:/bigdata/hadoop-3.1.0/share/hadoop/common/*:/bigdata/hadoop-3.1.0/share/hadoop/hdfs:/bigdata/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.1.0/share/hadoop/hdfs/*:/bigdata/hadoop-3.1.0/share/hadoop/mapreduce/*:/bigdata/hadoop-3.1.0/share/hadoop/yarn:/bigdata/hadoop-3.1.0/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.1.0/share/hadoop/yarn/*</value>
    </property>
</configuration>
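Once the cluster is up (section 4.4), the wordcount example mentioned above can verify this configuration. A sketch, assuming an HDFS directory /input that you create and populate yourself (/output must not exist beforehand):

hdfs dfs -mkdir /input
hdfs dfs -put somefile.txt /input   # somefile.txt is any local text file
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000  # view the word counts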

4.2.5 Modify yarn-site.xml

yarn-site.xml holds the YARN-related configuration.

vim yarn-site.xml

The modifications are as follows:

Note: here too, replace hadoop with your own host name.

<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- Specify the ResourceManager host -->
    <property>
       <name>yarn.resourcemanager.hostname</name>
       <value>hadoop</value>
    </property>
    <!-- Specify the shuffle auxiliary service required by MapReduce -->
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>

</configuration>

4.2.6 Add Hadoop commands to system environment variables

The purpose of adding Hadoop commands to the system environment variables is to enable Hadoop commands to be executed in any directory.

vim /etc/profile

The modifications are as follows:

# Add Java system environment variables (already present from section 3.3)
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin

# Add Hadoop environment variables
export HADOOP_HOME=/bigdata/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then reload the environment variables:

source /etc/profile
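After reloading, confirm that the Hadoop commands resolve from any directory:

hadoop version   # should report Hadoop 3.1.0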

4.3 Initializing the Hadoop HDFS file system

hdfs namenode -format

When the output contains "successfully formatted", the initialization succeeded.

4.4 Starting Hadoop

4.4.1 Start HDFS file system

If you run the start command as the root user: start-dfs.sh

an error will occur (Hadoop 3.x aborts with messages like "Attempting to operate on hdfs namenode as root" and "there is no HDFS_NAMENODE_USER defined").

Hadoop 3.x discourages starting services as root, and doing so invites a series of subtle problems. If you still want to start HDFS as root, add the following configuration at the top of start-dfs.sh in the sbin directory of the Hadoop installation:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
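For completeness, a commonly cited counterpart for YARN (an assumption, not part of this article's setup): if you also insisted on starting YARN as root, start-yarn.sh would need analogous lines:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root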

In the actual production environment, the root user will not be used to start HDFS and YARN, so you need to switch to the previously created hadoop user to start.

Because the services will be started by the hadoop user, the owner and group of the Hadoop installation files must be changed accordingly. The command is as follows:

chown -R hadoop:hadoop /bigdata/

Switch to hadoop user:

su hadoop

Note: su root prompts for the root password, but switching from root to any other user requires no password.

Start HDFS using hadoop user

start-dfs.sh

Under normal circumstances, the namenode, datanodes, and secondary namenode all start without errors.

If the following prompt appears:

Starting namenodes on [hadoop]
hadoop: Warning: Permanently added 'hadoop,192.168.138.135' (ECDSA) to the list of known hosts.
hadoop: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting secondary namenodes [hadoop]
hadoop: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

This prompt means SSH passwordless login is not configured for the hadoop user; go back to section 2 and configure it.

4.4.2 Start YARN

The command to start YARN is as follows:

start-yarn.sh

Normally the resourcemanager and nodemanagers start without errors.

4.4.3 Verification

Enter the following command on the command line:

jps

The following processes should appear: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager (plus Jps itself).

Note: If the NameNode is formatted more than once, the DataNode process will not appear, because the cluster IDs stored by the NameNode and DataNode no longer match. Delete the data under /bigdata/hadoop-3.1.0/, then re-run the initialization command hdfs namenode -format, as sketched below.
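A sketch of the recovery steps (paths as configured in section 4.2.2):

stop-yarn.sh                        # stop YARN if it is running
stop-dfs.sh                         # stop HDFS
rm -rf /bigdata/hadoop-3.1.0/data   # remove the old HDFS data (the cluster IDs live here)
hdfs namenode -format               # re-initialize
start-dfs.sh                        # start HDFS again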

Note: If any process is missing, you need to go to the corresponding configuration file to check whether the configuration content is correct.

Note: the corresponding stop commands are stop-dfs.sh and stop-yarn.sh.

5. Access Hadoop through the web interface

Enter 192.168.138.135:9870 in a browser (on the Linux guest or on the Windows host; both work).

Note: 192.168.138.135 is the IP address of the host.

Start the Hadoop services (HDFS and YARN) before accessing the pages.

1) HDFS (NameNode) web UI port: 9870
2) YARN web UI port: 8088
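A quick reachability check from the Linux shell (assuming both services are up):

curl -s http://192.168.138.135:9870 | head -n 5   # HDFS NameNode web UI
curl -s http://192.168.138.135:8088 | head -n 5   # YARN ResourceManager web UI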