Build environment: Windows 10, VMware 16.2.3, CentOS 7.9, jdk-8u162-linux-x64.tar.gz, Hadoop 3.1.0
Download links: Hadoop (various versions) and JDK 8.
1. Basic configuration preparations
1.1 Static IP address and other configurations
Note: If you are prompted for insufficient permissions, you need to use root permissions.
1.1.1 Non-clone machine configuration static IP address
1.1.2 Configure a static IP address for the clone machine
MAC address: in the VMware settings, open Network Adapter -> Advanced Settings and click Generate to produce a new MAC address.
A new UUID can be generated with the following command:
uuidgen
Edit the interface configuration file (requires root permission):
vim /etc/sysconfig/network-scripts/ifcfg-ens32
Replace the generated MAC address and UUID, and set a static IP address that does not duplicate any other virtual machine's, as follows:
Note: use the static-address parameters shown; if yours differ, modify the corresponding variables (this virtual machine is a clone, so some variables were set on the clone machine). For a non-cloned virtual machine, follow any standard CentOS 7 static-IP configuration guide.
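For reference, a minimal sketch of the static-address fields in ifcfg-ens32 might look like the following; the gateway and DNS values are assumptions and must match your VMware NAT network settings:

```shell
# Illustrative ifcfg-ens32 fragment (example values, not authoritative)
TYPE=Ethernet
BOOTPROTO=static          # use a static address instead of DHCP
ONBOOT=yes                # bring the interface up at boot
IPADDR=192.168.138.135    # the static IP used throughout this guide
NETMASK=255.255.255.0
GATEWAY=192.168.138.2     # assumption: typical VMware NAT gateway
DNS1=192.168.138.2        # assumption: adjust to your network
```

Keep the HWADDR and UUID lines set to the values generated above.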
1.1.3 Host name modification
How to modify the host name: vim /etc/hostname (takes effect after reboot).
On CentOS 7 you can also change the host name directly with the following command:
hostnamectl set-hostname hadoop   # the host name here is changed to hadoop
1) Modify the mapping relationship between IP address and host name
vim /etc/hosts
2) Add a new line of mapping, then save and exit
192.168.138.135 hadoop   # the static IP address configured earlier, followed by the new host name
3) Verification
ping hadoop
If there is no packet loss, the mapping is successful.
1.1.4 Turn off the firewall
Check the running status of the firewall
firewall-cmd --state
Turn off firewall command
systemctl stop firewalld      # stop the firewall service
systemctl disable firewalld   # do not start the firewall service at boot
1.2 Create hadoop user
Create hadoop user
su                              # switch to the root user
useradd -m hadoop -s /bin/bash  # create user hadoop with /bin/bash as its login shell
Set password for hadoop user
passwd hadoop
Give the hadoop user administrator rights to simplify deployment: execute visudo,
find the line root ALL=(ALL) ALL
and add a line after it: hadoop ALL=(ALL) ALL
Note: although the hadoop user now has administrator rights, you still need sudo or a switch to the root user when modifying core system files.
2. Install ssh and configure ssh passwordless login
Note: configure ssh as the hadoop user.
CentOS 7 has ssh installed by default. Run the following command to verify:
rpm -qa | grep ssh
If it is not installed, use yum to install it:
sudo yum install openssh-clients
sudo yum install openssh-server
1) Test login
ssh localhost
or ssh hadoop
2) Log out
exit
3) Configure ssh passwordless login
exit                                # exit the ssh localhost session just opened
cd ~/.ssh/                          # if this directory does not exist, run ssh localhost first
ssh-keygen -t rsa                   # press Enter at every prompt
cat id_rsa.pub >> authorized_keys   # add authorization
chmod 600 ./authorized_keys         # fix the file permissions
4) Test
Now ssh localhost or ssh hadoop logs in directly, without asking for a password.
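The key-setup steps above can be exercised non-interactively; the sketch below runs against a throwaway directory (a stand-in for ~/.ssh) so it does not touch your real keys:

```shell
# Sketch: the passwordless-login key setup, run against a throwaway directory
tmp=$(mktemp -d)                                  # stand-in for ~/.ssh
ssh-keygen -t rsa -N "" -f "$tmp/id_rsa" -q       # -N "" = empty passphrase, no prompts
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"   # add authorization
chmod 600 "$tmp/authorized_keys"                  # fix the file permissions
stat -c %a "$tmp/authorized_keys"                 # prints 600
```

On the real machine the same commands run in ~/.ssh, and the generated key authorizes login to the local host itself.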
3. Install java
Hadoop 3.x requires Java 8 or later; JDK 8 is recommended.
Use Xftp to upload the JDK archive to the hadoop user's home directory.
3.1 Uninstall the jdk that comes with the system
1) Check whether the system comes with jdk (usually it does)
rpm -qa | grep java
rpm -qa | grep jdk
rpm -qa | grep gcj
2) If these commands print anything, batch-uninstall the bundled packages:
rpm -qa | grep java | xargs rpm -e --nodeps
3.2 Unzip the jdk installation package
Note: uninstalling the JDK that comes with CentOS before installing JDK 8 is optional; decide for yourself after reading section 3.1.
1) Unzip the installation package in the specified directory
tar -zxvf jdk-8u162-linux-x64.tar.gz
2) Move the extracted folder jdk1.8.0_162 to /usr/local/java (installed software is usually placed under /usr/local/):
mv jdk1.8.0_162/ /usr/local/java
This moves the folder into /usr/local/ and renames it to java.
Note: do not create the /usr/local/java directory in advance; if the target already exists, mv moves jdk1.8.0_162 inside it instead of renaming it.
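The warning about not pre-creating the target matters because mv behaves differently depending on whether the destination exists. A small sketch with throwaway directories (jdk_demo is a hypothetical stand-in for jdk1.8.0_162):

```shell
# Sketch: mv renames when the target is absent, but moves INSIDE an existing target
work=$(mktemp -d) && cd "$work"
mkdir jdk_demo                 # stand-in for jdk1.8.0_162
mv jdk_demo java               # target absent: the folder is renamed to java
ls -d java                     # prints java
mkdir jdk_demo dest            # recreate it, plus a pre-existing target directory
mv jdk_demo dest               # target exists: the folder ends up at dest/jdk_demo
ls -d dest/jdk_demo            # prints dest/jdk_demo
```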
3.3 Modify environment variables
vim /etc/profile
Note: configuring system environment variables requires root permission; even though the hadoop user was granted sudo rights earlier, it cannot modify /etc/profile directly. You can run the editor with root privileges via:
sudo vim /etc/profile
1) Add JAVA_HOME at the end of the file and append JAVA_HOME's bin directory to PATH; save and exit after modifying.
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin
2) Then reload the environment variables and execute the following command
source /etc/profile
3) Test whether the configuration is successful
java -version
javac -version
Note: I am not sure what the consequences of mismatched version output would be; after configuring Hadoop, HDFS and YARN ran without any abnormality in my case.
If the reported Java versions differ and you want to make them the same, refer to the following operations:
Note: If you are prompted for permission issues, you need to switch to the root user to operate.
4. Install Hadoop
Upload hadoop-3.1.0.tar.gz to the hadoop user directory folder
4.1 Unzip the Hadoop installation package
Create a /bigdata directory under the root directory to hold the big-data software installed later, then extract the Hadoop archive into it:
mkdir /bigdata
tar -zxvf hadoop-3.1.0.tar.gz -C /bigdata/
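The -C flag is what sends the extracted tree into /bigdata/ rather than the current directory; the behaviour can be checked with a throwaway archive (pkg-1.0 is a hypothetical package):

```shell
# Sketch: tar -C extracts into the named directory instead of the cwd
work=$(mktemp -d) && cd "$work"
mkdir -p pkg-1.0 && echo hello > pkg-1.0/file   # hypothetical package contents
tar -zcf pkg.tar.gz pkg-1.0                     # build a throwaway archive
mkdir target
tar -zxf pkg.tar.gz -C target                   # extract under target/
ls target                                       # prints pkg-1.0
```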
4.2 Modify Hadoop configuration file
Go to the hadoop installation directory and view the contents of the directory
cd /bigdata/hadoop-3.1.0/etc/hadoop/
ll
Note: the files to modify, in order, are hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
4.2.1 Modify hadoop-env.sh
vim hadoop-env.sh
Add the following line:
export JAVA_HOME=/usr/local/java
4.2.2 Modify core-site.xml
core-site.xml is the core configuration file of Hadoop, which can configure the address and data storage directory of the HDFS NameNode.
vim core-site.xml
The modifications are as follows:
Note: replace hadoop in the value below with the host name you set earlier.
<configuration>
    <!-- Specify the HDFS nameservice address (the host name) and port 9000 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <!-- Specify the directory where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/bigdata/hadoop-3.1.0/data</value>
    </property>
</configuration>
4.2.3 Modify hdfs-site.xml
hdfs-site.xml is the configuration file of HDFS. Since Hadoop pseudo-distribution is currently configured on one machine, only the number of copies saved by HDFS is set to 1, that is, only one copy of data is saved.
vim hdfs-site.xml
The modifications are as follows:
<configuration>
    <!-- The HDFS replication factor is 1: only one copy of the data is kept -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
4.2.4 Modify mapred-site.xml
In mapred-site.xml, you can configure the MapReduce framework to run on the YARN resource scheduling system.
vim mapred-site.xml
The modifications are as follows:
Note: without the classpath setting below, running the MapReduce wordcount example later reports an error. The fix: run the command hadoop classpath and add its output as the value of yarn.application.classpath, as in the configuration below.
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/bigdata/hadoop-3.1.0/etc/hadoop:/bigdata/hadoop-3.1.0/share/hadoop/common/lib/*:/bigdata/hadoop-3.1.0/share/hadoop/common/*:/bigdata/hadoop-3.1.0/share/hadoop/hdfs:/bigdata/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.1.0/share/hadoop/hdfs/*:/bigdata/hadoop-3.1.0/share/hadoop/mapreduce/*:/bigdata/hadoop-3.1.0/share/hadoop/yarn:/bigdata/hadoop-3.1.0/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.1.0/share/hadoop/yarn/*</value>
    </property>
</configuration>
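The long classpath value follows a regular pattern under the Hadoop install directory. As an illustration only (in practice, paste the exact output of hadoop classpath on your machine), a string of the same shape could be assembled like this:

```shell
# Illustration only: assemble a classpath string in the spirit of `hadoop classpath`.
# The authoritative value is whatever `hadoop classpath` actually prints.
HADOOP_HOME=/bigdata/hadoop-3.1.0
cp_val="$HADOOP_HOME/etc/hadoop"
for d in common hdfs yarn mapreduce; do
  # each subproject contributes its lib jars and its own jars
  cp_val="$cp_val:$HADOOP_HOME/share/hadoop/$d/lib/*:$HADOOP_HOME/share/hadoop/$d/*"
done
echo "$cp_val"
```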
4.2.5 Modify yarn-site.xml
yarn-site.xml is the relevant file for configuring YARN
vim yarn-site.xml
The modifications are as follows:
Note: here too, hadoop in the value is the host name set earlier.
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Specify the address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop</value>
    </property>
    <!-- Specify the shuffle service used by MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
4.2.6 Add Hadoop commands to system environment variables
The purpose of adding Hadoop commands to the system environment variables is to enable Hadoop commands to be executed in any directory.
vim /etc/profile
The modifications are as follows:
# Add Java system environment variables
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin
# Add Hadoop environment variables
export HADOOP_HOME=/bigdata/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then reload the environment variables:
source /etc/profile
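To see how the PATH lines accumulate, here is a small sketch using a stand-in variable p (so the live PATH is not clobbered) and a pretend starting value:

```shell
# Sketch: how the PATH exports compose, using a stand-in variable
p=/usr/bin                                 # pretend this is PATH before /etc/profile runs
JAVA_HOME=/usr/local/java
HADOOP_HOME=/bigdata/hadoop-3.1.0
p=$p:$JAVA_HOME/bin                        # the Java line appends java's bin
p=$p:$HADOOP_HOME/bin:$HADOOP_HOME/sbin    # the Hadoop line appends bin and sbin
echo "$p"   # prints /usr/bin:/usr/local/java/bin:/bigdata/hadoop-3.1.0/bin:/bigdata/hadoop-3.1.0/sbin
```

Appending (rather than overwriting) keeps the system's existing command directories searchable.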
4.3 Initializing the Hadoop HDFS file system
hdfs namenode -format
If the output contains a line saying the storage directory "has been successfully formatted", the initialization succeeded.
4.4 Starting Hadoop
4.4.1 Start HDFS file system
If you execute the startup command start-dfs.sh as the root user, it fails with an error.
This is because Hadoop 3.x discourages starting services as the root user; starting HDFS as root can cause a series of hidden problems. If you still want to start it as root, add the following configuration at the top of start-dfs.sh in the sbin directory of the Hadoop installation:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
In the actual production environment, the root user will not be used to start HDFS and YARN, so you need to switch to the previously created hadoop user to start.
Because you need to switch to the hadoop user, you need to modify the user and group to which the Hadoop installation file belongs. The command is as follows:
chown -R hadoop:hadoop /bigdata/
Switch to hadoop user:
su hadoop
Note:
su root
requires the root user password to switch to the root user, but does not require a password to switch from the root user to other users.
Start HDFS using hadoop user
start-dfs.sh
Under normal circumstances, the namenode, datanodes, and secondary namenode all start without errors.
If the following prompt appears:
Starting namenodes on [hadoop]
hadoop: Warning: Permanently added 'hadoop,192.168.138.135' (ECDSA) to the list of known hosts.
hadoop: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting secondary namenodes [hadoop]
hadoop: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
If the above prompt appears, ssh passwordless login is not configured; go back and configure it as described in section 2.
4.4.2 Start YARN
The command to start YARN is as follows:
start-yarn.sh
Normally it reports starting the resourcemanager and nodemanagers without errors.
4.4.3 Verification
Enter the following command on the command line:
jps
The following processes should appear: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself).
Note: if the NameNode is formatted multiple times, the DataNode process may fail to appear. Delete the data directory under /bigdata/hadoop-3.1.0/ and then re-run the initialization command hdfs namenode -format.
Note: if any process is missing, check the corresponding configuration file for mistakes.
Attachment: the corresponding stop commands are
stop-dfs.sh
and stop-yarn.sh
5. Access Hadoop through the web interface
Enter 192.168.138.135:9870 in a browser (on either Linux or Windows).
Note: 192.168.138.135 is the IP address of the host, and the Hadoop services (HDFS and YARN) must be started before accessing the pages.
1) HDFS (NameNode) web UI port: 9870
2) YARN (ResourceManager) web UI port: 8088