[Big Data Training] Hadoop Development Environment Construction (1)
Level 1: Task description
The task of this level: configure the Java JDK.
Related knowledge
Configuring the development environment is the first step for us to learn an IT technology. Hadoop is developed based on Java, so we need to configure the Java development environment in the Linux system before learning Hadoop.
Download the JDK
Go to Oracle’s official website to download JDK: Click me to download JDK from Oracle’s official website
We can download it locally first, and then transfer the file to the virtual machine from Windows.
You can also copy the link address and download it directly in the Linux system, but a plainly copied link will not work: Oracle restricts downloads, and the URL must include a randomly generated token before the resource can be fetched.
So instead, click to download, pause the download, and then copy the actual link address from the browser's download manager; that address can be used to download in the Linux system.
Because the compressed package of JDK is about 200M, we have already downloaded JDK for you on the platform. You don’t need to go to Oracle’s official website to download it. If you want to install it in your own Linux system, you still need to download it.
We have placed the JDK compressed package in the /opt directory of the system, just switch to this directory on the command line.
Unzip
First, create an /app folder in the command-line window on the right; all the software we install later will go into this directory.
Command: mkdir /app
Then, switch to the /opt directory to view the provided compressed package.
You can see that we have downloaded the installation files of JDK and Hadoop for you.
Now we unzip the JDK and move it to the /app directory.
tar -zxvf jdk-8u171-linux-x64.tar.gz
mv jdk1.8.0_171/ /app
You can switch to the /app directory to view the decompressed folder.
Configure environment variables
After decompressing the JDK, you need to configure the JDK in the environment variable before it can be used. Next, configure the JDK.
Enter the command: vim /etc/profile to edit the configuration file;
Append the following code at the end of the file (take care not to introduce stray spaces).
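For reference, the lines to append are the following (a sketch assuming the JDK was unpacked to /app/jdk1.8.0_171 as above):

```shell
# Appended to the end of /etc/profile; paths assume the JDK sits in /app.
export JAVA_HOME=/app/jdk1.8.0_171
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
```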
Then, save the modified configuration file.
Save method: press Esc to leave edit mode, then type shift + : followed by wq, and press Enter to save the modified configuration file and quit.
Finally, run source /etc/profile to make the configuration take effect.
Test
Finally, we can test whether the environment variable is configured successfully.
Input: java -version
If the following interface appears, the configuration is successful.
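The exact text depends on the build, but for JDK 8u171 the output should look roughly like this:

```
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
```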
Programming requirements
Follow the above steps to complete the configuration of the Java development environment.
Note: Because the environment will be reset when the training is started next time, the best way is to pass all levels at once.
Now go ahead and start configuring the JDK.
Quick method: if you don't want to type the commands one by one, you can paste the following directly into the command line, and the configuration is complete!
mkdir /app
cd /opt
tar -zxvf jdk-8u171-linux-x64.tar.gz >/dev/null 2>&1
mv jdk1.8.0_171/ /app
echo 'export JAVA_HOME=/app/jdk1.8.0_171' >> /etc/profile
echo 'export CLASSPATH=.:$JAVA_HOME/lib/tools.jar' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
java -version
Level 2: Configuring the development environment – Hadoop installation and pseudo-distributed cluster construction
Task description
The task of this level: install and configure the Hadoop development environment.
Related knowledge
Download Hadoop
Let’s go to the official website to download: http://hadoop.apache.org/
It has already been downloaded for you on the platform (under the /opt directory), and here is just to show the download steps.
Use wget to download Hadoop:
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
In a production environment, you should also verify the integrity of the downloaded file; we skip that step here.
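For the curious, a minimal sketch of such a check (URL and file names assumed; Apache publishes a .sha512 checksum next to each release tarball, though the checksum file format has varied across releases):

```shell
# Fetch the published checksum for the release (URL assumed):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz.sha512
# If the checksum file is in sha512sum format, this verifies the download;
# otherwise compute `sha512sum hadoop-2.7.7.tar.gz` and compare by hand.
sha512sum -c hadoop-2.7.7.tar.gz.sha512
```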
Since the archive is about 300 MB, we have downloaded it for you in advance; you can see it by switching to the /opt directory.
Next, unzip the compressed package of Hadoop, and then move the unzipped files to the /app directory.
Then switch to the /app directory and rename the hadoop folder (for example, from hadoop-3.1.0 to hadoop3.1).
Tips: If there is a file decompression size limit, you can use the ulimit -f 1000000 command to lift the limit.
Configure the Hadoop environment
Set up SSH password-free login
When operating the cluster later, we will need to log in to the master and slave machines frequently, so it is necessary to set up SSH password-free login.
Enter the following code:
ssh-keygen -t rsa -P ''
This generates a passphrase-less key pair; when asked for the save path, just press Enter to accept the default. Two files are created, id_rsa and id_rsa.pub, stored in the ~/.ssh directory by default.
Next, append id_rsa.pub to the authorized keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then modify the permissions: chmod 600 ~/.ssh/authorized_keys
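Before moving on, it is worth confirming that passwordless login actually works; the following should log straight in and back out without asking for a password (the first connection may ask you to confirm the host key):

```
ssh localhost exit
```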
hadoop-env.sh configuration
The Hadoop configuration files live in /app/hadoop3.1/etc/hadoop/. First edit hadoop-env.sh there and set JAVA_HOME explicitly:
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/app/jdk1.8.0_171
yarn-env.sh configuration
Add the same JAVA_HOME setting to yarn-env.sh:
export JAVA_HOME=/app/jdk1.8.0_171
core-site.xml file configuration
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>HDFS URI: file system://namenode identifier: port number</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>Local hadoop temporary folder on the namenode</description>
    </property>
</configuration>
hdfs-site.xml file configuration
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/hadoop/hdfs/name</value>
        <description>Store hdfs namespace metadata on the namenode</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/hadoop/hdfs/data</value>
        <description>The physical storage location of the data blocks on the datanode</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
mapred-site.xml file configuration
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml configuration
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.2.10:8099</value>
        <description>This address is the MapReduce management web interface</description>
    </property>
</configuration>
Create Folder
We referenced several folder paths in the configuration files above; now let's create them. Working as the hadoop user, create the tmp, hdfs/name, and hdfs/data directories under /usr/hadoop/ by executing the following commands:
mkdir -p /usr/hadoop/tmp
mkdir /usr/hadoop/hdfs
mkdir /usr/hadoop/hdfs/data
mkdir /usr/hadoop/hdfs/name
Add Hadoop to environment variables
Enter the command: vim /etc/profile to edit the configuration file, append the following lines at the end, and then run source /etc/profile to make them take effect:
export HADOOP_HOME=/app/hadoop3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Hadoop 3 refuses to start the HDFS daemons as root unless the corresponding user variables are defined, so open start-dfs.sh and stop-dfs.sh (in /app/hadoop3.1/sbin) and add the following lines near the top, just below the shebang:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Similarly, start-yarn.sh and stop-yarn.sh need the following added near the top:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Before the first start, format the NameNode with hadoop namenode -format. Then run start-dfs.sh, and finally enter the command jps to verify; if output like the following appears, the startup was successful:
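For reference, the jps output should list the HDFS daemons along the following lines (the process IDs will differ on your machine):

```
1234 NameNode
1456 DataNode
1678 SecondaryNameNode
1890 Jps
```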
Afterwards, if your local virtual machine has a graphical interface, you can open the Firefox browser inside it and visit http://localhost:9870/, or visit http://<virtual machine IP address>:9870/ from your local Windows machine. Either way, you will reach the Hadoop management page.
Well, at this point, the Hadoop installation is complete.
Quick method: if you don't want to type the commands one by one, you can paste the following directly into the command line, and the configuration is complete! (It repeats the Level 1 JDK steps first, so it also works on a freshly reset environment.)
mkdir /app
cd /opt
tar -zxvf jdk-8u171-linux-x64.tar.gz >/dev/null 2>&1
mv jdk1.8.0_171/ /app
echo 'export JAVA_HOME=/app/jdk1.8.0_171' >> /etc/profile
echo 'export CLASSPATH=.:$JAVA_HOME/lib/tools.jar' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
java -version
tar -zxvf /opt/hadoop-3.1.0.tar.gz -C /app >/dev/null 2>&1
mv /app/hadoop-3.1.0 /app/hadoop3.1 2>/dev/null
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
echo "AuthorizedKeysFile %h/.ssh/authorized_keys" >> /etc/ssh/sshd_config
echo "export JAVA_HOME=/app/jdk1.8.0_171" >> /app/hadoop3.1/etc/hadoop/hadoop-env.sh
echo "export JAVA_HOME=/app/jdk1.8.0_171" >> /app/hadoop3.1/etc/hadoop/yarn-env.sh
# Overwrite the stock configuration files (the originals contain only an
# empty <configuration/> element plus comments) with the settings above:
cat > /app/hadoop3.1/etc/hadoop/core-site.xml <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
    </property>
</configuration>
EOF
cat > /app/hadoop3.1/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
cat > /app/hadoop3.1/etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF
cat > /app/hadoop3.1/etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.2.10:8099</value>
    </property>
</configuration>
EOF
mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/data /usr/hadoop/hdfs/name
echo "export HADOOP_HOME=/app/hadoop3.1" >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile
hadoop namenode -format
sed -i "2a\HDFS_DATANODE_USER=root" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HADOOP_SECURE_DN_USER=hdfs" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HDFS_NAMENODE_USER=root" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HDFS_SECONDARYNAMENODE_USER=root" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HDFS_DATANODE_USER=root" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\HADOOP_SECURE_DN_USER=hdfs" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\HDFS_NAMENODE_USER=root" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\HDFS_SECONDARYNAMENODE_USER=root" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\YARN_RESOURCEMANAGER_USER=root" /app/hadoop3.1/sbin/start-yarn.sh
sed -i "2a\HADOOP_SECURE_DN_USER=yarn" /app/hadoop3.1/sbin/start-yarn.sh
sed -i "2a\YARN_NODEMANAGER_USER=root" /app/hadoop3.1/sbin/start-yarn.sh
sed -i "2a\YARN_RESOURCEMANAGER_USER=root" /app/hadoop3.1/sbin/stop-yarn.sh
sed -i "2a\HADOOP_SECURE_DN_USER=yarn" /app/hadoop3.1/sbin/stop-yarn.sh
sed -i "2a\YARN_NODEMANAGER_USER=root" /app/hadoop3.1/sbin/stop-yarn.sh
start-dfs.sh
jps