[Big Data Training] – Hadoop Development Environment Construction (1)

Level 1: Task Description

The task of this level: configure the Java JDK.

Related knowledge
Configuring the development environment is the first step for us to learn an IT technology. Hadoop is developed based on Java, so we need to configure the Java development environment in the Linux system before learning Hadoop.

Download the JDK
Go to Oracle’s official website and download the JDK.

We can download it locally first and then transfer the file from Windows to the virtual machine.

You can also copy the link address and download it directly in the Linux system, but the copied link cannot be used as-is: Oracle restricts downloads, and the URL must carry a randomly generated token as a suffix before the resource can be fetched.


So start the download in the browser, pause it, and copy the link address from the download manager; that address can then be used to download the file in the Linux system.

Because the JDK archive is about 200 MB, we have already downloaded it for you on the platform, so you do not need to fetch it from Oracle’s website. If you want to install it on your own Linux system, you will still have to download it yourself.

We have placed the JDK archive in the /opt directory of the system; just switch to that directory on the command line.

Unzip

First, create an /app folder in the command-line window on the right; all of our subsequent software will be installed in this directory.
Command: mkdir /app

Then, switch to the /opt directory to view the provided compressed package.
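For example, using the same commands as the one-click script at the end of this level:

cd /opt
ll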

You can see that we have downloaded the installation files of JDK and Hadoop for you.

Now we unzip the JDK and move it to the /app directory.

tar -zxvf jdk-8u171-linux-x64.tar.gz
mv jdk1.8.0_171/ /app

You can switch to the /app directory to view the decompressed folder.

Configure environment variables
After decompressing the JDK, you need to add it to the environment variables before it can be used. Let’s configure that now.
Enter the command vim /etc/profile to edit the configuration file.

Enter the following code at the end of the file (be careful not to introduce extra spaces).
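These are the same settings that the one-click script at the end of this level writes:

export JAVA_HOME=/app/jdk1.8.0_171
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH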

Then, save the modified configuration file.
Save method: press Esc to leave insert mode, type :wq, and press Enter to write the changes and quit.

Finally, run source /etc/profile to make the configuration take effect.
Test
Finally, we can test whether the environment variables are configured successfully.
Enter java -version. If output like the following appears, the configuration is successful.
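The exact output depends on the JDK build, but it should look roughly like this:

java version "1.8.0_171"
Java(TM) SE Runtime Environment (build ...)
Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)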
Programming Requirements
Follow the above steps to complete the configuration of the Java development environment.

Note: the environment is reset every time the training is restarted, so the best approach is to pass all of the levels in one session.

Now go ahead and start configuring the JDK.

Quick-and-dirty method: if you don’t want to type the commands yourself, copy the following block straight into the command line and the configuration is done!

mkdir /app
cd /opt
ll
tar -zxvf jdk-8u171-linux-x64.tar.gz >/dev/null 2>&1
mv jdk1.8.0_171/ /app
echo "export JAVA_HOME=/app/jdk1.8.0_171" >> /etc/profile
echo "export CLASSPATH=.:\$JAVA_HOME/lib/tools.jar" >> /etc/profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /etc/profile
source /etc/profile
java -version


Level 2: Configuring the Development Environment – Hadoop Installation and Pseudo-Distributed Cluster Construction

Task Description
The task of this level: install and configure the Hadoop development environment.

Related Knowledge
Download Hadoop
Let’s go to the official website to download: http://hadoop.apache.org/
It has already been downloaded for you on the platform (under the /opt directory); the steps below just show how you would download it yourself.

Use wget to download Hadoop:

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

In a production environment you should verify the integrity of the downloaded file; we skip that step here.
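If you do want to check it, a minimal sketch is to compute the SHA-512 digest and compare it against the checksum published on the Apache download page for the release:

sha512sum hadoop-2.7.7.tar.gz
# compare the printed digest with the official checksum for this release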

Since the archive is about 300 MB, we have downloaded it for you in advance; you can see it by switching to the /opt directory.

Next, unzip the Hadoop archive and move the extracted files to the /app directory.
Then switch to the /app directory and rename the hadoop folder.
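Based on the one-click script at the end of this level (the archive provided in /opt is hadoop-3.1.0), the commands look like this:

cd /opt
tar -zxvf hadoop-3.1.0.tar.gz -C /app
cd /app
mv hadoop-3.1.0/ hadoop3.1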

Tip: if you hit a file size limit while decompressing, you can lift it with the ulimit -f 1000000 command.

Configure the Hadoop environment

Set up SSH password-free login

When operating the cluster later, we will need to log in to the master and slave nodes frequently, so it is necessary to set up SSH password-free login.
Enter the following code:

ssh-keygen -t rsa -P ''

This generates a password-less key pair; when asked for the save path, simply press Enter to accept the default. The key pair, id_rsa and id_rsa.pub, is stored in the ~/.ssh directory by default.

Next, append id_rsa.pub to the authorized keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then modify the permissions: chmod 600 ~/.ssh/authorized_keys
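Before moving on, you can confirm that password-free login works (the very first connection may ask you to confirm the host key):

ssh localhost
# should log in without prompting for a password; type exit to return to the original shell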



hadoop-env.sh configuration

Edit /app/hadoop3.1/etc/hadoop/hadoop-env.sh and set JAVA_HOME in it:

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/app/jdk1.8.0_171

yarn-env.sh configuration

Edit /app/hadoop3.1/etc/hadoop/yarn-env.sh and set JAVA_HOME there as well:

export JAVA_HOME=/app/jdk1.8.0_171

core-site.xml file configuration

Edit /app/hadoop3.1/etc/hadoop/core-site.xml and add the following configuration:
<configuration>
 <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>HDFS URI, file system://namenode identifier: port number</description>
</property>
  
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
    <description>Local hadoop temporary folder on namenode</description>
</property>
</configuration>

hdfs-site.xml file configuration

Edit /app/hadoop3.1/etc/hadoop/hdfs-site.xml and add the following configuration:
<configuration>
<property>
    <name>dfs.name.dir</name>
    <value>/usr/hadoop/hdfs/name</value>
    <description>Store hdfs namespace metadata on namenode</description>
</property>
  
<property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop/hdfs/data</value>
    <description>The physical storage location of the data block on the datanode</description>
</property>
  
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
</configuration>

mapred-site.xml file configuration

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml configuration

<configuration>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.2.10:8099</value>
        <description>This address is the mr management interface</description>
</property>
</configuration>

Create Folder
We referenced several folder paths in the configuration files; now let’s create them. Working as the hadoop user, create the tmp, hdfs/name, and hdfs/data directories under /usr/hadoop/ by executing the following commands:

mkdir -p /usr/hadoop/tmp
mkdir /usr/hadoop/hdfs
mkdir /usr/hadoop/hdfs/data
mkdir /usr/hadoop/hdfs/name
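Equivalently, a single mkdir -p call creates all three directories at once:

mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data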

Add Hadoop to environment variables
vim /etc/profile

Append the following at the end of the file, then run source /etc/profile so it takes effect (these are the same lines the one-click script writes):

export HADOOP_HOME=/app/hadoop3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Hadoop 3.x refuses to start the HDFS daemons as the root user unless the running users are declared, so start-dfs.sh and stop-dfs.sh under /app/hadoop3.1/sbin also need the following added near the top:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Similarly, start-yarn.sh and stop-yarn.sh need the following added at the top:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Before the first start, format the NameNode with hadoop namenode -format. Then run start-dfs.sh, and finally enter the command jps; if output like the following appears, the startup was successful:
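jps lists each running daemon (with its process ID in front); after start-dfs.sh you should see at least:

NameNode
DataNode
SecondaryNameNode
Jps

ResourceManager and NodeManager will appear as well once YARN has been started with start-yarn.sh.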

Afterwards, if your virtual machine has a graphical interface, you can open the Firefox browser inside it and visit http://localhost:9870/, or enter http://virtual machine ip address:9870/ on your local Windows machine, to reach the Hadoop management page.
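As a quick smoke test (assuming the daemons are up), you can also talk to HDFS from the command line; the /user/root path here is just an example:

hdfs dfs -mkdir -p /user/root
hdfs dfs -ls /
# the second command lists the HDFS root directory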

Well, at this point, the Hadoop installation is complete.

Quick-and-dirty method: if you don’t want to type the commands yourself, copy the following block straight into the command line and the configuration is done!


mkdir /app
cd /opt
ll
tar -zxvf jdk-8u171-linux-x64.tar.gz >/dev/null 2>&1
mv jdk1.8.0_171/ /app
echo "export JAVA_HOME=/app/jdk1.8.0_171" >> /etc/profile
echo "export CLASSPATH=.:\$JAVA_HOME/lib/tools.jar" >> /etc/profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /etc/profile
source /etc/profile
java -version




tar -zxvf /opt/hadoop-3.1.0.tar.gz -C /app >/dev/null 2>&1
mv /app/hadoop-3.1.0 /app/hadoop3.1 2>/dev/null
mkdir -p ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

echo "AuthorizedKeysFile %h/.ssh/authorized_keys" >> /etc/ssh/sshd_config
echo "export JAVA_HOME=/app/jdk1.8.0_171" >> /app/hadoop3.1/etc/hadoop/hadoop-env.sh
echo "export JAVA_HOME=/app/jdk1.8.0_171" >> /app/hadoop3.1/etc/hadoop/yarn-env.sh


sed -i 's|<configuration>||g' /app/hadoop3.1/etc/hadoop/core-site.xml
sed -i 's|</configuration>||g' /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<configuration>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<name>fs.default.name</name>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<value>hdfs://localhost:9000</value>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<description>HDFS URI, file system://namenode identifier: port number</description>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<name>hadoop.tmp.dir</name>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<value>/usr/hadoop/tmp</value>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "<description>Local hadoop temporary folder on namenode</description>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/core-site.xml
echo "</configuration>" >> /app/hadoop3.1/etc/hadoop/core-site.xml

sed -i 's|<configuration>||g' /app/hadoop3.1/etc/hadoop/hdfs-site.xml
sed -i 's|</configuration>||g' /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<configuration>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<name>dfs.name.dir</name>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<value>/usr/hadoop/hdfs/name</value>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<description>Store hdfs namespace metadata on namenode</description>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<name>dfs.data.dir</name>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<value>/usr/hadoop/hdfs/data</value>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<description>The physical storage location of the data block on the datanode</description>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<name>dfs.replication</name>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "<value>1</value>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml
echo "</configuration>" >> /app/hadoop3.1/etc/hadoop/hdfs-site.xml

sed -i 's|<configuration>||g' /app/hadoop3.1/etc/hadoop/mapred-site.xml
sed -i 's|</configuration>||g' /app/hadoop3.1/etc/hadoop/mapred-site.xml
echo "<configuration>" >> /app/hadoop3.1/etc/hadoop/mapred-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/mapred-site.xml
echo "<name>mapreduce.framework.name</name>" >> /app/hadoop3.1/etc/hadoop/mapred-site.xml
echo "<value>yarn</value>" >> /app/hadoop3.1/etc/hadoop/mapred-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/mapred-site.xml
echo "</configuration>" >> /app/hadoop3.1/etc/hadoop/mapred-site.xml

sed -i 's|<configuration>||g' /app/hadoop3.1/etc/hadoop/yarn-site.xml
sed -i 's|</configuration>||g' /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<configuration>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<name>yarn.nodemanager.aux-services</name>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<value>mapreduce_shuffle</value>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<property>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<name>yarn.resourcemanager.webapp.address</name>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<value>192.168.2.10:8099</value>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "<description>This address is the mr management interface</description>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "</property>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml
echo "</configuration>" >> /app/hadoop3.1/etc/hadoop/yarn-site.xml



mkdir -p /usr/hadoop/tmp
mkdir /usr/hadoop/hdfs
mkdir /usr/hadoop/hdfs/data
mkdir /usr/hadoop/hdfs/name

echo "export HADOOP_HOME=/app/hadoop3.1" >> /etc/profile
echo "export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin" >> /etc/profile

source /etc/profile

hadoop namenode -format


sed -i "2a\HDFS_DATANODE_USER=root" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HADOOP_SECURE_DN_USER=hdfs" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HDFS_NAMENODE_USER=root" /app/hadoop3.1/sbin/start-dfs.sh
sed -i "2a\HDFS_SECONDARYNAMENODE_USER=root" /app/hadoop3.1/sbin/start-dfs.sh


sed -i "2a\HDFS_DATANODE_USER=root" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\HADOOP_SECURE_DN_USER=hdfs" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\HDFS_NAMENODE_USER=root" /app/hadoop3.1/sbin/stop-dfs.sh
sed -i "2a\HDFS_SECONDARYNAMENODE_USER=root" /app/hadoop3.1/sbin/stop-dfs.sh

sed -i "2a\YARN_RESOURCEMANAGER_USER=root" /app/hadoop3.1/sbin/stop-yarn.sh
sed -i "2a\HADOOP_SECURE_DN_USER=yarn" /app/hadoop3.1/sbin/stop-yarn.sh
sed -i "2a\YARN_NODEMANAGER_USER=root" /app/hadoop3.1/sbin/stop-yarn.sh

sed -i "2a\YARN_RESOURCEMANAGER_USER=root" /app/hadoop3.1/sbin/start-yarn.sh
sed -i "2a\HADOOP_SECURE_DN_USER=yarn" /app/hadoop3.1/sbin/start-yarn.sh
sed -i "2a\YARN_NODEMANAGER_USER=root" /app/hadoop3.1/sbin/start-yarn.sh

  
start-dfs.sh
start-yarn.sh

jps



If anything is still unclear, please leave a message in the comment area!