Install Hadoop and configure Hue

0. Description

At the initial stage of learning big data, I built a corresponding cluster environment myself in order to understand the functions, configuration, and principles of the components by setting them up by hand.
In day-to-day practice I mostly use Docker to spin up environments quickly.
Here is a record of my process of building Hadoop.

1. Download hadoop

Download address: Apache Hadoop

wget https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz

# Unzip
tar -zxvf hadoop-2.10.1.tar.gz

# Copy to the /usr/hadoop/ directory
# sudo mkdir /usr/hadoop/
# cp -r hadoop-2.10.1 /usr/hadoop/

# Add HADOOP_HOME to the environment
sudo vim /etc/profile
# Add the following content, save and exit
#HADOOP_HOME
export HADOOP_HOME=/home/airwalk/bigdata/soft/hadoop-2.10.1
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=$HADOOP_HOME/sbin:$PATH


# Make the changes take effect
source /etc/profile

# test
hdfs version

# The results are as follows:
airwalk@svr43:/usr/hadoop/hadoop-2.10.1$ hdfs version
Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650
This command was run using /home/airwalk/bigdata/soft/hadoop-2.10.1/share/hadoop/common/hadoop-common-2.10.1.jar


# test
cd bigdata/soft/hadoop-2.10.1
mkdir input
cp etc/hadoop/* input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar grep input output '[a-z.]+'
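To check that the example job actually ran, list the generated output directory (a quick sketch; note that the output directory must not exist before the job is submitted):

# View the result of the grep example
cat output/*
# Remove it before re-running, otherwise the job fails because the output directory already exists
rm -r output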

At this point, Hadoop is successfully installed on a single machine.

2. Configuration

|      | svr43              | server42                     | server37                    |
|------|--------------------|------------------------------|-----------------------------|
| hdfs | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| yarn | NodeManager        | ResourceManager, NodeManager | NodeManager                 |
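The nodes refer to each other by hostname, so every machine must be able to resolve svr43, server42, and server37. A minimal /etc/hosts sketch (only 192.168.0.43 and 192.168.0.42 appear later in this post; the entry for server37 is a placeholder):

# /etc/hosts on every node
192.168.0.43    svr43
192.168.0.42    server42
<ip-of-server37>    server37    # placeholder: use the real address of server37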

0: Password-free login configuration

  • Generate the key on the svr43 machine (namenode node of hdfs)
cd ~/.ssh
## Just keep pressing Enter at the prompts
ssh-keygen -t rsa

## Then execute in this directory
ssh-copy-id server42
ssh-copy-id server37
# This node also needs passwordless login to itself
ssh-copy-id svr43

  • Generate a key on the server42 machine as well and copy it to the other nodes for passwordless login, because this node is YARN's ResourceManager.
cd ~/.ssh
## Just keep pressing Enter at the prompts
ssh-keygen -t rsa

## Then execute in this directory
# This node also needs passwordless login to itself
ssh-copy-id server42
ssh-copy-id server37
ssh-copy-id svr43
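A quick way to confirm that passwordless login works from the current node (a simple check, not from the original post):

# Each command should print the remote hostname without asking for a password
ssh svr43 hostname
ssh server42 hostname
ssh server37 hostname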

!! Note: the following warning may appear

airwalk@server42:~/.ssh$ ssh svr43
Warning: the ECDSA host key for 'svr43' differs from the key for the IP address '192.168.0.43'
Offending key for IP in /home/airwalk/.ssh/known_hosts:3
Matching host key in /home/airwalk/.ssh/known_hosts:11
Are you sure you want to continue connecting (yes/no)? yes
Welcome to Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-142-generic x86_64)

# Solution: remove the stale host key for that IP from known_hosts
ssh-keygen -R 192.168.0.43
  • Generate the root key on the svr43 machine (namenode node of hdfs)
# Switch to the root account
sudo su root
cd /root/.ssh
ssh-keygen -t rsa
ssh-copy-id server42
ssh-copy-id server37
ssh-copy-id svr43

1: Configure core-site.xml

This sets the default file system and the location for temporary files. Be careful not to place the temporary directory on a disk that is too small. The following directory is used here:

/home/airwalk/bigdata/soft/hadoop-2.10.1/data/tmp
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.43:9000</value>
<!-- The IP is configured directly here; the hostname would also work: -->
<!-- <value>hdfs://svr43:9000</value> -->
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/home/airwalk/bigdata/soft/hadoop-2.10.1/data/tmp</value>
</property>
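These <property> blocks (and the ones in the following sections) go inside the <configuration> element of the corresponding file under etc/hadoop. After saving, a quick sanity check of the value Hadoop actually resolves (a sketch):

# Should print hdfs://192.168.0.43:9000
hdfs getconf -confKey fs.defaultFS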

2: hdfs configuration file

Configure hadoop-env.sh

echo $JAVA_HOME
vim hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Configure hdfs-site.xml

Because the cluster has 3 nodes, the replication factor here is set to 3.

 <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Specify the host for the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>server37:50090</value>
    </property>

3: yarn configuration

Configure yarn-env.sh

vim yarn-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Configure yarn-site.xml

 <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
<!-- Specify the ResourceManager host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- The IP is configured directly here; the hostname would also work: -->
    <value>192.168.0.42</value>
    <!-- <value>server42</value> -->
</property>

4: MapReduce configuration

Configure mapred-env.sh

vim mapred-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Configure mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
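The configuration files must be kept identical on all three nodes. One way to distribute them (a sketch, assuming Hadoop sits at the same path on every machine and rsync is installed) is to sync etc/hadoop from svr43:

cd /home/airwalk/bigdata/soft/hadoop-2.10.1
rsync -av etc/hadoop/ server42:/home/airwalk/bigdata/soft/hadoop-2.10.1/etc/hadoop/
rsync -av etc/hadoop/ server37:/home/airwalk/bigdata/soft/hadoop-2.10.1/etc/hadoop/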

3. Start

  • On the NameNode node, format the file system (only needed once; reformatting later will cause clusterID mismatches between the NameNode and DataNodes)
hdfs namenode -format
  • Start namenode
# From $HADOOP_HOME
./sbin/hadoop-daemon.sh start namenode
  • Start datanode
# Start all three machines
./sbin/hadoop-daemon.sh start datanode
  • Stop
./sbin/hadoop-daemon.sh stop namenode
# Stop the datanode on all three machines
./sbin/hadoop-daemon.sh stop datanode
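Whether a start or stop actually took effect can be verified with jps on each machine (a quick sketch):

# Lists the running Hadoop daemon processes on this node
jps
# While the daemons are up, svr43 should show NameNode and DataNode (plus Jps itself)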

4. Cluster startup

1: Configure slaves; this must be done on all nodes

cd /home/airwalk/bigdata/soft/hadoop-2.10.1/etc/hadoop
vim slaves
# Add the slave hostnames; spaces and blank lines are not allowed
svr43
server42
server37

2: Start hdfs cluster

This automatically starts all of the cluster's NameNode and DataNode processes.

# Execute the following command on the HDFS NameNode node (svr43)
./sbin/start-dfs.sh
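To confirm that all three DataNodes registered with the NameNode (a sketch):

# Should report three live datanodes
hdfs dfsadmin -report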

3: Start yarn

# Must be run on the ResourceManager node (server42)
./sbin/start-yarn.sh

starting yarn daemons
resourcemanager running as process 10206. Stop it first.
server42: nodemanager running as process 10550. Stop it first.
svr43: starting nodemanager, logging to /home/airwalk/bigdata/soft/hadoop-2.10.1/logs/yarn-airwalk-nodemanager-svr43.out
server37: starting nodemanager, logging to /home/airwalk/bigdata/soft/hadoop-2.10.1/logs/yarn-airwalk-nodemanager-server37.out
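Once YARN is up, the registered NodeManagers can be listed from any node (a sketch):

# Should show three nodes in RUNNING state
yarn node -list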

5. View

The NameNode web UI:

http://192.168.0.43:50070/
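YARN's ResourceManager also has a web UI, by default on port 8088; in this layout that would be on server42 (an assumption based on Hadoop defaults, not something given in the original post):

http://192.168.0.42:8088/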

6. Configure hue

Hadoop configuration file modification

hdfs-site.xml

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
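With WebHDFS enabled and HDFS restarted, the REST endpoint can be tested directly against the NameNode address used in this setup (a sketch):

# Should return a JSON FileStatuses listing of /
curl "http://192.168.0.43:50070/webhdfs/v1/?op=LISTSTATUS"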

core-site.xml

<property>
  <name>hadoop.proxyuser.airwalk.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.airwalk.groups</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

httpfs-site.xml configuration

<!-- Hue HttpFS proxy settings for user airwalk -->
<property>
    <name>httpfs.proxyuser.airwalk.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.airwalk.groups</name>
    <value>*</value>
</property>

HUE configuration file (hue.ini) modification. The values below are from the referenced article; adjust fs_defaultfs, webhdfs_url, and the Hadoop paths to match your own cluster (here that would be hdfs://192.168.0.43:9000, http://192.168.0.43:50070/webhdfs/v1, and /home/airwalk/bigdata/soft/hadoop-2.10.1).

[[hdfs_clusters]]
  [[[default]]]
    fs_defaultfs=hdfs://mycluster
    webhdfs_url=http://node1:50070/webhdfs/v1
    hadoop_bin=/usr/hadoop-2.5.1/bin
    hadoop_conf_dir=/usr/hadoop-2.5.1/etc/hadoop

Start HDFS and restart Hue.

Solution (if Hue reports permission errors when accessing HDFS):

1. Turn off HDFS permission checking (insecure, but acceptable for a learning environment)

hdfs-site.xml

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
# Run Hue in Docker, mounting the Hadoop configuration directory into the container
docker run -tid --name hue88 -p 8888:8888 -v /home/airwalk/bigdata/soft/hadoop-2.10.1/etc/hadoop:/etc/hadoop gethue/hue:latest

# Copy the edited hue.ini into the container and restart it
docker cp hue.ini hue88:/usr/share/hue/desktop/conf/
docker restart hue88

# Enter the container as root if extra packages need to be installed
docker exec -it --user root <container id> /bin/bash

# Build dependencies (these are the package names listed in the Hue docs for CentOS/RHEL;
# on Ubuntu/Debian the corresponding -dev packages are needed instead, see the reference below)
sudo apt-get install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel rsync


# Author: One punch hurts so much
# Link: https://www.jianshu.com/p/a80ec32afb27
# Source: Jianshu

# Reference documentation
[Install :: Hue SQL Assistant Documentation (gethue.com)](https://docs.gethue.com/administrator/installation/install/)
# After installing all dependencies, then
# /home/airwalk/bigdata/soft/hue is the directory you want to install
sudo PREFIX=/home/airwalk/bigdata/soft/hue make install
  • python3.8 installation
https://blog.csdn.net/qq_39779233/article/details/106875184
  • Install npm
sudo apt install npm
npm install --unsafe-perm=true --allow-root

  • Install node
# Select the repository for the Node.js version you want. This uses 10.x; for other versions just change the number, e.g. 12.x (note the trailing .x).
curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -

# Install the corresponding nodejs package
sudo apt-get install -y nodejs

# Check
node --version

  • Problem
# gyp ERR! stack Error: EACCES: permission denied, mkdir - solution below
# npm refuses to run some install scripts as the root user and automatically drops to an ordinary user; setting --unsafe-perm lets them run as the current user

sudo npm i --unsafe-perm

# Then execute the following command with root permissions
PREFIX=/home/airwalk/bigdata/soft/hue make install

Compilation and installation successful!!!