Hadoop and Hive installation

1. Tools

Linux system: CentOS 7.0 or later
JDK: 1.8
Hadoop: 3.1.3
Hive: 3.1.2
Virtual machine: VMware
MySQL: 5.7.11

Tool download address: https://pan.baidu.com/s/10J_1w1DW9GQC7NOYw5fwvg?pwd=0kdr
Extraction code: 0kdr

Tip: the steps below are provided for reference.

2. JDK installation

Download the jdk-8u181-linux-x64.tar.gz package and upload this package to the /opt directory.

cd /opt
# Unzip the installation package
tar -zxvf jdk-8u181-linux-x64.tar.gz
# Delete the installation package
rm -f jdk-8u181-linux-x64.tar.gz

Use root privileges to edit the profile file and set environment variables

vi /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_181
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

Make the modified files take effect

source /etc/profile
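
You can confirm that the JDK is on the PATH with:

java -version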

3. MySQL installation

Download the mysql-5.7.11-linux-glibc2.5-x86_64.tar.gz package, upload this package to the /opt directory, and rename it to mysql.

cd /opt

tar -xzvf mysql-5.7.11-linux-glibc2.5-x86_64.tar.gz

mv mysql-5.7.11-linux-glibc2.5-x86_64 mysql

First check whether the user group exists

groups mysql

Create user groups and usernames

groupadd mysql && useradd -r -g mysql mysql

Grant file data directory permissions

chown mysql:mysql -R /opt/mysql/data

Modify the /etc/my.cnf configuration file (create it if it does not exist).

vi /etc/my.cnf
[mysqld]
port=3306
user=mysql
basedir=/opt/mysql/
datadir=/opt/mysql/data
socket=/tmp/mysql.sock
symbolic-links=0

[mysqld_safe]
log-error=/opt/mysql/data/mysql.log
pid-file=/opt/mysql/data/mysql.pid

[client]
port=3306
default-character-set=utf8

Initialize mysql service

cd /opt/mysql/bin

The initialization command prints a temporary root password, so note it down. Some people get errors at this step because libaio is not installed, so install it first.

yum install libaio -y
./mysqld --defaults-file=/etc/my.cnf --user=mysql --initialize

Start mysql

cp /opt/mysql/support-files/mysql.server /etc/init.d/mysql
service mysql start

Enter directory

cd /opt/mysql/bin

Log in with the temporary password from the initialization step (copy and paste it directly)

./mysql -u root -p

Change the password. I set it to root here; choose your own as needed.

alter user 'root'@'localhost' identified with mysql_native_password BY 'root';

Refresh to make the operation effective

flush privileges;

Allow root to connect from any host

use mysql;
update user set host='%' where user = 'root';
flush privileges;

Quit the MySQL client

exit

Test the connection

The IP of my virtual machine is 192.168.19.10
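
As a quick check you can connect from another machine on the same network (assuming a mysql client is installed there):

mysql -h 192.168.19.10 -P 3306 -u root -p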

Some people fail to connect because the firewall has not opened the port. There are two options: turn off the firewall or open the port.

Turn off firewall

systemctl stop firewalld

open port

firewall-cmd --zone=public --add-port=3306/tcp --permanent

After opening the port, reload the firewall for the change to take effect.

firewall-cmd --reload

Set MySQL to start at boot

Add to service list

chkconfig --add mysql

View list

chkconfig --list

Runlevels 2 through 5 should show "on"; if they do not, run the following command

chkconfig --level 2345 mysql on

Add MySQL to the system PATH

vi /etc/profile
export PATH=/opt/mysql/bin:$PATH
source /etc/profile

4. Hadoop installation

The installation steps are the same as for the JDK: upload the downloaded archive to /opt and extract it there, as sketched below. The most important part is the configuration; as long as the paths in the configuration files are correct, everything else follows.
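
A minimal sketch of the upload-and-extract step, assuming the archive is named hadoop-3.1.3.tar.gz and has already been copied to /opt:

cd /opt
tar -xzvf hadoop-3.1.3.tar.gz

Then set the environment variables: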

vi /etc/profile
export HADOOP_HOME=/opt/hadoop-3.1.3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

After installation, you can run the hadoop version command in the terminal to verify:
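
hadoop version

The first line of the output should report Hadoop 3.1.3.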

Hadoop distributed configuration

Create the working directories in the terminal:

mkdir /opt/hadoop-3.1.3/tmp
mkdir -p /opt/hadoop-3.1.3/data/namenode
mkdir -p /opt/hadoop-3.1.3/data/datanode

Then change into /opt/hadoop-3.1.3/etc/hadoop; all of the files modified below live in this directory.

cd /opt/hadoop-3.1.3/etc/hadoop

Configure core-site.xml: Enter vi core-site.xml to open the file and add
(I use three virtual machines in fully distributed mode; kingssm is the host name of this machine, and the other two are slave1 and slave2.)

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://kingssm:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-3.1.3/tmp</value>
  </property>
</configuration>

Configure hdfs-site.xml: Enter vi hdfs-site.xml to open the file and add

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop-3.1.3/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop-3.1.3/data/datanode</value>
    </property>
    <property>
         <name>dfs.permissions</name>
         <value>false</value>
    </property>
</configuration>


Configure mapred-site.xml: Enter vi mapred-site.xml to open the file and add

<configuration>
   <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
   </property>
   <property>
        <name>mapred.job.tracker</name>
        <value>kingssm:9001</value>
   </property>
</configuration>

Configure yarn-site.xml: Enter vi yarn-site.xml to open the file and add

<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
  </property>
  <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>kingssm</value>
  </property>
  <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
  </property>
  <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
  </property>
</configuration>

Configure hadoop-env.sh: Enter vi hadoop-env.sh to open the file and add

export JAVA_HOME=/opt/jdk1.8.0_181
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=$PATH:/opt/hadoop-3.1.3/bin
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export HADOOP_PID_DIR=/opt/hadoop-3.1.3/pids

Configure yarn-env.sh: Enter vi yarn-env.sh to open the file and add

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Configure workers: Enter vi workers to open the file and add the host names of all worker nodes, one per line; replace them with your own. (kingssm is the host name of the virtual machine currently being configured; slave1 and slave2 are the two virtual machines that will be cloned later, and their host names and IP addresses are set inside those machines after cloning.) An example for this setup is shown below.
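
For this three-node setup the workers file would contain, for example:

kingssm
slave1
slave2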

Enter cd /opt/hadoop-3.1.3/sbin/ in the terminal to enter the new directory
Configure start-dfs.sh: Enter vi start-dfs.sh to open the file and add

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Configure stop-dfs.sh: Enter vi stop-dfs.sh to open the file and add

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Turn off firewall

systemctl stop firewalld
systemctl disable firewalld

Modify hostname

# View host name
hostname

# Modify hostname
hostnamectl --static set-hostname kingssm

Set a static IP
Enter ip route in the terminal to view the gateway.

Enter vi /etc/sysconfig/network-scripts/ifcfg-ens33 to modify the file, changing or adding the content below. Choose the IP address yourself, but make sure it matches the gateway: for example, if the gateway is 192.168.12.128, the IP address must start with 192.168.12 and the last octet is up to you. DNS1 is the same as the gateway, and the subnet mask is 255.255.255.0.
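
A minimal sketch of the file using those example values (the addresses are illustrative and must be adapted to your own network):

TYPE=Ethernet
BOOTPROTO=static
DEVICE=ens33
NAME=ens33
ONBOOT=yes
IPADDR=192.168.12.10
NETMASK=255.255.255.0
GATEWAY=192.168.12.128
DNS1=192.168.12.128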

Add host mappings between the virtual machines
Enter vi /etc/hosts in the terminal and add an entry for each machine
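
For example, assuming the three machines use the addresses below (replace them with your own):

192.168.12.10 kingssm
192.168.12.11 slave1
192.168.12.12 slave2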

SSH password-free login
First run

ssh localhost

Normally you can log in without a password. If you are still asked for one, ssh is not configured correctly. Note that OpenSSH 7.0 and later disable DSA key authentication by default, so if your key was generated with DSA, regenerate it with RSA.

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Run again

ssh localhost

If no password is required, ssh is configured correctly. Next, generate a key pair (press Enter through the prompts):

ssh-keygen -t rsa

When the prompt returns, distribute the public key to each node:

ssh-copy-id kingssm
ssh-copy-id slave1
ssh-copy-id slave2

5. Clone the virtual machine and start the cluster

Shut down the kingssm virtual machine currently in use, then clone it twice.

In VMware, right-click the virtual machine, then choose Manage -> Clone -> Full Clone.

After the cloning is completed, open all three virtual machines, then set the host names slave1 and slave2 respectively for the two cloned machines, and modify the IP addresses.

Start the cluster
All three virtual machines need to be formatted first

Open a terminal and operate as root. On each of the three machines, enter hadoop namenode -format to format.

After the formatting is completed, start the cluster in kingssm and enter start-all.sh to start the cluster (if it is closed, enter stop-all.sh)

After startup, enter jps to check the status; kingssm and the slaves should show processes similar to the sketch below.
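
Roughly, the expected processes are (the exact list depends on your configuration):

# on kingssm
NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager
Jps

# on slave1 and slave2
DataNode
NodeManager
Jps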

Visit the web page to view the results: http://kingssm:9870 (HDFS NameNode UI)

Visit the web page to view the results: http://kingssm:8088 (YARN ResourceManager UI)

6. Hive installation

Modify Hadoop's core-site.xml (/opt/hadoop-3.1.3/etc/hadoop/core-site.xml) and add the following configuration items:

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

Download the package apache-hive-3.1.2-bin.tar.gz, upload it to the /opt directory, unzip it, and rename it to hive.

cd /opt
tar -xzvf apache-hive-3.1.2-bin.tar.gz
mv apache-hive-3.1.2-bin hive

Modify hive’s environment configuration file: hive-env.sh

cd /opt/hive/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh

Modify the content:

# Configure hadoop home directory
HADOOP_HOME=/opt/hadoop-3.1.3/
#Configure the path to the hive configuration file
export HIVE_CONF_DIR=/opt/hive/conf/
# Configure hive's lib directory
export HIVE_AUX_JARS_PATH=/opt/hive/lib/

Create configuration file

cd /opt/hive/conf/
vi hive-site.xml

Copy the following content into the configuration file

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://kingssm:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        </property>
 
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>root</value>
        </property>
        <property>
                <name>hive.metastore.warehouse.dir</name>
                <value>/user/hive/warehouse</value>
        </property>
        <property>
                <name>hive.metastore.schema.verification</name>
                <value>false</value>
        </property>
        <property>
                <name>hive.metastore.event.db.notification.api.auth</name>
                <value>false</value>
        </property>
        <property>
                <name>hive.cli.print.current.db</name>
                <value>true</value>
        </property>
        <!-- Remote mode: metastore service address -->
        <property>
                <name>hive.metastore.uris</name>
                <value>thrift://kingssm:9083</value>
        </property>
        <property>
                <name>hive.cli.print.header</name>
                <value>true</value>
        </property>
        <property>
                <name>hive.server2.thrift.bind.host</name>
                <value>kingssm</value>
        </property>
        <property>
                <name>hive.server2.thrift.port</name>
                <value>10000</value>
        </property>
</configuration>

Add Hive to the environment variables:

vi /etc/profile
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile

Connect to MySQL, username root, password root

mysql -uroot -proot

Create the database for the Hive metastore; the database name must match the one configured in hive-site.xml.
Create a database named metastore:

create database metastore;
show databases;

Initialize the metastore schema

schematool -initSchema -dbType mysql -verbose

Seeing schemaTool completed indicates that the initialization was successful.

Verify installation

hive

Exit the Hive CLI

quit;

Errors you may encounter and how to fix them:

Hadoop's slf4j binding conflicts with Hive's

Delete /opt/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar

The guava.jar versions shipped with Hadoop and Hive are inconsistent

Keep the higher version: delete the lower-version guava jar (the one in Hive's lib directory) and copy the higher-version jar from Hadoop into Hive's lib, as sketched below.
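
A sketch of both fixes; the guava jar versions shown are the ones these releases usually ship with, so check the actual file names in your lib directories first:

# remove Hadoop's duplicate slf4j binding
rm /opt/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar

# keep the newer guava: remove Hive's old jar and copy Hadoop's into Hive's lib
rm /opt/hive/lib/guava-19.0.jar
cp /opt/hadoop-3.1.3/share/hadoop/common/lib/guava-27.0-jre.jar /opt/hive/lib/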

Create HDFS hive-related directories

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse

Start hive service: metastore

Start the metastore service first:
Foreground startup:

cd /opt/hive/bin
hive --service metastore

Note: when started in the foreground, the terminal stays occupied and cannot be used for other work.
Benefit: starting in the foreground first lets you watch whether the metastore service comes up cleanly.
Exit the foreground with Ctrl+C.

Background startup:
Once the foreground startup works, exit it and start the service in the background instead.

cd /opt/hive/bin
nohup hive --service metastore &

After startup, run jps and check whether a RunJar process appears; if it does, the service is up (it is recommended to wait about a minute and check a second time).
Note: if it fails, start it in the foreground, read the startup log to see what the problem is, and fix it.

How to stop the background service:
Find the process ID with jps, then use kill -9.
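
For example (the RunJar process id will differ on your machine):

jps               # note the id shown next to RunJar
kill -9 <pid>     # replace <pid> with that id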

Start Hive service: hiveserver2

Then start the hiveserver2 service.
Foreground startup:

cd /opt/hive/bin
hive --service hiveserver2

Note: when started in the foreground, the terminal stays occupied and cannot be used for other work.
Benefit: starting in the foreground first lets you watch whether the hiveserver2 service comes up cleanly.
Exit the foreground with Ctrl+C.

Background startup:
Once the foreground startup works, exit it and start the service in the background instead.

cd /opt/hive/bin
nohup hive --service hiveserver2 &

After startup, run jps and check whether a RunJar process appears; if it does, the service is up (it is recommended to wait about a minute and check a second time).
Note: if it fails, start it in the foreground, read the startup log to see what the problem is, and fix it.

How to stop the background service:
Find the process ID with jps, then use kill -9 (see the example above).

Beeline-based connection method

cd /opt/hive/bin
beeline

Inside the beeline client, connect to Hive:

!connect jdbc:hive2://kingssm:10000

Then enter the username: root
Finally enter the password: it is not checked by default, so anything works (people usually type the virtual machine's login password).
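
Equivalently, the JDBC URL and user name can be passed directly on the beeline command line:

beeline -u jdbc:hive2://kingssm:10000 -n root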

Problems that may occur

If the DataNodes on the cloned machines fail to start, it is usually because the clones share the same DataNode ID. Go to /opt/hadoop-3.1.3/data/datanode/current and edit the VERSION file so that the datanodeUuid differs on each machine (any distinct values will do), as sketched below.
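
A sketch of the fix on one of the clones; datanodeUuid is the field named in the file, and any value that differs between machines will do:

vi /opt/hadoop-3.1.3/data/datanode/current/VERSION
# find the line starting with datanodeUuid= and change its value so that
# no two DataNodes share the same id, then restart the cluster with start-all.sh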