1. Tools
Linux system: CentOS 7.0 or above
JDK: jdk1.8
Hadoop: 3.1.3
Hive: 3.1.2
Virtual machine: VMware
MySQL: 5.7.11
Tool download address: https://pan.baidu.com/s/10J_1w1DW9GQC7NOYw5fwvg?pwd=0kdr
Extraction code: 0kdr
Tip: the following is the main content of this article; the steps are for reference.
2. JDK installation
Download the jdk-8u181-linux-x64.tar.gz package and upload this package to the /opt directory.
cd /opt
Unzip the installation package
tar zxvf jdk-8u181-linux-x64.tar.gz
Delete the installation package
rm -f jdk-8u181-linux-x64.tar.gz
Use root privileges to edit the profile file and set environment variables
vi /etc/profile
Add at the end (the JDK was extracted to /opt, so JAVA_HOME points there):
export JAVA_HOME=/opt/jdk1.8.0_181
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
Make the modified files take effect
source /etc/profile
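Once the profile is reloaded, the installation can be checked; both commands should report the 1.8.0_181 version and path:

```
java -version
echo $JAVA_HOME
```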
3. Install mysql
Download the mysql-5.7.11-linux-glibc2.5-x86_64.tar.gz package, upload this package to the /opt directory, and rename it to mysql.
cd /opt
tar -xzvf mysql-5.7.11-linux-glibc2.5-x86_64.tar.gz
mv mysql-5.7.11-linux-glibc2.5-x86_64 mysql
First check whether the user group exists
groups mysql
Create user groups and usernames
groupadd mysql && useradd -r -g mysql mysql
Grant file data directory permissions
chown mysql:mysql -R /opt/mysql/data
Modify the /etc/my.cnf configuration file and create it if it is not available.
vi /etc/my.cnf
[mysqld]
port=3306
user=mysql
basedir=/opt/mysql/
datadir=/opt/mysql/data
socket=/tmp/mysql.sock
symbolic-links=0
[mysqld_safe]
log-error=/opt/mysql/data/mysql.log
pid-file=/opt/mysql/data/mysql.pid
[client]
port=3306
default-character-set=utf8
Initialize mysql service
cd /opt/mysql/bin
Run the initialization command below; it prints a default temporary password at the end. It can fail when libaio is not installed, so install that first:
yum install libaio -y
./mysqld --defaults-file=/etc/my.cnf --user=mysql --initialize
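The temporary password lands in the error log configured in /etc/my.cnf (log-error=/opt/mysql/data/mysql.log). A small sketch of pulling it out with grep and awk, using a sample log line for illustration (the real command would grep the actual log file):

```shell
# Sample of the line mysqld --initialize writes to the error log
echo '2023-01-01T00:00:00 [Note] A temporary password is generated for root@localhost: Abc123!xyz' > /tmp/mysql.log.sample

# The password is the last whitespace-separated field of the matching line
grep 'temporary password' /tmp/mysql.log.sample | awk '{print $NF}'
```

Against the real installation, replace /tmp/mysql.log.sample with /opt/mysql/data/mysql.log.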
Start mysql
cp /opt/mysql/support-files/mysql.server /etc/init.d/mysql
service mysql start
Enter directory
cd /opt/mysql/bin
Log in using the temporary password from the initialization output (copy and paste it directly)
./mysql -u root -p
Change the password. It is set to root here; choose your own as needed.
alter user 'root'@'localhost' identified with mysql_native_password BY 'root';
Refresh to make the operation effective
flush privileges;
Change the connection permissions so root can connect from other hosts
use mysql;
update user set host='%' where user = 'root';
flush privileges;
quit
exit
Test the remote connection
The IP of my virtual machine is 192.168.19.10
Some connections fail because the firewall blocks the port. There are two options: stop the firewall, or open port 3306.
Turn off firewall
systemctl stop firewalld
Open the port
firewall-cmd --zone=public --add-port=3306/tcp --permanent
After adding the port, reload the firewall for the change to take effect.
firewall-cmd --reload
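To confirm the port is actually open after the reload:

```
firewall-cmd --list-ports
```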
Configure MySQL to start on boot
Add to service list
chkconfig --add mysql
View list
chkconfig --list
Runlevels 2 through 5 should show on; if they do not, run
chkconfig --level 2345 mysql on
Add MySQL to the system PATH
vi /etc/profile
export PATH=/opt/mysql/bin:$PATH
source /etc/profile
4. Hadoop installation
The installation steps mirror the JDK's: extract the downloaded Hadoop archive under /opt. The crucial part is the configuration; as long as the paths in the configuration files are correct, everything works. Configure the environment variables as follows:
vi /etc/profile
export HADOOP_HOME=/opt/hadoop-3.1.3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
After installation, you can enter the hadoop version command in the terminal to view:
Hadoop distributed configuration
Create the working directories in the terminal (-p is needed because /opt/hadoop-3.1.3/data does not exist yet):
mkdir /opt/hadoop-3.1.3/tmp
mkdir -p /opt/hadoop-3.1.3/data/namenode
mkdir -p /opt/hadoop-3.1.3/data/datanode
The files modified below all live under /opt/hadoop-3.1.3/etc/hadoop/ (watch your own path), so change into that directory first.
Enter /opt/hadoop-3.1.3/etc/hadoop
cd /opt/hadoop-3.1.3/etc/hadoop
Configure core-site.xml: Enter vi core-site.xml to open the file and add
(This fully distributed setup uses three virtual machines: kingssm is the hostname of this machine, and the other two are slave1 and slave2.)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://kingssm:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-3.1.3/tmp</value>
  </property>
</configuration>
Configure hdfs-site.xml: Enter vi hdfs-site.xml to open the file and add
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-3.1.3/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-3.1.3/data/datanode</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Configure mapred-site.xml: enter vi mapred-site.xml to open the file and add
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>kingssm:9001</value>
  </property>
</configuration>
Configure yarn-site.xml: enter vi yarn-site.xml to open the file and add
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>kingssm</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
Configure hadoop-env.sh: Enter vi hadoop-env.sh to open the file and add
export JAVA_HOME=/opt/jdk1.8.0_181
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=$PATH:/opt/hadoop-3.1.3/bin
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export HADOOP_PID_DIR=/opt/hadoop-3.1.3/pids
Configure yarn-env.sh: Enter vi yarn-env.sh to open the file and add
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Configure workers: enter vi workers to open the file and list the worker nodes, replacing them with your own hostnames or IP addresses (kingssm is the hostname of the machine currently being configured; the other two are the hostnames of the virtual machines cloned later, whose addresses are adjusted inside each clone).
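With the hostnames used in this article, the workers file would contain one node per line (assuming kingssm itself also runs worker daemons, as described here):

```
kingssm
slave1
slave2
```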
Enter cd /opt/hadoop-3.1.3/sbin/ in the terminal to switch to the sbin directory
Configure start-dfs.sh: Enter vi start-dfs.sh to open the file and add
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Configure stop-dfs.sh: Enter vi stop-dfs.sh to open the file and add
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Turn off firewall
systemctl stop firewalld
systemctl disable firewalld
Modify hostname
# View the hostname
hostname
# Modify the hostname
hostnamectl --static set-hostname kingssm
Set static IP
Enter ip route in the terminal to view the gateway
Enter vi /etc/sysconfig/network-scripts/ifcfg-ens33 and modify or add the following. Choose the IP address yourself, but it must sit on the gateway's subnet: for example, if the gateway is 192.168.12.128, the address must start with 192.168.12 (the last octet is up to you). Set DNS1 to the same address as the gateway and the subnet mask to 255.255.255.0.
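A sketch of the relevant ifcfg-ens33 entries, using the example gateway 192.168.12.128 mentioned above (the last octet of IPADDR is an arbitrary choice):

```
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.12.10
GATEWAY=192.168.12.128
DNS1=192.168.12.128
NETMASK=255.255.255.0
```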
Add mapping between virtual machines
Enter vi /etc/hosts in the terminal and add
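For example, with the kingssm address used in the MySQL test above (192.168.19.10) and placeholder addresses for the two slaves (substitute the real ones after cloning):

```
192.168.19.10 kingssm
192.168.19.11 slave1
192.168.19.12 slave2
```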
SSH password-free login
First run
ssh localhost
Normally you can now log in without a password. If you are still prompted for one, ssh is not configured correctly. Note that DSA key authentication has been disabled by default since OpenSSH 7.0, so if your key was generated with DSA, regenerate it with RSA instead.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
run again
ssh localhost
If you do not need to enter a password, it means that ssh is configured. Next run
Run ssh-keygen -t rsa, pressing Enter through the prompts. When the prompt returns, publish the public key to each node:
ssh-copy-id kingssm
ssh-copy-id slave1
ssh-copy-id slave2
5. Clone the virtual machine and start the cluster
Shut down the kingssm virtual machine currently in use, then clone two virtual machines from it.
Right-click the virtual machine -> Manage -> Clone -> Full Clone
After the cloning is completed, open all three virtual machines, then set the host names slave1 and slave2 respectively for the two cloned machines, and modify the IP addresses.
Start the cluster
All three virtual machines need to be formatted first
Open the terminal and operate as root. For all three terminals, enter hadoop namenode -format to format.
After the formatting is completed, start the cluster in kingssm and enter start-all.sh to start the cluster (if it is closed, enter stop-all.sh)
After startup, enter jps on each machine to check which daemons are running; kingssm and the slaves should each show their expected processes.
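Roughly, the jps output to expect with this configuration (assuming kingssm is also listed in workers, so it runs worker daemons too):

```
# kingssm:          NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, Jps
# slave1 / slave2:  DataNode, NodeManager, Jps
```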
Visit the web page to view the results: kingssm:9870
Visit the web page to view the results: kingssm:8088
6. Hive installation
Modify Hadoop's core-site.xml (/opt/hadoop-3.1.3/etc/hadoop/core-site.xml) and add the following configuration items:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
Download the package apache-hive-3.1.2-bin.tar.gz, upload it to the /opt directory, unzip it, and rename it to hive
cd /opt
tar -xzvf apache-hive-3.1.2-bin.tar.gz
mv apache-hive-3.1.2-bin hive
Modify hive’s environment configuration file: hive-env.sh
cd /opt/hive/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
Modify the content:
# Configure the Hadoop home directory
HADOOP_HOME=/opt/hadoop-3.1.3/
# Configure the path to the hive configuration files
export HIVE_CONF_DIR=/opt/hive/conf/
# Configure hive's lib directory
export HIVE_AUX_JARS_PATH=/opt/hive/lib/
Create configuration file
cd /opt/hive/conf/
vi hive-site.xml
Copy the following content into the configuration file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://kingssm:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <!-- Remote mode: metastore service address -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://kingssm:9083</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>kingssm</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>
vi /etc/profile
export HIVE_HOME=/opt/hive export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
Connect to MySQL, username root, password root
mysql -uroot -proot
Create the database for Hive's metadata; the name (metastore) must match the one in hive-site.xml's connection URL.
create database metastore; show databases;
Initialize metabase
schematool -initSchema -dbType mysql -verbose
Seeing "schemaTool completed" indicates that the initialization succeeded.
Verify installation
hive
Quit the hive shell:
quit;
Possible errors and their solutions:
Conflict between Hadoop's slf4j and Hive's slf4j:
Delete /opt/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar
The guava.jar versions shipped with Hadoop and Hive are inconsistent:
Replace one jar so both use the same guava version (Hive 3.1.2 ships an older guava than Hadoop 3.1.3)
Create HDFS hive-related directories
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
Start hive service: metastore
Start the metastore service first:
Foreground startup:
cd /opt/hive/bin hive --service metastore
Note: a foreground start occupies the terminal, so no other operations can be performed there.
Benefit: start in the foreground first to observe whether the metastore service comes up cleanly.
To exit the foreground: Ctrl+C
Background startup:
Once the foreground start works, exit it and run the service in the background instead.
cd /opt/hive/bin nohup hive --service metastore &
After startup, run jps and check that a RunJar process appears; if it does, the service is up (it is recommended to wait about a minute and check a second time).
Note: if it fails, start it in the foreground and read the startup log to diagnose the problem.
To stop the background service: find the process id with jps, then kill it with kill -9.
Start hive service: hiveserver2 service
Then start the hiveserver2 service item:
Foreground startup:
cd /opt/hive/bin hive --service hiveserver2
Note: a foreground start occupies the terminal, so no other operations can be performed there.
Benefit: start in the foreground first to observe whether the hiveserver2 service comes up cleanly.
To exit the foreground: Ctrl+C
Background startup:
Once the foreground start works, exit it and run the service in the background instead.
cd /opt/hive/bin nohup hive --service hiveserver2 &
After startup, run jps and check that a RunJar process appears; if it does, the service is up (it is recommended to wait about a minute and check a second time).
Note: if it fails, start it in the foreground and read the startup log to diagnose the problem.
To stop the background service: find the process id with jps, then kill it with kill -9.
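Finding the PIDs to kill can be scripted; both background services appear as RunJar in jps output. A sketch using sample jps output (in practice, pipe jps itself instead of the sample text):

```shell
# jps prints "<pid> <name>"; sample output for illustration:
sample='12345 RunJar
23456 Jps
34567 RunJar'

# Extract the PIDs of the RunJar entries
# (real usage: jps | awk '/RunJar/ {print $1}')
echo "$sample" | awk '/RunJar/ {print $1}'
```

Each printed PID can then be passed to kill -9.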
Beeline-based connection method
cd /opt/hive/bin
Enter the beeline client:
beeline
Connect hive:
!connect jdbc:hive2://kingssm:10000
Then enter the username: root
Finally enter the password: with this setup it is not checked, so any value works (the virtual machine's login password is usually entered).
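As a shortcut, beeline also accepts the connection URL and username directly on the command line via its -u and -n flags:

```
beeline -u jdbc:hive2://kingssm:10000 -n root
```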
A problem that may occur: after cloning, the DataNodes share the same datanodeUuid and some fail to start. Go to /opt/hadoop-3.1.3/data/datanode/current on the cloned machines and edit the VERSION file so that each machine's datanodeUuid is different (the value itself can be arbitrary).