0. Description
At the initial stage of learning big data, I tried to build a corresponding cluster environment, using the setup process to understand the functions, configuration, and principles of the individual components.
In the actual learning process I mostly use Docker to build environments quickly.
Here is a record of my process of building Hadoop.
1. Download hadoop
Download address: Apache Hadoop
```bash
wget https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz

# Unzip
tar -zxvf hadoop-2.10.1.tar.gz

# Optionally copy to the /usr/hadoop/ directory
# sudo mkdir /usr/hadoop/
# cp -r hadoop-2.10.1 /usr/hadoop/

# Add HADOOP_HOME
sudo vim /etc/profile
# Add the following content, save and exit
#HADOOP_HOME
export HADOOP_HOME=/home/airwalk/bigdata/soft/hadoop-2.10.1
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=$HADOOP_HOME/sbin:$PATH

# Make it take effect
source /etc/profile

# Test
hdfs version
# The result is as follows
airwalk@svr43:/usr/hadoop/hadoop-2.10.1$ hdfs version
Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650
This command was run using /home/airwalk/bigdata/soft/hadoop-2.10.1/share/hadoop/common/hadoop-common-2.10.1.jar

# Test with the bundled MapReduce example
cd bigdata/soft/hadoop-2.10.1
mkdir input
cp etc/hadoop/* input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar grep input/ output '[a-z.]+'
```
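Before any cluster configuration, Hadoop runs in local mode, so the grep example above writes its results to the local `output` directory. A quick way to confirm the job succeeded is to list and print that directory (the file names below are the standard MapReduce output layout):

```bash
ls output
# _SUCCESS  part-r-00000
cat output/part-r-00000
```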
At this point, Hadoop is successfully installed on a single machine.
2. Configuration
|      | svr43              | server42                     | server37                    |
|------|--------------------|------------------------------|-----------------------------|
| hdfs | namenode, datanode | datanode                     | SecondaryNamenode, datanode |
| yarn | nodeManager        | resourceManager, nodeManager | nodeManager                 |
0: Passwordless SSH login configuration
- Generate the key on the svr43 machine (the HDFS namenode)
```bash
cd ~/.ssh
# Just press Enter at every prompt
ssh-keygen -t rsa
# Then execute in this directory
ssh-copy-id server42
ssh-copy-id server37
# The node also needs passwordless login to itself
ssh-copy-id svr43
```
- Generate a key on the server42 machine as well and copy it to the other nodes, because this node is YARN's ResourceManager.
```bash
cd ~/.ssh
# Just press Enter at every prompt
ssh-keygen -t rsa
# Then execute in this directory
# This node also needs passwordless login to itself
ssh-copy-id server42
ssh-copy-id server37
ssh-copy-id svr43
```
!! Note: the following warning may appear
```bash
airwalk@server42:~/.ssh$ ssh svr43
Warning: the ECDSA host key for 'svr43' differs from the key for the IP address '192.168.0.43'
Offending key for IP in /home/airwalk/.ssh/known_hosts:3
Matching host key in /home/airwalk/.ssh/known_hosts:11
Are you sure you want to continue connecting (yes/no)? yes
Welcome to Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-142-generic x86_64)

# Solution: remove the stale key for the IP address
ssh-keygen -R 192.168.0.43
```
- Generate the root key on the svr43 machine (the HDFS namenode)
```bash
# Switch to the root account
sudo su root
cd /root/.ssh
ssh-keygen -t rsa
ssh-copy-id server42
ssh-copy-id server37
ssh-copy-id svr43
```
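A quick sanity check, assuming the three hostnames above resolve correctly: each of the following should print the remote hostname without asking for a password.

```bash
# Verify passwordless login to every node in one go
for host in svr43 server42 server37; do
    ssh "$host" hostname
done
```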
1: Configure core-site.xml
This sets the location of Hadoop's temporary files. Be careful not to place it on a disk that is too small; the following directory is used here:
/home/airwalk/bigdata/soft/hadoop-2.10.1/data/tmp
```xml
<property>
    <name>fs.defaultFS</name>
    <!-- Here the IP address is configured directly -->
    <value>hdfs://192.168.0.43:9000</value>
    <!-- <value>hdfs://svr43:9000</value> -->
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/airwalk/bigdata/soft/hadoop-2.10.1/data/tmp</value>
</property>
```
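Once the file is saved, the effective values can be sanity-checked from the command line; this is just a generic check, nothing specific to this setup:

```bash
# Print the values Hadoop actually resolves from its config files
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey hadoop.tmp.dir
```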
2: hdfs configuration file
Configure hadoop-env.sh
```bash
echo $JAVA_HOME
vim hadoop-env.sh
# Set JAVA_HOME explicitly
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
Configure hdfs-site.xml
Because the cluster has 3 nodes, the number of replicas is set to 3 here.
```xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Specify the host for the SecondaryNameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>server37:50090</value>
</property>
```
3: yarn configuration
Configure yarn-env.sh
```bash
vim yarn-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
Configure yarn-site.xml
```xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Specify the ResourceManager host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- Here the IP address is configured directly -->
    <value>192.168.0.42</value>
    <!-- <value>server42</value> -->
</property>
```
4: MapReduce configuration
Configure mapred-env.sh
```bash
vim mapred-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
Configure mapred-site.xml
```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```
3. Start
- On the namenode node, perform the format operation (only needed once, when initializing a new cluster)
```bash
hdfs namenode -format
```
- Start namenode
```bash
cd sbin
./hadoop-daemon.sh start namenode
# or equivalently, from the Hadoop home directory:
# ./sbin/hadoop-daemon.sh start namenode
```
- Start datanode
```bash
# Start on all three machines
./sbin/hadoop-daemon.sh start datanode
```
- Stop
```bash
./sbin/hadoop-daemon.sh stop namenode
# Stop on all three machines
./sbin/hadoop-daemon.sh stop datanode
```
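After the daemons are started this way, `jps` on each machine shows which Hadoop processes are running; the PIDs below are illustrative only.

```bash
jps
# e.g. on svr43 one would expect something like:
# 12345 NameNode
# 12456 DataNode
# 12567 Jps
```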
4. Cluster startup
1: Configure slaves; it must be modified on all nodes
```bash
cd /home/airwalk/bigdata/soft/hadoop-2.10.1/etc/hadoop
vim slaves
# Add the hostnames of the worker nodes; spaces and blank lines are not allowed
svr43
server42
server37
```
2: Start hdfs cluster
This automatically starts all namenodes and datanodes in the cluster.
```bash
# Execute the following command on the HDFS namenode node
./sbin/start-dfs.sh
```
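To confirm that all three datanodes registered with the namenode, a report can be pulled from any node:

```bash
hdfs dfsadmin -report
# The summary should contain a line like "Live datanodes (3):"
```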
3: Start yarn
```bash
# Must be run on the ResourceManager node (server42)
./sbin/start-yarn.sh
starting yarn daemons
resourcemanager running as process 10206. Stop it first.
server42: nodemanager running as process 10550. Stop it first.
svr43: starting nodemanager, logging to /home/airwalk/bigdata/soft/hadoop-2.10.1/logs/yarn-airwalk-nodemanager-svr43.out
server37: starting nodemanager, logging to /home/airwalk/bigdata/soft/hadoop-2.10.1/logs/yarn-airwalk-nodemanager-server37.out
```
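Analogously to the HDFS check, the NodeManagers that registered with the ResourceManager can be listed from the command line:

```bash
yarn node -list
# All three nodes should appear with state RUNNING
```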
5. View
Open the following link for the namenode web UI:
http://192.168.0.43:50070/
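The YARN ResourceManager also serves a web UI, by default on port 8088, so with the configuration above it should be reachable at http://192.168.0.42:8088/.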
6. Configure hue
Hadoop configuration file modification
hdfs-site.xml
```xml
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
```
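With WebHDFS enabled and HDFS restarted, the REST endpoint can be probed directly; the host, port, and user below follow this article's setup:

```bash
curl "http://192.168.0.43:50070/webhdfs/v1/?op=LISTSTATUS&user.name=airwalk"
# A JSON FileStatuses listing of / indicates WebHDFS is working
```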
core-site.xml
```xml
<property>
    <name>hadoop.proxyuser.airwalk.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.airwalk.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
```
httpfs-site.xml configuration
```xml
<!-- Hue HttpFS proxy airwalk setting -->
<property>
    <name>httpfs.proxyuser.airwalk.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.airwalk.groups</name>
    <value>*</value>
</property>
```
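These are Hadoop-side configuration files, so HDFS has to be restarted for them to take effect (on the namenode node):

```bash
./sbin/stop-dfs.sh
./sbin/start-dfs.sh
```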
HUE configuration file modification
```ini
[[hdfs_clusters]]
  [[[default]]]
    fs_defaultfs=hdfs://mycluster
    webhdfs_url=http://node1:50070/webhdfs/v1
    hadoop_bin=/usr/hadoop-2.5.1/bin
    hadoop_conf_dir=/usr/hadoop-2.5.1/etc/hadoop
```
Start hdfs and restart hue
Solution (to the permission errors Hue may report when accessing HDFS):
1. Turn off HDFS permission verification
hdfs-site.xml
```xml
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
```
```bash
docker run -tid --name hue88 -p 8888:8888 \
    -v /home/airwalk/bigdata/soft/hadoop-2.10.1/etc/hadoop:/etc/hadoop \
    gethue/hue:latest
docker cp hue.ini hue88:/usr/share/hue/desktop/conf/
docker restart hue88
docker exec -it --user root <container id> /bin/bash
```
```bash
sudo apt-get install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain \
    gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make \
    mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel rsync
# Source: https://www.jianshu.com/p/a80ec32afb27 (Jianshu)
```
Reference documentation: [Install :: Hue SQL Assistant Documentation (gethue.com)](https://docs.gethue.com/administrator/installation/install/)

```bash
# After installing all dependencies:
# /home/airwalk/bigdata/soft/hue is the directory to install into
sudo PREFIX=/home/airwalk/bigdata/soft/hue make install
```
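Once `make install` succeeds, Hue can be started from the install prefix; per the Hue docs the development server listens on port 8000 by default, so this is a minimal smoke test:

```bash
cd /home/airwalk/bigdata/soft/hue
./build/env/bin/hue runserver 0.0.0.0:8000
```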
- python3.8 installation
https://blog.csdn.net/qq_39779233/article/details/106875184
- Install npm
```bash
sudo apt install npm
npm install --unsafe-perm=true --allow-root
```
- Install node
```bash
# Select the source and version number. This uses 10.x; for other versions just change the
# number, e.g. 12.x. Note the trailing x.
curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
# Install the corresponding node
sudo apt-get install -y nodejs
# Check
node --version
```
- Problems encountered
```bash
# Solution for: gyp ERR! stack Error: EACCES: permission denied, mkdir
# npm refuses to run some commands as root and automatically drops from root to an
# ordinary user; with this flag the install can run under the current user.
sudo npm i --unsafe-perm
# Then execute the following command with root permissions
PREFIX=/home/airwalk/bigdata/soft/hue make install
```
Compilation and installation successful!!!