Deploying a Hadoop 3.3.6 cluster on Linux

1. Introduction to Hadoop

Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It lets users write distributed programs without understanding the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage. At its core, Hadoop provides a distributed file system, HDFS (Hadoop Distributed File System).

2. Downloading Hadoop

Download the Hadoop 3.3.6 binary release from the official Apache Hadoop downloads page and unpack it on each node (this guide uses /usr/local/bigdata/hadoop-3.3.6).

3. Hadoop environment variable configuration

3.1. Software version

JDK 1.8 (jdk1.8.0_391)

Hadoop 3.3.6

ZooKeeper 3.8.1

3.2. /etc/hosts configuration

Add the cluster nodes to /etc/hosts on all three servers (adjust the IPs to your own network):

192.168.42.139 node1
192.168.42.140 node2
192.168.42.141 node3

3.3. /etc/profile

Append the following to /etc/profile on each node, then run source /etc/profile (or log in again) so the variables take effect:

export JAVA_HOME=/usr/local/jdk1.8.0_391
export JRE_HOME=/usr/local/jdk1.8.0_391/jre
export HBASE_HOME=/usr/local/bigdata/hbase-2.5.6
export HADOOP_HOME=/usr/local/bigdata/hadoop-3.3.6
export FLINK_HOME=/usr/local/bigdata/flink-1.18.0
export SCALA_HOME=/usr/local/bigdata/scala-2.13.12
export SPARK_HOME=/usr/local/bigdata/spark-3.5.0-bin-hadoop3
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib
export PATH=.:$JAVA_HOME/bin:$JRE_HOME/bin:$FLINK_HOME/bin:$SPARK_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$PYTHON_HOME/bin:$PATH

4. Modifying the Hadoop configuration files

4.1. First set up passwordless SSH login among the three servers.
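A minimal sketch of the passwordless-login setup, assuming the hostnames from section 3.2 and that the same user (here root) is used on all nodes:

```shell
# Generate a key pair (accept the defaults, empty passphrase),
# then copy the public key to every node, including this one.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in node1 node2 node3; do
  ssh-copy-id "$host"
done

# Verify: this should print the remote hostname without asking for a password.
ssh node2 hostname
```

Repeat on each of the three servers so every node can reach every other node without a password.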

4.2. Create working directories under the Hadoop install

Under the Hadoop installation directory, create:

logs

data/namenode

data/datanode

data/tmp
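The directories above can be created in one shot; a small sketch (the helper name make_hadoop_dirs is mine, and the path in the comment matches the install location used throughout this guide):

```shell
#!/bin/sh
# Create the Hadoop working directories under an install root
# passed as the first argument.
make_hadoop_dirs() {
  root="$1"
  mkdir -p "$root/logs" \
           "$root/data/namenode" \
           "$root/data/datanode" \
           "$root/data/tmp"
}

# On each cluster node:
# make_hadoop_dirs /usr/local/bigdata/hadoop-3.3.6
```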

4.3. hadoop-env.sh

 export JAVA_HOME=/usr/local/jdk1.8.0_391
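If the cluster is started as root (as the shell prompts later in this guide suggest), Hadoop 3 also refuses to start until the run-as users are declared; adding these lines to hadoop-env.sh is one common way to do that:

```shell
# Required when running the start/stop scripts as root (Hadoop 3.x).
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```

If you run the cluster as a dedicated non-root user instead, these lines are not needed.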

4.4. hdfs-site.xml

 <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/bigdata/hadoop-3.3.6/data/namenode</value> <!-- change the path prefix to your own install location -->
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/bigdata/hadoop-3.3.6/data/datanode</value> <!-- change the path prefix to your own install location -->
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node2:9860</value>
    </property>

4.5. yarn-site.xml

<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/bigdata/hadoop-3.3.6/etc/hadoop:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/common/lib/*:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/common/*:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/hdfs:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/hdfs/lib/*:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/hdfs/*:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/mapreduce/*:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/yarn:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/yarn/lib/*:/usr/local/bigdata/hadoop-3.3.6/share/hadoop/yarn/*</value>
    </property>
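Rather than typing the long yarn.application.classpath value by hand, you can generate it from your own install and paste the command's output into the property (this assumes $HADOOP_HOME/bin is already on PATH, as set in section 3.3):

```shell
# Prints the colon-separated classpath for this installation;
# use the output as the value of yarn.application.classpath.
hadoop classpath
```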

4.6. core-site.xml

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/bigdata/hadoop-3.3.6/data</value> <!-- change the path prefix to your own install location -->
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.http.authentication.simple.anonymous.allowed</name>
        <value>true</value>
    </property>

4.7. mapred-site.xml

 <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>

4.8. workers

node1
node2
node3
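The files edited in sections 4.3–4.8 must be identical on all three nodes; a sketch of syncing them out from node1 (assumes the same install path everywhere and the passwordless SSH set up in section 4.1):

```shell
# Push the configuration directory from node1 to the other nodes.
for host in node2 node3; do
  scp -r /usr/local/bigdata/hadoop-3.3.6/etc/hadoop/* \
      "$host":/usr/local/bigdata/hadoop-3.3.6/etc/hadoop/
done
```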

5. Format the file system on the master node

On node1, format the NameNode (this is only needed once, before the first start):

hdfs namenode -format
2023-11-11 15:43:13,499 INFO util.GSet: 1.0% max memory 839.5 MB = 8.4 MB
2023-11-11 15:43:13,499 INFO util.GSet: capacity = 2^20 = 1048576 entries
2023-11-11 15:43:13,500 INFO namenode.FSDirectory: ACLs enabled? true
2023-11-11 15:43:13,500 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2023-11-11 15:43:13,500 INFO namenode.FSDirectory: XAttrs enabled? true
2023-11-11 15:43:13,500 INFO namenode.NameNode: Caching file names occurring more than 10 times
2023-11-11 15:43:13,504 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2023-11-11 15:43:13,505 INFO snapshot.SnapshotManager: SkipList is disabled
2023-11-11 15:43:13,508 INFO util.GSet: Computing capacity for map cachedBlocks
2023-11-11 15:43:13,508 INFO util.GSet: VM type = 64-bit
2023-11-11 15:43:13,508 INFO util.GSet: 0.25% max memory 839.5 MB = 2.1 MB
2023-11-11 15:43:13,508 INFO util.GSet: capacity = 2^18 = 262144 entries
2023-11-11 15:43:13,749 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2023-11-11 15:43:13,749 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2023-11-11 15:43:13,749 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2023-11-11 15:43:13,753 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2023-11-11 15:43:13,753 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2023-11-11 15:43:13,755 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2023-11-11 15:43:13,755 INFO util.GSet: VM type = 64-bit
2023-11-11 15:43:13,755 INFO util.GSet: 0.029999999329447746% max memory 839.5 MB = 257.9 KB
2023-11-11 15:43:13,755 INFO util.GSet: capacity = 2^15 = 32768 entries
2023-11-11 15:43:13,778 INFO namenode.FSImage: Allocated new BlockPoolId: BP-576865479-192.168.42.139-1699688593772
2023-11-11 15:43:13,791 INFO common.Storage: Storage directory /usr/local/bigdata/hadoop-3.3.6/data/namenode has been successfully formatted.
2023-11-11 15:43:13,816 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/bigdata/hadoop-3.3.6/data/namenode/current/fsimage.ckpt_000000000000000000 using no compression
2023-11-11 15:43:13,906 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/bigdata/hadoop-3.3.6/data/namenode/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
2023-11-11 15:43:13,912 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2023-11-11 15:43:13,928 INFO namenode.FSNamesystem: Stopping services started for active state
2023-11-11 15:43:13,928 INFO namenode.FSNamesystem: Stopping services started for standby state
2023-11-11 15:43:13,931 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2023-11-11 15:43:13,932 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.42.139
************************************************************/

6. Hadoop startup

6.1. Start HDFS on the master node with ./start-dfs.sh

[root@node1 sbin]# ./start-dfs.sh
Starting namenodes on [node1]
Last login: Sat Nov 11 14:54:45 CST 2023 from 192.168.42.1 on pts/1
Starting datanodes
Last login: Sat Nov 11 15:47:00 CST 2023 on pts/0
Starting secondary namenodes [node2]
Last login: Sat Nov 11 15:47:02 CST 2023 on pts/0

6.2. Start YARN on the master node with ./start-yarn.sh

[root@node1 sbin]# ./start-yarn.sh
Starting resourcemanager
Last login: Sat Nov 11 15:47:06 CST 2023 on pts/0
Starting nodemanagers
Last login: Sat Nov 11 15:48:16 CST 2023 on pts/0

6.3. Check the processes on the master node with jps

[root@node1 sbin]# jps
94961 NodeManager
94742 ResourceManager
96025 Jps
91930 NameNode
92186 DataNode

6.4. Check the processes on node2 with jps

[root@node2 sbin]# jps
91826 SecondaryNameNode
95655 Jps
93996 NodeManager
91583 DataNode

6.5. Check the processes on node3 with jps

[root@node3 bigdata]# jps
91257 DataNode
93595 NodeManager
96207 Jps

7. Verify Hadoop in the browser

YARN ResourceManager UI: http://192.168.42.139:8088/

HDFS NameNode UI: http://192.168.42.139:9870/

If both pages load, Hadoop has started successfully.
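Beyond the web UIs, the cluster can also be checked from the command line with standard Hadoop tools (the examples jar ships with the 3.3.6 distribution):

```shell
# HDFS: all three datanodes should be reported as live.
hdfs dfsadmin -report

# YARN: all three node managers should show state RUNNING.
yarn node -list

# Optional end-to-end smoke test: run the bundled pi estimator on YARN.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 10
```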