Install Hadoop on Windows and use Maven in IDEA to implement HDFS API operations

This installation uses Windows 10, with Hadoop 3.3.6, Maven 3.9.4, and Java 17.0.2. It follows on from the previous tutorial, in which the Hadoop cluster and Java were installed; if you have not completed that setup, please check the prerequisite tutorial: Hadoop, Java installation
In addition, the big data tutorial series is still being updated (covering worked examples and the installation of databases, Spark, MapReduce, Hive, etc.); everyone is welcome to follow me

Client environment preparation

Hadoop download and decompression

  1. Download Hadoop 3.3.6 (change the version to suit your own needs) and unzip it to a directory whose path contains no Chinese characters
    This time I chose version 3.3.6. download link
    Apache Hadoop
    Then download the Windows dependency package (its version should match your Hadoop version) and extract it into the hadoop-3.3.6\bin directory, overwriting the existing files.
    The winutils repository covers most Hadoop versions

Environment variable configuration

  1. Configure HADOOP_HOME environment variable
    The location of the environment variables is in Computer-Properties-Advanced System Settings-Environment Variables

    Create a new HADOOP_HOME system variable and fill in the path to your Hadoop directory

    Then select Path under User variables, click Edit, and add the following entries
%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin


Verify that the Hadoop environment variables work (for example, open a new Command Prompt and run hadoop version). Then double-click hadoop-3.3.6\bin\winutils.exe: if an error is reported, the Microsoft runtime library is missing (even genuine systems often have this problem); if no error appears and the black console window just flashes by, it is working normally.
The runtime library installer can be found at: Microsoft runtime library repair

Maven download and configuration

Maven download and installation

  1. Download Maven (this tutorial uses version 3.9.4)
    Official website download address
  2. After the download completes, unzip the archive into a directory whose path contains no Chinese characters.
  3. Inside the Maven folder, create a new folder named repository to hold the local repository files

Maven configuration

  1. Open the settings.xml file under the apache-maven-3.9.4\conf path
  2. Locate the following content
<!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
 -->
  3. Add the following after it (fill in the path of the repository folder you just created):
 <localRepository>repository path</localRepository>


4. Set the mirror source
5. Still in the settings.xml file
6. Comment out the original mirror entry and add the Aliyun mirror (Maven 3.8+ blocks plain-http repositories by default, so the maven-default-http-blocker mirror must be commented out for the http Aliyun URL to work)

<mirror>
 <id>nexus-aliyun</id>
 <name>nexus-aliyun</name>
 <url>http://maven.aliyun.com/nexus/content/groups/public</url>
 <mirrorOf>central</mirrorOf>
</mirror>
<!--
   <mirror>
     <id>maven-default-http-blocker</id>
     <mirrorOf>external:http:*</mirrorOf>
     <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
     <url>http://0.0.0.0/</url>
     <blocked>true</blocked>
   </mirror>
 -->


7. Save the file

IDEA

Modify Maven settings in IDEA

  1. Press Ctrl + Alt + S to open the IDEA settings interface
  2. Select Build, Execution, Deployment--Build Tools--Maven in order
  3. Modify the following items one by one, and check Use settings from .mvn/maven.config
Maven home path: the path to the Maven folder
User settings file: the path to settings.xml
Local repository: the path to the repository folder


Save when the modifications are complete

Create a Maven project in IDEA

  1. Open IDEA and select File-New-Project
  2. Select Maven Archetype, with maven-archetype-quickstart as the Archetype
    Open Advanced Settings for further settings; configure the other options according to your own needs (the corresponding translations are shown in the figure below).
  3. Import the dependency coordinates plus the logging dependencies: paste the following content into pom.xml, before the closing </project> tag (merge it with any existing <dependencies> section)
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.6</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>

Note: change the hadoop-client version number to match the version you downloaded. The dependencies are fetched over the network, so download time depends on your connection speed. A quick way to confirm that the dependency resolved is sketched below.
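Once the dependencies have finished downloading, you can sanity-check that hadoop-client is on the classpath with Hadoop's VersionInfo utility; a minimal sketch (the class name VersionCheck is made up for illustration):

import org.apache.hadoop.util.VersionInfo;

// Minimal sketch: prints the Hadoop version that Maven resolved, confirming
// that the hadoop-client dependency is available on the classpath.
public class VersionCheck {
    public static void main(String[] args) {
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
    }
}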
4. In the project's src/main/resources directory, create a new file named "log4j.properties" and fill it with the following (it controls the log destination and level)

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
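
With slf4j-log4j12 from the pom above on the classpath, this configuration also governs logging from your own code; a minimal sketch (the class name LogDemo is made up for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Minimal sketch: slf4j routes this output through log4j, so log4j.properties
// decides where it goes (here, the console via the stdout appender) and at
// what level.
public class LogDemo {
    private static final Logger LOG = LoggerFactory.getLogger(LogDemo.class);

    public static void main(String[] args) {
        LOG.info("visible: the root logger level is INFO");
        LOG.debug("suppressed: DEBUG is below the configured INFO level");
    }
}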


5. Create the package com.hadoop.hdfs (under src/main/java)
6. Create the HdfsClient class

7. In the HdfsClient class, enter

package com.hadoop.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {
    @Test
    public void testMkdirs() throws IOException, URISyntaxException, InterruptedException {
        // 1 Get the file system
        Configuration configuration = new Configuration();
        // Error demonstration: if the user is not set, a permission error is reported:
        // FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:8020"), configuration);
        // Replace hadoop01 with the address of the node where the NameNode runs
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:8020"), configuration, "root");
        // 2 Create the directory
        fs.mkdirs(new Path("/xiyou/huaguoshan/"));
        // 3 Close the resource
        fs.close();
    }
}

If a wavy underline appears under the code, an exception needs to be declared (press Alt + Enter to apply the fix); Ctrl + P shows a function's parameters.
8. Run the program
If the program runs normally, the /xiyou/huaguoshan/ folder will appear in HDFS.
When the client operates on HDFS, it does so under a user identity. By default the HDFS client API uses the current Windows user to access HDFS, which triggers a permission exception like the one below, so a user must be configured when accessing HDFS.

org.apache.hadoop.security.AccessControlException: Permission denied:
user=56576, access=WRITE,
inode="/xiyou/huaguoshan":hadoop:supergroup:drwxr-xr-x
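Besides passing the user to FileSystem.get as in the code above, the client user can also be supplied through HADOOP_USER_NAME, which Hadoop's UserGroupInformation consults; a minimal sketch, assuming your Hadoop version also honors it as a JVM system property (the environment variable of the same name certainly works):

// Sketch: set the user before the first FileSystem/UGI call in the JVM;
// hadoop01:8020 and "root" are carried over from the example above.
System.setProperty("HADOOP_USER_NAME", "root");
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:8020"), configuration);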
9. Program improvement: encapsulate the common steps
    (1) First, encapsulate the preparation work into init(), as follows:
public void init() throws URISyntaxException, IOException, InterruptedException {
        // NameNode address of the cluster (8020 is the HDFS internal communication port)
        URI uri = new URI("hdfs://hadoop102:8020");
        // Create the configuration object
        Configuration conf = new Configuration();
        // User to operate as
        String user = "hadoop";
        // Get the client object
        fs = FileSystem.get(uri, conf, user);
}

Because fs, created in init(), is also used in close() and testMkdirs(), it must be defined as a member variable. Select fs and press Ctrl + Alt + F; a field-definition popup appears; press Enter to confirm. The resulting field is sketched below.
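After the extraction, the class gains a field along these lines (a sketch; the name fs matches the snippets above):

public class HdfsClient {
    // Shared client object: created in init(), released in close()
    private FileSystem fs;
    // ... init(), close(), and the test methods follow
}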

(2) Encapsulate the operation of closing resources, as follows:

public void close() throws IOException {
        // 3 Close the resource
        fs.close();
}

(3) The original testMkdirs() function is only used to process the main logic:

public void testMkdirs() throws URISyntaxException, IOException, InterruptedException {
        // 2 Create the folder
        fs.mkdirs(new Path("/xiyou/huaguoshan1"));
}

(4) Finally: to use JUnit testing, @Test marks the code to be tested, @Before marks initialization code that runs before each test, and @After marks code that runs after each test. The assembled class is sketched below.
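Putting (1)-(4) together, the refactored class might look like this (a sketch; hadoop102, port 8020, and the user hadoop are carried over from the snippets above and should be adjusted to your own cluster):

package com.hadoop.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {
    // Shared client object: created in init(), released in close()
    private FileSystem fs;

    @Before
    public void init() throws URISyntaxException, IOException, InterruptedException {
        // NameNode address of the cluster (8020 is the HDFS internal communication port)
        URI uri = new URI("hdfs://hadoop102:8020");
        Configuration conf = new Configuration();
        // User to operate as
        String user = "hadoop";
        // Get the client object
        fs = FileSystem.get(uri, conf, user);
    }

    @After
    public void close() throws IOException {
        // Close the resource after every test
        fs.close();
    }

    @Test
    public void testMkdirs() throws IOException {
        // Main logic only: create the folder
        fs.mkdirs(new Path("/xiyou/huaguoshan1"));
    }
}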