Using Java API to operate HDFS

  • (1) Experimental principle

The principle of using the Java API to operate HDFS is as follows:

Configure the Hadoop environment: First, configure the Hadoop environment, including setting the Hadoop installation path and configuring the core-site.xml and hdfs-site.xml files, so that Java programs can connect to HDFS.

Introduce Hadoop dependencies: In the Java project, add the Hadoop-related dependencies such as hadoop-common and hadoop-hdfs so that you can use the API provided by Hadoop.

Create a Configuration object: Use the org.apache.hadoop.conf.Configuration class to create a Configuration object, which contains Hadoop configuration information.

Create a FileSystem object: Use the static method get() of the org.apache.hadoop.fs.FileSystem class, pass in the Configuration object, and create a FileSystem object, which is used to interact with HDFS.

Perform HDFS operations: Through the FileSystem object, you can perform various HDFS operations, such as creating directories, uploading files, downloading files, deleting files, renaming files, etc. For specific operations, you can use the methods provided by the FileSystem object, such as create(), copyFromLocalFile(), copyToLocalFile(), delete(), rename(), etc.

Close the FileSystem object: After the operation is completed, you need to call the close() method of the FileSystem object to close the connection with HDFS and release resources.

Through the above steps, you can use the Java API to manage and operate the HDFS file system.
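For illustration, a minimal sketch of this flow is shown below; the NameNode address hdfs://hadoop1:9000, the user name root, and the /demo directory are assumptions that should be adjusted to your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        // Identity used on the HDFS side (root is an assumption)
        System.setProperty("HADOOP_USER_NAME", "root");

        // 1. Create a Configuration object and point it at the NameNode (address is an assumption)
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop1:9000");

        // 2. Obtain the FileSystem client object
        FileSystem fs = FileSystem.get(conf);

        // 3. Perform an HDFS operation, for example create a directory
        fs.mkdirs(new Path("/demo"));

        // 4. Close the client and release resources
        fs.close();
    }
}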

  • (2) Experimental environment

To use the Java API to operate HDFS, you need to set up the following experimental environment:

Install the Java Development Kit (JDK): First install the JDK; Java 8 or higher is recommended.

Install Hadoop: Install Hadoop on a local or remote server. You can download the latest version of Hadoop from the Hadoop official website and install and configure it according to the official documentation.

Set Hadoop environment variables: Add the Hadoop installation path to the system’s environment variables so that the Java program can find Hadoop’s related dependent libraries and configuration files.

Introduce Hadoop dependencies: In the Java project, add the Hadoop-related dependencies such as hadoop-common and hadoop-hdfs. Dependencies can be managed with a build tool such as Maven or Gradle.

Write Java code: Write code that operates HDFS through the Java API provided by Hadoop. Hadoop-related parameters, such as the HDFS URI and configuration file paths, need to be set in the code.

Compile and run the Java program: Use the Java compiler to compile the code into bytecode, and then run it on the Java Virtual Machine (JVM).

After the experimental environment is set up, you can use the Java API to manage and operate the HDFS file system.

  • (3) Experimental steps

1. Create a Maven project

First, open IDEA, click New Project, select Maven on the left, and then click Next.

Set the project name to HadoopDemo and click Finish.

Click Enable Auto-Import in the lower-right corner (to automatically import the jar packages). An empty Maven project is now created.

2. Import dependencies

First edit the pom.xml file (the core file of a Maven project) and add the following content to import the dependencies (the required jar packages).


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>untitled</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>20</maven.compiler.source>
        <maven.compiler.target>20</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>
</project>

IDEA will automatically save the file and import the dependency packages. Click Maven on the right and expand Dependencies, and you can see that the four dependencies have been imported.

3. Configure the Hadoop operating environment on Windows

After downloading and decompressing the Hadoop installation package, place the winutils.exe, winutils.pdb, and hadoop.dll files into the bin directory of the Hadoop installation path.
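If the client still cannot find winutils.exe at runtime, a common workaround is to set the hadoop.home.dir system property to the local Hadoop directory before obtaining the FileSystem. The sketch below assumes Hadoop was unpacked to E:\hadoop-3.3.0; the path and the class name WindowsHadoopCheck are only examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class WindowsHadoopCheck {
    public static void main(String[] args) throws Exception {
        // Assumed location of the unpacked Hadoop package on Windows; adjust to your own path
        System.setProperty("hadoop.home.dir", "E:\\hadoop-3.3.0");

        // If winutils.exe and hadoop.dll are in place, creating a local client should not fail
        FileSystem fs = FileSystem.get(new Configuration());
        System.out.println("Hadoop client initialized: " + fs.getUri());
        fs.close();
    }
}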


4. Configure the environment

Screenshot: successful configuration.

5. Initialization

We use JUnit for testing. First create a test class and add an @Before method that initializes the FileSystem client, as sketched below.
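A minimal sketch of this initialization is shown here; it mirrors the init() method in the complete listing at the end of this section, and the NameNode address hdfs://hadoop1:9000 and the root user are assumptions that must match your cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.junit.Before;

public class HDFS_CURD {

    // Client object used to operate the HDFS file system
    FileSystem fs = null;

    @Before
    public void init() throws Exception {
        // Construct the configuration and point it at the NameNode (address is an assumption)
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop1:9000");
        // Identity used on the HDFS side (root is assumed)
        System.setProperty("HADOOP_USER_NAME", "root");
        // Obtain the HDFS client object
        fs = FileSystem.get(conf);
    }
}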

6. HDFS code operations

(1) Upload a file to the HDFS file system

Screenshot: successful upload.

(2) Download a file from HDFS to the local machine

Download the lol line.txt file that was just uploaded to HDFS to your local computer.

(3) Create directories

(4) View file information in an HDFS directory

The complete code is as follows:

package cn.itcast.hdfsdemo;

import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.*;
    public class HDFS_CURD {

        //Objects that can operate the HDFS file system
        FileSystem fs = null;

        @Before
        public void init() throws Exception {
            // Construct a configuration parameter object and set a parameter: the URI of the HDFS to be accessed
            Configuration conf = new Configuration();
            // Specify using HDFS for access
            conf.set("fs.defaultFS", "hdfs://hadoop1:9000");
            // Set the client identity (root is the user name of the virtual machine, any one of the hadoop cluster nodes can be used)
            System.setProperty("HADOOP_USER_NAME", "root");
            // Obtain the HDFS file system client object through the static get() method of FileSystem
            fs = FileSystem.get(conf);
        }


        @Test
        public void testAddFileToHdfs() throws IOException {
            // Specify the local file to upload (note the escaped backslashes in the Windows path)
            Path src = new Path("E:\\IDEAPractise\\upload\\lol line.txt");
            //Specify the directory to upload files to HDFS
            Path dst = new Path("/testFile");
            fs.copyFromLocalFile(src, dst);
            System.out.println("Upload successful");
            //Close the resource
            fs.close();
        }

        //Download files from HDFS to local
        @Test
        public void testDownloadFileToLocal() throws IOException {
            // Path of the file to download (on HDFS)
            Path src = new Path("/testFile/lol line.txt");
            // Local path where the file will be saved after a successful download (on Windows)
            Path dst = new Path("E:\\IDEAPractise\\downloadFile\\lol.txt");
            // Download (do not delete the source; use the raw local file system)
            fs.copyToLocalFile(false, src, dst, true);
            System.out.println("Download successful");
        }

        //Directory operation (create directory)
        @Test
        public void testMkdirAndDeleteAndRename() throws IOException {
            fs.mkdirs(new Path("/a/b/c"));
            fs.mkdirs(new Path("/a2/b2/c2"));
            fs.rename(new Path("/a"), new Path("/a3"));
            fs.delete(new Path("/a2"),true);
            System.out.println("Created successfully");
        }
        //View file information in the directory
        @Test
        public void testListFiles() throws IOException {
            RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/car"),true);
            while (listFiles.hasNext()) {
                LocatedFileStatus fileStatus = listFiles.next();
                System.out.println("File name: " + fileStatus.getPath().getName());
                System.out.println("Number of copies of file: " + fileStatus.getReplication());
                System.out.println("File permissions: " + fileStatus.getReplication());
                System.out.println("File size: " + fileStatus.getLen() + "bytes");
                 BlockLocation[] blockLocations = fileStatus.getBlockLocations();
                 for (BlockLocation b1 : blockLocations){
                     String[] hosts = b1.getHosts();
                     System.out.println("The host name of the virtual machine where the file's Block is located:");
                     for(String host : hosts) {
                         System.out.println(host);
                     }
                 }
            }
            System.out.println("--------------------------------------------- ------------------");
    }
      }
