This installation uses Windows 10, Hadoop 3.3.6, Maven 3.9.4, and Java 17.0.2. This tutorial follows on from the previous one, in which the Hadoop cluster and Java installation were completed. If you have not finished those steps, please check the prerequisite tutorial: Hadoop and Java installation.
In addition, the big data tutorial series is still being updated continuously (including running examples and installing databases, Spark, MapReduce, Hive, etc.), so everyone is welcome to follow me.
Client environment preparation
Hadoop download and decompression
- Download Hadoop 3.3.6 (the specific version can be changed according to your own needs) and unzip it to a directory whose path contains no Chinese characters. This tutorial uses version 3.3.6. Download link: Apache Hadoop
- Then download the Windows dependency package (its version should match your Hadoop version) and extract it into the `hadoop-3.3.6\bin` directory, overwriting the existing files. The winutils repository covers most Hadoop versions.
Environment variable configuration
- Configure the `HADOOP_HOME` environment variable. Environment variables are found under Computer - Properties - Advanced System Settings - Environment Variables. Create a new system variable named `HADOOP_HOME` and set its value to your Hadoop directory.
Then select Path under User Variables, click Edit, and add the following two entries:

%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
Verify that the Hadoop environment variables work: double-click `hadoop-3.3.6\bin\winutils.exe`. If an error is reported, the Microsoft runtime library is missing (even genuine systems often have this problem). If no error is reported and a black console window flashes by, everything is working. Runtime library download address: Microsoft runtime library repair
Maven download and configuration
Maven download and installation
- Download Maven (this tutorial uses version 3.9.4) from the official website download address.
- After the download completes, unzip the archive into a directory whose path contains no Chinese characters.
- Create a new `repository` folder inside the Maven folder to store the local repository files.
Maven configuration
- Open the `settings.xml` file under the `apache-maven-3.9.4\conf` path.
- Locate the following content:

```xml
<!-- localRepository
 | The path to the local repository maven will use to store artifacts.
 |
 | Default: ${user.home}/.m2/repository
-->
```

- Add the following line after it (replace "repository path" with the path of the repository folder you just created):

```xml
<localRepository>repository path</localRepository>
```
- Set the mirror source: still in the `settings.xml` file, comment out the default mirror and add the Aliyun mirror:

```xml
<mirror>
  <id>nexus-aliyun</id>
  <name>nexus-aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  <mirrorOf>central</mirrorOf>
</mirror>
<!--
<mirror>
  <id>maven-default-http-blocker</id>
  <mirrorOf>external:http:*</mirrorOf>
  <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
  <url>http://0.0.0.0/</url>
  <blocked>true</blocked>
</mirror>
-->
```

- Save the file.
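For orientation, after both edits the relevant parts of `settings.xml` look roughly like the fragment below. The path shown is only an example; note that the `<mirror>` entry must sit inside the existing `<mirrors>` element:

```xml
<settings>
  <!-- your repository folder path goes here -->
  <localRepository>D:\maven\repository</localRepository>

  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <name>nexus-aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```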
IDEA
Modify Maven settings in IDEA
- Press `Ctrl + Alt + S` to open the IDEA settings window.
- Navigate to `Build, Execution, Deployment - Build Tools - Maven`.
- Modify the following one by one, checking the Override box where one is shown:
  - Maven home path: the directory where the Maven folder is located
  - User settings file: the path to `settings.xml`
  - Local repository: the path to the `repository` folder
- Save after the modifications are complete.
Create a Maven project in IDEA
- Open IDEA and select `File - New - Project`.
- Select `Maven Archetype`, and set `Archetype` to `maven-archetype-quickstart`. Open `Advanced Settings` for more settings; fill in the other options according to your own needs (the corresponding translation is shown in the figure below).
- Import the dependency coordinates plus logging dependencies: paste the following content before the closing `</project>` tag in `pom.xml`:

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.6</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
```
Note: change the version number of hadoop-client to match the version you downloaded. The dependencies are downloaded over the network, so the download speed depends on your connection.
- In the `src/main/resources` directory of the project, create a new file named `log4j.properties` and fill it with the following (console and file log output, at INFO level):

```properties
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
```
- Create the package `com.hadoop.hdfs` (under `java`).
- Create an `HdfsClient` class.
- In the `HdfsClient` class, enter:

```java
package com.hadoop.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {
    @Test
    public void testMkdirs() throws IOException, URISyntaxException, InterruptedException {
        // 1. Get the file system
        Configuration configuration = new Configuration();
        // Error demonstration: if the user is not set, a permission error will be reported
        // FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration);
        // hadoop01 should be changed to the address of the node where the NameNode is located
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:8020"), configuration, "root");
        // 2. Create directory
        fs.mkdirs(new Path("/xiyou/huaguoshan/"));
        // 3. Close the resource
        fs.close();
    }
}
```
If code is marked with a wavy underline, an exception needs to be thrown (press Alt + Enter to fix it). Ctrl + P shows a function's parameter hints.
- Execute the program. If it runs normally, the `/xiyou/huaguoshan/` folder will appear in the HDFS file system.
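To double-check from the cluster side, you can also list the new directory with the HDFS shell (run this on a node where the `hadoop` command is available; it requires a running cluster):

```shell
hadoop fs -ls /xiyou
```

If the directory was created, `huaguoshan` appears in the listing with the owner you configured in the client.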
When the client operates HDFS, it does so with a user identity. By default, the HDFS client API uses the current Windows user to access HDFS, which triggers a permission exception. So a user must be configured when accessing HDFS:

```text
org.apache.hadoop.security.AccessControlException: Permission denied: user=56576, access=WRITE, inode="/xiyou/huaguoshan":hadoop:supergroup:drwxr-xr-x
```
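The trailing `drwxr-xr-x` in that message is a POSIX-style mode string: a type flag followed by `rwx` triplets for owner, group, and others. A small stdlib sketch (the class name `PermissionDecoder` is mine, not part of Hadoop) showing why a non-owner user gets `Permission denied` on WRITE:

```java
public class PermissionDecoder {
    // Check write permission in a mode string like "drwxr-xr-x":
    // char 0 is the type flag, chars 1-3 owner, 4-6 group, 7-9 others.
    public static boolean canWrite(String mode, String who) {
        int offset = who.equals("owner") ? 1 : who.equals("group") ? 4 : 7;
        // The 'w' slot is the second character of each triplet.
        return mode.charAt(offset + 1) == 'w';
    }

    public static void main(String[] args) {
        String mode = "drwxr-xr-x"; // taken from the exception above
        System.out.println("owner can write: " + canWrite(mode, "owner")); // prints true
        System.out.println("other can write: " + canWrite(mode, "other")); // prints false
    }
}
```

Only the owner (`hadoop`) may write, which is why the client must be told to act as that user.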
- Program improvement: encapsulate the common steps.
(1) First, encapsulate the preparation work into an init() method, as follows:

```java
public void init() throws URISyntaxException, IOException, InterruptedException {
    // NameNode address of the cluster; 8020 is the HDFS internal communication port
    URI uri = new URI("hdfs://hadoop102:8020");
    // Create a configuration object
    Configuration conf = new Configuration();
    // User
    String user = "hadoop";
    // Get the client object
    fs = FileSystem.get(uri, conf, user);
}
```
Because the fs created in init() is also used in close() and testMkdirs(), it must be defined as a field of the class. Select fs and press Ctrl + Alt + F; an extract-field popup appears; press Enter to define it.
(2) Encapsulate the operation of closing the resource, as follows:

```java
public void close() throws IOException {
    // 3. Close the resource
    fs.close();
}
```
(3) The original testMkdirs() function now handles only the main logic:

```java
public void testMkdirs() throws URISyntaxException, IOException, InterruptedException {
    // 2. Create a folder
    fs.mkdirs(new Path("/xiyou/huaguoshan1"));
}
```
(4) Finally, use JUnit to wire it together: @Test marks the code to be tested, @Before marks initialization code that runs before each test method, and @After marks operations that run after each test method completes.
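Assembled with those annotations, the refactored class might look like the sketch below. The hostname hadoop102 and user hadoop are taken from the init() example above and should match your own cluster; running it requires a live HDFS cluster and the hadoop-client dependency:

```java
package com.hadoop.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {

    // Shared by all test methods (the field extracted with Ctrl + Alt + F)
    private FileSystem fs;

    @Before
    public void init() throws URISyntaxException, IOException, InterruptedException {
        // Runs before each test: connect to the NameNode (8020 is the HDFS internal port)
        URI uri = new URI("hdfs://hadoop102:8020");
        Configuration conf = new Configuration();
        String user = "hadoop";
        fs = FileSystem.get(uri, conf, user);
    }

    @After
    public void close() throws IOException {
        // Runs after each test: release the client
        fs.close();
    }

    @Test
    public void testMkdirs() throws IOException {
        // Only the main logic remains here
        fs.mkdirs(new Path("/xiyou/huaguoshan1"));
    }
}
```

With this structure, every new test method automatically gets a fresh client in init() and a guaranteed cleanup in close().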