Java: monitoring a directory and uploading new files to HDFS in real time

Background: to monitor unstructured files in specific directories on a Linux server in real time and upload them to HDFS, this example uses Apache's Commons-IO to implement the file-monitoring function. Required pom dependencies: <dependencies> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>3.0.0</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>3.0.0</version> </dependency> <dependency> <groupId>commons-io</groupId> <artifactId>commons-io</artifactId> <version>2.6</version> </dependency> <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId> <version>3.9</version> […]
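The excerpt above pairs Commons-IO's file monitor with an HDFS upload. As a minimal sketch of the same idea using only the JDK (the `WatchService` below stands in for Commons-IO's `FileAlterationMonitor`, and the HDFS upload step is indicated by a comment rather than implemented — this is an illustration, not the article's actual code):

```java
import java.nio.file.*;
import java.util.concurrent.TimeUnit;

public class DirWatchDemo {

    // Watches a fresh temp directory, drops a file into it, and returns the
    // name reported by the first ENTRY_CREATE event (or "" on timeout).
    static String demo() throws Exception {
        Path dir = Files.createTempDirectory("watch-demo");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

            // Simulate a producer dropping a file into the monitored directory.
            Files.createFile(dir.resolve("data.txt"));

            // Block until the creation event arrives (bounded by a timeout).
            WatchKey key = watcher.poll(10, TimeUnit.SECONDS);
            if (key == null) return "";
            for (WatchEvent<?> event : key.pollEvents()) {
                // In the article's setup, this is where the listener's
                // onFileCreate() callback would copy the new file to HDFS.
                return event.context().toString();
            }
            return "";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("created: " + demo());
    }
}
```

Commons-IO's `FileAlterationMonitor` polls on a fixed interval instead of using OS notifications, which is why the article needs it as a dependency on platforms where polling semantics are preferred.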

HDFS High Availability Cluster Configuration File for Hadoop

core-site.xml configuration: <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License […]
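The excerpt cuts off before the actual properties. A sketch of what a core-site.xml for an HA cluster typically contains — the nameservice name `mycluster`, the host names, and the paths below are placeholder assumptions for illustration, not values from the article:

```xml
<configuration>
  <!-- Logical name of the HA nameservice (its NameNodes are defined in hdfs-site.xml) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- ZooKeeper quorum used for automatic NameNode failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>
  <!-- Base directory for Hadoop's working files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/data</value>
  </property>
</configuration>
```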

Install Hadoop on Windows and use Maven in IDEA to implement HDFS API operations

This installation uses Windows 10, Hadoop 3.3.6, Maven 3.9.4, and Java 17.0.2. It is a follow-up to the previous tutorial, which covered installing the Hadoop cluster and Java. If that installation is not yet complete, please […]

Practice k8s+flink+hdfs+dlink (5: installing docker, cri-docker, the Harbor registry, and k8s)

1: Install Docker (required on all servers). Install some necessary system tools: sudo yum install -y yum-utils device-mapper-persistent-data lvm2. Add the software source information: sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo, then sudo sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo. Refresh the cache and install Docker CE: sudo yum makecache fast, then sudo yum -y install docker-ce. Start the Docker service: sudo […]

Practice k8s+flink+hdfs+dlink (4: k8s (2) architecture)

One: Nodes. 1.1 Why use nodes. Kubernetes executes your workload by placing containers into Pods that run on nodes, so nodes must be registered in advance. 1.2 Definition. Nodes are a group of worker machines that run containerized applications; each cluster has at least one worker node. 1.3 How to use nodes. 1.3.1 Adding nodes. […]

LZO configuration of HDFS (1)

Table of Contents: 1. Introduction to the LZO algorithm 2. Using the LZO algorithm in Hadoop 3. The LZO algorithm on HDFS 4. Configuring LZO compression on HDFS (1) Compilation a) Environment preparation 1. Download the Linux version of Maven 2. Upload and unpack the Maven package 3. Configure the Maven environment 4. Download the following plug-ins through […]

Hadoop(04) HDFS programming practice

The Hadoop Distributed File System (HDFS) is one of Hadoop's core components: if Hadoop is already installed, it already includes HDFS, which does not need to be installed separately. To follow this guide, you need Hadoop installed on a Linux system. If Linux and Hadoop are not installed on your machine, […]

DataX-Web: importing business data from MySQL into HDFS and mapping it to Hive tables

Problem description: the following is the problem I encountered (part of the error message is shown in screenshots). Cause analysis: the log management view in DataX-Web gives the following information: 2023-10-05 13:19:50.025 [job-0] WARN DBUtil – test connection of [jdbc:mysql://node:3306/nshop] failed, for Code:[DBUtilErrorCode-10], Description: [Failed to connect to the database. Please […]
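The quoted log shows DataX-Web's connection test failing against jdbc:mysql://node:3306/nshop. A minimal sketch of the same kind of connectivity check with plain JDBC — the user and password are placeholders, and without a MySQL driver on the classpath (or with the host unreachable) the check surfaces the failure reason instead of connecting:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcPing {

    // Returns null if a connection can be opened, otherwise the failure message.
    static String ping(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return null;
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Mirrors the URL from the error log; prints why the test failed
        // (e.g. "No suitable driver found ..." or a connection refusal).
        System.out.println(ping("jdbc:mysql://node:3306/nshop", "root", "password"));
    }
}
```

Running this outside DataX isolates whether the problem is the driver, the credentials, or network reachability of the MySQL host.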

A few simple Hadoop HDFS operations

Getting started with HDFS. 1 File operations. 1.1 Command-line interaction. 1.1.1 Overview. The general command format is: hdfs [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS], where SUBCOMMAND is one of the Admin, Client, or Daemon commands. The HDFS shell CLI can also operate on a variety of file systems, including local file systems (file:///) and distributed file systems […]