Monitoring a directory in Java and uploading to HDFS in real time

Background: Unstructured files in specific directories on a Linux server need to be monitored in real time and uploaded to HDFS.
Usage notes: Apache Commons-IO is used to implement the file monitoring.
Required pom dependencies:
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.0.0</version>
  </dependency>
  <dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
  </dependency>
  <dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.9</version>
  […]

HDFS High Availability Cluster Configuration File for Hadoop

core-site.xml configuration:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License […]

Install Hadoop on Windows and use Maven in IDEA to implement HDFS API operations

This installation uses Windows 10, Hadoop 3.3.6, Maven 3.9.4, and Java 17.0.2. This tutorial follows on from the previous one, in which the Hadoop cluster and Java installation were completed. If the installation has not been completed, please […]

Practice k8s+flink+hdfs+dlink (5: Install Docker, cri-docker, Harbor registry, k8s)

1: Install Docker. (Required on all servers)
Install the necessary system tools:
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
Add the software source information:
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sudo sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
Update and install Docker-CE:
sudo yum makecache fast
sudo yum -y install docker-ce
Start the Docker service:
sudo […]

Practice k8s+flink+hdfs+dlink (4: k8s (2) architecture)

One: Node. 1.1 Why use nodes. Kubernetes runs your workload by placing containers into Pods that run on nodes, so the nodes need to be registered in advance. 1.2 Definition. A group of worker machines, called nodes, run containerized applications. Each cluster has at least one worker node. 1.3 How to use nodes 1.3.1 Add nodes. […]

Python + Pickle/Parquet/HDF5…: comparing quantitative factor calculation performance across file storage formats

In quantitative trading, computing high-frequency factors from high-frequency L1/L2 quote and trade data is a common investment research requirement. As the volume of financial market data continues to grow, traditional relational databases can no longer meet the storage and query needs of large-scale data. In order to cope with this challenge, […]

LZO configuration of HDFS (1)

Table of Contents: 1. Introduction to the LZO algorithm 2. Using the LZO algorithm in Hadoop 3. The LZO algorithm on HDFS 4. Configuring LZO compression for HDFS (1) Compile a) Environment preparation 1. Download the Linux version of Maven 2. Upload and decompress the Maven package 3. Configure the Maven environment 4. Download the following plug-ins through […]

Hadoop(04) HDFS programming practice

The Hadoop Distributed File System (HDFS) is one of the core components of Hadoop. If Hadoop is already installed, it includes HDFS, which does not need to be installed separately. To follow this guide, Hadoop must be installed on a Linux system. If Linux and Hadoop are not installed on the machine, […]

Using DataX-web to import business data from MySQL into HDFS and map it to Hive tables

Problem description: The problem I encountered is shown below, with part of the error message captured in screenshots. Cause analysis: The log management view in DataX-web shows the following information: 2023-10-05 13:19:50.025 [job-0] WARN DBUtil - test connection of [jdbc:mysql://node:3306/nshop] failed, for Code:[DBUtilErrorCode-10], Description: [Failed to connect to the database. Please […]