Flume setup and installation, and fixing the connection refused error when uploading to the HDFS web page (Trying ::1… telnet: connect to address ::1: Connection refused)

Table of Contents

1. Flume

1. Features of Flume:

2. What can Flume do?

3. Flume collection and storage

4. Flume’s three major components

5. Flume official documentation link (Chinese version)

2. Install Flume

(1) Upload and decompress the software package

(2) Configure environment variables

3. Test Flume

(1) Edit the Flume configuration file and start it

1. Create a new configuration file

2. Edit configuration file

3. Start the configuration file

4. Install Telnet service

5. Problem solving:

(1) Question:

(2) Cause analysis:

(3) Solution:

4. Write data migration

1. Create a new folder

2. Create configuration file

3. Start the cluster and view the process

4. Create a data file directory and put the data files in it

5. Write the Python file

1) Requirements:

2) Run the python file

3) View generated files

6. Run the configuration file and upload the web page

1) Run the configuration file:

2) Check whether the operation is successful

3) Check whether there are files on the HDFS webpage

4) Try downloading the file

5) Solve the problem

a) Question:

b) Cause analysis:

c) Solve the problem:

6) Download successful

5. File processing


1. Flume

1. Features of Flume:

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It also offers good customization and extension capabilities for special scenarios, so it can be used in most day-to-day data collection situations.

2. What can Flume do?

Flume is a tool/service that collects data such as logs and events from many different data sources and centralizes and stores these large volumes of data.

3. Flume collection and storage

Flume can collect source data in many forms (files, directories, Kafka, MySQL databases, and so on) and sink the collected data to many external storage systems such as HDFS, HBase, Hive, and Kafka.

4. The three major components of Flume

  • Source: reads data from the data source
  • Channel: buffers the data handed over by the source; common types: memory, file
  • Sink: reads the buffered data from the channel and writes it out, for example to the console (logger) or to HDFS

5. Flume official documentation link (Chinese version)

https://flume.liyifeng.org/#exec-source

2. Install Flume

(1) Upload and decompress the software package

  • Upload package:
cd /opt/softwares

  • Unzip the package:

Syntax: tar -xf <package name> -C <destination path>

tar -xf apache-flume-1.9.0-bin.tar.gz -C /opt/modules/

  • Create a soft link (so the environment variables can point to a version-independent path):
cd /opt/modules
ln -s apache-flume-1.9.0-bin/ flume

(2) Configure environment variables

  • Path:
vi /etc/profile

  • Configure environment variables:
export FLUME_HOME=/opt/modules/flume
export PATH=$FLUME_HOME/bin:$PATH

  • Update environment variables:
source /etc/profile

  • Modify the configuration file (in the conf directory of the Flume installation):
cd /opt/modules/flume/conf
cp flume-env.sh.template flume-env.sh

We copy the template before editing so the original file is still there if we make a mistake.

  • Add the Java environment variable to flume-env.sh:
export JAVA_HOME=/opt/modules/jdk1.8.0_241

  • View version:
flume-ng version

3. Test Flume

(1) Edit the Flume configuration file and start it

1. Create a new configuration file

vi nc-flume.conf

2. Edit configuration file

Choose an appropriate source, channel, and sink.

This test uses netcat as the source, memory as the channel, and logger as the sink.

#sink alias
a1.sinks = k1

# Configure source-related information. Netcat of data source. Data of one port of a host. Specify host port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure channel-memory
a1.channels.c1.type =memory

# Configure sink-console printing
a1.sinks.k1.type=logger

# Bind the corresponding relationship of source channel sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Start the configuration file

 flume-ng agent -n a1 --conf-file nc-flume.conf -Dflume.root.logger=INFO,console

4. Install Telnet service

Open a second terminal session to the node (the Flume agent keeps running in the first one)!

yum install telnet -y

After the installation succeeds, try to connect:

 telnet localhost 44444

Connection refused!

5. Problem Solving:

(1) Question:
  • Trying ::1…
    telnet: connect to address ::1: Connection refused
    Trying 127.0.0.1…
    telnet: connect to address 127.0.0.1: Connection refused
(2) Cause analysis:

The connection is refused because nothing is listening on port 44444: the Flume agent never actually created the netcat source.

(3) Solution:

Check the configuration file. The source and channel aliases (a1.sources = r1 and a1.channels = c1) were missing, so the agent never created the netcat source that should listen on the port.

The corrected configuration:

# Alias of source
a1.sources = r1
# Alias of channel
a1.channels = c1
# Alias of sink
a1.sinks = k1

# Configure source-related information. Netcat of data source. Data of one port of a host. Specify host port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure channel-memory
a1.channels.c1.type =memory

# Configure sink-console printing
a1.sinks.k1.type=logger

# Bind the corresponding relationship of source channel sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

After the modification, start the agent again with the same flume-ng command.

Successfully connected! This step is easy to get wrong!
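Once the connection succeeds, you can verify the whole pipeline (a quick extra check, not part of the original steps): anything typed in the telnet session should appear as an event in the agent's console, because the sink is a logger.

telnet localhost 44444
# type some text and press Enter; the window running the Flume agent (logger sink) prints each line as an event
# to leave telnet, press Ctrl+] and then type quit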

4. Write data migration

Try to write data and upload it to the HDFS web page!

1. Create a new folder

2. Create configuration file

  • Use the exec source, file channel, and hdfs sink to collect the file and upload it to HDFS (visible on the web page).
  • Specify the prefix and path of the files generated on HDFS.
#Alias
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# edit sources
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/data/students_info.txt

# edit channels
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/modules/flume/channels_checkpoints_file/checkpoint
a1.channels.c1.dataDirs = /opt/modules/flume/channels_check_file/data


#edit sinks
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = user-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream

#Edit channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
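A note on the sink settings (added explanation, not from the original write-up): setting rollInterval, rollSize, and rollCount to 0 disables time-, size-, and count-based file rolling; round/roundValue/roundUnit control how the timestamp used by the escape sequences in hdfs.path is rounded; useLocalTimeStamp = true resolves %y-%m-%d from the local clock instead of requiring a timestamp header on each event; and fileType = DataStream writes plain text rather than a SequenceFile.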

3. Start the cluster and view the process
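A minimal sketch of this step, assuming a standard Hadoop 3.x installation with its scripts on the PATH:

start-dfs.sh    # start HDFS (NameNode, DataNodes, SecondaryNameNode)
jps             # check that the HDFS processes are running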

4. Create a data file directory and put the data file in it

The directory must match the path used by the exec source (tail -F /opt/data/students_info.txt):

cd /opt
mkdir data

5. Write the Python file

1) Requirements:

  • The name has 3 characters, each picked at random from one of three arrays (20 characters per array)
  • Gender is male or female
  • The student number has 3 digits (minimum 001, maximum 100)
  • Age is 15-22 years old
  • There are four subject scores, each between 60 (minimum) and 100 (full marks); the four scores are also summed into a total
  • Each record is appended to a log file
#!/usr/bin/python
#coding=UTF-8
import random
import time

# Name characters: the original script used Chinese characters here; they are shown
# as romanized placeholders, and any 20 strings per array will work the same way.
nameArray1 = ["Zhao", "Qian", "Sun", "Li", "Zhou", "Wu", "Zheng", "Wang", "Deng", "Ma",
              "Yang", "Han", "Su", "Jiang", "Chiang", "Zhong", "Liu", "Chen", "Fang", "Zeng"]
nameArray2 = ["Ming", "He", "Jian", "Chao", "Hong", "Kong", "Zheng", "He", "Jiu", "Ke",
              "Xiang", "Kai", "Hui", "Shu", "Jia", "Pu", "Peng", "Jia", "Ou", "Fei"]
nameArray3 = ["Hui", "Ke", "Jun", "Xue", "Qu", "Fei", "Xin", "Xin", "Mei", "Bu",
              "Li", "Feng", "Ping", "Gao", "Lou", "Ji", "Tai", "Ni", "Ruo", "Ru"]

# gender
sexArr = ["male", "female"]


# Generate one student record and append it to the students_info.txt file (the loop below calls this once per second)
def log():
    name1 = nameArray1[random.randint(0, 19)]
    name2 = nameArray2[random.randint(0, 19)]
    name3 = nameArray3[random.randint(0, 19)]
    name = name1 + name2 + name3
    # # student ID
    # student_id = str(random.randint(0, 100))

    # Generate a 3-digit student number, starting from 001 and up to 100
    student_id = str(random.randint(1, 100)).zfill(3)

    #Age (15-22)
    age = random.randint(15, 22)
    sex = sexArr[random.randint(0, 1)]
    # score
    chinese_score = random.randint(60, 100)
    math_score = random.randint(60, 100)
    english_score = random.randint(60, 100)
    computer_score = random.randint(60, 100)

    total_score = chinese_score + math_score + english_score + computer_score
    info = "{},{},{},{},{},{},{},{}".format(student_id, name, sex, chinese_score, math_score, english_score,
                                            computer_score, total_score)
    # info = f"{student_id},{name},{sex},{chinese_score},{math_score},{english_score},{computer_score},{total_score}"
    print(info)

    # Append the record to the log file (mode 'a' appends; the original 'a + ' is not a valid mode)
    with open('students_info.txt', 'a') as student_log:
        student_log.write(info)
        student_log.write("\n")


while True:
    log()
    time.sleep(1)  # generate one record per second

2) Run the Python file

Run it from the /opt/data directory so that students_info.txt is written to the path the exec source tails:

python student.py

3) View generated files
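A quick way to watch the generated file grow (assuming the script is writing to /opt/data/students_info.txt as configured above):

tail -f /opt/data/students_info.txt    # a new student record should appear about once per second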

6. Run the configuration file and upload to the web page

1) Run the configuration file:

Syntax: flume-ng agent -n <agent name> --conf-file <configuration file name> -Dflume.root.logger=INFO,console (the last option prints the log to the console)

flume-ng agent -n a1 --conf-file student-score_exec-file-hdfs.conf -Dflume.root.logger=INFO,console

2) Check whether the operation is successful
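Besides watching the agent's console output, the sink's target directory can be listed from the command line (the date below is only an illustrative example; directory names follow the %y-%m-%d pattern from the configuration):

hdfs dfs -ls /flume/events/            # should contain a date-named subdirectory
hdfs dfs -ls /flume/events/24-05-20    # example date; the file names start with the user- prefix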

3) Check whether the HDFS web page has files

http://<NameNode IP address>:9870 (browse the file system from the Utilities menu)

4) Try to download the file

  • Check whether it is consistent with the file in the virtual machine
  • Click to download

  • Cannot download

5) Solve the problem

a) Question:

The file cannot be downloaded.

b) Cause analysis:

It may be a firewall problem or a hostname mapping problem.

c) Solve the problem:
  • Check the firewall status:
systemctl status firewalld.service

The firewall is already stopped, so it is not the cause.
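For reference (not part of the original steps): if the firewall had been running, it could be stopped and kept from starting again with:

systemctl stop firewalld.service
systemctl disable firewalld.service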

  • Find the Windows mapping file and set the mapping
C:\Windows\System32\drivers\etc

The hosts file in this directory cannot be edited in place (it needs administrator rights), so copy it somewhere else, such as the desktop, and edit it there.

Open it with Notepad and add the mappings:

192.168.58.3 hadoop01
192.168.58.4 hadoop02
192.168.58.5 hadoop03

Move the edited hosts file back into C:\Windows\System32\drivers\etc.
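To verify the mapping (an extra check, not in the original write-up), open a Windows command prompt and ping one of the hostnames:

ping hadoop01    # should resolve to 192.168.58.3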

6) Download successful

5. File processing

Chart analysis and subsequent Hive operations can be performed on this file later.