Flume installation and configuration

Table of Contents

Foreword:

1. Correspondence between Java and Flume versions

2. Download Flume (both methods are acceptable)

(1) The first method: download Flume on Windows and transfer it to the Linux virtual machine

(2) The second method: use the wget command to download Flume directly into the target directory on the virtual machine

3. Flume configuration

(1) flume-env.sh configuration

(2) Configure Flume environment variables

4. Getting Started with Flume

5. Flume data collection test

6. Distributed cluster deployment (for high reliability)

Foreword:

The Java runtime version must match the Flume version you install. Flume 1.6, for example, requires Java 1.6 or later. The rest of this article uses Flume 1.8.0, so Java 1.8 or later is required. (In this article, /home/export/software is the directory that holds the downloaded archives, and /home/export/servers is the directory where the software is installed. Adjust these paths to suit your own environment.)

1. Correspondence between Java and Flume versions

Flume version	Required JRE version
Flume 1.9.0	Java 1.8 or higher
Flume 1.8.0	Java 1.8 or higher
Flume 1.7.0	Java 1.7 or higher
Flume 1.4.0, 1.5.0, 1.5.2, 1.6.0	Java 1.6 or higher (1.7 recommended)
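The table above can be turned into a quick pre-flight check before installing. The following is only a sketch: min_java_for_flume and java_meets_minimum are hypothetical helper names, not part of Flume, and the version mapping is copied directly from the table.

```shell
# Hypothetical helper: print the minimum Java version for a Flume release,
# using the mapping from the table above.
min_java_for_flume() {
    case "$1" in
        1.9.*|1.8.*)       echo "1.8" ;;
        1.7.*)             echo "1.7" ;;
        1.4.*|1.5.*|1.6.*) echo "1.6" ;;
        *)                 echo "unknown"; return 1 ;;
    esac
}

# Hypothetical helper: succeed if Java version $1 (for example 1.8.0_292)
# is at least the minimum $2 (for example 1.8).
# Only the number after the leading "1." is compared.
java_meets_minimum() {
    have=${1#1.}; have=${have%%[._]*}
    need=${2#1.}; need=${need%%[._]*}
    [ "$have" -ge "$need" ]
}

# Possible usage (needs java on the PATH, so it is left commented out):
# installed=$(java -version 2>&1 | awk -F'"' 'NR==1 {print $2}')
# java_meets_minimum "$installed" "$(min_java_for_flume 1.8.0)" && echo "Java is new enough"
```

The commented usage lines read the quoted version string from the first line of the java -version output and compare it with the minimum for Flume 1.8.0.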

2. Download Flume (both methods are acceptable)

(1) The first method: download Flume on Windows and transfer it to the Linux virtual machine

Download link: https://mirrors.huaweicloud.com/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz

On the virtual machine, enter the /home/export/software directory and run rz on the command line. A dialog box pops up for selecting the file to transfer (Windows -> Linux). The rz command is provided by the lrzsz package and needs a terminal client that supports ZMODEM transfers.

#Extract Flume to /home/export/servers
tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /home/export/servers/

(2) The second method: use the wget command to download Flume directly into the target directory on the virtual machine

#Download the Flume installation package to /home/export/software
#(uppercase -P sets the target directory; lowercase -p is a different option that fetches page requisites)
wget -P /home/export/software https://mirrors.huaweicloud.com/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz

#After the download succeeds, the Flume archive is at /home/export/software/apache-flume-1.8.0-bin.tar.gz

#Extract Flume to /home/export/servers
tar -zxvf /home/export/software/apache-flume-1.8.0-bin.tar.gz -C /home/export/servers/
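Whichever download method is used, it can be worth checking that the archive arrived complete before extracting it. The sketch below assumes nothing beyond tar itself; archive_is_intact is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if $1 is a readable gzip-compressed tar archive.
# "tar -tzf" lists the archive contents without extracting anything, so a
# truncated or corrupted download makes it exit with a non-zero status.
archive_is_intact() {
    [ -f "$1" ] && tar -tzf "$1" > /dev/null 2>&1
}
```

For example, calling archive_is_intact with the path of the downloaded apache-flume-1.8.0-bin.tar.gz and running the tar -zxvf step only when it succeeds avoids extracting a partial download. This checks only that the archive is readable, not that it matches a published checksum.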

3. Flume configuration

(1) flume-env.sh configuration

#Enter the /home/export/servers directory
cd /home/export/servers

#Create a symbolic link
ln -s apache-flume-1.8.0-bin/ flume

#Enter the conf directory under flume
cd flume/conf/

#Copy the template configuration file flume-env.sh.template to flume-env.sh
cp flume-env.sh.template flume-env.sh

#Edit flume-env.sh and set export JAVA_HOME to your own JDK path
vim flume-env.sh
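If you prefer not to edit the file by hand, the JAVA_HOME line can also be written from a script. This is a minimal sketch, not the only way to do it: set_java_home is a hypothetical helper, and the JDK path passed to it is whatever path applies on your machine.

```shell
# Hypothetical helper: make sure file $1 ends up with exactly one active
# "export JAVA_HOME=..." line pointing at $2. Commented-out lines are kept.
set_java_home() {
    env_file=$1
    java_home=$2
    # Copy everything except existing uncommented JAVA_HOME exports.
    # grep -v exits non-zero when it prints nothing, which is not an error here.
    grep -v '^[[:space:]]*export JAVA_HOME=' "$env_file" > "$env_file.tmp" || true
    echo "export JAVA_HOME=$java_home" >> "$env_file.tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$env_file.tmp" > "$env_file" && rm -f "$env_file.tmp"
}
```

Called as set_java_home flume-env.sh followed by your JDK path from inside the conf directory, it replaces any earlier active setting and leaves the rest of the file unchanged.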

(2) Configure flume environment variables

#Open the system-wide profile to configure the environment variables
vim /etc/profile

#Add the following lines to the end of the file
export FLUME_HOME=/home/export/servers/flume
export PATH=$PATH:$FLUME_HOME/bin

#Apply the new environment variables to the current shell
source /etc/profile
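To confirm that the new variables took effect, you can check whether the Flume bin directory is now on the PATH. The helper below is a hypothetical sketch written for this article, not a Flume tool.

```shell
# Hypothetical helper: succeed if directory $1 is one of the entries in PATH.
# Wrapping both sides in colons avoids matching a mere prefix of another entry.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

After source /etc/profile, running path_contains "$FLUME_HOME/bin" should succeed, and the command flume-ng version should then print the installed Flume version.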

4. Getting Started with Flume

With Flume installed and configured, it is ready to use. The following simple single-agent example demonstrates basic use of Flume. The steps are as follows.

(1) Configure the Flume collection scheme
Flume collects data of many types from many sources, and transfers and aggregates it in different ways depending on development needs. To support this, Flume provides three kinds of component (Source, Channel, and Sink) that can be combined to match different data types and transmission requirements.
To collect data correctly, you must write a collection scheme (an agent configuration file) that fits your needs. The scheme below collects data from a netcat source. (netcat is a Linux tool for TCP/UDP connections and listening, used mainly for network transfer and debugging.)

#Create the collection scheme file netcat-logger.conf in the /home/export/servers/flume/conf directory

cd /home/export/servers/flume/conf/

vim netcat-logger.conf

#Copy the following content into netcat-logger.conf and save it

 #Example configuration solution: single-node Flume configuration
 #Define the names of each component in Agent
 #The Agent is named a1, the sources are named r1, the sinks are named k1, and the channels are named c1
 a1.sources = r1
 a1.sinks = k1
 a1.channels = c1
 #Describe and configure the sources component (data source type, application address for collecting data sources)
 a1.sources.r1.type = netcat
 a1.sources.r1.bind = localhost
 a1.sources.r1.port = 44444
 #Describe and configure the sinks component (the type of data outflow after collection)
 a1.sinks.k1.type = logger
 #Describe and configure channels (cache type, memory cache size and transaction cache size)
 a1.channels.c1.type = memory
 a1.channels.c1.capacity = 1000
 a1.channels.c1.transactionCapacity = 100
 #Bind source and sink through the same channel connection
 a1.sources.r1.channels = c1
 a1.sinks.k1.channel = c1
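A common mistake in a collection scheme is a source or sink that points at a channel name that was never declared. The sketch below checks for that with awk. check_channel_wiring is a hypothetical helper written for this article, and it only understands the simple "key = value" lines used above.

```shell
# Hypothetical helper: check that every channel referenced by a source or sink
# of agent $2 in properties file $1 is declared on the "<agent>.channels" line.
check_channel_wiring() {
    awk -v agent="$2" '
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                     # not a key = value line
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/[ \t]/, "", key)                # trim blanks around the key
            n = split(val, names, " ")            # values may list several names
            if (key == agent ".channels") {
                for (i = 1; i <= n; i++) declared[names[i]] = 1
            } else if (key ~ ("^" agent "\\.sources\\.[^.]+\\.channels$") ||
                       key ~ ("^" agent "\\.sinks\\.[^.]+\\.channel$")) {
                for (i = 1; i <= n; i++) used[names[i]] = key
            }
        }
        END {
            bad = 0
            for (c in used) {
                if (!(c in declared)) {
                    print "undeclared channel " c " referenced by " used[c]
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}
```

Running check_channel_wiring netcat-logger.conf a1 in the conf directory exits with status 0 for the scheme above, because both r1 and k1 refer to the declared channel c1.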

(2) Start Flume

cd /home/export/servers/flume/
flume-ng agent --conf conf/ --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console

After startup, the console log shows the netcat source listening on port 44444, which indicates that Flume has started successfully with this configuration.

5. Flume data collection test

To verify that Flume is collecting data, you can generate netcat data on port 44444 of the local machine. First, open or clone a new terminal session and enter the following commands in it.

#Install the telnet tool (skip this step if it is already installed)
yum -y install telnet

#Use the telnet tool to connect to port 44444 on the local machine; each line typed here is sent as data for Flume to collect
telnet localhost 44444

#Test the collection: in the telnet session, type the message Hello and press the Enter key. The received event then appears in the Flume terminal session window.

This output shows that Flume has accurately monitored and collected the data sent over telnet, and has printed it to the console as instructed by the logger settings given at startup.

6. Distributed cluster deployment (for high reliability)

For a cluster (master, hadoop02, hadoop03), distribute the Flume installation configured in section 3 directly from master to hadoop02 and hadoop03.

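#Distribute the Flume installation directory and /etc/profile to hadoop02 and hadoop03
#(explicit target paths are used so the commands work from any current directory)

scp -r /home/export/servers/apache-flume-1.8.0-bin hadoop02:/home/export/servers/
scp -r /home/export/servers/apache-flume-1.8.0-bin hadoop03:/home/export/servers/
scp /etc/profile hadoop02:/etc/profile
scp /etc/profile hadoop03:/etc/profile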

#Create the symbolic link on hadoop02 and hadoop03
cd /home/export/servers
ln -s apache-flume-1.8.0-bin/ flume

#Refresh environment variables on hadoop02 and hadoop03
source /etc/profile
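With more than a couple of nodes, the distribution steps above are easier to run as a loop. The following is a sketch under several assumptions: distribute_flume and the DRY_RUN variable are hypothetical names introduced here, passwordless ssh/scp to each host is already set up, /home/export/servers already exists on every host, and overwriting /etc/profile on the target hosts is acceptable.

```shell
# Hypothetical helper: copy the Flume directory and /etc/profile to each host
# given as an argument, then create the flume symbolic link there.
# With DRY_RUN=1 the commands are only printed, not executed.
distribute_flume() {
    for host in "$@"; do
        for cmd in \
            "scp -r /home/export/servers/apache-flume-1.8.0-bin $host:/home/export/servers/" \
            "scp /etc/profile $host:/etc/profile" \
            "ssh $host ln -sfn /home/export/servers/apache-flume-1.8.0-bin /home/export/servers/flume"
        do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "$cmd"
            else
                # Unquoted on purpose: the string is split into command and arguments.
                $cmd || return 1
            fi
        done
    done
}
```

Running DRY_RUN=1 distribute_flume hadoop02 hadoop03 first prints the six commands so they can be reviewed; running it again without DRY_RUN executes them and stops at the first failure.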