Deploying the ETL tool Kettle on Linux

1 System environment

JDK 1.8

Kettle 8

CentOS 7

2 Configuration process

2.1 Configure the JDK

Check whether a Java environment is already installed:

java -version

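If Java is not installed, create the directory, upload the JDK archive (jdk-8u101-linux-x64.tar.gz) into it, and extract it. If Java is already installed, skip this step.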

mkdir /usr/local/java

cd /usr/local/java

tar -zxvf jdk-8u101-linux-x64.tar.gz

vim /etc/profile

Append the following lines to the end of the file:

export JAVA_HOME=/usr/local/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}

Reload the profile so the variables take effect, then check the Java version again:

source /etc/profile

java -version
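The /etc/profile edit above can also be scripted so that re-running it does not append duplicate lines. The following is a minimal bash sketch, not part of the original procedure; the function name add_java_env is hypothetical, and it assumes the same variable layout and JDK path used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the JDK environment variables to a profile file
# only if that file does not already export JAVA_HOME, so re-running is harmless.
# Usage: add_java_env PROFILE_FILE JAVA_HOME_DIR
add_java_env() {
    local profile="$1" java_home="$2"
    # Skip when a JAVA_HOME export is already present.
    if [ -f "$profile" ] && grep -q '^export JAVA_HOME=' "$profile"; then
        echo "JAVA_HOME already set in $profile, nothing to do"
        return 0
    fi
    # Write the concrete JDK path first...
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$profile"
    # ...then the remaining lines; the quoted 'EOF' keeps ${...} unexpanded
    # so they are resolved when the profile is sourced, not now.
    cat >> "$profile" <<'EOF'
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
EOF
}
```

For example, as root: add_java_env /etc/profile /usr/local/java/jdk1.8.0_101, followed by source /etc/profile.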

Run date on every server to check whether their clocks are synchronized. If they are not, install ntp and synchronize with a public NTP pool:

date

yum -y install ntp

ntpdate -u cn.pool.ntp.org
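Comparing the output of date by eye on several servers is error-prone. The sketch below, a hypothetical bash helper named check_clock_skew, runs date +%s on each remote host over SSH and prints the difference from the local clock in seconds. It assumes SSH access to each host (without the password-free login set up in section 2.2 it will prompt for passwords) and ignores network latency, so differences of a second or so should be treated as noise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the clock difference (in whole seconds) between
# the local machine and each remote host, using `date +%s` over SSH.
# Network latency is not compensated for.
# Usage: check_clock_skew host1 host2 ...
check_clock_skew() {
    local host local_ts remote_ts diff
    for host in "$@"; do
        local_ts=$(date +%s)
        # If ssh fails (host down, auth error), report it and move on.
        remote_ts=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        diff=$(( remote_ts - local_ts ))
        # Report the absolute difference.
        if [ "$diff" -lt 0 ]; then
            diff=$(( -diff ))
        fi
        echo "$host: ${diff}s"
    done
}
```

For example: check_clock_skew 192.168.1.132 slave1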

2.2 Enable password-free SSH login between servers

Check whether the SSH server is installed:

rpm -qa | grep openssh

If openssh-server is not listed, install it (CentOS 7 uses yum); otherwise skip this step.

sudo yum install -y openssh-server

Check whether the SSH service is running:

systemctl status sshd

If it is not running, start it:

sudo systemctl start sshd

Generate an RSA key pair on host A:

ssh-keygen -t rsa

Press Enter three times to accept the defaults. A .ssh directory is created in the current user's home directory (for root, /root/.ssh) containing two files, id_rsa and id_rsa.pub.
id_rsa is the private key of this machine. It must stay on host A; when logging in over SSH, the client uses it to prove its identity to the remote host.
id_rsa.pub is the matching public key. SSH public-key authentication is asymmetric: a signature made with the private key can be verified with the public key, so the public key can be shared freely. It is placed in the ~/.ssh/authorized_keys file of every host you want to log in to without a password.

Copy the public key generated on host A to host B. There are two ways to do this.

Method 1: run ssh-copy-id on host A. It appends host A's public key to ~/.ssh/authorized_keys on host B (you are asked for host B's password once):
ssh-copy-id username@hostB_ip

Method 2: copy the key manually. First run the following on host A to copy the public key to the home directory of host B:
scp ~/.ssh/id_rsa.pub username@hostB_ip:~/
Then switch to host B and append the key to the authorized keys file, creating the .ssh directory with the permissions sshd requires if it does not exist yet:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
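The manual steps on host B can be wrapped so that running them twice does not add the same key twice. This is a minimal bash sketch with a hypothetical function name, install_pubkey; it assumes the key was copied to ~/id_rsa.pub as above and defaults to the current user's ~/.ssh directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper for host B: append a copied public key to
# authorized_keys unless an identical line is already there, and apply the
# permissions sshd expects (700 on the .ssh directory, 600 on authorized_keys).
# Usage: install_pubkey PUBKEY_FILE [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
install_pubkey() {
    local pubkey="$1"
    local ssh_dir="${2:-$HOME/.ssh}"
    local auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth"
    chmod 600 "$auth"
    # -f: read patterns from the key file, -F: fixed strings, -x: whole-line match
    if grep -qxFf "$pubkey" "$auth"; then
        echo "key already present"
    else
        cat "$pubkey" >> "$auth"
        echo "key added"
    fi
}
```

On host B, for example: install_pubkey ~/id_rsa.pub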

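Test password-free login

Optionally restart the SSH service on host B with systemctl restart sshd. Then run the following on host A; it should log in to host B without prompting for a password:
ssh username@hostB_ip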
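To check password-free login from a script without risking an interactive password prompt, ssh's BatchMode option makes the connection fail instead of asking for a password. The helper below is a hypothetical bash sketch (check_passwordless is not a standard command); it runs the trivial remote command true and reports the result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if key-based SSH login to the target works.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# and ConnectTimeout keeps the check from hanging on an unreachable host.
# Usage: check_passwordless user@host
check_passwordless() {
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true; then
        echo "$1: password-free login OK"
    else
        echo "$1: password-free login FAILED" >&2
        return 1
    fi
}
```

Usage: check_passwordless username@hostB_ip. With BatchMode enabled, ssh also refuses to answer the first-connection host key question, so connect to host B once interactively before relying on this check.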

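2.3 Check whether a port is already in use

netstat is provided by the net-tools package; on a minimal CentOS 7 installation, install it first with yum install -y net-tools.

netstat -ntlp    # list all listening TCP ports

netstat -ntulp | grep ':80 '    # show what is using port 80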
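Note that a plain grep 80 also matches lines for ports such as 8080 or 18080. The sketch below is a hypothetical bash/awk helper, port_listening, that reads netstat -ntlp output and compares only the port part of the Local Address column, so the match is exact. It assumes the usual Linux netstat layout, in which the fourth column is the local address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the given TCP port appears as a local
# listening port in `netstat -ntlp` output read from stdin, else exit 1.
# Usage: netstat -ntlp | port_listening 8080
port_listening() {
    local port="$1"
    awk -v p="$port" '
        # Only tcp/tcp6 rows; header lines start with other words.
        $1 ~ /^tcp/ {
            # Local Address looks like 0.0.0.0:8080 or :::8080;
            # the text after the last colon is the port.
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: if netstat -ntlp | port_listening 8080; then echo "8080 is in use"; fi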

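jps lists the running Java processes. If the jps command is not found, install the OpenJDK development package, which provides it:

yum install -y java-1.8.0-openjdk-devel.x86_64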

2.4 Install Kettle on Linux

Download Kettle from the official website, upload the archive to the server, and extract it there; this creates the data-integration directory.

unzip xxxx

2.5 Test whether Kettle is installed correctly

cd data-integration

./kitchen.sh

The following warning may be printed:

#######################################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
    Consider installing the package with apt-get or yum.
    e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################

To install the missing library on CentOS 7, download and install the webkitgtk RPM:

wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/matthewdva:/build:/EPEL:/el7/RHEL_7/x86_64/webkitgtk-2.4.9-1.el7.x86_64.rpm

yum install webkitgtk-2.4.9-1.el7.x86_64.rpm

2.6 Kettle cluster environment configuration

Carte's configuration files are located in the pwd directory under the Kettle installation: data-integration/pwd

This directory contains six files:

One master server configuration file: carte-config-master-8080.xml

Four slave server configuration files:

carte-config-8081.xml, carte-config-8082.xml

carte-config-8083.xml, carte-config-8084.xml

One cluster account file, kettle.pwd, which holds the cluster username and password (the password can be changed)

The master server configuration (carte-config-master-8080.xml) is:

<slave_config>
  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8080</port>
    <master>Y</master>
  </slaveserver>
</slave_config>

name: the name of the Kettle master server

hostname: the host name or IP address of the master server

port: the port the master server listens on

master: whether this server is the master (Y) or not (N)

In a pseudo-distributed (single-machine) environment, this file can be used as-is without changes.

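The slave server configuration (carte-config-8081.xml) is:

The name, hostname and port inside <masters> must be exactly the same as in carte-config-master-8080.xml, and the username and password must match the cluster account (cluster/cluster by default). When the slaves run on other machines, use the master's real host name or IP address in both files instead of localhost.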

<slave_config>
  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>master</hostname>
      <port>8080</port>
      <username>cluster</username>
      <password>cluster</password>
      <master>Y</master>
    </slaveserver>
  </masters>

  <report_to_masters>Y</report_to_masters>

  <slaveserver>
    <name>slave1-8081</name>
    <hostname>slave1</hostname>
    <port>8081</port>
    <username>cluster</username>
    <password>cluster</password>
    <master>N</master>
  </slaveserver>
</slave_config>
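With four slaves, editing each file by hand is repetitive. The bash sketch below is a hypothetical generator, write_slave_config, that writes a file with the same structure as the slave configuration above. It assumes the master is named master1 on port 8080 and that the cluster account is cluster/cluster; adjust those values if yours differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write carte-config-<PORT>.xml for one slave server.
# The <masters> block is fixed to master1:8080 with cluster/cluster
# (assumed values); only the slave's name, host and port vary.
# Prints the path of the file it wrote.
# Usage: write_slave_config OUT_DIR SLAVE_NAME SLAVE_HOST SLAVE_PORT MASTER_HOST
write_slave_config() {
    local out_dir="$1" name="$2" slave_host="$3" port="$4" master_host="$5"
    local file="$out_dir/carte-config-$port.xml"
    # Unquoted EOF: the $variables below are expanded when the file is written.
    cat > "$file" <<EOF
<slave_config>
  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>$master_host</hostname>
      <port>8080</port>
      <username>cluster</username>
      <password>cluster</password>
      <master>Y</master>
    </slaveserver>
  </masters>

  <report_to_masters>Y</report_to_masters>

  <slaveserver>
    <name>$name</name>
    <hostname>$slave_host</hostname>
    <port>$port</port>
    <username>cluster</username>
    <password>cluster</password>
    <master>N</master>
  </slaveserver>
</slave_config>
EOF
    echo "$file"
}
```

For example, assuming slave hosts named slave1 to slave4 and a master host named master, run from data-integration: for i in 1 2 3 4; do write_slave_config pwd "slave$i-808$i" "slave$i" "808$i" master; done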

Configure the remaining slave servers in the same way, then copy the modified pwd folder from the master server to each of the other servers, overwriting the pwd folder there.

Copy files to a target host:

scp -rp local_file_or_folder username@target_host_ip:target_folder

Copy files from a target host to the local machine:

scp -rp username@target_host_ip:remote_file_or_folder local_folder
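Copying the pwd folder to several servers one scp at a time can be done in a loop. The hypothetical bash helper distribute_pwd below copies a local pwd directory into the same parent directory on each listed host. It assumes Kettle is installed at the same path on every server and that password-free SSH is already set up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a local pwd directory to the same parent path on
# each listed host, overwriting the files there. Returns 1 if any copy failed.
# Usage: distribute_pwd PWD_DIR USER HOST [HOST...]
distribute_pwd() {
    local src="$1" user="$2"
    shift 2
    local dest host rc=0
    # e.g. /opt/Kettle/data-integration/pwd -> /opt/Kettle/data-integration
    dest=$(dirname "$src")
    for host in "$@"; do
        # -r: copy the directory recursively; -p: keep modification times and modes
        if ! scp -rp "$src" "$user@$host:$dest/"; then
            echo "copy to $host failed" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example: distribute_pwd /opt/Kettle/data-integration/pwd root slave1 slave2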

2.7 Start the Carte cluster services

Start the master server:

./carte.sh ip port

To keep the master server running in the background and write its log to a custom file:

nohup /opt/Kettle/data-integration/carte.sh 192.168.1.132 9090 > /opt/Kettle/data-integration/logs/out.log 2>&1 &

Start a slave server:

./carte.sh pwd/carte-config-8081.xml

To keep the slave server running in the background and write its log to a custom file:

nohup /opt/Kettle/data-integration/carte.sh /opt/Kettle/data-integration/pwd/carte-config-9091.xml > /opt/Kettle/data-integration/logs/out.log 2>&1 &
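When the master and one or more slaves run on the same machine, pointing them all at out.log with > means each newly started instance truncates the log of the previous one. A hypothetical bash wrapper, start_carte, that gives every instance its own log file is sketched below; the log file names in the usage example are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one Carte instance (or any command) in the
# background with nohup, sending stdout and stderr to its own log file.
# Usage: start_carte LOG_FILE COMMAND [ARGS...]
start_carte() {
    local log="$1"
    shift
    # Make sure the log directory exists before redirecting into it.
    mkdir -p "$(dirname "$log")"
    nohup "$@" > "$log" 2>&1 &
    # $! is the PID of the background job just started.
    echo "started PID $! (log: $log)"
}
```

For example:
start_carte /opt/Kettle/data-integration/logs/master-9090.log /opt/Kettle/data-integration/carte.sh 192.168.1.132 9090
start_carte /opt/Kettle/data-integration/logs/slave-9091.log /opt/Kettle/data-integration/carte.sh /opt/Kettle/data-integration/pwd/carte-config-9091.xml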

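Open http://ip:port in a browser to check that the server is running; the status page is at http://ip:port/kettle/status.

The default username and password are both cluster.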
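A Carte server can also be checked from the command line, which is handy on servers without a browser. The hypothetical bash helper carte_status below uses curl with HTTP basic authentication against /kettle/status/ and treats HTTP 200 as healthy. It assumes the default cluster/cluster account unless other credentials are passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query a Carte status page and report whether it
# answered with HTTP 200. curl prints 000 as the code if it cannot connect.
# Usage: carte_status HOST PORT [USER] [PASSWORD]
carte_status() {
    local host="$1" port="$2" user="${3:-cluster}" pass="${4:-cluster}"
    local code
    # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status
    code=$(curl -s --max-time 5 -o /dev/null -w '%{http_code}' \
        -u "$user:$pass" "http://$host:$port/kettle/status/")
    if [ "$code" = "200" ]; then
        echo "$host:$port is up"
    else
        echo "$host:$port returned HTTP ${code:-none}" >&2
        return 1
    fi
}
```

For example: carte_status 192.168.1.132 9090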

2.8 Configure the cluster in the Kettle graphical interface (Spoon)

  1. Open the Kettle graphical interface (Spoon) locally and create a new test transformation.

  2. In the main object tree, under the transformation, right-click “Slave server”, select “New”, and fill in the master and slave server information in the dialog box that appears.

  3. In the main object tree, right-click “Kettle cluster schemas”, select “New”, fill in the cluster schema information in the dialog box, then click “Select slave servers”, add the slave servers you just created, and confirm.

  4. Right-click the output step, select “Clustering...”, choose the cluster schema you created, and click “OK”.

  5. Right-click “Run configurations”, select “New”, and fill in the required parameters.

  6. Click “Run”, select the “test cluster” run configuration you created, and click “Start”.