1 System environment
JDK 1.8
Kettle 8
CentOS 7
2 Configuration process
2.1 Configure jdk
Check if there is a java environment
java -version
If Java is not installed, run the following commands; otherwise skip this step. The JDK archive is assumed to have been uploaded to /usr/local/java.
mkdir /usr/local/java
cd /usr/local/java
tar -zxvf jdk-8u101-linux-x64.tar.gz
vim /etc/profile
Add the following lines to the end of /etc/profile:
export JAVA_HOME=/usr/local/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
Then reload the profile and verify the installation:
source /etc/profile
java -version
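The manual edit of /etc/profile can also be scripted. The following is a minimal sketch, not part of the original procedure: the function names java_profile_snippet and add_java_to_profile are hypothetical helpers, and the check for an existing JAVA_HOME line is an assumption about how you want to avoid adding the variables twice.

```shell
#!/bin/sh
# Sketch: add the JDK variables to a profile file only once.
# The guide uses /etc/profile and /usr/local/java/jdk1.8.0_101;
# both are passed in as arguments here.

# Print the export lines for the JDK directory given as $1.
java_profile_snippet() {
    cat <<EOF
export JAVA_HOME=$1
export JRE_HOME=\${JAVA_HOME}/jre
export CLASSPATH=.:\${JAVA_HOME}/lib:\${JRE_HOME}/lib:\$CLASSPATH
export JAVA_PATH=\${JAVA_HOME}/bin:\${JRE_HOME}/bin
export PATH=\$PATH:\${JAVA_PATH}
EOF
}

# Append the snippet to the profile unless JAVA_HOME is already exported there.
add_java_to_profile() {
    profile=$1
    jdk_dir=$2
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        echo "JAVA_HOME already set in $profile, nothing to do"
    else
        java_profile_snippet "$jdk_dir" >> "$profile"
        echo "Added JDK variables to $profile"
    fi
}

# Example (run as root):
#   add_java_to_profile /etc/profile /usr/local/java/jdk1.8.0_101
```

Running the example twice leaves a single copy of the variables in the file, which keeps PATH from growing on every rerun.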
Run date on each server to check whether the clocks are synchronized. If they are not, install ntp and synchronize against an NTP server:
date
yum -y install ntp
ntpdate -u cn.pool.ntp.org
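Comparing the output of date by eye works for two servers, but a small script is easier with more. The sketch below is an illustration only: clock_skew and check_clock_skew are hypothetical helper names, the remote host argument is whatever user@host you can reach over ssh, and the measured difference includes the ssh round-trip time, so it is only a rough check.

```shell
#!/bin/sh
# Absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Compare the local clock with a remote host and fail if the
# difference exceeds max_skew seconds (default 2, an arbitrary choice).
check_clock_skew() {
    remote=$1
    max_skew=${2:-2}
    local_ts=$(date +%s)
    remote_ts=$(ssh "$remote" date +%s) || return 2
    skew=$(clock_skew "$local_ts" "$remote_ts")
    echo "clock skew with $remote: ${skew}s"
    [ "$skew" -le "$max_skew" ]
}

# Example:
#   check_clock_skew root@192.168.1.132 || ntpdate -u cn.pool.ntp.org
```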
2.2 Enable passwordless SSH login between servers
Check if ssh service is installed
ssh
If it is not installed, run the following command (CentOS 7 uses yum rather than apt-get); otherwise skip this step.
sudo yum -y install openssh-server
Check whether the ssh service is started
service --status-all | grep ssh
Start the ssh service
sudo service sshd start
Generate rsa key pair on host A
ssh-keygen -t rsa
Press Enter three times to accept the defaults. A “.ssh” folder is then created in the current user's home directory, containing two files: id_rsa and id_rsa.pub.
id_rsa: the private key of the local machine. It must stay on this host. During login, the ssh client uses it to prove the host's identity by signing session data, which the server then verifies.
id_rsa.pub: the public key of the local machine. SSH key authentication relies on asymmetric cryptography (a signature made with the private key can be verified with the matching public key), so the public key is placed in the authorized_keys file on every host that this machine needs to log in to remotely.
Transfer the public key generated on host A to host B
There are two ways to do this. In both, username is the account on host B.
The first is to run ssh-copy-id on host A, which transfers the public key generated on host A to host B:
ssh-copy-id username@hostB_ip
The second is to copy the key manually. First run the following command on host A:
scp ~/.ssh/id_rsa.pub username@hostB_ip:~/
After the command succeeds, the public key of host A is in the home directory on host B. Switch to host B and run:
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
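When the key is appended by hand on host B, two details are easy to miss: appending the same key more than once, and leaving permissions too open, in which case sshd (with its default StrictModes setting) can ignore the file. The following sketch covers both. The function name authorize_key and its arguments are hypothetical, not part of any ssh tool.

```shell
#!/bin/sh
# Append a public key to authorized_keys once, with restrictive permissions.
#   $1: path to the public key file copied from host A
#   $2: ssh directory (defaults to ~/.ssh)
authorize_key() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth_file=$ssh_dir/authorized_keys

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F: fixed strings, -x: whole-line match, -f: read the pattern from the key file.
    if grep -qxF -f "$pubkey_file" "$auth_file"; then
        echo "key already authorized"
    else
        cat "$pubkey_file" >> "$auth_file"
        echo "key added"
    fi
}

# Example (on host B, after the scp step):
#   authorize_key ~/id_rsa.pub
```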
Password-free login
Restart the ssh service on host A:
systemctl restart sshd
Then log in to host B, which should no longer ask for a password:
ssh username@ip
2.3 Check whether the port is occupied
netstat -ntlp               # list all listening TCP ports
netstat -ntulp | grep 80    # show listening TCP/UDP sockets matching 80
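Note that grep 80 also matches ports such as 8080 and 8081, and even process IDs containing 80. As a stricter alternative, the sketch below matches the port only at the end of the local address column. The function name port_in_use is a hypothetical helper, and it assumes the usual Linux netstat layout in which the local address is the fourth column.

```shell
#!/bin/sh
# Read netstat output on stdin and succeed if something listens on port $1.
port_in_use() {
    awk -v port="$1" '
        # Column 4 is the local address, e.g. 0.0.0.0:8080 or :::8080.
        $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example:
#   if netstat -ntl | port_in_use 8080; then
#       echo "port 8080 is already in use"
#   fi
```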
If the jps command is not found, install the OpenJDK development package:
yum install java-1.8.0-openjdk-devel.x86_64
2.4 Kettle installation under Linux
After downloading from the official website, transfer the compressed package to the server for decompression.
unzip xxxx
2.5 Test whether Kettle is installed successfully
cd data-integration
./kitchen.sh
If the following warning is displayed:
#######################################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
run the following commands to install the missing package:
wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/matthewdva:/build:/EPEL:/el7/RHEL_7/x86_64/webkitgtk-2.4.9-1.el7.x86_64.rpm
yum install webkitgtk-2.4.9-1.el7.x86_64.rpm
2.6 Kettle cluster environment configuration
Carte's configuration files are located in: /kettle/data-integration/pwd
There are 6 files in this directory
A master server configuration file carte-config-master-8080.xml
Four slave server configuration files
carte-config-8081.xml, carte-config-8082.xml
carte-config-8083.xml, carte-config-8084.xml
A cluster account password file kettle.pwd (the password can be modified)
The configuration content of the master server (carte-config-master-8080.xml) is:
<slaveserver>
  <name>master1</name>
  <hostname>localhost</hostname>
  <port>8080</port>
  <master>Y</master>
</slaveserver>
name: the name of the Kettle master server
hostname: the IP address of the Kettle master server
port: the port number of the Kettle master server
master: whether this server is the master server
This file does not need to be configured in a pseudo-distributed environment.
The configuration content of a slave server (carte-config-8081.xml) is shown below. The name, hostname, and port inside masters must be exactly the same as those in carte-config-master-8080.xml.
<masters>
  <slaveserver>
    <name>master1</name>
    <hostname>master</hostname>
    <port>8080</port>
    <username>cluster</username>
    <password>cluster</password>
    <master>Y</master>
  </slaveserver>
</masters>
<report_to_masters>Y</report_to_masters>
<slaveserver>
  <name>slave1-8081</name>
  <hostname>slave1</hostname>
  <port>8082</port>
  <username>cluster</username>
  <password>cluster</password>
  <master>N</master>
</slaveserver>
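Since the four slave files differ only in a few values, they can be generated from one template. The following is a rough sketch: the function name carte_slave_config is hypothetical, the master name master1 and the cluster/cluster credentials are taken from the example above, and the slave_config root element is an assumption based on the sample files shipped with Kettle, so compare the output with your own files before using it.

```shell
#!/bin/sh
# Print a slave server configuration.
#   $1 master hostname   $2 master port
#   $3 slave name        $4 slave hostname   $5 slave port
carte_slave_config() {
    master_host=$1
    master_port=$2
    slave_name=$3
    slave_host=$4
    slave_port=$5
    cat <<EOF
<slave_config>
  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>$master_host</hostname>
      <port>$master_port</port>
      <username>cluster</username>
      <password>cluster</password>
      <master>Y</master>
    </slaveserver>
  </masters>
  <report_to_masters>Y</report_to_masters>
  <slaveserver>
    <name>$slave_name</name>
    <hostname>$slave_host</hostname>
    <port>$slave_port</port>
    <username>cluster</username>
    <password>cluster</password>
    <master>N</master>
  </slaveserver>
</slave_config>
EOF
}

# Example:
#   carte_slave_config master 8080 slave1-8081 slave1 8081 > pwd/carte-config-8081.xml
```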
After making the changes above, copy the modified pwd folder from the master server to each of the other servers, overwriting the corresponding folder there.
Copy files to a target host:
scp -rp file_to_copy username@target_host_ip:target_host_folder
Copy files from a target host to the local machine:
scp -rp username@target_host_ip:target_host_file local_folder
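With several slave servers, the copy step is repeated once per host. The sketch below only prints the scp commands so they can be reviewed first. The function name pwd_copy_commands and the host names in the example are hypothetical, and it assumes Kettle is installed under the same path on every server.

```shell
#!/bin/sh
# Print one scp command per target host.
#   $1: local pwd folder   $2: remote username   remaining args: target hosts
pwd_copy_commands() {
    src_dir=$1
    user=$2
    shift 2
    for host in "$@"; do
        # Copy into the parent directory so the remote pwd folder is overwritten.
        echo "scp -rp $src_dir $user@$host:$(dirname "$src_dir")/"
    done
}

# Example (review the output, then pipe it to sh to run it):
#   pwd_copy_commands /opt/Kettle/data-integration/pwd root slave1 slave2
#   pwd_copy_commands /opt/Kettle/data-integration/pwd root slave1 slave2 | sh
```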
2.7 Start the Carte service on the cluster
Start the master server
./carte.sh ip port
To keep the master server running in the background and write its log to a custom file:
nohup /opt/Kettle/data-integration/carte.sh 192.168.1.132 9090 > /opt/Kettle/data-integration/logs/out.log 2>&1 &
Start the slave server
./carte.sh pwd/carte-config-8081.xml
To keep the slave server running in the background and write its log to a custom file:
nohup /opt/Kettle/data-integration/carte.sh /opt/Kettle/data-integration/pwd/carte-config-9091.xml > /opt/Kettle/data-integration/logs/out.log 2>&1 &
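The nohup pattern is the same for the master and the slaves, so it can be wrapped in a small helper that also creates the log directory and reports the process ID. This is only a sketch: start_in_background is a hypothetical name, and the paths in the example are the ones used in this guide.

```shell
#!/bin/sh
# Run a command in the background, immune to hangups, with stdout and
# stderr written to a log file. Prints the PID of the started process.
#   $1: log file   remaining args: command and its arguments
start_in_background() {
    log_file=$1
    shift
    mkdir -p "$(dirname "$log_file")"
    nohup "$@" > "$log_file" 2>&1 &
    echo "$!"
}

# Example:
#   start_in_background /opt/Kettle/data-integration/logs/master.log \
#       /opt/Kettle/data-integration/carte.sh 192.168.1.132 9090
#   start_in_background /opt/Kettle/data-integration/logs/slave-8081.log \
#       /opt/Kettle/data-integration/carte.sh /opt/Kettle/data-integration/pwd/carte-config-8081.xml
```

Using one log file per server, as in the example, avoids the master and a slave on the same machine overwriting each other's out.log.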
Open http://ip:port in a browser to check that the server is running.
The default username and password are both cluster.
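The same check can be done from the command line, which is convenient on servers without a browser. The sketch below assumes that the Carte status page is served at /kettle/status/ and that the default cluster/cluster account is still in place. The function names carte_status_url and carte_is_up are hypothetical.

```shell
#!/bin/sh
# Build the status page URL for a Carte server.
carte_status_url() {
    echo "http://$1:$2/kettle/status/"
}

# Succeed if the Carte server answers the status request.
#   $1 host   $2 port   $3 username (default cluster)   $4 password (default cluster)
carte_is_up() {
    host=$1
    port=$2
    user=${3:-cluster}
    pass=${4:-cluster}
    # -s: silent, -f: fail on HTTP errors, -o: discard the body,
    # --max-time: give up after 5 seconds, -u: HTTP basic authentication.
    curl -s -f -o /dev/null --max-time 5 -u "$user:$pass" \
        "$(carte_status_url "$host" "$port")"
}

# Example:
#   carte_is_up 192.168.1.132 9090 && echo "master is up"
```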
2.8 Configure the cluster in the kettle graphical interface
- Open the Kettle graphical interface (Spoon) locally and create a new test transformation.
- Under “Transformation”, select “Slave server”, right-click and choose “New”, and fill in the master and slave server information in the dialog box that opens.
- In the main object tree, select “Kettle cluster schemas”, right-click and choose “New”, fill in the cluster information in the dialog box that opens, then click “Select slave servers”, add the slave servers you just created, and confirm.
- Right-click the output step and select “Cluster”, select the cluster schema you created, and click “OK”.
- In “Run Configurations”, right-click and choose “New”, and fill in the run configuration parameters.
- Click “Run”, select the “test cluster” run configuration you created, and finally click “Start”.