Install and deploy Kettle 8.2 on Ubuntu 22

Prerequisites

Kettle is an open-source ETL tool written in pure Java. Both Kettle 7 and Kettle 8 need Java 8 or above to run properly, so before running Kettle, check that the Java environment is configured correctly and that the Java version is 8 or above.
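The version check can be scripted. The sketch below is one possible way to do it, not an official Kettle tool: the helper name java_major is made up for illustration, and it assumes the usual version string formats ("1.8.0_292" for Java 8, "11.0.2" or "17" for later releases).

```shell
# Hypothetical helper: print the major Java version for a version string.
# "1.8.0_292" gives 8, "11.0.2" gives 11, "17" gives 17.
java_major() {
    local v="$1"
    v="${v#1.}"        # drop the legacy "1." prefix used up to Java 8
    v="${v%%[._-]*}"   # keep everything before the first ".", "_" or "-"
    printf '%s\n' "$v"
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, e.g.: openjdk version "1.8.0_292"
    ver="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    major="$(java_major "$ver")"
    if [ "${major:-0}" -ge 8 ]; then
        echo "Java $ver found, new enough for Kettle"
    else
        echo "Java $ver found, but Kettle needs Java 8 or above"
    fi
else
    echo "java not found on PATH, install a JRE/JDK first"
fi
```

If the script reports that java is missing even though it is installed, the PATH or JAVA_HOME setting is the first thing to check.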

Kettle installation

1. Create the kettle directory, extract the Kettle zip package, and move the extracted data-integration directory into the kettle directory

mkdir -p kettle

sudo unzip pdi-ce-8.2.0.0-342.zip

sudo mv data-integration/ ./kettle/

2. Check that the .sh scripts in the data-integration directory have execute permission. If they do not, add it
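The check and the fix can be done in one pass. This is a minimal sketch: the function name ensure_exec and the PDI_DIR variable are illustrative, and the default path is only an assumption taken from the cron script later in this article.

```shell
# Add the execute bit to every .sh file in a directory that lacks it.
# Usage: ensure_exec /path/to/data-integration
ensure_exec() {
    local dir="$1" f
    for f in "$dir"/*.sh; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if [ ! -x "$f" ]; then
            chmod +x "$f" && echo "made executable: $f"
        fi
    done
}

# PDI_DIR is a hypothetical variable; point it at your own install directory.
ensure_exec "${PDI_DIR:-/opt/kettle/data-integration}"
```

If the files were extracted with sudo, the chmod call needs root as well, so run the snippet from a root shell in that case.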

3. Execute the kitchen.sh script

If a warning appears after execution, install the missing package as the prompt suggests; otherwise some features may not be available. This mainly affects Spoon, so the warning can be ignored in an environment without a graphical interface.

The following are the detailed steps for the warning that the libwebkitgtk-1.0-0 package needs to be installed.

sudo vim /etc/apt/sources.list

Add the following line at the end of the file:

deb http://cz.archive.ubuntu.com/ubuntu bionic main universe

Then run

sudo apt-get update

While updating the sources, Ubuntu may report "There is no digital signature. This source cannot be safely used for updates, so the source is disabled by default." The reason is that apt-get update has no public key for the new source and therefore cannot verify its signature.


Solution

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 3B4FE6ACC0B21F32

3B4FE6ACC0B21F32 is the missing key here. Import whichever key ID the error message reports as missing.
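When several keys are missing, the IDs can be pulled out of the apt-get output instead of being copied by hand. The sketch below assumes the error lines contain the usual "NO_PUBKEY <id>" marker; the function names are made up for illustration. Note that apt-key is deprecated on recent Ubuntu releases, although it is still shipped with 22.04.

```shell
# Read apt-get output on stdin and print each missing key ID once.
missing_keys() {
    grep -oE 'NO_PUBKEY [0-9A-F]+' | awk '{print $2}' | sort -u
}

# Import every key that "apt-get update" reports as missing.
# Only defined here; call import_missing_keys yourself when you are ready.
import_missing_keys() {
    local key
    sudo apt-get update 2>&1 | missing_keys | while read -r key; do
        # </dev/null keeps sudo/apt-key from swallowing the remaining key IDs
        sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key" </dev/null
    done
}
```

After the keys are imported, run sudo apt-get update once more to confirm the signature warning is gone.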

Install the package

sudo apt-get install libwebkitgtk-1.0-0

This can take a while, so please be patient.

Re-execute the kitchen.sh script

After the installation the warning is gone and the normal prompt appears, which indicates that Kettle can be used normally.

At the same time, a .kettle directory should now exist in the home directory.

On a machine with a graphical interface, Spoon can now be started:

./spoon.sh

Kettle transformation and job execution

In Kettle, the two tools pan and kitchen are used to run transformations and jobs, respectively.

When files are stored on disk rather than in a database repository, they can be organized as follows:
All transformation files are stored in /srv/kettle/transformation/
All job files are stored in /srv/kettle/jobs/
All log files are stored in /var/kettle/logs/

1. Use pan to run a transformation

pan syntax

./pan.sh -option=value arg1 arg2

e.g.

sudo ./pan.sh -file=/srv/kettle/transformation/EtltestTrans.ktr -level=Detailed > /var/kettle/logs/EtltestTrans.log &

2. Use kitchen to run a job

kitchen syntax

./kitchen.sh -option=value arg1 arg2

e.g.

sudo ./kitchen.sh -file=/srv/kettle/jobs/EtltestJob.kjb -level=Detailed > /var/kettle/logs/EtltestJob.log &

Common parameter list:
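As an illustration of the options used most often, the sketch below assembles a kitchen.sh command line and prints it instead of running it. Only -file, -level, -logfile and -param are covered. The helper name kitchen_cmd and the RUN_DATE parameter are hypothetical, so check the option list printed by your own kitchen.sh before relying on it.

```shell
# Options used below (pan.sh accepts the same ones):
#   -file=<path>     job (.kjb) or transformation (.ktr) file to run
#   -level=<level>   Nothing, Error, Minimal, Basic, Detailed, Debug or Rowlevel
#   -logfile=<path>  file to write the log output to
#   -param:NAME=VAL  value for a named parameter defined in the job
#
# Usage: kitchen_cmd <job file> <log level> <log file> [NAME=VALUE ...]
kitchen_cmd() {
    local job="$1" level="$2" logfile="$3" cmd p
    shift 3
    cmd="./kitchen.sh -file=$job -level=$level -logfile=$logfile"
    for p in "$@"; do
        cmd="$cmd -param:$p"
    done
    printf '%s\n' "$cmd"   # dry run: print the command, do not execute it
}

# RUN_DATE is a made-up parameter name used only for this example
kitchen_cmd /srv/kettle/jobs/EtltestJob.kjb Detailed /var/kettle/logs/EtltestJob.log RUN_DATE=2024-01-01
```

Printing the command first makes it easy to review the arguments before pasting them into a script or a crontab entry.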

Kettle server-side deployment

1. Run tasks on a schedule through the Start component

In Kettle, scheduled tasks can be set up through the Start component of a job. This method is not recommended because the job permanently occupies a process and is prone to running out of memory.

2. Execute Kettle tasks through crontab

In Linux, crontab is used to submit and manage tasks that a user runs periodically.
For example, with the following file locations:
All transformation files are stored in /srv/kettle/transformation/
All job files are stored in /srv/kettle/jobs/
All log files are stored in /var/kettle/logs/
All execution scripts are stored in /srv/kettle/script/
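The directories have to exist before the first run. A minimal sketch for creating them is shown below; the function name make_kettle_dirs and its optional prefix argument are illustrative additions, and the transformation directory is spelled as in the pan example earlier.

```shell
# Create the four directories under an optional prefix.
# An empty prefix means the real locations, which normally requires root.
make_kettle_dirs() {
    local prefix="${1:-}"
    mkdir -p \
        "$prefix/srv/kettle/transformation" \
        "$prefix/srv/kettle/jobs" \
        "$prefix/srv/kettle/script" \
        "$prefix/var/kettle/logs"
}

# Dry run in a scratch directory to inspect the resulting tree
scratch="$(mktemp -d)"
make_kettle_dirs "$scratch"
find "$scratch" -type d | sort
```

For the real paths, call make_kettle_dirs without an argument from a root shell, then hand the directories to the user that will run the cron job.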

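1) First, create a script. Because cron runs the task with a minimal environment, the script has to set up the Java configuration again.

#!/bin/bash
# Java environment; JAVA_HOME=/opt/java is an assumed location, adjust it to your install
export JAVA_HOME=/opt/java
export JRE_HOME=/opt/java/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

# change to the Kettle working directory
cd /opt/kettle/data-integration/

# run the job and write its output to the log file
./kitchen.sh -file=/srv/kettle/jobs/EtltestJob.kjb -level=Detailed > /var/kettle/logs/EtltestJob.log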
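The redirect in the script above overwrites the log on every run and does not record whether the job succeeded. One possible refinement, sketched here with a made-up helper name run_logged, appends to the log and writes the exit status next to a timestamp. It relies only on kitchen.sh returning a non-zero status on failure.

```shell
# Run a command, append its output to a log file, and record the exit status.
# Usage: run_logged <logfile> <command> [args...]
run_logged() {
    local log="$1" status
    shift
    echo "[$(date '+%F %T')] start: $*" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "[$(date '+%F %T')] exit status: $status" >> "$log"
    return "$status"
}
```

In the cron script, the last line would then read: run_logged /var/kettle/logs/EtltestJob.log ./kitchen.sh -file=/srv/kettle/jobs/EtltestJob.kjb -level=Detailed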

2) In a terminal, type "crontab -e" to open the scheduled task file and add the task.

# m h dom mon dow command
0 2 * * * /srv/kettle/script
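Note that the command field has to be the path of an executable script file, not only the script directory. The sketch below assumes the script from step 1) was saved under the hypothetical name /srv/kettle/script/etltest_job.sh and shows one way to add the entry without opening an editor and without creating duplicates. The function names are made up for illustration.

```shell
# Print the crontab read from stdin with one line added,
# skipping the addition when the exact line is already present.
add_cron_line() {
    local line="$1" current
    current="$(cat)"
    if printf '%s\n' "$current" | grep -qxF -- "$line"; then
        printf '%s\n' "$current"
    elif [ -n "$current" ]; then
        printf '%s\n%s\n' "$current" "$line"
    else
        printf '%s\n' "$line"
    fi
}

# Install the line into the current user's crontab.
# Only defined here; call it yourself, for example:
#   install_cron_line '0 2 * * * /srv/kettle/script/etltest_job.sh'
install_cron_line() {
    crontab -l 2>/dev/null | add_cron_line "$1" | crontab -
}
```

The schedule 0 2 * * * runs the script every day at 02:00.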

3) Restart cron and view the tasks

sudo service cron restart

crontab -l

3. Schedule Kettle remotely through Carte

Kettle can be deployed in several modes, and the pan/kitchen approach above is the most basic one. However, it makes monitoring, scheduling and resource allocation difficult. Kettle itself ships a web service, Carte, for this purpose. Carte accepts remote HTTP requests to monitor, start, and stop the jobs and transformations running on the Carte service. The general process of deploying and using Carte is as follows:

1) Modify the XML configuration file

vim carte-config-master-8080.xml

According to kettle.pwd, the default username and password are both cluster (if you are not sure, you can set the username and password explicitly through the corresponding node). If you want to change the password, you can configure it in the configuration file.
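For orientation, the sketch below writes a minimal master configuration. The element names should match the sample carte-config-master-8080.xml shipped in the pwd/ directory, but compare them with your own copy first. The helper name write_carte_config, the server name master1 and the output path in /tmp are assumptions made for this example.

```shell
# Write a minimal Carte master configuration.
# Usage: write_carte_config <file> <hostname> <port>
write_carte_config() {
    local file="$1" host="$2" port="$3"
    cat > "$file" <<EOF
<slave_config>
  <slaveserver>
    <name>master1</name>
    <hostname>$host</hostname>
    <port>$port</port>
    <master>Y</master>
  </slaveserver>
</slave_config>
EOF
}

# Write to a scratch location first and review the result
write_carte_config /tmp/carte-config-master-8080.xml localhost 8080
cat /tmp/carte-config-master-8080.xml
```

With hostname set to localhost, Carte is typically reachable only from the same machine; use the server's address instead if Spoon connects from another host.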

2) Start Carte

When starting Carte, pass the configuration file you just edited

nohup ./carte.sh pwd/carte-config-master-8080.xml &

After startup completes, you can access Carte. The interface is very simple.
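Besides the browser, the status page can be queried from the command line, which is handy on a server without a desktop. This is a sketch under a few assumptions: Carte listens on localhost:8080, the default cluster/cluster credentials are still in place, and curl is installed. The function names are made up for illustration.

```shell
# Build the URL of Carte's status page; ?xml=Y asks for XML instead of HTML.
carte_status_url() {
    local host="$1" port="$2"
    printf 'http://%s:%s/kettle/status/?xml=Y\n' "$host" "$port"
}

# Fetch the status with HTTP basic authentication.
# Only defined here; call it once Carte is running, for example:
#   carte_status localhost 8080
carte_status() {
    local host="$1" port="$2" user="${3:-cluster}" pass="${4:-cluster}"
    curl --silent --fail --user "$user:$pass" "$(carte_status_url "$host" "$port")"
}

carte_status_url localhost 8080
```

A non-zero exit status from carte_status means that Carte is not reachable or that the credentials were rejected.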

3) Configure the slave server

The Carte service is now running. Next, connect Spoon to Carte by adding a slave server in the tree on the left.

4) Create a new run configuration and select the slave server in its settings

5) Submit the task