ETL tool Kettle 8.2: integrating with Spring Boot for data extraction

1. Add the dependencies to the pom; the project mainly relies on the following jars. 2. Name the storage path of the created ETL diagram. Write the code: import org.pentaho.di.core.database.DatabaseMeta; import org.pentaho.di.core.exception.KettleException; import org.pentaho.di.core.logging.LogWriter; import org.pentaho.di.core.util.EnvUtil; import org.pentaho.di.job.Job; import org.pentaho.di.job.JobEntryLoader; import org.pentaho.di.job.JobMeta; import org.pentaho.di.repository.Repository; import org.pentaho.di.repository.RepositoryDirectory; import org.pentaho.di.repository.RepositoryMeta; import org.pentaho.di.repository.UserInfo; import org.pentaho.di.trans.StepLoader; import org.pentaho.di.trans.Trans; import org.pentaho.di.trans.TransMeta; /** * Java calls […]

Using the ETL engine Kettle with Spring Boot

Introduction: ETL stands for Extract-Transform-Load and describes the process of extracting data from a source, transforming it, and loading it into a destination. Data is extracted from a variety of distributed, heterogeneous sources (such as relational data), and "dirty" data such as incomplete, duplicate, and erroneous records is cleaned […]
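To make the three stages concrete, here is a minimal sketch in plain Java. It does not use Kettle; the class name EtlSketch, the two-column row layout (id, city), and the in-memory source and target are all assumptions made purely for illustration. It only shows the idea of extracting rows, dropping incomplete and duplicate ones, and loading what remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EtlSketch {

    // Extract: a stand-in for reading rows from a relational source.
    // Each row is (id, city); this layout is hypothetical.
    public static List<List<String>> extract() {
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("1", "Beijing"));
        rows.add(Arrays.asList("1", "Beijing"));    // duplicate row
        rows.add(Arrays.asList("2", null));         // incomplete row
        rows.add(Arrays.asList("3", " Shanghai ")); // needs trimming
        return rows;
    }

    // Transform: skip incomplete rows, trim values, remove duplicates
    // while keeping the original order.
    public static List<List<String>> transform(List<List<String>> rows) {
        Set<List<String>> cleaned = new LinkedHashSet<>();
        for (List<String> row : rows) {
            String id = row.get(0);
            String city = row.get(1);
            if (id == null || city == null || city.trim().isEmpty()) {
                continue; // incomplete data is dropped
            }
            cleaned.add(Arrays.asList(id.trim(), city.trim()));
        }
        return new ArrayList<>(cleaned);
    }

    // Load: a stand-in for writing to the destination table.
    public static void load(List<List<String>> rows, List<List<String>> target) {
        target.addAll(rows);
    }

    public static void main(String[] args) {
        List<List<String>> warehouse = new ArrayList<>();
        load(transform(extract()), warehouse);
        System.out.println(warehouse);
    }
}
```

In a real Kettle transformation each of these methods would correspond to one or more steps rather than hand-written code.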

[ETL in practice] Reading large table data in batches with Kettle

Foreword: while building the environment on a virtual machine, I found that Kettle was taking up a lot of memory. After checking the server resources, the culprit turned out to be the polkitd process. There is no good solution for the time being. At present, I just read a large table through a filter. Before that, it […]

Python (1): using Kettle to call a Python script that splits addresses into province and city and writes the results to the database

1. Install Python. Use Python 3, downloaded from the official website. Default installation path: C:\Users\Administrator\AppData\Local\Programs\Python\Python310 2. Install the packages required by Python. 1. Install pip. 1.1 Method 1: 1. Enter the Scripts folder in the directory where python.exe is located; mine is at C:\Users\Administrator\AppData\Local\Programs\Python\Python310. Double-click pip to install it; pip3 also needs to be installed. 1.2 […]

Java calls Kettle with a ktr/xml file

Hello everyone. I recently ran into a rather difficult problem: implementing update-or-insert, meaning that a single SQL execution either updates an existing row or inserts a new one. SQL syntax does offer this through ON DUPLICATE KEY UPDATE, but that requires a primary key or unique index, which is too limiting. My business logic is relatively complicated, so […]
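The excerpt is cut off before the author's solution, so the following is not that solution. It is only a minimal sketch of the general "update first, insert if nothing matched" pattern, where the match columns are arbitrary business columns rather than a primary key or unique index. The class UpsertSketch, the in-memory list standing in for a table, and the column names used in main are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class UpsertSketch {

    // Updates every row in `table` whose match columns equal those of `row`.
    // If no row matched, a copy of `row` is inserted instead.
    // Returns true when an insert happened, false when rows were updated.
    public static boolean upsert(List<Map<String, Object>> table,
                                 Map<String, Object> row,
                                 String... matchColumns) {
        boolean updated = false;
        for (Map<String, Object> existing : table) {
            boolean same = true;
            for (String col : matchColumns) {
                if (!Objects.equals(existing.get(col), row.get(col))) {
                    same = false;
                    break;
                }
            }
            if (same) {
                existing.putAll(row); // overwrite with the new values
                updated = true;
            }
        }
        if (!updated) {
            table.add(new HashMap<>(row));
        }
        return !updated;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> table = new ArrayList<>();

        Map<String, Object> first = new HashMap<>();
        first.put("name", "widget");
        first.put("region", "north");
        first.put("qty", 1);
        upsert(table, first, "name", "region");   // no match yet: inserts

        Map<String, Object> second = new HashMap<>(first);
        second.put("qty", 5);
        upsert(table, second, "name", "region");  // match found: updates qty

        System.out.println(table);
    }
}
```

Against a real database the same idea is usually an UPDATE followed by an INSERT when the update count is zero, run inside one transaction so that concurrent writers cannot slip in between the two statements.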

Error when running Kettle in a Linux environment

Background: The transformation runs normally in the Windows environment, but when the same transformation file is uploaded to Linux, an error is reported. 1. Kettle client running interface 2. The cmd command line runs normally in the Windows environment. Run the following under the cmd command line: d: cd D:\tools\pdi-ce-9.1.0.0-324\data-integration D:\tools\pdi-ce-9.1.0.0-324\data-integration>pan.bat /file:D:\tools\pdi-ce-9.1.0.0-324\data-integration\samples\jobs\insert_103_table_output.ktr /level:Basic > D:\tools\pdi-ce-9.1.0.0-324\data-integration\logs\insert_103_table_output_01.log […]

Parsing JSON data with Kettle

I needed to parse some JSON data and happened to have Kettle at hand, so I tried parsing it with Kettle. 1. JSON data collection. On acquiring the JSON: Kettle can fetch JSON data from a given URL through the REST Client, but I am not sure whether it is my Kettle version […]

ETL tool – Calling Kettle transformations and job scripts from Java

1. Calling a Kettle transformation from Java. Before writing the Java program, use Spoon to design the transformation; here, pulling the CSDN article list and storing it in a txt file serves as the example. The interface to pull is https://blog.csdn.net/community/home-api/v1/get-business-list?page=1&size=20&businessType=blog&orderby=&noMore=false&year=&month=&username=qq_43692950 The return format is as follows: […]

ETL tool – Kettle case study: pulling network list data

1. Kettle in practice. The previous article introduced Kettle's query, join, statistics, and script operators, so you should now have a working understanding of most operators in Kettle. Next, as a practical case, we pull all the data of the CSDN blog list and store it in an Excel file. Before the […]

Deploying the Kettle graphical interface (Spoon)

I recently deployed Kettle on Windows, Mac, and Linux; this is a summary of the pitfalls encountered. 1. Mac and Linux deployment: 1. Pull the Docker image: docker pull hiromuhota/webspoon 2. Create and run the Docker container: docker run -d -p 8080:8080 --name webspoon --restart=always hiromuhota/webspoon # -d: run in the background # 8080:8080: server actual port: mapped […]