Importing business data from MySQL into HDFS with DataX-web and mapping it to Hive tables

Problem description

Here is the problem I ran into (part of the error message is shown in the screenshot):

Cause analysis:

Checking the task log under Log Management in DataX-web gives the following information:

2023-10-05 13:19:50.025 [job-0] WARN DBUtil - test connection of [jdbc:mysql://node:3306/nshop] failed, for Code:[DBUtilErrorCode-10], Description: [Failed to connect to the database. Please check your account, password, database name, IP, Port or ask the DBA for help (pay attention to the network environment).]. - The specific error message is: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server..
2023-10-05 13:19:50 [AnalysisStatistics.analysisStatisticsLog-53] 2023-10-05 13:19:50.027 [job-0] ERROR RetryUtil - Exception when calling callable, Exception Msg: DataX cannot connect to the corresponding database, The possible reasons are: 1) The configured ip/port/database/jdbc is wrong and cannot be connected. 2) The configured username/password is wrong and the authentication fails. Please confirm with the DBA whether the connection information of the database is correct.
2023-10-05 13:19:50 [AnalysisStatistics.analysisStatisticsLog-53] java.lang.Exception: DataX cannot connect to the corresponding database. The possible reasons are: 1) The configured ip/port/database/jdbc is wrong and cannot be connected. 2) The configured username/password is wrong and the authentication fails. Please confirm with the DBA whether the connection information of the database is correct. 

Checking each of the possible causes listed above:

  1. The database username and password are correct, and the target database exists, so neither failed authentication nor a missing database is the cause.
  2. The configured IP, port, database name, and JDBC URL are also correct: the connection test in DataX-web succeeds, so the connection settings themselves are not the cause.

Ruling out these causes did not explain why the executor failed to run the task, so I checked the executor's own log:

[root@node ~]# cd /usr/local/datax-web-2.1.2/
[root@node datax-web-2.1.2]# ls
bin modules packages README.md userGuid.md
[root@node datax-web-2.1.2]# cd modules/
[root@node modules]# ls
datax-admin datax-executor
[root@node modules]# cd datax-executor/
[root@node datax-executor]# ls
bin conf data json lib logs
[root@node datax-executor]# cd bin
[root@node bin]# ls
configure.sh console.out datax-executor.sh env.properties
[root@node bin]# cat console.out

Note: when troubleshooting DataX-web, there are usually two logs to check:

For login-related (admin) problems, check datax-admin/bin/console.out;

For task execution problems, check the executor log, datax-executor/bin/console.out.

Because the error occurred during task execution, I checked datax-executor/bin/console.out.
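To avoid scrolling through the whole console.out, a small helper can pull out just the error and exception lines. This is a minimal sketch, not part of DataX-web: the function name show_log_errors is hypothetical, and the default log path assumes the /usr/local/datax-web-2.1.2 installation directory used in this post.

```shell
#!/usr/bin/env bash
# Print the most recent ERROR / exception lines from a DataX-web console.out,
# prefixed with their line numbers.
# Assumption: default path matches the installation directory in this post.
show_log_errors() {
  local log_file="${1:-/usr/local/datax-web-2.1.2/modules/datax-executor/bin/console.out}"
  local max_lines="${2:-20}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # Match ERROR log levels and Java exception names, keep only the last N hits.
  grep -n -E 'ERROR|Exception' "$log_file" | tail -n "$max_lines"
}
```

Running show_log_errors with no arguments checks the executor log; passing /usr/local/datax-web-2.1.2/modules/datax-admin/bin/console.out checks the admin log instead.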

  1. Checking the executor log under the DataX-web installation directory revealed the following problem:

  2. After consulting the official documentation and other sources, I finally identified the cause.
  3. DataX reads data from MySQL through a JDBC driver, so it depends on the MySQL Connector/J jar bundled with the plugin.
  4. The cause turned out to be a version mismatch: I am running MySQL 8.0.26, but the mysqlreader and mysqlwriter plugins in DataX bundle mysql-connector-java-5.1.34. This old driver does not work with the MySQL 8.0 server, so the connection to the database fails.
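Before replacing anything, it helps to confirm which driver versions the plugins actually ship and compare them with the server version (SELECT VERSION(); in the MySQL client). The sketch below assumes DATAX_HOME points at the DataX installation, defaulting to /usr/local/datax as in the paths used later in this post; list_mysql_drivers is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List every MySQL Connector/J jar bundled with the DataX plugins.
# Assumption: DataX lives in $DATAX_HOME (default /usr/local/datax);
# an explicit first argument overrides both.
list_mysql_drivers() {
  local datax_home="${1:-${DATAX_HOME:-/usr/local/datax}}"
  find "$datax_home/plugin" -type f -name 'mysql-connector-java-*.jar' | sort
}
```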

Solution:

Put the MySQL 8.0 driver jar (mysql-connector-java-8.0.26.jar, matching the server version) into the libs directory of both the mysqlreader and mysqlwriter plugins, and delete the original 5.1.34 jar:

$DATAX_HOME/plugin/reader/mysqlreader/libs

$DATAX_HOME/plugin/writer/mysqlwriter/libs

For convenience, you could instead put the MySQL driver jar in $DATAX_HOME/lib. That is quicker, but as a matter of personal preference I don't recommend it; replacing the jar in each plugin's libs directory, as above, is the better approach.

Note: I am reading from MySQL and writing to HDFS, so strictly speaking only the jar in the reader plugin (mysqlreader) needs to be replaced, not the one in mysqlwriter. However, to avoid the same error when writing to MySQL later, I replaced both.
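The copy-and-delete steps for both plugins can also be scripted. This is a minimal sketch assuming the plugin layout shown in this post; replace_mysql_driver is a hypothetical name. It removes every other mysql-connector-java-*.jar from the two libs directories so that only one driver version remains on each plugin's classpath.

```shell
#!/usr/bin/env bash
# Copy a new MySQL Connector/J jar into the mysqlreader and mysqlwriter libs
# directories, then delete any other mysql-connector-java-*.jar found there.
# Usage (hypothetical helper): replace_mysql_driver DATAX_HOME NEW_JAR
replace_mysql_driver() {
  local datax_home="$1" new_jar="$2"
  local jar_name plugin libs old
  jar_name="$(basename "$new_jar")"
  [ -f "$new_jar" ] || { echo "driver jar not found: $new_jar" >&2; return 1; }
  for plugin in reader/mysqlreader writer/mysqlwriter; do
    libs="$datax_home/plugin/$plugin/libs"
    if [ ! -d "$libs" ]; then
      echo "skipping missing directory: $libs" >&2
      continue
    fi
    cp "$new_jar" "$libs/" || return 1
    # Remove every other connector version (e.g. the bundled 5.1.34 jar).
    for old in "$libs"/mysql-connector-java-*.jar; do
      [ "$(basename "$old")" = "$jar_name" ] || rm -f "$old"
    done
  done
}
```

With the paths from this post, the call would be replace_mysql_driver /usr/local/datax /root/mysql-connector-java-8.0.26.jar, followed by a restart of DataX-web.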

[root@node reader]# ls
cassandrareader ftpreader hbase11xreader mongodbreader odpsreader ossreader otsstreamreader rdbmsreader streamreader
drdsreader hbase094xreader hdfsreader mysqlreader oraclereader otsreader postgresqlreader sqlserverreader txtfilereader
[root@node reader]# cd mysqlreader/libs/
[root@node libs]# ls
commons-collections-3.0.jar commons-math3-3.1.1.jar fastjson-1.1.46.sec01.jar logback-classic-1.0.13.jar plugin-rdbms-util-0.0.1-SNAPSHOT.jar
commons-io-2.4.jar datax-common-0.0.1-SNAPSHOT.jar guava-r05.jar logback-core-1.0.13.jar slf4j-api-1.7.10.jar
commons-lang3-3.3.2.jar druid-1.0.15.jar hamcrest-core-1.3.jar mysql-connector-java-5.1.34.jar
[root@node libs]# cp /root/mysql-connector-java-8.0.26.jar ./
[root@node libs]# ls
commons-collections-3.0.jar commons-math3-3.1.1.jar fastjson-1.1.46.sec01.jar logback-classic-1.0.13.jar mysql-connector-java-8.0.26.jar
commons-io-2.4.jar datax-common-0.0.1-SNAPSHOT.jar guava-r05.jar logback-core-1.0.13.jar plugin-rdbms-util-0.0.1-SNAPSHOT.jar
commons-lang3-3.3.2.jar druid-1.0.15.jar hamcrest-core-1.3.jar mysql-connector-java-5.1.34.jar slf4j-api-1.7.10.jar
[root@node libs]# rm -rf mysql-connector-java-5.1.34.jar
[root@node libs]# ls
commons-collections-3.0.jar commons-math3-3.1.1.jar fastjson-1.1.46.sec01.jar logback-classic-1.0.13.jar plugin-rdbms-util-0.0.1-SNAPSHOT.jar
commons-io-2.4.jar datax-common-0.0.1-SNAPSHOT.jar guava-r05.jar logback-core-1.0.13.jar slf4j-api-1.7.10.jar
commons-lang3-3.3.2.jar druid-1.0.15.jar hamcrest-core-1.3.jar mysql-connector-java-8.0.26.jar
[root@node libs]# cd /usr/local/datax/plugin/writer/mysqlwriter/libs
[root@node libs]# ls
commons-collections-3.0.jar commons-math3-3.1.1.jar fastjson-1.1.46.sec01.jar logback-classic-1.0.13.jar plugin-rdbms-util-0.0.1-SNAPSHOT.jar
commons-io-2.4.jar datax-common-0.0.1-SNAPSHOT.jar guava-r05.jar logback-core-1.0.13.jar slf4j-api-1.7.10.jar
commons-lang3-3.3.2.jar druid-1.0.15.jar hamcrest-core-1.3.jar mysql-connector-java-5.1.34.jar
[root@node libs]# cp /root/mysql-connector-java-8.0.26.jar ./
[root@node libs]# rm -rf mysql-connector-java-5.1.34.jar
[root@node libs]# ls
commons-collections-3.0.jar commons-math3-3.1.1.jar fastjson-1.1.46.sec01.jar logback-classic-1.0.13.jar plugin-rdbms-util-0.0.1-SNAPSHOT.jar
commons-io-2.4.jar datax-common-0.0.1-SNAPSHOT.jar guava-r05.jar logback-core-1.0.13.jar slf4j-api-1.7.10.jar
commons-lang3-3.3.2.jar druid-1.0.15.jar hamcrest-core-1.3.jar mysql-connector-java-8.0.26.jar

Most importantly, don’t forget to restart DataX-web.

[root@node datax-web-2.1.2]# ./bin/stop-all.sh
2023-10-05 14:30:56.214 [INFO] (51761) Try to Stop Modules In Order
2023-10-05 14:30:56.237 [INFO] (51769) ####### Begin To Stop Module: [datax-admin] ######
2023-10-05 14:30:56.248 [INFO] (51777) load environment variables
2023-10-05 14:30:56.672 [INFO] (51777) Killing DATAX-ADMIN (pid 11240) ...
2023-10-05 14:30:56.678 [INFO] (51777) Waiting DATAX-ADMIN to stop complete ...
2023-10-05 14:30:59.336 [INFO] (51777) DATAX-ADMIN stop success
2023-10-05 14:30:59.346 [INFO] (52104) ####### Begin To Stop Module: [datax-executor] ######
2023-10-05 14:30:59.356 [INFO] (52112) load environment variables
2023-10-05 14:30:59.801 [INFO] (52112) Killing DATAX-EXEXUTOR (pid 11533) ...
2023-10-05 14:30:59.806 [INFO] (52112) Waiting DATAX-EXEXUTOR to stop complete ...
2023-10-05 14:31:02.302 [INFO] (52112) DATAX-EXEXUTOR stop success
[root@node datax-web-2.1.2]# ./bin/start-all.sh
2023-10-05 14:31:36.189 [INFO] (53831) Try To Start Modules In Order
2023-10-05 14:31:36.199 [INFO] (53839) ####### Begin To Start Module: [datax-admin] ######
2023-10-05 14:31:36.209 [INFO] (53847) load environment variables
2023-10-05 14:31:36.542 [INFO] (53847) /usr/local/java/bin/java
2023-10-05 14:31:36.546 [INFO] (53847) Waiting DATAX-ADMIN to start complete ...
2023-10-05 14:31:36.823 [INFO] (53847) DATAX-ADMIN start success
2023-10-05 14:31:36.851 [INFO] (54063) ####### Begin To Start Module: [datax-executor] ######
2023-10-05 14:31:36.905 [INFO] (54075) load environment variables
2023-10-05 14:31:37.576 [INFO] (54075) /usr/local/java/bin/java
2023-10-05 14:31:37.595 [INFO] (54075) Waiting DATAX-EXEXUTOR to start complete ...
2023-10-05 14:31:38.096 [INFO] (54075) DATAX-EXEXUTOR start success
[root@node datax-web-2.1.2]# jps
24212 NodeManager
54565 Jps
54008 DataXAdminApplication
23049 DataNode
22811 NameNode
23982 ResourceManager
55215 RunJar
60575 RunJar
54319 DataXExecutorApplication
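After the restart, the jps listing should include both DataXAdminApplication and DataXExecutorApplication, as above. A small check like the following makes that explicit; it is a hypothetical helper that assumes jps from the JDK is on the PATH, and jps | check_datax_running returns a non-zero status if either process is missing.

```shell
#!/usr/bin/env bash
# Read `jps` output on stdin and report whether both DataX-web services
# are running. Returns 0 only if both process names are present.
check_datax_running() {
  local jps_out missing=0 name
  jps_out="$(cat)"
  for name in DataXAdminApplication DataXExecutorApplication; do
    # -w: match the whole class name, not a prefix of a longer name.
    if printf '%s\n' "$jps_out" | grep -w "$name" >/dev/null; then
      echo "$name: running"
    else
      echo "$name: NOT running" >&2
      missing=1
    fi
  done
  return "$missing"
}
```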

Finally, rerun the failed task; it now completes successfully.

Here is some output from the successful run of the task:
