A Note on an Existing DataX Issue

Reproduction

docker run --name datax-issue -p 27017:27017 -d mongo:4.0.4

docker exec -it datax-issue /bin/bash

mongo
use datax

db.getCollection("gps").insert( {
longitude: 34.9016151428223,
latitude: NaN
} );
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

tar xf datax.tar.gz && cd datax

vim job/job.json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mongodbreader",
                    "parameter": {
                        "address": [
                            "127.0.0.1:27017"
                        ],
                        "dbName": "datax",
                        "collectionName": "gps",
                        "column": [
                            {
                                "name": "longitude",
                                "type": "double"
                            },
                            {
                                "name": "latitude",
                                "type": "double"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
python bin/datax.py job/job.json
com.alibaba.datax.common.exception.DataXException: Code:[Framework-13], Description:[DataX插件运行时出错, 具体原因请参看DataX运行结束时的错误诊断信息 .].  - java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:497)
at java.math.BigDecimal.<init>(BigDecimal.java:383)
at java.math.BigDecimal.<init>(BigDecimal.java:809)
at com.alibaba.datax.common.element.DoubleColumn.<init>(DoubleColumn.java:30)
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:128)
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57)
at java.lang.Thread.run(Thread.java:748)
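
The trace shows the exception coming out of a BigDecimal constructor called from DoubleColumn. As a minimal standalone sketch (not DataX code; the class and method names are illustrative), the following shows that BigDecimal rejects NaN whether it arrives as text or as a double:

```java
import java.math.BigDecimal;

// Standalone illustration (not DataX code): BigDecimal has no representation for NaN.
class BigDecimalNaNDemo {

    // True if building a BigDecimal from the given text throws NumberFormatException.
    static boolean rejectsText(String text) {
        try {
            new BigDecimal(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // True if building a BigDecimal directly from the double throws NumberFormatException.
    static boolean rejectsDouble(double value) {
        try {
            new BigDecimal(value);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsText(String.valueOf(Double.NaN))); // true: the text is "NaN"
        System.out.println(rejectsDouble(Double.NaN));               // true
        System.out.println(rejectsText("34.9016151428223"));         // false: an ordinary value
    }
}
```

So any NaN stored in MongoDB will trip the reader unless it is filtered out before the column object is built.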

Fix

git clone https://github.com/alibaba/DataX.git && cd DataX

vim mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java
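// Guard against NaN: DoubleColumn builds a BigDecimal, which cannot represent NaN,
// so emit a null column instead of letting the constructor throw.
if (Double.isNaN((Double) tempCol)) {
    record.addColumn(new StringColumn(null));
} else {
    record.addColumn(new DoubleColumn((Double) tempCol));
}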
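
The same guard can be tried outside DataX. This is a minimal sketch, assuming the value ends up as plain decimal text built through BigDecimal, as the stack trace suggests. `toPlainOrNull` is a hypothetical helper, not part of DataX, and the extra check for infinities is an assumption that goes beyond the patch above (BigDecimal rejects those as well).

```java
import java.math.BigDecimal;

// Hypothetical helper class, not part of DataX.
class NaNGuardSketch {

    // Returns the plain decimal text of the value, or null for inputs
    // that BigDecimal cannot represent (null, NaN, positive/negative infinity).
    static String toPlainOrNull(Double value) {
        if (value == null || value.isNaN() || value.isInfinite()) {
            return null;
        }
        return new BigDecimal(String.valueOf(value)).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainOrNull(34.9016151428223)); // 34.9016151428223
        System.out.println(toPlainOrNull(Double.NaN));       // null
    }
}
```

Mapping NaN to null keeps the record count intact; whether null is the right downstream value depends on the target system.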
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
  • Then replace the corresponding *.jar in the downloaded package with the mongodbreader-0.0.1-SNAPSHOT.jar built here
python bin/datax.py job/job.json
34.9016151428223	null
任务启动时刻 : 2020-09-25 09:52:40
任务结束时刻 : 2020-09-25 09:52:51
任务总计耗时 : 10s
任务平均流量 : 1B/s
记录写入速度 : 0rec/s
读出记录总数 : 1
读写失败总数 : 0

How It Works

  • JShell (the Java 9 REPL, short for Read-Eval-Print Loop) is an interactive programming tool introduced in Java 9
/Library/Java/JavaVirtualMachines/jdk-11.0.8.jdk/Contents/Home/bin/jshell
| Welcome to JShell -- Version 11.0.8
| For an introduction type: /help intro

jshell> double ZERO = 0;
ZERO ==> 0.0

jshell> ZERO / ZERO;
$2 ==> NaN

jshell> Math.sqrt(-1);
$3 ==> NaN

jshell> /exit
| Goodbye
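
The session above shows two ways a NaN can arise. A short sketch (the class name is illustrative) of why the fix tests with `Double.isNaN` rather than `==`: NaN compares unequal to everything, including itself.

```java
// Illustrative demo of NaN comparison semantics in Java.
class NaNPropertiesDemo {

    // Primitive equality check; false only when x is NaN.
    static boolean equalsItself(double x) {
        return x == x;
    }

    public static void main(String[] args) {
        double zero = 0;
        double nan = zero / zero;

        System.out.println(equalsItself(nan));                   // false: NaN != NaN
        System.out.println(Double.isNaN(nan));                   // true
        System.out.println(Double.isNaN(Math.sqrt(-1)));         // true
        System.out.println(Double.valueOf(nan).equals(Double.NaN)); // true: boxed equals() treats NaN as equal
    }
}
```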

Addendum

  • Issue 1: "Double cannot be converted to Integer" error
// mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/util/CollectionSplitUtil.java
- int docCount = result.getInteger("count");
+ int docCount = result.getDouble("count").intValue();
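
A sketch of why the conversion in Issue 1 fails and how it can be made more tolerant. A plain `Map` stands in for the MongoDB result document here, and the assumption is that the typed getter performs a direct cast; the method names are hypothetical. Going through `Number` accepts Integer, Long and Double alike, whereas `getDouble` would fail in turn if the server returned an integer count.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Map stands in for the result document.
class CountConversionSketch {

    // Direct cast to Integer; throws ClassCastException when the value is a Double.
    static int countViaIntegerCast(Map<String, Object> result) {
        return (Integer) result.get("count");
    }

    // Cast to Number first, so any numeric box type is accepted.
    static int countViaNumber(Map<String, Object> result) {
        return ((Number) result.get("count")).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> result = new HashMap<>();
        result.put("count", 1.0); // the count arrives as a Double

        System.out.println(countViaNumber(result)); // 1
        try {
            countViaIntegerCast(result);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```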
  • Issue 2: the job hangs during execution without reporting any error
// Set channel to 1; the root cause has not been identified yet
"channel": 1
  • Issue 3: keeping a job running on the server
sudo apt install -y screen

screen -S ots

cd ~/datax

python bin/datax.py job/local2ots.json  # then simply close the client

screen -r ots  # later, reattach to the session to get the running job back

References