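Download the tool from: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
The package does not include the elasticsearchwriter plugin by default, so you need to download the source code and compile the plugin yourself.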
git clone https://github.com/alibaba/DataX.git
Enter the directory, edit the pom file so that only the elasticsearchwriter module remains, then run:
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
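Copy datax/plugin/writer/ from the target directory into the DataX tool directory.
Run python bin/datax.py job/odps2es.json
See the configuration provided later in this post for the contents of odps2es.json.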
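
The copy step above can be sketched in Python. This is only a minimal sketch: the function name install_writer_plugin is hypothetical, and it assumes the build output contains datax/plugin/writer/elasticsearchwriter with a plugin.json inside it, and that the DataX tool was unpacked to a directory that has a plugin/writer folder.

```python
import shutil
from pathlib import Path


def install_writer_plugin(build_dir, datax_home, plugin="elasticsearchwriter"):
    """Copy one compiled writer plugin into an unpacked DataX directory.

    build_dir  -- the Maven target directory (assumed layout, see above)
    datax_home -- the directory created by unpacking datax.tar.gz
    """
    src = Path(build_dir) / "datax" / "plugin" / "writer" / plugin
    dst = Path(datax_home) / "plugin" / "writer" / plugin
    # DataX looks for plugin.json in every plugin folder, so a missing file
    # here most likely means the build did not produce the plugin.
    if not (src / "plugin.json").is_file():
        raise FileNotFoundError(f"no plugin.json under {src}")
    # dirs_exist_ok (Python 3.8+) lets a rebuild overwrite an earlier copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

After a call such as install_writer_plugin("target", "/path/to/datax"), the job can be started with python bin/datax.py job/odps2es.json as described above.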
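I ran into the following error myself: “/datax/plugin/reader/.DS_Store/plugin.json]不存在. 请检查您的配置文件. ” (that is, the plugin.json file does not exist, please check your configuration file).
Solution: troubleshooting showed that this happens when the downloaded archive is unpacked by double-clicking it. Unpack the archive again from the command line with tar -zxvf datax.tar.gz.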
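
To check whether an existing installation is affected, a small Python sketch like the one below can list the entries that would trigger this error. The function name find_stray_plugin_entries is hypothetical, and the sketch assumes, based on the error message above, that DataX expects a plugin.json inside every entry under plugin/reader and plugin/writer.

```python
from pathlib import Path


def find_stray_plugin_entries(datax_home):
    """Return entries under plugin/reader and plugin/writer without a plugin.json."""
    stray = []
    for kind in ("reader", "writer"):
        root = Path(datax_home) / "plugin" / kind
        if not root.is_dir():
            continue
        for entry in sorted(root.iterdir()):
            # A plain file such as .DS_Store cannot contain plugin.json,
            # so is_file() is False for it as well.
            if not (entry / "plugin.json").is_file():
                stray.append(entry)
    return stray
```

Any path it returns (for example a .DS_Store file) is a candidate for removal, or a sign that the archive should be unpacked again with tar.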
A sample configuration is provided here for reference only (for more configuration options, see https://help.aliyun.com/knowledge_list/74300.html?spm=a2c4g.11186631.6.617.214e69a4TpBReD ):
{
  "job":{
    "setting":{
      "speed":{
        "byte":10485760
      },
      "errorLimit":{
        "record":0,
        "percentage":0.02
      }
    },
    "content":[
      {
        "reader":{
          "name":"odpsreader",
          "parameter":{
            "partition":[
              "ds='20190603'"
            ],
            "isCompress":false,
            "accessId":"XXXXXXXXXX",
            "accessKey":"XXXXXXXXXX",
            "odpsServer":"http://service-corp.odps.aliyun-inc.com/api",//change this to your own endpoint
            "endpoint":"http://service-corp.odps.aliyun-inc.com/api",
            "project":"ais_server_data",
            "column":[
              "id",
              "text"
            ],
            "emptyAsNull":true,
            "table":"count_table"
          }
        },
        "writer":{
          "name":"elasticsearchwriter",
          "parameter":{
            "endpoint":"http://xxx.xxx.xxx.xxx:9999",
            "accessId":"XXXXX",
            "accessKey":"XXXXX",
            "index":"count_table",
            "type":"default",
            "cleanup":true,
            "settings":{
              "index":{
                "number_of_shards":1,
                "number_of_replicas":0
              }
            },
            "discovery":false,
            "batchSize":1000,
            "splitter":",",
            "column":[
              {
                "name":"id",
                "type":"long"
              },
              {
                "name":"text",
                "type":"keyword"
              }
            ]
          }
        }
      }
    ]
  }
}
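
Before running a job like this one, it can help to sanity-check the configuration. The Python sketch below is not part of DataX: check_job is a hypothetical helper, and it rests on the assumption that the writer's column list is matched to the reader's column list by position, so the two lists should have the same length. It works on an already parsed dict, so any // annotation has to be removed before the file is loaded with Python's json module, which rejects comments.

```python
def check_job(job):
    """Return a list of problems found in a parsed DataX job configuration."""
    problems = []
    content = job.get("job", {}).get("content", [])
    if not content:
        problems.append("job.content is empty")
    for i, item in enumerate(content):
        reader = item.get("reader", {})
        writer = item.get("writer", {})
        for role, section in (("reader", reader), ("writer", writer)):
            if not section.get("name"):
                problems.append(f"content[{i}].{role} has no name")
        r_cols = reader.get("parameter", {}).get("column", [])
        w_cols = writer.get("parameter", {}).get("column", [])
        # Assumption: columns are paired by position, so the counts must agree.
        if len(r_cols) != len(w_cols):
            problems.append(
                f"content[{i}]: reader has {len(r_cols)} columns, "
                f"writer has {len(w_cols)}"
            )
    return problems
```

An empty list means none of these basic checks failed. It does not guarantee that the job will run, since endpoints, credentials, and field types are not verified.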