
jdbc - Elasticsearch JDBC river eats up the entire heap memory


I am trying to index 16 million documents (47 GB) from a MySQL table into an Elasticsearch index, using jprante's elasticsearch-river-jdbc. However, after creating the river and waiting for about 15 minutes, the entire heap memory is consumed, with no sign of the river running or of any documents being indexed. The river used to work fine when I had around 10 to 12 million records to index. I have tried recreating the river 3-4 times, but to no avail.
Heap memory pre-allocated to the ES process = 10g
elasticsearch.yml

 cluster.name: test_cluster

index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 2h

cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake)
cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake)
cloud.aws.region: us-west-1

discovery.type: ec2
discovery.ec2.groups: sg-s3s3c2fc(fake)
discovery.ec2.any_group: false
discovery.zen.ping.timeout: 3m

gateway.recover_after_nodes: 1
gateway.recover_after_time: 1m

bootstrap.mlockall: true

network.host: 10.111.222.33(fake)


curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://--address--:3306/mydatabase",
    "user" : "USER",
    "password" : "PASSWORD",
    "sql" : "select * from mytable order by creation_time desc",
    "poll" : "5d",
    "versioning" : false
  },
  "index" : {
    "index" : "myindex",
    "type" : "mytype",
    "bulk_size" : 500,
    "bulk_timeout" : "240s"
  }
}'

System properties:
16gb RAM
200gb disk space

Best Answer

Depending on your version of elasticsearch-river-jdbc (check with ls -lrt plugins/river-jdbc/), this bug may already have been closed (https://github.com/jprante/elasticsearch-river-jdbc/issues/45).

Otherwise, file a bug report on GitHub.

Regarding "jdbc - Elasticsearch JDBC river eats up the entire heap memory", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15106498/
