
elasticsearch - Why are shards initialized and relocated during bulk insert?


I am trying to bulk insert data into a 4-node Elasticsearch cluster that has 3 data nodes.

数据节点规范:16 个 CPU - 7GB 内存 - 500GB SSD

The data is sent to the non-data node and split into 5 shards, with 1 replica configured. There is about 250GB of data to insert.
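
For reference, an index with that layout can be created roughly like this (a sketch for an Elasticsearch 1.x-era cluster; the index name osm is taken from the output below and the mapping is omitted):

~$ curl -XPUT 'http://localhost:9200/osm' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  }
}'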

However, after about an hour of processing, with roughly 40GB of data inserted per node and CPU peaking at about 60% and RAM at about 30% over the whole period, some shards go into the INITIALIZING state:

~$ curl -XGET 'http://localhost:9200/_cluster/health/osm?level=shards&pretty=true'
{
  "cluster_name" : "elastic_osm",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 9,
  "relocating_shards" : 1,
  "initializing_shards" : 1,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "indices" : {
    "osm" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 9,
      "relocating_shards" : 1,
      "initializing_shards" : 1,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "yellow",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 1,
          "unassigned_shards" : 0
        },
        "1" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "2" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 1,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "3" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "4" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    }
  }
}

Digging a little deeper, I found that one node has a heap space problem:

~$ curl -XGET 'localhost:9200/osm/_search_shards?pretty=true'
{
  "nodes" : {
    "1DpvDUf7SKywJrBgQqs9eg" : {
      "name" : "elastic-osm-node-1",
      "transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
      "attributes" : {
        "master" : "true"
      }
    },
    "FiBYw-v_QfO3nJQfHflf_w" : {
      "name" : "elastic-osm-node-3",
      "transport_address" : "inet[/xxx.xxx.x.x:x]",
      "attributes" : {
        "master" : "true"
      }
    },
    "ibpt8lGiS6yDJf4e09RN9Q" : {
      "name" : "elastic-osm-node-2",
      "transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
      "attributes" : {
        "master" : "true"
      }
    }
  },
  "shards" : [ [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "osm"
  }, {
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "osm",
    "unassigned_info" : {
      "reason" : "ALLOCATION_FAILED",
      "at" : "2015-10-30T10:42:25.539Z",
      "details" : "shard failure [engine failure, reason [already closed by tragic event]][OutOfMemoryError[Java heap space]]"
    }
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : false,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "osm"
  } ], [ {
    "state" : "RELOCATING",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : "1DpvDUf7SKywJrBgQqs9eg",
    "shard" : 2,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 2,
    "index" : "osm"
  }, {
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : "FiBYw-v_QfO3nJQfHflf_w",
    "shard" : 2,
    "index" : "osm"
  } ], [ {
    "state" : "STARTED",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm"
  } ], [ {
    "state" : "STARTED",
    "primary" : false,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "osm"
  } ] ]
}

Yet ES_HEAP_SIZE on the servers is set to half of the memory:

~$ echo $ES_HEAP_SIZE
7233.0m

And only 5GB is actually in use:

~$ free -g
             total       used
Mem:            14          5
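
free only shows memory usage at the OS level; the actual JVM heap on each node can be checked with the node stats API, for example (a sketch; the heap figures appear under nodes.<id>.jvm.mem):

~$ curl -XGET 'http://localhost:9200/_nodes/stats/jvm?pretty=true'
# look at jvm.mem.heap_used_percent and jvm.mem.heap_max_in_bytes for each node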

If I wait a bit longer, the node leaves the cluster entirely and all of its replicas go into the INITIALIZING state, which makes my inserts fail and stop:

{
  "state" : "INITIALIZING",
  "primary" : false,
  "node" : "ibpt8lGiS6yDJf4e09RN9Q",
  "relocating_node" : null,
  "shard" : 3,
  "index" : "osm",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2015-10-30T10:53:32.044Z",
    "details" : "node_left[FiBYw-v_QfO3nJQfHflf_w]"
  }
}

Config: to speed up the inserts, I used these parameters in the Elasticsearch configuration on the data nodes:

refresh_interval: -1, threadpool.bulk.size: 16, threadpool.bulk.queue_size: 1000
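
The thread pool values go into elasticsearch.yml on the data nodes; disabling refresh is a dynamic index setting, applied roughly like this (a sketch against the osm index):

~$ curl -XPUT 'http://localhost:9200/osm/_settings' -d '{
  "index" : { "refresh_interval" : "-1" }
}'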

Why is this happening? How can I fix it so my bulk inserts succeed? Do I need more than 50% of RAM for the maximum heap size?

Edit: since tweaking those Elasticsearch parameters turned out to be a bad idea, I removed the thread pool settings and it now works, but very slowly. Elasticsearch is not designed to ingest too much data too fast.

Best answer

Remove these settings:

threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000

The defaults for these settings should be sufficient to keep the cluster from being overloaded.
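
After removing the overrides, the defaults actually in effect can be verified with the nodes info API (a sketch; the exact default sizes depend on the Elasticsearch version and the number of CPUs):

~$ curl -XGET 'http://localhost:9200/_nodes/thread_pool?pretty=true'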

Also make sure you size your bulk indexing process correctly, as described here. Depending on the cluster and the data, bulk requests need to have a certain size; if you want to ingest as much as possible, you cannot simply use whatever values you like. Every cluster has its limits, and you should test your own.
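
A common approach is to start with modest bulk requests (on the order of a few MB or a few thousand documents each) and increase the size while watching the cluster. A minimal sketch of a single request in the newline-delimited bulk format against the osm index (the document type and fields below are made up for illustration):

# the type "node" and the lat/lon fields are illustrative only
~$ cat > /tmp/bulk_sample.json <<'EOF'
{ "index" : { "_index" : "osm", "_type" : "node", "_id" : "1" } }
{ "lat" : 48.8566, "lon" : 2.3522 }
{ "index" : { "_index" : "osm", "_type" : "node", "_id" : "2" } }
{ "lat" : 51.5074, "lon" : -0.1278 }
EOF
~$ curl -XPOST 'http://localhost:9200/_bulk' --data-binary @/tmp/bulk_sample.json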

Regarding elasticsearch - why shards are initialized and relocated during bulk insert, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33434657/
