elasticsearch - Impact of increasing the scroll time on Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 23:57:02

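I am using Elasticsearch for a project and query it to get member information. It holds 3 million records.

I am running an ad campaign for 2 million users, and the user data lives in Elasticsearch 6.2. I query ES and use scroll to fetch the records in batches (50 records at a time). I also want to keep the search context alive for 1 day, so that if the campaign run fails for any reason I can resume the campaign from where it stopped instead of starting it over from the beginning. I will also save the scroll ID and use it to resume the campaign.

During testing I found that CPU utilization increased by 50% (ES setup: 2 nodes with 4 shards running on AWS, instance type: i3.xlarge.elasticsearch), and it then stays constant at 50%.

Is there any relationship between CPU usage and keeping the search context alive for 1 day? BTW, the campaign takes 6 hours to complete.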

Best answer

From the documentation:

Normally, the background merge process optimizes the index by merging together smaller segments to create new bigger segments, at which time the smaller segments are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted while they are still in use. This is how Elasticsearch is able to return the results of the initial search request, regardless of subsequent changes to documents.



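So with the scroll set to expire only after 24h, it seems you are preventing Lucene from deleting the old segments once they have been merged, which increases the load on the shards.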
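As a rough illustration of where the keep-alive is set: the scroll value is sent with every request, and each scroll request sets a new expiry time, so it only has to cover the processing of one batch rather than the whole campaign. The sketch below just builds the request pieces. The helper names, the `members` index name and the `5m` value are assumptions of mine, not something from the question, and a short keep-alive also means a failed run has to be resumed within that window.

```python
def initial_search_request(index, query, batch_size=50, keep_alive="5m"):
    """Build (method, path, body) for the first scroll request.

    keep_alive is how long the search context stays open after THIS
    request; "5m" is an assumed value, pick one that covers one batch.
    """
    path = f"/{index}/_search?scroll={keep_alive}"
    # Sorting by _doc is the cheapest order when the order does not matter.
    body = {"size": batch_size, "query": query, "sort": ["_doc"]}
    return "POST", path, body


def next_batch_request(scroll_id, keep_alive="5m"):
    """Build (method, path, body) for every following scroll request.

    Sending the scroll value again renews the expiry of the context.
    """
    body = {"scroll": keep_alive, "scroll_id": scroll_id}
    return "POST", "/_search/scroll", body
```

For example, `initial_search_request("members", {"match_all": {}})` gives the request for the first 50 hits, and the `_scroll_id` from each response goes into `next_batch_request` for the next 50.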

Later on, the documentation explains how to clear the scroll:

Search context are automatically removed when the scroll timeout has been exceeded. However keeping scrolls open has a cost, as discussed in the previous section so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API:



You should try clearing the scroll once the ad campaign has finished.
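A minimal sketch of what that could look like, assuming a hypothetical `send(method, path, body)` callable that performs the HTTP request against your cluster and returns the decoded JSON response (it is not part of any Elasticsearch client, plug in whatever HTTP layer you use). The function name `run_campaign` and the `5m` keep-alive are also my own choices. The point is the `finally` block: the clear-scroll request is sent whether the campaign finishes or fails.

```python
def run_campaign(send, index, query, handle_batch, batch_size=50, keep_alive="5m"):
    """Scroll through all hits, then always clear the search context.

    send         -- hypothetical transport: send(method, path, body) -> dict
    handle_batch -- your own callback, called with the list of hits per batch
    Returns the number of hits handed to handle_batch.
    """
    resp = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": batch_size, "query": query},
    )
    scroll_id = resp["_scroll_id"]
    processed = 0
    try:
        # An empty hits list means the scroll is exhausted.
        while resp["hits"]["hits"]:
            hits = resp["hits"]["hits"]
            handle_batch(hits)
            processed += len(hits)
            resp = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # The scroll ID may change between requests; use the latest one.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Clear-scroll API: frees the search context right away
        # instead of waiting for the keep-alive to run out.
        send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
    return processed
```

If you want to keep the resume-after-failure behaviour from the question, you would move the clear call so that it only runs after a successful finish, at the cost of keeping the context open until it times out.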

Source: this question (elasticsearch - Impact of increasing the scroll time on Elasticsearch) comes from Stack Overflow: https://stackoverflow.com/questions/52535472/
