apache-spark - What is the difference between es.scroll.limit and es.scroll.size?

Reposted. Author: 行者123. Updated: 2023-12-02 22:29:14
I am completely confused by these two parameters:

es.scroll.size
es.scroll.limit

I ran some tests and still can't tell. Is it the case that:
es.scroll.limit = es.scroll.size * num_of_scrolls ???

Best answer

es.scroll.size and es.scroll.limit are both configuration parameters passed to elasticsearch-hadoop when issuing requests from a distributed cluster (Apache Spark, for example).
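As a minimal sketch of how these two settings might be supplied from PySpark, assuming the elasticsearch-hadoop Spark SQL connector is on the classpath. The helper name `es_scroll_options`, the host `localhost`, and the index name `my-index` are hypothetical placeholders and do not come from the question.

```python
def es_scroll_options(size=50, limit=-1, nodes="localhost"):
    """Build elasticsearch-hadoop options as strings (hypothetical helper)."""
    return {
        "es.nodes": nodes,               # placeholder host, adjust to your cluster
        "es.scroll.size": str(size),     # documents requested per scroll request
        "es.scroll.limit": str(limit),   # max documents per scroll; negative = all
    }


options = es_scroll_options(size=100, limit=1000)
print(options)

# With a running Elasticsearch cluster and an active SparkSession named `spark`,
# the options could be applied roughly like this (not executed here):
# df = (spark.read.format("org.elasticsearch.spark.sql")
#       .options(**options)
#       .load("my-index"))
```

The values are passed as strings because connector options are plain key/value settings.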

Before looking at these two parameters, it is important to understand this about elasticsearch-hadoop, from the docs:

Shards play a critical role when reading information from Elasticsearch. Since it acts as a source, elasticsearch-hadoop will create one Hadoop InputSplit per Elasticsearch shard, or in case of Apache Spark one Partition, that is given a query that works against index I. elasticsearch-hadoop will dynamically discover the number of shards backing I and then for each shard will create, in case of Hadoop an input split (which will determine the maximum number of Hadoop tasks to be executed) or in case of Spark a partition which will determine the RDD maximum parallelism.



So the number of shards affects the number of queries that are run. Elastic team member james.baiera also says:

ES-Hadoop uses the scroll endpoint to collect all the data for processing within Spark. ES-Hadoop performs the multiple scroll requests under the hood on its own...



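So the cluster ends up issuing many scroll requests: each partition (one per shard) opens its own scroll, and each scroll in turn consists of multiple scroll requests. Every one of these scrolls is affected by the limit and size parameters mentioned above.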

Likewise, according to the documentation:

es.scroll.size (default 50)

Number of results/items returned by each individual per request.

es.scroll.limit (default -1)

Number of total results/items returned by each individual scroll. A negative value indicates that all documents that match should be returned. Do note that this applies per scroll which is typically bound to one of the job tasks. Thus the total number of documents returned is LIMIT * NUMBER_OF_SCROLLS (OR TASKS)


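Size specifies the number of documents requested by each individual call within a scroll, not by the scroll as a whole.
Limit specifies the maximum number of documents to retrieve across all the calls of a single scroll (remember that there are as many scrolls as there are shards in the index?).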

So now this calculation makes sense:

Total number of documents retrieved by the whole cluster = limit per scroll (es.scroll.limit) * number of scrolls (one per shard in the index).

When I tried this myself, the results matched: I queried an index with 14 shards using a limit of 1, and the cluster did indeed fetch 14 documents.
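The arithmetic can be sketched in a few lines of Python. This is a simplified model, not connector code: it assumes one scroll per shard, as described above, and the function name `total_docs_retrieved` and the per-shard document counts are made up for illustration.

```python
def total_docs_retrieved(docs_per_shard, limit):
    """Estimate documents fetched across all scrolls (one scroll per shard).

    docs_per_shard: number of matching documents on each shard (illustrative).
    limit: value of es.scroll.limit; a negative value means "return everything".
    """
    if limit < 0:
        return sum(docs_per_shard)
    # Each scroll stops at `limit`, or earlier if its shard has fewer matches.
    return sum(min(n, limit) for n in docs_per_shard)


shards = [1000] * 14  # 14 shards, each with plenty of matching documents
print(total_docs_retrieved(shards, limit=1))   # 14
print(total_docs_retrieved(shards, limit=-1))  # 14000
```

Note that LIMIT * NUMBER_OF_SCROLLS is an upper bound: a shard holding fewer matching documents than the limit contributes only what it has.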

As nefo_x states in his answer, limit in fact also caps size, which is only logical: no single call within a scroll should return more than the limit for that entire scroll, right?
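To illustrate how size and limit interact within a single scroll, here is a small simulation. It is a simplified model written for this answer and is not how elasticsearch-hadoop is implemented internally; the function name `simulate_scroll` and the document counts are assumptions.

```python
def simulate_scroll(matching_docs, size, limit):
    """Return the number of documents consumed by each request of one scroll."""
    if size <= 0:
        raise ValueError("size must be positive")
    if limit >= 0:
        size = min(size, limit)                 # limit also caps the batch size
        remaining = min(matching_docs, limit)   # the scroll stops at the limit
    else:
        remaining = matching_docs               # negative limit: fetch everything
    batches = []
    while remaining > 0:
        batch = min(size, remaining)
        batches.append(batch)
        remaining -= batch
    return batches


print(simulate_scroll(120, size=50, limit=-1))  # [50, 50, 20]
print(simulate_scroll(120, size=50, limit=70))  # [50, 20]
print(simulate_scroll(120, size=50, limit=10))  # [10]
```

In this model, size controls how many round trips a scroll needs, while limit controls how many documents the scroll yields in total.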

Regarding "apache-spark - What is the difference between es.scroll.limit and es.scroll.size?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47193321/