
elasticsearch - Controlling the number of buckets created in an aggregation

Reposted. Author: 行者123. Updated: 2023-11-29 02:56:35

In Elasticsearch, there is a limit on the number of buckets an aggregation can create. If an aggregation creates more buckets than the specified limit, ES 6.x logs a warning message, and future versions throw an error.

Here is the warning message:

This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests.

As of ES 7.x, the limit is set to 10000 by default, but it is adjustable.
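If a workload genuinely needs more buckets, the limit is a dynamic cluster setting; a sketch of raising it would look like this (the value 20000 is only an illustration, and larger values increase memory pressure on the coordinating node):

```json
PUT /_cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}
```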

The problem is that I cannot actually calculate (or even estimate) how many buckets an aggregation will create.

Consider the following request:

GET /zone_stats_hourly/_search
{
  "aggs": {
    "apps": {
      "terms": {
        "field": "appId",
        "size": <NUM_TERM_BUCKETS>,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          { "_count": "desc" },
          { "_key": "asc" }
        ]
      },
      "aggregations": {
        "histogram": {
          "date_histogram": {
            "field": "processTime",
            "time_zone": "UTC",
            "interval": "1d",
            "offset": 0,
            "order": { "_key": "asc" },
            "keyed": false,
            "min_doc_count": 0
          },
          "aggregations": {
            "requests": { "sum": { "field": "requests" } },
            "filled": { "sum": { "field": "filledRequests" } },
            "matched": { "sum": { "field": "matchedRequests" } },
            "imp": { "sum": { "field": "impressions" } },
            "cv": { "sum": { "field": "completeViews" } },
            "clicks": { "sum": { "field": "clicks" } },
            "installs": { "sum": { "field": "installs" } },
            "actions": { "sum": { "field": "actions" } },
            "earningsIRT": { "sum": { "field": "earnings.inIRT" } },
            "earningsUSD": { "sum": { "field": "earnings.inUSD" } },
            "earningsEUR": { "sum": { "field": "earnings.inEUR" } },
            "dealBasedEarnings": {
              "nested": { "path": "dealBasedEarnings" },
              "aggregations": {
                "types": {
                  "terms": {
                    "field": "dealBasedEarnings.type",
                    "size": 4,
                    "min_doc_count": 1,
                    "shard_min_doc_count": 0,
                    "show_term_doc_count_error": false,
                    "order": [
                      { "_count": "desc" },
                      { "_key": "asc" }
                    ]
                  },
                  "aggregations": {
                    "dealBasedEarningsIRT": { "sum": { "field": "dealBasedEarnings.amount.inIRT" } },
                    "dealBasedEarningsUSD": { "sum": { "field": "dealBasedEarnings.amount.inUSD" } },
                    "dealBasedEarningsEUR": { "sum": { "field": "dealBasedEarnings.amount.inEUR" } }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "_source": { "excludes": [] },
  "stored_fields": ["*"],
  "docvalue_fields": [
    { "field": "eventTime", "format": "date_time" },
    { "field": "processTime", "format": "date_time" },
    { "field": "postBack.time", "format": "date_time" }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "processTime": {
              "from": 1565049600000,
              "to": 1565136000000,
              "include_lower": true,
              "include_upper": false,
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}

If I set <NUM_TERM_BUCKETS> to 2200 and execute the request, I get a warning message saying I created more than 10000 buckets (how?!).

A sample response from ES:

#! Deprecation: 299 Elasticsearch-6.7.1-2f32220 "This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests."
{
  "took": 6533,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 103456,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "apps": {
      "doc_count_error_upper_bound": 9,
      "sum_other_doc_count": 37395,
      "buckets": [...]
    }
  }
}

Even more interestingly, after reducing <NUM_TERM_BUCKETS> to 2100, I get no warning message at all, which means fewer than 10000 buckets were created.

I tried hard to find the reason behind this, but found nothing.

Is there any formula, or anything at all, to calculate or estimate how many buckets an aggregation will create before actually executing the request?

I would like to know whether an aggregation will exceed the specified search.max_buckets and throw an error in ES 7.x or later, so that I can decide whether to use a composite aggregation instead.
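For context, the alternative suggested by the warning would look roughly like the sketch below, assuming the field names from the request above; the source names (app, day) and page size of 1000 are arbitrary choices, and each subsequent page repeats the request with the previous response's after_key passed in an after clause:

```json
GET /zone_stats_hourly/_search
{
  "size": 0,
  "aggs": {
    "apps_per_day": {
      "composite": {
        "size": 1000,
        "sources": [
          { "app": { "terms": { "field": "appId" } } },
          { "day": { "date_histogram": { "field": "processTime", "interval": "1d" } } }
        ]
      },
      "aggregations": {
        "requests": { "sum": { "field": "requests" } }
      }
    }
  }
}
```

Because each page is bounded by the composite size, the bucket count per request stays well below search.max_buckets regardless of the total cardinality.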

Update

I tried a much simpler aggregation, with no nested or sub-aggregations, against an index containing roughly 80000 documents.

The request is as follows:

GET /my_index/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggregations": {
    "unique": {
      "terms": {
        "field": "_id",
        "size": <NUM_TERM_BUCKETS>
      }
    }
  }
}

If I set <NUM_TERM_BUCKETS> to 7000, I get this error response in ES 7.3:

{
  "error": {
    "root_cause": [
      {
        "type": "too_many_buckets_exception",
        "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
        "max_buckets": 10000
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "XYZ",
        "reason": {
          "type": "too_many_buckets_exception",
          "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
          "max_buckets": 10000
        }
      }
    ]
  },
  "status": 503
}

If I reduce <NUM_TERM_BUCKETS> to 6000, it runs successfully.

Seriously, I am confused. How on earth does this aggregation create more than 10000 buckets? Can anyone explain?

Best Answer

From the Terms Aggregation documentation:

The shard_size parameter can be used to minimize the extra work that comes with bigger requested size. When defined, it will determine how many terms the coordinating node will request from each shard. Once all the shards responded, the coordinating node will then reduce them to a final result which will be based on the size parameter - this way, one can increase the accuracy of the returned terms and avoid the overhead of streaming a big list of buckets back to the client.

The default shard_size is (size * 1.5 + 10).

To address accuracy concerns in a distributed system, Elasticsearch requests a number of terms larger than size from each shard.

Therefore, the maximum safe value of NUM_TERM_BUCKETS for a simple terms aggregation can be calculated with the following formula:

maxNumTermBuckets = (search.maxBuckets - 10) / 1.5

which comes to 6660 for the default search.maxBuckets = 10000.
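As a sanity check, this matches the behavior observed in the question's update; a minimal sketch, assuming the default shard_size formula quoted from the documentation:

```python
# Default shard_size of a terms aggregation, as quoted above: size * 1.5 + 10
def shard_size(size):
    return int(size * 1.5 + 10)

MAX_BUCKETS = 10000  # default search.max_buckets

# Maximum "size" a simple terms aggregation can request without tripping the limit
max_num_term_buckets = (MAX_BUCKETS - 10) / 1.5
print(max_num_term_buckets)  # 6660.0

# Numbers from the question's update:
assert shard_size(7000) > MAX_BUCKETS    # 10510 -> too_many_buckets_exception
assert shard_size(6000) <= MAX_BUCKETS   # 9010  -> runs fine
assert shard_size(6660) == MAX_BUCKETS   # exactly at the limit
```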

This question about "elasticsearch - Controlling the number of buckets created in an aggregation" originates from Stack Overflow: https://stackoverflow.com/questions/57393548/
