
elasticsearch - Elasticsearch deep aggregation doc_count mismatch

Reposted. Author: 行者123. Updated: 2023-12-03 01:57:20

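I am running some aggregations on an ES 1.7.2 installation to sum up some values.

I found out the hard way that, in some seemingly random cases, the doc_count of a bucket does not match the SUM of the doc_count values at the nested level below it.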

"key": 503,
"doc_count": 383778,
"regionid": {...}

So doc_count = 383778.

If I sum the doc_count of every element in the regionid list below, I get doc_count = 383718:
 "key": 503,
"doc_count": 383778,
"regionid": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 303821,
"ProviderId": {...}
},
{
"key": 27,
"doc_count": 23834,
"ProviderId": {...}
},
{
"key": 25,
"doc_count": 9565,
"ProviderId": {...}
},
{
"key": 36,
"doc_count": 8857,
"ProviderId": {...}
},
{
"key": 14,
"doc_count": 8222,
"ProviderId": {...}
},
{
"key": 68,
"doc_count": 6746,
"ProviderId": {...}
},
{
"key": 19,
"doc_count": 4574,
"ProviderId": {...}
},
{
"key": 28,
"doc_count": 4164,
"ProviderId": {...}
},
{
"key": 10,
"doc_count": 3006,
"ProviderId": {...}
},
{
"key": 31,
"doc_count": 2020,
"ProviderId": {...}
},
{
"key": 21,
"doc_count": 1410,
"ProviderId": {...}
},
{
"key": 32,
"doc_count": 1368,
"ProviderId": {...}
},
{
"key": 22,
"doc_count": 1367,
"ProviderId": {...}
},
{
"key": 8,
"doc_count": 1010,
"ProviderId": {...}
},
{
"key": 16,
"doc_count": 825,
"ProviderId": {...}
},
{
"key": 35,
"doc_count": 559,
"ProviderId": {...}
},
{
"key": 34,
"doc_count": 517,
"ProviderId": {...}
},
{
"key": 26,
"doc_count": 414,
"ProviderId": {...}
},
{
"key": 18,
"doc_count": 371,
"ProviderId": {...}
},
{
"key": 15,
"doc_count": 362,
"ProviderId": {...}
},
{
"key": 33,
"doc_count": 185,
"ProviderId": {...}
},
{
"key": 9,
"doc_count": 143,
"ProviderId": {...}
},
{
"key": 29,
"doc_count": 102,
"ProviderId": {...}
},
{
"key": 17,
"doc_count": 100,
"ProviderId": {...}
},
{
"key": 30,
"doc_count": 96,
"ProviderId": {...}
},
{
"key": 20,
"doc_count": 80,
"ProviderId": {...}
}
]
}
},
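The comparison described above can be reproduced with a few lines of Python. This is only a sketch: the numbers are copied by hand from the regionid buckets in the response, and the helper name `doc_count_gap` is made up for illustration.

```python
# doc_count values copied from the regionid buckets in the response above
bucket_doc_counts = [
    303821, 23834, 9565, 8857, 8222, 6746, 4574, 4164, 3006, 2020,
    1410, 1368, 1367, 1010, 825, 559, 517, 414, 371, 362,
    185, 143, 102, 100, 96, 80,
]

# doc_count reported on the parent bucket ("key": 503)
parent_doc_count = 383778


def doc_count_gap(parent, children):
    """Return how many documents the parent reports beyond the sum of its child buckets."""
    return parent - sum(children)


child_total = sum(bucket_doc_counts)
gap = doc_count_gap(parent_doc_count, bucket_doc_counts)

print("sum of nested doc_count:", child_total)  # 383718
print("parent doc_count:", parent_doc_count)    # 383778
print("difference:", gap)                       # 60
```

The nested buckets account for 60 fewer documents than the parent bucket reports.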

Does anyone know why this happens?

Could it be a bug?

Part of my aggregation:
{
  "aggs": {
    "Provider": {
      "terms": {
        "field": "Provider"
      },
      "aggs": {
        "Gateway": {
          "terms": {
            "field": "Gateway"
          },
          "aggs": {
            "CustomerId": {
              "terms": {
                "field": "CustomerId"
              },
              "aggs": {
                "regionid": {
                  "terms": {
                    "field": "regionid"

Any help is appreciated.
Thanks

Best answer

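Aggregations in ES are not exact; they are estimates based on a sample of the records. Given a large enough sample size the numbers can be accurate, but that comes at a significant performance cost.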
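To make the idea of an estimate concrete, here is a minimal sketch of a sharded top-terms merge. It is a simplified model, not Elasticsearch's actual implementation, and it is not claimed to explain the exact numbers in the question. The shard contents and the function name `merged_top_terms` are invented for illustration.

```python
from collections import Counter


def merged_top_terms(shards, size, shard_size):
    """Merge each shard's top `shard_size` terms, then keep the overall top `size`."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical per-shard term counts, made up for this example
shards = [
    {"a": 10, "b": 9, "c": 8},
    {"c": 10, "d": 9, "a": 1},
]

# Small sample per shard: "c" on shard 1 and "a" on shard 2 are never reported
approx = dict(merged_top_terms(shards, size=2, shard_size=2))

# Sample large enough to cover every term on every shard: counts are exact
exact = dict(merged_top_terms(shards, size=2, shard_size=3))

print(approx)
print(exact)
```

With the small per-shard sample, "c" is reported with a count of 10 although its true count is 18. Raising the per-shard sample recovers the exact counts, at the price of moving more data between shards.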

You can read more about the shard size in the ES documentation on shard_size for terms aggregation.

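The flatter the index (meaning the more buckets the aggregation returns), the more you need to increase the shard size. We have found that a 20x multiplier is a good rule of thumb for the flat indices in our system. So if we want to return the top 10 records of an aggregation, we use a shard size of 200.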
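As a rough sketch of that rule of thumb, the snippet below builds the nested terms aggregation from the question with `shard_size` set to 20 times `size`. The field names come from the question; the helper `terms_agg`, its `multiplier` parameter, and the choice of `size` 10 at every level are assumptions made for illustration, and the right multiplier depends on your data.

```python
import json


def terms_agg(field, size=10, multiplier=20, sub_aggs=None):
    """Build a terms aggregation whose shard_size is `multiplier` times `size`."""
    body = {
        "terms": {
            "field": field,
            "size": size,
            # Ask each shard for more candidates than are finally returned
            "shard_size": size * multiplier,
        }
    }
    if sub_aggs:
        body["aggs"] = sub_aggs
    return body


# Same nesting order as the aggregation in the question
query = {
    "aggs": {
        "Provider": terms_agg("Provider", sub_aggs={
            "Gateway": terms_agg("Gateway", sub_aggs={
                "CustomerId": terms_agg("CustomerId", sub_aggs={
                    "regionid": terms_agg("regionid"),
                }),
            }),
        }),
    }
}

print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of a search request. A larger `shard_size` makes each shard do more work, so it is worth measuring before applying it to every level.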

Regarding "elasticsearch - Elasticsearch deep aggregation doc_count mismatch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35640586/
