
elasticsearch - Average of Elasticsearch aggregations

Reposted. Author: 行者123. Updated: 2023-12-03 00:18:26

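I am trying to compute the average of every defined aggregation in a single ES query. The query results are used to populate this table.

The first column ("Lead time") is the bucket, while the remaining five are metrics on those buckets. The catch is that I also need the average of each metric computed across the buckets, as shown in the fifth row.

Here is the relevant part of the ES query I have written so far: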

"aggs": {
  "by_lead_time": {
    "range": {
      "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkIn'].value) - new Date(doc['timestamp'].value); return duration.days; }",
      "ranges": [
        { "to": 1, "key": "Same day" },
        { "from": 1, "to": 7, "key": "Same week" },
        { "from": 7, "to": 14, "key": "Next week" },
        { "from": 14, "to": 31, "key": "Same month" },
        { "from": 31, "to": 93, "key": "Within 3 months" },
        { "from": 93, "key": "Longer than 3 months" }
      ]
    },
    "aggs": {
      "averageDailyRate": {
        "avg": {
          "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkOut'].value) - new Date(doc['checkIn'].value); return doc['totalPreTax'].value / duration.days; }"
        }
      },
      "averageLeadTime": {
        "avg": {
          "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkIn'].value) - new Date(doc['timestamp'].value); return duration.days; }"
        }
      },
      "bookingCount": {
        "value_count": {
          "field": "uuid"
        }
      },
      "roomNights": {
        "sum": {
          "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkOut'].value) - new Date(doc['checkIn'].value); return duration.days; };"
        }
      },
      "averageLengthOfStay": {
        "avg": {
          "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkOut'].value) - new Date(doc['checkIn'].value); return duration.days; }"
        }
      },
      "totalRevenue": {
        "sum": {
          "field": "totalPreTax"
        }
      },
      "lowestDailyRate": {
        "nested": {
          "path": "nights"
        },
        "aggs": {
          "min_rate": {
            "min": {
              "field": "nights.rate.amount"
            }
          }
        }
      },
      "highestDailyRate": {
        "nested": {
          "path": "nights"
        },
        "aggs": {
          "max_rate": {
            "max": {
              "field": "nights.rate.amount"
            }
          }
        }
      },
      "averageOccupants": {
        "avg": {
          "script": "return doc['noOfAdults'].value + doc['noOfChildren'].value"
        }
      }
    }
  }
}

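This works as intended for extracting the required values, except for the overall averages. The problem is that I don't know how to perform an "avg" over the bucket values after they have been computed, short of doing it manually in the client application. It should be clear from the table, but keep in mind that this is not the average within each bucket; it is the average, for each metric, of that metric's values across all buckets.

How should I go about this?

Best answer

You can do this in ES 2.0 using pipeline aggregations (more specifically, the average bucket aggregation).

I tested your scenario with only the roomNights and averageDailyRate averages. The query in 2.0 looks like this, and the other numeric aggregations should work in a similar way: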
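To make the target concrete, here is a minimal Python sketch of the manual client-side fallback mentioned in the question: take one metric's value from each bucket and average those values. The function name `average_over_buckets` and the sample response are hypothetical, and the numbers are made up for illustration. Buckets whose metric is null (for example an `avg` over an empty bucket) are skipped here, which is an assumption about the desired behavior.

```python
from statistics import mean


def average_over_buckets(response, range_agg, metric):
    """Average one metric's per-bucket values across all buckets.

    Buckets whose metric value is None (no documents) are skipped.
    Returns None when no bucket has a value.
    """
    buckets = response["aggregations"][range_agg]["buckets"]
    values = [b[metric]["value"] for b in buckets
              if b[metric]["value"] is not None]
    return mean(values) if values else None


# Hypothetical response fragment; the values are invented for illustration.
sample = {
    "aggregations": {
        "by_lead_time": {
            "buckets": [
                {"key": "Same day", "doc_count": 2,
                 "averageDailyRate": {"value": 100.0}},
                {"key": "Same week", "doc_count": 3,
                 "averageDailyRate": {"value": 140.0}},
                {"key": "Next week", "doc_count": 0,
                 "averageDailyRate": {"value": None}},
            ]
        }
    }
}

print(average_over_buckets(sample, "by_lead_time", "averageDailyRate"))  # 120.0
```

Note that this is an average of bucket-level values, not a document-weighted average: each non-empty bucket counts once regardless of its `doc_count`.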

{
  "size": 0,
  "aggs": {
    "by_lead_time": {
      "range": {
        "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkIn'].value) - new Date(doc['timestamp'].value); return duration.days; }",
        "ranges": [
          { "to": 1, "key": "Same day" },
          { "from": 1, "to": 7, "key": "Same week" },
          { "from": 7, "to": 14, "key": "Next week" },
          { "from": 14, "to": 31, "key": "Same month" },
          { "from": 31, "to": 93, "key": "Within 3 months" },
          { "from": 93, "key": "Longer than 3 months" }
        ]
      },
      "aggs": {
        "roomNights": {
          "sum": {
            "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkOut'].value) - new Date(doc['checkIn'].value); return duration.days; };"
          }
        },
        "averageDailyRate": {
          "avg": {
            "script": "use(groovy.time.TimeCategory) { def duration = new Date(doc['checkOut'].value) - new Date(doc['checkIn'].value); return doc['totalPreTax'].value / duration.days; }"
          }
        }
      }
    },
    "avg_roomNights": {
      "avg_bucket": {
        "buckets_path": "by_lead_time>roomNights"
      }
    },
    "avg_averageDailyRate": {
      "avg_bucket": {
        "buckets_path": "by_lead_time>averageDailyRate"
      }
    }
  }
}
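Since the other numeric aggregations follow the same pattern, the sibling `avg_bucket` entries can be generated rather than written by hand. The following Python sketch shows one way to do that when building the request body; the helper name `add_bucket_averages` is hypothetical, and the `aggs` dictionary is a trimmed stand-in for the full query above. It only covers metrics that sit directly under `by_lead_time`.

```python
import json


def add_bucket_averages(aggs, parent, metrics):
    """Return a copy of `aggs` with one avg_bucket sibling per metric.

    Each sibling is named "avg_<metric>" and points at "<parent>><metric>",
    matching the naming used in the query above.
    """
    result = dict(aggs)  # shallow copy so the caller's dict is not modified
    for name in metrics:
        result["avg_" + name] = {
            "avg_bucket": {"buckets_path": parent + ">" + name}
        }
    return result


# Trimmed stand-in for the real aggregation tree (scripts and ranges omitted).
aggs = {"by_lead_time": {"range": {}, "aggs": {}}}

body = {
    "size": 0,
    "aggs": add_bucket_averages(
        aggs, "by_lead_time", ["roomNights", "averageDailyRate"]
    ),
}
print(json.dumps(body, indent=2))
```

In the response, each average then appears next to `by_lead_time` under its own name, for example `response["aggregations"]["avg_roomNights"]["value"]`.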

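Also, be aware of this bug, https://github.com/elastic/elasticsearch/issues/14273, which prevents your scripts from working in version 2.0. I tested the query above against a locally built 2.0.1 snapshot. If you are interested in testing on 2.x, instructions for building a release directly from GitHub are available.

Regarding elasticsearch - Average of Elasticsearch aggregations, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33653311/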
