
python - How to use bucket, metric, or pipeline in Elasticsearch DSL (elasticsearch-dsl-py) for multi-level aggregations

Reposted. Author: 行者123. Updated: 2023-12-03 02:34:41

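I have a raw aggregation request like the one below, but I am having trouble converting it to elasticsearch-dsl.

I have read the documentation and found the description saying that the .bucket(), .metric() and .pipeline() methods can be used to nest aggregations, but it does not explain how to use these three methods for more complex aggregations, such as ones with more levels.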

{
  "aggs": {
    "statistics": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "date": {
          "date_histogram": {
            "min_doc_count": 0,
            "field": "date",
            "interval": "1d",
            "format": "yyyy-MM-dd"
          },
          "aggs": {
            "column_a": {"avg": {"field": "column_a"}},
            "column_b": {"avg": {"field": "column_b"}},
            "column_c": {"avg": {"field": "column_c"}},
            "a_gap": {"serial_diff": {"buckets_path": "column_a"}},
            "b_gap": {"serial_diff": {"buckets_path": "column_b"}},
            "c_gap": {"serial_diff": {"buckets_path": "column_c"}}
          }
        },
        "sum_a_gap": {"sum_bucket": {"buckets_path": "date>a_gap"}},
        "sum_b_gap": {"sum_bucket": {"buckets_path": "date>b_gap"}},
        "sum_c_gap": {"sum_bucket": {"buckets_path": "date>c_gap"}}
      }
    }
  }
}
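To make the intent of this request concrete, here is a rough plain-Python sketch of what one column's chain (column_a, a_gap, sum_a_gap) works out to for a single "id" bucket. The sample values and dates are made up for illustration, and the sketch assumes every day has at least one document, so the gap handling triggered by min_doc_count=0 is not modelled.

```python
from statistics import mean

# Hypothetical sample data: column_a values per day for one "id" bucket.
# Assumption: no empty days, so gap policies are not modelled here.
daily_values = {
    "2019-12-01": [1.0, 3.0],
    "2019-12-02": [5.0],
    "2019-12-03": [4.0, 8.0],
}

# "column_a": the avg metric inside each date_histogram bucket.
daily_avg = [mean(values) for _, values in sorted(daily_values.items())]

# "a_gap": serial_diff with its default lag of 1, i.e. each bucket minus the
# previous one. The first bucket has no predecessor, so it yields no value.
a_gap = [current - previous for previous, current in zip(daily_avg, daily_avg[1:])]

# "sum_a_gap": sum_bucket over "date>a_gap", i.e. the sum of a_gap across
# all buckets of the "date" histogram.
sum_a_gap = sum(a_gap)

print(daily_avg, a_gap, sum_a_gap)
```

Because sum_bucket reads from every bucket of "date" through the path "date>a_gap", it has to be declared next to "date" rather than inside it.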

My elasticsearch-dsl query below puts "sum_a_gap" at the same level as "column_a" and "a_gap", which is not what I want.
    # Parentheses make the multi-line method chain valid Python.
    (
        self._search.aggs
        .bucket('statistics', 'terms', field='id')
        # .bucket() returns the new 'date' aggregation, so every call
        # below is attached under 'date', including the sum_*_gap ones.
        .bucket('date', 'date_histogram', field='date',
                interval='1d', min_doc_count=0, format='yyyy-MM-dd')
        .metric('column_a', 'avg', field='column_a')
        .metric('column_b', 'avg', field='column_b')
        .metric('column_c', 'avg', field='column_c')
        .pipeline('a_gap', 'serial_diff', buckets_path='column_a')
        .pipeline('b_gap', 'serial_diff', buckets_path='column_b')
        .pipeline('c_gap', 'serial_diff', buckets_path='column_c')
        .pipeline('sum_a_gap', 'sum_bucket', buckets_path='date>a_gap')
        .pipeline('sum_b_gap', 'sum_bucket', buckets_path='date>b_gap')
        .pipeline('sum_c_gap', 'sum_bucket', buckets_path='date>c_gap')
    )

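Thanks in advance!

Accepted answer

I finally figured it out. I changed the order of the calls and the result is as expected. "date" and the "sum_{}_gap" pipelines now sit at the same level under "statistics" (the terms aggregation on "id"), while the other metrics and pipelines are nested under "date". The order matters because .bucket() returns the newly created aggregation, whereas .metric() and .pipeline() return the aggregation they were called on.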

    # Parentheses make the multi-line method chain valid Python.
    (
        self._search.aggs
        .bucket('statistics', 'terms', field='id')
        # .pipeline() returns 'statistics', so these three stay at its level.
        .pipeline('sum_a_gap', 'sum_bucket', buckets_path='date>a_gap')
        .pipeline('sum_b_gap', 'sum_bucket', buckets_path='date>b_gap')
        .pipeline('sum_c_gap', 'sum_bucket', buckets_path='date>c_gap')
        # .bucket() returns 'date', so everything below is nested under it.
        .bucket('date', 'date_histogram', field='date',
                interval='1d', min_doc_count=0, format='yyyy-MM-dd')
        .metric('column_a', 'avg', field='column_a')
        .metric('column_b', 'avg', field='column_b')
        .metric('column_c', 'avg', field='column_c')
        .pipeline('a_gap', 'serial_diff', buckets_path='column_a')
        .pipeline('b_gap', 'serial_diff', buckets_path='column_b')
        .pipeline('c_gap', 'serial_diff', buckets_path='column_c')
    )
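
The effect of the call order can be reproduced without a cluster. The sketch below is a toy stand-in written for illustration, not elasticsearch-dsl's actual code: the Agg class and its to_dict() are hypothetical and only mimic the chaining rule that .bucket() hands back the new child while .metric() and .pipeline() hand back the node they were called on. It uses one column and a shortened date_histogram to keep the example small.

```python
class Agg:
    """Toy aggregation node (hypothetical, not the library's real class)."""

    def __init__(self, agg_type=None, **params):
        self.agg_type = agg_type
        self.params = params
        self.children = {}

    def bucket(self, name, agg_type, **params):
        # Returns the NEW child, so the chain descends one level.
        child = self.children[name] = Agg(agg_type, **params)
        return child

    def metric(self, name, agg_type, **params):
        # Returns the node it was called on, so the chain stays at this level.
        self.children[name] = Agg(agg_type, **params)
        return self

    # In this model a pipeline chains exactly like a metric.
    pipeline = metric

    def to_dict(self):
        body = {self.agg_type: dict(self.params)} if self.agg_type else {}
        if self.children:
            body["aggs"] = {name: child.to_dict() for name, child in self.children.items()}
        return body


root = Agg()
(
    root.bucket("statistics", "terms", field="id")
    .pipeline("sum_a_gap", "sum_bucket", buckets_path="date>a_gap")
    .bucket("date", "date_histogram", field="date")
    .metric("column_a", "avg", field="column_a")
    .pipeline("a_gap", "serial_diff", buckets_path="column_a")
)

statistics = root.to_dict()["aggs"]["statistics"]
print(sorted(statistics["aggs"]))                  # ['date', 'sum_a_gap']
print(sorted(statistics["aggs"]["date"]["aggs"]))  # ['a_gap', 'column_a']
```

Moving the sum_bucket call after .bucket("date", ...) in this model would nest it under "date", which is the layout the question's original attempt produced.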

Regarding "python - How to use bucket, metric, or pipeline in Elasticsearch DSL (elasticsearch-dsl-py) for multi-level aggregations", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59134250/
