I would like to create a visualization that sums the duration field after retrieving the max id per group in Elasticsearch. For example:
Data is:

| id | workflow | sid | duration |
|----|----------|-----|----------|
| 1  | A        | x1  | 1m       |
| 1  | A        | x2  | 2m       |
| 2  | A        | x1  | 2m       |
| 2  | A        | x2  | 3m       |
| 1  | B        | y1  | 1m       |
| 1  | B        | y2  | 2m       |
| 2  | B        | y1  | 2m       |
| 2  | B        | y2  | 3m       |
| 3  | B        | y1  | 4m       |
| 3  | B        | y2  | 2m       |
Given the table above, the expected result is as follows: for each workflow, take the max id and sum the durations of the rows that have that id.

| id | workflow | total |
|----|----------|-------|
| 2  | A        | 5m    |
| 3  | B        | 6m    |
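To make the expected result concrete, here is a minimal sketch of the same computation in plain Python rather than Elasticsearch. It assumes durations are stored as integer minutes (the question shows them as strings such as `1m`), and the names `ROWS` and `total_for_max_id` are made up for illustration.

```python
from collections import defaultdict

# Rows from the question as (id, workflow, sid, duration in minutes).
# Storing the duration as an integer is an assumption of this sketch.
ROWS = [
    (1, "A", "x1", 1), (1, "A", "x2", 2),
    (2, "A", "x1", 2), (2, "A", "x2", 3),
    (1, "B", "y1", 1), (1, "B", "y2", 2),
    (2, "B", "y1", 2), (2, "B", "y2", 3),
    (3, "B", "y1", 4), (3, "B", "y2", 2),
]


def total_for_max_id(rows):
    """Return {workflow: (max_id, total_minutes_for_that_id)}."""
    # Step 1: sum the durations per (workflow, id) pair.
    sums = defaultdict(int)
    for id_, workflow, _sid, minutes in rows:
        sums[(workflow, id_)] += minutes

    # Step 2: for each workflow, keep only the entry with the largest id.
    result = {}
    for (workflow, id_), total in sums.items():
        if workflow not in result or id_ > result[workflow][0]:
            result[workflow] = (id_, total)
    return result


print(total_for_max_id(ROWS))  # {'A': (2, 5), 'B': (3, 6)}
```

The two steps correspond to what an aggregation has to do: bucket by workflow and id, then keep only the highest id bucket per workflow.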
I'm new to Elasticsearch queries and Kibana. I would appreciate a pointer on how to solve this.
{
  "size": 0,
  "aggs": {
    "my-bucket": {
      "terms": {
        "field": "workflow"
      },
      "aggs": {
        "max_id": {
          "max": {
            "field": "id"
          }
        }
      }
    }
  }
}
The search query above returns the expected workflow buckets with the max id for each. How can I use that max id to retrieve the matching sids and sum up their durations?
Instead of finding max_id, it might be easier to sort all buckets by id and only show the top one:
DELETE test

PUT test
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "duration_min": {
        "type": "integer"
      },
      "sid": {
        "type": "keyword"
      },
      "workflow": {
        "type": "keyword"
      }
    }
  }
}

POST test/_bulk?refresh
{"index":{}}
{"id": 1, "workflow": "A", "sid": "x1", "duration_min": 1}
{"index":{}}
{"id": 1, "workflow": "A", "sid": "x2", "duration_min": 2}
{"index":{}}
{"id": 2, "workflow": "A", "sid": "x1", "duration_min": 2}
{"index":{}}
{"id": 2, "workflow": "A", "sid": "x2", "duration_min": 3}
{"index":{}}
{"id": 1, "workflow": "B", "sid": "y1", "duration_min": 1}
{"index":{}}
{"id": 1, "workflow": "B", "sid": "y2", "duration_min": 2}
{"index":{}}
{"id": 2, "workflow": "B", "sid": "y1", "duration_min": 2}
{"index":{}}
{"id": 2, "workflow": "B", "sid": "y2", "duration_min": 3}
{"index":{}}
{"id": 3, "workflow": "B", "sid": "y1", "duration_min": 4}
{"index":{}}
{"id": 3, "workflow": "B", "sid": "y2", "duration_min": 2}

GET test/_search
{
  "size": 0,
  "aggs": {
    "by_workflow": {
      "terms": {
        "field": "workflow"
      },
      "aggs": {
        "by_id": {
          "terms": {
            "field": "id"
          },
          "aggs": {
            "sids": {
              "terms": {
                "field": "sid"
              }
            },
            "duration_sum": {
              "sum": {
                "field": "duration_min"
              }
            },
            "sales_bucket_sort": {
              "bucket_sort": {
                "sort": [
                  { "_key": { "order": "desc" } }
                ],
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
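For anyone consuming the result programmatically, here is a rough Python sketch of how the nested buckets from the query above could be flattened into (workflow, id, total) rows. `SAMPLE_RESPONSE` is a hand-written fragment in the shape a terms/sum aggregation response takes, trimmed to the fields used here (the `sids` sub-buckets are left out); it is not captured output, and the function name `summarize` is made up for this example.

```python
# Hand-written fragment in the shape of the aggregation response for the
# query above; illustrative only, not real captured output.
SAMPLE_RESPONSE = {
    "aggregations": {
        "by_workflow": {
            "buckets": [
                {"key": "B", "doc_count": 6, "by_id": {"buckets": [
                    {"key": 3, "doc_count": 2, "duration_sum": {"value": 6.0}}
                ]}},
                {"key": "A", "doc_count": 4, "by_id": {"buckets": [
                    {"key": 2, "doc_count": 2, "duration_sum": {"value": 5.0}}
                ]}},
            ]
        }
    }
}


def summarize(response):
    """Flatten the response into a list of (workflow, max_id, total_minutes)."""
    rows = []
    for workflow_bucket in response["aggregations"]["by_workflow"]["buckets"]:
        id_buckets = workflow_bucket["by_id"]["buckets"]
        if not id_buckets:
            continue  # no ids under this workflow
        # bucket_sort with size 1 leaves only the highest id bucket.
        top = id_buckets[0]
        rows.append((workflow_bucket["key"], top["key"], top["duration_sum"]["value"]))
    return rows


for row in summarize(SAMPLE_RESPONSE):
    print(row)
```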
This is another approach that I learned from the Elastic Stack community.
GET test/_search
{
  "size": 0,
  "aggs": {
    "workflow": {
      "terms": {
        "field": "workflow"
      },
      "aggs": {
        "ids": {
          "terms": {
            "field": "id",
            "order": { "max_id": "desc" },
            "size": 1
          },
          "aggs": {
            "max_id": {
              "max": {
                "field": "id"
              }
            },
            "sum_duration": {
              "sum": {
                "field": "duration_min"
              }
            }
          }
        }
      }
    }
  }
}
Thanks for sharing; it helps my understanding.
Heh. Not sure what I was thinking here. Yeah, your solution is much better. :)
I think you should remove "max_id": { "max": { "field": "id" } }, since this value is already available as the key of the parent agg (the terms can be ordered by "_key" instead), and then accept this as the solution.
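As a sketch of what that last comment suggests, the request body could be built as below, with the `max_id` sub-aggregation dropped and the `ids` terms ordered by `_key` instead. The helper name `build_query` and its parameters are hypothetical, and `duration_min` is assumed to be the numeric duration field from the `test` index mapping above.

```python
import json


def build_query(group_field="workflow", id_field="id", sum_field="duration_min"):
    """Build the search body: per group, keep the highest id and sum a field."""
    return {
        "size": 0,  # aggregations only, no hits
        "aggs": {
            "workflow": {
                "terms": {"field": group_field},
                "aggs": {
                    "ids": {
                        # Order the id buckets by their own key, descending,
                        # and keep only the first one (the max id).
                        "terms": {
                            "field": id_field,
                            "order": {"_key": "desc"},
                            "size": 1,
                        },
                        "aggs": {
                            "sum_duration": {"sum": {"field": sum_field}}
                        },
                    }
                },
            }
        },
    }


# Print the body so it can be pasted after `GET test/_search` in the console.
print(json.dumps(build_query(), indent=2))
```

The max id for each workflow is then the `key` of the single `ids` bucket, so no separate `max` aggregation is needed.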