
java - ElasticSearch aggregation - get the exact time of the maximum histogram value in a time series


I'm fairly new to Elasticsearch, so apologies if this is a trivial question.

I have a time series that is updated irregularly, every n seconds, and I want to plot its history. The data consists of a long value named "score" plus a long value named "time" that serves as the timestamp of each score.
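As context, here is a minimal mapping sketch for these two fields. This is an assumption on my part, not shown in the question; the index variable solutionName and the SET_APPS type are borrowed from the Java code at the end. Keeping both fields consistently mapped as longs avoids per-shard failures like the "unexpected docvalues type NONE for field 'score'" error visible in the first response below:

// Hypothetical mapping for the SET_APPS type: "time" and "score" as longs,
// so numeric aggregations (max, terms-on-time) work on every shard.
client.admin().indices().preparePutMapping(solutionName)
        .setType("SET_APPS")
        .setSource("{\"SET_APPS\":{\"properties\":{"
                + "\"time\":{\"type\":\"long\"},"
                + "\"score\":{\"type\":\"long\"}}}}")
        .execute().actionGet();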

To reduce the number of points when plotting long time scales (e.g. a whole year), I want to aggregate the data into 256 buckets and use the maximum "score" value of each bucket; however, I need to keep the original timestamp of each score rather than the start of its bucket.
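As a quick sanity check on the numbers, the 10125000 ms interval in the query below is exactly the query's time range divided into 256 buckets; a short Java sketch, with the bounds copied from the query:

// Derive the histogram interval from the query range and the bucket budget.
long from = 1429010378445L;               // range lower bound (epoch ms)
long to = 1431602378445L;                 // range upper bound (epoch ms)
int numBuckets = 256;
long interval = (to - from) / numBuckets; // 2592000000 / 256 = 10125000 ms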

I managed to get the buckets by running the following query:

curl -XGET 'http://localhost:9200/localhost.localdomain/SET_APPS/_search' -d'
{
  "query" : {
    "range" : {
      "time" : {
        "from" : 1429010378445,
        "to" : 1431602378445,
        "include_lower" : true,
        "include_upper" : true
      }
    }
  },
  "aggregations" : {
    "time_hist" : {
      "histogram" : {
        "field" : "time",
        "interval" : 10125000,
        "order" : {
          "_count" : "asc"
        },
        "min_doc_count" : 0,
        "extended_bounds" : {
          "min" : 1429010378445,
          "max" : 1431602378445
        }
      },
      "aggregations" : {
        "max_score" : {
          "max" : {
            "field" : "score"
          }
        }
      }
    }
  }
}'

However, this only gives me the bucket's timestamp, whereas I need the original time of the score:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 4,
    "failed": 1,
    "failures": [{
      "index": "localhost.localdomain",
      "shard": 2,
      "status": 500,
      "reason": "QueryPhaseExecutionException[[localhost.localdomain][2]: query[filtered(time:[1429010378445 TO 1431602378445])->cache(_type:SET_APPS)],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: IllegalStateException[unexpected docvalues type NONE for field 'score' (expected one of [SORTED_NUMERIC, NUMERIC]). Use UninvertingReader or index with docvalues.]; "
    }]
  },
  "hits": {
    "total": 2018,
    "max_score": 1.0,
    "hits": [{
      "_index": "localhost.localdomain",
      "_type": "SET_APPS",
      "_id": "AU09dUBR80Hb_Fungv_r",
      "_score": 1.0,
      "_source": {
        "time": 1431255203918,
        "score": 6027
      }
    }, {
      "_index": "localhost.localdomain",
      "_type": "SET_APPS",
      "_id": "AU09c7MS80Hb_Fungv_X",
      "_score": 1.0,
      "_source": {
        "time": 1431255102221,
        "score": 5518
      }
    }
    ...
    ]
  },
  "aggregations": {
    "time_hist": {
      "buckets": [{
        "key": 1429002000000,
        "doc_count": 0,
        "max_score": {
          "value": null
        }
      },
      ...
      {
        "key": 1431249750000,
        "doc_count": 215,
        "max_score": {
          "value": 8564.0,
          "value_as_string": "8564.0"
        }
      }, {
        "key": 1431280125000,
        "doc_count": 228,
        "max_score": {
          "value": 18602.0,
          "value_as_string": "18602.0"
        }
      }, {
        "key": 1431259875000,
        "doc_count": 658,
        "max_score": {
          "value": 17996.0,
          "value_as_string": "17996.0"
        }
      }, {
        "key": 1431270000000,
        "doc_count": 917,
        "max_score": {
          "value": 17995.0,
          "value_as_string": "17995.0"
        }
      }]
    }
  }
}

In the results above, if we query for the score 18602 directly, we do get its true timestamp:

$ curl -XGET 'http://localhost:9200/localhost.localdomain/SET_APPS/_search' -d'
{
  "fields": [ "time", "score" ],
  "query" : {
    "term": {
      "score": "18602"
    }
  }
}'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"localhost.localdomain","_type":"SET_APPS","_id":"AU0-90Vsi-vs_2ajcYu-","_score":1.0,"fields":{"score":[18602],"time":[1431280502124]}}]}}

Any help is appreciated!

Best Answer

I think I found a solution: inside each histogram bucket, add a terms sub-aggregation on the "time" field, ordered descending by a max sub-aggregation on "score" and limited to one bucket, so that each histogram bucket reports the exact timestamp of its highest score:

$ curl -XGET 'http://localhost:9200/localhost.localdomain/SET_APPS/_search?pretty=true' -d'
{
  "size" : 0,
  "query" : {
    "constant_score" : {
      "filter" : {
        "range" : {
          "time" : {
            "gte" : 1457868375000,
            "lt" : 1460460375000
          }
        }
      }
    }
  },
  "aggregations" : {
    "time_hist" : {
      "histogram" : {
        "field" : "time",
        "interval" : 10125000,
        "order" : {
          "_count" : "asc"
        },
        "min_doc_count" : 0,
        "extended_bounds" : {
          "min" : 1429010378445,
          "max" : 1431602378445
        }
      },
      "aggregations" : {
        "max_time" : {
          "terms" : {
            "field" : "time",
            "order" : {
              "max_score" : "desc"
            },
            "size" : 1
          },
          "aggregations" : {
            "max_score" : {
              "max" : {
                "field" : "score"
              }
            }
          }
        }
      }
    }
  }
}' > foo

This seems to produce the desired effect:

  
...

"aggregations" : {
"time_hist" : {
"buckets" : [ {
"key" : 1429002000000,
"doc_count" : 0,
"max_time" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}, {
"key" : 1429012125000,
"doc_count" : 0,
"max_time" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},

...
{
"key" : 1431249750000,
"doc_count" : 270,
"max_time" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 269,
"buckets" : [ {
"key" : 1431255810484,
"doc_count" : 1,
"max_score" : {
"value" : 8564.0,
"value_as_string" : "8564.0"
}
} ]
}
}, {
"key" : 1431280125000,
"doc_count" : 285,
"max_time" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 284,
"buckets" : [ {
"key" : 1431280502124,
"doc_count" : 1,
"max_score" : {
"value" : 18602.0,
"value_as_string" : "18602.0"
}
} ]
}
}, {
"key" : 1431259875000,
"doc_count" : 821,
"max_time" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 820,
"buckets" : [ {
"key" : 1431269132642,
"doc_count" : 1,
"max_score" : {
"value" : 17996.0,
"value_as_string" : "17996.0"
}
} ]
}
}, {
"key" : 1431270000000,
"doc_count" : 1155,
"max_time" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 1154,
"buckets" : [ {
"key" : 1431278681884,
"doc_count" : 1,
"max_score" : {
"value" : 17995.0,
"value_as_string" : "17995.0"
}
} ]
}
} ]
}
}
}

Note that because the terms sub-aggregation is ordered by the max_score sub-aggregation, Elasticsearch cannot bound the counting error, which is why doc_count_error_upper_bound is reported as -1. Here is the Java code that generates this query:

public synchronized List<Pair<Long, Long>>
getScores(Calendar start, Calendar finish, int maxUniqueScoreEntries)
        throws IOException
{
    // Uses the ES 1.x Java client (QueryBuilders, AggregationBuilders, Histogram,
    // Terms, Max) and Joda-Time's DateTime; Pair is a simple (time, score) tuple.
    List<Pair<Long, Long>> retVal = new ArrayList<>(maxUniqueScoreEntries);
    try
    {
        long startTimeMs = start.getTimeInMillis();
        long finishTimeMs = finish.getTimeInMillis();

        // Anchor the series at the start of the requested window.
        retVal.add(new Pair<Long, Long>(startTimeMs, 0L));

        SearchRequestBuilder srb = client.prepareSearch()
                .setIndices(solutionName)
                .setTypes(ThreadMgrWebSocketsSvc.Subprotocols.SET_APPS.toString())
                .setQuery(QueryBuilders.rangeQuery("time").from(startTimeMs).to(finishTimeMs))
                .addAggregation(
                        AggregationBuilders.histogram("time_hist").minDocCount(0).field("time")
                                .order(Order.COUNT_ASC)
                                .extendedBounds(startTimeMs, finishTimeMs)
                                .interval((finishTimeMs - startTimeMs) / maxUniqueScoreEntries)
                                .subAggregation(
                                        // One terms bucket per histogram bucket, ordered by
                                        // max_score descending and limited to size 1, i.e. the
                                        // exact timestamp of the bucket's peak score.
                                        AggregationBuilders.terms("max_time")
                                                .field("time")
                                                .order(Terms.Order.aggregation("max_score", false))
                                                .size(1)
                                                .subAggregation(
                                                        AggregationBuilders.max("max_score").field("score"))));

        SearchResponse sr = srb.execute().actionGet();

        Histogram timeHist = sr.getAggregations().get("time_hist");
        List<? extends Histogram.Bucket> timeHistBuckets = timeHist.getBuckets();
        for (int i = 0, len = timeHistBuckets.size(); i < len; i++)
        {
            Long epochTime = null;
            Long maxScore = null;

            Histogram.Bucket maxScoreBucket = timeHistBuckets.get(i);
            Terms maxTimeTermAgg = maxScoreBucket.getAggregations().get("max_time");
            List<Terms.Bucket> buckets = maxTimeTermAgg.getBuckets();

            for (int j = 0, jlen = buckets.size(); j < jlen; j++)
            {
                Terms.Bucket bucket = buckets.get(j);

                // The terms bucket key is the raw timestamp of the peak score.
                epochTime = bucket.getKeyAsNumber().longValue();
                Aggregation agg = bucket.getAggregations().get("max_score");

                if (agg instanceof Max)
                {
                    double value = ((Max) agg).getValue();
                    if (value > 0)
                    {
                        maxScore = (long) value;
                    }
                }
            }

            if (epochTime != null && maxScore != null)
            {
                System.out.printf(" %d - Date = %s; rawTime = %d ; val = %d\n", i,
                        new DateTime(epochTime).toString(), epochTime, maxScore);

                retVal.add(new Pair<>(epochTime, maxScore));
            }
        }

        System.out.printf("query was %s, %s \n", new DateTime(startTimeMs).toString(),
                new DateTime(finishTimeMs).toString());

        // Anchor the series at the end of the window unless the last point's
        // timestamp (getFirst) already sits at the window end.
        Pair<Long, Long> last = retVal.get(retVal.size() - 1);
        if (last.getFirst().longValue() != finishTimeMs)
        {
            retVal.add(new Pair<Long, Long>(finishTimeMs, 0L));
        }
    }
    catch (Exception e)
    {
        // On failure, fall back to a flat zero series spanning the window.
        retVal.add(new Pair<Long, Long>(start.getTimeInMillis(), 0L));
        retVal.add(new Pair<Long, Long>(finish.getTimeInMillis(), 0L));
    }

    Collections.sort(retVal);
    return retVal;
}
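For completeness, a hypothetical usage sketch; the svc instance, the one-day window, and the print formatting are my own illustration, not from the original answer:

// Fetch up to 256 (timestamp, score) points covering the last 24 hours.
Calendar finish = Calendar.getInstance();
Calendar start = (Calendar) finish.clone();
start.add(Calendar.DAY_OF_MONTH, -1);

List<Pair<Long, Long>> points = svc.getScores(start, finish, 256);
for (Pair<Long, Long> p : points)
    System.out.printf("%tc -> %d%n", p.getFirst(), p.getSecond());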

Regarding "java - ElasticSearch aggregation - get the exact time of the maximum histogram value in a time series", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30240291/
