elasticsearch - Slow aggregation on a low-cardinality field in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 22:14:21

We are using Elasticsearch 7.2, and my mapping contains these two fields:

{
  "state": {
    "type": "long",
    "store": true,
    "null_value": 0
  }
}

{
  "csat": {
    "type": "integer",
    "store": true
  }
}

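The avg aggregation on the csat field is very slow, while the avg aggregation on the state field is very fast. Both fields have only 5 distinct values. The difference is that csat is a sparse field, whereas state is present in all of my documents (~220495625 documents).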

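The Profile API shows that the time is spent in the Elasticsearch AvgAggregator class, but I don't see anything unusual there apart from fetching values from doc values. Could fetching doc values take time for documents that have no csat field? It is hard to tell.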

Here is the hot threads output from one node:

96.5% (482.4ms out of 500ms) cpu usage by thread 'elasticsearch[es7advcl02-14][search][T#8]'
10/10 snapshots sharing following 39 elements
app//org.apache.lucene.codecs.lucene80.IndexedDISI.advanceExact(IndexedDISI.java:399)
app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.advanceExact(Lucene80DocValuesProducer.java:424)
app//org.elasticsearch.index.fielddata.FieldData$DoubleCastedValues.advanceExact(FieldData.java:446)
app//org.elasticsearch.index.fielddata.SingletonSortedNumericDoubleValues.advanceExact(SingletonSortedNumericDoubleValues.java:44)
app//org.elasticsearch.search.aggregations.metrics.AvgAggregator$1.collect(AvgAggregator.java:83)
app//org.elasticsearch.search.aggregations.LeafBucketCollector.collect(LeafBucketCollector.java:82)
app//org.apache.lucene.search.MatchAllDocsQuery$1$1.score(MatchAllDocsQuery.java:64)
app//org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652)
app//org.apache.lucene.search.XIndexSearcher.search(XIndexSearcher.java:44)
app//org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:177)
app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
app//org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:271)
app//org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114)
app//org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$18(IndicesService.java:1305)
app//org.elasticsearch.indices.IndicesService$$Lambda$4388/0x0000000802064840.accept(Unknown Source)
app//org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$19(IndicesService.java:1362)
app//org.elasticsearch.indices.IndicesService$$Lambda$4389/0x0000000802064c40.get(Unknown Source)
app//org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:174)
app//org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:157)
app//org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
app//org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:123)
app//org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1368)
app//org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1302)
app//org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:333)
app//org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:360)
app//org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:340)
app//org.elasticsearch.search.SearchService$$Lambda$4236/0x0000000802024040.apply(Unknown Source)
app//org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145)
app//org.elasticsearch.action.ActionListener$$Lambda$3643/0x0000000801dab040.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62)
app//org.elasticsearch.search.SearchService$2.doRun(SearchService.java:1052)
app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
app//org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758)
app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.base@12.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base@12.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base@12.0.1/java.lang.Thread.run(Thread.java:835)

Another strange thing I observed is that the csat avg aggregation gets faster when I add the following query:

{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "csat"
        }
      }
    }
  }
}
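One plausible reading of the hot thread output and of this observation: under a match_all query the collector calls advanceExact once for every document in the shard, including the many documents that have no csat value, while an exists filter only hands the collector documents that do have one. The toy model below illustrates that difference by counting lookups. It is a sketch under stated assumptions, not Lucene's actual on-disk layout: the SparseDocValues class, the avg helper, and the sample data are all hypothetical.

```python
from bisect import bisect_left


class SparseDocValues:
    """Toy stand-in for a sparse numeric doc-values column (not Lucene's real layout)."""

    def __init__(self, values_by_doc):
        self.doc_ids = sorted(values_by_doc)
        self.values = [values_by_doc[d] for d in self.doc_ids]
        self.lookups = 0  # number of advance_exact calls made so far

    def advance_exact(self, doc_id):
        """Return the value stored for doc_id, or None when the doc has no value."""
        self.lookups += 1
        i = bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return self.values[i]
        return None


def avg(column, matching_docs):
    """Average the column over the docs matched by the query, one lookup per doc."""
    total, count = 0, 0
    for doc_id in matching_docs:
        value = column.advance_exact(doc_id)
        if value is not None:
            total += value
            count += 1
    return total / count if count else None


max_doc = 100_000
# Hypothetical data: only every 100th document carries a csat value from 1 to 5.
csat = {d: (d // 100) % 5 + 1 for d in range(0, max_doc, 100)}

match_all = SparseDocValues(csat)
avg_all = avg(match_all, range(max_doc))        # match_all: every doc is visited

with_exists = SparseDocValues(csat)
avg_filtered = avg(with_exists, sorted(csat))   # exists filter: only docs with csat

print("lookups with match_all:", match_all.lookups)
print("lookups with exists filter:", with_exists.lookups)
print("same average:", avg_all == avg_filtered)
```

Both runs compute the same average, but the filtered run performs lookups only for the documents that actually hold a value. How much of the real slowdown this accounts for on a given cluster is something only measurement can tell.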

Update:

It is not just avg: even a terms aggregation on csat is slow.

Best answer

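Based on the conversation above, the recommendation is to apply a filter whenever you can (exists, date ranges, bucketing ranges, ...) to narrow down your aggs and thereby speed them up. You can observe the effect of a filter by checking the took value (in milliseconds) in the response, while bypassing the request cache and omitting the individual hits, i.e.: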

GET my_index/_search?request_cache=false
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "csat"
        }
      }
    }
  },
  "aggs": {
    "avg_agg": {
      "avg": {
        "field": "csat"
      }
    }
  }
}
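To compare the took value with and without the filter, the two request bodies can be generated from one helper. The sketch below is a minimal illustration, not part of the original answer: avg_request and compare_took are hypothetical names, and search is assumed to be a callable you supply yourself that sends a body to my_index/_search?request_cache=false and returns the parsed JSON response.

```python
import json


def avg_request(field, only_existing):
    """Build the avg aggregation body, optionally restricted by an exists filter."""
    body = {
        "size": 0,  # skip individual hits, only the aggregation is of interest
        "aggs": {"avg_agg": {"avg": {"field": field}}},
    }
    if only_existing:
        body["query"] = {"bool": {"filter": {"exists": {"field": field}}}}
    return body


def compare_took(search, field):
    """Run both variants through the caller-supplied search callable.

    `search` is an assumption of this sketch: it takes a request body (dict)
    and returns the response as a dict that contains the "took" key.
    """
    plain = search(avg_request(field, only_existing=False))
    filtered = search(avg_request(field, only_existing=True))
    return {"unfiltered_ms": plain["took"], "filtered_ms": filtered["took"]}


# Print the filtered body; it matches the request shown above.
print(json.dumps(avg_request("csat", only_existing=True), indent=2))
```

Passing the search function in as a parameter keeps the helper independent of any particular client library. Since took varies from run to run, it is worth repeating the comparison a few times before drawing conclusions.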

For more on elasticsearch - slow aggregation on a low-cardinality field in Elasticsearch, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/61301862/
