
elasticsearch - Getting the unique document count for sum_other_doc_count in a top-hits array aggregation

Reposted. Author: 行者123. Updated: 2023-12-03 02:28:41

I have a large number of documents (millions), each containing an array of keyword values.

Mapping:

{
"my_index": {
"mappings": {
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"keywords": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}

Sample documents:
{
"id": "abc",
"keywords": ["cat", "dog", "person"]
}
{
"id": "def",
"keywords": ["tree", "person"]
}
{
"id": "ghi",
"keywords": ["person", "human"]
}
...

Suppose I request the top 3 keyword buckets, with everything else reported under "other", like this:
GET /my_index/_search
{
"size": 0,
"track_total_hits": true,
"aggs": {
"keyword_buckets": {
"terms": {
"field": "keywords.keyword",
"size": 3
}
}
}
}

The index holds 2,232,121 documents, but this is the response I get:
{
"took": 256,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2232121,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"keyword_buckets": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 6250132,
"buckets": [
{
"key": "person",
"doc_count": 326552
},
{
"key": "human",
"doc_count": 326529
},
{
"key": "photograph",
"doc_count": 222190
}
]
}
}
}

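I get 6,250,132 documents in the "other" bucket (sum_other_doc_count). My expectation was that the top 3 buckets plus "other" would add up to 2,232,121. In SQL terms, I want a DISTINCT document count across all the buckets.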

What query do I need to run to get this?
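
The mismatch described above can be reproduced on the three sample documents. The following is a minimal Python sketch, not Elasticsearch code: `terms_buckets` is a hypothetical helper that assumes a terms aggregation on an array field counts each document once per distinct term it contains, so a document with several keywords lands in several buckets.

```python
from collections import Counter

# The three sample documents from the question.
docs = [
    {"id": "abc", "keywords": ["cat", "dog", "person"]},
    {"id": "def", "keywords": ["tree", "person"]},
    {"id": "ghi", "keywords": ["person", "human"]},
]


def terms_buckets(docs, field, size):
    """Hypothetical stand-in for a terms aggregation on an array field.

    Assumption: every document is counted once for each distinct term
    it holds in `field`.
    """
    counts = Counter()
    for doc in docs:
        for term in set(doc[field]):
            counts[term] += 1
    # Highest count first; ties ordered by term so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ordered[:size]
    sum_other = sum(count for _, count in ordered[size:])
    return top, sum_other


top, sum_other = terms_buckets(docs, "keywords", size=1)

# Distinct documents that hold at least one of the returned top terms.
top_terms = {term for term, _ in top}
docs_in_top = sum(1 for doc in docs if top_terms & set(doc["keywords"]))
docs_outside_top = len(docs) - docs_in_top

print(top, sum_other)                 # [('person', 3)] 4
print(docs_in_top, docs_outside_top)  # 3 0
```

With only 3 documents, the top bucket (3) plus the "other" sum (4) already gives 7, because "other" adds up per-term counts rather than distinct documents. Under the same assumption, the 6,250,132 in the response is a sum over keyword occurrences, which is why it can exceed the 2,232,121 total hits.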

Best answer

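Elasticsearch does not provide an exact doc_count. Document counts are always approximate. This is because, by design, an Elasticsearch query looks at the top terms on each shard and merges them. You can read more about it here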
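
One possible way to get the DISTINCT-style split the question asks for, which the answer above does not cover, is a second request that uses a `filters` aggregation with an "other" bucket. The sketch below only builds the request body in Python; `distinct_split_query` and the bucket names `top_vs_other`, `top`, and `other` are hypothetical, and the list of top terms is assumed to come from the first terms-aggregation response.

```python
import json


def distinct_split_query(field, top_terms):
    """Build a search body whose two buckets partition the matching documents.

    The "top" bucket uses a terms query, so a document that holds several
    of the top terms is still counted once. Setting other_bucket_key asks
    the filters aggregation for an extra bucket with every document that
    matched none of the named filters.
    """
    return {
        "size": 0,
        "track_total_hits": True,
        "aggs": {
            "top_vs_other": {  # hypothetical aggregation name
                "filters": {
                    "other_bucket_key": "other",
                    "filters": {
                        "top": {"terms": {field: list(top_terms)}},
                    },
                }
            }
        },
    }


# Keys taken from the three buckets in the response shown in the question.
body = distinct_split_query(
    "keywords.keyword", ["person", "human", "photograph"]
)
print(json.dumps(body, indent=2))
```

Sent to `GET /my_index/_search`, this body should return two buckets, `top` and `other`, whose doc_count values add up to the total hit count, since each document falls into exactly one of them. Per-term distinct counts inside `top` would still overlap, so this covers only the "top terms versus everything else" split.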

Regarding "elasticsearch - Getting the unique document count for sum_other_doc_count in a top-hits array aggregation", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60488536/
