
elasticsearch - How to filter a top_hits aggregation by a field

Reposted. Author: 行者123. Updated: 2023-12-03 00:47:49

I'm having trouble building a query against a very large index of 600M documents. I'm close to a solution, but I'm stuck.

The documents I have look like this:

{
"first_name" : "John",
"last_name" : "Doe",
"company_domain" : "google",
"provider_a_id" : "1234",
"provider_b_id" : "14"
}

I need to return 2 contacts per company, where provider_a_id matches a list of IDs I obtained earlier.

I came up with this aggregation, which returns 2 contacts per company:
{
"size": 0,
"aggs": {
"COMPANIES": {
"terms": {
"field": "company_domain.keyword",
"order": { "_key": "asc" },
"size": 2
},
"aggs": {
"EMPLOYEES": {
"top_hits": {
"size": 2
}
}
}
}
}
}

This is good because it solves part of the problem, but I now also need to narrow the search by provider_a_id. I would need something like the following:
        "EMPLOYEES": {
"top_hits": {
"size": 2
// provider_a_id is in [.......] // list with 10K Ids
}
}

Do you know how I can solve this?

Best answer

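You need to use a filter aggregation before top_hits.
The query below filters on two sample values with a terms query inside the filter aggregation; the same terms query accepts your full list of IDs.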

Mapping:

PUT testindex7/_mappings
{
"properties": {
"first_name" :{
"type": "text"
},
"last_name" : {
"type": "text"
},
"company_domain" :{
"type": "text",
"fields": {
"keyword":{
"type": "keyword"
}
}
},
"provider_a_id" : {
"type": "integer"
},
"provider_b_id" : {
"type": "integer"
}
}
}

Data:
 [
{
"_index" : "testindex7",
"_type" : "_doc",
"_id" : "OvU4OG0BCNyxVsPT3Xtn",
"_score" : 1.0,
"_source" : {
"first_name" : "a",
"last_name" : "b",
"company_domain" : "google",
"provider_a_id" : "100",
"provider_b_id" : "1"
}
},
{
"_index" : "testindex7",
"_type" : "_doc",
"_id" : "O_U5OG0BCNyxVsPTAHsD",
"_score" : 1.0,
"_source" : {
"first_name" : "c",
"last_name" : "d",
"company_domain" : "google",
"provider_a_id" : "101",
"provider_b_id" : "2"
}
},
{
"_index" : "testindex7",
"_type" : "_doc",
"_id" : "PPU5OG0BCNyxVsPTJ3tZ",
"_score" : 1.0,
"_source" : {
"first_name" : "e",
"last_name" : "f",
"company_domain" : "google",
"provider_a_id" : "102",
"provider_b_id" : "3"
}
}
]
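
The data above is shown as search hits rather than as an indexing request. To reproduce the example, the three documents could be loaded through the bulk API, which expects newline-delimited JSON: one action line followed by one source line per document. The following is a minimal Python sketch that only builds that payload; the helper name `to_bulk_ndjson` is hypothetical and not part of the original answer, and sending the payload to `testindex7/_bulk` is left out.

```python
import json

# The three sample documents from the answer (the _source parts only).
docs = [
    {"first_name": "a", "last_name": "b", "company_domain": "google",
     "provider_a_id": "100", "provider_b_id": "1"},
    {"first_name": "c", "last_name": "d", "company_domain": "google",
     "provider_a_id": "101", "provider_b_id": "2"},
    {"first_name": "e", "last_name": "f", "company_domain": "google",
     "provider_a_id": "102", "provider_b_id": "3"},
]


def to_bulk_ndjson(index, documents):
    """Return a bulk payload: an action line and a source line per document."""
    lines = []
    for doc in documents:
        # Action line: index the next line into the given index, auto-generated _id.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = to_bulk_ndjson("testindex7", docs)
print(payload)
```

Because the IDs are auto-generated here, the `_id` values in your results will differ from the ones shown in this answer.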

Query:
GET testindex7/_search
{
"size": 0,
"aggs": {
"COMPANIES": {
"terms": {
"field": "company_domain.keyword",
"order": {
"_key": "asc"
},
"size": 2
},
"aggs": {
"EMPLOYEES": {
"filter": {
"terms": {
"provider_a_id": [100,101]
}
},
"aggs": {
"top_emps": {
"top_hits": {
"size": 2
}
}
}
}
}
}
}
}
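
With a list of about 10K IDs, as in the question, the request body is easier to generate than to write by hand. The following is a minimal Python sketch, not part of the original answer, that builds the same body as a plain dict. The function name `build_filtered_top_hits_body` and its parameters are hypothetical; the aggregation names and fields are taken from the query above. By default a terms query accepts up to 65,536 values (the `index.max_terms_count` setting), so 10K IDs should fit in a single request.

```python
import json


def build_filtered_top_hits_body(provider_a_ids, companies=2, contacts_per_company=2):
    """Build the search body: one bucket per company, filtered, then top_hits."""
    return {
        "size": 0,  # only the aggregation results are needed, no top-level hits
        "aggs": {
            "COMPANIES": {
                "terms": {
                    "field": "company_domain.keyword",
                    "order": {"_key": "asc"},
                    "size": companies,  # number of company buckets returned
                },
                "aggs": {
                    "EMPLOYEES": {
                        # The filter aggregation narrows each company bucket
                        # to documents whose provider_a_id is in the list.
                        "filter": {"terms": {"provider_a_id": list(provider_a_ids)}},
                        "aggs": {
                            "top_emps": {"top_hits": {"size": contacts_per_company}}
                        },
                    }
                },
            }
        },
    }


body = build_filtered_top_hits_body([100, 101])
print(json.dumps(body, indent=2))
```

Note that with this shape a company with no matching contact still comes back as a bucket, with an EMPLOYEES doc_count of 0. If that matters, one option is to also place the same terms clause in a top-level bool filter query, so that such companies never form a bucket.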

Result:
"aggregations" : {
"COMPANIES" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "google",
"doc_count" : 3,
"EMPLOYEES" : {
"doc_count" : 2,
"top_emps" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "testindex7",
"_type" : "_doc",
"_id" : "OvU4OG0BCNyxVsPT3Xtn",
"_score" : 1.0,
"_source" : {
"first_name" : "a",
"last_name" : "b",
"company_domain" : "google",
"provider_a_id" : "100",
"provider_b_id" : "1"
}
},
{
"_index" : "testindex7",
"_type" : "_doc",
"_id" : "O_U5OG0BCNyxVsPTAHsD",
"_score" : 1.0,
"_source" : {
"first_name" : "c",
"last_name" : "d",
"company_domain" : "google",
"provider_a_id" : "101",
"provider_b_id" : "2"
}
}
]
}
}
}
}
]
}
}
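
The contacts sit several levels deep in the response: bucket, then EMPLOYEES, then top_emps, then hits. As a rough illustration of how a client might flatten that, here is a small Python sketch; the function name `contacts_by_company` is hypothetical, and the sample dict is a trimmed copy of the result above that keeps only the keys the function reads.

```python
def contacts_by_company(response):
    """Map each company key to the _source of its filtered top hits."""
    result = {}
    for bucket in response["aggregations"]["COMPANIES"]["buckets"]:
        hits = bucket["EMPLOYEES"]["top_emps"]["hits"]["hits"]
        if hits:  # skip companies where the filter matched no documents
            result[bucket["key"]] = [hit["_source"] for hit in hits]
    return result


# Trimmed copy of the result shown above.
sample = {
    "aggregations": {
        "COMPANIES": {
            "buckets": [
                {
                    "key": "google",
                    "doc_count": 3,
                    "EMPLOYEES": {
                        "doc_count": 2,
                        "top_emps": {
                            "hits": {
                                "hits": [
                                    {"_source": {"first_name": "a", "last_name": "b",
                                                 "provider_a_id": "100"}},
                                    {"_source": {"first_name": "c", "last_name": "d",
                                                 "provider_a_id": "101"}},
                                ]
                            }
                        },
                    },
                }
            ]
        }
    }
}

for company, contacts in contacts_by_company(sample).items():
    print(company, [c["first_name"] for c in contacts])
```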

Regarding elasticsearch - how to filter a top_hits aggregation by a field, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57940895/
