Elasticsearch: filtering array data with aggs

Reposted. Author: 行者123. Last updated: 2023-12-02 22:40:18
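I use Elasticsearch to store my biological data.

I am trying to query with filtered aggs, but the data returned is not what I want.
The problem comes from the fact that every specimen has a "d_" attribute, which is an array. I need to aggregate over only some elements of that array, but I cannot filter them.

// I edited the data by hand to make it easier to follow, so it may contain some typos

Sample of my data: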

[
  {
    "_index": "botanique",
    "_type": "specimens",
    "_id": "227CB8A3E2834AAEB50B1ECF6B672180",
    "_score": 1,
    "_source": {
      ....
      "d_": [
        { // -------------- don't want this
          "taxonid": "BB7C33A3126648D095BEDDABB0BD2758",
          "scientificname": "Lastreopsis effusa",
          "scientificnameauthorship": "(Sw.) Tindale"
        },
        { // -------------- want this
          "taxonid": "704FC303D7F74C02912D0FEB5C6FC55D",
          "scientificname": "Parapolystichum effusum",
          "scientificnameauthorship": "(sw.) copel."
        }
      ]
    }
  },
  {
    "_index": "botanique",
    "_type": "specimens",
    "_id": "11A22DE8E4AD45BBAC7783E508079DCD",
    "_score": 1,
    "_source": {
      ....
      "d_": [
        { // -------------- want this
          "taxonid": "A94D243348DF4CAD926B6C3965D948A3",
          "scientificname": "Parapolystichum effusum",
          "scientificnameauthorship": "(Sw.) Ching"
        },
        { // -------------- don't want this
          "taxonid": "B01A89AA961A46F2984722C311DC2BDD",
          "scientificname": "Lastreopsis effusa",
          "scientificnameauthorship": "(willd. ex schkuhr) proctor"
        }
      ]
    }
  },
  {
    "_index": "botanique",
    "_type": "specimens",
    "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
    "_score": 1,
    "_source": {
      ...
      "d_": [
        { // -------------- want this
          "taxonid": "D70C4478D2B0437AA940994E98D696C5",
          "scientificname": "Parapolystichum effusum",
          "scientificnameauthorship": "(Sw.) Ching"
        },
        { // -------------- don't want this
          "taxonid": "011E5DA526FC4098953DBD1F9E5F4424",
          "scientificname": "Lastreopsis effusa",
          "scientificnameauthorship": "(Sw.) Tindale"
        }
      ]
    }
  }
]

For example, I want an aggs over all "d_.scientificnameauthorship" and "d_.taxonid" values where "d_.scientificname" equals "parapolystichum effusum".
So I should (hopefully) get "scientificnameauthorship": "(sw.) copel." and "(Sw.) Ching", but not "(willd. ex schkuhr) proctor". I failed...

My query:
{
  "_source": ["d_"],
  "size": 3,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "d_.scientificname": "parapolystichum effusum" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "scientificname": {
      "terms": {
        "field": "d_.scientificname",
        "size": 1,
        "include": {
          "pattern": "parapolystichum effusum",
          "flags": "CANON_EQ|CASE_INSENSITIVE"
        }
      },
      "aggs": {
        "scientificnameauthorship": {
          "terms": {
            "field": "d_.scientificnameauthorship",
            "size": 10
          }
        }
      }
    }
  }
}

The returned data includes all of the specimens' "scientificnameauthorship" values:
{
  "aggregations": {
    "scientificname": {
      "buckets": [
        {
          "key": "parapolystichum effusum",
          "doc_count": 269,
          "scientificnameauthorship": {
            "buckets": [
              { // ------ want this
                "key": "(sw.) ching",
                "doc_count": 269
              },
              { // ------ want this
                "key": "(sw.) copel.",
                "doc_count": 34
              },
              { // ------ don't want this
                "key": "(sw.) tindale",
                "doc_count": 262
              },
              { // ------ don't want this
                "key": "(willd. ex schkuhr) proctor",
                "doc_count": 7
              },
              { // ------ don't want this
                "key": "fée",
                "doc_count": 2
              }
            ]
          }
        }
      ]
    }
  }
}
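
A likely explanation for the unwanted buckets, assuming "d_" uses the default object mapping (the question does not show the mapping): Elasticsearch flattens an array of objects into one multi-valued field per sub-field, so a specimen that matches on "d_.scientificname" contributes every one of its authorship values. The sketch below mimics that behaviour in plain Python on two of the sample specimens. The helper names are hypothetical, and the lowercasing is an assumption that imitates the lowercase keys seen in the response.

```python
from collections import Counter

# Two specimens copied from the sample data (only the fields that matter here).
specimens = [
    {"d_": [
        {"scientificname": "Lastreopsis effusa",
         "scientificnameauthorship": "(Sw.) Tindale"},
        {"scientificname": "Parapolystichum effusum",
         "scientificnameauthorship": "(sw.) copel."},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum",
         "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa",
         "scientificnameauthorship": "(willd. ex schkuhr) proctor"},
    ]},
]


def flatten(doc):
    """Mimic the default object mapping: one multi-valued field per sub-field."""
    flat = {}
    for item in doc["d_"]:
        for key, value in item.items():
            # Assumption: values are indexed in lowercase, as in the response.
            flat.setdefault("d_." + key, set()).add(value.lower())
    return flat


def authorship_counts(docs, name):
    """Count authorship values over every document whose name field holds `name`."""
    counts = Counter()
    for flat in map(flatten, docs):
        if name in flat["d_.scientificname"]:
            # The whole document matched, so all of its authorships are counted.
            counts.update(flat["d_.scientificnameauthorship"])
    return counts


flat_counts = authorship_counts(specimens, "parapolystichum effusum")
for key, doc_count in sorted(flat_counts.items()):
    print(key, doc_count)
```

Both specimens match, so all four authorship values appear, including the ones that belong to "Lastreopsis effusa".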
  • How do I apply the filter inside the aggs query?
  • How do I get only the matching items of the array in the hits?

I want to get this:
{
  "hits": {
    "total": 269,
    "max_score": 1,
    "hits": [
      {
        "_index": "botanique",
        "_type": "specimens",
        "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
        "_score": 1,
        "_source": {
          ...
          "d_": [
            { // -------------- want this
              "taxonid": "D70C4478D2B0437AA940994E98D696C5",
              "scientificname": "Parapolystichum effusum",
              "scientificnameauthorship": "(Sw.) Ching"
            }
          ]
        }
      }
    ]
  }
}

Instead of this:
{
  "hits": {
    "total": 269,
    "max_score": 1,
    "hits": [
      {
        "_index": "botanique",
        "_type": "specimens",
        "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
        "_score": 1,
        "_source": {
          ...
          "d_": [
            { // -------------- want this
              "taxonid": "D70C4478D2B0437AA940994E98D696C5",
              "scientificname": "Parapolystichum effusum",
              "scientificnameauthorship": "(Sw.) Ching"
            },
            { // -------------- don't want this
              "taxonid": "011E5DA526FC4098953DBD1F9E5F4424",
              "scientificname": "Lastreopsis effusa",
              "scientificnameauthorship": "(Sw.) Tindale"
            }
          ]
        }
      }
    ]
  }
}

Thank you very much.

// Edit 1

I also tried putting a filter inside the aggs like this, but it does not work:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "d_.scientificname": "parapolystichum effusum" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "scientificname": {
      "filter": {
        "term": { "d_.scientificname": "parapolystichum effusum" }
      },
      "aggs": {
        "scientificnameauthorship": {
          "terms": {
            "field": "d_.scientificnameauthorship",
            "size": 10
          }
        }
      }
    }
  }
}

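Best answer

You can use a nested aggregation as the parent aggregation. Inside it, create a filter aggregation to filter the array (list) data, then attach a terms aggregation as its sub-aggregation. This only works if "d_" is mapped with type "nested"; with the default object mapping the array is flattened, and each element's "scientificname" is no longer tied to its own "scientificnameauthorship".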
    https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-bucket-nested-aggregation.html
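
To make the nested, filter, terms chain concrete, here is a minimal sketch in plain Python that imitates what the three stages do to the sample data. It is an illustration of the idea, not Elasticsearch code: the function name is hypothetical, and the lowercasing is an assumption that mirrors the lowercase keys in the question's response.

```python
from collections import Counter

# The three sample specimens, reduced to the two fields used by the aggregation.
specimens = [
    {"d_": [
        {"scientificname": "Lastreopsis effusa",
         "scientificnameauthorship": "(Sw.) Tindale"},
        {"scientificname": "Parapolystichum effusum",
         "scientificnameauthorship": "(sw.) copel."},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum",
         "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa",
         "scientificnameauthorship": "(willd. ex schkuhr) proctor"},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum",
         "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa",
         "scientificnameauthorship": "(Sw.) Tindale"},
    ]},
]


def nested_authorship_counts(docs, names):
    """Imitate nested -> filter -> terms over the elements of "d_"."""
    wanted = {name.lower() for name in names}
    counts = Counter()
    for doc in docs:
        # nested: every array element is examined on its own.
        for item in doc["d_"]:
            # filter: keep only elements with a wanted scientific name.
            if item["scientificname"].lower() in wanted:
                # terms: bucket the surviving elements by authorship.
                counts[item["scientificnameauthorship"].lower()] += 1
    return counts


nested_counts = nested_authorship_counts(specimens, ["Parapolystichum effusum"])
for key, doc_count in sorted(nested_counts.items()):
    print(key, doc_count)
```

Because each element is filtered individually, the authorships attached to "Lastreopsis effusa" never reach the terms stage.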
Example query:

{
  "aggs": {
    "filteredaggs": {
      "nested": {
        "path": "d_"
      },
      "aggs": {
        "maxdays": {
          "filter": {
            "terms": {
              "d_.scientificname": ["xyz", "pqr"]
            }
          },
          "aggs": {
            "myfinalaggregator": {
              "terms": {
                "field": "d_.scientificnameauthorship"
              }
            }
          }
        }
      }
    }
  }
}
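
The query above addresses the aggregation only. For the second question, returning only the matching array items in the hits, one option that is not part of the original answer is to trim "d_" on the client after the search returns. The sketch below assumes the response has already been parsed into a Python dict shaped like the question's sample; keep_matching_items is a hypothetical helper, not an Elasticsearch API.

```python
import copy


def keep_matching_items(response, name):
    """Return a copy of `response` whose "d_" arrays hold only items named `name`."""
    trimmed = copy.deepcopy(response)  # leave the caller's response untouched
    for hit in trimmed["hits"]["hits"]:
        items = hit["_source"].get("d_", [])
        hit["_source"]["d_"] = [
            item for item in items
            # Case-insensitive comparison on the scientific name.
            if item.get("scientificname", "").lower() == name.lower()
        ]
    return trimmed


# One hit copied from the question's sample response.
response = {
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [{
            "_index": "botanique",
            "_type": "specimens",
            "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
            "_score": 1,
            "_source": {"d_": [
                {"taxonid": "D70C4478D2B0437AA940994E98D696C5",
                 "scientificname": "Parapolystichum effusum",
                 "scientificnameauthorship": "(Sw.) Ching"},
                {"taxonid": "011E5DA526FC4098953DBD1F9E5F4424",
                 "scientificname": "Lastreopsis effusa",
                 "scientificnameauthorship": "(Sw.) Tindale"},
            ]},
        }],
    }
}

trimmed = keep_matching_items(response, "parapolystichum effusum")
print(trimmed["hits"]["hits"][0]["_source"]["d_"])
```

This keeps the search request unchanged, at the cost of transferring the full "d_" array for every hit.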

Hope this works for you.

Regarding filtering array data with aggs in Elasticsearch, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33626289/
