elasticsearch - Elasticsearch 7.8 nested aggregation does not return the correct data

Reposted. Author: 行者123. Updated: 2023-12-02 23:07:55

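I have spent a week trying to get the right data out of an Elasticsearch index using nested aggregations. Below are my index mapping and two sample documents. What I want to find is: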

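  • Match all documents in which the field xforms.sentence.tokens.value equals 24
  • Within the matched documents, count the matches grouped by xforms.sentence.tokens.tag, counting only the tokens whose xforms.sentence.tokens.value equals 24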

    So, as an example, for the sample documents inserted below, the output I expect is:
    {"JJ":1,"NN":1}
    The index mapping is:
    {
      "_doc": {
        "_meta": {},
        "_source": {},
        "properties": {
          "originalText": {
            "type": "text"
          },
          "testDataId": {
            "type": "text"
          },
          "xforms": {
            "type": "nested",
            "properties": {
              "sentence": {
                "type": "nested"
              },
              "predicate": {
                "type": "nested"
              }
            }
          },
          "corpusId": {
            "type": "text"
          },
          "row": {
            "type": "text"
          },
          "batchId": {
            "type": "text"
          },
          "processor": {
            "type": "text"
          }
        }
      }
    }
    The inserted sample documents are as follows:
    {
      "_id": "28",
      "_source": {
        "testDataId": "5e97e9bef033448b893e485baa0fdf15",
        "originalText": "Some text with the word 24",
        "xforms": [{
          "sentence": {
            "tokens": [{
                "lemma": "Some",
                "index": 1,
                "after": " ",
                "tag": "JJ",
                "value": "Some"
              },
              {
                "lemma": "text",
                "index": 2,
                "after": " ",
                "tag": "NN",
                "value": "text"
              },
              {
                "lemma": "with",
                "index": 3,
                "after": " ",
                "tag": "NN",
                "value": "with"
              },
              {
                "lemma": "the",
                "index": 4,
                "after": "",
                "tag": "CD",
                "value": "the"
              },
              {
                "lemma": "word",
                "index": 5,
                "after": " ",
                "tag": "CC",
                "value": "word"
              },
              {
                "lemma": "24",
                "index": 6,
                "after": " ",
                "tag": "JJ",
                "value": "24"
              }
            ],
            "type": "RAW"
          },
          "originalSentence": "Some text with the word 24 in it",
          "id": "e724611d8c024bcb8f0158b60e3df87e"
        }]
      }
    },
    {
      "_id": "56",
      "_source": {
        "testDataId": "5e97e9bef033448b893e485baa0fad15",
        "originalText": "24 word",
        "xforms": [{
          "sentence": {
            "tokens": [{
                "lemma": "24",
                "index": 1,
                "after": " ",
                "tag": "NN",
                "value": "24"
              },
              {
                "lemma": "word",
                "index": 2,
                "after": " ",
                "tag": "JJ",
                "value": "word"
              }
            ],
            "type": "RAW"
          },
          "originalSentence": "24 word",
          "id": "e724611d8c024bcb8f0158b60e3d123"
        }]
      }
    }
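To pin down what the question is asking for, the intended count can be computed directly from the two sample documents in plain Python, with no Elasticsearch involved. This is only a sketch of the desired semantics: the `docs` list is a trimmed copy of the samples above (only `value` and `tag` are kept), and the function name `count_tags_for_value` is made up for illustration.

```python
from collections import Counter

# Trimmed copies of the two sample documents: only the token fields
# that matter for the count (value and tag) are kept.
docs = [
    {"_id": "28", "xforms": [{"sentence": {"tokens": [
        {"value": "Some", "tag": "JJ"},
        {"value": "text", "tag": "NN"},
        {"value": "with", "tag": "NN"},
        {"value": "the", "tag": "CD"},
        {"value": "word", "tag": "CC"},
        {"value": "24", "tag": "JJ"},
    ]}}]},
    {"_id": "56", "xforms": [{"sentence": {"tokens": [
        {"value": "24", "tag": "NN"},
        {"value": "word", "tag": "JJ"},
    ]}}]},
]


def count_tags_for_value(documents, wanted):
    """Count the tags of tokens whose value equals `wanted`, token by token."""
    counts = Counter()
    for doc in documents:
        for xform in doc["xforms"]:
            for token in xform["sentence"]["tokens"]:
                if token["value"] == wanted:
                    counts[token["tag"]] += 1
    return dict(counts)


expected = count_tags_for_value(docs, "24")
print(expected)  # {'JJ': 1, 'NN': 1}
```

The key point is that the value test and the tag lookup happen on the same token, which is exactly the pairing the aggregation has to preserve.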

    Best answer

    Extending @Gibbs's answer: @N Kiram, you also need to map tokens as nested:

    {
      "xforms": {
        "type": "nested",
        "properties": {
          "sentence": {
            "type": "nested",
            "properties": {
              "tokens": {      <----
                "type": "nested"
              }
            }
          },
          "predicate": {
            "type": "nested"
          }
        }
      }
    }
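Why the extra `nested` matters: when an array of objects is not mapped as `nested`, Elasticsearch flattens it into one multi-valued field per property, so the link between a token's `value` and its `tag` is lost. The following is a plain-Python model of that effect on the sample data, not Elasticsearch code; the function names are invented for illustration, and the numbers it produces are what the model predicts, not figures reported in the question.

```python
from collections import Counter

# (value, tag) pairs per sentence, taken from the two sample documents.
sentences = [
    [("Some", "JJ"), ("text", "NN"), ("with", "NN"),
     ("the", "CD"), ("word", "CC"), ("24", "JJ")],
    [("24", "NN"), ("word", "JJ")],
]


def flatten(tokens):
    """Mimic a non-nested object array: each property becomes its own
    multi-valued field, so the value/tag pairing is lost."""
    return {
        "tokens.value": {value for value, _ in tokens},
        "tokens.tag": {tag for _, tag in tokens},
    }


def tag_counts_when_flattened(all_sentences, wanted):
    counts = Counter()
    for tokens in all_sentences:
        flat = flatten(tokens)
        if wanted in flat["tokens.value"]:
            # The whole sentence matches, so every distinct tag in it
            # is counted once, whether or not it belongs to `wanted`.
            counts.update(flat["tokens.tag"])
    return dict(counts)


def tag_counts_when_nested(all_sentences, wanted):
    counts = Counter()
    for tokens in all_sentences:
        for value, tag in tokens:  # each token is matched on its own
            if value == wanted:
                counts[tag] += 1
    return dict(counts)


flattened = tag_counts_when_flattened(sentences, "24")
nested = tag_counts_when_nested(sentences, "24")
```

In this model the flattened variant counts tags such as CD and CC that never belong to a token with value 24, while the nested variant yields only JJ and NN, once each.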
    Then, and only then, will your aggs produce the correct counts:
    {
      "aggregations": {
        "xforms": {
          "doc_count": 8,
          "inner": {
            "doc_count": 2,
            "tag_count": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "JJ",
                  "doc_count": 1
                },
                {
                  "key": "NN",
                  "doc_count": 1
                }
              ]
            }
          }
        }
      }
    }
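The post shows this response but not the request that produced it. A request body of roughly the following shape would yield aggregations named `xforms`, `inner` and `tag_count`; it is a reconstruction, not the original query. The outer `doc_count` of 8 equals the total number of tokens in the two samples (6 + 2), which suggests the outer `nested` aggregation targets the tokens path. The `.keyword` sub-fields are an assumption based on default dynamic mapping, and on some setups the nested path may need to be wrapped level by level instead of given in one step.

```python
import json

# Assumed field names: default dynamic mapping usually adds a `.keyword`
# sub-field to strings; adjust if the index maps these fields differently.
TOKENS_PATH = "xforms.sentence.tokens"
VALUE_FIELD = TOKENS_PATH + ".value.keyword"
TAG_FIELD = TOKENS_PATH + ".tag.keyword"


def build_request(wanted):
    """Build a search body: match documents with a token equal to `wanted`,
    then count tags over only the tokens that match."""
    match_value = {"term": {VALUE_FIELD: wanted}}
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "query": {"nested": {"path": TOKENS_PATH, "query": match_value}},
        "aggs": {
            "xforms": {
                "nested": {"path": TOKENS_PATH},
                "aggs": {
                    "inner": {
                        "filter": match_value,
                        "aggs": {
                            "tag_count": {"terms": {"field": TAG_FIELD}}
                        },
                    }
                },
            }
        },
    }


request_body = build_request("24")
print(json.dumps(request_body, indent=2))
```

The filter is repeated inside the aggregation because the top-level query only selects whole documents; without the inner filter, every token of a matching document would be counted.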
    Note: you must reindex for the changed mapping to take effect.
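An existing field cannot be switched from object to nested in place, so one common route is to create a new index with the corrected mapping and copy the documents across with the `_reindex` API. The sketch below only builds the body for `POST _reindex`; both index names are hypothetical placeholders, since the post never names the index.

```python
import json

# Hypothetical index names; replace with the real source and target.
OLD_INDEX = "sentences_v1"
NEW_INDEX = "sentences_v2"


def build_reindex_body(source_index, dest_index):
    """Body for `POST _reindex`: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


reindex_body = build_reindex_body(OLD_INDEX, NEW_INDEX)
print(json.dumps(reindex_body))
```

The target index has to exist with the corrected mapping before the copy starts; otherwise dynamic mapping would again treat tokens as a plain object array.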

    This question, "elasticsearch - Elasticsearch 7.8 nested aggregation does not return the correct data", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/63431950/
