elasticsearch - Counting all terms in a query

Reposted. Author: 行者123. Updated: 2023-12-03 01:36:54

Is it possible to get a count for every term in a query?

For example, I have the following expression to count:

(age == 20 || age == 30) && gender == 'male'

I want a single REST call to return the total count plus the sub-count for every condition.

Expected count results:
  • age == 20
  • age == 30
  • age == 20 || age == 30
  • gender == 'male'
  • (age == 20 || age == 30) && gender == 'male'

Sample search query built for this specific scenario:

    {
      "query": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "term": {
                      "age": { "value": 20, "boost": 1 } // count 1
                    }
                  },
                  {
                    "term": {
                      "age": { "value": 30, "boost": 1 } // count 2
                    }
                  }
                ],
                "adjust_pure_negative": true,
                "boost": 1
              } // count 3
            },
            {
              "term": {
                "gender.keyword": { "value": "male", "boost": 1 } // count 4
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        } // count 5
      }
    }

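Best answer

Updated to count arbitrary conditions

Based on your comment, if your goal is to count arbitrary conditions over the result set, you can use the Filters Aggregation. It lets you define each bucket of the aggregation result with its own query and returns a document count per bucket. This requires you to write a query for every combination you want to capture. If you need every possible combination, you are probably better off returning the individual bucket counts and doing the math yourself, as in the original solution below. For your case it would look like this: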

    {
      "aggs": {
        "conditions": {
          "filters": {
            "filters": {
              "age == 20": {"term": {"age": 20}},
              "age == 30": {"term": {"age": 30}},
              "age == 20 || age == 30": {
                "bool": {
                  "should": [
                    {"term": {"age": 20}},
                    {"term": {"age": 30}}
                  ]
                }
              },
              "gender == male": {"term": {"gender.keyword": "male"}},
              "(age == 20 || age == 30) && gender == 'male'": {
                "bool": {
                  "must": [
                    {"term": {"gender.keyword": "male"}}
                  ],
                  "should": [
                    {"term": {"age": 20}},
                    {"term": {"age": 30}}
                  ],
                  "minimum_should_match": 1
                }
              }
            }
          }
        }
      }
    }
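
If you end up writing many such conditions by hand, it may be easier to generate the request body. The following is only a minimal sketch: the helper names (`term`, `any_of`, `all_of`, `filters_agg_body`) are made up for illustration and are not part of any Elasticsearch client. One detail worth keeping in mind is that in a `bool` query that also has `must` or `filter` clauses, `should` clauses are optional by default, so the OR helper sets `minimum_should_match` explicitly. The `"size": 0` entry is an optional addition that suppresses the hits when only the counts are wanted.

```python
import json


def term(field, value):
    """Build a term query clause for an exact match on `field`."""
    return {"term": {field: value}}


def any_of(*clauses):
    """OR: at least one of the clauses has to match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def all_of(*clauses):
    """AND: every clause has to match (`filter` skips scoring, which counting does not need)."""
    return {"bool": {"filter": list(clauses)}}


def filters_agg_body(conditions, agg_name="conditions"):
    """Wrap named conditions in a filters aggregation; size 0 omits the hits."""
    return {
        "size": 0,
        "aggs": {agg_name: {"filters": {"filters": dict(conditions)}}},
    }


age_20 = term("age", 20)
age_30 = term("age", 30)
male = term("gender.keyword", "male")
age_20_or_30 = any_of(age_20, age_30)

body = filters_agg_body({
    "age == 20": age_20,
    "age == 30": age_30,
    "age == 20 || age == 30": age_20_or_30,
    "gender == 'male'": male,
    "(age == 20 || age == 30) && gender == 'male'": all_of(age_20_or_30, male),
})

# Print the JSON that would be sent as the body of the _search request
print(json.dumps(body, indent=2))
```

Because the compound condition is built from the same clause objects as the simple ones, a change to a single condition only has to be made in one place.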

This gives you the result:

    {
      "aggregations": {
        "conditions": {
          "buckets": {
            "(age == 20 || age == 30) && gender == 'male'": {
              "doc_count": 12
            },
            "age == 20": {
              "doc_count": 8
            },
            "age == 20 || age == 30": {
              "doc_count": 19
            },
            "age == 30": {
              "doc_count": 11
            },
            "gender == male": {
              "doc_count": 12
            }
          }
        }
      }
    }
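
Reading the counts back out is a one-liner, since named filters come back as an object keyed by the filter name. A small sketch, assuming the response has already been parsed into a Python dict (the `condition_counts` helper and the `response` variable are hypothetical names, and the sample data is the result shown above):

```python
def condition_counts(response, agg_name="conditions"):
    """Map each named filter bucket to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


# Sample response copied from the result above
response = {
    "aggregations": {
        "conditions": {
            "buckets": {
                "(age == 20 || age == 30) && gender == 'male'": {"doc_count": 12},
                "age == 20": {"doc_count": 8},
                "age == 20 || age == 30": {"doc_count": 19},
                "age == 30": {"doc_count": 11},
                "gender == male": {"doc_count": 12},
            }
        }
    }
}

counts = condition_counts(response)
for name, n in counts.items():
    print(f"{name}: {n}")
```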

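Edit: the original answer did not handle (A || B) correctly

The feature you are looking for is called aggregations, specifically the Terms Aggregation. A terms aggregation counts, for every distinct value of a field, the number of documents in the result set matched by your query. Aggregations can also be nested. So in the example below, Elasticsearch finds all documents that match your query, counts the documents for each age value (20, 30, and so on), and then, within each age value, counts the documents for each gender. You can then do the math to compute the different combinations you need.

Your query would look like this: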
    {
      "query": {
        ...
      },
      "aggs": {
        "age": {
          "terms": {"field": "age"},
          "aggs": {
            "gender": {
              "terms": {"field": "gender.keyword"}
            }
          }
        },
        "gender_total": {"terms": {"field": "gender.keyword"}}
      }
    }

The result would look like this:

    {
      "hits": { ... },
      "aggregations": {
        "gender_total": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "male",
              "doc_count": 12
            },
            {
              "key": "female",
              "doc_count": 7
            }
          ]
        },
        "age": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": 30,
              "doc_count": 11,
              "gender": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "male",
                    "doc_count": 9
                  },
                  {
                    "key": "female",
                    "doc_count": 2
                  }
                ]
              }
            },
            {
              "key": 20,
              "doc_count": 8,
              "gender": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "female",
                    "doc_count": 5
                  },
                  {
                    "key": "male",
                    "doc_count": 3
                  }
                ]
              }
            }
          ]
        }
      }
    }

So, for example, to compute the count for (age == 20 || age == 30) && gender == 'male', you could do something like the following Python pseudo-code:

    # Pull out the bucket objects for each aggregation
    age_buckets = result['aggregations']['age']['buckets']
    gender_buckets = result['aggregations']['gender_total']['buckets']

    # Get the bucket values we care about
    age_20 = [b for b in age_buckets if b['key'] == 20][0]
    age_30 = [b for b in age_buckets if b['key'] == 30][0]
    male = [b for b in gender_buckets if b['key'] == 'male'][0]

    # Get the sub-buckets
    age_20_male = [b for b in age_20['gender']['buckets'] if b['key'] == 'male'][0]
    age_30_male = [b for b in age_30['gender']['buckets'] if b['key'] == 'male'][0]

    # age == 20
    count_1 = age_20['doc_count']

    # age == 30
    count_2 = age_30['doc_count']

    # age == 20 || age == 30
    # (a plain sum assumes each document has a single age value,
    # so the two buckets cannot overlap)
    count_3 = count_1 + count_2

    # gender == 'male'
    count_4 = male['doc_count']

    # (age == 20 || age == 30) && gender == 'male'
    count_5 = age_20_male['doc_count'] + age_30_male['doc_count']
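
For reference, here is a self-contained version of the pseudo-code above that runs against the sample result (trimmed to the `key` and `doc_count` fields). It is a sketch under a few assumptions: the helper names `by_key`, `doc_count` and `sub_count` are hypothetical, a missing bucket is treated as a count of 0, and each document is assumed to have a single age value. Keep in mind that a terms aggregation only returns the top buckets by default (controlled by its `size` parameter), so on real data an absent key can also mean the bucket was cut off. The last entry shows the "math" for an OR across two different fields, where the overlap has to be subtracted once.

```python
def by_key(buckets):
    """Index a terms-aggregation bucket list by bucket key."""
    return {b["key"]: b for b in buckets}


def doc_count(index, key):
    """Document count for `key`, or 0 if no such bucket was returned."""
    return index.get(key, {}).get("doc_count", 0)


# Sample result from above, trimmed to the fields used here
result = {
    "aggregations": {
        "gender_total": {"buckets": [
            {"key": "male", "doc_count": 12},
            {"key": "female", "doc_count": 7},
        ]},
        "age": {"buckets": [
            {"key": 30, "doc_count": 11, "gender": {"buckets": [
                {"key": "male", "doc_count": 9},
                {"key": "female", "doc_count": 2},
            ]}},
            {"key": 20, "doc_count": 8, "gender": {"buckets": [
                {"key": "female", "doc_count": 5},
                {"key": "male", "doc_count": 3},
            ]}},
        ]},
    }
}

aggs = result["aggregations"]
ages = by_key(aggs["age"]["buckets"])
genders = by_key(aggs["gender_total"]["buckets"])


def sub_count(age, gender):
    """Count of documents with the given age AND gender (nested bucket)."""
    bucket = ages.get(age)
    if bucket is None:
        return 0
    return doc_count(by_key(bucket["gender"]["buckets"]), gender)


counts = {
    "age == 20": doc_count(ages, 20),                # 8
    "age == 30": doc_count(ages, 30),                # 11
    "gender == 'male'": doc_count(genders, "male"),  # 12
}

# OR over two values of the same single-valued field: buckets are disjoint, so add
counts["age == 20 || age == 30"] = counts["age == 20"] + counts["age == 30"]  # 19

# AND with gender: add the nested male buckets of both ages
counts["(age == 20 || age == 30) && gender == 'male'"] = (
    sub_count(20, "male") + sub_count(30, "male")
)  # 3 + 9 = 12

# OR across different fields: inclusion-exclusion, subtract the overlap once
counts["age == 20 || gender == 'male'"] = (
    counts["age == 20"] + counts["gender == 'male'"] - sub_count(20, "male")
)  # 8 + 12 - 3 = 17

for name, n in counts.items():
    print(f"{name}: {n}")
```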

Regarding elasticsearch - counting all terms in a query, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51788979/
