
json - Elasticsearch: aggregation, count by field

Reposted. Author: 行者123. Last updated: 2023-12-02 22:30:52

I inserted this data into Elasticsearch:

[
{ "name": "Cassandra Irwin", "location": "Monzon de Campos" .. },
{ "name": "Gayle Mooney", "location": "Villarroya del Campo" .. },
{ "name": "Angelita Charles", "location": "Revenga de Campos" .. },
{ "name": "Sheppard Sweet", "location": "Santiago del Campo" .. },
..
..

Side note, to reproduce:
1) Download: http://wmo.co/20160928_es_query/bulk.json
2) Run: curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json

Question: get a count of how many records there are for each "location".

Solution 1: bucket aggregation .. does not give the desired result
curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
"aggs": { "location_count": { "terms": { "field":"location", "size":100 }}}
}' | jq '.aggregations'

Result:
{"location_count":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,
"buckets":[
{"key":"campo", "doc_count":47},
{"key":"del", "doc_count":47},
{"key":"campos", "doc_count":29},
{"key":"de", "doc_count":29},
{"key":"villarroya","doc_count":15},
{"key":"torre", "doc_count":12},
{"key":"monzon", "doc_count":11},
{"key":"santiago", "doc_count":11},
{"key":"pina", "doc_count":9},
{"key":"revenga", "doc_count":9},
{"key":"uleila", "doc_count":9}
]}}

Problem: it splits the "location" field into words and returns a document count per word.
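
Why the per-word buckets appear can be reproduced offline. The following is a minimal sketch, assuming the analyzed field behaves roughly like "lowercase, then split on whitespace"; the `analyze` and `term_doc_counts` helpers are hypothetical names for illustration, not Elasticsearch APIs. The per-location document counts are the ones from the expected result in this question.

```python
from collections import Counter

# Per-location document counts, taken from the expected result in the question.
LOCATION_COUNTS = {
    "Monzon de Campos": 11,
    "Pina de Campos": 9,
    "Revenga de Campos": 9,
    "Santiago del Campo": 11,
    "Torre del Campo": 12,
    "Uleila del Campo": 9,
    "Villarroya del Campo": 15,
}


def analyze(text):
    """Rough stand-in (an assumption) for an analyzed string field:
    lowercase the value and split it on whitespace."""
    return text.lower().split()


def term_doc_counts(location_counts, tokenizer):
    """Count documents per indexed term, the way a terms aggregation buckets them."""
    counts = Counter()
    for location, n_docs in location_counts.items():
        # A document is counted once for each distinct term it contains.
        for term in set(tokenizer(location)):
            counts[term] += n_docs
    return counts


# Analyzed: every word becomes its own bucket.
analyzed = term_doc_counts(LOCATION_COUNTS, analyze)
# Not analyzed: the whole value is a single term, so buckets match locations.
not_analyzed = term_doc_counts(LOCATION_COUNTS, lambda text: [text])

print(analyzed["campo"], analyzed["campos"])
print(not_analyzed["Villarroya del Campo"])
```

With word-level terms, "campo" collects the documents of all four "... del Campo" locations, which is why its bucket is larger than any single location.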

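Solution 2: gives the expected result, but performance is a concern.

I can do it with this query, which fetches all locations and aggregates them in jq (the handy JSON CLI tool),
but this could become a performance nightmare when applied to large amounts of data: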
curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
"query": { "wildcard": { "location": "*" } }, "size":1000,
"_source": ["location"]
}' | jq '[.hits.hits[] |
{location:._source.location,"count":1}] |
group_by(.location) |
map({ key: .[0].location, value: map(.count)|add })'

Result:
[
{ "key": "Monzon de Campos", "value": 11 },
{ "key": "Pina de Campos", "value": 9 },
{ "key": "Revenga de Campos", "value": 9 },
{ "key": "Santiago del Campo", "value": 11 },
{ "key": "Torre del Campo", "value": 12 },
{ "key": "Uleila del Campo", "value": 9 },
{ "key": "Villarroya del Campo", "value": 15 }
]

This is exactly the result I want.
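
For readers less familiar with jq, the client-side grouping in Solution 2 can be sketched in Python. This is only an illustration of the same idea: `count_locations` is a hypothetical helper, and `sample_response` is a hand-written stand-in that mimics the `hits.hits[]._source.location` shape the jq filter reads, not real server output.

```python
from collections import Counter


def count_locations(response):
    """Group search hits by their full location value and count them,
    mirroring the jq group_by/map pipeline."""
    hits = response["hits"]["hits"]
    counts = Counter(hit["_source"]["location"] for hit in hits)
    # jq's group_by sorts groups by key, so sort here as well.
    return [{"key": key, "value": value} for key, value in sorted(counts.items())]


# Hypothetical, shortened response used only for illustration.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"location": "Monzon de Campos"}},
            {"_source": {"location": "Villarroya del Campo"}},
            {"_source": {"location": "Monzon de Campos"}},
        ]
    }
}

print(count_locations(sample_response))
```

Like the jq version, this has to pull every matching document to the client, and it only sees as many hits as the `size` parameter of the query allows.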

Question: how can I get the same result from an Elasticsearch query?
(i.e., have Elasticsearch handle the aggregation instead of jq)

Best answer

You need to add a not_analyzed sub-field to your location field.

First, modify your mapping like this:

curl -XPOST 'http://localhost:9200/testing/_mapping/external' -d '{
"properties": {
"location": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'

Then index your data again:
curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json

Finally, you will be able to run the query like this (on the location.raw field) and get the results you expect:
curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
"aggs": { "location_count": { "terms": { "field":"location.raw", "size":100 }}}
}' | jq '.aggregations'
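
To get the same key/value layout that the jq pipeline in the question produced, the aggregation buckets can be reshaped on the client. A minimal sketch, assuming the response has the `aggregations.location_count.buckets` shape shown in the question; `buckets_to_pairs` is a hypothetical helper and `sample_response` is hand-written from the question's expected counts rather than captured from a server.

```python
def buckets_to_pairs(response, agg_name="location_count"):
    """Turn terms-aggregation buckets into sorted {"key", "value"} pairs."""
    buckets = response["aggregations"][agg_name]["buckets"]
    pairs = [{"key": b["key"], "value": b["doc_count"]} for b in buckets]
    # Sort by key so the output order matches the jq result in the question.
    return sorted(pairs, key=lambda pair: pair["key"])


# Hypothetical, shortened response used only for illustration.
sample_response = {
    "aggregations": {
        "location_count": {
            "buckets": [
                {"key": "Villarroya del Campo", "doc_count": 15},
                {"key": "Torre del Campo", "doc_count": 12},
                {"key": "Monzon de Campos", "doc_count": 11},
            ]
        }
    }
}

for pair in buckets_to_pairs(sample_response):
    print(pair["key"], pair["value"])
```

Here the counting itself happens inside Elasticsearch; the client only renames fields and reorders a handful of buckets.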

This post on "json - Elasticsearch: aggregation, count by field" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/39741180/
