
node.js - Is there a way to return only one hit per matching field value with Elasticsearch?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:22:53

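Note: Updated to include Node.js client details. See the edit below.

I'm trying to avoid querying Elasticsearch repeatedly to get the information I need.

Suppose I have a dataset of events in cities. Documents in the dataset might look like this: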

{
city: 'Berlin',
event: 'Dance party',
date: '2017-04-15'
},
{
city: 'Seattle',
event: 'Wine tasting',
date: '2017-04-18'
},
{
city: 'Berlin',
event: 'Dance party',
date: '2017-04-21'
},
{
city: 'Hong Kong',
event: 'Theater',
date: '2017-04-25'
}...

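Now suppose the list of all tracked cities is known, and I need to get the latest event for each city. So I need to be able to supply the query with an array of city names, something like ['Berlin', 'Hong Kong', 'Seattle'], and get back only three events: the most recent one for each city.

My current query can only do this by being run repeatedly with a size of 1 and an exact match on the city name, like this: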

{
size: 1,
body: {
sort: [
{'date': {'order': 'desc'}}
],
query: {
'match_phrase': {'city': 'Berlin'}
}
}
}

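Is there a way to write the query so that I can pass the whole list of cities in a single request and still get only the latest entry for each city?

Edit

My new script looks like this: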

{
'query': {
'match_all': {}
},
'_source': ['city', 'event', 'date'],
'aggs': {
'cities': {
'terms': {
'field': 'city',
'size': 100
},
'aggs': {
'top_cities': {
'top_hits': {
'size': 1,
'_source': 'event',
'sort': {
'date': 'desc'
}
}
}
}
}
}
}

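This really looks like it should work. But I'm still missing a large number of cities that I know exist, and one city shows up multiple times.

I'm running it in Node with the elasticsearch-js package. The client is invoked like this: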

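// Legacy elasticsearch-js client, published on npm as "elasticsearch"
const elasticSearch = require('elasticsearch');

let client = new elasticSearch.Client(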
{
"host": [
"host1:9200",
"host2:9200",
"host3:9200"
]
}
);
client.search(SEARCH_PARAMS)
.then(function (resp) {
console.log(JSON.stringify(resp));
});

Here is a (sanitized) version of the resulting JSON:

{
"took": 77,
"timed_out": false,
"_shards": {
"total": 42,
"successful": 42,
"failed": 0
},
"hits": {
"total": 5685608,
"max_score": 1,
"hits": [{
"_index": "sanitized",
"_type": "sanitized",
"_id": "AVu489lVgqYk_9QxQb-U",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-15",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized",
"_id": "AVu489lVgqYk_9QxQb-X",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-15",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_1",
"_id": "AVu489lVgqYk_9QxQb-a",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-b",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-d",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Hong Kong"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-f",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Hong Kong"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44WnN",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Seattle"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44WnP",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "New York"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_1",
"_id": "AVu49AkKCe9swQD44WnY",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44Wnb",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}]
}
}

On closer inspection, for some reason the aggregations are not being added to the resp object.
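One possible explanation, offered only as an assumption since the contents of SEARCH_PARAMS are not shown: the legacy elasticsearch-js client sends the query DSL from the `body` property of the params object, as in the first query above. If the new script were passed directly as SEARCH_PARAMS, its `query` and `aggs` keys would not reach Elasticsearch as the request body, and the response would contain only default hits and no aggregations. The sketch below wraps the script under `body`; the helper name `toSearchParams` and the index name `events` are hypothetical.

```javascript
// Hypothetical helper: places a query DSL object under `body`,
// which is where the legacy elasticsearch-js client reads the request body from.
function toSearchParams(index, dsl) {
  return {
    index: index, // assumed index name; the question does not give one
    size: 0,      // skip the regular hits, only the aggregations are needed
    body: dsl     // query, aggs and _source all travel inside the body
  };
}

// The script from the question, unchanged apart from dropping the top-level _source.
const dsl = {
  query: { match_all: {} },
  aggs: {
    cities: {
      terms: { field: 'city', size: 100 },
      aggs: {
        top_cities: {
          top_hits: { size: 1, _source: 'event', sort: { date: 'desc' } }
        }
      }
    }
  }
};

const SEARCH_PARAMS = toSearchParams('events', dsl);

// With a configured client, the aggregations would then be read from the response:
// client.search(SEARCH_PARAMS).then(resp => console.log(JSON.stringify(resp.aggregations)));
```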

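Best answer

In addition to filtering on the cities in the query, I suggest a terms aggregation on the city field with a top_hits sub-aggregation to retrieve the latest event for each city: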

{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"cities": {
"terms": {
"field": "city",
"size": 100
},
"aggs": {
"top_events": {
"top_hits": {
"size": 1,
"_source": "event",
"sort": {
"date": "desc"
}
}
}
}
}
}
}
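To make the "filtering on the cities" part concrete, here is a minimal sketch of how the request could be built from a list of city names and how the per-city result could be read back out of the aggregation buckets. It assumes `city` is mapped as a keyword (not analyzed) field, so that values such as "Hong Kong" are matched and bucketed whole, and that the legacy elasticsearch-js client is used with the query under `body`. The function names, the index name `events`, and the mock response are all hypothetical.

```javascript
// Build search params: keep only the requested cities, then one bucket per city
// holding its single most recent document.
function buildLatestPerCityParams(index, cities) {
  return {
    index: index, // assumed index name
    size: 0,      // the regular hits are not needed
    body: {
      query: { bool: { filter: { terms: { city: cities } } } },
      aggs: {
        cities: {
          terms: { field: 'city', size: cities.length },
          aggs: {
            top_events: {
              top_hits: {
                size: 1,
                _source: ['event', 'date'],
                sort: [{ date: { order: 'desc' } }]
              }
            }
          }
        }
      }
    }
  };
}

// Flatten the aggregation part of a response into { city: latest event source }.
function latestEventByCity(resp) {
  const result = {};
  for (const bucket of resp.aggregations.cities.buckets) {
    const hit = bucket.top_events.hits.hits[0];
    if (hit) {
      result[bucket.key] = hit._source;
    }
  }
  return result;
}

const params = buildLatestPerCityParams('events', ['Berlin', 'Hong Kong', 'Seattle']);

// Hand-written mock of the relevant part of a response, for illustration only.
const mockResponse = {
  aggregations: {
    cities: {
      buckets: [
        { key: 'Berlin', doc_count: 2, top_events: { hits: { hits: [
          { _source: { event: 'Dance party', date: '2017-04-21' } }
        ] } } },
        { key: 'Seattle', doc_count: 1, top_events: { hits: { hits: [
          { _source: { event: 'Wine tasting', date: '2017-04-18' } }
        ] } } }
      ]
    }
  }
};

const latest = latestEventByCity(mockResponse);
console.log(latest.Berlin.date); // 2017-04-21
```

With a real client the same function would be applied in the callback, for example `client.search(params).then(resp => latestEventByCity(resp))`. Setting the terms aggregation size to the number of requested cities keeps the bucket count no larger than the list passed in.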

Regarding "node.js - Is there a way to return only one hit per matching field value with Elasticsearch?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43926928/
