api - Querying the latest document of each type on Elasticsearch

Reposted. Author: 行者123. Updated: 2023-11-29 02:51:08

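I'm trying to run what at first looked like a simple query on Elasticsearch, but I can't seem to get the results I'm looking for.

Here is a short example of what I'm trying to do:

I have a database of news items. Each item has a source, a headline, a timestamp, and a user.

For a given user, I want to fetch the latest headline (by timestamp) from each available source.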

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d '{
  "mappings": {
    "news": {
      "properties": {
        "source": { "type": "string", "index": "not_analyzed" },
        "headline": { "type": "object" },
        "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
        "user": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "More news", "timestamp": "2015-07-28T00:11:54.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"}
'

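For example, how can I get John's most recent CNN headline and his most recent ESPN headline?

I've been looking at the multi search API, but that would mean I need to know all the sources (CNN and ESPN in this case) in advance.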

Best answer

First, note that I had to change the mapping of the headline field to string, because in your sample documents the headlines are strings, not objects.
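The answer does not show the adjusted mapping, so the following is only a sketch of what it might look like, written as a Python dict that serializes to the JSON body of the index-creation request. It assumes the only change is the headline type, with every other field kept exactly as in the question; the function name build_news_mapping is made up for illustration.

```python
import json


def build_news_mapping():
    """Return the index-creation body with headline mapped as a string.

    Assumption: only the headline type differs from the question's mapping.
    """
    return {
        "mappings": {
            "news": {
                "properties": {
                    "source": {"type": "string", "index": "not_analyzed"},
                    # Changed from "object": the sample headlines are plain strings.
                    "headline": {"type": "string"},
                    "timestamp": {
                        "type": "date",
                        "format": "date_hour_minute_second_millis",
                    },
                    "user": {"type": "string", "index": "not_analyzed"},
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of PUT /news.
    print(json.dumps(build_news_mapping(), indent=2))
```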

With that change, a query like the following retrieves what you expect. It filters on user=John, aggregates by source, and inside each source bucket uses a top_hits sub-aggregation sorted by timestamp in descending order with size 1, so only the latest hit is kept and only its headline and timestamp are returned:

curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d '{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "user": "John"
        }
      }
    }
  },
  "aggs": {
    "sources": {
      "terms": {
        "field": "source"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": [
              "headline",
              "timestamp"
            ],
            "sort": {
              "timestamp": "desc"
            }
          }
        }
      }
    }
  }
}'
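If the request is issued from application code rather than curl, the same body can be assembled as a plain dict. The sketch below is illustrative only: the function name and the max_sources parameter are assumptions, not part of the answer. It also sets the terms aggregation's size explicitly, since by default a terms aggregation returns only the top 10 buckets, which matters once a user has more than ten sources.

```python
import json


def build_latest_per_source_query(user, max_sources=10):
    """Build the search body: latest headline per source for one user.

    max_sources is a hypothetical parameter mapped to the terms
    aggregation's "size" (number of source buckets returned).
    """
    return {
        "size": 0,  # no top-level hits, only aggregations
        "query": {
            "filtered": {"filter": {"term": {"user": user}}}
        },
        "aggs": {
            "sources": {
                "terms": {"field": "source", "size": max_sources},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,  # one document per source
                            "_source": ["headline", "timestamp"],
                            "sort": {"timestamp": "desc"},  # newest first
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be POSTed to /news/_search.
    print(json.dumps(build_latest_per_source_query("John"), indent=2))
```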

This produces a result like the following:

{
  ...
  "aggregations" : {
    "sources" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "CNN",
        "doc_count" : 2,
        "latest" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [ {
              "_index" : "news",
              "_type" : "news",
              "_id" : "AU7Sh3VDGDddn2ZNuDVl",
              "_score" : null,
              "_source":{
                "headline": "More great news",
                "timestamp": "2015-07-28T00:08:23.000"
              },
              "sort" : [ 1438042103000 ]
            } ]
          }
        }
      }, {
        "key" : "ESPN",
        "doc_count" : 2,
        "latest" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [ {
              "_index" : "news",
              "_type" : "news",
              "_id" : "AU7Sh3VDGDddn2ZNuDVn",
              "_score" : null,
              "_source":{
                "headline": "More sports news",
                "timestamp": "2015-07-28T00:10:35.000"
              },
              "sort" : [ 1438042235000 ]
            } ]
          }
        }
      } ]
    }
  }
}
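Each bucket in the response carries its source in "key" and the single newest document under latest.hits.hits[0]. As a rough sketch of how a client might flatten that into a source-to-headline mapping (the function name is made up for illustration, and the response is assumed to be already parsed into a dict):

```python
def latest_headlines(response, agg_name="sources", hits_name="latest"):
    """Map each source key to the _source of its newest hit.

    agg_name and hits_name default to the aggregation names used in the
    query above; buckets without hits are skipped.
    """
    buckets = (
        response.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    result = {}
    for bucket in buckets:
        hits = bucket.get(hits_name, {}).get("hits", {}).get("hits", [])
        if hits:
            # top_hits was requested with size 1, so the first hit is the latest.
            result[bucket["key"]] = hits[0]["_source"]
    return result
```

Applied to the response shown above, this would yield one entry for CNN and one for ESPN, each holding the headline and timestamp of the newest document.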

Regarding "api - Querying the latest document of each type on Elasticsearch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31665142/
