
Elasticsearch - match count per document

Reposted. Author: 行者123. Updated: 2023-11-29 02:46:44

I am using this query to search for occurrences of a phrase in a field.

"query": {
  "match_phrase": {
    "content": "my test phrase"
  }
}

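I need to count how many matches of each phrase occur in each document (if that is possible at all).

I considered aggregations, but I don't think they fit, because they would give me match counts for the whole index rather than for each document.

Thanks.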

Best answer

This can be done using Script Fields with a Painless script.

You can count the occurrences in each field and add them up for the document.

Example:

## Here's my test index with some sample values

POST t1/doc/1 <-- this document has one occurrence
{
  "content" : "my test phrase"
}

POST t1/doc/2 <-- this document has 5 occurrences
{
  "content": "my test phrase ",
  "content1" : "this is my test phrase 1",
  "content2" : "this is my test phrase 2",
  "content3" : "this is my test phrase 3",
  "content4" : "this is my test phrase 4"
}

POST t1/doc/3 <-- this document has no occurrence
{
  "content" : "my test new phrase"
}

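Now I can use a script to count phrase matches per field. I count each field at most once, but you can modify the script to count multiple matches within a field.

The obvious drawback is that you have to name every field of the document in the script, unless there is a way to loop over a document's fields that I am not aware of.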

POST t1/_search
{
  "script_fields": {
    "phrase_Count": {
      "script": {
        "lang": "painless",
        "source": """
          int count = 0;

          if(doc['content.keyword'].size() > 0 && doc['content.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content1.keyword'].size() > 0 && doc['content1.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content2.keyword'].size() > 0 && doc['content2.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content3.keyword'].size() > 0 && doc['content3.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content4.keyword'].size() > 0 && doc['content4.keyword'].value.indexOf(params.phrase)!=-1) count++;

          return count;
        """,
        "params": {
          "phrase": "my test phrase"
        }
      }
    }
  }
}
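Since the script repeats one nearly identical line per field, the request body could also be generated on the client instead of written by hand. The following Python sketch is only an illustration of that idea: the helper name `build_phrase_count_request` is hypothetical, it assumes every field has a `.keyword` sub-field as in the example above, and it assumes field names contain no single quotes. It only builds the JSON body and does not contact Elasticsearch.

```python
import json


def build_phrase_count_request(fields, phrase):
    """Build a _search body with one Painless check line per field.

    Hypothetical helper: it reproduces the per-field line used in the
    hand-written script, assuming each field has a `.keyword` sub-field.
    """
    lines = ["int count = 0;"]
    for field in fields:
        key = f"doc['{field}.keyword']"
        lines.append(
            f"if({key}.size() > 0 && {key}.value.indexOf(params.phrase)!=-1) count++;"
        )
    lines.append("return count;")
    return {
        "script_fields": {
            "phrase_Count": {
                "script": {
                    "lang": "painless",
                    "source": "\n".join(lines),
                    "params": {"phrase": phrase},
                }
            }
        }
    }


fields = ["content", "content1", "content2", "content3", "content4"]
body = build_phrase_count_request(fields, "my test phrase")

# Print the body so it can be sent as the payload of POST t1/_search.
print(json.dumps(body, indent=2))
```

This does not remove the need to know the field names up front; it only moves the repetition out of the script text.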

The search request returns the phrase count for each document as a script field:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"phrase_Count" : [
5 <--- count of occurrences of the phrase in the document
]
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"phrase_Count" : [
1
]
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"fields" : {
"phrase_Count" : [
0
]
}
}
]
}
}
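The counting logic of the script, including the variant that counts several matches inside one field, can be illustrated offline. The Python sketch below is not Painless and is not run by Elasticsearch; it mirrors the substring check on plain dictionaries that stand in for the three sample documents. The function `count_phrase` and its `once_per_field` flag are hypothetical names introduced for this illustration.

```python
def count_phrase(source, fields, phrase, once_per_field=True):
    """Count phrase occurrences across the given fields of one document.

    With once_per_field=True each field contributes at most 1, like the
    script above; with False every non-overlapping occurrence is counted.
    """
    total = 0
    for field in fields:
        value = source.get(field)
        if not isinstance(value, str):
            continue  # field missing from this document
        hits = value.count(phrase)  # non-overlapping substring matches
        total += min(hits, 1) if once_per_field else hits
    return total


# Stand-ins for the three sample documents indexed into t1.
docs = {
    "1": {"content": "my test phrase"},
    "2": {
        "content": "my test phrase ",
        "content1": "this is my test phrase 1",
        "content2": "this is my test phrase 2",
        "content3": "this is my test phrase 3",
        "content4": "this is my test phrase 4",
    },
    "3": {"content": "my test new phrase"},
}
fields = ["content", "content1", "content2", "content3", "content4"]

counts = {doc_id: count_phrase(src, fields, "my test phrase")
          for doc_id, src in docs.items()}
print(counts)  # {'1': 1, '2': 5, '3': 0}

# A field that contains the phrase twice, counted both ways.
repeated = {"content": "my test phrase, again my test phrase"}
print(count_phrase(repeated, ["content"], "my test phrase"))         # 1
print(count_phrase(repeated, ["content"], "my test phrase", False))  # 2
```

Like the script, this is an exact, case-sensitive substring check on the raw value, so it does not behave like the analyzed `match_phrase` query from the question.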

A similar question about Elasticsearch match counts per document can be found on Stack Overflow: https://stackoverflow.com/questions/45617814/
