
filter - Requiring multiple matches against text in ElasticSearch

Reposted. Author: 行者123. Updated: 2023-12-02 22:33:31

I am trying to create a filter against ElasticSearch that requires multiple matches before it returns a result. For example, take the following text:

If you're uneasy at the idea of riding in a vehicle that drives itself, just wait till you see Google's new car. It has no gas pedal, no brake and no steering wheel. Google has been demonstrating its driverless technology for several years by retrofitting Toyotas, Lexuses and other cars with cameras and sensors. But now, for the first time, the company has unveiled a prototype of its own: a cute little car that looks like a cross between a VW Beetle and a golf cart.



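If the minimum number of matches is set to 2 and I search for Google, I expect to get this document back, because Google appears twice in the text. Searching the same text for Toyota with the same minimum, however, should return no result.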

How can I construct this filter?

Best answer

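This may not be exactly what you are looking for, but you can add explain to the query and then filter on the client side by how many times the term matched. From the documentation, the query looks like this: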

GET /_search?explain
{
  "query": { "match": { "tweet": "honeymoon" } }
}

The result looks like this:
"_explanation": {
  "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
  "value": 0.076713204,
  "details": [
    {
      "description": "fieldWeight in 0, product of:",
      "value": 0.076713204,
      "details": [
        {
          "description": "tf(freq=1.0), with freq of:",
          "value": 1,
          "details": [
            {
              "description": "termFreq=1.0",
              "value": 1
            }
          ]
        },
        {
          "description": "idf(docFreq=1, maxDocs=1)",
          "value": 0.30685282
        },
        {
          "description": "fieldNorm(doc=0)",
          "value": 0.25
        }
      ]
    }
  ]
}

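You can then read the term frequency out of the description fields (the termFreq entries) and keep only the results whose value is greater than 1.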
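A minimal sketch of that client-side step, assuming the hits carry an _explanation tree in the shape shown above. The description strings differ between ElasticSearch versions and similarity settings, so the termFreq pattern here is an assumption tied to this particular output. The names max_term_freq and filter_hits are hypothetical helpers, not part of any client library.

```python
import re

# Matches leaf descriptions such as "termFreq=1.0" from the output above.
# Assumption: other versions may phrase this differently.
TERM_FREQ = re.compile(r"^termFreq=(\d+(?:\.\d+)?)$")


def max_term_freq(explanation):
    """Return the largest termFreq found anywhere in an _explanation tree."""
    best = 0.0
    match = TERM_FREQ.match(explanation.get("description", ""))
    if match:
        best = float(match.group(1))
    # Recurse into nested details; the key may be missing on leaf nodes.
    for child in explanation.get("details") or []:
        best = max(best, max_term_freq(child))
    return best


def filter_hits(hits, min_freq=2):
    """Keep only hits whose matched term occurs at least min_freq times."""
    return [
        hit for hit in hits
        if max_term_freq(hit.get("_explanation", {})) >= min_freq
    ]
```

With a single-term match query, passing the list under hits.hits from the response to filter_hits would drop every document in which the term occurs only once. For multi-term queries the tree contains one termFreq per term, so taking the maximum may not be the right rule.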

I believe you can also do this directly with a script (no client-side filtering needed), because scripts can reference the term frequency:
Term statistics:

Term statistics for a field can be accessed with a subscript operator like this: _index['FIELD']['TERM']. This will never return null, even if term or field does not exist. If you do not need the term frequency, call _index['FIELD'].get('TERM', 0) to avoid unnecessary initialization of the frequencies. The flag will only have an effect if you set the index_options to docs (see mapping documentation).

_index['FIELD']['TERM'].df()
df of term TERM in field FIELD. Will be returned, even if the term is not present in the current document.
_index['FIELD']['TERM'].ttf()
The sum of term frequencies of term TERM in field FIELD over all documents. Will be returned, even if the term is not present in the current document.
_index['FIELD']['TERM'].tf()
tf of term TERM in field FIELD. Will be 0 if the term is not present in the current document.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

However, I have not done this myself, and the usual security and performance concerns apply when using server-side scripts.
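As an untested sketch of the script approach, the request body might be built like this. It assumes the older filtered query and script filter syntax from the same era as the documentation quoted above, and that the script parameters field, term and min_freq are visible to the script as variables. The function name build_min_tf_query is hypothetical. Newer ElasticSearch versions use different query and scripting syntax, so treat this as an illustration of the idea only.

```python
import json


def build_min_tf_query(field, term, min_freq):
    """Build a search body that keeps documents in which `term` occurs
    at least `min_freq` times in `field` (assumed legacy syntax)."""
    # Assumption: params are exposed to the script as plain variables.
    script = "_index[field][term].tf() >= min_freq"
    return {
        "query": {
            "filtered": {
                # The match query narrows candidates before the script runs.
                "query": {"match": {field: term}},
                "filter": {
                    "script": {
                        "script": script,
                        "params": {
                            "field": field,
                            "term": term,
                            "min_freq": min_freq,
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # "text" is a hypothetical field name; the term is lowercased because
    # _index looks up the indexed token, not the original wording.
    print(json.dumps(build_min_tf_query("text", "google", 2), indent=2))
```

The term passed in has to be the token as it was indexed, so with a lowercasing analyzer the lookup would be for google, not Google. Whether a possessive such as "Google's" counts toward that token depends on the analyzer in use.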

For "filter - Requiring multiple matches against text in ElasticSearch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23935502/
