elasticsearch - Elasticsearch does not match on exact string

Reposted. Author: 行者123. Updated: 2023-12-02 22:44:39

I created a category index with a completion suggester, and it does not behave the way I expect.

curl -XPUT http://localhost:9200/categories/category/_mapping -d '{
"category" : {
"properties" : {
"categoryDescription" : {
"type" : "string"
},
"suggest" : {
"type" : "completion",
"analyzer" : "simple",
"search_analyzer" : "simple",
"payloads" : true
}
}
}
}'

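I have a category indexed as "Mexican Grocery Store". When I search for that exact string I get zero hits, and only the suggester returns a result: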
{
"query":{
"fuzzy":{
"categoryDescription":{
"value":"mexican grocery store"
}
}
},
"from":0,
"size":20,
"suggest":{
"category-suggest":{
"text":"mexican grocery store",
"completion":{
"field":"suggest","fuzzy":{"fuzziness":2}
}
}
}
}

{
"took":19,
"timed_out":false,
"_shards":{"total":5,"successful":5,"failed":0},
"hits":{
"total":0,"max_score":null,"hits":[]
},
"suggest":{
"category-suggest":[
{
"text":"mexican grocery store",
"offset":0,
"length":21,
"options":[
{
"text":"Mexican Grocery Store",
"score":1.0,
"payload":{"id":5915028960051200}
}
]
}
]
}
}

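Not only do I get zero hits for an exact match, but when I type the string "mexican", a whole series of categories containing the word "Medical" is listed before the "Mexican" categories, which makes no sense to me either.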
{
"query":{
"fuzzy":{
"categoryDescription":{
"value":"mexican"
}
}
},
"from":0,
"size":20,
"suggest":{
"category-suggest":{
"text":"mexican",
"completion":{
"field":"suggest","fuzzy":{"fuzziness":2}
}
}
}
}

{
"took":11,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":25,
"max_score":3.8085938,
"hits":[
{
"_index":"categories",
"_type":"category",
"_id":"4993638215974912",
"_score":3.8085938,
"_source":{
"id":4993638215974912,
"categoryDescription":"Medical Spa",
"suggest":{
"input":["Medical Spa"],
"output":"Medical Spa",
"payload":{"id":4993638215974912}}}},
{"_index":"categories","_type":"category","_id":"6401013099528192","_score":3.8085938,"_source":{"id":6401013099528192,"categoryDescription":"Medical School","suggest":{"input":["Medical School"],"output":"Medical School","payload":{"id":6401013099528192}}}},
{"_index":"categories","_type":"category","_id":"4712163239264256","_score":3.4429123,"_source":{"id":4712163239264256,"categoryDescription":"Medical Examiner","suggest":{"input":["Medical Examiner"],"output":"Medical Examiner","payload":{"id":4712163239264256}}}},
{"_index":"categories","_type":"category","_id":"5978800634462208","_score":3.4429123,"_source":{"id":5978800634462208,"categoryDescription":"Medical Center","suggest":{"input":["Medical Center"],"output":"Medical Center","payload":{"id":5978800634462208}}}},
{"_index":"categories","_type":"category","_id":"5415850681040896","_score":3.4429123,"_source":{"id":5415850681040896,"categoryDescription":"Medical Clinic","suggest":{"input":["Medical Clinic"],"output":"Medical Clinic","payload":{"id":5415850681040896}}}},
{"_index":"categories","_type":"category","_id":"4852900727619584","_score":2.75433,"_source":{"id":4852900727619584,"categoryDescription":"Medical Billing Service","suggest":{"input":["Medical Billing Service"],"output":"Medical Billing Service","payload":{"id":4852900727619584}}}},
{"_index":"categories","_type":"category","_id":"5352079006629888","_score":2.4411354,"_source":{"id":5352079006629888,"categoryDescription":"Mexican Restaurant","suggest":{"input":["Mexican Restaurant"],"output":"Mexican Restaurant","payload":{"id":5352079006629888}}}},
{"_index":"categories","_type":"category","_id":"5915028960051200","_score":2.143557,"_source":{"id":5915028960051200,"categoryDescription":"Mexican Grocery Store","suggest":{"input":["Mexican Grocery Store","shop"],"output":"Mexican Grocery Store","payload":{"id":5915028960051200}}}},
{"_index":"categories","_type":"category","_id":"6392217006505984","_score":2.0527549,"_source":{"id":6392217006505984,"categoryDescription":"Latin American Restaurant","suggest":{"input":["Latin American Restaurant"],"output":"Latin American Restaurant","payload":{"id":6392217006505984}}}},
{"_index":"categories","_type":"category","_id":"5149768867119104","_score":2.0527549,"_source":{"id":5149768867119104,"categoryDescription":"Occupational Medical Physician","suggest":{"input":["Occupational Medical Physician"],"output":"Occupational Medical Physician","payload":{"id":5149768867119104}}}},
{"_index":"categories","_type":"category","_id":"5157465448513536","_score":2.0527549,"_source":{"id":5157465448513536,"categoryDescription":"Central American Restaurant","suggest":{"input":["Central American Restaurant"],"output":"Central American Restaurant","payload":{"id":5157465448513536}}}},
{"_index":"categories","_type":"category","_id":"6479078425100288","_score":2.0527549,"_source":{"id":6479078425100288,"categoryDescription":"American Football Field","suggest":{"input":["American Football Field"],"output":"American Football Field","payload":{"id":6479078425100288}}}},
{"_index":"categories","_type":"category","_id":"4789129053208576","_score":1.9529084,"_source":{"id":4789129053208576,"categoryDescription":"Mexican Goods Store","suggest":{"input":["Mexican Goods Store","shop"],"output":"Mexican Goods Store","payload":{"id":4789129053208576}}}},
{"_index":"categories","_type":"category","_id":"5275113192685568","_score":1.9138902,"_source":{"id":5275113192685568,"categoryDescription":"Medical Laboratory","suggest":{"input":["Medical Laboratory"],"output":"Medical Laboratory","payload":{"id":5275113192685568}}}},
{"_index":"categories","_type":"category","_id":"5838063146106880","_score":1.7436681,"_source":{"id":5838063146106880,"categoryDescription":"Medical Group","suggest":{"input":["Medical Group"],"output":"Medical Group","payload":{"id":5838063146106880}}}},
{"_index":"categories","_type":"category","_id":"4649491076481024","_score":1.7436681,"_source":{"id":4649491076481024,"categoryDescription":"American Restaurant","suggest":{"input":["American Restaurant"],"output":"American Restaurant","payload":{"id":4649491076481024}}}},
{"_index":"categories","_type":"category","_id":"5458456756617216","_score":1.5311122,"_source":{"id":5458456756617216,"categoryDescription":"Traditional American Restaurant","suggest":{"input":["Traditional American Restaurant"],"output":"Traditional American Restaurant","payload":{"id":5458456756617216}}}},
{"_index":"categories","_type":"category","_id":"6183309797228544","_score":1.5311122,"_source":{"id":6183309797228544,"categoryDescription":"Public Medical Center","suggest":{"input":["Public Medical Center"],"output":"Public Medical Center","payload":{"id":6183309797228544}}}},
{"_index":"categories","_type":"category","_id":"6706677332049920","_score":1.5311122,"_source":{"id":6706677332049920,"categoryDescription":"Native American Goods Store","suggest":{"input":["Native American Goods Store","shop"],"output":"Native American Goods Store","payload":{"id":6706677332049920}}}},
{"_index":"categories","_type":"category","_id":"6119538122817536","_score":1.3949344,"_source":{"id":6119538122817536,"categoryDescription":"Medical Supply Store","suggest":{"input":["Medical Supply Store","shop"],"output":"Medical Supply Store","payload":{"id":6119538122817536}}}}
]
},
"suggest":{
"category-suggest":[
{
"text":"mexican",
"offset":0,
"length":7,
"options":[
{"text":"Medical Billing Service","score":1.0,"payload":{"id":4852900727619584}},
{"text":"Medical Center","score":1.0,"payload":{"id":5978800634462208}},
{"text":"Medical Clinic","score":1.0,"payload":{"id":5415850681040896}},
{"text":"Medical Examiner","score":1.0,"payload":{"id":4712163239264256}},
{"text":"Medical Group","score":1.0,"payload":{"id":5838063146106880}}
]
}
]
}
}

Best answer

You indexed the field categoryDescription as a string, so Elasticsearch runs its standard analyzer on your input and turns Mexican Grocery Store into three tokens: [mexican, grocery, store].
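The fuzzy query belongs to the family of term-level queries: it operates on individual terms and does not run its input through any analyzer. A fuzzy query with the input Mexican Grocery Store therefore tries to match those words as one single term rather than as three separate terms. It finds nothing, because the full phrase does not exist as a single term in the index. You can add a not_analyzed subfield to categoryDescription (or a subfield whose analyzer only lowercases the whole value, i.e. a keyword tokenizer plus a lowercase token filter) and run the fuzzy query against that subfield to get "exact matches".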
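To make the term-level behaviour concrete without a running cluster, here is a minimal Python sketch that models it. Assumptions: standard_like only approximates the standard analyzer (the real one uses Unicode text segmentation), the distance is plain Levenshtein (Elasticsearch can additionally count a transposition as one edit, which does not change these examples), and max_edits=2 corresponds to what AUTO fuzziness allows for terms longer than five characters. lowercase_keyword stands in for the hypothetical keyword-plus-lowercase subfield; all function names are made up for illustration.

```python
import re


def levenshtein(a, b):
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def standard_like(text):
    """Rough stand-in for the standard analyzer: split into word tokens, lowercase them."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def not_analyzed(text):
    """A not_analyzed field indexes the whole value verbatim as one term."""
    return [text]


def lowercase_keyword(text):
    """Hypothetical subfield analyzer: keyword tokenizer + lowercase filter, one lowercased term."""
    return [text.lower()]


def fuzzy_matches(query_value, indexed_terms, max_edits=2):
    """The fuzzy query does not analyze its input: the raw value is compared to each indexed term."""
    return [t for t in indexed_terms if levenshtein(query_value, t) <= max_edits]


doc = "Mexican Grocery Store"
query = "mexican grocery store"

print(standard_like(doc))                            # ['mexican', 'grocery', 'store']
print(fuzzy_matches(query, standard_like(doc)))      # [] (no single term is within 2 edits)
print(fuzzy_matches(query, not_analyzed(doc)))       # [] (M, G and S differ in case: 3 edits)
print(fuzzy_matches(query, lowercase_keyword(doc)))  # ['mexican grocery store']
```

The third line shows why the lowercasing variant is worth considering: a plain not_analyzed subfield keeps the capitals of "Mexican Grocery Store", which is three edits away from the lowercase query value and thus outside two-edit fuzziness.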

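As for the second part: the fuzzy query does not distinguish between modified matches (those where fuzziness was applied) and exact matches.
Before the actual search runs, the fuzzy term is matched internally against the list of all terms in the given field and expanded into the terms it is close to. In your example it becomes something like: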

"bool": {
"should": [
{
"term": {
"categoryDescription": "medical"
}
},
{
"term": {
"categoryDescription": "mexican"
}
}
]
}

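This shows why something like Medical Spa is returned. These matches also score higher than Mexican Grocery Store, so they come first. I suspect this is caused by term frequency (medical occurs more often than mexican), but you should run the query again with explain enabled to see exactly why they score higher.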
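The expansion step can be modelled with a short Python sketch. It is a simplification: Lucene actually intersects a Levenshtein automaton with the terms dictionary and caps the result at max_expansions terms (50 by default) rather than scanning every term. The AUTO thresholds below follow the Elasticsearch documentation, the term list is a subset of the categories visible in the response above, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def auto_max_edits(term):
    """Edits allowed by AUTO fuzziness: 0 for 1-2 chars, 1 for 3-5 chars, 2 beyond that."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def expand_fuzzy(term, term_dictionary):
    """Collect every indexed term within the allowed edit distance of the query term."""
    max_edits = auto_max_edits(term)
    return sorted(t for t in term_dictionary if levenshtein(term, t) <= max_edits)


# A few categoryDescription values taken from the response above.
categories = [
    "Medical Spa", "Medical School", "Mexican Restaurant",
    "Mexican Grocery Store", "Latin American Restaurant",
    "American Football Field", "Medical Supply Store",
]
# Terms as the standard analyzer would index them (lowercased words).
term_dictionary = {word.lower() for c in categories for word in c.split()}

expanded = expand_fuzzy("mexican", term_dictionary)
print(expanded)  # ['american', 'medical', 'mexican']

# Roughly what the fuzzy query is rewritten into: one term clause per expanded term.
rewritten = {"bool": {"should": [{"term": {"categoryDescription": t}} for t in expanded]}}
print(rewritten)
```

Besides medical (two substitutions: x to d, n to l), american is also within two edits of mexican (drop the leading a, then replace r with x), which fits the Latin American, Central American and American Football Field hits in the response.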

If you want to penalize fuzzy matches, you can wrap the fuzzy query and a term query in a bool query:
{
"query": {
"bool": {
"should": [
{
"fuzzy": {
"categoryDescription": "mexican"
}
},
{
"term": {
"categoryDescription": "mexican"
}
}
]
}
}
}

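This halves the score of documents that match only the fuzzy clause, because of the bool query's coordination factor in the classic TF-IDF scoring that Elasticsearch used at the time.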
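The effect of the coordination factor can be illustrated with a toy Python model. It ignores everything else in Lucene's classic scoring (query norm, IDF, field norms, boosts) and uses made-up clause scores of 1.0; bool_should_score is a hypothetical helper, not an Elasticsearch API.

```python
def bool_should_score(clause_scores, coord=True):
    """Simplified classic bool scoring for should clauses.

    clause_scores holds one entry per should clause: its score if the clause
    matched, or None if it did not. With coord enabled, the summed score is
    multiplied by matched_clauses / total_clauses.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    total = sum(matched)
    if coord:
        total *= len(matched) / len(clause_scores)
    return total


# Hypothetical clause scores, both 1.0 so that only the coord effect shows.
exact_doc = bool_should_score([1.0, 1.0])    # "mexican": fuzzy and term clauses both match
fuzzy_only = bool_should_score([1.0, None])  # "medical": only the fuzzy clause matches
print(exact_doc, fuzzy_only)  # 2.0 0.5
```

The fuzzy-only document keeps half of its fuzzy clause score, while a document containing the exact term also collects the term clause's score, so exact matches move ahead of fuzzy ones.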

Regarding elasticsearch - Elasticsearch does not match on exact string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37259182/