
c# - Poor ElasticSearch search results

Reposted. Author: 行者123. Updated: 2023-12-03 02:01:29

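I'm fairly new to ElasticSearch and am having trouble getting search results that I consider good. My goal is to search a medication index (6 fields) by a phrase the user enters, which could be one word or several. I've tried a few approaches, and below I outline the best one I've found so far. Let me know what I'm doing wrong. I'm guessing I'm missing something fundamental.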

Here is a subset of the fields I'm working with:

...
"hits": [
{
"_index": "indexus2",
"_type": "Medication",
"_id": "17471",
"_score": 8.829264,
"_source": {
"SearchContents": " chew chewable oral po tylenol",
"MedShortDesc": "Tylenol PO Chew",
"MedLongDesc": "Tylenol Oral Chewable",
"GenericDesc": "ACETAMINOPHEN ORAL",
...
}
}
...

The fields I want to search use an Edge NGram analyzer. I'm using the C# NEST library for indexing:
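// Edge n-gram tokenizer: emits prefixes of 2 to 50 characters for each run of letters/digits.
settings.Analysis.Tokenizers.Add("edgeNGram", new EdgeNGramTokenizer()
{
    MaxGram = 50,
    MinGram = 2,
    TokenChars = new List<string>() { "letter", "digit" }
});

// Custom analyzer: the tokenizer above followed by a lowercase filter.
settings.Analysis.Analyzers.Add("edgeNGramAnalyzer", new CustomAnalyzer()
{
    Filter = new string[] { "lowercase" },
    Tokenizer = "edgeNGram"
});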
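
To make the analyzer's effect concrete, here is a minimal Python sketch of what an edge n-gram analyzer with these settings would be expected to emit. The `edge_ngrams` helper is hypothetical and only approximates the behavior (split on anything that is not a letter or digit, lowercase, then emit prefixes of 2 to 50 characters); it is not Elasticsearch's own code.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=50):
    """Approximate an edge n-gram analyzer: one prefix per length, per word."""
    tokens = []
    # Assumption: words are runs of letters/digits, mirroring TokenChars above.
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()  # stands in for the "lowercase" filter
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens

tokens = edge_ngrams("Tylenol PO Chew")
print(tokens)
# ['ty', 'tyl', 'tyle', 'tylen', 'tyleno', 'tylenol', 'po', 'ch', 'che', 'chew']
```

Every prefix becomes its own indexed term, which is what makes search-as-you-type matching possible on these fields.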

I'm running a more_like_this query against the fields in question:
GET indexus2/Medication/_search
{
"query": {
"more_like_this" : {
"fields" : ["MedShortDesc",
"MedLongDesc",
"GenericDesc",
"SearchContents"],
"like_text" : "vicodin",
"min_term_freq" : 1,
"max_query_terms" : 25,
"min_word_len": 2
}
}
}

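The problem is that for a search on "vicodin" I expect to see matches on the whole word first, but I don't. Here is a subset of the results for that query. Vicodin doesn't show up until the 7th result: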
"hits": [
{
"_index": "indexus2",
"_type": "Medication",
"_id": "31192",
"_score": 4.567309,
"_source": {
"SearchContents": " oral po victrelis",
"MedShortDesc": "Victrelis PO",
"MedLongDesc": "Victrelis Oral",
"RepresentativeRoutedGenericDesc": "BOCEPREVIR ORAL",
...
}
}
<5 more similar results>
{
"_index": "indexus2",
"_type": "Medication",
"_id": "26198",
"_score": 2.2836545,
"_source": {
"SearchContents": " (original 5 500 feeding mg strength) tube via vicodin",
"MedShortDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
"MedLongDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
"GenericDesc": "HYDROCODONE BITARTRATE/ACETAMINOPHEN ORAL",
...
}
}

Field mappings:
"OrderableMedLongDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"OrderableMedShortDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"RepresentativeRoutedGenericDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"SearchContents": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},

Here is what ES shows for my analyzer settings:
"analyzer": {
"edgeNGramAnalyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "edgeNGram"
}
},
"tokenizer": {
"edgeNGram": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "50"
}
}

Best answer

With the mapping above, edgeNGramAnalyzer is also the search analyzer for those fields, so the search query itself gets "edge n-grammed" as well. You probably don't want that.
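
A rough Python sketch of why that hurts, assuming the analyzer behaves like the hypothetical `edge_ngrams` helper below (an approximation, not Elasticsearch's implementation): once the query "vicodin" is broken into prefixes, its short prefixes coincide with terms indexed for "Victrelis".

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=50):
    """Approximate edge n-gram analysis: lowercase prefixes of each word."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens

# The query is analyzed with the same analyzer as the indexed field.
query_tokens = set(edge_ngrams("vicodin"))
victrelis_tokens = set(edge_ngrams("Victrelis PO"))

shared = sorted(query_tokens & victrelis_tokens)
print(shared)  # ['vi', 'vic']
```

The sketch only shows that unrelated documents become matches; it does not model scoring, so it makes no claim about the exact ranking in the question.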

Change the mapping so that only the index_analyzer option is set to edgeNGramAnalyzer.

The search_analyzer will then default to standard.

Example:

"SearchContents": {
"type": "string",
"index_analyzer": "edgeNGramAnalyzer"
},
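
The effect of that change can be sketched in Python under some loud assumptions: `edge_ngrams` and `standard_terms` are hypothetical stand-ins for the two analyzers, the two documents are the MedShortDesc values from the question, and "matching" just means sharing at least one term, with no scoring.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=50):
    """Approximate edge n-gram analysis: lowercase prefixes of each word."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens

def standard_terms(text):
    """Rough stand-in for the standard analyzer: whole lowercase words."""
    return [w.lower() for w in re.findall(r"[^\W_]+", text)]

docs = {
    "31192": "Victrelis PO",
    "26198": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
}
# Index time: both documents are edge n-grammed in either setup.
index = {doc_id: set(edge_ngrams(text)) for doc_id, text in docs.items()}

def matching_ids(query_terms):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(query_terms)
    return sorted(doc_id for doc_id, tokens in index.items() if tokens & terms)

print(matching_ids(edge_ngrams("vicodin")))     # ['26198', '31192']
print(matching_ids(standard_terms("vicodin")))  # ['26198']
```

With whole-word query terms, a partial input such as "vic" would still match both documents, because the prefixes remain in the index; only the query side stops being expanded.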

Regarding "c# - Poor ElasticSearch search results", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32075256/
