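I am getting odd results from the edge_ngram tokenizer I use for autocomplete, and I am trying to work out how to make my results more relevant. I copied the example from the Elasticsearch documentation.
I have documents with descriptions like the ones quoted here:
When I search for apple, "APPLEBEE'S, chili" scores higher than "Apples, raw, without skin".
When I search for apples, "Babyfood, fruit, applesauce, junior" scores higher than "Apples, raw, golden delicious, with skin".
I would expect that when I search for apple or apples, results containing the word apples score higher than APPLEBEE'S or applesauce.
Index settings and mapping: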
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase",
"asciifolding"
]
},
"autocomplete_search": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter"
]
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
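As a rough illustration of why apple matches APPLEBEE'S at all, the sketch below imitates the autocomplete analyzer in plain Python. The function name edge_ngrams is made up for this example, asciifolding is left out, and the real tokenizer runs inside Elasticsearch, so treat this as an approximation of the settings above rather than the actual implementation.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=20):
    """Approximate an edge_ngram tokenizer with token_chars=["letter"]
    followed by a lowercase filter (hypothetical helper, not an ES API)."""
    tokens = []
    # Split on anything that is not a letter, as token_chars=["letter"] does.
    for word in re.findall(r"[^\W\d_]+", text):
        word = word.lower()
        # Emit every prefix from min_gram up to max_gram characters.
        # Words shorter than min_gram produce no tokens.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


applebees = edge_ngrams("APPLEBEE'S, chili")
apples = edge_ngrams("Apples, raw, without skin")

print(applebees)
print("apple" in applebees, "apple" in apples)
print(len(applebees), len(apples))
```

Under these assumptions both documents are indexed with the prefix token apple, so a query for apple matches them equally well, and the shorter document ends up with fewer tokens in the field.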
Query:
"query": {
"match": {
"description": {
"query": "apple",
"operator": "and"
}
}
}
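How can I make the more relevant results score higher?
Best answer
This happens because of the length of the matched field, which the BM25 scoring algorithm calls dl: the shorter the field, the larger the tf part of the score. You can add the explain parameter to your search request to see the details: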
http://{{hostname}}:{{port}}//_search?explain=true
"APPLEBEE'S, chili" has the shortest field length, so it scores higher. This is the tf score for that document:
{
"value": 0.5344296,
"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
{
"value": 1.0,
"description": "freq, occurrences of term within document",
"details": []
},
{
"value": 1.2,
"description": "k1, term saturation parameter",
"details": []
},
{
"value": 0.75,
"description": "b, length normalization parameter",
"details": []
},
{
"value": 11.0,
"description": "dl, length of field", ---> note this
"details": []
},
{
"value": 17.333334,
"description": "avgdl, average length of field",
"details": []
}
]
}
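The tf value in the explain output can be reproduced by hand from the formula in its description. The sketch below plugs in the reported numbers; the function name bm25_tf and the second field length of 25 are assumptions chosen only to show the effect of a longer field.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf part of BM25 as printed by explain:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the explain output for "APPLEBEE'S, chili".
tf_short = bm25_tf(freq=1.0, dl=11.0, avgdl=17.333334)

# Same term frequency, but a made-up longer field (dl=25) for comparison.
tf_long = bm25_tf(freq=1.0, dl=25.0, avgdl=17.333334)

print(tf_short)
print(tf_long)
print(tf_short > tf_long)
```

With the same freq, the document with the smaller dl gets the larger tf, which is why the short "APPLEBEE'S, chili" field outranks the longer descriptions.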
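Solution
Add a sub-field that uses the english analyzer, as shown in the multi-fields example. The english analyzer stems apple and apples to the same token and does not emit prefixes, so only the documents that actually contain the word match on that sub-field, which raises their score. Here is a complete example: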
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase",
"asciifolding"
]
},
"autocomplete_search": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search",
"fields": {
"english": {
"type": "text",
"analyzer": "english"
}
}
}
}
}
}
Then index your sample documents:
{
"name" : "Apples, raw, without skin"
}
{
"name" : "APPLEBEE'S, chili"
}
{
"name" : "Babyfood, fruit, applesauce, junior"
}
{
"name" : "Apples, raw, golden delicious, with skin"
}
Then run this search query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "apple",
"fields": [
"name.english",
"name"
]
}
}
]
}
}
}
These are the search results. Note that the documents containing apple score higher:
"hits": [
{
"_index": "edgelow",
"_type": "_doc",
"_id": "1",
"_score": 0.6747451,
"_source": {
"name": "Apples, raw, without skin"
}
},
{
"_index": "edgelow",
"_type": "_doc",
"_id": "4",
"_score": 0.60996956,
"_source": {
"name": "Apples, raw, golden delicious, with skin"
}
},
{
"_index": "edgelow",
"_type": "_doc",
"_id": "2",
"_score": 0.12822598,
"_source": {
"name": "APPLEBEE'S, chili"
}
},
{
"_index": "edgelow",
"_type": "_doc",
"_id": "3",
"_score": 0.09446116,
"_source": {
"name": "Babyfood, fruit, applesauce, junior"
}
}
]
Source: the Stack Overflow question "elasticsearch - How do I make shorter (closer) token matches more relevant? (edge_ngram)": https://stackoverflow.com/questions/64530450/