elasticsearch - Make full words score higher than Edge NGram subsets

Reposted. Author: 行者123. Updated: 2023-12-02 22:50:39

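I'm trying to get a higher score for documents that match the full name than for documents that only match an Edge NGram subset with the same value.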

The results are:

Pos  Name              _score   _id
1    Baritone horn     7.56878  1786
2    Baritone ukulele  7.56878  2313
3    Bari              7.56878  2360
4    Baritone voice    7.56878  1787

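I expected the third one ("Bari") to get a higher score because it is the full name. However, because of the edge n-gram decomposition, all the other documents are also indexed with the token "bari". That is why all the scores in the table are equal, and I can't even tell how Elasticsearch orders them: they are sorted neither by _id nor by name.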

How can I achieve this?

Thanks

Sample "code"

Settings:
{
  "analysis": {
    "filter": {
      "edgeNGram_filter": {
        "type": "edgeNGram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
          "letter",
          "digit",
          "punctuation",
          "symbol"
        ]
      }
    },
    "analyzer": {
      "edgeNGram_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding",
          "edgeNGram_filter"
        ]
      },
      "whitespace_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}
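To see why the ngram field alone cannot tell "Bari" apart from the other names, here is a rough Python sketch of the two analyzers defined in the settings above. It is an approximation, not Elasticsearch's implementation: str.split() stands in for the whitespace tokenizer, Unicode NFKD decomposition stands in for asciifolding, and the function names are illustrative only.

```python
import unicodedata


def ascii_fold(token):
    # Rough stand-in for the asciifolding filter: decompose accented
    # characters (NFKD) and drop the combining marks. The real filter
    # maps many more characters than this.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def whitespace_analyze(text):
    # whitespace_analyzer: whitespace tokenizer -> lowercase -> asciifolding
    return [ascii_fold(word.lower()) for word in text.split()]


def edge_ngram_analyze(text, min_gram=3, max_gram=20):
    # edgeNGram_analyzer: the same chain, then edge n-grams, i.e. every
    # prefix of each token whose length is between min_gram and max_gram.
    # Tokens shorter than min_gram produce no grams at all.
    grams = []
    for token in whitespace_analyze(text):
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:n])
    return grams


names = ["Baritone horn", "Baritone ukulele", "Bari", "Baritone voice"]
query_terms = whitespace_analyze("Bari")
for name in names:
    # Every name yields the indexed term "bari", so every document matches.
    print(name, query_terms[0] in edge_ngram_analyze(name))
```

For example, edge_ngram_analyze("Baritone horn") returns ['bar', 'bari', 'barit', 'barito', 'bariton', 'baritone', 'hor', 'horn'], while the search side turns "Bari" into the single term 'bari', which all four documents contain.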


Mapping:
{
  "name": {
    "type": "string",
    "index": "not_analyzed"
  },
  "suggest": {
    "type": "completion",
    "index_analyzer": "edgeNGram_analyzer",
    "search_analyzer": "whitespace_analyzer",
    "payloads": true
  }
}

Query:
POST /attribute-tree/attribute/_search
{
  "query": {
    "match": {
      "suggest": "Bari"
    }
  }
}

Result:

(only the relevant data is kept)
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 7.56878,
    "hits": [
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1786",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone horn",
          "suggest": {
            "input": [
              "Baritone",
              "horn"
            ],
            "output": "Baritone horn"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2313",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone ukulele",
          "suggest": {
            "input": [
              "Baritone",
              "ukulele"
            ],
            "output": "Baritone ukulele"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2360",
        "_score": 7.56878,
        "_source": {
          "name": "Bari",
          "suggest": {
            "input": [
              "Bari"
            ],
            "output": "Bari"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1787",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone voice",
          "suggest": {
            "input": [
              "Baritone",
              "voice"
            ],
            "output": "Baritone voice"
          }
        }
      }
    ]
  }
}

Best answer

You can use the bool query and its should clause to add score to exact matches, like this:

POST /attribute-tree/attribute/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "suggest": "Bari"
          }
        }
      ],
      "should": [
        {
          "match": {
            "name": "Bari"
          }
        }
      ]
    }
  }
}
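The same request body can be generated from code, for example when the search text comes from user input. This is a minimal Python sketch; the helper name exact_match_boost_query and its parameters are illustrative assumptions, not part of any Elasticsearch client. Because name is mapped as not_analyzed, the should clause only adds score for an exact, case-sensitive match of the whole value ("Bari", but not "bari").

```python
import json


def exact_match_boost_query(text, ngram_field="suggest", exact_field="name"):
    # Hypothetical helper mirroring the query above.
    # "must": the edge-ngram field has to match, so it decides which
    #         documents are returned at all.
    # "should": optional clause on the not_analyzed field; it only adds
    #           score when the whole stored value equals the search text.
    return {
        "query": {
            "bool": {
                "must": [{"match": {ngram_field: text}}],
                "should": [{"match": {exact_field: text}}],
            }
        }
    }


body = exact_match_boost_query("Bari")
# JSON body to send with POST /attribute-tree/attribute/_search
print(json.dumps(body, indent=2))
```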

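The query in the should clause is what the Elasticsearch Definitive Guide calls a signal clause, and it is how you can distinguish exact matches from ngram matches. All documents that match the must clause are still returned, but documents that also match the should query get a higher score because of the bool query's scoring formula: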
score = ("must" queries total score + matching "should" queries total score) / (total number of "must" queries and "should" queries)
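To make the formula concrete, here is a small Python sketch that applies it to one must clause and one should clause. The clause scores (1.0 for the ngram match, 3.0 for the exact name match) are made-up numbers, and the sketch implements only the simplified formula quoted above; the score Lucene actually computes depends on the Elasticsearch version.

```python
def bool_score(must_scores, should_scores, num_should_clauses):
    # Simplified bool score as stated above:
    #   must_scores:        scores of the "must" clauses (all of them match)
    #   should_scores:      scores of the "should" clauses that matched
    #   num_should_clauses: total number of "should" clauses in the query
    total_clauses = len(must_scores) + num_should_clauses
    return (sum(must_scores) + sum(should_scores)) / total_clauses


# Hypothetical clause scores: the ngram "must" clause scores 1.0 for every
# document; the exact "name" clause matches (and adds 3.0) only for "Bari".
docs = {
    "Bari": bool_score([1.0], [3.0], 1),
    "Baritone horn": bool_score([1.0], [], 1),
    "Baritone ukulele": bool_score([1.0], [], 1),
}
# sorted() is stable, so tied documents keep their original order.
ranking = sorted(docs, key=docs.get, reverse=True)
print(docs)
print(ranking)
```

With these numbers "Bari" scores (1.0 + 3.0) / 2 = 2.0 while the other two score 1.0 / 2 = 0.5, so the exact match moves to the top even though all three match the must clause equally.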

The result is what you would expect: Bari is the first hit (and far ahead in score :)):
"hits": {
  "total": 3,
  "max_score": 0.4339554,
  "hits": [
    {
      "_index": "attribute-tree",
      "_type": "attribute",
      "_id": "2360",
      "_score": 0.4339554,
      "_source": {
        "name": "Bari",
        "suggest": {
          "input": [
            "Bari"
          ],
          "output": "Bari"
        }
      }
    },
    {
      "_index": "attribute-tree",
      "_type": "attribute",
      "_id": "1786",
      "_score": 0.04500804,
      "_source": {
        "name": "Baritone horn",
        "suggest": {
          "input": [
            "Baritone",
            "horn"
          ],
          "output": "Baritone horn"
        }
      }
    },
    {
      "_index": "attribute-tree",
      "_type": "attribute",
      "_id": "2313",
      "_score": 0.04500804,
      "_source": {
        "name": "Baritone ukulele",
        "suggest": {
          "input": [
            "Baritone",
            "ukulele"
          ],
          "output": "Baritone ukulele"
        }
      }
    }
  ]
}

A similar question, "elasticsearch - Make full words score higher than Edge NGram subsets", can be found on Stack Overflow: https://stackoverflow.com/questions/32581426/
