lucene - Boost not working on Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 05:43:55

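I'm new to Elasticsearch and confused about how _score is calculated. I've tried to understand what is going on by reading some forum posts online (here and here), but I still have questions and can't fully solve my problem.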

Goal

Given some documents with the fields title and content, find the documents that match a query, and boost matches on the title field.

Data

PUT /sample/myType/1
{
  "title": "Blabbertalk here",
  "content": "Foobar here"
}

PUT /sample/myType/2
{
  "title": "Foobar here",
  "content": "Blabbertalk here"
}

Query

GET /sample/myType/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Blabbertalk",
              "fuzziness": 0.7,
              "boost": 2
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "Blabbertalk",
              "fuzziness": 0.7,
              "boost": 1
            }
          }
        }
      ],
      "minimum_number_should_match": 1
    }
  }
}

Result

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.095891505,
    "hits": [
      {
        "_index": "sample",
        "_type": "myType",
        "_id": "1",
        "_score": 0.095891505,
        "_source": {
          "title": "Blabbertalk here",
          "content": "Foobar here"
        }
      },
      {
        "_index": "sample",
        "_type": "myType",
        "_id": "2",
        "_score": 0.095891505,
        "_source": {
          "title": "Foobar here",
          "content": "Blabbertalk here"
        }
      }
    ]
  }
}

Questions

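Although this returns the correct result at the top, in another case it does not (which is what prompted me to ask this question). Nevertheless, this sample shows the same behaviour that confuses me.

  1. The scores of the two documents are very close (in fact identical: both are 0.095891505). I expected document 1 to score higher (roughly double), because its match is on the title field.
  2. Running the same query with the ?explain parameter shows that the boost is not applied to the _score calculation (see below). Elasticsearch does seem to recognise the boost factor (as shown by the line "description": "weight(title:blabbertalk^2.0 in 0) [PerFieldSimilarity], result of:"), but looking further into the details, no boost factor actually appears in the score calculation.
  3. Related to the above, I am suspicious of title:blabbertalk^2.0 in 0. What exactly does in 0 mean? I'm fairly sure it does not mean "matched in 0 documents". Does the 0 cancel out the boost? If so, is there a way around it?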

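Last but not least, I realise this may not be an Elasticsearch issue, since I believe Elasticsearch delegates scoring and searching to Lucene in the backend. However, I'm not very familiar with Lucene either, so I would be grateful if anyone could shed some light on this.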

Thank you very much for taking the time to read such a long question and for helping me.



Running the same query with the explain parameter

{
  "took": 40,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.095891505,
    "hits": [
      {
        "_shard": 2,
        "_node": "NaOynONhQWSvUmH77e_L9w",
        "_index": "sample",
        "_type": "myType",
        "_id": "1",
        "_score": 0.095891505,
        "_source": {
          "title": "Blabbertalk here",
          "content": "Foobar here"
        },
        "_explanation": {
          "value": 0.095891505,
          "description": "product of:",
          "details": [
            {
              "value": 0.19178301,
              "description": "sum of:",
              "details": [
                {
                  "value": 0.19178301,
                  "description": "weight(title:blabbertalk^2.0 in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.19178301,
                      "description": "fieldWeight in 0, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0"
                            }
                          ]
                        },
                        {
                          "value": 0.30685282,
                          "description": "idf(docFreq=1, maxDocs=1)"
                        },
                        {
                          "value": 0.625,
                          "description": "fieldNorm(doc=0)"
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 0.5,
              "description": "coord(1/2)"
            }
          ]
        }
      },
      {
        "_shard": 3,
        "_node": "NaOynONhQWSvUmH77e_L9w",
        "_index": "sample",
        "_type": "myType",
        "_id": "2",
        "_score": 0.095891505,
        "_source": {
          "title": "Foobar here",
          "content": "Blabbertalk here"
        },
        "_explanation": {
          "value": 0.095891505,
          "description": "product of:",
          "details": [
            {
              "value": 0.19178301,
              "description": "sum of:",
              "details": [
                {
                  "value": 0.19178301,
                  "description": "weight(content:blabbertalk in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.19178301,
                      "description": "fieldWeight in 0, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0"
                            }
                          ]
                        },
                        {
                          "value": 0.30685282,
                          "description": "idf(docFreq=1, maxDocs=1)"
                        },
                        {
                          "value": 0.625,
                          "description": "fieldNorm(doc=0)"
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 0.5,
              "description": "coord(1/2)"
            }
          ]
        }
      }
    ]
  }
}
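
The numbers in the explain output can be reproduced by hand, which may help when reading it. The sketch below assumes the index uses Lucene's classic TF-IDF similarity (the tf, idf, fieldNorm and coord lines suggest this, but it is an assumption), where idf = 1 + ln(maxDocs / (docFreq + 1)) and each clause's query weight is boost * idf * queryNorm. The function names are made up for illustration. Under that assumption, when only one clause takes part in scoring on a shard (here each hit comes from a different shard with maxDocs=1), queryNorm divides the boost back out, which would be consistent with the boost not showing up in the details.

```python
import math


def classic_idf(doc_freq, max_docs):
    # idf as documented for Lucene's classic TF-IDF similarity (assumed here).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def query_weight(boost, idf, clause_weights):
    # queryNorm = 1 / sqrt(sum of squared (boost * idf) over the clauses
    # that take part in scoring on this shard).
    query_norm = 1.0 / math.sqrt(sum(w * w for w in clause_weights))
    return boost * idf * query_norm


# Numbers copied from the explain output for document 1.
# The 0 in "fieldNorm(doc=0)" and "in 0" looks like the shard-local document number.
tf = 1.0
idf = classic_idf(doc_freq=1, max_docs=1)
field_norm = 0.625
coord = 1 / 2

field_weight = tf * idf * field_norm


def single_clause_score(boost):
    # Only one clause on the shard: its own weight is the whole normalisation sum.
    qw = query_weight(boost, idf, [boost * idf])
    return coord * qw * field_weight


for boost in (1.0, 2.0, 10.0):
    print(f"boost={boost}: score={single_clause_score(boost):.9f}")

# Contrast: if both clauses contribute on the same shard, the boost survives
# as a ratio between the two query weights.
title_w, content_w = 2.0 * idf, 1.0 * idf
ratio = (query_weight(2.0, idf, [title_w, content_w])
         / query_weight(1.0, idf, [title_w, content_w]))
print(f"title/content query weight ratio with both clauses present: {ratio:.3f}")
```

If this reading is right, the identical scores are an artefact of the tiny two-document sample spread over five shards rather than a sign that the boost is ignored in general.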

Best answer

I think the boost parameter should go in the match query itself rather than inside the field, for example:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Blabbertalk",
              "fuzziness": 0.7
            },
            "boost": 2
          }
        },
        {
          "match": {
            "content": {
              "query": "Blabbertalk",
              "fuzziness": 0.7
            },
            "boost": 1
          }
        }
      ],
      "minimum_number_should_match": 1
    }
  }
}

Alternatively, you could simplify this with multi_match instead:

{
  "multi_match": {
    "query": "Blabbertalk",
    "type": "most_fields",
    "fields": [ "title^2", "content" ],
    "fuzziness": 0.7
  }
}
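
When the per-field boosts live in application code, the fields list with its field^boost entries can be generated rather than written by hand. A minimal sketch, assuming a plain Python dict of boosts; the helper name multi_match_body is hypothetical and not part of any Elasticsearch client:

```python
import json


def multi_match_body(text, field_boosts, match_type="most_fields", fuzziness=None):
    # Hypothetical helper: turn {"title": 2, "content": 1} into
    # ["title^2", "content"], leaving out the suffix for a boost of 1.
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    ]
    body = {"query": text, "type": match_type, "fields": fields}
    if fuzziness is not None:
        body["fuzziness"] = fuzziness
    return {"multi_match": body}


query = multi_match_body("Blabbertalk", {"title": 2, "content": 1}, fuzziness=0.7)

# Wrap the clause in a top-level "query" key to get a full search request body.
print(json.dumps({"query": query}, indent=2))
```

The printed JSON is a request body that could be sent to the same /sample/myType/_search endpoint used in the question.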

Regarding lucene - Boost not working on Elasticsearch, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24690125/
