
elasticsearch - Different results for the same query in an Elasticsearch cluster

Reposted. Author: 行者123. Updated: 2023-12-02 22:23:25

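I created an Elasticsearch cluster with 3 nodes, 3 shards, and 2 replicas.
When I run the same query against the same index with the same data, it returns different results.
The results are currently sorted by the _score field in descending order (which I believe is the default), and the requirement is also that results be sorted by score in descending order.
So my questions are: why does the same query produce different results, and how can I fix this so that the same query returns the same results every time?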

Here is the query:

    {
      "from": 0,
      "size": 10,
      "query": {
        "bool": {
          "must": {
            "bool": {
              "must": {
                "terms": {
                  "context": [
                    "my name"
                  ]
                }
              },
              "should": {
                "multi_match": {
                  "query": "test",
                  "fields": [
                    "field1^2",
                    "field2^2",
                    "field3^3"
                  ]
                }
              },
              "minimum_should_match": "1"
            }
          },
          "filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "audiencecomb": [
                      "1235"
                    ]
                  }
                },
                {
                  "terms": {
                    "consumablestatus": [
                      "1"
                    ]
                  }
                }
              ],
              "minimum_should_match": "1"
            }
          }
        }
      }
    }

Best answer

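One possible cause is distributed IDF. By default, Elasticsearch computes a local IDF on each shard to save some performance, which leads to different IDF values across the cluster. So you should try ?search_type=dfs_query_then_fetch, which explicitly asks Elasticsearch to compute a global IDF. The documentation explains: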

However, for performance reasons, Elasticsearch doesn’t calculate the IDF across all documents in the index. Instead, each shard calculates a local IDF for the documents contained in that shard.

Because our documents are well distributed, the IDF for both shards will be the same. Now imagine instead that five of the foo documents are on shard 1, and the sixth document is on shard 2. In this scenario, the term foo is very common on one shard (and so of little importance), but rare on the other shard (and so much more important). These differences in IDF can produce incorrect results.
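To make the skew in this scenario concrete, here is a small sketch. It assumes the BM25 IDF formula used by Lucene's BM25 similarity, and it assumes each shard holds 10 documents. Neither the formula nor the shard sizes come from the quoted text, and `bm25_idf` is a name made up for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF in the form used by Lucene's BM25 similarity (an assumption here)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumption: every shard holds 10 documents (hypothetical sizes).
DOCS_PER_SHARD = 10

# Five "foo" documents on shard 1 and one on shard 2, as in the scenario above.
idf_shard_1 = bm25_idf(DOCS_PER_SHARD, 5)
idf_shard_2 = bm25_idf(DOCS_PER_SHARD, 1)

# Global IDF over both shards: 20 documents, 6 of which contain "foo".
idf_global = bm25_idf(2 * DOCS_PER_SHARD, 6)

print(f"shard 1 local IDF: {idf_shard_1:.3f}")
print(f"shard 2 local IDF: {idf_shard_2:.3f}")
print(f"global IDF:        {idf_global:.3f}")
```

Under these assumptions, the shard where foo is rare assigns it a noticeably higher IDF than the shard where it is common, so otherwise identical matches would score differently depending on which shard holds them.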

In practice, this is not a problem. The differences between local and global IDF diminish the more documents that you add to the index. With real-world volumes of data, the local IDFs soon even out. The problem is not that relevance is broken but that there is too little data.

For testing purposes, there are two ways we can work around this issue. The first is to create an index with one primary shard, as we did in the section introducing the match query. If you have only one shard, then the local IDF is the global IDF.
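As a rough sketch of the first workaround, the snippet below builds a create-index request with a single primary shard using only the Python standard library. The host `http://localhost:9200`, the index name `my_test_index`, and the helper `single_shard_index_request` are hypothetical and only serve the illustration.

```python
import json
import urllib.request


def single_shard_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build a PUT request that creates `index` with exactly one primary shard."""
    body = {"settings": {"number_of_shards": 1}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, for illustration only.
request = single_shard_index_request("http://localhost:9200", "my_test_index")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually create the index against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The number of primary shards is fixed when the index is created, so an existing 3-shard index would need to be recreated and reindexed to test this way.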

The second workaround is to add ?search_type=dfs_query_then_fetch to your search requests. The dfs stands for Distributed Frequency Search, and it tells Elasticsearch to first retrieve the local IDF from each shard in order to calculate the global IDF across the whole index.
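The second workaround can be sketched as follows, again with only the Python standard library. The host, the index name `my_index`, and the helper `dfs_search_request` are assumptions for illustration. The query body is a trimmed-down version of the one in the question, not the full original.

```python
import json
import urllib.parse
import urllib.request


def dfs_search_request(base_url: str, index: str, query_body: dict) -> urllib.request.Request:
    """Build a _search request that asks Elasticsearch for global term statistics."""
    params = urllib.parse.urlencode({"search_type": "dfs_query_then_fetch"})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search?{params}",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Trimmed-down version of the query from the question (illustrative only).
query_body = {
    "from": 0,
    "size": 10,
    "query": {
        "multi_match": {
            "query": "test",
            "fields": ["field1^2", "field2^2", "field3^3"],
        }
    },
}

# Hypothetical host and index name.
request = dfs_search_request("http://localhost:9200", "my_index", query_body)
print(request.get_method(), request.full_url)

# To run the search against a live cluster:
# with urllib.request.urlopen(request) as response:
#     hits = json.loads(response.read())["hits"]["hits"]
```

Because dfs_query_then_fetch adds an extra round trip to every shard before the query phase, it is probably best reserved for cases where the scoring difference actually matters.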



For more information, see here

Regarding "elasticsearch - Different results for the same query in an Elasticsearch cluster", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41909205/
