elasticsearch - ElasticSearch: wrong relevance order

Reposted. Author: 行者123. Updated: 2023-12-02 23:13:55

This is the mapping of my index:

{
  "itens" : {
    "mappings" : {
      "properties" : {
        "card_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

When I run this search:
GET itens/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "camisa",
              "_name": "camisa"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "flamengo",
              "_name": "flamengo"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "edição",
              "_name": "edição"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "torcedor",
              "_name": "torcedor"
            }
          }
        }
      ]
    }
  }
}

I get the following results:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : 3.2621913,
"hits" : [
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "lDJ-5WwBSsI9bleNzslS",
"_score" : 3.2621913,
"_source" : {
"card_id" : "centauro",
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "lzKB5WwBSsI9bleNeMnt",
"_score" : 3.0658486,
"_source" : {
"card_id" : "centauro",
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "yV4q0WwB-vWXMqGoqMdJ",
"_score" : 2.7421699,
"_source" : {
"card_id" : "centauro",
"name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor",
"flamengo"
]
},
...and some others...

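My question is: why are the second and third results ranked below the first one (with lower scores), and how can I fix it?

The second and third results each match 3 of the queries, while the first matches only 2. This is clearly the wrong relevance order, since the second and third results are more relevant to my search than the first.

I found this ElasticSearch doc about relevancies that looks wrong and I tried to search with _search?search_type=dfs_query_then_fetch, but it gets me the same results.

Edit:

I created a new index for testing with the same mapping and inserted the 3 documents I have been talking about:
Bola Nike Edição Flamengo
Camisa do Flamengo Vermelha Edição 100 Anos
Camisa Flamengo 2019 Masculina Modelo Torcedor

I ran the same query and the results came out correct, as expected. So I figured the problem might only appear when there are other documents besides these 3. I then inserted the other documents from the original index and, bang!, the problem appeared again.

I only needed to insert 2 other documents to reproduce the problem:
Camisa Palmeiras 2019 Masculina Modelo Torcedor
Camisa Internacional 2019 Masculina Modelo Torcedor

My search results look like this: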
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.6201596,
"hits" : [
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "nzKM8mwBSsI9bleNrsmM",
"_score" : 1.6201596,
"_source" : {
"card_id" : "some place",
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "gCaO8mwBepmixz6CaMCt",
"_score" : 1.5693209,
"_source" : {
"card_id" : "some place",
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "fyaN8mwBepmixz6CQcBc",
"_score" : 1.3466781,
"_source" : {
"card_id" : "some place",
"name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor",
"flamengo"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "gSaP8mwBepmixz6CbsDW",
"_score" : 0.8151792,
"_source" : {
"card_id" : "some place",
"name" : "Camisa Palmeiras 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "giaP8mwBepmixz6C4MCL",
"_score" : 0.8151792,
"_source" : {
"card_id" : "some place",
"name" : "Camisa Internacional 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor"
]
}
]
}
}

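I ran the search with ?explain=true. The result is too long to paste here in full, but here are the explanations for the first two documents in the results: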
{
"_shard" : "[teste][0]",
"_node" : "xnRySBw_T7Kjsl4wAa_2yg",
"_index" : "teste",
"_type" : "_doc",
"_id" : "nzKM8mwBSsI9bleNrsmM",
"_score" : 1.6201596,
"_source" : {
"card_id" : "some place",
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
],
"_explanation" : {
"value" : 1.6201596,
"description" : "sum of:",
"details" : [
{
"value" : 0.6173784,
"description" : "weight(name:flamengo in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.6173784,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.5389965,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 3,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.52064633,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 1.0027812,
"description" : "weight(name:edição in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 1.0027812,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.87546873,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.52064633,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
},
{
"_shard" : "[teste][0]",
"_node" : "xnRySBw_T7Kjsl4wAa_2yg",
"_index" : "teste",
"_type" : "_doc",
"_id" : "gCaO8mwBepmixz6CaMCt",
"_score" : 1.5693209,
"_source" : {
"card_id" : "some place",
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
],
"_explanation" : {
"value" : 1.5693209,
"description" : "sum of:",
"details" : [
{
"value" : 0.26523292,
"description" : "weight(name:camisa in 1) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.26523292,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 4,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.41907516,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.4969361,
"description" : "weight(name:flamengo in 1) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.4969361,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.5389965,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 3,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.41907516,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.80715185,
"description" : "weight(name:edição in 1) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.80715185,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.87546873,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.41907516,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
}

I don't know what to look for here. What I do know is that the first result should score lower than the second.

Best answer

I found this ElasticSearch doc about relevancies that looks wrong and I tried to search with _search?search_type=dfs_query_then_fetch, but it gets me the same results.



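Elasticsearch 7.0 changed the default number of primary shards to 1. So as long as you do not explicitly specify a different number, the issue described in that doc no longer applies. Your query results show that the index has the default single shard: "_shards" : { "total" : 1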
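
To illustrate why the shard count matters for relevance at all, here is a minimal Python sketch. It assumes that, with the default search type, each shard computes idf from its own documents only, while dfs_query_then_fetch first gathers term statistics across all shards. The placement of the five names from the question on two shards is hypothetical, and whitespace splitting only approximates the real analyzer.

```python
import math

def idf(n, total):
    # Same formula that the question's explain output prints:
    # log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (total - n + 0.5) / (n + 0.5))

def doc_freq(names, term):
    # Number of names containing the term (lowercase + whitespace split
    # is an assumption that approximates the default analyzer here).
    return sum(term in name.lower().split() for name in names)

# Hypothetical placement of the question's five names on two shards.
shard_a = [
    "Bola Nike Edição Flamengo",
    "Camisa do Flamengo Vermelha Edição 100 Anos",
]
shard_b = [
    "Camisa Flamengo 2019 Masculina Modelo Torcedor",
    "Camisa Palmeiras 2019 Masculina Modelo Torcedor",
    "Camisa Internacional 2019 Masculina Modelo Torcedor",
]

term = "edição"
idf_a = idf(doc_freq(shard_a, term), len(shard_a))
idf_b = idf(doc_freq(shard_b, term), len(shard_b))
idf_global = idf(doc_freq(shard_a + shard_b, term), len(shard_a) + len(shard_b))

print(f"shard A idf: {idf_a:.4f}")
print(f"shard B idf: {idf_b:.4f}")
print(f"global idf:  {idf_global:.4f}")
```

In this sketch the same term gets a different idf on each shard, so equal documents could score differently depending on where they live. With a single shard the local and global statistics are identical, which is consistent with dfs_query_then_fetch making no difference in the question.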

First, let's create a minimal reproducible example.

Mapping:
PUT itens
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}

Sample documents:
PUT itens/_doc/1
{
  "name": "Bola Nike Edição Flamengo"
}
PUT itens/_doc/2
{
  "name": "Camisa do Flamengo Vermelha Edição 100 Anos"
}
PUT itens/_doc/3
{
  "name": "Camisa Flamengo 2019 Masculina Modelo Torcedor"
}

Using the query you provided above, I get the following results:
"hits" : [
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.5471338,
"_source" : {
"name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.97927666,
"_source" : {
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6860854,
"_source" : {
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
]
}
]

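So with the minimal example you get the results you expect.

To debug what is happening with your query, add the ?explain=true parameter so that the full line reads GET itens/_search?explain=true. This adds a lot of output, but it should explain much better what is going on. Please add that output to your original question, and leave a comment if the results are unclear so we can take another look.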
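
As a rough aid for reading that explain output, the following Python sketch recomputes the scores from the idf and tf formulas that the output itself prints. It is not Elasticsearch code. The constants k1 = 1.2, b = 0.75 and the boost of 2.2 are copied from the explain output in the question, and the tokenizer (lowercase, split on whitespace) is an assumption that happens to give the same field lengths for these names.

```python
import math

K1, B, BOOST = 1.2, 0.75, 2.2  # values as reported in the explain output

def tokenize(text):
    # Assumption: approximates the default analyzer for these product names.
    return text.lower().split()

def bm25_scores(names, query_terms):
    """Score each name as the sum over matching terms of boost * idf * tf."""
    docs = [tokenize(name) for name in names]
    total = len(docs)                              # N in the explain output
    avgdl = sum(len(d) for d in docs) / total      # average field length
    scores = {}
    for name, tokens in zip(names, docs):
        score = 0.0
        for term in query_terms:
            freq = tokens.count(term)
            if freq == 0:
                continue                           # clause does not match
            n = sum(term in d for d in docs)       # docs containing the term
            idf = math.log(1 + (total - n + 0.5) / (n + 0.5))
            tf = freq / (freq + K1 * (1 - B + B * len(tokens) / avgdl))
            score += BOOST * idf * tf
        scores[name] = score
    return scores

terms = ["camisa", "flamengo", "edição", "torcedor"]
three = [
    "Bola Nike Edição Flamengo",
    "Camisa do Flamengo Vermelha Edição 100 Anos",
    "Camisa Flamengo 2019 Masculina Modelo Torcedor",
]
five = three + [
    "Camisa Palmeiras 2019 Masculina Modelo Torcedor",
    "Camisa Internacional 2019 Masculina Modelo Torcedor",
]

for label, corpus in (("3 documents", three), ("5 documents", five)):
    print(label)
    ranked = sorted(bm25_scores(corpus, terms).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"  {score:.4f}  {name}")
```

Under these assumptions the sketch should agree, up to floating-point rounding, with the scores shown for the three-document index above and for the five-document teste index in the question. It also suggests where the order comes from: once the two extra "Camisa ... Torcedor" documents are added, camisa and torcedor become common terms with a low idf, and the short four-token name gets a higher tf (0.5206 versus 0.4191 in the explain output). Two matches on a short field can therefore outscore three matches on a longer one, because the score is a sum of weighted term scores rather than a count of matched clauses.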

Regarding elasticsearch - ElasticSearch: wrong relevance order, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57734922/