elasticsearch - ElasticSearch: wrong relevance order

Reposted by: 行者123 Updated: 2023-12-02 23:13:55

This is the mapping of my index:

{
  "itens" : {
    "mappings" : {
      "properties" : {
        "card_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

When I run this search:

GET itens/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "camisa",
              "_name": "camisa"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "flamengo",
              "_name": "flamengo"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "edição",
              "_name": "edição"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "torcedor",
              "_name": "torcedor"
            }
          }
        }
      ]
    }
  }
}

I get the following results:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 3.2621913,
    "hits" : [
      {
        "_index" : "itens",
        "_type" : "_doc",
        "_id" : "lDJ-5WwBSsI9bleNzslS",
        "_score" : 3.2621913,
        "_source" : {
          "card_id" : "centauro",
          "name" : "Bola Nike Edição Flamengo"
        },
        "matched_queries" : [
          "edição",
          "flamengo"
        ]
      },
      {
        "_index" : "itens",
        "_type" : "_doc",
        "_id" : "lzKB5WwBSsI9bleNeMnt",
        "_score" : 3.0658486,
        "_source" : {
          "card_id" : "centauro",
          "name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
        },
        "matched_queries" : [
          "camisa",
          "edição",
          "flamengo"
        ]
      },
      {
        "_index" : "itens",
        "_type" : "_doc",
        "_id" : "yV4q0WwB-vWXMqGoqMdJ",
        "_score" : 2.7421699,
        "_source" : {
          "card_id" : "centauro",
          "name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
        },
        "matched_queries" : [
          "camisa",
          "torcedor",
          "flamengo"
        ]
      },
      ...and some others...

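My question is: why are the second and third results ranked lower (with lower scores) than the first one, and how can I fix this?

The second and third results each match 3 of the queries, while the first matches only 2. This is clearly the wrong relevance order, because the second and third results are more relevant to my search than the first.

I found this ElasticSearch doc about relevancies that looks wrong and tried searching with _search?search_type=dfs_query_then_fetch, but I got the same results.

Edit: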

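I created a new test index with the same mapping and inserted the 3 documents I was talking about: Bola Nike Edição Flamengo, Camisa do Flamengo Vermelha Edição 100 Anos and Camisa Flamengo 2019 Masculina Modelo Torcedor.

I ran the same query and the results came back in the correct order, as expected. So I thought the problem might only appear when the index holds documents other than these 3. I inserted the other documents from the original index and, bang!, the problem came back.

I only needed to insert 2 other documents to reproduce the problem: Camisa Palmeiras 2019 Masculina Modelo Torcedor and Camisa Internacional 2019 Masculina Modelo Torcedor.

My search results now look like this: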

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.6201596,
    "hits" : [
      {
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "nzKM8mwBSsI9bleNrsmM",
        "_score" : 1.6201596,
        "_source" : {
          "card_id" : "some place",
          "name" : "Bola Nike Edição Flamengo"
        },
        "matched_queries" : [
          "edição",
          "flamengo"
        ]
      },
      {
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "gCaO8mwBepmixz6CaMCt",
        "_score" : 1.5693209,
        "_source" : {
          "card_id" : "some place",
          "name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
        },
        "matched_queries" : [
          "camisa",
          "edição",
          "flamengo"
        ]
      },
      {
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "fyaN8mwBepmixz6CQcBc",
        "_score" : 1.3466781,
        "_source" : {
          "card_id" : "some place",
          "name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
        },
        "matched_queries" : [
          "camisa",
          "torcedor",
          "flamengo"
        ]
      },
      {
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "gSaP8mwBepmixz6CbsDW",
        "_score" : 0.8151792,
        "_source" : {
          "card_id" : "some place",
          "name" : "Camisa Palmeiras 2019 Masculina Modelo Torcedor"
        },
        "matched_queries" : [
          "camisa",
          "torcedor"
        ]
      },
      {
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "giaP8mwBepmixz6C4MCL",
        "_score" : 0.8151792,
        "_source" : {
          "card_id" : "some place",
          "name" : "Camisa Internacional 2019 Masculina Modelo Torcedor"
        },
        "matched_queries" : [
          "camisa",
          "torcedor"
        ]
      }
    ]
  }
}

I ran the search with ?explain=true. The result is too long to paste here in full, but here is the explanation for the first two documents in the results:

{
        "_shard" : "[teste][0]",
        "_node" : "xnRySBw_T7Kjsl4wAa_2yg",
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "nzKM8mwBSsI9bleNrsmM",
        "_score" : 1.6201596,
        "_source" : {
          "card_id" : "some place",
          "name" : "Bola Nike Edição Flamengo"
        },
        "matched_queries" : [
          "edição",
          "flamengo"
        ],
        "_explanation" : {
          "value" : 1.6201596,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.6173784,
              "description" : "weight(name:flamengo in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.6173784,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.5389965,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 3,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 5,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.52064633,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 4.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.8,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 1.0027812,
              "description" : "weight(name:edição in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 1.0027812,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.87546873,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 2,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 5,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.52064633,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 4.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.8,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[teste][0]",
        "_node" : "xnRySBw_T7Kjsl4wAa_2yg",
        "_index" : "teste",
        "_type" : "_doc",
        "_id" : "gCaO8mwBepmixz6CaMCt",
        "_score" : 1.5693209,
        "_source" : {
          "card_id" : "some place",
          "name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
        },
        "matched_queries" : [
          "camisa",
          "edição",
          "flamengo"
        ],
        "_explanation" : {
          "value" : 1.5693209,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.26523292,
              "description" : "weight(name:camisa in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.26523292,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.2876821,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 5,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.41907516,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 7.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.8,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.4969361,
              "description" : "weight(name:flamengo in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.4969361,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.5389965,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 3,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 5,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.41907516,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 7.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.8,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.80715185,
              "description" : "weight(name:edição in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.80715185,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.87546873,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 2,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 5,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.41907516,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 7.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.8,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }

I don't know what to look for here. What I do know is that the first result should score lower than the second.

Best answer

I found this ElasticSearch doc about relevancies that looks wrong and I tried to search with _search?search_type=dfs_query_then_fetch, but it gets me the same results.

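Elasticsearch 7.0 changed the default number of primary shards to 1. So as long as you did not explicitly specify a different number, you should no longer run into this problem. In your query result you can see that you have only the default single shard: "_shards" : { "total" : 1.

First, let's create a minimal reproducible example.

Mapping: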

PUT itens
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}

Sample documents:

PUT itens/_doc/1
{
  "name": "Bola Nike Edição Flamengo"
}
PUT itens/_doc/2
{
  "name": "Camisa do Flamengo Vermelha Edição 100 Anos"
}
PUT itens/_doc/3
{
  "name": "Camisa Flamengo 2019 Masculina Modelo Torcedor"
}

I'm using the query you provided above and get the following results:

"hits" : [
  {
    "_index" : "itens",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.5471338,
    "_source" : {
      "name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
    },
    "matched_queries" : [
      "camisa",
      "torcedor",
      "flamengo"
    ]
  },
  {
    "_index" : "itens",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.97927666,
    "_source" : {
      "name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
    },
    "matched_queries" : [
      "camisa",
      "edição",
      "flamengo"
    ]
  },
  {
    "_index" : "itens",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.6860854,
    "_source" : {
      "name" : "Bola Nike Edição Flamengo"
    },
    "matched_queries" : [
      "edição",
      "flamengo"
    ]
  }
]

So with the minimal example, you get the results you expect.
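If the ranking should follow the number of matched clauses rather than the BM25 weight of each term, one option that is not part of the test above is to wrap every clause in a constant_score query, so that each match adds the same fixed amount to the score. The Python sketch below only builds and prints such a request body. The helper name count_matches_query is hypothetical, and the approach itself is an assumption about what "more relevant" should mean here, not something the original answer tested.

```python
import json


def count_matches_query(field, terms):
    """Build a bool/should body in which every matching term adds exactly 1.0.

    Hypothetical helper: it only assembles the JSON body and does not
    talk to a cluster.
    """
    should = [
        {
            "constant_score": {
                # The match clause only decides whether the document matches;
                # its BM25 score is discarded and replaced by the boost.
                "filter": {"match": {field: {"query": term, "_name": term}}},
                "boost": 1.0,
            }
        }
        for term in terms
    ]
    return {"query": {"bool": {"should": should}}}


body = count_matches_query("name", ["camisa", "flamengo", "edição", "torcedor"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed body can be sent with GET itens/_search. Documents matching the same number of terms then share a score, so the order among them is no longer determined by relevance and may need an explicit tie-breaker.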

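To debug what is happening with your query, add the ?explain=true parameter so that the whole line looks like GET itens/_search?explain=true. This adds a lot of output, but it should explain much better what is going on. Please add that output to your original question, and if the result is unclear, leave a comment so we can take another look.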
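The explain output that the question's edit includes for the 5-document teste index can be recomputed by hand. The following is a minimal Python sketch that uses only the idf and tf formulas and the values (boost, n, N, dl, avgdl, k1, b) printed in that output. The function and variable names are illustrative, not part of Elasticsearch.

```python
import math


def bm25_term_score(n, N, dl, avgdl, freq=1.0, k1=1.2, b=0.75, boost=2.2):
    """Score of one term in one document, following the formulas in the explain output."""
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf


# Values copied from the explain output for index "teste".
N = 5          # total number of documents with the field
AVGDL = 5.8    # average length of the field
DOC_FREQ = {"flamengo": 3, "edição": 2, "camisa": 4}  # n per term


def doc_score(terms, dl):
    """Sum of the per-term scores, as in the 'sum of:' node of the explanation."""
    return sum(bm25_term_score(DOC_FREQ[t], N, dl, AVGDL) for t in terms)


bola = doc_score(["flamengo", "edição"], dl=4.0)
camisa = doc_score(["camisa", "flamengo", "edição"], dl=7.0)
print(f"Bola Nike Edição Flamengo: {bola:.4f}")
print(f"Camisa do Flamengo Vermelha Edição 100 Anos: {camisa:.4f}")
```

Run as written, this should reproduce the reported _score values of 1.6201596 and 1.5693209 to within rounding. It also shows where the order comes from. The shorter name (dl 4.0) gets a tf factor of about 0.52 for every matching term, against about 0.42 for the longer name (dl 7.0). And because "camisa" occurs in 4 of the 5 documents, its idf is low, so the third match adds only about 0.27, which is less than the longer document loses on "flamengo" and "edição".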

Regarding elasticsearch - ElasticSearch: wrong relevance order, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57734922/
