elasticsearch - Make full words score higher than Edge NGram subsets


Reposted by: 行者123   Updated: 2023-12-02 22:50:39

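I am trying to get a higher score for documents that match the full name than for documents that only match an Edge NGram subset of the same value.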

The results are:

Pos Name              _score       _id
1   Baritone horn     7.56878      1786
2   Baritone ukulele  7.56878      2313
3   Bari              7.56878      2360
4   Baritone voice    7.56878      1787

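I wanted the third one ("Bari") to have a higher score because it is the full name. However, because of the edge ngram decomposition, all the other words are also indexed with the token "bari". That is why all the scores in the table above are equal, and I don't even know how Elasticsearch orders them, since the results are sorted neither by _id nor by name.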

How can I achieve this?

Thanks

Sample "code"

Settings:

{
  "analysis": {
    "filter": {
      "edgeNGram_filter": {
        "type": "edgeNGram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
          "letter",
          "digit",
          "punctuation",
          "symbol"
        ]
      }
    },
    "analyzer": {
      "edgeNGram_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding",
          "edgeNGram_filter"
        ]
      },
      "whitespace_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}
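To see why every document ties, here is a rough, stdlib-only Python sketch of what `edgeNGram_analyzer` above emits at index time and what `whitespace_analyzer` emits at search time. It is an approximation, not Elasticsearch's implementation: `asciifolding` is approximated with Unicode NFKD decomposition, and the sketch ignores `token_chars`, which as far as I know is a parameter of the edge n-gram *tokenizer* rather than of the token filter. The function names are hypothetical.

```python
import unicodedata

MIN_GRAM = 3   # mirrors "min_gram" in edgeNGram_filter
MAX_GRAM = 20  # mirrors "max_gram" in edgeNGram_filter


def fold(token: str) -> str:
    """Rough stand-in for lowercase + asciifolding (NFKD, drop non-ASCII)."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return decomposed.encode("ascii", "ignore").decode("ascii")


def edge_ngrams(token: str) -> list[str]:
    """Prefixes of length MIN_GRAM..MAX_GRAM; shorter tokens yield nothing."""
    upper = min(MAX_GRAM, len(token))
    return [token[:n] for n in range(MIN_GRAM, upper + 1)]


def index_analyzer(text: str) -> list[str]:
    """Approximation of edgeNGram_analyzer: whitespace split, fold, edge n-grams."""
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(edge_ngrams(fold(word)))
    return tokens


def search_analyzer(text: str) -> list[str]:
    """Approximation of whitespace_analyzer: whitespace split, fold, no n-grams."""
    return [fold(word) for word in text.split()]


if __name__ == "__main__":
    query_terms = search_analyzer("Bari")
    for name in ["Baritone horn", "Baritone ukulele", "Bari", "Baritone voice"]:
        indexed = index_analyzer(name)
        print(f"{name!r}: tokens={indexed} matches={all(t in indexed for t in query_terms)}")
```

The single query term `bari` appears once in the indexed tokens of every name, so all four documents match in the same way; nothing in these tokens tells a whole-word match apart from a prefix match.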


Mapping:

{
  "name": {
    "type": "string",
    "index": "not_analyzed"
  },
  "suggest": {
    "type": "completion",
    "index_analyzer": "edgeNGram_analyzer",
    "search_analyzer": "whitespace_analyzer",
    "payloads": true
  }
}

Query:

POST /attribute-tree/attribute/_search
{
  "query": {
    "match": {
      "suggest": "Bari"
    }
  }
}

Result:

(only the relevant data is kept)

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 7.56878,
    "hits": [
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1786",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone horn",
          "suggest": {
            "input": [
              "Baritone",
              "horn"
            ],
            "output": "Baritone horn"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2313",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone ukulele",
          "suggest": {
            "input": [
              "Baritone",
              "ukulele"
            ],
            "output": "Baritone ukulele"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2360",
        "_score": 7.56878,
        "_source": {
          "name": "Bari",
          "suggest": {
            "input": [
              "Bari"
            ],
            "output": "Bari"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1787",
        "_score": 7.568078,
        "_source": {
          "name": "Baritone voice",
          "suggest": {
            "input": [
              "Baritone",
              "voice"
            ],
            "output": "Baritone voice"
          }
        }
      }
    ]
  }
}

Best answer

You can use the bool query and its should clause to add score to exact matches, like this:

POST /attribute-tree/attribute/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "suggest": "Bari"
          }
        }
      ],
      "should": [
        {
          "match": {
            "name": "Bari"
          }
        }
      ]
    }
  }
}
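If the search term comes from user input, the same request body can be built programmatically. This is a minimal stdlib-only Python sketch assuming the field names `suggest` and `name` from the mapping above; `build_query` is a hypothetical helper, and sending the body to `POST /attribute-tree/attribute/_search` is left to whatever HTTP or Elasticsearch client you use.

```python
import json


def build_query(term: str) -> dict:
    """Build the answer's bool query for an arbitrary search term.

    must   -> edge n-gram field "suggest": every prefix match is returned.
    should -> not_analyzed field "name": only an exact, case-sensitive
              match of the full stored value adds to the score.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"suggest": term}}],
                "should": [{"match": {"name": term}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("Bari"), indent=2))
```

Because `name` is `not_analyzed`, the should clause only fires when the term equals the stored name exactly, including case: a lowercase `bari` would still match the must clause but would receive no boost.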

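The query in the should clause is what Elasticsearch: The Definitive Guide calls a signal clause, and it is how you can distinguish exact matches from ngram matches. You will still get every document that matches the must clause, but the documents that also match the should clause will score higher because of the bool query scoring formula: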

score = ("must" queries total score + matching "should" queries total score) * (number of matching clauses) / (total number of "must" and "should" clauses)
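A toy Python sketch of the classic bool scoring described in the Definitive Guide: add the scores of the matching clauses, multiply by the number of matching clauses, and divide by the total number of clauses. The clause scores below are made-up illustrative values, not real Lucene scores. This assumes the coord-based scoring of the Elasticsearch versions of that era; as far as I know, newer versions dropped the coord factor and simply sum the matching clauses, which still ranks the exact match first.

```python
from typing import Optional


def bool_score(must: list[float], should: list[Optional[float]]) -> float:
    """Classic bool scoring: sum of matching clause scores * matching / total.

    `must` holds the scores of the must clauses (all matched, otherwise the
    document is not returned at all); `should` holds one entry per should
    clause, with None where that clause did not match.
    """
    matching_should = [s for s in should if s is not None]
    total_clauses = len(must) + len(should)
    if total_clauses == 0:
        return 0.0
    matching_clauses = len(must) + len(matching_should)
    summed = sum(must) + sum(matching_should)
    return summed * matching_clauses / total_clauses


if __name__ == "__main__":
    # Hypothetical clause scores: the same edge n-gram score for every
    # document, plus an exact-name score that only "Bari" receives.
    ngram_score = 1.0
    exact_score = 4.0
    print("Bari:         ", bool_score([ngram_score], [exact_score]))
    print("Baritone horn:", bool_score([ngram_score], [None]))
```

A document that misses the should clause is penalized twice: it loses that clause's score and its remaining score is halved by the coord factor, which is why the gap in the results below is so large.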

The result is what you expect: Bari is the first result (and by a wide margin :)):

"hits": {
      "total": 3,
      "max_score": 0.4339554,
      "hits": [
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "2360",
            "_score": 0.4339554,
            "_source": {
               "name": "Bari",
               "suggest": {
                  "input": [
                     "Bari"
                  ],
                  "output": "Bari"
               }
            }
         },
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "1786",
            "_score": 0.04500804,
            "_source": {
               "name": "Baritone horn",
               "suggest": {
                  "input": [
                     "Baritone",
                     "horn"
                  ],
                  "output": "Baritone horn"
               }
            }
         },
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "2313",
            "_score": 0.04500804,
            "_source": {
               "name": "Baritone ukulele",
               "suggest": {
                  "input": [
                     "Baritone",
                     "ukulele"
                  ],
                  "output": "Baritone ukulele"
               }
            }
         }
      ]
}

For the original Stack Overflow question "elasticsearch - Make full words score higher than Edge NGram subsets", see: https://stackoverflow.com/questions/32581426/
