elasticsearch - ElasticSearch: partial/full scoring with edge_ngram and fuzziness

Reposted by: 行者123. Last updated: 2023-12-02 22:40:15

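In ElasticSearch, I am trying to get sensible scoring when combining edge_ngram with fuzziness. I want an exact match to score highest and partial (prefix) matches to score lower. My settings, mappings, and the resulting scores are below.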

settings: {
          number_of_shards: 1,
          analysis: {
             filter: {
                ngram_filter: {
                   type: 'edge_ngram',
                   min_gram: 2,
                   max_gram: 20
                }
             },
             analyzer: {
                ngram_analyzer: {
                   type: 'custom',
                   tokenizer: 'standard',
                   filter: [
                      'lowercase',
                      'ngram_filter'
                   ]
                }
             }
          }
       },
    mappings: [{
          name: 'voter',
          _all: {
                'type': 'string',
                'index_analyzer': 'ngram_analyzer',
                'search_analyzer': 'standard'
             },
             properties: {
                last: {
                   type: 'string',
                   required : true,
                   include_in_all: true,
                   term_vector: 'yes',
                   index_analyzer: 'ngram_analyzer',
                   search_analyzer: 'standard'
                },
                first: {
                   type: 'string',
                   required : true,
                   include_in_all: true,
                   term_vector: 'yes',
                   index_analyzer: 'ngram_analyzer',
                   search_analyzer: 'standard'
                },

             }

       }]
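To make the effect of ngram_filter concrete, here is a minimal Python sketch of the tokens it would place in the index for a single name. It is only an approximation: whitespace splitting stands in for the standard tokenizer, and the function names edge_ngrams and ngram_analyze are hypothetical helpers, not part of Elasticsearch.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngram_analyze(text):
    """Rough stand-in for ngram_analyzer: split, lowercase, then edge n-gram.

    Assumption: whitespace splitting approximates the standard tokenizer,
    which is close enough for single-word names.
    """
    grams = []
    for word in text.split():
        grams.extend(edge_ngrams(word.lower()))
    return grams


print(ngram_analyze("Michael"))
# ['mi', 'mic', 'mich', 'micha', 'michae', 'michael']
```

Every prefix from two characters upward is indexed as its own term, so the full name "michael" is just one term among six in the field.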

After POSTing a document whose first name is "Michael", I ran the query below, changing the query text in turn to "Michael", "Michae", "Micha", "Mich", "Mic", and "Mi".

GET voter/voter/_search
{
 "query": {
    "match": {
      "_all": {
        "query": "Michael",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}

The scores I get are:

-"Michael": 0.19535106
-"Michae": 0.2242768
-"Micha": 0.24513611
-"Mich": 0.22340237
-"Mic": 0.21408978
-"Mi": 0.15438235

As you can see, the scores do not come out as expected. I want "Michael" to score highest and "Mi" to score lowest.

Any help would be much appreciated!
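One plausible reading of these numbers, offered as an editorial sketch rather than something stated in the original thread: with fuzziness 2 and prefix_length 1, each query term can match every indexed edge n-gram within two edits that shares its first character, and mid-length terms have the most such neighbours. The Python below counts those neighbours for each query term. It assumes the index holds only the n-grams of "michael" and uses plain Levenshtein distance (Elasticsearch also counts transpositions by default, which makes no difference for these prefix strings). The names edit_distance, INDEXED, and fuzzy_hits are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Assumption: the only indexed terms are the edge n-grams of "michael".
INDEXED = ["mi", "mic", "mich", "micha", "michae", "michael"]


def fuzzy_hits(term, max_edits=2, prefix_length=1):
    """Indexed terms sharing the required prefix and within max_edits of term."""
    term = term.lower()  # the standard search analyzer lowercases the query
    return [
        gram for gram in INDEXED
        if gram[:prefix_length] == term[:prefix_length]
        and edit_distance(term, gram) <= max_edits
    ]


for query in ["Michael", "Michae", "Micha", "Mich", "Mic", "Mi"]:
    print(query, len(fuzzy_hits(query)))
# Michael 3, Michae 4, Micha 5, Mich 5, Mic 4, Mi 3
```

The counts rise and fall in roughly the same pattern as the reported scores, peaking around "Micha". The real score also depends on term statistics and normalization, so this is a hint at the cause, not a reproduction of the scoring formula.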

Accepted answer

One way to solve this is to add a raw version of the text to the mapping, like this:

                   last: {
                       type: 'string',
                       required : true,
                       include_in_all: true,
                       term_vector: 'yes',
                       index_analyzer: 'ngram_analyzer',
                       search_analyzer: 'standard',
                       "fields": {
                            "raw": { 
                               "type":  "string"  // indexed with the standard analyzer
                              }
                          }
                    },
                    first: {
                       type: 'string',
                       required : true,
                       include_in_all: true,
                       term_vector: 'yes',
                       index_analyzer: 'ngram_analyzer',
                       search_analyzer: 'standard',
                       "fields": {
                            "raw": { 
                               "type":  "string"  // indexed with the standard analyzer
                              }
                          }
                    },

You can also set index: not_analyzed on the raw field so that it matches only the exact, unanalyzed value.

Then you can query like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": {
              "query": "Michael",
              "fuzziness": 2,
              "prefix_length": 1
            }
          }
        },
        {
          "match": {
            "last.raw": {
              "query": "Michael",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "first.raw": {
              "query": "Michael",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}

Documents that match more clauses receive a higher score.
You can set the boost to whatever value suits your needs.
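If you need to run this query for many inputs, the body can be built programmatically. The following is a minimal Python sketch that produces the same JSON structure as the query in this answer. The function name build_query and its parameters are hypothetical, and the field names last.raw and first.raw assume the mapping shown in this answer.

```python
import json


def build_query(text, fuzziness=2, prefix_length=1, raw_boost=5,
                raw_fields=("last.raw", "first.raw")):
    """Build a bool/should query: fuzzy n-gram match plus boosted raw matches."""
    should = [
        {
            "match": {
                "_all": {
                    "query": text,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                }
            }
        }
    ]
    # Each raw sub-field only matches the whole name, so a full-name query
    # picks up these boosted clauses while a short prefix does not.
    for field in raw_fields:
        should.append({"match": {field: {"query": text, "boost": raw_boost}}})
    return {"query": {"bool": {"should": should}}}


print(json.dumps(build_query("Michael"), indent=2))
```

Because the clauses sit under should, a query for "Mich" still matches through the _all clause alone, while a query for "Michael" can also match a raw clause and rank higher.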

This post on "ElasticSearch: partial/full scoring with edge_ngram and fuzziness" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/33833781/
