
elasticsearch - Prioritize certain fields in ES search results


I am using elasticsearch-6.4.3. I created an index named flight-location_methods with the following settings:

settings index: {
  analysis: {
    "filter": {
      "autocomplete_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 20
      }
    },
    "analyzer": {
      "autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "autocomplete_filter"]
      }
    }
  }
}

mapping do
  indexes :airport_code, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
  indexes :airport_name, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
  indexes :city_name, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
  indexes :country_name, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
end

The snippets above come from the Ruby code that represents the settings and mapping I created for the index.

When I execute this query:
GET /flight-location_methods/_search
{
  "from": 0,
  "size": 1000,
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {
            "match": {
              "city_name": "new yo"
            }
          },
          "weight": 50
        },
        {
          "filter": {
            "match": {
              "country_name": "new yo"
            }
          },
          "weight": 50
        }
      ],
      "max_boost": 200,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 10
    }
  }
}

I get these results:
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "tcoj1G0Bdo5Q9AduxCKi",
  "_score": 50,
  "_source": {
    "airport_name": "Ouvea",
    "airport_code": "UVE",
    "city_name": "Ouvea",
    "country_name": "New Caledonia"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "zMoj1G0Bdo5Q9AduxCKi",
  "_score": 50,
  "_source": {
    "airport_name": "Palmerston North",
    "airport_code": "PMR",
    "city_name": "Palmerston North",
    "country_name": "New Zealand"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "1Moj1G0Bdo5Q9AduxCKi",
  "_score": 50,
  "_source": {
    "airport_name": "Westport",
    "airport_code": "WSZ",
    "city_name": "Westport",
    "country_name": "New Zealand"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "1coj1G0Bdo5Q9AduxCKi",
  "_score": 50,
  "_source": {
    "airport_name": "Whangarei",
    "airport_code": "WRE",
    "city_name": "Whangarei",
    "country_name": "New Zealand"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "Rsoj1G0Bdo5Q9AduxCOi",
  "_score": 50,
  "_source": {
    "airport_name": "Municipal",
    "airport_code": "RNH",
    "city_name": "New Richmond",
    "country_name": "United States"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "fsoj1G0Bdo5Q9AduxCOi",
  "_score": 50,
  "_source": {
    "airport_name": "New London",
    "airport_code": "GON",
    "city_name": "New London",
    "country_name": "United States"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "gMoj1G0Bdo5Q9AduxCOi",
  "_score": 50,
  "_source": {
    "airport_name": "New Ulm",
    "airport_code": "ULM",
    "city_name": "New Ulm",
    "country_name": "United States"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "5coj1G0Bdo5Q9AduxCSi",
  "_score": 50,
  "_source": {
    "airport_name": "Cape Newenham",
    "airport_code": "EHM",
    "city_name": "Cape Newenham",
    "country_name": "United States"
  }
},
{
  "_index": "flight-location_methods",
  "_type": "_doc",
  "_id": "Ycoj1G0Bdo5Q9AduxCWi",
  "_score": 50,
  "_source": {
    "airport_name": "East 60th Street H/P",
    "airport_code": "JRE",
    "city_name": "New York",
    "country_name": "United States"
  }
}

As you can see, New York should be at the top, but it is not.

Also, I cannot use the AND operator, because if the search text contains multiple words I want any of those words to match in any of the fields. However, if all of the search text is found in a single field, that document should get higher priority.

Best Answer

Let's first discuss the Elasticsearch tokenizer and the tokenization process:

A tokenizer receives a stream of characters and breaks it up into individual tokens (usually individual words). — ES docs



Now let's describe how the autocomplete analyzer works:
  • The standard tokenizer splits the incoming text into tokens, using the standard Elasticsearch tokenizer (for simplicity, let's say these are words).
  • The lowercase filter lowercases all characters.
  • The edge_ngram filter then breaks each word into tokens (grams).

  • Here is where the magic starts: I think your gram range of 1 to 20 is too broad. There may be words with more than 10 characters, but those long grams are irrelevant for our purposes. Likewise, grams consisting of a single character are not useful to us. I would change it as follows (a quick _analyze check follows after this list):
       "filter": {
    "autocomplete_filter": {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 5
    }
    }
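
As a sanity check (a sketch only, assuming the index has been recreated with the 2-5 gram filter above and still exposes the autocomplete analyzer under the index name from the question), the _analyze API shows exactly which grams get indexed:

GET /flight-location_methods/_analyze
{
  "analyzer": "autocomplete",
  "text": "New York"
}

For "New York" this should return roughly the grams ne, new, yo, yor and york, which is why a partial input such as "new yo" can still match the city on the ngram field.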

Our index will then contain many word fragments, 2 to 5 characters long. Now that we know what we will be searching against, we can create the mapping and write the query:
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 5
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "airport_name": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "autocomplete"
            }
          }
        },
        "airport_code": {
          "type": "keyword",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "autocomplete"
            }
          }
        },
        "city_name": {
          "type": "keyword",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "autocomplete"
            }
          }
        },
        "country_name": {
          "type": "keyword",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "autocomplete"
            }
          }
        }
      }
    }
  }
}

I defined each field with both an ngram sub-field and a regular field, to keep the ability to run aggregations. For example, it is nice to be able to find cities that have more than one airport (a sketch of such an aggregation follows below).
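
A minimal sketch of such an aggregation, assuming the keyword mapping above and the same index name (the aggregation name cities_with_multiple_airports is just illustrative):

GET /flight-location_methods/_search
{
  "size": 0,
  "aggs": {
    "cities_with_multiple_airports": {
      "terms": {
        "field": "city_name",
        "min_doc_count": 2,
        "size": 20
      }
    }
  }
}

Because city_name is a plain keyword field here, the terms aggregation can bucket documents by exact city, and min_doc_count: 2 keeps only cities served by at least two airports.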

Now we can run a simple query to get New York:
{
  "size": 20,
  "query": {
    "query_string": {
      "default_field": "city_name.ngram",
      "query": "new yo",
      "default_operator": "AND"
    }
  }
}

Response:
{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 13.896059,
    "hits": [
      {
        "_index": "test-index",
        "_type": "_doc",
        "_id": "BtBD2W0BCDulLSY6pKM8",
        "_score": 13.896059,
        "_source": {
          "airport_name": "Flushing",
          "airport_code": "FLU",
          "city_name": "New York",
          "country_name": "United States"
        }
      }
    ]
  }
}

Alternatively, use boosting to build a boosted query. This will also be more efficient when querying a large list of data.

Your query should look like this:
{
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "new yo",
          "analyzer": "autocomplete"
        }
      },
      "functions": [
        {
          "filter": {
            "terms": {
              "city_name.ngram": [
                "new",
                "yo"
              ]
            }
          },
          "weight": 2
        },
        {
          "filter": {
            "terms": {
              "country_name.ngram": [
                "new",
                "yo"
              ]
            }
          },
          "weight": 2
        }
      ],
      "max_boost": 30,
      "min_score": 5,
      "score_mode": "max",
      "boost_mode": "multiply"
    }
  }
}

In this query, New York will come first because the query part filters out all irrelevant documents, and the city_name.ngram field score is multiplied by 2; since both tokens match in that field, it gets the highest score. The min_score at the bottom of the query also filters out documents that are not relevant enough. You can read about the current Elasticsearch relevance algorithm here.
By the way, I would not put both filters into functions with the same weight. You should decide which field is more important; that makes your search clearer.
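
A sketch of that last point (the weights 3 and 2 are arbitrary, chosen only to illustrate preferring a city match over a country match; everything else is taken from the query above):

GET /flight-location_methods/_search
{
  "query": {
    "function_score": {
      "query": {
        "query_string": { "query": "new yo", "analyzer": "autocomplete" }
      },
      "functions": [
        { "filter": { "terms": { "city_name.ngram": ["new", "yo"] } }, "weight": 3 },
        { "filter": { "terms": { "country_name.ngram": ["new", "yo"] } }, "weight": 2 }
      ],
      "max_boost": 30,
      "min_score": 5,
      "score_mode": "max",
      "boost_mode": "multiply"
    }
  }
}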

Regarding elasticsearch - Prioritize certain fields in ES search results, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58424832/
