elasticsearch - Elasticsearch multi_match with regular expressions

Reposted by: 行者123 · Updated: 2023-12-02 23:24:07


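I am trying to rebuild my Elasticsearch query because I found that I am not receiving all the documents I am looking for.

So, let's assume I have a document like this: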

{
  "id": 1234,
  "mail_id": 5,
  "sender": "john smith",
  "email": "johnsmith@gmail.com",
  "subject": "somesubject",
  "txt": "abcdefgh\r\n",
  "html": "<div dir=\"ltr\">abcdefgh</div>\r\n",
  "date": "2017-07-20 10:00:00"
}

I have several million such documents, and I am trying to find some of them with a query like this:

{
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "minimum_should_match": "100%",
      "should": [
        {
          "multi_match": {
            "type": "cross_fields",
            "query": "abcdefgh johnsmith john smith",
            "operator": "and",
            "fields": [
              "email.full",
              "sender",
              "subject",
              "txt",
              "html"
            ]
          }
        }
      ],
      "must": [
        {
          "ids": {
            "values": [
              "1234"
            ]
          }
        },
        {
          "term": {
            "mail_id": 5
          }
        }
      ]
    }
  }
}

This query works fine, but when I try to find the document by adding "gmail" or "com" to the query, it no longer matches.

"query": "abcdefgh johnsmith john smith gmail"
"query": "abcdefgh johnsmith john smith com"

It only works when I search for "gmail.com":

"query": "abcdefgh johnsmith john smith gmail.com"

So... I tried to attach an analyzer:

...
"type": "cross_fields",
"query": "abcdefgh johnsmith john smith",
"operator": "and",
"analyzer": "simple",
...

That did not help at all. The only way I was able to find this document was to define a regular expression, for example:

"minimum_should_match": 1,
"should": [
  {
    "multi_match": {
      "type": "cross_fields",
      "query": "fdsfs wukamil kam wuj gmail.com",
      "operator": "and",
      "fields": [
        "email.full",
        "sender",
        "subject",
        "txt",
        "html"
      ]
    }
  },
  {
    "regexp": {
      "email.full": ".*gmail.*"
    }
  }
],

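But with this approach I would have to add (queries × fields) regexp objects to my JSON, so I don't think it is the best solution. I also know about wildcards, but like the regexp approach, that would get messy.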
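To make the "queries × fields" concern concrete, here is a minimal Python sketch that generates one `regexp` clause per (term, field) pair. The helper name `build_regexp_clauses` is hypothetical, and the sketch assumes the terms contain no regular-expression metacharacters.

```python
from itertools import product


def build_regexp_clauses(terms, fields):
    """Return one `regexp` clause per (term, field) pair.

    Hypothetical helper for illustration only; terms are assumed to
    contain no regex metacharacters, so they are not escaped.
    """
    return [
        {"regexp": {field: f".*{term}.*"}}
        for term, field in product(terms, fields)
    ]


terms = ["gmail", "com"]
fields = ["email.full", "sender", "subject", "txt", "html"]

clauses = build_regexp_clauses(terms, fields)
print(len(clauses))  # 2 terms x 5 fields = 10 clauses
```

The clause count grows with every extra search term and every extra field, which is why this approach becomes unwieldy quickly.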

If anyone has run into this problem and knows a solution, I would be very grateful for your help :)

Best answer

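If you run your search term through the standard analyzer, you can see which tokens johnsmith@gmail.com is broken into. You can do this directly in your browser with the following URL: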

https://<your_site>:<es_port>/_analyze/?analyzer=standard&text=johnsmith@gmail.com
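The URL above passes `analyzer` and `text` as query parameters. As far as I know, Elasticsearch 6.0 and later no longer accept them there and expect a JSON request body instead. The Python sketch below only builds that request and does not send it. The function name `build_analyze_request` and the base URL `https://localhost:9200` are assumptions for illustration.

```python
import json


def build_analyze_request(base_url, analyzer, text):
    """Return (url, body) for an _analyze call that uses a JSON body.

    Hypothetical helper: it only prepares the request and sends nothing.
    """
    url = base_url.rstrip("/") + "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text})
    return url, body


# The base URL is a placeholder; substitute your own host and port.
url, body = build_analyze_request(
    "https://localhost:9200", "standard", "johnsmith@gmail.com"
)
print(url)
print(body)
```

The resulting body can be sent with any HTTP client as a GET or POST with `Content-Type: application/json`.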

The standard analyzer breaks the email address into the following tokens:

{

    "tokens": [
        {
            "token": "johnsmith",
            "start_offset": 0,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "gmail.com",
            "start_offset": 10,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]

}

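So this shows that you cannot search using gmail alone, but you can using gmail.com. To split the text on the dot as well, you can update your mapping to use the Simple Analyzer, whose documentation says: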

The simple analyzer breaks text into terms whenever it encounters a character which is not a letter. All terms are lower cased.

We can see this in action by changing the URL to use the simple analyzer, as follows:

https://<your_site>:<es_port>/_analyze/?analyzer=simple&text=johnsmith@gmail.com

Which returns:

{

    "tokens": [
        {
            "token": "johnsmith",
            "start_offset": 0,
            "end_offset": 9,
            "type": "word",
            "position": 1
        },
        {
            "token": "gmail",
            "start_offset": 10,
            "end_offset": 15,
            "type": "word",
            "position": 2
        },
        {
            "token": "com",
            "start_offset": 16,
            "end_offset": 19,
            "type": "word",
            "position": 3
        }
    ]

}
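The difference between the two token lists explains the search behaviour: a query term only matches if it equals one of the indexed tokens. The Python sketch below imitates both outputs with regular expressions. These are rough approximations written to reproduce the token lists shown in this answer, not the real Lucene tokenizers, and the function names are hypothetical.

```python
import re


def simple_analyze(text):
    """Approximate the simple analyzer: split on non-letters, lowercase."""
    return re.findall(r"[^\W\d_]+", text.lower())


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer output shown in this answer.

    Assumption: a dot between alphanumeric runs stays inside the token.
    This is NOT the real tokenizer, only an imitation of its output here.
    """
    return re.findall(r"[^\W_]+(?:\.[^\W_]+)*", text.lower())


email = "johnsmith@gmail.com"
for name, analyze in [
    ("standard-like", standard_like_analyze),
    ("simple", simple_analyze),
]:
    tokens = analyze(email)
    # A term query for "gmail" matches only if "gmail" is an indexed token.
    print(name, tokens, "gmail" in tokens)
# standard-like ['johnsmith', 'gmail.com'] False
# simple ['johnsmith', 'gmail', 'com'] True
```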

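This analyzer may not be the right one for the job, since it discards anything that is not a letter, but you can experiment with analyzers and tokenizers until you get what you need.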
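For the mapping update mentioned in this answer, the sketch below builds a possible mapping fragment as a Python dict. The original mapping is not shown in the question, so the layout is an assumption: `email` is taken to be a `text` field with a `full` sub-field (matching the `email.full` name used in the query), and the `text` type assumes Elasticsearch 5 or later. The function name `email_mapping` is hypothetical.

```python
import json


def email_mapping(analyzer="simple"):
    """Return a mapping fragment that analyzes `email.full` with `analyzer`.

    Assumed layout: the real mapping is not shown in the question.
    """
    return {
        "properties": {
            "email": {
                "type": "text",
                "fields": {
                    "full": {"type": "text", "analyzer": analyzer}
                },
            }
        }
    }


print(json.dumps(email_mapping(), indent=2))
```

Changing the analyzer of a field that already exists typically requires creating a new index with the new mapping and reindexing the documents into it.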

Original question on Stack Overflow: https://stackoverflow.com/questions/45210474/
