
Elasticsearch case-insensitive wildcard search

Reposted. Author: 行者123. Updated: 2023-12-02 22:13:03

The field priorityName uses the search_as_you_type data type.

My use case is that I want to search the documents with the following terms:

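  • "let's" -> should return both results
  • "DOING" -> should return both results
  • "are you" -> should return both results
  • "Are You" -> should return both results
  • "you do" (an incomplete "you doing") -> should return both results
  • "re you" -> should return both results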

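Of these 6, only the first 5 give me the desired result with multi_match.
How can I support the 6th use case, where the query contains an incomplete word that does not start at the word's first character?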

Sample documents

    {
        "_index": "priority",
        "_type": "_doc",
        "_id": "vaCI_HAB31AaC-t5TO9H",
        "_score": 1,
        "_source": {
            "priorityName": "What are you doing along Let's Go out"
        }
    },
    {
        "_index": "priority",
        "_type": "_doc",
        "_id": "vqCQ_HAB31AaC-t5wO8m",
        "_score": 1,
        "_source": {
            "priorityName": "what are you doing along let's go for shopping"
        }
    }
    ]
    }

Best answer

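For the last search, re you, you need infix tokens, which the search_as_you_type data type does not produce by default. I suggest creating a custom analyzer that generates infix tokens, so that all 6 queries can match.
I created such a custom analyzer and tested it against your sample documents; all 6 queries returned both sample documents.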
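To make the term "infix tokens" concrete, here is a minimal Python sketch, not Elasticsearch code. The helper names `ngrams` and `edge_ngrams` are made up for illustration, and the sketch only assumes that prefix-style matching indexes the leading substrings of each word, while an n-gram filter indexes every substring within the configured length range.

```python
def ngrams(token, min_gram=1, max_gram=8):
    """All substrings of token whose length lies in [min_gram, max_gram]."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def edge_ngrams(token, min_gram=1, max_gram=8):
    """Only the leading substrings (prefixes) of token."""
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


# Prefixes of "are" never contain "re" ...
print(sorted(edge_ngrams("are")))  # ['a', 'ar', 'are']
# ... but the full n-gram set does, which is what the query "re you" needs.
print(sorted(ngrams("are")))       # ['a', 'ar', 'are', 'e', 'r', 're']
```

The trade-off is index size: every word contributes many more terms than with prefixes alone.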
Index mapping

    PUT /infix-index

    {
      "settings": {
        "max_ngram_diff": 50,
        "analysis": {
          "filter": {
            "autocomplete_filter": {
              "type": "ngram",
              "min_gram": 1,
              "max_gram": 8
            }
          },
          "analyzer": {
            "autocomplete_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ]
            },
            "lowercase_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": [
                "lowercase"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "priorityName": {
            "type": "text",
            "analyzer": "autocomplete_analyzer",
            "search_analyzer": "standard"
          }
        }
      }
    }

Note the search_analyzer setting: the n-gram analyzer is applied at index time only, and the query text is analyzed with the standard analyzer.
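The reason for a separate search_analyzer can be sketched in plain Python. This is a simplified simulation, not Elasticsearch itself: `index_terms` mimics the whitespace tokenizer, lowercase filter, and 1 to 8 character n-gram filter from the mapping, while `plain_query_terms` is only a rough stand-in for the standard analyzer (lowercased whole words). All function names are hypothetical.

```python
def ngrams(token, min_gram=1, max_gram=8):
    """All substrings of token whose length lies in [min_gram, max_gram]."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def index_terms(text):
    """Index-time analysis: split on whitespace, lowercase, then n-gram."""
    terms = set()
    for token in text.split():
        terms |= ngrams(token.lower())
    return terms


def plain_query_terms(text):
    """Rough stand-in for the search analyzer: lowercased whole words."""
    return {token.lower() for token in text.split()}


def ngram_query_terms(text):
    """What the query would become if it were n-grammed like the index."""
    return index_terms(text)


doc = index_terms("What are you doing along Let's Go out")

# Whole-word query terms: "zebra" is not an indexed term, so no match.
print(bool(plain_query_terms("zebra") & doc))  # False
# N-grammed query terms: single letters such as "e", "r", "a" overlap,
# so an unrelated word would match the document.
print(bool(ngram_query_terms("zebra") & doc))  # True
```

In other words, n-gramming the query as well would make almost any input match through shared one-character grams, which is why the query side is kept as whole words.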
Index your sample documents

    {
      "priorityName": "What are you doing along Let's Go out"
    }

    {
      "priorityName": "what are you doing along let's go for shopping"
    }
Search query for the last case, re you

    {
      "query": {
        "match": {
          "priorityName": "re you"
        }
      }
    }
Result

    "hits": [
      {
        "_index": "ngram",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.4652853,
        "_source": {
          "priorityName": "What are you doing along Let's Go out"
        }
      },
      {
        "_index": "ngram",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.4509768,
        "_source": {
          "priorityName": "what are you doing along let's go for shopping"
        }
      }
    ]
The other queries also returned both documents; I have left them out to keep this answer short.
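As a rough cross-check of that claim, the analysis chain can be simulated in plain Python. This is only a sketch under stated assumptions, not a substitute for running the queries against a cluster: the query side is approximated by lowercasing and splitting on whitespace, the six query strings are the ones from the question as read here, and `matches` requires every query term to be indexed, which is stricter than the default OR behaviour of a match query.

```python
DOCS = [
    "What are you doing along Let's Go out",
    "what are you doing along let's go for shopping",
]
# The six searches from the question.
QUERIES = ["let's", "DOING", "are you", "Are You", "you do", "re you"]


def ngrams(token, min_gram=1, max_gram=8):
    """All substrings of token whose length lies in [min_gram, max_gram]."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def index_terms(text):
    """Index-time analysis: split on whitespace, lowercase, then n-gram."""
    terms = set()
    for token in text.split():
        terms |= ngrams(token.lower())
    return terms


def query_terms(text):
    """Simplified search-time analysis: lowercased whole words."""
    return [token.lower() for token in text.split()]


def matches(query, doc):
    """True if every query term is among the document's indexed terms."""
    indexed = index_terms(doc)
    return all(term in indexed for term in query_terms(query))


for query in QUERIES:
    hits = sum(matches(query, doc) for doc in DOCS)
    print(f"{query!r}: {hits} of {len(DOCS)} documents match")
```

Each of the six queries is reported as matching both documents in this simulation, including "re you", whose terms appear only inside "are" and as the whole word "you".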
Note: the links below explain the concepts used in this answer in more detail.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

Source: the original question on Stack Overflow about Elasticsearch case-insensitive wildcard search: https://stackoverflow.com/questions/60786805/
