
elasticsearch - How do I get Elasticsearch to highlight partial words in a search_as_you_type field?

Reposted. Author: 行者123. Updated: 2023-12-03 00:37:42

I'm having trouble setting up a search_as_you_type field with highlighting, following the guide here: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html

Below is a series of commands that reproduces what I'm seeing. Hopefully someone can point out what I'm missing :)

  1. Create the mapping
    PUT /test_index
    {
      "mappings": {
        "properties": {
          "plain_text": {
            "type": "search_as_you_type",
            "index_options": "offsets",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  2. Insert a document
    POST /test_index/_doc
    {
      "plain_text": "This is some random text"
    }
  3. Search for the document
    GET /test_index/_search
    {
      "query": {
        "multi_match": {
          "query": "rand",
          "type": "bool_prefix",
          "fields": [
            "plain_text",
            "plain_text._2gram",
            "plain_text._3gram",
            "plain_text._index_prefix"
          ]
        }
      },
      "highlight": {
        "fields": [
          {
            "plain_text": {
              "number_of_fragments": 1,
              "no_match_size": 100
            }
          }
        ]
      }
    }
  4. Response
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "test_index",
            "_type" : "_doc",
            "_id" : "rLZkjm8BDC17cLikXRbY",
            "_score" : 1.0,
            "_source" : {
              "plain_text" : "This is some random text"
            },
            "highlight" : {
              "plain_text" : [
                "This is some random text"
              ]
            }
          }
        ]
      }
    }

The response I get does not contain the highlighting I expected. Ideally, the highlight would be: This is some <em>ran</em>dom text
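A rough way to see why nothing is highlighted: the query string rand is never a whole-word term in the root plain_text field (that field holds words such as random), and plain_text is the only field the highlight request asks for, so with no_match_size set the highlighter just returns the start of the text. As far as I can tell, the match on rand comes from the ._index_prefix subfield. The Python sketch below approximates the analysis with a lowercase word tokenizer; the helper names and the shingle/prefix limits are assumptions for illustration, not Elasticsearch's actual implementation.

```python
import re

TEXT = "This is some random text"

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())

def shingles(tokens, size):
    # Word n-grams of exactly `size` tokens, similar in spirit to ._2gram / ._3gram.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def prefix_terms(tokens, max_shingle=3, max_prefix=20):
    # Edge prefixes of words and shingles: an approximation of ._index_prefix.
    # max_shingle and max_prefix are assumed limits for this sketch.
    terms = set()
    for size in range(1, max_shingle + 1):
        for sh in shingles(tokens, size):
            for end in range(1, min(len(sh), max_prefix) + 1):
                terms.add(sh[:end])
    return terms

tokens = analyze(TEXT)
root_terms = set(tokens)              # roughly what the root plain_text field indexes
index_prefix = prefix_terms(tokens)   # roughly what plain_text._index_prefix indexes

print("rand" in root_terms)    # False: no whole-word term "rand"
print("rand" in index_prefix)  # True: only the prefix subfield contains it
```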

Best answer

To highlight character n-grams (partial words), you need:

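  • A custom ngram tokenizer. By default, the maximum allowed difference between min_gram and max_gram is 1, so in my example highlighting only works for search terms of length 3 or 4. You can change this, and generate more n-grams, by setting a higher value for index.max_ngram_diff.
  • A custom analyzer based on that custom tokenizer.
  • An extra multi-field in the mapping ("plain_text.ngrams" below) that uses the custom analyzer.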
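To make the tokenizer part concrete, here is a minimal Python approximation of the ngram tokenizer followed by the lowercase filter (ngram_tokenize is a hypothetical helper, not an Elasticsearch API). It records each gram's own start and end offsets; as far as I know, this per-gram offset information is what a tokenizer provides and what n-gram token filters do not (they keep the whole word's offsets), which is why a tokenizer is used here. Since the answer leaves token_chars at its default, every character, spaces included, can end up inside a gram.

```python
def ngram_tokenize(text, min_gram=3, max_gram=4, max_ngram_diff=1):
    """Approximate the ngram tokenizer plus lowercase filter as (term, start, end)."""
    # Elasticsearch rejects gram ranges wider than index.max_ngram_diff (default 1).
    if max_gram - min_gram > max_ngram_diff:
        raise ValueError("max_gram - min_gram exceeds index.max_ngram_diff")
    tokens = []
    # token_chars is not set, so every character (spaces included) may be in a gram.
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break
            tokens.append((text[start:end].lower(), start, end))
    return tokens

grams = ngram_tokenize("This is some random text")
print([g for g in grams if g[0].startswith("ran")])  # [('ran', 13, 16), ('rand', 13, 17)]
```

The gram rand spans characters 13 to 17, exactly the slice that ends up wrapped in <em> tags in the result further down.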

Here is the configuration:

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "partial_words": {
              "type": "custom",
              "tokenizer": "ngrams",
              "filter": ["lowercase"]
            }
          },
          "tokenizer": {
            "ngrams": {
              "type": "ngram",
              "min_gram": 3,
              "max_gram": 4
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "plain_text": {
            "type": "text",
            "fields": {
              "shingles": {
                "type": "search_as_you_type"
              },
              "ngrams": {
                "type": "text",
                "analyzer": "partial_words",
                "search_analyzer": "standard",
                "term_vector": "with_positions_offsets"
              }
            }
          }
        }
      }
    }
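If you want a wider gram range than 3 to 4, the index also needs a larger index.max_ngram_diff. A minimal Python sketch, assuming you build the request body in code (build_index_body is a hypothetical helper, not part of any client library), that reproduces the configuration above and adds the setting only when the range is wider than the default limit of 1:

```python
import json

def build_index_body(min_gram=3, max_gram=4):
    """Build the index-creation body from the answer, with a configurable gram range."""
    settings = {
        "analysis": {
            "analyzer": {
                "partial_words": {
                    "type": "custom",
                    "tokenizer": "ngrams",
                    "filter": ["lowercase"],
                }
            },
            "tokenizer": {
                "ngrams": {"type": "ngram", "min_gram": min_gram, "max_gram": max_gram}
            },
        }
    }
    diff = max_gram - min_gram
    if diff > 1:
        # index.max_ngram_diff defaults to 1; raise it so the wider range is accepted.
        settings["index"] = {"max_ngram_diff": diff}
    mappings = {
        "properties": {
            "plain_text": {
                "type": "text",
                "fields": {
                    "shingles": {"type": "search_as_you_type"},
                    "ngrams": {
                        "type": "text",
                        "analyzer": "partial_words",
                        "search_analyzer": "standard",
                        "term_vector": "with_positions_offsets",
                    },
                },
            }
        }
    }
    return {"settings": settings, "mappings": mappings}

# Grams of 2 to 5 characters need index.max_ngram_diff >= 3.
print(json.dumps(build_index_body(2, 5)["settings"]["index"]))  # {"max_ngram_diff": 3}
```

The resulting dictionary is the JSON body you would send with PUT /test_index.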

Query:

    {
      "query": {
        "multi_match": {
          "query": "rand",
          "type": "bool_prefix",
          "fields": [
            "plain_text.shingles",
            "plain_text.shingles._2gram",
            "plain_text.shingles._3gram",
            "plain_text.shingles._index_prefix",
            "plain_text.ngrams"
          ]
        }
      },
      "highlight": {
        "fields": [
          {
            "plain_text.ngrams": {}
          }
        ]
      }
    }
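The search_as_you_type subfield names follow a fixed pattern: one ._Ngram shingle subfield for each size from 2 up to max_shingle_size (default 3, hence _2gram and _3gram), plus ._index_prefix. A small Python sketch that generates the field list instead of typing it out; bool_prefix_body and its parameters are hypothetical and simply mirror the query above.

```python
import json

def bool_prefix_body(text, sayt_field="plain_text.shingles", max_shingle_size=3,
                     extra_fields=("plain_text.ngrams",),
                     highlight_field="plain_text.ngrams"):
    """Build a multi_match bool_prefix search body (hypothetical helper)."""
    # Shingle subfields ._2gram .. ._{max_shingle_size}gram, then the prefix subfield.
    shingle_fields = [f"{sayt_field}._{n}gram" for n in range(2, max_shingle_size + 1)]
    fields = [sayt_field, *shingle_fields, f"{sayt_field}._index_prefix", *extra_fields]
    return {
        "query": {"multi_match": {"query": text, "type": "bool_prefix", "fields": fields}},
        # Highlight only the ngram field, which carries per-gram offsets.
        "highlight": {"fields": [{highlight_field: {}}]},
    }

print(json.dumps(bool_prefix_body("rand"), indent=2))
```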

Result:

    "hits": [
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "FkHLVHABd_SGa-E-2FKI",
        "_score": 2,
        "_source": {
          "plain_text": "This is some random text"
        },
        "highlight": {
          "plain_text.ngrams": [
            "This is some <em>rand</em>om text"
          ]
        }
      }
    ]
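Why the result reads <em>rand</em>om, and why a term longer than max_gram (such as random) would not be highlighted at all, can be seen in a simplified Python model. It is an assumption-laden sketch (highlight_bool_prefix is hypothetical, and the real highlighter works from stored term vectors and also handles fragments and scoring): the query is split into lowercase words as the standard search_analyzer would, bool_prefix treats the last word as a prefix and the others as exact terms, and every matching gram is wrapped using its own offsets.

```python
import re

def highlight_bool_prefix(text, query, min_gram=3, max_gram=4, pre="<em>", post="</em>"):
    # Index side: (term, start, end) grams as the ngram tokenizer + lowercase filter emit them.
    grams = [(text[s:s + n].lower(), s, s + n)
             for s in range(len(text))
             for n in range(min_gram, max_gram + 1) if s + n <= len(text)]
    # Search side: a \w+ regex stands in for the standard search_analyzer.
    words = re.findall(r"\w+", query.lower())
    spans = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        for term, start, end in grams:
            # bool_prefix: the last word is a prefix query, the others are term queries.
            if term == word or (is_last and term.startswith(word)):
                spans.append((start, end))
    # Merge overlapping spans so nested grams yield a single tag pair.
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    out, pos = [], 0
    for start, end in merged:
        out.append(text[pos:start] + pre + text[start:end] + post)
        pos = end
    out.append(text[pos:])
    return "".join(out)

TEXT = "This is some random text"
print(highlight_bool_prefix(TEXT, "rand"))    # This is some <em>rand</em>om text
print(highlight_bool_prefix(TEXT, "random"))  # This is some random text
```

In this model, random matches no gram because no indexed gram is longer than 4 characters, which agrees with the length limit described in the answer.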

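Note: in some cases this configuration can be expensive in memory and storage, because every value is indexed as many overlapping n-grams, each with term-vector positions and offsets.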
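To put a number on that cost for the sample document, here is a back-of-the-envelope Python count of the grams the ngram tokenizer emits, assuming token_chars is left at its default so grams can start at every character (ngram_count is a hypothetical helper):

```python
def ngram_count(length, min_gram, max_gram):
    # Grams emitted for a string of `length` characters when every character
    # can be part of a gram (token_chars left at its default).
    return sum(max(0, length - size + 1) for size in range(min_gram, max_gram + 1))

text = "This is some random text"
print(len(text.split()))             # 5 words for a word-level field
print(ngram_count(len(text), 3, 4))  # 43 grams for plain_text.ngrams
```

Widening the range multiplies this: with min_gram 1 and max_gram 10 (which needs index.max_ngram_diff of 9), the same 24-character string yields 195 grams.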

A similar question, "elasticsearch - How do I get Elasticsearch to highlight partial words in a search_as_you_type field?", can be found on Stack Overflow: https://stackoverflow.com/questions/59677406/
