
elasticsearch - Exact prefix / match phrase prefix query in conjunction with an ngram filter

Reposted. Author: 行者123. Updated: 2023-12-02 22:34:22

My goal is to search with query text that is only one or two characters long.
Here are my index settings.

"settings" : {
  "index" : {
    "number_of_shards" : "5",
    "provided_name" : "my_user",
    "analysis" : {
      "filter" : {
        "ngrammed" : {
          "type" : "ngram",
          "min_gram" : "3",
          "max_gram" : "50"
        }
      },
      "analyzer" : {
        "ngrammed_ci" : {
          "filter" : [
            "lowercase",
            "ngrammed"
          ],
          "type" : "custom",
          "tokenizer" : "standard"
        },
        "keyword_ci" : {
          "filter" : [
            "lowercase"
          ],
          "type" : "custom",
          "tokenizer" : "keyword"
        }
      }
    }
  }
}
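I have a set of users whose display name field uses the analyzers shown below. Suppose there are a few users named Allen, Alec, Kimball, and Polly.
The problem I am facing is that when I search with a two-character query string such as "al", the results include not only Allen and Alec but also Kimball, because the ngram filter has tokenized Kimball as "all" in the inverted index. I am trying to avoid this.
I would also like to know whether this can be achieved without changing anything on the index side, by changing only the query side.
"user_display_name" : {
  "type" : "text",
  "fields" : {
    "ci" : {
      "type" : "text",
      "analyzer" : "keyword_ci"
    },
    "cs" : {
      "type" : "keyword"
    }
  },
  "analyzer" : "ngrammed_ci",
  "search_analyzer" : "standard"
}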
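
To make the unwanted match concrete, here is a rough plain-Python sketch of what the "ngrammed" filter (min_gram 3, max_gram 50) puts in the inverted index. It is a simplified imitation, not Elasticsearch's implementation, and it assumes the query behaves like a prefix expansion over indexed terms, as the prefix / match phrase prefix queries in the title would. The helper names ngrams and prefix_matches are made up for illustration.

```python
def ngrams(token, min_gram=3, max_gram=50):
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    out = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            out.append(token[start:start + size])
    return out


names = ["Allen", "Alec", "Kimball", "Polly"]

# Lowercase first, mirroring the "lowercase" filter that runs before "ngrammed".
indexed = {name: set(ngrams(name.lower())) for name in names}


def prefix_matches(prefix):
    """Names that have at least one indexed term starting with `prefix` (assumed query model)."""
    return [n for n in names if any(term.startswith(prefix) for term in indexed[n])]


print("all" in indexed["Kimball"])  # True: "all" is an inner substring of "kimball"
print(prefix_matches("al"))         # ['Allen', 'Alec', 'Kimball']
```

Because the plain ngram filter emits substrings from every position, "kimball" contributes the term "all", and any prefix-style lookup for "al" reaches it.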

Best answer

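In your case you need ngrams that start at the beginning of each word, so it makes more sense to use edge ngrams instead.
Below is a working example with the index mapping, indexed data, search query, and search results.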
Mapping (note that the "ngrammed" filter type is now edge_ngram):

{
  "settings": {
    "analysis": {
      "filter": {
        "ngrammed": {
          "type": "edge_ngram",
          "min_gram": "2",
          "max_gram": "50"
        }
      },
      "analyzer": {
        "ngrammed_ci": {
          "filter": [
            "lowercase",
            "ngrammed"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "keyword_ci": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    },
    "index.max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "user_display_name": {
        "type": "text",
        "fields": {
          "ci": {
            "type": "text",
            "analyzer": "keyword_ci"
          },
          "cs": {
            "type": "keyword"
          }
        },
        "analyzer": "ngrammed_ci",
        "search_analyzer": "standard"
      }
    }
  }
}
The following tokens are generated:
GET /my-index/_analyze

{
  "analyzer" : "ngrammed_ci",
  "text" : "Allen"
}

"tokens": [
  {
    "token": "al",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
  },
  {
    "token": "all",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
  },
  {
    "token": "alle",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
  },
  {
    "token": "allen",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
  }
]
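
The token list above can be reproduced with a few lines of plain Python. This is only a sketch of the idea behind an edge ngram filter (prefixes of each token, from min_gram up to max_gram characters), not the Lucene implementation; the function name edge_ngrams is hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=50):
    """Return the prefixes of `token` with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


# Lowercase first, mirroring the "lowercase" filter in the ngrammed_ci analyzer.
print(edge_ngrams("Allen".lower()))    # ['al', 'all', 'alle', 'allen']
print(edge_ngrams("Kimball".lower()))  # ['ki', 'kim', 'kimb', 'kimba', 'kimbal', 'kimball']
```

Every term for "kimball" now starts with "ki", so nothing in its token list begins with "al".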
Indexed data:
{ "user_display_name" : "Allen" }
{ "user_display_name" : "Alec" }
{ "user_display_name" : "Kimball" }
{ "user_display_name" : "Polly" }
Search query:
{
  "query": {
    "query_string": {
      "query": "al",
      "default_field": "user_display_name"
    }
  }
}
Search results:
"hits": [
  {
    "_index": "my-index",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0087044,
    "_source": {
      "user_display_name": "Allen"
    }
  },
  {
    "_index": "my-index",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0087044,
    "_source": {
      "user_display_name": "Alec"
    }
  }
]
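
As a sanity check on why only Allen and Alec come back, the following sketch builds a toy inverted index from edge ngrams and looks up the query term exactly, which is roughly what happens when the search side uses the standard analyzer and no ngram filter. It is a simplified model under stated assumptions: whitespace splitting stands in for the standard tokenizer, scoring is ignored, and the document ids "3" and "4" are assumed (the results above only show ids "1" and "2").

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=2, max_gram=50):
    """Return the prefixes of `token` with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


# Ids "3" and "4" are assumptions made for this illustration.
docs = {"1": "Allen", "2": "Alec", "3": "Kimball", "4": "Polly"}

# Index side: lowercase, split into words, emit edge ngrams per word.
inverted = defaultdict(set)
for doc_id, name in docs.items():
    for word in name.lower().split():
        for term in edge_ngrams(word):
            inverted[term].add(doc_id)


def search(text):
    """Search side: lowercase and split only, then exact term lookup (OR across terms)."""
    hits = set()
    for term in text.lower().split():
        hits |= inverted.get(term, set())
    return [docs[doc_id] for doc_id in sorted(hits)]


print(search("al"))   # ['Allen', 'Alec']
print(search("all"))  # ['Allen']
```

The query term is not expanded at search time; it simply has to equal one of the stored prefixes, so a name matches only when one of its words starts with the query text.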

Regarding "elasticsearch - Exact prefix / match phrase prefix query in conjunction with an ngram filter", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63306161/
