java - ElasticSearch edgeNGram for autocomplete\typeahead, is my search_analyzer being ignored?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:02:27

I have three documents with a "userName" field:

  • 'briandilley'
  • 'briangumble'
  • 'briangriffen'

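When I search for "brian" I get all three, as expected, but when I search for "briandilley" I still get all three. The analyze API tells me it is applying the ngram filter to my search string, but I'm not sure why. Here are my settings: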

Index settings:

{
  "analysis": {
    "analyzer": {
      "username_index": {
        "tokenizer": "keyword",
        "filter": ["lowercase", "username_ngram"]
      },
      "username_search": {
        "tokenizer": "keyword",
        "filter": ["lowercase"]
      }
    },
    "filter": {
      "username_ngram": {
        "type": "edgeNGram",
        "side" : "front",
        "min_gram": 1,
        "max_gram": 15
      }
    }
  }
}
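To make the two analyzers concrete, here is a minimal Java sketch that mimics them in plain code. It assumes the keyword tokenizer emits the whole input as a single token and that the front edge n-gram filter emits every prefix of length min_gram (1) to max_gram (15). The class and method names are hypothetical; this is a simulation, not the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation of the two analyzers in the settings above.
public class UsernameAnalyzers {

    // Mimics "username_index": keyword tokenizer (whole input is one token),
    // then lowercase, then a front edge n-gram filter with min_gram=1, max_gram=15.
    static List<String> indexAnalyze(String input) {
        String token = input.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int max = Math.min(15, token.length());
        for (int len = 1; len <= max; len++) {
            grams.add(token.substring(0, len)); // prefix anchored at the front
        }
        return grams;
    }

    // Mimics "username_search": keyword tokenizer plus lowercase only,
    // so the query text stays one un-grammed token.
    static List<String> searchAnalyze(String input) {
        return List.of(input.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("index:  " + indexAnalyze("BrianDilley"));
        System.out.println("search: " + searchAnalyze("BrianDilley"));
    }
}
```

If the search analyzer is honored, a query for "briandilley" should produce only the single term briandilley, whereas the index analyzer produces eleven prefix terms from b up to briandilley.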

Mapping:

{
  "user_follow": {
    "properties": {
      "targetId": { "type": "string", "store": true },
      "followerId": { "type": "string", "store": true },
      "dateUpdated": { "type": "date", "store": true },

      "userName": {
        "type": "multi_field",
        "fields": {
          "userName": {
            "type": "string",
            "index": "not_analyzed"
          },
          "autocomplete": {
            "type": "string",
            "index_analyzer": "username_index",
            "search_analyzer": "username_search"
          }
        }
      }
    }
  }
}
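As a rough illustration of what this multi_field mapping does, the following Java sketch models each sub-field as its own term-to-documents map: userName keeps the raw value as one term, while userName.autocomplete stores the lowercased edge n-grams and lowercases the query text. It only does single-term lookups and ignores scoring; the class and its methods are hypothetical, not part of any Elasticsearch client.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the "userName" multi_field: every document is
// indexed into two sub-fields with different analysis.
public class MultiFieldSketch {

    // term -> ids of documents containing that term, one map per sub-field
    final Map<String, Set<Integer>> exact = new HashMap<>();        // userName (not_analyzed)
    final Map<String, Set<Integer>> autocomplete = new HashMap<>(); // userName.autocomplete

    void index(int docId, String userName) {
        // not_analyzed: the raw value is stored unchanged as a single term
        exact.computeIfAbsent(userName, k -> new TreeSet<>()).add(docId);
        // username_index: lowercase, then every prefix of length 1..15
        String token = userName.toLowerCase(Locale.ROOT);
        for (int len = 1; len <= Math.min(15, token.length()); len++) {
            autocomplete.computeIfAbsent(token.substring(0, len), k -> new TreeSet<>()).add(docId);
        }
    }

    // userName: the query text must equal the stored value exactly
    Set<Integer> searchExact(String text) {
        return exact.getOrDefault(text, Set.of());
    }

    // userName.autocomplete: username_search only lowercases the query text
    Set<Integer> searchAutocomplete(String text) {
        return autocomplete.getOrDefault(text.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        MultiFieldSketch idx = new MultiFieldSketch();
        List<String> names = List.of("briandilley", "briangumble", "briangriffen");
        for (int i = 0; i < names.size(); i++) {
            idx.index(i, names.get(i));
        }
        System.out.println("userName:brian -> " + idx.searchExact("brian"));
        System.out.println("userName.autocomplete:brian -> " + idx.searchAutocomplete("brian"));
        System.out.println("userName.autocomplete:briandilley -> " + idx.searchAutocomplete("briandilley"));
    }
}
```

Under these assumptions "brian" finds nothing on the not_analyzed sub-field, all three documents on userName.autocomplete, and "briandilley" finds only the first document.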

Search:

{
  "from" : 0,
  "size" : 50,
  "query" : {
    "bool" : {
      "must" : [ {
        "field" : {
          "targetId" : "51888c1b04a6a214e26a4009"
        }
      }, {
        "match" : {
          "userName.autocomplete" : {
            "query" : "brian",
            "type" : "boolean"
          }
        }
      } ]
    }
  },
  "fields" : "followerId"
}

I've tried matchQuery, matchPhraseQuery, textQuery and termQuery (Java DSL API), and I get the same results every time.
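One way to see how the symptom could arise: a match query of type boolean combines its analyzed terms with OR by default, so if the query text were run through username_index (as the analyze API output suggests) instead of username_search, any shared prefix such as "b" would be enough for a hit. The Java sketch below compares the two cases; it is a hypothetical simulation with no scoring, and it does not explain why the search analyzer might be bypassed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical simulation of OR-style match semantics over edge n-gram terms.
public class MatchQuerySketch {

    // Index-time analysis (username_index): lowercase + prefixes of length 1..15
    static List<String> edgeNgrams(String text) {
        String token = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int len = 1; len <= Math.min(15, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Intended query-time analysis (username_search): lowercase only
    static List<String> lowercaseOnly(String text) {
        return List.of(text.toLowerCase(Locale.ROOT));
    }

    // Returns ids of documents sharing at least one term with the analyzed query (OR).
    static Set<Integer> match(List<String> docs, String query,
                              Function<String, List<String>> queryAnalyzer) {
        List<String> queryTerms = queryAnalyzer.apply(query);
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < docs.size(); id++) {
            List<String> docTerms = edgeNgrams(docs.get(id)); // documents always use username_index
            for (String term : queryTerms) {
                if (docTerms.contains(term)) {
                    hits.add(id);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("briandilley", "briangumble", "briangriffen");
        System.out.println("search_analyzer honored: "
                + match(docs, "briandilley", MatchQuerySketch::lowercaseOnly));
        System.out.println("n-grams applied to query: "
                + match(docs, "briandilley", MatchQuerySketch::edgeNgrams));
    }
}
```

With the lowercase-only query analyzer, "briandilley" matches just the first document; once the query is n-grammed, the one-letter gram "b" alone pulls in all three, which mirrors the behavior described in the question.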

Best answer

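I think you're not doing exactly what you think you're doing. This is why it's useful to present an actual test case as full curl statements rather than abbreviating it.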

Your example above works for me (with slight modifications):

Create the index with the settings and mappings:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "userName" : {
               "fields" : {
                  "autocomplete" : {
                     "search_analyzer" : "username_search",
                     "index_analyzer" : "username_index",
                     "type" : "string"
                  },
                  "userName" : {
                     "index" : "not_analyzed",
                     "type" : "string"
                  }
               },
               "type" : "multi_field"
            }
         }
      }
   },
   "settings" : {
      "analysis" : {
         "filter" : {
            "username_ngram" : {
               "max_gram" : 15,
               "min_gram" : 1,
               "type" : "edge_ngram"
            }
         },
         "analyzer" : {
            "username_index" : {
               "filter" : [
                  "lowercase",
                  "username_ngram"
               ],
               "tokenizer" : "keyword"
            },
            "username_search" : {
               "filter" : [
                  "lowercase"
               ],
               "tokenizer" : "keyword"
            }
         }
      }
   }
}
'

Index some data:

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{
   "userName" : "briangriffen"
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{
   "userName" : "brianlilley"
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{
   "userName" : "briangumble"
}
'

Searching for brian finds all the documents:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
   "query" : {
      "match" : {
         "userName.autocomplete" : "brian"
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "userName" : "briangriffen"
#             },
#             "_score" : 0.1486337,
#             "_index" : "test",
#             "_id" : "AWzezvEFRIykOAr75QbtcQ",
#             "_type" : "test"
#          },
#          {
#             "_source" : {
#                "userName" : "briangumble"
#             },
#             "_score" : 0.1486337,
#             "_index" : "test",
#             "_id" : "qIABuMOiTyuxLOiFOzcURg",
#             "_type" : "test"
#          },
#          {
#             "_source" : {
#                "userName" : "brianlilley"
#             },
#             "_score" : 0.076713204,
#             "_index" : "test",
#             "_id" : "fGgTITKvR6GJXI_cqA4Vzg",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.1486337,
#       "total" : 3
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 8
# }

Searching for brianlilley finds only that document:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
   "query" : {
      "match" : {
         "userName.autocomplete" : "brianlilley"
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "userName" : "brianlilley"
#             },
#             "_score" : 0.076713204,
#             "_index" : "test",
#             "_id" : "fGgTITKvR6GJXI_cqA4Vzg",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.076713204,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 4
# }

Regarding "java - ElasticSearch edgeNGram for autocomplete\typeahead, is my search_analyzer being ignored", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16411963/
