
elasticsearch - Elasticsearch doesn't return results that share the same token?

Reposted. Author: 行者123. Updated: 2023-12-03 02:33:50

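The data I insert into Elasticsearch is Korean, so I can't give the exact words, but let's say
I have a word ABBCC that is tokenized as ["A","BBCC"] and another word AZZXXX that is tokenized as ["A","ZZXXX"].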

If I search for ABBCC, shouldn't AZZXXX also show up, since the two share a token? Or is that not how Elasticsearch works?
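For reference, a match query (and each per-field query inside multi_match) uses the "or" operator by default, so a document is a hit if it contains any one of the analyzed query tokens. Below is a minimal Python model of just that matching rule, assuming both the query and the document really are tokenized as in the example; the function name matches is hypothetical, and scoring and minimum_should_match are ignored.

```python
def matches(query_tokens, doc_tokens, operator="or"):
    """Simplified match-query rule: 'or' needs one shared token, 'and' needs all of them."""
    q, d = set(query_tokens), set(doc_tokens)
    if operator == "and":
        return q <= d      # every query token must appear in the document
    return bool(q & d)     # any shared token is enough

query = ["A", "BBCC"]      # how ABBCC is tokenized in the example
doc = ["A", "ZZXXX"]       # how AZZXXX is tokenized in the example

print(matches(query, doc))         # True: both contain "A"
print(matches(query, doc, "and"))  # False: "BBCC" is missing from the document
```

So under the default operator, a shared token is enough, but only if the tokens stored in the index were produced the same way as the query tokens.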

This is how I check how a word is analyzed:

GET recpost_test/_analyze
{
"analyzer": "my_analyzer",
"text":"my query String!"
}
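The _analyze request above forces my_analyzer, so it shows what that analyzer would do, not what a given field actually does. The _analyze API also accepts a "field" parameter, which analyzes the text with whatever analyzer that field is mapped with. A Python sketch, assuming the field title as an example, that builds both request bodies and compares two responses; the helper names and the sample responses (trimmed to the "token" key and shaped after the question's example) are hypothetical.

```python
import json

def analyze_bodies(text, analyzer, field):
    """Two _analyze bodies: one forces a named analyzer, one uses the field's mapped analyzer."""
    return {"analyzer": analyzer, "text": text}, {"field": field, "text": text}

def tokens(response):
    """Extract the token strings from an _analyze response."""
    return [t["token"] for t in response["tokens"]]

def same_tokens(resp_a, resp_b):
    return tokens(resp_a) == tokens(resp_b)

by_analyzer, by_field = analyze_bodies("ABBCC", "my_analyzer", "title")
for body in (by_analyzer, by_field):
    print("GET recpost_test/_analyze")
    print(json.dumps(body, ensure_ascii=False, indent=2))

# Hypothetical responses: my_analyzer splits the word, the field keeps it whole.
resp_analyzer = {"tokens": [{"token": "A"}, {"token": "BBCC"}]}
resp_field = {"tokens": [{"token": "ABBCC"}]}
print(same_tokens(resp_analyzer, resp_field))  # False: title is not using my_analyzer
```

If the two real responses differ in this way, the field is being indexed with a different analyzer than the one tested.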

This is how I created the index:
PUT recpost
{
"settings": {
"index": {
"analysis": {
"tokenizer": {
"nori_user_dict": {
"type": "nori_tokenizer",
"decompound_mode": "mixed",
"user_dictionary": "userdict_ko.txt"
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "nori_user_dict"
}
},
"filter": {
"substring": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 10
}
}
}
}
}
}
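In these settings the substring filter is declared under "filter", but my_analyzer has no "filter" array, so as written the edge n-gram filter is never applied to anything. A hypothetical Python check (unused_filters is a made-up name) that flags filters declared in an analysis block but not referenced by any analyzer, run on a copy of the block above:

```python
# Copy of the "analysis" block from the PUT recpost request.
ANALYSIS = {
    "tokenizer": {
        "nori_user_dict": {
            "type": "nori_tokenizer",
            "decompound_mode": "mixed",
            "user_dictionary": "userdict_ko.txt",
        }
    },
    "analyzer": {
        "my_analyzer": {"type": "custom", "tokenizer": "nori_user_dict"}
    },
    "filter": {
        "substring": {"type": "edgeNGram", "min_gram": 1, "max_gram": 10}
    },
}

def unused_filters(analysis):
    """Return token filter names that are defined but not listed in any analyzer's 'filter' array."""
    defined = set(analysis.get("filter", {}))
    used = set()
    for spec in analysis.get("analyzer", {}).values():
        used.update(spec.get("filter", []))
    return sorted(defined - used)

print(unused_filters(ANALYSIS))  # ['substring']
```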

This is how I search:
GET recpost/_search
{
"_source": [""],
"from": 0,
"size": 2,
"query":{
"multi_match": {
"query" : "my query String!",
"type": "best_fields",
"fields" : [
"brandkor",
"content",
"itemname",
"name",
"review",
"shortreview^2",
"title^3"]
}
}
}
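With "type": "best_fields", multi_match runs one match query per listed field and combines them like a dis_max query: the document gets the highest boosted field score, plus tie_breaker times the scores of the other matching fields (tie_breaker defaults to 0). The Python sketch below models only that combination step; the per-field scores are invented numbers, not real BM25 output, and best_fields_score is a hypothetical name.

```python
# Boosts taken from the "fields" list in the query; unlisted fields default to 1.
BOOSTS = {"shortreview": 2.0, "title": 3.0}

def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way a dis_max / best_fields query does."""
    boosted = [s * boosts.get(f, 1.0) for f, s in field_scores.items()]
    if not boosted:
        return 0.0  # no field matched, so the document would not be a hit at all
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)

# Hypothetical per-field scores for one document (only fields that matched).
SCORES = {"title": 1.0, "content": 2.5, "shortreview": 1.0}
print(best_fields_score(SCORES, BOOSTS))                   # 3.0
print(best_fields_score(SCORES, BOOSTS, tie_breaker=0.5))  # 5.25
```

Each field contributes only if its own indexed tokens match the query, so boosts cannot help a field whose analyzer produced non-matching tokens.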


EDIT:
I tried adding an "analyzer" parameter to the search, but it still doesn't work:
GET recpost/_search
{
"_source": [""],
"from": 0,
"size": 2,
"query":{
"multi_match": {
"query" : "깡스",
"analyzer": "my_analyzer",
"type": "best_fields",
"fields" : [
"brandkor",
"content",
"itemname",
"name",
"review",
"shortreview^2",
"title^3"]
}
}
}
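The "analyzer" parameter in a query only changes how the query string is tokenized; the tokens already stored in the index were produced by each field's own analyzer when the documents were indexed. A Python model of that mismatch, under loudly simplified assumptions: a plain whitespace split stands in for the standard analyzer (which keeps a run of Hangul as a single token), and a hard-coded lookup table stands in for nori using the question's example tokens. All names here are hypothetical.

```python
def standard_like(text):
    """Stand-in for the standard analyzer: one token per whitespace-separated word."""
    return text.split()

# Stand-in for nori: the decompositions from the question's example.
NORI_LIKE = {"ABBCC": ["A", "BBCC"], "AZZXXX": ["A", "ZZXXX"]}

def nori_like(text):
    out = []
    for word in text.split():
        out.extend(NORI_LIKE.get(word, [word]))
    return out

def hits(query, docs, query_analyzer, index_analyzer):
    """Return the documents sharing at least one token with the analyzed query."""
    q = set(query_analyzer(query))
    return [d for d in docs if q & set(index_analyzer(d))]

DOCS = ["ABBCC", "AZZXXX"]
# Fields indexed with the default analyzer, query forced through my_analyzer:
print(hits("ABBCC", DOCS, nori_like, standard_like))  # []
# Fields indexed with my_analyzer as well:
print(hits("ABBCC", DOCS, nori_like, nori_like))      # ['ABBCC', 'AZZXXX']
```

In this model, forcing only the query-side analyzer can even miss the exact word, because the query tokens "A" and "BBCC" never equal the stored whole-word token "ABBCC".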

EDIT2: Here is my mapping:
{
"recpost_test" : {
"mappings" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"brandkor" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"field_statistics" : {
"type" : "boolean"
},
"fields" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"itemname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"offsets" : {
"type" : "boolean"
},
"payloads" : {
"type" : "boolean"
},
"positions" : {
"type" : "boolean"
},
"review" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"shortreview" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"term_statistics" : {
"type" : "boolean"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
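None of the text fields in this mapping has an "analyzer" key, which means they use the index default (the standard analyzer, since the settings don't define index.analysis.analyzer.default). A hypothetical Python check over a GET _mapping response that lists such fields, run on a trimmed excerpt of the mapping above:

```python
def fields_without_analyzer(mapping_response, index):
    """List top-level 'text' fields that have no explicit 'analyzer' in the mapping."""
    props = mapping_response[index]["mappings"]["properties"]
    return sorted(
        name for name, spec in props.items()
        if spec.get("type") == "text" and "analyzer" not in spec
    )

KEYWORD_SUB = {"keyword": {"type": "keyword", "ignore_above": 256}}

# Trimmed excerpt of the mapping shown in EDIT2.
MAPPING = {
    "recpost_test": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "brandkor": {"type": "text", "fields": KEYWORD_SUB},
                "positions": {"type": "boolean"},
                "title": {"type": "text", "fields": KEYWORD_SUB},
            }
        }
    }
}

print(fields_without_analyzer(MAPPING, "recpost_test"))  # ['brandkor', 'title']
```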

Best answer

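I don't see where you assign the analyzer to your fields in the index mapping.
So, as far as I can tell, all of your fields (brandkor, content, etc.) are indexed as text with the default standard analyzer. The nori tokenizer never runs at index time, so you are essentially matching whole, undecomposed words.

Passing "analyzer": "my_analyzer" in the query doesn't fix this: it only changes how the query string is tokenized, so query tokens like "A" are compared against whole-word index tokens like "AZZXXX" and nothing matches.

This won't work until you associate each field with its analyzer in the mapping.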
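The analyzer of an existing text field cannot be changed in place, so one way to follow this advice is to create a new index whose search fields set "analyzer": "my_analyzer" explicitly, then copy the documents over with the _reindex API so they are re-analyzed. A Python sketch that only prints the two console requests; the index name recpost_v2 and the helper text_field are hypothetical, the analysis settings are copied from the question (minus the unused filter), and recpost is the source index from the question.

```python
import json

NEW_INDEX = "recpost_v2"  # hypothetical name for the new index
SEARCH_FIELDS = ["brandkor", "content", "itemname", "name", "review", "shortreview", "title"]

def text_field(analyzer):
    """Text field analyzed with `analyzer` (index and search time), plus a keyword subfield."""
    return {
        "type": "text",
        "analyzer": analyzer,
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }

create_body = {
    "settings": {
        "index": {
            "analysis": {
                "tokenizer": {
                    "nori_user_dict": {
                        "type": "nori_tokenizer",
                        "decompound_mode": "mixed",
                        "user_dictionary": "userdict_ko.txt",
                    }
                },
                "analyzer": {
                    "my_analyzer": {"type": "custom", "tokenizer": "nori_user_dict"}
                },
            }
        }
    },
    "mappings": {"properties": {f: text_field("my_analyzer") for f in SEARCH_FIELDS}},
}

# _reindex copies documents from the old index; they are analyzed again on the way in.
reindex_body = {"source": {"index": "recpost"}, "dest": {"index": NEW_INDEX}}

print(f"PUT {NEW_INDEX}")
print(json.dumps(create_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```

Because "analyzer" on a field applies at both index and search time unless a separate search_analyzer is set, the query no longer needs its own "analyzer" parameter once the mapping is fixed.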

Original question on Stack Overflow: https://stackoverflow.com/questions/59352302/
