elasticsearch - Elasticsearch multi-word synonyms not working as expected

Reposted. Author: 行者123. Updated: 2023-12-02 22:35:17

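I have an Elasticsearch search engine and am adding synonym support to it. Everything works fine for single-word (unigram) synonyms, but things break once I start dealing with multi-word synonyms.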

For example, I want the query "ice cream" to return every document that mentions "ice cream", "icecream", or "gelato".

My index settings and mapping are as follows:

PUT stam_test_1
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "plural_stemmer": {
          "name": "minimal_english",
          "type": "stemmer"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "english_graph_synonyms": {
          "type": "synonym_graph",
          "tokenizer": "standard",
          "expand": true,
          "synonyms": [
            "ice cream, icecream, creamery, gelato",
            "dim sum, dim sim, dimsim",
            "ube, purple yam",
            "sf, san francisco"
          ]
        },
        "english_synonyms": {
          "type": "synonym",
          "expand": true,
          "tokenizer": "standard",
          "synonyms": [
            "burger, hamburger, slider",
            "chicken, pollo",
            "pork, pig, porc",
            "barbeque, bbq, barbecue",
            "sauce, dressing"
          ]
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "plural_stemmer",
            "english_stop",
            "english_stemmer",
            "asciifolding",
            "english_synonyms"
          ]
        },
        "english_search": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "plural_stemmer",
            "english_stop",
            "english_stemmer",
            "asciifolding",
            "english_graph_synonyms"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "fields": {
          "post_text": {
            "type": "text",
            "analyzer": "english",
            "search_analyzer": "english_search"
          }
        }
      }
    }
  }
}

I then index a few documents:
POST _bulk
{ "index" : { "_index" : "stam_test_1", "_id" : "1" } }
{ "post_text" : "Love this ice cream so much!!!"}
{ "index" : { "_index" : "stam_test_1", "_id" : "2" } }
{ "post_text" : "Great gelato and a tasty burger"}
{ "index" : { "_index" : "stam_test_1", "_id" : "3" } }
{ "post_text" : "I bought coke but did not get any ice with it" }
{ "index" : { "_index" : "stam_test_1", "_id" : "4" } }
{ "post_text" : "ic cream" }

When I query for "ice cream":
GET /stam_test_1/_search
{
  "query": {
    "match": {
      "post_text": {
        "query": "ice cream",
        "analyzer": "english_search"
      }
    }
  }
}

I get the following results:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.6678555,
    "hits" : [
      {
        "_index" : "stam_test_1",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 2.6678555,
        "_source" : {
          "post_text" : "ic cream"
        }
      },
      {
        "_index" : "stam_test_1",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "post_text" : "Great gelato and a tasty burger"
        }
      }
    ]
  }
}

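As you can see, I deliberately added a document that is already stemmed, "ic cream", and it is returned. However, I do not get the first document, "Love this ice cream so much!!!", which I expected to be returned.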

When I test the analyzer directly on "ice cream":
GET stam_test_1/_analyze
{
  "analyzer": "english_search",
  "text" : "ice cream"
}

It returns:
{
  "tokens" : [
    {
      "token" : "icecream",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "SYNONYM",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "softserv",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "SYNONYM",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "icream",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "SYNONYM",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "creameri",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "SYNONYM",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "gelato",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "SYNONYM",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "ic",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "cream",
      "start_offset" : 4,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

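The single-word synonyms come back as expected, but the multi-word term is stemmed (each token separately), while the actual documents do not appear to have been stemmed, which is why I get the "ic cream" document.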
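
One way to check how the search-time synonym graph turns the query text into a Lucene query is the validate API with `rewrite=true`. The request below is only a diagnostic sketch that reuses the index, field, and analyzer names from the question; the exact form of the returned explanation may differ between Elasticsearch versions.

```
# Diagnostic only: ask Elasticsearch to show the rewritten query without running a search
GET stam_test_1/_validate/query?rewrite=true
{
  "query": {
    "match": {
      "post_text": {
        "query": "ice cream",
        "analyzer": "english_search"
      }
    }
  }
}
```

The `explanations` entry in the response should list the terms that are actually searched for, such as the stemmed `ic` and `cream`, which can then be compared with what is stored in the index.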

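I'm sure this is just a misconfigured definition. I tried replacing the tokenizer of the english_search analyzer with "keyword" instead of "standard", but had no luck with that either.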

Any suggestions on how to handle this? There is very little documentation, and there are few Google results, for the synonym_graph feature.

Best answer

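So my mistake was in the mapping definition. I should not have defined "fields": in the original mapping, post_text was declared as a sub-field of text_field, so the top-level post_text field that the documents and the query actually use never received the english and english_search analyzers. All I had to do was use the following mapping, and everything works: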

"mappings": {
  "properties": {
    "post_text": {
      "type": "text",
      "analyzer": "english",
      "search_analyzer": "english_search"
    }
  }
}
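
To confirm that the corrected mapping is in effect, the index-time side can be inspected directly. The sketch below assumes the index was deleted and recreated with the mapping above and the documents were indexed again, since the analyzer of an existing field cannot be changed in place. It uses the get field mapping API and the `field` parameter of the analyze API, with the index and field names taken from the question.

```
# Show which analyzers are attached to post_text
GET stam_test_1/_mapping/field/post_text

# Analyze sample text with the analyzer configured for the field
GET stam_test_1/_analyze
{
  "field": "post_text",
  "text": "Love this ice cream so much!!!"
}

# The search analyzer now comes from the mapping, so no "analyzer" override is needed
GET stam_test_1/_search
{
  "query": {
    "match": {
      "post_text": "ice cream"
    }
  }
}
```

If the second request returns unstemmed tokens such as `ice` rather than `ic`, the field is probably still using a default analyzer instead of `english`.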

Regarding "elasticsearch - Elasticsearch multi-word synonyms not working as expected", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57590273/