
javascript - Stemming queries in Elasticsearch

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:30:38

I have successfully set up stemming in Elasticsearch, so when I search for "code" I also get hits for "codes", "coding", and so on.

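The problem appears when I try to use the "must_not" field in my query. When I include "code" in the "must_not" field, it's fine and I still get my results as expected, but when I use "codes" instead, I get no matches, even though there are documents that definitely contain the word "codes".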

My query looks like this:

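// Build one exact-value "term" filter per excluded word
var must_not = [];
for (var i = 0; i < exclude_words.length; i++) {
    must_not.push({term: {text: exclude_words[i].toLowerCase()}});
}

// Score on text and title, then filter out the excluded words
var query = {
    "filtered": {
        "query": {
            "dis_max": {
                "queries": [
                    {"match": {"text": term}},
                    {"match": {"title": term}}
                ]
            }
        },
        "filter": {
            "bool": {
                "must_not": must_not
            }
        }
    }
};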

I am using the elasticsearch API for Node.js to build the query and fetch results from Elasticsearch.

I assume I am hitting this problem because of stemming, and that "codes" is stored as "code" in the search index.

Is there a way to solve this without running an external stemming algorithm over my query terms? Or is there a more elegant way to handle it?

Any help is greatly appreciated!

Update

Here is my analyzer:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "stopword_analyzer": {
                    "type": "snowball",
                    "stopwords": ["a", "able", "about", "across", "after", "all", "almost", "also", "am", "among", "an", "and", "any", "are", "as", "at", "be", "because", "been", "but", "by", "can", "cannot", "could", "dear", "did", "do", "does", "either", "else", "ever", "every", "for", "from", "get", "got", "had", "has", "have", "he", "her", "hers", "him", "his", "how", "however", "i", "if", "in", "into", "is", "it", "its", "just", "least", "let", "like", "may", "me", "might", "most", "must", "my", "neither", "no", "nor", "not", "of", "off", "often", "on", "only", "or", "other", "our", "own", "rather", "said", "say", "says", "she", "should", "since", "so", "some", "than", "that", "the", "their", "them", "then", "there", "these", "they", "this", "tis", "to", "too", "us", "wants", "was", "we", "were", "what", "when", "where", "which", "while", "who", "whom", "why", "will", "with", "would", "yet", "you", "your"]
                }
            }
        }
    }
}

The text field has the following mapping:

"text": {
    "type": "string",
    "analyzer": "stopword_analyzer"
}

Best answer

When I include "code" in the "must_not" field, it's fine and I still get my results as expected

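This has nothing to do with must_not itself; it comes from the term filter you are using inside must_not. The term filter takes your search text ("code", "codes", or anything else) and filters on that exact value, without analyzing it.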

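However, the analyzer you are using changes the terms that get indexed. For example, if you index "coding", what you will actually find in the index (as the term in the inverted index) is "code". Keep in mind that term really does search for the exact value. So if you search for "codes", it will not be found, because the term stored for the document is "code".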
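
To make the mismatch concrete, here is a minimal sketch of an inverted index in plain JavaScript. It does not use Elasticsearch at all: TOY_STEMS is a hand-written lookup table standing in for the Snowball stemmer, and toyAnalyze, buildIndex, termLookup and matchLookup are hypothetical names invented for this illustration.

```javascript
// Toy stand-in for the analyzer: a hand-written stem table,
// NOT the real Snowball stemmer (assumption for illustration only).
const TOY_STEMS = { code: "code", codes: "code", coding: "code" };

// Lowercase, split on non-word characters, then map each word to its stem.
function toyAnalyze(text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .map((word) => TOY_STEMS[word] || word);
}

// Build a tiny inverted index: stemmed term -> set of document ids.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((text, id) => {
    for (const term of toyAnalyze(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  });
  return index;
}

// "term"-style lookup: the input is used verbatim, with no analysis.
function termLookup(index, value) {
  return [...(index.get(value) || [])];
}

// "match"-style lookup: the input goes through the same analyzer first.
function matchLookup(index, text) {
  const ids = new Set();
  for (const term of toyAnalyze(text)) {
    for (const id of index.get(term) || []) ids.add(id);
  }
  return [...ids];
}

const index = buildIndex(["She codes every day", "Coding is fun", "No match here"]);
console.log(termLookup(index, "codes"));  // []
console.log(termLookup(index, "code"));   // [ 0, 1 ]
console.log(matchLookup(index, "codes")); // [ 0, 1 ]
```

The exact-value lookup for "codes" finds nothing because only the stem "code" was ever stored, while the analyzed lookup reduces "codes" to "code" before consulting the index.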

I suggest trying match instead of term in the must_not section, because match also runs the analyzer at search time. Something like this:

"filter": {
    "bool": {
        "must_not": [
            {
                "query": {
                    "match": {
                        "text": "codes"
                    }
                }
            },
            {
                "query": {
                    "match": {
                        "text": "coding"
                    }
                }
            }
        ]
    }
}
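
Applied to the loop from the question, the suggestion might look like the sketch below. It assumes the same filtered-query DSL the question uses; buildMustNot and buildQuery are hypothetical helper names, and the sample arguments are made up. Only the request body is built here, so no Elasticsearch client is needed to run it.

```javascript
// Hypothetical helper: one analyzed "match" clause per excluded word,
// wrapped in "query" so it can sit inside the filter, as in the answer.
function buildMustNot(excludeWords) {
  return excludeWords.map((word) => ({
    query: { match: { text: word } },
  }));
}

// Hypothetical helper: the same overall query shape as in the question,
// with the term filters swapped for match clauses.
function buildQuery(term, excludeWords) {
  return {
    filtered: {
      query: {
        dis_max: {
          queries: [
            { match: { text: term } },
            { match: { title: term } },
          ],
        },
      },
      filter: {
        bool: {
          must_not: buildMustNot(excludeWords),
        },
      },
    },
  };
}

// Sample values, invented for the example.
const exampleQuery = buildQuery("search", ["codes", "coding"]);
console.log(JSON.stringify(exampleQuery, null, 2));
```

Because match analyzes its input with the field's analyzer, "codes" and "coding" should both be reduced to the stored stem before the exclusion is applied, so the query terms do not have to be stemmed by hand.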

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/38462286/
