
ElasticSearch: Highlighting with stemming

Reposted. Author: 行者123. Updated: 2023-12-02 23:15:36

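I have read this question and tried to understand the documentation here, but it is too complicated.

The problem (I think):

[Update 1]

My code is written in Scala and talks to ES through the high-level Java API.

I have configured a stemmer. If I search for responsibilities, I get results for both responsibilities and responsibility. That is great.

But

Only the documents containing responsibilities come back with highlights.
This is because the search runs on the stemmed content, i.e. responsib, while the highlighting runs on the unstemmed content. So it finds responsibilities, which is the search term, but not responsibility, which is not.

If I set the highlighter to highlight on the stemmed content, it returns nothing at all. My guess is that it is comparing responsib with responsibilities.

Search

I use the Java high-level API. The problem is not in the code itself.
At the moment I only highlight the content field, which returns only responsibilities. Highlighting content.english seems to return nothing.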

import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder

// Builds a highlighter for the unstemmed "content" field only.
private def buildHighlighter(): HighlightBuilder = {
  val highlightBuilder = new HighlightBuilder
  val highlightContent = new HighlightBuilder.Field("content")
  highlightContent.highlighterType("unified")
  highlightBuilder.field(highlightContent)
  highlightBuilder
}

Mapping

{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": []
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        },
        "content": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}


[Update 2]

The Scala code that performs the search:
def searchByField(indices: Seq[ESIndexName], terms: Seq[(String, String)], size: Int = 20): SearchResponse = {

  val searchRequest = new SearchRequest
  searchRequest.indices(indices.map(idx => idx.completeIndexName()): _*)
  searchRequest.source(buildTargetFieldsMatchQuery(terms, size))

  searchRequest.indicesOptions(IndicesOptions.strictSingleIndexNoExpandForbidClosed())

  client.search(searchRequest, RequestOptions.DEFAULT)
}

The query is built as follows:
private def buildTargetFieldsMatchQuery(termsByField: Seq[(String, String)], size: Int): SearchSourceBuilder = {

  val query = new BoolQueryBuilder

  termsByField.foreach {
    case (field, term) =>

      if (field == "content") {
        logger.debug(field + " should have " + term)
        // standardAnalyzer is a field-name suffix defined elsewhere in the asker's code
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase))
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }
      else if (field == "title") {
        logger.debug(field + " should have " + term)
        // note: `.boost` without an argument only reads the bool query's boost; it sets nothing
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase())).boost
      }
      else {
        logger.debug(field + " should have " + term)
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }

  }
  val sourceBuilder: SearchSourceBuilder = new SearchSourceBuilder()
  sourceBuilder.query(query)
  sourceBuilder.from(0)
  sourceBuilder.size(size)
  sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS))
  // highlighter(...) returns the builder itself, which becomes the method's result
  sourceBuilder.highlighter(buildHighlighter())

}

Best answer

Using plain REST, the following works fine for me:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": []
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

POST test/_doc/
{
  "content": "This is my responsibility"
}

POST test/_doc/
{
  "content": "These are my responsibilities"
}

GET test/_search
{
  "query": {
    "match": {
      "content.english": "responsibilities"
    }
  },
  "highlight": {
    "fields": {
      "content.english": {
        "type": "unified"
      }
    }
  }
}

The result is:
"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "5D5PPGoBqgTTLzdtM-_Y",
    "_score" : 0.18232156,
    "_source" : {
      "content" : "This is my responsibility"
    },
    "highlight" : {
      "content.english" : [
        "This is my <em>responsibility</em>"
      ]
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "5T5PPGoBqgTTLzdtZe8U",
    "_score" : 0.18232156,
    "_source" : {
      "content" : "These are my responsibilities"
    },
    "highlight" : {
      "content.english" : [
        "These are my <em>responsibilities</em>"
      ]
    }
  }
]
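
For the high-level Java client used in the question, the REST request above might be expressed roughly as follows. This is only a sketch: the object and method names (StemmedHighlightSketch, buildStemmedSearch) are hypothetical, and it assumes the Elasticsearch high-level REST client is on the classpath. The point it illustrates is that the query and the highlighter both target the same stemmed sub-field, content.english.

```scala
import org.elasticsearch.index.query.MatchQueryBuilder
import org.elasticsearch.search.builder.SearchSourceBuilder
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder

// Hypothetical helper, not part of the asker's code base.
object StemmedHighlightSketch {

  // Name of the stemmed sub-field defined in the mapping above.
  private val stemmedField = "content.english"

  def buildStemmedSearch(term: String): SearchSourceBuilder = {
    // Highlight the same field that the query matches against.
    val highlightField = new HighlightBuilder.Field(stemmedField)
    highlightField.highlighterType("unified")
    val highlighter = new HighlightBuilder().field(highlightField)

    new SearchSourceBuilder()
      .query(new MatchQueryBuilder(stemmedField, term))
      .highlighter(highlighter)
  }
}
```

Calling StemmedHighlightSketch.buildStemmedSearch("responsibilities") and passing the result to SearchRequest.source should produce a request equivalent to the GET test/_search body shown earlier.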

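Looking at your Java/Groovy (?) code, it looks very close to the example in the docs. Could you log the actual query you are running so that we can figure out what is going wrong? In general it should work like this.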
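
One way to capture the actual query, sketched under the assumption that the request is built with SearchSourceBuilder as in the question: its toString method renders the request body as JSON, which can then be logged or pasted into a REST console for comparison. The object name QueryLogging and the method describe are hypothetical.

```scala
import org.elasticsearch.search.builder.SearchSourceBuilder

// Hypothetical helper for inspecting the request before it is sent.
object QueryLogging {

  // SearchSourceBuilder.toString serializes the query, highlighter,
  // paging and timeout settings into a JSON string.
  def describe(source: SearchSourceBuilder): String = source.toString
}
```

In the question's code this could be used as logger.debug(QueryLogging.describe(sourceBuilder)) just before the builder is returned.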

Regarding ElasticSearch: highlighting with stemming, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55770717/
