
elasticsearch - Elasticsearch analyzer to match "Java", "Script" and "JavaScript"

Reposted. Author: 行者123. Updated: 2023-12-03 01:54:30

Indexed values: Java, JavaScript, ClojureScript

_input_    | _output_
Java | JavaScript, Java
JavaScript | JavaScript
script | JavaScript, ClojureScript

Below is the analyzer that comes closest to the desired result.
"analysis": {
  "filter": {
    "trigrams_filter": {
      "type": "edge_ngram",
      "min_gram": "3",
      "max_gram": "3"
    }
  },
  "analyzer": {
    "trigrams": {
      "filter": [
        "lowercase",
        "trigrams_filter"
      ],
      "type": "custom",
      "tokenizer": "standard"
    }
  }
}

But this is not accurate enough: searching for "JavaScript" returns both "JavaScript" and "Java",
while searching for "script" returns nothing.

Best answer

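There is one major problem with your mapping: you are using the edge_ngram filter to search for part of a word. The edge_ngram filter only produces grams anchored at the start of each token, so it is the right tool when you want to find words that begin with the query value. In your case you need to match substrings anywhere in the word, so you should use the nGram filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html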
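To make the difference concrete, here is a minimal Python sketch of the two filters. It is not Elasticsearch code: the helper names `edge_ngrams` and `ngrams` are made up for illustration, and lowercasing the whole value stands in for the standard tokenizer plus the lowercase filter.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Prefixes of `token` with lengths min_gram..max_gram (edge_ngram-style)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def ngrams(token, min_gram, max_gram):
    """Every substring of `token` with length min_gram..max_gram (ngram-style)."""
    return [
        token[i:i + n]
        for n in range(min_gram, min(max_gram, len(token)) + 1)
        for i in range(len(token) - n + 1)
    ]


values = ["Java", "JavaScript", "ClojureScript"]

# The question's filter: edge_ngram with min_gram = max_gram = 3.
edge_tokens = {v: edge_ngrams(v.lower(), 3, 3) for v in values}
print(edge_tokens)
# {'Java': ['jav'], 'JavaScript': ['jav'], 'ClojureScript': ['clo']}

# The answer's filter: ngram with min_gram = 2 and max_gram = 20.
full_tokens = {v: set(ngrams(v.lower(), 2, 20)) for v in values}
print("script" in full_tokens["JavaScript"], "script" in full_tokens["ClojureScript"])
# True True
```

With the edge_ngram settings, "Java" and "JavaScript" are both indexed as the single term `jav`, and no document has a term starting with `scr`. That is consistent with the behaviour described in the question: "JavaScript" also finds "Java", and "script" finds nothing.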

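Also, the trigrams analyzer should be applied only when the data is indexed. For searching it is better to use the standard analyzer: there is no point in passing the query string through the nGram filter, because the query would then match on every small fragment it contains and you would get back far more documents than you need.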
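The effect of the search analyzer can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch itself: `index_analyzer`, `standard_analyzer` and `search` are hypothetical helpers, whitespace splitting stands in for the standard tokenizer, and a document counts as a hit when it shares at least one term with the query, which mirrors the default OR behaviour of a `match` query.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Every substring of `token` with length min_gram..max_gram."""
    return [
        token[i:i + n]
        for n in range(min_gram, min(max_gram, len(token)) + 1)
        for i in range(len(token) - n + 1)
    ]


def index_analyzer(text):
    """Stand-in for: standard tokenizer -> lowercase -> ngram filter."""
    return {gram for word in text.lower().split() for gram in ngrams(word)}


def standard_analyzer(text):
    """Stand-in for the standard analyzer: lowercased whole words."""
    return set(text.lower().split())


docs = ["Java", "JavaScript", "ClojureScript"]
index = {doc: index_analyzer(doc) for doc in docs}


def search(query, search_analyzer):
    """Return the documents sharing at least one term with the analyzed query."""
    terms = search_analyzer(query)
    return [doc for doc in docs if index[doc] & terms]


for query in ["Java", "JavaScript", "script"]:
    print(query)
    print("  standard at search time:", search(query, standard_analyzer))
    print("  ngram at search time:   ", search(query, index_analyzer))
```

In this model the standard analyzer at search time gives the results asked for in the question. Running the query through the ngram analyzer instead makes "JavaScript" match all three values, because the query fragment `ja` also occurs in "Java" and `sc` also occurs in "ClojureScript".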

The correct mapping:

POST /so
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "trigrams": {
          "filter": [
            "lowercase",
            "trigrams_filter"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "so": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "trigrams",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
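The request above uses syntax from the Elasticsearch release that was current when the answer was written: the `string` field type, a mapping type named `so`, and the `nGram` filter name. As a hedged sketch for newer releases, the Python snippet below builds what an equivalent request body might look like. It assumes a 7.x or later cluster, where the field type is `text`, the filter is named `ngram`, mapping type names are no longer used, and `index.max_ngram_diff` has to be raised to allow a 2 to 20 gram range. None of this comes from the original answer, so check it against the documentation for your version.

```python
import json

MIN_GRAM = 2
MAX_GRAM = 20

# Assumed request body for newer Elasticsearch releases (not from the original answer).
index_body = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            # Assumption: newer releases reject an ngram range wider than this setting.
            "max_ngram_diff": MAX_GRAM - MIN_GRAM,
        },
        "analysis": {
            "filter": {
                "trigrams_filter": {
                    "type": "ngram",
                    "min_gram": MIN_GRAM,
                    "max_gram": MAX_GRAM,
                }
            },
            "analyzer": {
                "trigrams": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "trigrams_filter"],
                }
            },
        },
    },
    "mappings": {
        "properties": {
            "text": {
                "type": "text",
                "analyzer": "trigrams",       # applied at index time
                "search_analyzer": "standard",  # applied to the query string
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The printed JSON would be sent as the body of a `PUT /so` request. The analyzer keeps the name `trigrams` from the question, even though it now emits grams of 2 to 20 characters.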

Values:
POST /so/so/1
{
  "text": "Java"
}
POST /so/so/2
{
  "text": "JavaScript"
}
POST /so/so/3
{
  "text": "ClojureScript"
}

When the query string is "Java", the response contains Java and JavaScript:
POST /so/so/_search
{
  "query": {
    "match": {
      "text": "Java"
    }
  }
}

Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Java"
        }
      },
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "JavaScript"
        }
      }
    ]
  }
}

When the query string is "JavaScript", the response contains only JavaScript:
POST /so/so/_search
{
  "query": {
    "match": {
      "text": "JavaScript"
    }
  }
}

Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1.4054651,
        "_source": {
          "text": "JavaScript"
        }
      }
    ]
  }
}

When the query string is "script", the response contains JavaScript and ClojureScript:
POST /so/so/_search
{
  "query": {
    "match": {
      "text": "script"
    }
  }
}

Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "JavaScript"
        }
      },
      {
        "_index": "so",
        "_type": "so",
        "_id": "3",
        "_score": 1,
        "_source": {
          "text": "ClojureScript"
        }
      }
    ]
  }
}

The original question, "Elasticsearch analyzer to match "Java", "Script" and "JavaScript"", can be found on Stack Overflow: https://stackoverflow.com/questions/37661348/
