
elasticsearch - Partial word search - ElasticSearch 1.7.2

Reposted. Author: 行者123. Updated: 2023-12-03 00:19:00

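I have been trying to build a search module for an application using ElasticSearch. Below is the index structure I put together from sample code I read in other StackOverflow posts.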

{
  "megacorp4":{
    "settings":{
      "analysis":{
        "analyzer":{
          "my_analyzer":{
            "type":"custom",
            "tokenizer":"my_ngram_tokenizer",
            "filter":[
              "my_ngram_filter"
            ]
          }
        },
        "filter":{
          "my_ngram_filter":{
            "type":"edgeNGram",
            "min_gram":3,
            "max_gram":15
          }
        },
        "tokenizer":{
          "my_ngram_tokenizer":{
            "type":"edgeNGram",
            "min_gram":3,
            "max_gram":15
          }
        }
      },
      "mappings":{
        "employee":{
          "properties":{
            "about":{
              "type":"string",
              "analyzer":"my_analyzer"
            },
            "age":{
              "type":"long"
            },
            "first_name":{
              "type":"string"
            },
            "interests":{
              "type":"string",
              "analyzer":"my_analyzer"
            },
            "last_name":{
              "type":"string"
            }
          }
        }
      }
    }
  }
}

Below are the records I inserted to test the search functionality:
[
  {
    "first_name":"John",
    "last_name":"Smith",
    "age":25,
    "about":"I love to go rock climbing",
    "interests":[
      "sports",
      "music"
    ]
  },
  {
    "first_name":"Douglas",
    "last_name":"Fir",
    "age":35,
    "about":"I like to build album climb cabinets",
    "interests":[
      "forestry",
      "music"
    ]
  },
  {
    "first_name":"Jane",
    "last_name":"Smith",
    "age":32,
    "about":"I like to collect rock albums",
    "interests":[
      "music"
    ]
  }
]

I searched the "about" field using both the API (via POSTMAN) and the Python client, as follows:

API query:
localhost:9200/megacorp4/_search?q=climb

Python query:
from elasticsearch import Elasticsearch
from pprint import pprint
es = Elasticsearch()
res = es.search(index="megacorp4", body={"query": {"match": {'about':"climb"}}})
pprint(res)

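I only get exact matches, and the record containing "climbing" does not show up in the output. However, when I replace 'climb' with 'climb*' in the query, I get 2 records, one for 'climb' and one for 'climbing'. I don't want to use the '*' wildcard approach.

I have also tried the built-in "english", "standard" and "ngram" analyzers, but nothing seems to work.

I need help searching for a keyword as a partial word within full text.

Thanks in advance.

Best answer

Use this mapping instead: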

DELETE /test

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_ngram_filter"
          ]
        }
      },
      "filter": {
        "my_ngram_filter": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "employee": {
      "properties": {
        "about": {
          "type": "string",
          "analyzer": "my_analyzer"
        },
        "age": {
          "type": "long"
        },
        "first_name": {
          "type": "string"
        },
        "interests": {
          "type": "string",
          "analyzer": "my_analyzer"
        },
        "last_name": {
          "type": "string"
        }
      }
    }
  }
}

POST /test/employee/_bulk
{"index":{}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"index":{}}
{"first_name":"Douglas","last_name":"Fir","age":35,"about":"I like to build album climb cabinets","interests":["forestry","music"]}
{"index":{}}
{"first_name":"Jane","last_name":"Smith","age":32,"about":"I like to collect rock albums","interests":["music"]}

GET /test/_search?q=about:climb

GET /test/_search
{
"query": {
"query_string": {
"query": "about:climb"
}
}
}

GET /test/_search
{
"query": {
"match": {
"about": "climb"
}
}
}
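Since the question uses the Python client, the three searches above could be issued from Python roughly as follows. This is a sketch, not part of the original answer: the helper names (field_query_string, match_body, query_string_body, run_searches) are made up for illustration, and it assumes an elasticsearch-py client whose search() accepts the q and body keyword arguments, as in the question's own snippet. Newer client versions may expect different arguments.

```python
def field_query_string(field, term):
    """Build a query-string expression scoped to one field, e.g. 'about:climb'."""
    return "{}:{}".format(field, term)


def query_string_body(field, term):
    """Request body equivalent to the query_string search above."""
    return {"query": {"query_string": {"query": field_query_string(field, term)}}}


def match_body(field, term):
    """Request body equivalent to the match search above."""
    return {"query": {"match": {field: term}}}


def run_searches(es, index="test"):
    """Run the three searches with a client supplied by the caller.

    `es` is assumed to be an elasticsearch-py client (es = Elasticsearch()).
    """
    return [
        # URI search: GET /test/_search?q=about:climb
        es.search(index=index, q=field_query_string("about", "climb")),
        # query_string search with a request body
        es.search(index=index, body=query_string_body("about", "climb")),
        # match search with a request body
        es.search(index=index, body=match_body("about", "climb")),
    ]
```

With the client from the question (es = Elasticsearch()), calling run_searches(es) returns the three responses in the same order as the requests above.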

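Two changes:
  • settings needs another closing curly brace, so that mappings is no longer nested inside it
  • replace your custom tokenizer with another one (the edgeNGram tokenizer doesn't help you, since you already have the edgeNGram filter); my suggestion is the standard tokenizer

As for the ?q=climb part: by default it searches the _all field, which is analyzed with the standard analyzer, not with your custom analyzer.

So the correct query there is localhost:9200/megacorp4/_search?q=about:climb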
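To see why swapping the tokenizer matters, here is a rough pure-Python imitation of the two analyzers. It is only a sketch under stated assumptions, not Elasticsearch itself: the function names are made up, the standard tokenizer is approximated with a simple \w+ regex, and the edgeNGram tokenizer is assumed to run with its default settings, where it does not split on any characters and therefore emits prefixes of the whole input string.

```python
import re


def edge_ngrams(token, min_gram=3, max_gram=15):
    """Prefixes of `token` with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze_whole_input(text):
    """Imitates the question's analyzer: edge n-grams of the entire string,
    then the edge n-gram filter applied to each of those tokens."""
    tokens = set()
    for prefix in edge_ngrams(text):
        tokens.update(edge_ngrams(prefix))
    return tokens


def analyze_per_word(text):
    """Imitates the answer's analyzer: split into words first (regex stand-in
    for the standard tokenizer), then edge n-grams of each word."""
    tokens = set()
    for word in re.findall(r"\w+", text):
        tokens.update(edge_ngrams(word))
    return tokens


doc = "I love to go rock climbing"
whole = analyze_whole_input(doc)
per_word = analyze_per_word(doc)

print("climb" in whole)     # every token starts with "I l", so "climb" is absent
print("climb" in per_word)  # "climbing" contributes the prefix "climb"
print(sorted(analyze_per_word("climb") & per_word))  # tokens shared by query and document
```

In this imitation every token produced by the question's analyzer is a prefix of the full sentence, so a search term that appears later in the text never lines up with an indexed token. Once the text is split into words first, "climbing" is indexed under "cli", "clim", "climb" and so on, which is what lets the match query for "climb" find it.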

Regarding elasticsearch - Partial word search - ElasticSearch 1.7.2, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33037451/
