
elasticsearch - How to do incremental (search-as-you-type) full-text search over a 5-million-record set with Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-04 03:50:40

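I am using Elasticsearch on a large dataset of all Wikipedia article names, roughly 5 million of them. The field holding the article name is called articlenames. This is how I create the index: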

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -d'
{
  "settings":{
    "analysis":{
      "filter":{
        "nGram_filter":{
          "type":"edgeNGram",
          "min_gram":1,
          "max_gram":20,
          "token_chars":[
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "tokenizer":{
        "edge_ngram_tokenizer":{
          "type":"edgeNGram",
          "min_gram":"1",
          "max_gram":"20",
          "token_chars":[
            "letter",
            "digit"
          ]
        }
      },
      "analyzer":{
        "nGram_analyzer":{
          "type":"custom",
          "tokenizer":"edge_ngram_tokenizer",
          "filter":[
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "whitespace_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  },
  "mappings":{
    "name":{
      "properties":{
        "articlenames":{
          "type":"text",
          "analyzer":"nGram_analyzer"
        }
      }
    }
  }
}'

I also worked through the following references, which deal with the same kind of problem, but to no avail:

Edge NGram with phrase matching

https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf

My goal is to get results like the following for the input query "sachin t":

sachin tendulkar
sachin tendulkar centuries
sachin tejas
sachin top 60 quotes
sachin talwalkar
sachin tawade
sachin taps

and for the query "sachin te":

sachin tendulkar
sachin tendulkar centuries
sachin tejas

and for the query "sachin ta":

sachin talwalkar
sachin tawade
sachin taps

and for the query "sachin ten":

sachin tendulkar
sachin tendulkar centuries

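Keep in mind that the dataset is large, and that some article names contain special characters, for example "Bronisław-Komorowski".

I get the expected output on smaller datasets of up to 100,000 records, but once the dataset grows to between 0.5 and 5 million records I no longer get it.

My query is: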

http://127.0.0.1:9200/index_wiki_articlenames/_search?&q=articlenames:sachin-t+articlenames:sachin-t.*&filter_path=hits.hits._source.articlenames&size=50

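Best answer

You should try these settings. The only functional change to the mapping is "search_analyzer": "standard", so that the query text is not split into edge n-grams the way the indexed titles are: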

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
  "settings":{
    "analysis":{
      "tokenizer":{
        "edge_ngram_tokenizer":{
          "type":"edgeNGram",
          "min_gram":"1",
          "max_gram":"20",
          "token_chars":[
            "letter",
            "digit"
          ]
        }
      },
      "analyzer":{
        "nGram_analyzer":{
          "type":"custom",
          "tokenizer":"edge_ngram_tokenizer",
          "filter":[
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings":{
    "name":{
      "properties":{
        "articlenames":{
          "type":"text",
          "analyzer":"nGram_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}'
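
To make the effect of "search_analyzer" more concrete, here is a rough Python approximation of the two analysis chains. It is only a sketch and does not call Elasticsearch: the helper names (fold, words, index_tokens, standard_tokens) and the small EXTRA_FOLDS table are made up for illustration, and the real analyzers handle many more cases.

```python
import re
import unicodedata

# Tiny stand-in for the asciifolding filter. Letters such as "ł" have no
# Unicode decomposition, so they need an explicit entry here.
# (Illustrative only, far from exhaustive.)
EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O"})


def fold(token):
    """Approximate the lowercase + asciifolding token filters."""
    decomposed = unicodedata.normalize("NFKD", token.translate(EXTRA_FOLDS))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def words(text):
    """Split on anything that is not a letter or digit (the token_chars setting)."""
    return re.findall(r"[^\W_]+", text)


def index_tokens(text, min_gram=1, max_gram=20):
    """Edge n-grams of every word, roughly what nGram_analyzer emits."""
    grams = []
    for word in words(text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(fold(word[:size]))
    return grams


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: whole lowercased words."""
    return [word.lower() for word in words(text)]


# Query analyzed with the index analyzer (no search_analyzer set):
print(index_tokens("Sachin T"))
# ['s', 'sa', 'sac', 'sach', 'sachi', 'sachin', 't']

# Query analyzed with "search_analyzer": "standard":
print(standard_tokens("Sachin T"))
# ['sachin', 't']

# Index-time tokens for a title with special characters:
print(index_tokens("Bronisław-Komorowski")[-3:])
# ['komorows', 'komorowsk', 'komorowski']
```

Without a separate search analyzer the query itself turns into single-letter prefixes such as "s", which match a very large share of 5 million titles. One caveat the sketch also suggests: the standard analyzer lowercases but does not apply asciifolding, so a query typed as "Bronisław" might not match the folded index term "bronislaw". If that matters, a custom search analyzer built from the standard tokenizer plus the lowercase and asciifolding filters may be worth trying instead.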

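At search time, also try this query. The "and" operator requires every word of the input to match, rather than any one of them:

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "Sachin T",
        "operator": "and"
      }
    }
  }
}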
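
As a sanity check on the matching logic, the following Python sketch imitates what this query does against edge n-gram terms. It runs without Elasticsearch and ignores scoring and result order. The first seven titles come from the question; the last two are extra entries added here purely to show non-matches, and the function names are hypothetical.

```python
import re

TITLES = [
    "sachin tendulkar",
    "sachin tendulkar centuries",
    "sachin tejas",
    "sachin top 60 quotes",
    "sachin talwalkar",
    "sachin tawade",
    "sachin taps",
    "sachin pilot",  # illustrative extra: no word starting with "t"
    "tendulkar",     # illustrative extra: no word starting with "sachin"
]


def words(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[^\W_]+", text.lower())


def edge_ngrams(text, min_gram=1, max_gram=20):
    """Set of word prefixes, approximating the index-time edge n-gram terms."""
    return {
        word[:size]
        for word in words(text)
        for size in range(min_gram, min(max_gram, len(word)) + 1)
    }


def match_and(query, titles):
    """Keep titles whose indexed prefixes contain every whole query word."""
    terms = words(query)  # query words stay whole, as with a standard search analyzer
    return [title for title in titles
            if all(term in edge_ngrams(title) for term in terms)]


print(match_and("Sachin T", TITLES))   # the seven "sachin t..." titles
print(match_and("sachin te", TITLES))
# ['sachin tendulkar', 'sachin tendulkar centuries', 'sachin tejas']
print(match_and("sachin ta", TITLES))
# ['sachin talwalkar', 'sachin tawade', 'sachin taps']
```

Because each query word only has to equal some prefix of some word in the title, this behaves like a per-word prefix match. It does not enforce word order, so a title containing the same words in a different order would match as well.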

The original question, "How to do incremental full-text search over a 5-million-record set with Elasticsearch", is on Stack Overflow: https://stackoverflow.com/questions/48809268/
