
C# NEST Elasticsearch自定义过滤器结构(tokenize)

Reposted · Author: 行者123 · Updated: 2023-11-30 21:37:04

I am trying to rewrite this specific query in C# NEST, but I am stuck on defining the filters... I'm confused...

{
  "settings": {
    "analysis": {
      "filter": {
        "lemmagen_filter_sk": {
          "type": "lemmagen",
          "lexicon": "sk"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms/sk_SK.txt",
          "ignore_case": true
        },
        "stopwords_SK": {
          "type": "stop",
          "stopwords_path": "stop-words/stop-words-slovak.txt",
          "ignore_case": true
        }
      },
      "analyzer": {
        "slovencina_synonym": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "stopwords_SK",
            "lemmagen_filter_sk",
            "lowercase",
            "stopwords_SK",
            "synonym_filter",
            "asciifolding"
          ]
        },
        "slovencina": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "stopwords_SK",
            "lemmagen_filter_sk",
            "lowercase",
            "stopwords_SK",
            "asciifolding"
          ]
        }
      }
    }
  }
}

I would like the correct client.CreateIndex(...) call with the right index settings. All I have so far is this:

client.CreateIndex(indexName, c => c
    .InitializeUsing(indexConfig)
    .Mappings(m => m
        .Map<T>(mp => mp.AutoMap())));

I can't find any information on how to do this. Any help would be appreciated.

EDIT:

client.CreateIndex(indexName, c => c
    .InitializeUsing(indexConfig)
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t
                .UserDefined("lemmagen_filter_sk",
                    new LemmagenTokenFilter { Lexicon = "sk" })
                .Synonym("synonym_filter", ts => ts
                    .SynonymsPath("synonyms/sk_SK.txt")
                    .IgnoreCase(true))
                .Stop("stopwords_sk", tst => tst
                    .StopWordsPath("stop-words/stop-words-slovak")
                    .IgnoreCase(true))
            )
            .Analyzers(aa => aa
                .Custom("slovencina_synonym", acs => acs
                    .Tokenizer("standard")
                    .Filters("stopwords_SK", "lemmagen_filter_sk", "lowercase", "stopwords_SK", "synonym_filter", "asciifolding")
                )
                .Custom("slovencina", acs => acs
                    .Tokenizer("standard")
                    .Filters("stopwords_SK", "lemmagen_filter_sk", "lowercase", "stopwords_SK", "asciifolding")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<DealItem>(mp => mp.AutoMap()
            .Properties(p => p
                .Text(t => t
                    .Name(n => n.title_dealitem)
                    .Name(n => n.coupon_text1)
                    .Name(n => n.coupon_text2)
                    .Analyzer("slovencina_synonym")
                )
            ))));

This is what I have now, but after trying to use one of the analyzers I get an error:

POST dealitems/_analyze
{
  "analyzer": "slovencina",
  "text": "Janko kúpil nové topánky"
}

Error:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[myNode][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to find analyzer [slovencina]"
  },
  "status": 400
}

And GET _settings does not show any analyzers.
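If the index had been created with these settings, the analysis block would be visible in its settings. A rough sketch of what GET dealitems/_settings should return once everything is registered (abbreviated; the exact response envelope varies by Elasticsearch version):

```
GET dealitems/_settings

{
  "dealitems": {
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "lemmagen_filter_sk": { "type": "lemmagen", "lexicon": "sk" },
            ...
          },
          "analyzer": {
            "slovencina": { ... },
            "slovencina_synonym": { ... }
          }
        }
      }
    }
  }
}
```

An empty or missing analysis block here means the CreateIndex call did not apply the settings at all.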

Result: the problem was the missing files... wrong paths.
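For context: stopwords_path and synonyms_path are resolved relative to the Elasticsearch config directory on each node, so the referenced files must exist there under exactly those names. A sketch of the expected layout, using the paths from the settings above:

```
$ES_HOME/config/
├── synonyms/
│   └── sk_SK.txt
└── stop-words/
    └── stop-words-slovak.txt
```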

Best Answer

Indeed, there is no lemmagen token filter available out of the box in NEST. Hopefully you can easily create your own:

using Nest;
using Newtonsoft.Json;

// Custom token filter definition for the elasticsearch-analysis-lemmagen
// plugin, which NEST does not ship with out of the box.
public class LemmagenTokenFilter : ITokenFilter
{
    public string Version { get; set; }

    public string Type => "lemmagen";

    [JsonProperty("lexicon")]
    public string Lexicon { get; set; }
}
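For reference, when this filter is registered under the name lemmagen_filter_sk, NEST should serialize it into the index settings as the same fragment that appears in the original query, roughly:

```
"lemmagen_filter_sk": {
  "type": "lemmagen",
  "lexicon": "sk"
}
```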


var response = elasticClient.CreateIndex(_defaultIndex,
    d => d.Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t.UserDefined("lemmagen_filter_sk",
                new LemmagenTokenFilter
                {
                    Lexicon = "sk"
                }))))
    ..
);

Hope this helps.

Regarding "C# NEST Elasticsearch custom filter structure (tokenize)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47645195/
