c# - Creating a custom analyzer in Lucene.net

Reposted. Author: 行者123. Updated: 2023-11-30 12:21:06

I am trying to create a custom analyzer in Lucene.net 4.8, but I am hitting an error I cannot make sense of.

My analyzer code:

public class SynonymAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        String base1 = "lawnmower";
        String syn1 = "lawn mower";
        String base2 = "spanner";
        String syn2 = "wrench";

        SynonymMap.Builder sb = new SynonymMap.Builder(true);
        sb.Add(new CharsRef(base1), new CharsRef(syn1), true);
        sb.Add(new CharsRef(base2), new CharsRef(syn2), true);
        SynonymMap smap = sb.Build();

        Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);

        TokenStream result = new StandardTokenizer(Version.LUCENE_48, reader);
        result = new SynonymFilter(result, smap, true);
        return new TokenStreamComponents(tokenizer, result);
    }
}

My indexing code is:

var fordFiesta = new Document();
fordFiesta.Add(new StringField("Id", "1", Field.Store.YES));
fordFiesta.Add(new TextField("Make", "Ford", Field.Store.YES));
fordFiesta.Add(new TextField("Model", "Fiesta 1.0 Developing", Field.Store.YES));
fordFiesta.Add(new TextField("FullText", "lawnmower Ford 1.0 Fiesta Developing spanner", Field.Store.YES));

Lucene.Net.Store.Directory directory = FSDirectory.Open(new DirectoryInfo(Environment.CurrentDirectory + "\\LuceneIndex"));

SynonymAnalyzer analyzer = new SynonymAnalyzer();

var config = new IndexWriterConfig(Version.LUCENE_48, analyzer);
var writer = new IndexWriter(directory, config);

writer.UpdateDocument(new Term("Id", "1"), fordFiesta);

writer.Flush(true, true);
writer.Commit();
writer.Dispose();

However, when I run the code, it fails on the writer.UpdateDocument line with the following error:

TokenStream contract violation: Reset()/Dispose() call missing, Reset() called multiple times, or subclass does not call base.Reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

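I can't figure out what I'm doing wrong.

Best answer

The problem is that the TokenStreamComponents is constructed with a different Tokenizer than the one the resulting TokenStream wraps. The analyzer only manages the Tokenizer passed to TokenStreamComponents as the source, so the second StandardTokenizer inside the SynonymFilter chain is most likely never handed a fresh reader when the components are reused, which would explain the contract violation. Changing the code to this should fix it: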

Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);
TokenStream result = new SynonymFilter(tokenizer, smap, true);
return new TokenStreamComponents(tokenizer, result);
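
For context, here is a minimal sketch of how the whole analyzer might look with that fix applied, plus a small loop that consumes the token stream in the order the error message refers to (Reset, IncrementToken, End, Dispose). It assumes the Lucene.Net 4.8 packages Lucene.Net and Lucene.Net.Analysis.Common, and it uses the LuceneVersion enum name from those packages rather than the Version identifier in the question. Building the synonym map once in the constructor and the Demo class are illustrative choices, not part of the original answer.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Synonym;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public sealed class SynonymAnalyzer : Analyzer
{
    private readonly SynonymMap synonyms;

    public SynonymAnalyzer()
    {
        // Build the map once instead of on every CreateComponents call
        // (an illustrative choice; the question builds it per call).
        var builder = new SynonymMap.Builder(true);
        // Values kept from the question. Note: an output containing a plain
        // space is treated as a single token, not as two words.
        builder.Add(new CharsRef("lawnmower"), new CharsRef("lawn mower"), true);
        builder.Add(new CharsRef("spanner"), new CharsRef("wrench"), true);
        synonyms = builder.Build();
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // One tokenizer only: it is both the source of the components
        // and the input of the filter chain.
        Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
        TokenStream result = new SynonymFilter(tokenizer, synonyms, true);
        return new TokenStreamComponents(tokenizer, result);
    }
}

// Hypothetical driver, only to show the consuming workflow.
public static class Demo
{
    public static void Main()
    {
        using (var analyzer = new SynonymAnalyzer())
        using (TokenStream stream = analyzer.GetTokenStream(
            "FullText", new StringReader("lawnmower Ford 1.0 Fiesta Developing spanner")))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                    // must be called before IncrementToken
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();                      // signals end of stream
        }                                      // Dispose closes the stream
    }
}
```

IndexWriter performs the same Reset/IncrementToken/End/Dispose sequence internally for each analyzed field, so running a loop like this by hand can be a quick way to check a custom analyzer before indexing with it.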

Regarding "c# - Creating a custom analyzer in Lucene.net", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47227786/
