
java - Lucene 6.6: What is the current way to create a custom query using analyzers and filters?

Reposted · Author: 行者123 · Updated: 2023-12-02 12:39:04

I am building a search service with Lucene 6.6.0, and I am quite confused about how to create a custom analyzer and query.

I have already built the index from data coming out of an RDBMS, and at first I simply used the StandardAnalyzer. Unfortunately, it does not seem to split text on special characters such as "_", "-", or digits; it only tokenizes on whitespace. I found WordDelimiterGraphFilter, which appears to do what I want, but I don't understand how to make it work. Right now I am trying to use it like this:

    mCustomAnalyzer = new Analyzer()
    {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream filter = new WordDelimiterGraphFilter(source, 8, null);
            return new TokenStreamComponents(source, filter);
        }
    };

    QueryBuilder queryBuilder = new QueryBuilder(mCustomAnalyzer);
    Query query = queryBuilder.createPhraseQuery(aField, aText, 15);

For indexing I use the same analyzer. But it does not work: if I search for "term1 term2", I expect to find matches such as "term1_term2", as well as "term32423" or "term_232".

What am I missing here? I have tried different integers for the filter's "configurationFlags" parameter [1], but it does not seem to work...

[1] http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html

Best Answer

It is not clear what you are indexing and what you are searching for. In your sample code you pass the flag CATENATE_NUMBERS (8), which does not really help with text: it only concatenates numbers, e.g. 500-42 -> 50042. To break term1_term2 into term1, term2, term1term2 and term1_term2, you need the GENERATE_WORD_PARTS, CATENATE_WORDS, CATENATE_NUMBERS, CATENATE_ALL and PRESERVE_ORIGINAL flags.

    private static class CustomAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // Flag constants come from:
            // import static org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter.*;
            final int flags = GENERATE_WORD_PARTS | CATENATE_WORDS | CATENATE_NUMBERS | CATENATE_ALL | PRESERVE_ORIGINAL;
            Tokenizer tokenizer = new StandardTokenizer();
            return new TokenStreamComponents(tokenizer, new WordDelimiterGraphFilter(tokenizer, flags, null));
        }
    }
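As a side note on why 8 alone did nothing useful here: the configuration flags are bit masks that are OR-ed together, and 8 on its own is just CATENATE_NUMBERS. A minimal sketch of the combination used above, with the bit values as documented for WordDelimiterGraphFilter (in real code you would static-import the constants from the filter class rather than redefine them, as done in the analyzer above):

```java
public class WdgfFlags {
    // Bit values as documented in the WordDelimiterGraphFilter javadoc (Lucene 6.6).
    static final int GENERATE_WORD_PARTS = 1;  // "term1_term2" -> "term1", "term2"
    static final int CATENATE_WORDS      = 4;  // "term1_term2" -> "term1term2"
    static final int CATENATE_NUMBERS    = 8;  // "500-42" -> "50042"
    static final int CATENATE_ALL        = 16; // join all sub-parts into one token
    static final int PRESERVE_ORIGINAL   = 32; // also keep "term1_term2" itself

    public static void main(String[] args) {
        // The question passed only CATENATE_NUMBERS:
        System.out.println(CATENATE_NUMBERS); // 8
        // The answer's combination OR-s five masks together:
        int flags = GENERATE_WORD_PARTS | CATENATE_WORDS
                  | CATENATE_NUMBERS | CATENATE_ALL | PRESERVE_ORIGINAL;
        System.out.println(flags); // 61
    }
}
```

Because the flags are independent bits, you can test whether a given behavior is enabled with a bitwise AND, e.g. `(flags & PRESERVE_ORIGINAL) != 0`.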

Sample code used to test the example:

    CustomAnalyzer customAnalyzer = new CustomAnalyzer();
    Directory directory = FSDirectory.open(Paths.get("directoryPath"));
    IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(customAnalyzer));
    Document doc1 = new Document();
    doc1.add(new TextField("text", "WAS_sample_tc", Field.Store.YES));
    writer.addDocument(doc1);
    writer.close();

    QueryBuilder queryBuilder = new QueryBuilder(customAnalyzer);
    Query query = queryBuilder.createPhraseQuery("text", "sample", 15);

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(directory));

    TopDocs topDocs = searcher.search(query, 10);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        Document doc = searcher.doc(scoreDoc.doc);
        System.out.println(doc.toString());
    }

About java - Lucene 6.6: What is the current way to create a custom query using analyzers and filters?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45012195/
