
java - Searching an Apache Lucene index and counting results by group

Reposted. Author: 行者123. Updated: 2023-12-01 14:38:46

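I am trying to search a Lucene index, but I want to filter the search. There are two fields, contents and category. Suppose I want to find the files that contain "sports", and I also want to count how many of those files belong to category a and how many to category b. I am trying to do this with the code below. The problem is that with millions of records the loop makes it slow. Please suggest another way to accomplish this.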

try {
    File indexDir = new File("file path");
    Directory directory = FSDirectory.open(indexDir);
    IndexSearcher searcher = new IndexSearcher(directory, true); // true = read-only
    int maxhits = 1000000;
    QueryParser parser1 = new QueryParser(Version.LUCENE_36, "contents",
            new StandardAnalyzer(Version.LUCENE_36));
    Query qu = parser1.parse("sport");

    TopDocs topDocs = searcher.search(qu, maxhits);
    ScoreDoc[] hits = topDocs.scoreDocs;
    int len = hits.length;
    JOptionPane.showMessageDialog(null, "found times " + len);

    int docId = 0;
    Document d;
    String category = "";
    int ctr = 0, ctr1 = 0;

    // Loads every matching document just to read its category.
    // This per-hit lookup is what gets slow with millions of records.
    for (int i = 0; i < len; i++) {
        docId = hits[i].doc;
        d = searcher.doc(docId);
        category = d.get("category");
        if ("a".equals(category))
            ctr++;
        if ("b".equals(category))
            ctr1++;
    }

    JOptionPane.showMessageDialog(null, "word found in category a times " + ctr);
    JOptionPane.showMessageDialog(null, "word found in category b times " + ctr1);
} catch (Exception ex) {
    ex.printStackTrace();
}

Best answer

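You can simply run one query for each category you are interested in and read its totalHits. Better still, use a TotalHitCountCollector instead of fetching a TopDocs instance, since it only counts matches and never scores, sorts, or loads documents: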

// Count documents that contain "sport" and belong to category a.
Query query = parser1.parse("+sport +category:a");
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr = collector.getTotalHits();

// Same count for category b, with a fresh collector.
query = parser1.parse("+sport +category:b");
collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr1 = collector.getTotalHits();
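
For anyone who wants to try this end to end, here is a minimal sketch of the same idea against Lucene 3.6. It is not part of the original answer. It builds the queries programmatically with TermQuery and BooleanQuery rather than through the QueryParser, which also sidesteps analysis of the category value (StandardAnalyzer treats "a" as a stop word by default, so a parsed category:a clause may be dropped). The class name CategoryCountSketch, the helper methods, the in-memory RAMDirectory, and the four sample documents are all hypothetical, and the sketch assumes the category field was indexed as NOT_ANALYZED.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class; assumes Lucene 3.6 is on the classpath.
public class CategoryCountSketch {

    // Counts documents whose "contents" field holds the term and whose
    // "category" field equals the given value. No document is loaded.
    // The term is used as-is (no analysis), so it must already be lowercase.
    public static int countInCategory(IndexSearcher searcher, String term, String category)
            throws IOException {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("contents", term)), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("category", category)), BooleanClause.Occur.MUST);
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(query, collector);
        return collector.getTotalHits();
    }

    // Assumption: category is stored as a single untokenized term.
    private static void addDoc(IndexWriter writer, String contents, String category)
            throws IOException {
        Document doc = new Document();
        doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.ANALYZED));
        doc.add(new Field("category", category, Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }

    // Builds a tiny in-memory index and returns the counts for categories a and b.
    public static int[] demoCounts() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
        addDoc(writer, "sport news today", "a");
        addDoc(writer, "local sport results", "a");
        addDoc(writer, "sport on television", "b");
        addDoc(writer, "weather report", "b");
        writer.close();

        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            return new int[] {
                countInCategory(searcher, "sport", "a"),
                countInCategory(searcher, "sport", "b")
            };
        } finally {
            searcher.close();
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = demoCounts();
        System.out.println("category a: " + counts[0]);
        System.out.println("category b: " + counts[1]);
    }
}
```

With the sample documents above, two category a documents and one category b document contain "sport". Each count costs one extra search, which is fine for a handful of categories; for many categories a single-pass grouping or faceting approach would likely scale better.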

Regarding "java - Searching an Apache Lucene index and counting results by group", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16206110/
