
java - Can Lucene return search results with line numbers?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:34:09

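I want to use Lucene to implement something like the "Find in Files" feature of an IDE. Essentially, I want to search source code files such as .c, .cpp, .h, .cs, and .xml. I tried the demo shown on the Apache website. It returns a list of matching files, but without line numbers or the number of occurrences in each file. I am sure there must be some way to get them.

Is there a way to get these details?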

Best answer

Could you share the link to the demo on the Apache website?

Here I show you how to get the term frequency of a term for a given set of documents:

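    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.LockObtainFailedException;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public static void main(final String[] args) throws CorruptIndexException,
            LockObtainFailedException, IOException {

        // Create an in-memory index (Lucene 3.6 API)
        final Directory directory = new RAMDirectory();
        final Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        final IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_36, analyzer);
        final IndexWriter writer = new IndexWriter(directory, config);

        // addDoc(writer, field, text);
        addDoc(writer, "title", "foo");
        addDoc(writer, "title", "buz qux");
        addDoc(writer, "title", "foo foo bar");

        // Search: open a reader directly on the writer
        final IndexReader reader = IndexReader.open(writer, false);
        final IndexSearcher searcher = new IndexSearcher(reader);

        final Term term = new Term("title", "foo");
        final Query query = new TermQuery(term);
        System.out.println("Query: " + query.toString() + "\n");

        final int limitShow = 3;
        final TopDocs td = searcher.search(query, limitShow);
        final ScoreDoc[] hits = td.scoreDocs;

        // Take IDs and frequencies. Iterate over hits.length, not
        // td.totalHits: scoreDocs holds at most limitShow entries, while
        // totalHits counts every matching document.
        final int[] docIDs = new int[hits.length];
        for (int i = 0; i < hits.length; i++) {
            docIDs[i] = hits[i].doc;
        }
        final Map<Integer, Integer> id2freq = getFrequencies(reader, term,
                docIDs);

        // Show results
        for (int i = 0; i < hits.length; i++) {
            final int docNum = hits[i].doc;
            final Document doc = searcher.doc(docNum);
            System.out.println("\tposition " + i);
            System.out.println("Title: " + doc.get("title"));
            final int freq = id2freq.get(docNum);
            System.out.println("Occurrences of \"" + term.text() + "\" in \""
                    + term.field() + "\" = " + freq);
            System.out.println("--------------------------------\n");
        }
        searcher.close();
        reader.close();
        writer.close();
    }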

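Here is how documents are added to the index:

    private static void addDoc(final IndexWriter w, final String field,
            final String text) throws CorruptIndexException, IOException {
        final Document doc = new Document();
        // NOTE: the same field is added twice, so every term in `text` is
        // indexed twice per document. This is why the sample output reports
        // 2 and 4 occurrences; with a single doc.add(...) call the counts
        // would be 1 and 2.
        doc.add(new Field(field, text, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field(field, text, Field.Store.YES, Field.Index.ANALYZED));
        w.addDocument(doc);
    }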

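Here is an example of how to get the number of occurrences of a term in each document:

    public static Map<Integer, Integer> getFrequencies(
            final IndexReader reader, final Term term, final int[] docIDs)
            throws CorruptIndexException, IOException {
        final Map<Integer, Integer> id2freq = new HashMap<Integer, Integer>();
        // TermDocs only moves forward, so visit the IDs in ascending order
        // (search hits arrive sorted by score, not by document ID)
        final int[] sortedIDs = docIDs.clone();
        java.util.Arrays.sort(sortedIDs);
        final TermDocs tds = reader.termDocs(term);
        if (tds == null) {
            return id2freq;
        }
        try {
            int current = -1; // document the enumeration is positioned on
            for (final int docID : sortedIDs) {
                // skipTo() always advances, so call it only when needed
                if (current < docID) {
                    if (!tds.skipTo(docID)) {
                        break; // no further document contains the term
                    }
                    current = tds.doc();
                }
                // Record the frequency only if we landed on the requested ID
                if (current == docID) {
                    id2freq.put(docID, tds.freq());
                }
            }
        } finally {
            tds.close();
        }
        return id2freq;
    }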

If you put everything together and run it, you will get the following output:

    Query: title:foo

        position 0
    Title: foo
    Occurrences of "foo" in "title" = 2
    --------------------------------

        position 1
    Title: foo foo bar
    Occurrences of "foo" in "title" = 4
    --------------------------------
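
The example above reports how often a term occurs in each document, but it does not give the line numbers the question asks for. One possible way to get them, which is an assumption on my part and not part of the original answer, is to use Lucene only to find the matching files and then re-scan the text of each hit line by line. Below is a minimal sketch in plain Java with no Lucene dependency. The class name `LineMatcher` and the method `findLines` are hypothetical, and the matching is a case-insensitive substring search, which is only an approximation of how an analyzer tokenizes text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LineMatcher {

    /**
     * Returns a map from 1-based line number to the number of times
     * {@code term} occurs on that line (case-insensitive substring match).
     * Lines without a match are omitted.
     */
    public static Map<Integer, Integer> findLines(final String text,
            final String term) {
        final Map<Integer, Integer> line2count =
                new LinkedHashMap<Integer, Integer>();
        if (term.isEmpty()) {
            return line2count; // an empty term would match everywhere
        }
        final String needle = term.toLowerCase(Locale.ROOT);
        final String[] lines = text.split("\\r?\\n", -1);
        for (int i = 0; i < lines.length; i++) {
            final String haystack = lines[i].toLowerCase(Locale.ROOT);
            int count = 0;
            int from = haystack.indexOf(needle);
            // Count non-overlapping occurrences on this line
            while (from >= 0) {
                count++;
                from = haystack.indexOf(needle, from + needle.length());
            }
            if (count > 0) {
                line2count.put(i + 1, count);
            }
        }
        return line2count;
    }

    public static void main(final String[] args) {
        final String source = "int foo = 1;\nint bar = 2;\nfoo = foo + bar;\n";
        for (final Map.Entry<Integer, Integer> e
                : findLines(source, "foo").entrySet()) {
            System.out.println("line " + e.getKey() + ": "
                    + e.getValue() + " occurrence(s)");
        }
    }
}
```

In a real setup, the text passed to `findLines` would come from the file that a search hit points to, for example via a stored path field. Another option is to index every line as its own document together with its line number, which trades a larger index for not having to re-read the files.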

Regarding "java - Can Lucene return search results with line numbers?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17214058/
