
java - Lucene Apache does not keep my old index

Reposted. Author: 行者123. Updated: 2023-11-30 07:24:23

I found this example on the internet:

Indexer.java

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    private IndexWriter writer;

    @SuppressWarnings("deprecation")
    public Indexer(String indexDirectoryPath) throws IOException {
        Directory indexDirectory = FSDirectory.open(new File(indexDirectoryPath));
        writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_36), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws CorruptIndexException, IOException {
        writer.close();
    }

    private Document getDocument(File file) throws IOException {
        Document document = new Document();
        Field contentField = new Field(LuceneConstants.CONTENTS, new FileReader(file));
        Field fileNameField = new Field(LuceneConstants.FILE_NAME, file.getName(), Field.Store.YES,
                Field.Index.NOT_ANALYZED);
        Field filePathField = new Field(LuceneConstants.FILE_PATH, file.getCanonicalPath(), Field.Store.YES,
                Field.Index.NOT_ANALYZED);
        document.add(contentField);
        document.add(fileNameField);
        document.add(filePathField);
        return document;
    }

    public void indexFile(File file) throws IOException {
        Document document = getDocument(file);
        writer.addDocument(document);
    }

    public int createIndex(String file) throws IOException {
        indexFile(new File(file));
        return writer.numDocs();
    }

}

Searcher.java

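import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
    IndexSearcher indexSearcher;
    QueryParser queryParser;
    Query query;

    @SuppressWarnings("deprecation")
    public Searcher(String indexDirectoryPath) throws IOException {
        Directory indexDirectory = FSDirectory
                .open(new File(indexDirectoryPath));
        indexSearcher = new IndexSearcher(indexDirectory);
        queryParser = new QueryParser(Version.LUCENE_36,
                LuceneConstants.CONTENTS, new StandardAnalyzer(
                        Version.LUCENE_36));
    }

    public TopDocs search(String searchQuery) throws IOException,
            ParseException {
        query = queryParser.parse(QueryParser.escape(searchQuery));
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    public Document getDocument(ScoreDoc scoreDoc)
            throws CorruptIndexException, IOException {
        return indexSearcher.doc(scoreDoc.doc);
    }

    public void close() throws IOException {
        indexSearcher.close();
    }

}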

LuceneConstants.java

public class LuceneConstants {
    public static final String CONTENTS = "contents";
    public static final String FILE_NAME = "filename";
    public static final String FILE_PATH = "filepath";
    public static final int MAX_SEARCH = 10;

}

This is how I use them:

public static void main(String[] args) throws IOException, ParseException {
    {
        // First file
        Indexer indexer = new Indexer("index");
        indexer.createIndex("f1.txt");
        indexer.close();
        Searcher searcher = new Searcher(Constante.DIR_INDEX.getValor());
        TopDocs hits = searcher.search("Art. 1°");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            org.apache.lucene.document.Document doc = searcher.getDocument(scoreDoc);
            String nomeArquivo = doc.get(LuceneConstants.FILE_PATH);
            System.out.println(nomeArquivo);
        }
    }
    System.out.println("-----");
    {
        // Second file
        Indexer indexer = new Indexer("index");
        indexer.createIndex("f2.txt");
        indexer.close();
        Searcher searcher = new Searcher(Constante.DIR_INDEX.getValor());
        TopDocs hits = searcher.search("Art. 1°");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            org.apache.lucene.document.Document doc = searcher.getDocument(scoreDoc);
            String nomeArquivo = doc.get(LuceneConstants.FILE_PATH);
            System.out.println(nomeArquivo);
        }
    }
}

Everything works fine up to the "// Second file" part.

After indexing the second file, I can no longer find anything from the first file.

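If I create a single Indexer instance, use it to index both f1.txt and f2.txt, and then close it, everything works the way I want. The problem is that if I close the application, start it again and decide to index another file, I lose f1.txt and f2.txt.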

Is there a way to make Lucene always keep the previous index when indexing new files?

Best answer

It looks like you are using an old version of Lucene (3.6 or earlier), right?

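The third parameter of the IndexWriter constructor specifies whether to create a new index or to open an existing one. If it is set to true, any index that already exists in the given directory is overwritten. To open the existing index without overwriting it, pass false: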

writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_36), false, IndexWriter.MaxFieldLength.UNLIMITED);
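One caveat, stated with some hedging: with false, the writer expects an index to already exist, so on the very first run against an empty directory the constructor should fail with an IOException. A minimal sketch for Lucene 3.6 that sidesteps this uses IndexWriterConfig with OpenMode.CREATE_OR_APPEND (which is also the config's default), so the index is created if it is missing and appended to otherwise. The class AppendingIndexer and its helper methods are hypothetical names for illustration, and RAMDirectory stands in for FSDirectory.open(...) so the demo leaves no files behind. If you would rather keep the deprecated constructor from the question, passing !IndexReader.indexExists(indexDirectory) as the third argument should have a similar effect.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper class: a sketch assuming Lucene 3.6 on the classpath.
public class AppendingIndexer {

    // Opens a writer that creates the index if none exists yet and appends to it otherwise.
    static IndexWriter openWriter(Directory dir) throws IOException {
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        return new IndexWriter(dir, config);
    }

    // Adds one document in its own writer session, mimicking one run of the application.
    static void indexInNewSession(Directory dir, String fileName) throws IOException {
        IndexWriter writer = openWriter(dir);
        try {
            Document doc = new Document();
            // "filename" mirrors LuceneConstants.FILE_NAME from the question.
            doc.add(new Field("filename", fileName, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        } finally {
            writer.close(); // commits the changes and releases the write lock
        }
    }

    // Counts the documents currently visible in the index.
    static int countDocs(Directory dir) throws IOException {
        IndexReader reader = IndexReader.open(dir);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory.open(...)
        indexInNewSession(dir, "f1.txt");
        indexInNewSession(dir, "f2.txt");
        System.out.println(countDocs(dir));
    }
}
```

Running main should print 2: the second writer session appends to the index that the first one created instead of replacing it.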

For the original question "java - Lucene Apache does not keep my old index" on Stack Overflow, see: https://stackoverflow.com/questions/37006733/
