java - Strange Lucene behavior

Reposted. Author: 行者123. Updated: 2023-12-02 08:24:35

I am trying to get started with Lucene. The code I use to index documents is:

public void index(String type, String words) {
    IndexWriter indexWriter = null;
    try {
        if (dir == null)
            dir = createAndPropagate();
        indexWriter = new IndexWriter(dir, new StandardAnalyzer(), true,
                new KeepOnlyLastCommitDeletionPolicy(),
                IndexWriter.MaxFieldLength.UNLIMITED);

        Field wordsField = new Field(FIELD_WORDS, words, Field.Store.YES,
                Field.Index.ANALYZED);
        Field typeField = new Field(FIELD_TYPE, type, Field.Store.YES,
                Field.Index.ANALYZED);

        Document doc = new Document();
        doc.add(wordsField);
        doc.add(typeField);

        indexWriter.addDocument(doc);
        indexWriter.commit();
    } catch (IOException e) {
        logger.error("Problems while adding entry to index.", e);
    } finally {
        try {
            if (indexWriter != null)
                indexWriter.close();
        } catch (IOException e) {
            logger.error("Unable to close index writer.", e);
        }
    }
}

The search looks like this:

public List<TagSearchEntity> searchFor(final String type, String words,
        int amount) {
    List<TagSearchEntity> result = new ArrayList<TagSearchEntity>();

    try {
        if (dir == null)
            dir = createAndPropagate();

        for (final Document doc : searchFor(dir, type, words, amount)) {
            @SuppressWarnings("serial")
            TagSearchEntity searchResult = new TagSearchEntity() {{
                setType(type);
                setWords(doc.getField(FIELD_WORDS).stringValue());
            }};
            result.add(searchResult);
        }
    } catch (IOException e) {
        logger.error("Problems while searching", e);
    }

    return result;
}

private List<Document> searchFor(Directory indexDirectory, String type,
        String words, int amount) throws IOException {
    Searcher indexSearcher = new IndexSearcher(indexDirectory);

    final Query tagQuery = new TermQuery(new Term(FIELD_WORDS, words));
    final Query typeQuery = new TermQuery(new Term(FIELD_TYPE, type));

    @SuppressWarnings("serial")
    BooleanQuery query = new BooleanQuery() {{
        add(tagQuery, BooleanClause.Occur.SHOULD);
        add(typeQuery, BooleanClause.Occur.MUST);
    }};

    List<Document> result = new ArrayList<Document>();

    for (ScoreDoc scoreDoc : indexSearcher.search(query, amount).scoreDocs) {
        result.add(indexSearcher.doc(scoreDoc.doc));
    }

    indexSearcher.close();

    return result;
}

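I have two use cases. The first adds a document of one type and searches for it, then adds a document of another type and searches for that, and so on. The other adds all the documents first and then searches for them. The first one works fine: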

@Test
public void testSearch() {
    search.index("type1", "test type1 for test purposes test test");
    List<TagSearchEntity> result = search.searchFor("type1", "test", 10);
    assertNotNull("Retrieved list should not be null.", result);
    assertTrue("Retrieved list should not be empty.", !result.isEmpty());

    search.index("type2", "test type2 for test purposes test test");
    result.clear();
    result = search.searchFor("type2", "test", 10);
    assertTrue("Retrieved list should not be empty.", !result.isEmpty());

    search.index("type3", "test type3 for test purposes test test");
    result.clear();
    result = search.searchFor("type3", "test", 10);
    assertTrue("Retrieved list should not be empty.", !result.isEmpty());
}

But the other one seems to index only the last document:

@Test
public void testBuggy() {
    search.index("type1", "test type1 for test purposes test test");
    search.index("type2", "test type2 for test purposes test test");
    search.index("type3", "test type3 for test purposes test test");

    List<TagSearchEntity> result = search.searchFor("type3", "test", 10);
    assertNotNull("Retrieved list should not be null.", result);
    assertTrue("Retrieved list should not be empty.", !result.isEmpty());

    result.clear();
    result = search.searchFor("type2", "test", 10);
    assertTrue("Retrieved list should not be empty.", !result.isEmpty());

    result.clear();
    result = search.searchFor("type1", "test", 10);
    assertTrue("Retrieved list should not be empty.", !result.isEmpty());
}

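It finds type3 successfully but fails to find any of the other types. If I reorder the calls, it still finds only the document that was indexed last. The Lucene version I am using is: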

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>lucene</groupId>
    <artifactId>lucene</artifactId>
    <version>1.4.3</version>
</dependency>

What am I doing wrong? How do I make it index all the documents?

Best answer

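A new index is created on every indexing operation. The third argument of the IndexWriter constructor is the create flag, and it is set to true. According to the documentation of IndexWriter, when this flag is set the writer creates a new index or overwrites the existing one. Set it to false to append to the existing index.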
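
The effect of the flag can be illustrated without Lucene. The sketch below is a toy model, not Lucene code: `CreateFlagDemo`, `storedDocs`, `index` and `countByType` are hypothetical names invented for illustration, and the list merely stands in for the index directory. It assumes only what is stated above, namely that create = true replaces whatever index already exists.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an index. NOT Lucene; every name here is hypothetical. */
class CreateFlagDemo {
    /** Plays the role of the index directory: stored {type, words} pairs. */
    static final List<String[]> storedDocs = new ArrayList<>();

    /** Mimics "open a writer, add one document, close the writer". */
    static void index(String type, String words, boolean create) {
        if (create) {
            // Assumed semantics of create == true: discard the existing index.
            storedDocs.clear();
        }
        storedDocs.add(new String[] {type, words});
    }

    /** Counts the stored documents whose type matches exactly. */
    static int countByType(String type) {
        int hits = 0;
        for (String[] doc : storedDocs) {
            if (doc[0].equals(type)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Pattern from testBuggy: every call passes create = true.
        index("type1", "test type1", true);
        index("type2", "test type2", true);
        index("type3", "test type3", true);
        System.out.println(countByType("type1")); // prints 0
        System.out.println(countByType("type3")); // prints 1

        // Create once, then append with create = false.
        index("type1", "test type1", true);
        index("type2", "test type2", false);
        index("type3", "test type3", false);
        System.out.println(countByType("type1")); // prints 1
        System.out.println(countByType("type3")); // prints 1
    }
}
```

In the question's code the corresponding change is to pass false as the third argument of the IndexWriter constructor. As far as I know, create = false expects an index to exist in the directory already, so the very first write may still need create = true unless createAndPropagate() has already set one up.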

Regarding java - Strange Lucene behavior, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4803830/
