
java - Cannot delete documents using Lucene IndexWriter.deleteDocuments(term)

Reposted. Author: 行者123. Updated: 2023-11-30 06:53:22

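I have been struggling for the past two days: I cannot delete a document with indexWriter.deleteDocuments(term).

Below is the code I am testing with. I hope someone can point out what I am doing wrong. Things I have already tried:

  1. Upgrading Lucene from 2.x to 5.x
  2. Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
  3. Setting the field's index options to NONE or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Here is the code: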

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class TestSearch {
    static SimpleAnalyzer analyzer = new SimpleAnalyzer();

    public static void main(String[] argvs) throws IOException, ParseException {
        generateIndex("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
        delete("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
    }

    public static void generateIndex(String id) throws IOException {
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        Field idField = new Field("_id", id, fieldType);
        Document doc = new Document();
        doc.add(idField);
        iwriter.addDocument(doc);
        iwriter.close();
    }

    public static void query(String id) throws ParseException, IOException {
        Query query = new QueryParser("_id", analyzer).parse(id);
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
        for (ScoreDoc scdoc : scoreDoc) {
            Document doc = isearcher.doc(scdoc.doc);
            System.out.println(doc.get("_id"));
        }
    }

    public static void delete(String id) {
        try {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Term term = new Term("_id", id);
            iwriter.deleteDocuments(term);
            iwriter.commit();
            iwriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

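First, generateIndex() creates an index in /tmp/test/lucene, and query() shows that the id can be found. delete() is then expected to remove the document, but the second query() shows that the deletion failed.

Here are the pom dependencies, in case anyone wants to test this: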

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.5.4</version>
        <type>jar</type>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>5.5.4</version>
    </dependency>

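I urgently need an answer.

Best answer

Your problem is the analyzer. SimpleAnalyzer defines tokens as maximal strings of letters (StandardAnalyzer, or even WhitespaceAnalyzer, would be more typical choices), so the value you are indexing is split into the tokens "b", "a", "b", "d", "f". The delete method you defined does not go through the analyzer; it just creates a raw term, and no term equal to the full id exists in the index. You can see this in action if you replace your main with the following: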

generateIndex("5836962b0293a47b09d345f1");
query("5836962b0293a47b09d345f1");
delete("b");
query("5836962b0293a47b09d345f1");
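To make the mismatch concrete, here is a minimal sketch that imitates the behaviour described above using only JDK classes. It is not Lucene code: the class LetterTokenSketch and its methods are hypothetical names made up for illustration, and Character.isLetter is only assumed to approximate what SimpleAnalyzer treats as a letter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only (NOT Lucene): imitates letter-run tokenization and
// delete-by-raw-term with plain JDK collections.
class LetterTokenSketch {

    // Split text into maximal runs of letters, lowercased; any other character is a separator.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> liveDocs = new HashSet<>();
    private int nextDocId = 0;

    // Index a value the way an analyzed field would be indexed: one posting per token.
    int addAnalyzed(String value) {
        int docId = nextDocId++;
        liveDocs.add(docId);
        for (String token : letterTokens(value)) {
            postings.computeIfAbsent(token, k -> new HashSet<Integer>()).add(docId);
        }
        return docId;
    }

    // Delete by raw term: the term is looked up exactly as given, with no analysis.
    int deleteByRawTerm(String term) {
        Set<Integer> matches = postings.getOrDefault(term, new HashSet<Integer>());
        int deleted = 0;
        for (int docId : matches) {
            if (liveDocs.remove(docId)) {
                deleted++;
            }
        }
        return deleted;
    }

    int liveCount() {
        return liveDocs.size();
    }

    public static void main(String[] args) {
        String id = "5836962b0293a47b09d345f1";
        System.out.println(letterTokens(id));           // [b, a, b, d, f]
        LetterTokenSketch index = new LetterTokenSketch();
        index.addAnalyzed(id);
        System.out.println(index.deleteByRawTerm(id));  // 0: the full id was never indexed as a term
        System.out.println(index.deleteByRawTerm("b")); // 1: "b" is one of the indexed tokens
    }
}
```

In this toy model the full id never appears as a key in the postings map, which mirrors why deleting by the full id removes nothing while deleting by "b" does.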

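As a general rule, queries, terms and the like are not analyzed, whereas QueryParser does run its input through the analyzer.

For a field that is (or looks like) an identifier, you probably don't want to analyze the field at all. In that case, add this to the FieldType: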

fieldType.setTokenized(false);

Then you will have to change your query (again, because QueryParser analyzes) and use a TermQuery instead:

Query query = new TermQuery(new Term("_id", id));
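Putting the two changes together, the following is a sketch of how the corrected flow might look against Lucene 5.5.4 (the version in the pom above, which must be on the classpath). The class name UntokenizedIdSketch and its helper methods are hypothetical, and an in-memory RAMDirectory is assumed in place of the /tmp/test/lucene directory from the question so that the example leaves nothing on disk.

```java
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;

// Sketch only: assumes Lucene 5.5.4; names below are illustrative, not from the original post.
class UntokenizedIdSketch {
    static final String ID = "5836962b0293a47b09d345f1";

    // Index the id as a single, untokenized term so it is stored in the index verbatim.
    static void addDocument(Directory dir, String id) throws IOException {
        FieldType idType = new FieldType();
        idType.setStored(true);
        idType.setTokenized(false); // the fix: do not run the id through the analyzer
        idType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new SimpleAnalyzer()))) {
            Document doc = new Document();
            doc.add(new Field("_id", id, idType));
            writer.addDocument(doc);
            writer.commit();
        }
    }

    // Look the id up with a TermQuery, which (unlike QueryParser) does not analyze its input.
    static int countById(Directory dir, String id) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.search(new TermQuery(new Term("_id", id)), 100).scoreDocs.length;
        }
    }

    // Delete by the same raw term; it now matches the indexed term exactly.
    static void deleteById(Directory dir, String id) throws IOException {
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new SimpleAnalyzer()))) {
            writer.deleteDocuments(new Term("_id", id));
            writer.commit();
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory();
        addDocument(dir, ID);
        System.out.println("hits before delete: " + countById(dir, ID));
        deleteById(dir, ID);
        System.out.println("hits after delete: " + countById(dir, ID));
    }
}
```

The key design point is that the indexing side, the query side, and the delete side all use the same unanalyzed term, so there is no analyzer in the path that could split the id apart.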

Regarding "java - Cannot delete documents using Lucene IndexWriter.deleteDocuments(term)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42293998/
