java - Lucene StandardAnalyzer example from the book Tika in Action does not work

Reposted. Author: 行者123. Updated: 2023-12-01 18:34:19
First of all, I am a complete newbie to Tika and Lucene. I am working through the book Tika in Action and trying out its examples. Chapter 5 gives this example:

package tikatest01;

import java.io.File;
import org.apache.tika.Tika;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;

public class LuceneIndexer {

    private final Tika tika;
    private final IndexWriter writer;

    public LuceneIndexer(Tika tika, IndexWriter writer) {
        this.tika = tika;
        this.writer = writer;
    }

    public void indexDocument(File file) throws Exception {
        Document document = new Document();
        document.add(new Field(
                "filename", file.getName(),
                Store.YES, Index.ANALYZED));
        document.add(new Field(
                "fulltext", tika.parseToString(file),
                Store.NO, Index.ANALYZED));
        writer.addDocument(document);
    }
}

And this main class:

package tikatest01;

import java.io.File;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.tika.Tika;

public class TikaTest01 {

    public static void main(String[] args) throws Exception {

        String filename = "C:\\testdoc.pdf";
        File file = new File(filename);

        IndexWriter writer = new IndexWriter(
                new SimpleFSDirectory(file),
                new StandardAnalyzer(Version.LUCENE_30),
                MaxFieldLength.UNLIMITED);
        try {
            LuceneIndexer indexer = new LuceneIndexer(new Tika(), writer);
            indexer.indexDocument(file);
        } finally {
            writer.close();
        }
    }
}

I have added the libraries tika-app-1.5.jar, lucene-core-4.7.0.jar and lucene-analyzers-common-4.7.0.jar to the project.

Questions:

In the current version of Lucene, Field.Index is deprecated. What should I use instead?

MaxFieldLength cannot be found. Am I missing an import?

Best answer

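For Lucene 4.7, TextField replaces the deprecated Field constructor that took Index.ANALYZED (an analyzed, tokenized field). The indexer code becomes: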

package tikatest01;

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.tika.Tika;

public class LuceneIndexer {

    private final Tika tika;
    private final IndexWriter writer;

    public LuceneIndexer(Tika tika, IndexWriter writer) {
        this.tika = tika;
        this.writer = writer;
    }

    public void indexDocument(File file) throws Exception {
        Document document = new Document();
        document.add(new TextField(
                "filename", file.getName(), Store.YES));
        document.add(new TextField(
                "fulltext", tika.parseToString(file), Store.NO));
        writer.addDocument(document);
    }
}
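
The indexer above only covers the Index.ANALYZED case. As a rough guide to the other old Store/Index combinations, here is a minimal sketch against the Lucene 4.x document API. The class name FieldMigrationSketch and the extra "path" and "size" fields are hypothetical and not part of the book's example.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Hypothetical helper: shows which Lucene 4.x field class stands in
// for each deprecated Field(..., Store, Index) combination.
public class FieldMigrationSketch {

    public static Document build(String path, String name, String text, long sizeBytes) {
        Document doc = new Document();
        // Was: new Field("path", path, Store.YES, Index.NOT_ANALYZED)
        // StringField indexes the whole value as a single token.
        doc.add(new StringField("path", path, Store.YES));
        // Was: new Field("filename", name, Store.YES, Index.ANALYZED)
        doc.add(new TextField("filename", name, Store.YES));
        // Was: new Field("fulltext", text, Store.NO, Index.ANALYZED)
        doc.add(new TextField("fulltext", text, Store.NO));
        // Was: Store.YES with Index.NO (stored only, not searchable)
        doc.add(new StoredField("size", sizeBytes));
        return doc;
    }
}
```

Whether "path" and "size" are worth adding depends on what you want to search or display; the book's example only needs the two TextField entries.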

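MaxFieldLength was removed in Lucene 4.x, so no import will bring it back; IndexWriter is now configured through an IndexWriterConfig, and field length is no longer truncated by default. The code for the main class: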

package tikatest01;

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;

public class TikaTest01 {

    public static void main(String[] args) throws Exception {

        // Directory that holds the Lucene index (not the document itself).
        String dirname = "C:\\MyTestDir\\";
        File dir = new File(dirname);

        // Document to index, same path as in the question.
        File file = new File("C:\\testdoc.pdf");

        IndexWriter writer = new IndexWriter(
                new SimpleFSDirectory(dir),
                new IndexWriterConfig(
                        Version.LUCENE_47,
                        new StandardAnalyzer(Version.LUCENE_47)));
        try {
            LuceneIndexer indexer = new LuceneIndexer(new Tika(), writer);
            // Index the document file, not the index directory.
            indexer.indexDocument(file);
        } finally {
            writer.close();
        }
    }
}
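
To index more than one document, each regular file has to be passed to indexDocument individually, since Tika.parseToString(File) reads a single file rather than a folder. Below is a minimal sketch using only the JDK. The helper class IndexableFiles is hypothetical, and it assumes the documents live in a source folder that is separate from the index directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: collects the files that could be handed to
// LuceneIndexer.indexDocument one by one.
public class IndexableFiles {

    /** Returns the regular files directly inside sourceDir, sorted by path. */
    public static List<File> listRegularFiles(File sourceDir) throws IOException {
        List<File> files = new ArrayList<File>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sourceDir.toPath())) {
            for (Path entry : stream) {
                // Skip subdirectories and other non-regular entries.
                if (Files.isRegularFile(entry)) {
                    files.add(entry.toFile());
                }
            }
        }
        // File is Comparable, so this sorts by pathname.
        Collections.sort(files);
        return files;
    }
}
```

In the main method, the single indexDocument call could then become a loop such as `for (File f : IndexableFiles.listRegularFiles(sourceDir)) indexer.indexDocument(f);`, where sourceDir is whichever folder holds your documents.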

Regarding "java - Lucene StandardAnalyzer example from the book Tika in Action does not work", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22744277/
