
java - How do I search for Russian text in a Lucene index?

Reposted. Author: 行者123. Updated: 2023-12-01 15:58:06

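I don't understand where I went wrong. In my code, "/home/test/03m8894---20070213134234.txt" is a file containing English text and "/home/test/01---20061121103506.txt" is a file containing Russian text. Both files are UTF-8 encoded. The program prints 1 and then 0, i.e. it finds only the English text and ignores the Russian text. However, if you do this: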

    for (int m = 0; m < totalDocs; m++) {
        Document thisDoc = reader.document(m);
        System.out.print(thisDoc.get("partnum"));
    }

the text of the partnum field is correct, and the output on screen shows no encoding errors.

    RAMDirectory directory = new RAMDirectory();

    IndexWriter writer =
            //new IndexWriter(directory, new SimpleAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
            new IndexWriter(directory, new RussianAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    File f1[] = {new File("/home/test/03m8894---20070213134234.txt"), new File("/home/test/01---20061121103506.txt")};

    String strLine1 = "";
    // Index each non-empty file as one document, storing its full text in "partnum".
    for (int x = 0; x < f1.length; x++) {
        Document doc = new Document();
        int length = (int) f1[x].length();
        if (length != 0) {
            char[] cbuf = new char[length];
            InputStreamReader isr = new InputStreamReader(new FileInputStream(f1[x]));
            final int read = isr.read(cbuf);
            strLine1 = new String(cbuf, 0, read);
            isr.close();
            doc.add(new Field("partnum", strLine1, Field.Store.YES, Field.Index.NOT_ANALYZED));
            //doc.add(new Field("description", "Illidium Space Modulator", Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
    }
    writer.close();

    IndexSearcher searcher = new IndexSearcher(directory);
    IndexReader reader = searcher.getIndexReader();
    int totalDocs = reader.numDocs();

    // Look each stored value up again with an exact-match TermQuery and print the hit count.
    for (int m = 0; m < totalDocs; m++) {
        Document thisDoc = reader.document(m);
        String tmp_str = thisDoc.get("partnum");

        Query query = new TermQuery(new Term("partnum", tmp_str));

        TopDocs rs = searcher.search(query, null, 10);
        System.out.println(rs.totalHits);
    }

Best answer

You say the files are UTF-8 encoded, but you use:


InputStreamReader isr = new InputStreamReader(new FileInputStream(f1[x]));

This relies on the platform default encoding, which may not be UTF-8. Try:


InputStreamReader isr = new InputStreamReader(new FileInputStream(f1[x]), "UTF-8");
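
To illustrate why the charset argument matters, here is a minimal sketch that does not touch Lucene at all. It decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1 (chosen here only as a stand-in for a non-UTF-8 platform default). The class name CharsetDemo, the helper readAll, and the sample string are hypothetical and not part of the original question. The sketch assumes Java 7 or later for StandardCharsets; on older runtimes the "UTF-8" string overload shown above does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: decode an entire stream with an explicit charset.
    // It loops until end of stream, because a single read() call is not
    // guaranteed to fill the buffer.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A short Russian word written with Unicode escapes, so the example
        // does not depend on the encoding of this source file.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442, Lucene";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        String asUtf8 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        // Decoding with a different charset turns each Cyrillic letter into
        // two unrelated characters.
        String asLatin1 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(asUtf8));   // true
        System.out.println(original.equals(asLatin1)); // false
        System.out.println(asUtf8.length() + " vs " + asLatin1.length());
    }
}
```

In the question's code the same helper could be called as readAll(new FileInputStream(f1[x]), StandardCharsets.UTF_8) in place of the single isr.read(cbuf) call, although that substitution is a suggestion rather than something the original answer states.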

Regarding "java - How do I search for Russian text in a Lucene index?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4648290/
