
Lucene.net returns the correct number of query hits, but not the correct documents

Reposted. Author: 行者123. Last updated: 2023-12-02 08:25:55

I'm new to Lucene and am trying to work through this problem. I build my index like this:

    Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo(dirIndexDir));

    // Create the IndexWriter
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true,
        IndexWriter.MaxFieldLength.UNLIMITED);

    Document doc = new Document();

    doc.Add(new Field("keyform_type", entry.keyForm.type, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("keyform_lang", entry.keyForm.lang, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("keyform_dial", entry.keyForm.dial, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("keyform_reg", entry.keyForm.reg, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("keyform_term", entry.keyForm.term.Value, Field.Store.YES, Field.Index.ANALYZED));

    if (entry.refForm.type != null)
        doc.Add(new Field("refform_type", entry.refForm.type, Field.Store.YES, Field.Index.NOT_ANALYZED));
    if (entry.refForm.lang != null)
        doc.Add(new Field("refform_lang", entry.refForm.lang, Field.Store.YES, Field.Index.NOT_ANALYZED));
    if (entry.refForm.dial != null)
        doc.Add(new Field("refform_dial", entry.refForm.dial, Field.Store.YES, Field.Index.NOT_ANALYZED));
    if (entry.refForm.reg != null)
        doc.Add(new Field("refform_reg", entry.refForm.reg, Field.Store.YES, Field.Index.NOT_ANALYZED));
    if (entry.refForm.term.Value != null)
        doc.Add(new Field("refform_term", entry.refForm.term.Value, Field.Store.YES, Field.Index.ANALYZED));

    doc.Add(new Field("pos", entry.pos, Field.Store.YES, Field.Index.NOT_ANALYZED));

    for (int s = 0; s < entry.subject.Count; s++)
    {
        doc.Add(new Field("subject_" + s, entry.subject[s], Field.Store.YES, Field.Index.NOT_ANALYZED));
    }
    for (int g = 0; g < entry.sense.gloss.Count; g++)
    {
        doc.Add(new Field("gloss_" + g, entry.sense.gloss[g], Field.Store.YES, Field.Index.ANALYZED));
    }
    if (entry.signature.action != null)
        doc.Add(new Field("action", entry.signature.action, Field.Store.YES, Field.Index.NOT_ANALYZED));
    if (entry.signature.source != null)
        doc.Add(new Field("source", entry.signature.source, Field.Store.YES, Field.Index.NOT_ANALYZED));
    if (entry.signature.date == 0)
        doc.Add(new Field("date", entry.signature.date.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));

    // Add the doc
    writer.AddDocument(doc);

    writer.Close();

Then I query it with this code:

    // Doesn't matter what term is, same result
    string term = "workers";

    Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo(luceneDir));

    IndexSearcher searcher = new IndexSearcher(dir, true);
    List<string> b = new List<string>();
    b.Add("keyform_gloss");
    b.Add("keyform_term");
    b.Add("refform_term");
    b.Add("refform_gloss");
    for (int i = 0; i < nMaxDupes; i++)
        b.Add("gloss_" + i.ToString());
    MultiFieldQueryParser mfqp = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29,
        b.ToArray(), new StandardAnalyzer());
    Query q = mfqp.Parse(term);
    TopDocs td = searcher.Search(q, 300);

    for (int i = 0; i < td.totalHits; i++)
    {
        // Generate a dictionaryEntry for each hit
        Document doc = searcher.Doc(i);

        // Access the document fields, blah
    }

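No matter what the value of term is, Lucene returns the first X documents in the index, where X is the number of documents that actually match term. When I browse the index with LUKE, the same query written by hand (keyform_term:term gloss_0:term, etc.) returns the correct number of results and the correct matching documents.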

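However, the C# code above always returns the first X documents, which don't necessarily contain the search term in any of the searched fields. They're not even close.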

What am I doing wrong? I know the index is fine, because I can search it in LUKE, so it must be something in the query...

Thanks!

Best answer

The line:

    Document doc = searcher.Doc(i);

should be

    Document doc = searcher.Doc(td.scoreDocs[i].doc);

or the equivalent in correct C# syntax (sorry, I'm a Java guy).
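Why this fixes it: `IndexSearcher.Doc(n)` loads the stored fields of the document whose internal document number is `n`; it does not mean "the n-th hit". The hits, best first, live in `td.scoreDocs`, and each entry's `doc` field holds the document number to pass to `Doc`. Calling `searcher.Doc(i)` for `i` from 0 up to the hit count therefore just walks documents 0, 1, 2, ... of the index, which matches the "first X documents" symptom. Note too that `totalHits` counts every match in the index, while `scoreDocs` holds at most the `n` passed to `Search` (300 here), so the loop is safer bounded by `td.scoreDocs.Length`. The self-contained sketch below reproduces the mix-up without Lucene. `ScoreDocSketch`, `TopDocsSketch`, `HitMappingDemo` and its tiny in-memory "index" are hypothetical stand-ins that only mimic the shape of Lucene.Net 2.9's `ScoreDoc`/`TopDocs` (including the lowercase field names); the matching and scoring are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Lucene.Net 2.9's ScoreDoc (same lowercase field names).
public class ScoreDocSketch
{
    public int doc;     // internal document number of this hit
    public float score; // relevance score of this hit

    public ScoreDocSketch(int doc, float score)
    {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical stand-in for Lucene.Net 2.9's TopDocs.
public class TopDocsSketch
{
    public int totalHits;              // number of matching documents in the whole index
    public ScoreDocSketch[] scoreDocs; // at most n hits, best first
}

public static class HitMappingDemo
{
    // Hypothetical in-memory "index": array position = document number,
    // value = the stored keyform_term field.
    public static readonly string[] StoredTerms =
        { "apple", "banana", "workers", "cherry", "workers union" };

    // Toy search: a document matches if one of its words equals the term;
    // shorter values get a higher made-up score. This is not Lucene's scoring.
    public static TopDocsSketch Search(string term, int n)
    {
        ScoreDocSketch[] hits = Enumerable.Range(0, StoredTerms.Length)
            .Where(d => StoredTerms[d].Split(' ').Contains(term))
            .Select(d => new ScoreDocSketch(d, 1.0f / StoredTerms[d].Length))
            .OrderByDescending(sd => sd.score)
            .ToArray();
        return new TopDocsSketch { totalHits = hits.Length, scoreDocs = hits.Take(n).ToArray() };
    }

    // Buggy pattern from the question: hit rank i is used as document number i.
    public static List<string> WrongResults(TopDocsSketch td)
    {
        var results = new List<string>();
        for (int i = 0; i < td.totalHits; i++)
            results.Add(StoredTerms[i]);
        return results;
    }

    // Fixed pattern: each hit carries the document number to load,
    // and only scoreDocs.Length hits were actually returned.
    public static List<string> RightResults(TopDocsSketch td)
    {
        var results = new List<string>();
        for (int i = 0; i < td.scoreDocs.Length; i++)
            results.Add(StoredTerms[td.scoreDocs[i].doc]);
        return results;
    }

    public static void Main()
    {
        TopDocsSketch td = Search("workers", 300);
        Console.WriteLine(string.Join(", ", WrongResults(td)));  // apple, banana
        Console.WriteLine(string.Join(", ", RightResults(td)));  // workers, workers union
    }
}
```

The buggy loop prints the first two documents of the index simply because two documents match, just as described in the question. With `Search("workers", 1)`, `totalHits` is still 2 but `scoreDocs` has a single entry, which is why the fixed loop uses `scoreDocs.Length` rather than `totalHits`.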

A similar question, "Lucene.net returns correct number of query hits, but not the correct documents", can be found on Stack Overflow: https://stackoverflow.com/questions/3505875/
