
C# - Lucene.net - can't do multi-word search

Reposted. Author: 行者123. Updated: 2023-11-30 21:48:15

I have the following documents stored in my Lucene index:

{
    "id": 1,
    "name": "John Smith",
    "description": "worker",
    "additionalData": "faster data",
    "attributes": "is_hired=not"
},
{
    "id": 2,
    "name": "Alan Smith",
    "description": "hired",
    "additionalData": "faster drive",
    "attributes": "is_hired=not"
},
{
    "id": 3,
    "name": "Mike Std",
    "description": "hired",
    "additionalData": "faster check",
    "attributes": "is_hired=not"
}

Now I want to search across the fields to check whether any of the given values exists:

search term: "John data check"

This should return the documents with IDs 1 and 3, but it doesn't. Why?

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

BooleanQuery mainQuery = new BooleanQuery();
mainQuery.MinimumNumberShouldMatch = 1;

var cols = new string[] {
    "name",
    "additionalData"
};

string[] words = searchData.text.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);

var queryParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, cols, analyzer);

foreach (var word in words)
{
    BooleanQuery innerQuery = new BooleanQuery();
    innerQuery.MinimumNumberShouldMatch = 1;

    innerQuery.Add(queryParser.Parse(word), Occur.SHOULD);

    mainQuery.Add(innerQuery, Occur.MUST);
}

TopDocs hits = searcher.Search(mainQuery, null, int.MaxValue, Sort.RELEVANCE);

// hits.TotalHits is 0 !!

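Best answer

The query you construct essentially requires all three words to match.

You wrap each word in a BooleanQuery with a single SHOULD clause. This is equivalent to using the inner query directly: you have only added a level of indirection that does not change the query's behavior. The boolean query has only one clause, and that clause has to match for the boolean query to match.

Then you wrap each of these in another boolean query, this time adding each one as a MUST clause. This means every clause has to match for the query to match.

For a BooleanQuery to match, all MUST clauses have to be satisfied; if there are none, at least MinimumNumberShouldMatch SHOULD clauses have to be satisfied. Leave that property at its default value, because the documented behavior is: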

By default no optional clauses are necessary for a match (unless there are no required clauses).
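The rule above can be illustrated with a small model in plain C#. This is only a sketch and does not call Lucene.Net: the `Matches` helper is hypothetical, each document is reduced to the set of lowercased terms from its `name` and `additionalData` fields, and prohibited (MUST_NOT) clauses and scoring are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of BooleanQuery matching. NOT Lucene.Net code:
// a "document" is just a set of terms and each clause is a single term.
static class BooleanMatchSketch
{
    // Hypothetical helper, named for illustration only.
    public static bool Matches(ISet<string> docTerms, string[] must, string[] should, int minShouldMatch = 0)
    {
        // Every MUST term has to be present in the document.
        if (!must.All(t => docTerms.Contains(t))) return false;

        int shouldHits = should.Count(t => docTerms.Contains(t));

        // With no MUST clauses, at least one SHOULD clause has to match.
        int required = must.Length == 0 ? Math.Max(1, minShouldMatch) : minShouldMatch;
        return shouldHits >= required;
    }

    static void Main()
    {
        // Lowercased terms of "name" and "additionalData" for documents 1, 2 and 3.
        var docs = new[]
        {
            new HashSet<string> { "john", "smith", "faster", "data" },
            new HashSet<string> { "alan", "smith", "faster", "drive" },
            new HashSet<string> { "mike", "std", "faster", "check" },
        };
        var words = new[] { "john", "data", "check" };
        var none = new string[0];

        for (int i = 0; i < docs.Length; i++)
        {
            bool allMust = Matches(docs[i], words, none);    // shape of the question's query
            bool allShould = Matches(docs[i], none, words);  // shape of the intended query
            Console.WriteLine("doc {0}: MUST={1}, SHOULD={2}", i + 1, allMust, allShould);
        }
        // Prints:
        // doc 1: MUST=False, SHOULD=True
        // doc 2: MUST=False, SHOULD=False
        // doc 3: MUST=False, SHOULD=True
    }
}
```

In this model no document contains all three words, so the all-MUST form matches nothing, while the all-SHOULD form matches documents 1 and 3.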

In effect, your query is (ignoring MultiFieldQueryParser for simplicity):

+(john) +(data) +(check)

Or, as a tree:

BooleanQuery
    MUST: BooleanQuery
        SHOULD: TermQuery: john
    MUST: BooleanQuery
        SHOULD: TermQuery: data
    MUST: BooleanQuery
        SHOULD: TermQuery: check

Which can be simplified to:

BooleanQuery
    MUST: TermQuery: john
    MUST: TermQuery: data
    MUST: TermQuery: check

But the query you want is:

BooleanQuery
    SHOULD: TermQuery: john
    SHOULD: TermQuery: data
    SHOULD: TermQuery: check

So remove the mainQuery.MinimumNumberShouldMatch = 1; line, then replace the body of your foreach with the following, and it should do the job:

mainQuery.Add(queryParser.Parse(word), Occur.SHOULD);
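To see the difference between the two shapes, one option is to build both queries and print them, since `Query.ToString()` renders a query in Lucene query syntax. The sketch below assumes the Lucene.Net 3.0.3 package is referenced; the class name and the `original` / `corrected` variable names are made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Sketch only: assumes Lucene.Net 3.0.3. Names are illustrative.
static class QueryShapeSketch
{
    static void Main()
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var parser = new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            new[] { "name", "additionalData" },
            analyzer);
        var words = new[] { "John", "data", "check" };

        var original = new BooleanQuery();   // shape used in the question
        var corrected = new BooleanQuery();  // shape suggested in the answer

        foreach (var word in words)
        {
            // Question: each word wrapped in its own BooleanQuery, added as MUST.
            var inner = new BooleanQuery();
            inner.Add(parser.Parse(word), Occur.SHOULD);
            original.Add(inner, Occur.MUST);

            // Answer: each word added directly as an optional clause.
            corrected.Add(parser.Parse(word), Occur.SHOULD);
        }

        // Required clauses are rendered with a leading '+'.
        Console.WriteLine("original:  " + original);
        Console.WriteLine("corrected: " + corrected);
    }
}
```

Printing the query this way is a quick check whenever a search unexpectedly returns no hits: a `+` in front of every clause means every clause is required.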

OK, here's a complete example that works for me:

using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

var directory = new RAMDirectory();

// Index the three documents; only "id" is stored, the other fields are analyzed for search.
using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "John Smith", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster data", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);

    doc = new Document();
    doc.Add(new Field("id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "Alan Smith", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster drive", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);

    doc = new Document();
    doc.Add(new Field("id", "3", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "Mike Std", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster check", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
}

var words = new[] {"John", "data", "check"};
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] {"name", "additionalData"}, analyzer);

// Each word is an optional clause, so a document matches if it contains any of them.
var mainQuery = new BooleanQuery();
foreach (var word in words)
    mainQuery.Add(parser.Parse(word), Occur.SHOULD); // Should probably use parser.Parse(QueryParser.Escape(word)) instead

using (var searcher = new IndexSearcher(directory))
{
    var results = searcher.Search(mainQuery, null, int.MaxValue, Sort.RELEVANCE);
    var idFieldSelector = new MapFieldSelector("id");

    foreach (var scoreDoc in results.ScoreDocs)
    {
        // Load only the stored "id" field of each hit.
        var doc = searcher.Doc(scoreDoc.Doc, idFieldSelector);
        Console.WriteLine("Found: {0}", doc.Get("id"));
    }
}
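The comment in the loop above suggests escaping each word before parsing it. A minimal sketch of that variant follows, assuming Lucene.Net 3.0.3; the `BuildAnyWordQuery` helper and the sample input are hypothetical and not part of the original answer. `QueryParser.Escape` backslash-escapes query-syntax characters such as `(`, `:` or `*`, so user input containing them is treated as literal text rather than as query operators.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Sketch only: assumes Lucene.Net 3.0.3. Helper name and input are illustrative.
static class EscapedQuerySketch
{
    // Hypothetical helper: one optional (SHOULD) clause per whitespace-separated word.
    static BooleanQuery BuildAnyWordQuery(QueryParser parser, string text)
    {
        var query = new BooleanQuery();
        var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var word in words)
        {
            // Escape first so characters like '(' are not parsed as query syntax.
            query.Add(parser.Parse(QueryParser.Escape(word)), Occur.SHOULD);
        }
        return query;
    }

    static void Main()
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var parser = new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            new[] { "name", "additionalData" },
            analyzer);

        // The unbalanced '(' would make parser.Parse throw a ParseException if left unescaped.
        var query = BuildAnyWordQuery(parser, "John (data check");
        Console.WriteLine(query);
    }
}
```

Escaping per word keeps the one-clause-per-word structure of the answer while making the search tolerant of arbitrary user input.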

This question, "C# - Lucene.net - can't do multi-word search", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/37793813/
