java - Lucene phrase query with incomplete words

Reposted. Author: 行者123. Updated: 2023-12-03 03:22:36

I have implemented a RAMDirectory with StandardAnalyzer and store place data in a Lucene cache. I add the data to Lucene like this:

final Document document = new Document();

final IndexableField id = new StringField("placeId", place.getPlaceId(), Field.Store.YES);
final IndexableField name = new TextField("name", place.getName().toLowerCase(), Field.Store.YES);
final IndexableField location = new LatLonPoint("location", place.getLatitude(), place.getLongitude());
final IndexableField city = new StringField("city", place.getCity(), Field.Store.YES);

document.add(id);
document.add(name);
document.add(location);
document.add(city);

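I implemented two ways to search the data: one finds nearby places within a given radius, which works well, and the other searches places by name. We also need autocomplete when searching by name.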

I implemented search by name as follows:

QueryParser parser = new QueryParser("name", analyzer);
return parser.createPhraseQuery("name", searchStr, 2);
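The third argument to createPhraseQuery is the phrase slop. As a rough illustration of what a slop of 2 allows, here is a simplified, stand-alone model in plain Java; it is not Lucene's implementation, and it assumes the text is already tokenized and that each query term occurs at most once. Each query term's document position is shifted by its position in the query, and the phrase matches when the spread of the shifted positions is at most the slop. For a name tokenized as tom, clinic, and, pharmacy (if "and" is dropped as a stop word, Lucene keeps the position gap, so pharmacy still sits at position 3), "tom pharmacy" needs a spread of 2, while "tom clini" can never match because clini is not an indexed term. The class name SlopSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only, not Lucene code: a simplified model of sloppy phrase matching.
// Assumes pre-tokenized text and that each query term occurs at most once in the document.
final class SlopSketch {

    // Returns how far the terms had to "move" to form the phrase, or -1 if a term is missing.
    static int matchLength(List<String> docTokens, List<String> queryTokens) {
        if (queryTokens.isEmpty()) {
            return -1;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < queryTokens.size(); i++) {
            int docPos = docTokens.indexOf(queryTokens.get(i));
            if (docPos < 0) {
                return -1; // only whole indexed terms match, so "clini" is never found
            }
            // Shift the document position by the term's position in the query.
            int shifted = docPos - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // The phrase matches if every term is present and the spread is within the slop.
    static boolean matches(List<String> docTokens, List<String> queryTokens, int slop) {
        int length = matchLength(docTokens, queryTokens);
        return length >= 0 && length <= slop;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("tom", "clinic", "and", "pharmacy");
        System.out.println(matches(doc, Arrays.asList("tom", "clinic"), 2));   // true
        System.out.println(matches(doc, Arrays.asList("tom", "pharmacy"), 2)); // true
        System.out.println(matches(doc, Arrays.asList("tom", "clini"), 2));    // false
    }
}
```

The model shows that slop only tolerates gaps and reordering between whole terms; it does nothing for a partially typed word.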

Now I have a place named "Tom Clinic and Pharmacy".

If I search with the following phrases, I get results:

  1. Tom
  2. Tom Clinic
  3. Tom Pharmacy

That's great, but if the user types "Tom clini" or "Tom pharma", Lucene returns no results.

I tried appending a "*" to the end of searchStr and passing the phrase to a WildcardQuery (that works fine for a single word but fails for multiple words).

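I would also like to add a bit of fuzziness so that spelling mistakes are handled. I'm new to Lucene and don't know where to go from here, so any help is appreciated!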

P.S. I'm using Lucene 7.3.

Best answer

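In cases like these, the best approach is always to look for good resources. I can recommend the following book. In particular, you may be interested in one or both of the following: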

Fuzzy query

Lucene's fuzzy search implementation is based on Levenshtein distance. It compares two strings and counts the number of single-character edits needed to transform one into the other; the resulting number indicates how close the two strings are. In a fuzzy search, a threshold on the number of edits determines whether two strings match. To trigger a fuzzy match in QueryParser, append the tilde character ~ to a term. QueryParser has a couple of settings for tuning this type of query. Here is a code snippet:

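// a value >= 1 is read as a maximum edit distance (Lucene supports at most 2)
queryParser.setFuzzyMinSim(2f);
// the first 3 characters must match exactly before edits are counted
queryParser.setFuzzyPrefixLength(3);
// "~" without a number uses the fuzzyMinSim configured above
Query query = queryParser.parse("hump~");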

On the book's sample data, this example returns the first, second, and fourth sentences, because the fuzzy match accepts humpty for hump: the two words differ by two characters. The minimum similarity is set to 2 here; because the value is at least 1, Lucene treats it as a maximum edit distance of two rather than as a similarity ratio.
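To make the edit-distance rule concrete, here is a small plain-Java model of it. It is an illustration, not Lucene's implementation: a textbook Levenshtein distance plus the exact-prefix check that setFuzzyPrefixLength adds. Lucene itself does not compare terms pairwise; it matches against the term dictionary with a Levenshtein automaton and, by default, also counts a transposition of two adjacent characters as one edit, which this sketch does not. The class and method names are hypothetical.

```java
// Illustration only, not Lucene code: Levenshtein distance plus an exact-prefix rule,
// mirroring setFuzzyMinSim(2f) (max 2 edits) and setFuzzyPrefixLength(3).
final class FuzzySketch {

    // Classic dynamic-programming edit distance (insertions, deletions, substitutions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Term matches if it shares the required prefix and is within maxEdits edits.
    static boolean fuzzyMatches(String query, String term, int maxEdits, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        return term.startsWith(prefix) && levenshtein(query, term) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("hump", "humpty"));            // 2
        System.out.println(fuzzyMatches("hump", "humpty", 2, 3));     // true
        System.out.println(fuzzyMatches("clini", "clinic", 2, 3));    // true
        System.out.println(fuzzyMatches("pharma", "pharmacy", 2, 3)); // true
        System.out.println(fuzzyMatches("ph", "pharmacy", 2, 3));     // false
    }
}
```

The last two lines show why fuzziness alone does not solve autocomplete: it tolerates a couple of missing or wrong characters, but not an arbitrarily short prefix.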

PhraseQuery and MultiPhraseQuery

A PhraseQuery matches a particular sequence of terms, while a MultiPhraseQuery lets you match any of several terms at the same position. For example, MultiPhraseQuery supports a phrase such as humpty (dumpty OR together), which matches humpty at position 0 and dumpty or together at position 1.

How to do it...

Here is a code snippet to demonstrate both Query types:

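import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;
import org.apache.lucene.search.PhraseQuery;

// In Lucene 6+ (including 7.3) both queries are immutable and are created with builders
PhraseQuery.Builder phraseBuilder = new PhraseQuery.Builder();
phraseBuilder.add(new Term("content", "humpty"));
phraseBuilder.add(new Term("content", "together"));
PhraseQuery query = phraseBuilder.build();

MultiPhraseQuery.Builder multiBuilder = new MultiPhraseQuery.Builder();
// position 0: humpty
multiBuilder.add(new Term[] { new Term("content", "humpty") });
// position 1: dumpty OR together
multiBuilder.add(new Term[] { new Term("content", "dumpty"), new Term("content", "together") });
MultiPhraseQuery query2 = multiBuilder.build();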

How it works…

The first Query, PhraseQuery, searches for the phrase humpty together. The second Query, MultiPhraseQuery, searches for the phrase humpty (dumpty OR together). On the book's sample setup, the first Query returns sentence four, while the second returns sentences one, two, and four. Note that in MultiPhraseQuery, multiple terms at the same position are added together as an array.
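Applied to the question, the same "several terms at one position" idea gives a phrase-prefix match: expand the incomplete last word into every indexed term that starts with it, and accept any of them at the last position. The plain-Java sketch below models this outside Lucene. The sorted set stands in for the index's term dictionary, the extra term clinical is hypothetical, and matching is exact (slop 0), unlike the slop of 2 in the question. In real Lucene code the expanded terms would have to be read from the index (for example with a TermsEnum) and passed to MultiPhraseQuery.Builder.add(Term[]); that wiring is an assumption and is not shown here. The class name PhrasePrefixSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

// Illustration only, not Lucene code: a phrase whose last word is treated as a prefix.
final class PhrasePrefixSketch {

    // All dictionary terms starting with prefix; in a sorted set they form one contiguous run.
    static Set<String> expandPrefix(NavigableSet<String> dictionary, String prefix) {
        Set<String> result = new TreeSet<>();
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the run of matching terms
            }
            result.add(term);
        }
        return result;
    }

    // One set of alternatives per position: exact terms first, the expanded prefix last.
    static List<Set<String>> toMultiPhrase(NavigableSet<String> dictionary, List<String> queryTokens) {
        List<Set<String>> positions = new ArrayList<>();
        for (int i = 0; i < queryTokens.size(); i++) {
            String token = queryTokens.get(i);
            boolean last = i == queryTokens.size() - 1;
            positions.add(last ? expandPrefix(dictionary, token) : Collections.singleton(token));
        }
        return positions;
    }

    // Exact (slop 0) match: some start offset where each token is one of that position's alternatives.
    static boolean matches(List<String> docTokens, List<Set<String>> positions) {
        if (positions.isEmpty()) {
            return false;
        }
        for (int start = 0; start + positions.size() <= docTokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < positions.size() && ok; i++) {
                ok = positions.get(i).contains(docTokens.get(start + i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary =
                new TreeSet<>(Arrays.asList("and", "clinic", "clinical", "pharmacy", "tom"));
        List<String> doc = Arrays.asList("tom", "clinic", "and", "pharmacy");
        List<Set<String>> query = toMultiPhrase(dictionary, Arrays.asList("tom", "clini"));
        System.out.println(query);               // [[tom], [clinic, clinical]]
        System.out.println(matches(doc, query)); // true
    }
}
```

One design caveat: a very short prefix such as "c" can expand to a large number of terms, so it is reasonable to require a minimum number of typed characters before expanding.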

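However, few applications work with Lucene directly; it is more common to use Solr or Elasticsearch. Both use Lucene under the hood but wrap it in a much friendlier package, so they may be worth a look.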

Regarding "java - Lucene phrase query with incomplete words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49842537/
