java - Lucene Phrase query with incomplete words


Reposted by: 行者123. Updated: 2023-12-03 03:22:36


I have implemented a RAMDirectory with StandardAnalyzer and store location data in a Lucene cache. I add the data to Lucene as follows:

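import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LatLonPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableField;

// "place" is the application's own place object
final Document document = new Document();

// StringField: indexed as one untokenized term, so only exact matches work
final IndexableField id = new StringField("placeId", place.getPlaceId(), Field.Store.YES);
// TextField: tokenized by the analyzer into terms with positions (used by phrase queries)
final IndexableField name = new TextField("name", place.getName().toLowerCase(), Field.Store.YES);
// LatLonPoint: indexed for geo queries such as the radius search
final IndexableField location = new LatLonPoint("location", place.getLatitude(), place.getLongitude());
final IndexableField city = new StringField("city", place.getCity(), Field.Store.YES);

document.add(id);
document.add(name);
document.add(location);
document.add(city);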
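Because "name" is a TextField, the analyzer splits it into terms and records each term's position, and phrase queries later compare those positions. The following is a rough, stdlib-only approximation of that step, for illustration only: the class `SimpleTokenizer` is hypothetical, and it just lowercases and splits on non-alphanumeric characters. The real StandardAnalyzer follows Unicode word-break rules, and in Lucene 7.x its default constructor also removes English stop words such as "and" while leaving a gap in the positions, so the remaining positions come out the same as in this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a crude stand-in for analyzer tokenization, for illustration only.
class SimpleTokenizer {

    // Lowercases the text and splits it on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) { // split() yields an empty first element for leading separators
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Maps each term to the 0-based positions at which it occurs.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        List<String> tokens = tokenize(text);
        for (int i = 0; i < tokens.size(); i++) {
            result.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }
        return result;
    }
}
```

For "Tom Clinic and Pharmacy" this gives the terms `tom`, `clinic`, `and`, `pharmacy` at positions 0 to 3. No term `clini` exists in the index, which is why a phrase ending in an incomplete word cannot match.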

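I implemented two methods for searching the data. One finds nearby places within a defined radius, and it works well. The other searches places by name. We also have to implement autocomplete when searching by name.

I implemented search by name as follows: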

QueryParser parser = new QueryParser("name", analyzer);
return parser.createPhraseQuery("name", searchStr, 2);
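`createPhraseQuery("name", searchStr, 2)` builds a phrase query with a slop of 2, so the terms may sit a couple of position moves away from the exact phrase. The sketch below models that check in plain Java under a simplifying assumption: for each combination of term occurrences it subtracts each term's offset in the query from its position in the document, and accepts the match if the spread of those values is at most the slop. This mirrors the idea behind Lucene's sloppy phrase matching but is not its implementation (scoring and repeated terms are ignored); `SloppyPhrase` and `matches` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of sloppy phrase matching (no scoring).
class SloppyPhrase {

    // docPositions: term -> positions in the document; queryTerms: the phrase, in order.
    static boolean matches(Map<String, List<Integer>> docPositions, List<String> queryTerms, int slop) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        return search(docPositions, queryTerms, slop, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every combination of occurrences, tracking min/max of (docPosition - queryOffset).
    private static boolean search(Map<String, List<Integer>> docPositions, List<String> queryTerms,
                                  int slop, int offset, int min, int max) {
        if (offset == queryTerms.size()) {
            return max - min <= slop;
        }
        List<Integer> positions = docPositions.get(queryTerms.get(offset));
        if (positions == null) {
            return false; // a query term that is not indexed can never match, whatever the slop
        }
        for (int pos : positions) {
            int normalized = pos - offset;
            if (search(docPositions, queryTerms, slop, offset + 1,
                       Math.min(min, normalized), Math.max(max, normalized))) {
                return true;
            }
        }
        return false;
    }
}
```

Assuming the name is indexed as `tom`(0), `clinic`(1), `and`(2), `pharmacy`(3), the phrase "tom pharmacy" needs a slop of 2, which is why the query above finds it. "tom clini" fails at any slop because `clini` is not an indexed term.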

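Now I have a place named “Tom Clinic and Pharmacy”.

If I search with the following phrases, I get results:

Tom
Tom Clinic
Tom Pharmacy

That's great, but if the user types “Tom clini” or “Tom pharma”, Lucene returns no results.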

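I tried appending a “*” to the end of searchStr, and I tried passing the phrase to a wildcard query (it works fine for a single word but fails for multiple words).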
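A likely reason the wildcard attempt fails on several words: a WildcardQuery is matched against individual indexed terms, and no single term contains a space. The sketch below illustrates this with a plain-Java stand-in, assuming the whole string "tom clini*" was passed as one wildcard pattern. `WildcardSketch` is a hypothetical name, and it translates `*` and `?` into a regular expression rather than using Lucene's automaton.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration: a wildcard pattern is tested against each indexed term separately.
class WildcardSketch {

    // Converts * (any run of characters) and ? (exactly one character) into a regex;
    // every other character is quoted literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the indexed terms that the whole pattern matches.
    static List<String> matchingTerms(List<String> indexedTerms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        return indexedTerms.stream()
                .filter(term -> pattern.matcher(term).matches())
                .collect(Collectors.toList());
    }
}
```

With the terms `tom`, `clinic`, `and`, `pharmacy`, the pattern `clini*` expands to `clinic`, but `tom clini*` matches nothing, because the space makes it a pattern no single term can satisfy.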

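I'd also like to add a bit of fuzziness so that spelling mistakes are handled. I'm new to Lucene and don't know where to go from here, so please help me however you can!

P.S.: Lucene 7.3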

Best answer

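In cases like this, the best approach is always to look for good resources. I can recommend the following books:

Lucene in Action (old but gold)
Lucene 4 Cookbook (the examples below are taken from this book)

In particular, you may be interested in one or both of the following: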

Fuzzy query

Lucene's fuzzy search implementation is based on Levenshtein distance. It compares two strings and finds the number of single-character changes needed to transform one string into the other. The resulting number indicates how close the two strings are. In a fuzzy search, a threshold number of edits determines whether the two strings match. To trigger a fuzzy match in QueryParser, use the tilde (~) character. QueryParser has a couple of settings to tune this type of query. Here is a code snippet:
queryParser.setFuzzyMinSim(2f);
queryParser.setFuzzyPrefixLength(3);
Query query = queryParser.parse("hump~");
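This example returns the first, second, and fourth sentences of the book's sample data, because the fuzzy match matches hump to humpty: the two words differ by two characters. Here the fuzzy query's minimum similarity is set to 2, which QueryParser interprets as a maximum of two edits, and the prefix length of 3 requires the first three characters to match exactly.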
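The edit-distance check described above can be sketched in plain Java: the textbook dynamic-programming Levenshtein distance combined with a required exact prefix, roughly what `setFuzzyMinSim(2f)` (at most two edits) and `setFuzzyPrefixLength(3)` configure. This is not Lucene's automaton-based implementation, and by default Lucene's FuzzyQuery also counts a swap of two adjacent characters as a single edit, which this sketch does not. `FuzzySketch` and its methods are hypothetical names.

```java
// Hypothetical illustration of fuzzy term matching (not Lucene's implementation).
class FuzzySketch {

    // Classic Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b (two-row DP table).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A term matches if it shares the first prefixLength characters with the query
    // and is at most maxEdits edits away from it.
    static boolean fuzzyMatches(String query, String term, int maxEdits, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        if (!term.regionMatches(0, query, 0, p)) {
            return false;
        }
        return distance(query, term) <= maxEdits;
    }
}
```

With these settings `hump` matches `humpty` (two insertions), while `dumpty` is rejected by the prefix check alone. A misspelling such as `pharmcy` is one edit away from `pharmacy`, so it would also be accepted.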

PhraseQuery and MultiPhraseQuery

A PhraseQuery matches a particular sequence of terms, while a MultiPhraseQuery lets you match multiple terms at the same position. For example, MultiPhraseQuery supports a phrase such as humpty (dumpty OR together), in which it matches humpty at position 0 and dumpty or together at position 1.

How to do it...

Here is a code snippet to demonstrate both Query types:
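import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;
import org.apache.lucene.search.PhraseQuery;

// In recent Lucene versions (including 7.3) both queries are immutable and are created with a Builder.
PhraseQuery.Builder phraseBuilder = new PhraseQuery.Builder();
phraseBuilder.add(new Term("content", "humpty"));   // position 0
phraseBuilder.add(new Term("content", "together")); // position 1
PhraseQuery query = phraseBuilder.build();

MultiPhraseQuery.Builder multiBuilder = new MultiPhraseQuery.Builder();
// position 0: humpty
multiBuilder.add(new Term[] { new Term("content", "humpty") });
// position 1: dumpty OR together
multiBuilder.add(new Term[] { new Term("content", "dumpty"), new Term("content", "together") });
MultiPhraseQuery query2 = multiBuilder.build();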
How it works…

The first Query, PhraseQuery, searches for the phrase humpty together. The second Query, MultiPhraseQuery, searches for the phrase humpty (dumpty OR together). The first Query would return sentence four from our setup, while the second would return sentences one, two, and four. Note that in MultiPhraseQuery, multiple terms at the same position are added as an array.
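Applied to the question, "multiple terms at the same position" suggests one way to handle an incomplete last word: keep the earlier words as exact terms and put every indexed term that starts with the last word as alternatives at the final position. The plain-Java model below is a hypothetical sketch of that matching logic only (`PrefixPhraseSketch` and its methods are illustrative names, and the document is given as its token list). In real Lucene code the alternatives would be passed as a `Term[]` to `MultiPhraseQuery.Builder.add`, after reading them from the field's term dictionary, which is not shown here; a practical version would also cap how many terms a short prefix expands to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a phrase whose last word is treated as a prefix.
class PrefixPhraseSketch {

    // One set of allowed terms per query position: exact words first, then every
    // vocabulary term that starts with the last (possibly incomplete) word.
    static List<Set<String>> expand(List<String> queryWords, Set<String> vocabulary) {
        List<Set<String>> positions = new ArrayList<>();
        if (queryWords.isEmpty()) {
            return positions;
        }
        for (int i = 0; i < queryWords.size() - 1; i++) {
            positions.add(Collections.singleton(queryWords.get(i)));
        }
        String last = queryWords.get(queryWords.size() - 1);
        Set<String> alternatives = new LinkedHashSet<>();
        for (String term : vocabulary) {
            if (term.startsWith(last)) {
                alternatives.add(term);
            }
        }
        positions.add(alternatives);
        return positions;
    }

    // True if each query position maps to a distinct document token from its alternatives,
    // with the spread of (docIndex - queryIndex) at most slop (a simplified sloppy match).
    static boolean matches(List<String> docTokens, List<Set<String>> positions, int slop) {
        if (positions.isEmpty()) {
            return false;
        }
        return search(docTokens, positions, slop, 0, new boolean[docTokens.size()],
                      Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static boolean search(List<String> docTokens, List<Set<String>> positions, int slop,
                                  int k, boolean[] used, int min, int max) {
        if (k == positions.size()) {
            return max - min <= slop;
        }
        for (int i = 0; i < docTokens.size(); i++) {
            if (!used[i] && positions.get(k).contains(docTokens.get(i))) {
                used[i] = true;
                int normalized = i - k;
                boolean found = search(docTokens, positions, slop, k + 1, used,
                                       Math.min(min, normalized), Math.max(max, normalized));
                used[i] = false;
                if (found) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For the tokens `tom clinic and pharmacy`, "tom clini" expands to `tom` + {`clinic`} and matches exactly. "tom pharma" expands to `tom` + {`pharmacy`} and matches once a slop of 2 is allowed, the same slop the question already passes to `createPhraseQuery`.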

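However, not many applications work with Lucene directly; it is more common to use Solr or Elasticsearch. Both use Lucene under the hood but wrap it nicely, so they may be worth a look.

Regarding java - Lucene Phrase query with incomplete words, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49842537/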
