
java - In Lucene Spell Checker 5.3.1, how do I get the closest match, no matter how bad it is?

Reposted. Author: 行者123. Updated: 2023-12-01 10:47:11

I have some code that looks like this (snippet):

public List<String> search(String streetNumber, String streetDirection, String streetName) throws ParseException, IOException {

    // try-with-resources closes both the index reader and the spell checker
    try (IndexReader ir = DirectoryReader.open(fsDirectory);
         SpellChecker spellchecker = new SpellChecker(fsDirectory)) {
        // Build the spelling dictionary from the terms of the "text" field
        Dictionary d = new LuceneDictionary(ir, "text");
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        spellchecker.indexDictionary(d, indexWriterConfig, true);

        String text = streetNumber + " " + streetDirection + " " + streetName;
        // Very low accuracy threshold, so that weak matches are still accepted
        String[] suggestions = spellchecker.suggestSimilar(text, MAX_MATCHES, 0.00001F);
        return Arrays.asList(suggestions);
    }
}

I test it with this:

package ctc.api.web.service.impl;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.queryparser.classic.ParseException;
import org.testng.annotations.Test;

public class LuceneIndexServiceImplTest {

    @Test
    public void f() throws ParseException, IOException {

        LuceneIndexServiceImpl t = new LuceneIndexServiceImpl();

        String[] texts = { "123 n main st", "234 s apple st", "345 w orange st" };

        t.addToIndex(Arrays.asList(texts).stream());

        List<String> r;

        r = t.search("123", "n", "moin");
        org.testng.Assert.assertEquals(r.toString(), "[123 n main st]");

        r = t.search("234", "", "opple");
        org.testng.Assert.assertEquals(r.toString(), "[234 s apple st]");

        r = t.search("345", "", "oge ave");
        org.testng.Assert.assertEquals(r.toString(), "[345 w orange st]");

        r = t.search("", "", "geez");
        org.testng.Assert.assertEquals(r.toString(), "[345 w orange st]");
    }
}

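Unfortunately, I can't get the last assertion to pass. Lucene returns an empty result because the match is too poor (only the letters "ge" match). For my application, though, any match is better than no match.

How can I force the Lucene spell checker to return the closest string by edit distance?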

Best answer

This approach is called fuzzy search in Lucene. Quoting the Lucene documentation:

Fuzzy Searches

Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit Distance algorithm. To do a fuzzy search use the tilde, "~", symbol at the end of a Single word Term. For example to search for a term similar in spelling to "roam" use the fuzzy search:

roam~

This search will find terms like foam and roams.

Starting with Lucene 1.9 an additional (optional) parameter can specify the required similarity. The value is between 0 and 1, with a value closer to 1 only terms with a higher similarity will be matched. For example:

roam~0.8

The default that is used if the parameter is not given is 0.5.

There are many solutions for fuzzy search, for example: How to get Lucene Fuzzy Search result's matching terms?
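
One caveat, as far as I know: from Lucene 4.0 onward a fuzzy query accepts at most 2 edits, so on its own it will not return a match "no matter how bad" either. If the set of candidates is small, a plain-Java fallback is to scan every candidate and keep the one with the smallest Levenshtein distance. The sketch below is not part of Lucene's API; the class name `ClosestMatch` and its methods are hypothetical, and it compares single terms (street names) rather than whole addresses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: brute-force closest match by edit distance.
class ClosestMatch {

    /** Levenshtein distance via dynamic programming, keeping only two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the candidate with the smallest edit distance to the query.
     * The first candidate wins ties; returns null only if the list is empty.
     */
    static String closest(String query, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = editDistance(query, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("main", "apple", "orange");
        System.out.println(closest("moin", names));  // main
        System.out.println(closest("opple", names)); // apple
        System.out.println(closest("geez", names));  // main
    }
}
```

Note the last line: by raw edit distance "geez" is closest to "main" (distance 4, versus 5 for "apple" and 6 for "orange"), because shorter candidates need fewer edits. Sharing the letters "ge" with "orange" does not make it the nearest string, so a different scoring, for example one based on shared character n-grams, may be closer to what the question expects.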

Regarding "java - In Lucene Spell Checker 5.3.1, how do I get the closest match, no matter how bad it is?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34101943/
