
solr - Fuzzy search in Solr

Repost. Author: 行者123. Updated: 2023-12-04 17:50:50

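I am using Solr to run fuzzy queries against a data repository that may contain misspelled or abbreviated words. For example, a name in the repository may contain the word "Hlth" (an abbreviation of "Health").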

  • If I run a fuzzy search for Name:'Health'~0.35, I get results containing 'Health' but not 'Hlth'.
  • If I run a fuzzy search for Name:'Hlth'~0.35, I get records whose names contain either "Health" or "Hlth".

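I want the first query to work. In my business use case, I will have to query with clean data and find all the misspelled or abbreviated variants.

Can someone explain why fuzzy search #1 does not work, and whether there is another way to achieve this?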

    Best answer

    You are using the fuzzy query the wrong way.

    According to Mike McCandless (http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html):

    FuzzyQuery matches terms "close" to a specified base term: you specify an allowed maximum edit distance, and any terms within that edit distance from the base term (and, then, the docs containing those terms) are matched.

    The QueryParser syntax is term~ or term~N, where N is the maximum allowed number of edits (for older releases N was a confusing float between 0.0 and 1.0, which translates to an equivalent max edit distance through a tricky formula).

    FuzzyQuery is great for matching proper names: I can search for mcandless~1 and it will match mccandless (insert c), mcandles (remove s), mkandless (replace c with k) and a great many other "close" terms. With max edit distance 2 you can have up to 2 insertions, deletions or substitutions. The score for each match is based on the edit distance of that term; so an exact match is scored highest; edit distance 1, lower; etc.
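To make the quoted notion of "edit distance" concrete, here is a minimal sketch in plain Python. It is not Lucene's implementation: it computes the classic Levenshtein distance (insertions, deletions, substitutions) with a simple dynamic-programming table, and the helper names `edit_distance` and `fuzzy_matches` are made up for illustration.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance between a and b.

    Counts single-character insertions, deletions and substitutions.
    Illustrative only; Lucene uses automata rather than this table.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, terms: Iterable[str], max_edits: int) -> List[str]:
    """Return the terms within max_edits of query (hypothetical helper)."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


# The examples from the quote are each one edit away from "mcandless".
for term in ("mccandless", "mcandles", "mkandless"):
    print(term, edit_distance("mcandless", term))

# "Hlth" is two deletions away from "Health" (drop "e" and "a").
print(edit_distance("Health", "Hlth"))
print(fuzzy_matches("Health", ["Health", "Hlth", "Hospital"], max_edits=2))
```

Under this measure "Hlth" is two edits away from "Health", so a maximum edit distance of 2 is enough to cover the abbreviation from the question.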



    So you need to write the query like this: Health~2
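As a rough sketch of how such a query could be sent to Solr over HTTP, the snippet below only builds the request URL and does not contact a server. The field name `Name` comes from the question; the host, port, and core name (`mycore`) are placeholders, and `fuzzy_query_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def fuzzy_query_url(base_url: str, field: str, term: str, max_edits: int = 2) -> str:
    """Build a /select URL for a fuzzy term query such as Name:Health~2.

    base_url is assumed to point at a Solr core or collection,
    e.g. http://localhost:8983/solr/mycore (placeholder values).
    """
    # Lucene's FuzzyQuery supports a maximum edit distance of 2.
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1 or 2")
    params = {"q": f"{field}:{term}~{max_edits}"}
    return f"{base_url}/select?{urlencode(params)}"


url = fuzzy_query_url("http://localhost:8983/solr/mycore", "Name", "Health")
print(url)

# Decode the query string again to confirm what Solr would receive as q.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Keeping the edit count as an integer argument avoids the older float syntax (such as ~0.35) that the quoted post describes as confusing.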

    Regarding solr - Fuzzy search in Solr, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16655933/
