elasticsearch - Elasticsearch fuzzy query: max edits does not work as expected

Reposted. Author: 行者123. Updated: 2023-12-02 23:34:20

I recently added the fuzzy operator and the fuzzy query settings to my search query string, to cover user typos (e.g. "zamestnanost" vs. "zamestnani"):

POST /my_index/_search
{
  "query": {
    "query_string": {
      "query": "+(content:zamestnanost~)",
      "fuzzy_prefix_length": 3,
      "fuzzy_min_sim": 0.5,
      "fuzzy_max_expansions": 50
    }
  }
}

As I understand the fuzzy query settings, fuzzy_min_sim = 0.5 should allow length(query) * 0.5 edits of the original query (6 edits in this case).

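However, it does not even match "closer" words (tokens) such as
  • "zamestnani"
  • "zamestnany"

I have a strange feeling that it still only matches words in the index that are at most 2 edits away from the original query string (which is the default edit count in a fuzzy query).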

I also ran an explain on the query, and the result supports that hypothesis. The _explanation looks like this:

    "_explanation": {
      "value": 0.057083897,
      "description": "sum of:",
      "details": [
        {
          "value": 0.023866946,
          "description": "weight(content:zamestnano^0.8 in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.023866946,
              "description": "score(doc=0,freq=4.0), product of:",
              "details": [
                {
                  "value": 0.66062796,
                  "description": "queryWeight, product of:",
                  "details": [
                    {
                      "value": 0.8,
                      "description": "boost"
                    },
                    {
                      "value": 4.624341,
                      "description": "idf(docFreq=1, maxDocs=75)"
                    },
                    {
                      "value": 0.17857353,
                      "description": "queryNorm"
                    }
                  ]
                },
                {
                  "value": 0.036127664,
                  "description": "fieldWeight in 0, product of:",
                  "details": [
                    {
                      "value": 2,
                      "description": "tf(freq=4.0), with freq of:",
                      "details": [
                        {
                          "value": 4,
                          "description": "termFreq=4.0"
                        }
                      ]
                    },
                    {
                      "value": 4.624341,
                      "description": "idf(docFreq=1, maxDocs=75)"
                    },
                    {
                      "value": 0.00390625,
                      "description": "fieldNorm(doc=0)"
                    }
                  ]
                }
              ]
            }
          ]
        },
        {
          "value": 0.03321695,
          "description": "weight(content:zamestnanos^0.9090909 in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.03321695,
              "description": "score(doc=0,freq=6.0), product of:",
              "details": [
                {
                  "value": 0.7507135,
                  "description": "queryWeight, product of:",
                  "details": [
                    {
                      "value": 0.9090909,
                      "description": "boost"
                    },
                    {
                      "value": 4.624341,
                      "description": "idf(docFreq=1, maxDocs=75)"
                    },
                    {
                      "value": 0.17857353,
                      "description": "queryNorm"
                    }
                  ]
                },
                {
                  "value": 0.044247173,
                  "description": "fieldWeight in 0, product of:",
                  "details": [
                    {
                      "value": 2.4494898,
                      "description": "tf(freq=6.0), with freq of:",
                      "details": [
                        {
                          "value": 6,
                          "description": "termFreq=6.0"
                        }
                      ]
                    },
                    {
                      "value": 4.624341,
                      "description": "idf(docFreq=1, maxDocs=75)"
                    },
                    {
                      "value": 0.00390625,
                      "description": "fieldNorm(doc=0)"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }

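The fuzzy expansion only produced the query terms "zamestnano" and "zamestnanos".

Do I understand the fuzzy query settings correctly? Can you point out my mistake?

Thank you very much for any ideas!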

Best answer

From the documentation:

0.0..1.0

[1.7.0] Deprecated in 1.7.0. Support for similarity will be removed in Elasticsearch 2.0. converted into an edit distance using the formula: length(term) * (1.0 - fuzziness), eg a fuzziness of 0.6 with a term of length 10 would result in an edit distance of 4. Note: in all APIs except for the Fuzzy Like This Query, the maximum allowed edit distance is 2.
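
Applying the quoted formula to the term from the question makes the cap visible. The following is a minimal Python sketch, not Elasticsearch code: the function name `effective_edits` and its `max_edits` parameter are made up for illustration, and truncating the product to an integer is an assumption about how the conversion rounds.

```python
def effective_edits(term: str, similarity: float, max_edits: int = 2) -> int:
    """Convert a 0.0..1.0 similarity into an edit distance, per the quoted formula."""
    # length(term) * (1.0 - fuzziness); truncation to an integer is an assumption
    raw = int(len(term) * (1.0 - similarity))
    # the documentation states that the maximum allowed edit distance is 2
    return min(raw, max_edits)


term = "zamestnanost"  # 12 characters
raw_edits = int(len(term) * (1.0 - 0.5))

print(raw_edits)                    # 6: what the formula alone would give
print(effective_edits(term, 0.5))   # 2: what remains after the cap
```

So the 6 edits expected in the question come out of the formula, but the cap of 2 is applied afterwards.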


The easiest way to double-check this is to use the validate API:

    GET _validate/query?explain&index=my_index
    {
      "query": {
        "query_string": {
          "query": "+(content:zamestnanost~)",
          "fuzzy_prefix_length": 3,
          "fuzzy_min_sim": 0.5,
          "fuzzy_max_expansions": 50
        }
      }
    }

which returns the following:

    "explanations": [
      {
        "index": "test",
        "valid": true,
        "explanation": "+content:zamestnanost~2"
      }
    ]

This shows the actual edit distance that ES will use in the query: zamestnanost~2
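
To see why a limit of 2 excludes the two "closer" words from the question, one can compute the edit distances by hand. The sketch below uses plain Levenshtein distance in Python as an approximation. The `levenshtein` helper is written here for illustration and is not what Elasticsearch runs internally. Lucene may additionally count a swap of two adjacent characters as a single edit, which does not affect these particular words.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


query = "zamestnanost"
for candidate in ["zamestnano", "zamestnanos", "zamestnani", "zamestnany"]:
    print(candidate, levenshtein(query, candidate))
# zamestnano 2
# zamestnanos 1
# zamestnani 3
# zamestnany 3
```

With at most 2 edits allowed, only "zamestnano" and "zamestnanos" are in range, which agrees with the terms that appear in the _explanation output. Both "zamestnani" and "zamestnany" are 3 edits away.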

Source: a similar question on Stack Overflow, "elasticsearch - Elasticsearch fuzzy query: max edits does not work as expected": https://stackoverflow.com/questions/33210244/
