
javascript - Elasticsearch multi-field fuzzy search does not return exact matches first

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 21:18:38

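I am running a fuzzy Elasticsearch query against the "text" and "keywords" fields. I have two documents in Elasticsearch, one whose "text" is "testPhone 5" and another whose "text" is "testPhone 4s". When I run a fuzzy query for "testPhone 5", both documents get exactly the same score. Why does this happen?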

Additional information: I index the documents with the "uax_url_email" tokenizer and the "lowercase" filter.

Here is the query I am running:

{
  query: {
    bool: {
      // match one or the other fuzzy query
      should: [
        {
          fuzzy: {
            text: {
              min_similarity: 0.4,
              value: 'testphone 5',
              prefix_length: 0,
              boost: 5
            }
          }
        },
        {
          fuzzy: {
            keywords: {
              min_similarity: 0.4,
              value: 'testphone 5',
              prefix_length: 0,
              boost: 1
            }
          }
        }
      ]
    }
  },
  sort: [
    '_score'
  ],
  explain: true
}

Here is the result:

{ max_score: 0.47213298,
total: 2,
hits:
[ { _index: 'test',
_shard: 0,
_id: '51fbf95f82e89ae8c300002c',
_node: '0Mtfzbe1RDinU71Ordx-Ag',
_source:
{ next: { id: '51fbf95f82e89ae8c3000027' },
cards: [ '51fbf95f82e89ae8c3000027', [length]: 1 ],
other: false,
_id: '51fbf95f82e89ae8c300002c',
category: '51fbf95f82e89ae8c300002b',
image: 'https://s3.amazonaws.com/sold_category_icons/Smartphones.png',
text: 'testPhone 5',
keywords: [ [length]: 0 ],
__v: 0 },
_type: 'productgroup',
_explanation:
{ details:
[ { details:
[ { details:
[ { details:
[ { details:
[ { value: 3.8888888, description: 'boost' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.17020021,
description: 'queryNorm' },
[length]: 3 ],
value: 0.99999994,
description: 'queryWeight, product of:' },
{ details:
[ { details:
[ { value: 1, description: 'termFreq=1.0' },
[length]: 1 ],
value: 1,
description: 'tf(freq=1.0), with freq of:' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.625,
description: 'fieldNorm(doc=0)' },
[length]: 3 ],
value: 0.944266,
description: 'fieldWeight in 0, product of:' },
[length]: 2 ],
value: 0.94426596,
description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
[length]: 1 ],
value: 0.94426596,
description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
[length]: 1 ],
value: 0.94426596,
description: 'sum of:' },
{ value: 0.5, description: 'coord(1/2)' },
[length]: 2 ],
value: 0.47213298,
description: 'product of:' },
_score: 0.47213298 },
{ _index: 'test',
_shard: 4,
_id: '51fbf95f82e89ae8c300002d',
_node: '0Mtfzbe1RDinU71Ordx-Ag',
_source:
{ next: { id: '51fbf95f82e89ae8c3000027' },
cards: [ '51fbf95f82e89ae8c3000029', [length]: 1 ],
other: false,
_id: '51fbf95f82e89ae8c300002d',
category: '51fbf95f82e89ae8c300002b',
image: 'https://s3.amazonaws.com/sold_category_icons/Smartphones.png',
text: 'testPhone 4s',
keywords: [ 'apple', [length]: 1 ],
__v: 0 },
_type: 'productgroup',
_explanation:
{ details:
[ { details:
[ { details:
[ { details:
[ { details:
[ { value: 3.8888888, description: 'boost' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.17020021,
description: 'queryNorm' },
[length]: 3 ],
value: 0.99999994,
description: 'queryWeight, product of:' },
{ details:
[ { details:
[ { value: 1, description: 'termFreq=1.0' },
[length]: 1 ],
value: 1,
description: 'tf(freq=1.0), with freq of:' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.625,
description: 'fieldNorm(doc=0)' },
[length]: 3 ],
value: 0.944266,
description: 'fieldWeight in 0, product of:' },
[length]: 2 ],
value: 0.94426596,
description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
[length]: 1 ],
value: 0.94426596,
description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
[length]: 1 ],
value: 0.94426596,
description: 'sum of:' },
{ value: 0.5, description: 'coord(1/2)' },
[length]: 2 ],
value: 0.47213298,
description: 'product of:' },
_score: 0.47213298 },
[length]: 2 ] }

Best answer

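Fuzzy queries are not analyzed, but the field is. Your search for testphone 5 with a min_similarity of 0.4 therefore matches the analyzed term testphone, which is present in both documents, and that term is then used to filter and score the results, as the explain output shows: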

description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
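Because both hits matched the same rewritten term, their explain trees contain identical factors, and the identical scores can be reproduced by hand. The sketch below only multiplies the numbers copied from the explain output in the question; the variable names are chosen for illustration, and the result agrees with the reported score only approximately because the printed factors are rounded.

```javascript
// Factors copied from the explain output; both hits report the same values.
const boost = 3.8888888;       // boost applied to the rewritten term "testphone"
const idf = 1.5108256;         // idf(docFreq=2, maxDocs=5)
const queryNorm = 0.17020021;  // queryNorm
const tf = 1;                  // tf(freq=1.0)
const fieldNorm = 0.625;       // fieldNorm(doc=0)
const coord = 0.5;             // coord(1/2): one of the two should clauses matched

const queryWeight = boost * idf * queryNorm;  // explain reports 0.99999994
const fieldWeight = tf * idf * fieldNorm;     // explain reports 0.944266
const score = queryWeight * fieldWeight * coord;

console.log(score.toFixed(4)); // 0.4721
```

Nothing in this product depends on the rest of the document text ("5" versus "4s"), which is why the two hits tie.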

See also @imotov's excellent answer: ElasticSearch's Fuzzy Query

You can use the _analyze API to see exactly how a string will be tokenized:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

http://localhost:9200/prefix_test/_analyze?field=text&text=testphone+5

will return:

{
  "tokens": [
    {
      "token": "testphone",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "5",
      "start_offset": 10,
      "end_offset": 11,
      "type": "<NUM>",
      "position": 2
    }
  ]
}
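The _analyze request above can also be built from code. This is a minimal sketch that assumes the query-string form of the API shown in this answer, which may differ in other Elasticsearch versions. buildAnalyzeUrl and analyzedTokens are hypothetical helper names, and the host, index, and field are the example values from the URL above.

```javascript
// Build the query-string form of the _analyze request shown in this answer.
function buildAnalyzeUrl(host, index, field, text) {
  // URLSearchParams form-encodes the values, so the space becomes "+".
  const params = new URLSearchParams({ field, text });
  return `${host}/${index}/_analyze?${params}`;
}

// Hypothetical helper: fetch the analysis and return only the token strings.
// Needs a reachable cluster and a runtime with a global fetch (Node 18+ or a browser).
async function analyzedTokens(host, index, field, text) {
  const response = await fetch(buildAnalyzeUrl(host, index, field, text));
  const body = await response.json();
  return body.tokens.map((t) => t.token);
}

const url = buildAnalyzeUrl('http://localhost:9200', 'prefix_test', 'text', 'testphone 5');
console.log(url); // http://localhost:9200/prefix_test/_analyze?field=text&text=testphone+5
```

Against the index from this answer, analyzedTokens would be expected to resolve to the two tokens listed in the response above.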

So even if you index the value testphone sammsung, a fuzzy query for "testphone samsunk" will not return anything, whereas a fuzzy query for samsunk alone would.
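Fuzzy matching is based on edit distance between the whole, unanalyzed query string and each indexed term. The sketch below uses a plain Levenshtein distance to show the effect. It is an illustration only, not Lucene's implementation, and how min_similarity translates into an allowed number of edits depends on the version in use.

```javascript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// The query string is compared as one unit against each indexed term.
console.log(levenshtein('testphone 5', 'testphone'));       // 2
console.log(levenshtein('samsunk', 'sammsung'));            // 2
console.log(levenshtein('testphone samsunk', 'testphone')); // 8
```

The query from the question is only two edits away from the shared term testphone, so it matches both documents. The two-word samsunk query is eight edits away from testphone, while samsunk on its own is two edits away from sammsung.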

You may get better results if the field is not analyzed (or if it uses the keyword analyzer).

If you want to analyze a single field in different ways, you can use the multi_field construct.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-multi-field-type.html
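A multi_field mapping for the text field might look like the sketch below. It assumes the legacy multi_field syntax described on the linked page and the productgroup type from the question. The sub-field name untouched is hypothetical, and the tokenizer and filter settings for the analyzed sub-field are left to the index configuration.

```javascript
// Legacy multi_field mapping: the same source value is indexed twice.
// "untouched" is a hypothetical sub-field name chosen for this sketch.
const mapping = {
  productgroup: {
    properties: {
      text: {
        type: 'multi_field',
        fields: {
          // Default sub-field: analyzed as before.
          text: { type: 'string', index: 'analyzed' },
          // Extra sub-field: the whole value is kept as a single term.
          untouched: { type: 'string', index: 'not_analyzed' },
        },
      },
    },
  },
};

// Fuzzy query aimed at the unanalyzed sub-field, reusing the question's parameters.
const query = {
  query: {
    fuzzy: {
      'text.untouched': { value: 'testphone 5', min_similarity: 0.4, prefix_length: 0 },
    },
  },
};

console.log(JSON.stringify({ mapping, query }, null, 2));
```

A not_analyzed sub-field keeps the original casing, so the lowercase query above is still one edit away from the stored value testPhone 5. A keyword analyzer combined with a lowercase filter would remove that difference.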

Regarding "javascript - Elasticsearch multi-field fuzzy search does not return exact matches first", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18024196/
