
Elasticsearch: span_near and substring matching

Reposted. Author: 行者123. Updated: 2023-11-29 02:57:14

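I am new to Elasticsearch. I want to implement span_near behavior that ranks exact phrase matches first, then exact term-sequence matches, and after those also handles substring matches.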

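For example:

My documents in the index:

  1. men's cream
  2. men's wrinkle cream
  3. men's advanced wrinkle cream
  4. women's cream
  5. women's wrinkle cream
  6. women's advanced wrinkle cream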

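If I search for "men's cream", I want the results in the same order as shown above. Expected search results:

  1. men's cream --> exact phrase match
  2. men's wrinkle cream --> search term sequence with slop 1
  3. men's advanced wrinkle cream --> search term sequence with slop 2
  4. women's cream --> substring match close to an exact phrase match
  5. women's wrinkle cream --> substring search term sequence with slop 1
  6. women's advanced wrinkle cream --> substring search term sequence with slop 2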
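The expected ordering above can be restated as a sort key: exact term matches before substring matches, and within each tier, fewer intervening words (lower slop) first. The following is a minimal pure-Python sketch of that ranking rule only. It does not call Elasticsearch, and the names `tokenize`, `min_gap`, `rank_key`, and `rank` are hypothetical helpers introduced for illustration.

```python
DOCS = [
    "men's cream",
    "men's wrinkle cream",
    "men's advanced wrinkle cream",
    "women's cream",
    "women's wrinkle cream",
    "women's advanced wrinkle cream",
]


def tokenize(text):
    # Assumption: lowercase and split on whitespace; apostrophes stay in tokens.
    return text.lower().split()


def min_gap(q_terms, d_tokens, matches):
    """Smallest number of extra tokens inside an in-order match, or None."""
    if not q_terms:
        return None
    best = None
    for start, token in enumerate(d_tokens):
        if not matches(q_terms[0], token):
            continue
        pos, ok = start, True
        for term in q_terms[1:]:
            # Take the earliest later token that matches the next query term.
            nxt = next((i for i in range(pos + 1, len(d_tokens))
                        if matches(term, d_tokens[i])), None)
            if nxt is None:
                ok = False
                break
            pos = nxt
        if ok:
            gap = (pos - start + 1) - len(q_terms)
            if best is None or gap < best:
                best = gap
    return best


def rank_key(query, doc):
    """(0, slop) for exact term matches, (1, slop) for substring matches."""
    q, d = tokenize(query), tokenize(doc)
    exact = min_gap(q, d, lambda term, tok: term == tok)
    if exact is not None:
        return (0, exact)
    partial = min_gap(q, d, lambda term, tok: term in tok)
    if partial is not None:
        return (1, partial)
    return None


def rank(query, docs):
    keyed = [(rank_key(query, doc), doc) for doc in docs]
    matched = [kd for kd in keyed if kd[0] is not None]
    return [doc for _, doc in sorted(matched, key=lambda kd: kd[0])]


for doc in rank("men's cream", DOCS):
    print(rank_key("men's cream", doc), doc)
```

Running `rank("men's cream", DOCS)` returns the six documents in the order listed above, which makes the two-level requirement (match tier first, slop second) explicit before trying to express it as a query.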

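I can get the first 3 results using span_near with nested span_term clauses, with slop = 2 and in_order = true.
I cannot get the remaining results, 4 to 6, because span_near with nested span_term clauses does not support wildcards, in this example for "men's cream" or "men cream". Is there any way to achieve this with Elasticsearch?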
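One direction that may be worth exploring for the substring cases: Elasticsearch has a `span_multi` query that wraps a multi-term query (such as `wildcard` or `prefix`) so it can be used as a clause inside `span_near`. The sketch below only builds the request body as a Python dict. The helper name `substring_span_near` is hypothetical, the query has not been run against the index in this question, and whether `span_multi` is available and performs acceptably depends on your Elasticsearch version (leading wildcards in particular can be slow and can expand to many terms).

```python
import json


def substring_span_near(field, phrase, slop=2, in_order=True):
    """Build a span_near body whose clauses match each term as a substring."""
    clauses = [
        # Wrap each wildcard query in span_multi so it can act as a span clause.
        {"span_multi": {"match": {"wildcard": {field: {"value": "*" + term + "*"}}}}}
        for term in phrase.lower().split()
    ]
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


body = {"query": substring_span_near("genre", "men's cream")}
print(json.dumps(body, indent=2))
```

A clause like this could sit in a `bool` `should` next to the exact `span_term` version, so that exact matches satisfy both clauses and tend to score higher than substring-only matches. That combination is an assumption to verify against real scores, not a tested result.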

Update
My index:

{
  "bluray": {
    "settings": {
      "index": {
        "uuid": "4jofvNfuQdqbhfaF2ibyhQ",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "1000199"
        }
      }
    }
  }
}

Mapping:

{
  "bluray": {
    "mappings": {
      "movies": {
        "properties": {
          "genre": {
            "type": "string"
          }
        }
      }
    }
  }
}

I am running the following query:

POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              {
                "span_term": {
                  "genre": "women"
                }
              },
              {
                "span_term": {
                  "genre": "cream"
                }
              }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "custom_boost_factor": {
            "query": {
              "match_phrase": {
                "genre": "women cream"
              }
            },
            "boost_factor": 4.1
          }
        },
        {
          "match": {
            "genre": {
              "query": "women cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}

It gives me the following results:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.011612939,
    "hits": [
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "u9aNkZAoR86uAiW9SX8szQ",
        "_score": 0.011612939,
        "_source": {
          "genre": "men's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "cpTyKrL6TWuJkXvliibVBQ",
        "_score": 0.009290351,
        "_source": {
          "genre": "men's wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "rn_SFvD4QBO6TJQJNuOh5A",
        "_score": 0.009290351,
        "_source": {
          "genre": "men's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "9a31_bRpR2WfWh_4fgsi_g",
        "_score": 0.004618556,
        "_source": {
          "genre": "women's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "q-DoBBl2RsON_qwLRSoh9Q",
        "_score": 0.0036948444,
        "_source": {
          "genre": "women's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "TxzCP8B_Q8epXtIcfgEw3Q",
        "_score": 0.0036948444,
        "_source": {
          "genre": "women's wrinkle cream"
        }
      }
    ]
  }
}

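This is completely wrong. Why does it return the men's products first when I searched for women?

Note: searching for "men's cream" still returns better results, but they do not follow the search term order.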

Best answer

POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              {
                "span_term": {
                  "genre": "women's"
                }
              },
              {
                "span_term": {
                  "genre": "cream"
                }
              }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "match": {
            "genre": {
              "query": "women's cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}
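To reuse this query shape for other search phrases, the request body could be generated rather than written by hand. The sketch below assumes that lowercasing and splitting on whitespace yields the same tokens the index holds for inputs like these. That matters because `span_term` is a term-level clause and is not analyzed, so each term must equal an indexed token exactly. The function name `build_query` is hypothetical.

```python
import json


def build_query(field, phrase, slop=12):
    """Build a bool/should body: ordered span_near plus a match fallback."""
    # Assumption: whitespace split + lowercase mirrors the indexed tokens.
    terms = phrase.lower().split()
    span_near = {
        "span_near": {
            "clauses": [{"span_term": {field: term}} for term in terms],
            "collect_payloads": False,
            "slop": slop,
            "in_order": True,
        }
    }
    match = {
        "match": {
            field: {
                "query": phrase,
                "analyzer": "standard",
                "minimum_should_match": "99%",
            }
        }
    }
    return {"query": {"bool": {"should": [span_near, match]}}}


print(json.dumps(build_query("genre", "women's cream"), indent=2))
```

The printed JSON is intended as the body for `POST /bluray/movies/_search`. For the phrase "women's cream" it has the same clauses as the query above.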

This gives the following output, as you expected:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.7841132,
    "hits": [
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "4",
        "_score": 0.7841132,
        "_source": {
          "genre": "women's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "5",
        "_score": 0.56961054,
        "_source": {
          "genre": "women's wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "6",
        "_score": 0.35892165,
        "_source": {
          "genre": "women's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "genre": "men's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "genre": "men's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "2",
        "_score": 0.11750762,
        "_source": {
          "genre": "men's wrinkle cream"
        }
      }
    ]
  }
}
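A likely reason the change from `women` to `women's` matters: the standard analyzer appears to keep `women's` as a single token, so a `span_term` on `women` finds nothing, while the `match` clause still matches every document through `cream`. With two query terms, a `minimum_should_match` of "99%" rounds down to one required term. The Python sketch below imitates this with a rough regex tokenizer; it approximates the analyzer for illustration and is not the analyzer's actual implementation. All names in it are hypothetical.

```python
import re

DOCS = [
    "men's cream",
    "men's wrinkle cream",
    "men's advanced wrinkle cream",
    "women's cream",
    "women's wrinkle cream",
    "women's advanced wrinkle cream",
]

# Assumption: letters/digits with inner apostrophes form one token.
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def approx_standard_tokens(text):
    return TOKEN.findall(text.lower())


def span_term_hits(term, docs):
    # A term-level clause matches only when the term equals a whole token.
    return [doc for doc in docs if term in approx_standard_tokens(doc)]


def required_terms(optional_count, percent):
    # A percentage minimum_should_match is rounded down.
    return int(optional_count * percent / 100)


print(span_term_hits("women", DOCS))
print(span_term_hits("women's", DOCS))
print(len(span_term_hits("cream", DOCS)), required_terms(2, 99))
```

Under these assumptions, the original query was effectively ranking all six documents on `cream` alone, so the order would depend on field length and per-shard term statistics rather than on `women`.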

Regarding "Elasticsearch: span_near and substring matching", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22970630/
