
elasticsearch - Return documents containing the search term first, then documents matching its synonyms [Elastic]

Repost. Author: 行者123. Updated: 2023-12-03 00:53:50

I think my problem is best explained with an example:

Suppose I have created an index with a synonym analyzer, declaring that "laptop", "phone" and "tablet" are similar words that can be generalized to "mobile":

PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym"
        }
      }
    }
  }
}
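
To make the effect of this analyzer concrete, here is a toy Python model of it. It is only an illustrative sketch, not Elasticsearch code: `SYNONYMS` and `analyze` are hypothetical names, and the model assumes the rule simply replaces each listed token with "mobile" after a whitespace split.

```python
# Toy model of the "synonym" analyzer defined above (assumption: the rule
# "phone, tablet, laptop => mobile" replaces each listed token with "mobile").
SYNONYMS = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def analyze(text):
    """Split on whitespace, then apply the synonym replacement per token."""
    return [SYNONYMS.get(token, token) for token in text.split()]


# All three values end up as the same indexed token, so at search time they
# are indistinguishable on this field.
for value in ("phone", "tablet", "laptop"):
    print(value, "->", analyze(value))
```

Because the same analyzer is applied at index time and at search time, a search for any of the three words is effectively a search for "mobile".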

Now I create some documents:
PUT synonym/synonym/1
{
  "field1": "phone"
}
PUT synonym/synonym/2
{
  "field1": "tablet"
}
PUT synonym/synonym/3
{
  "field1": "laptop"
}

Now, when I run a match query for laptop, tablet or phone, the result is always:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "field1": "tablet"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "field1": "phone"
        }
      },
      {
        "_index": "synonym",
        "_type": "synonym",
        "_id": "3",
        "_score": 0.18232156,
        "_source": {
          "field1": "laptop"
        }
      }
    ]
  }
}

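As you can see, tablet always gets the highest score, even when I search for laptop.

I know that is because I declared them to be similar words.

However, I am trying to figure out how to query so that documents containing the actual search term appear first in the results, ahead of documents that only contain similar words.

It can be done with boosting, but there must be a simpler way.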

Best answer

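Multi-fields to the rescue.
Index field1 in two ways: once with the synonym analyzer and once with the standard analyzer.
You can then use a bool query so that matches on field1 (synonym) and on field1.raw (standard) both add to the score.
Your mapping should therefore look like this: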

PUT synonym
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "phone, tablet, laptop => mobile"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "synonym": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "synonym",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

You can then use the following query:
GET synonym/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "tablet"
          }
        },
        {
          "match": {
            "field1.raw": "tablet"
          }
        }
      ]
    }
  }
}
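
The ranking this query is meant to produce can be sketched with a small Python simulation. This is a simplified model under stated assumptions, not Elasticsearch itself: the helper names are hypothetical, the standard analyzer is approximated by lowercasing and splitting on whitespace, statistics are index-wide (as with dfs_query_then_fetch), and each clause scores a match with the BM25 idf only, which holds here because every document is a single token.

```python
import math

SYNONYMS = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}
DOCS = {"1": "phone", "2": "tablet", "3": "laptop"}


def synonym_tokens(text):
    # Stand-in for the "synonym" analyzer on field1.
    return [SYNONYMS.get(token, token) for token in text.split()]


def standard_tokens(text):
    # Crude stand-in for the standard analyzer on field1.raw (assumption).
    return text.lower().split()


def idf(doc_count, doc_freq):
    # BM25 inverse document frequency.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def match_scores(query, analyze):
    """Score every document for one match clause on one sub-field."""
    indexed = {doc_id: analyze(text) for doc_id, text in DOCS.items()}
    scores = {doc_id: 0.0 for doc_id in DOCS}
    for term in analyze(query):
        matching = [d for d, tokens in indexed.items() if term in tokens]
        for doc_id in matching:
            scores[doc_id] += idf(len(DOCS), len(matching))
    return scores


def bool_should(query):
    """Sum both should clauses and rank documents by total score."""
    syn = match_scores(query, synonym_tokens)
    raw = match_scores(query, standard_tokens)
    totals = {doc_id: syn[doc_id] + raw[doc_id] for doc_id in DOCS}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(bool_should("tablet"))
```

In this model all three documents match the field1 clause equally, while only the document containing the literal search term also matches field1.raw, so it rises to the top.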

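Note: I used search_type=dfs_query_then_fetch. Because you are testing with 3 shards and very few documents, the scores you get are not what they should be: term statistics (document frequencies) are computed per shard. You can use dfs_query_then_fetch while testing, but it is not recommended for production. See also: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch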
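
The per-shard effect can be checked with a little arithmetic. The sketch below assumes the default BM25 similarity, a tf/length factor of 1 (each document holds a single token), and a shard layout that is only inferred from the scores in the question: document 2 alone on one shard, documents 1 and 3 together on another. The names `bm25_idf` and `shards` are hypothetical.

```python
import math


def bm25_idf(doc_count, doc_freq):
    # BM25 inverse document frequency for one term.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumed layout (inferred, not stated in the question).
shards = {"shard_a": ["2"], "shard_b": ["1", "3"], "shard_c": []}

# Every document contains the token "mobile" on field1, so within a shard
# the document frequency equals the shard's document count.
per_shard = {}
for docs in shards.values():
    for doc_id in docs:
        per_shard[doc_id] = bm25_idf(len(docs), len(docs))

# With index-wide statistics all three documents share one score.
global_score = bm25_idf(3, 3)

print(per_shard)     # doc 2 is about 0.28768, docs 1 and 3 about 0.18232
print(global_score)  # about 0.13353
```

Under these assumptions the per-shard values agree with the 0.2876821 and 0.18232156 reported in the question, which is why document 2 ranked first regardless of the search term.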

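This question, "elasticsearch - Return documents containing the search term first, then documents matching its synonyms [Elastic]", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/48090911/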
