
search - Synonyms, storing weights in documents for relevance scoring in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 22:46:44

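Story: given the example documents below, and by extending them, is it possible to get the following rankings?

  • A search for "Cereal" gives this ranking:
      • Cornflakes
      • Rice Krispies
  • A search for "Rice" gives this ranking:
      • Basmati
      • Rice Krispies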

    The documents the searches are run against:
    [{
      "name": "Cornflakes"
    },
    {
      "name": "Basmati"
    },
    {
      "name": "Rice Krispies"
    }]

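    Of course, some of them do not even contain the search term, so one option would be to add an array of synonyms, each with a text value and a weight, to help compute the ranking: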
    [{
      "name": "Cornflakes",
      "synonyms": [
        {"t": "Cereals", "weight": 100},
        {"t": "Sugar", "weight": 100}]
    },
    {
      "name": "Basmati",
      "synonyms": [
        {"t": "Cereals", "weight": 1},
        {"t": "Rice", "weight": 1000}]
    },
    {
      "name": "Rice Krispies",
      "synonyms": [
        {"t": "Cereals", "weight": 10},
        {"t": "Rice", "weight": 1}]
    }]

    Is this the right approach?

    What would the Elasticsearch query look like that takes the weighted synonyms into account?

    Best answer

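    I think "tags" fits this domain better than "synonyms".
    You could use a nested type to store the tags, and a function score query to combine two values: the tags.weight field of the best-matching tag (if any) and the match score on the name field.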

    One such implementation could look like this:

    put test

    put test/tag_doc/_mapping
    {
      "properties": {
        "tags": {
          "type": "nested",
          "properties": {
            "t": {"type": "string"},
            "weight": {"type": "double"}
          }
        }
      }
    }

    put test/tag_doc/_bulk
    { "index" : { "_index" : "test", "_type" : "tag_doc", "_id":1} }
    {"name": "Cornflakes","tags": [{"t": "Cereals", "weight":100},{"t": "Sugar", "weight": 100}]}
    { "index" : { "_index" : "test", "_type" : "tag_doc","_id":2} }
    { "name": "Basmati","tags": [{"t": "Cereals", "weight": 1},{"t": "Rice", "weight": 1000}]}
    { "index" : { "_index" : "test", "_type" : "tag_doc","_id":3} }
    { "name": "Rice Krispies", "tags": [{"t": "Cereals", "weight": 10},{"t": "Rice", "weight": 1}]}
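The bulk request above alternates one action line with one source line per document. As a minimal sketch of how such a body could be generated from a list of documents, assuming plain Python with only the standard library (the helper name `build_bulk_body` is hypothetical and not part of any Elasticsearch client):

```python
import json

# Sample documents from the answer; index and type names mirror the requests above.
DOCS = [
    {"name": "Cornflakes", "tags": [{"t": "Cereals", "weight": 100}, {"t": "Sugar", "weight": 100}]},
    {"name": "Basmati", "tags": [{"t": "Cereals", "weight": 1}, {"t": "Rice", "weight": 1000}]},
    {"name": "Rice Krispies", "tags": [{"t": "Cereals", "weight": 10}, {"t": "Rice", "weight": 1}]},
]


def build_bulk_body(docs, index="test", doc_type="tag_doc"):
    """Hypothetical helper: return a newline-delimited bulk body for the given documents."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        # Action line: tells the bulk API where to index the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(DOCS)
print(bulk_body)
```

Each document must stay on a single line, which is why the sketch serializes with `json.dumps` without an `indent` argument.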


    post test/_search
    {
      "query": {
        "dis_max": {
          "queries": [
            {
              "match": {
                "name": {
                  "query": "cereals",
                  "boost": 100
                }
              }
            },
            {
              "nested": {
                "path": "tags",
                "query": {
                  "function_score": {
                    "functions": [
                      {
                        "field_value_factor": {
                          "field": "tags.weight"
                        }
                      }
                    ],
                    "query": {
                      "match": {
                        "tags.t": "cereals"
                      }
                    },
                    "boost_mode": "replace",
                    "score_mode": "max"
                  }
                },
                "score_mode": "max"
              }
            }
          ]
        }
      }
    }
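The query above hard-codes the search term "cereals" in two places. A small sketch of how the same request body could be built for an arbitrary term, assuming Python dictionaries that are later serialized to JSON (the function name `build_query` and its `name_boost` parameter are illustrative, not part of the original answer):

```python
import json


def build_query(term, name_boost=100):
    """Hypothetical helper: build the dis_max request body from the answer for any term."""
    return {
        "query": {
            "dis_max": {
                "queries": [
                    # Branch 1: plain match on the document name, boosted.
                    {"match": {"name": {"query": term, "boost": name_boost}}},
                    # Branch 2: nested match on the tags, scored by the tag weight.
                    {
                        "nested": {
                            "path": "tags",
                            "query": {
                                "function_score": {
                                    "functions": [
                                        {"field_value_factor": {"field": "tags.weight"}}
                                    ],
                                    "query": {"match": {"tags.t": term}},
                                    "boost_mode": "replace",
                                    "score_mode": "max",
                                }
                            },
                            "score_mode": "max",
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(build_query("rice"), indent=2))
```

Calling `build_query("rice")` yields the body for the second search from the question, with the term substituted in both the name branch and the tag branch.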

    Result:
    "hits": {
      "total": 3,
      "max_score": 100,
      "hits": [
        {
          "_index": "test",
          "_type": "tag_doc",
          "_id": "1",
          "_score": 100,
          "_source": {
            "name": "Cornflakes",
            "tags": [
              {
                "t": "Cereals",
                "weight": 100
              },
              {
                "t": "Sugar",
                "weight": 100
              }
            ]
          }
        },
        {
          "_index": "test",
          "_type": "tag_doc",
          "_id": "3",
          "_score": 10,
          "_source": {
            "name": "Rice Krispies",
            "tags": [
              {
                "t": "Cereals",
                "weight": 10
              },
              {
                "t": "Rice",
                "weight": 1
              }
            ]
          }
        },
        {
          "_index": "test",
          "_type": "tag_doc",
          "_id": "2",
          "_score": 1,
          "_source": {
            "name": "Basmati",
            "tags": [
              {
                "t": "Cereals",
                "weight": 1
              },
              {
                "t": "Rice",
                "weight": 1000
              }
            ]
          }
        }
      ]
    }
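To see why the documents receive the scores 100, 10 and 1, the scoring can be traced by hand. The following is a rough pure-Python model, not Elasticsearch itself, and it rests on two assumptions: terms match on lowercased whole words (so "cereals" matches the tag "Cereals" but not "Cereal"), and a name match is simplified to a fixed score equal to the boost, whereas the real engine multiplies a relevance score by the boost. All function names are hypothetical.

```python
DOCS = [
    {"name": "Cornflakes", "tags": [{"t": "Cereals", "weight": 100}, {"t": "Sugar", "weight": 100}]},
    {"name": "Basmati", "tags": [{"t": "Cereals", "weight": 1}, {"t": "Rice", "weight": 1000}]},
    {"name": "Rice Krispies", "tags": [{"t": "Cereals", "weight": 10}, {"t": "Rice", "weight": 1}]},
]


def tokens(text):
    """Assumed analysis: lowercase and split on whitespace."""
    return text.lower().split()


def tag_score(doc, term):
    """Nested branch: boost_mode 'replace' swaps the match score for tags.weight,
    and the nested score_mode 'max' keeps the best matching tag."""
    weights = [tag["weight"] for tag in doc["tags"] if term.lower() in tokens(tag["t"])]
    return max(weights) if weights else 0.0


def name_score(doc, term, boost=100):
    """Name branch, simplified to a constant: the real score is relevance times boost."""
    return float(boost) if term.lower() in tokens(doc["name"]) else 0.0


def rank(docs, term):
    """dis_max without a tie_breaker: each document takes its best branch score."""
    scored = [(doc["name"], max(name_score(doc, term), tag_score(doc, term))) for doc in docs]
    matched = [pair for pair in scored if pair[1] > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


print(rank(DOCS, "cereals"))  # Cornflakes (100), Rice Krispies (10), Basmati (1)
print([name for name, _ in rank(DOCS, "rice")])
```

For "cereals" no name matches, so every score comes from the tag weight, which reproduces the result above. For "rice" the model ranks Basmati ahead of Rice Krispies, the order the question asks for, although the exact score of the name match in a real cluster will differ from this simplification.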

    This post on "search - Synonyms, storing weights in documents for relevance scoring in Elasticsearch" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/34185332/
