
elasticsearch - How to sort the result set by the order of the matched words

Reposted. Author: 行者123. Updated: 2023-12-02 23:13:31

How can I sort the result set by the order in which the search words match?

My search terms are "Heinz Meyer".

My query returns:

  • Heinz A. Meyer
  • Heinz Meyer GmbH Heizung-Sanitär
  • Heinz Meyer
  • Karl-Heinz Meyer GmbH

But I need the results sorted by the position at which the words match, like this:

  • Heinz Meyer
  • Heinz Meyer GmbH Heizung-Sanitär
  • Heinz A. Meyer
  • Karl-Heinz Meyer GmbH

My query is:
    {
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "name": "heinz*" } },
            { "wildcard": { "name": "meyer*" } }
          ],
          "must_not": [],
          "should": [],
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "latestRevenueStatistics.revenue": {
                      "gte": "0",
                      "lte": "40000000"
                    }
                  }
                },
                {
                  "range": {
                    "latestRevenueStatistics.number_of_employees": {
                      "gte": "0",
                      "lte": "300"
                    }
                  }
                },
                {
                  "term": { "addresses.postal_code_length": 5 }
                }
              ]
            }
          }
        }
      },
      "from": 0,
      "size": 10
    }

Final solution:

    {
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "name": "heinz*" } },
            { "wildcard": { "name": "meyer*" } },
            {
              "span_near": {
                "clauses": [
                  { "span_term": { "name": { "value": "heinz" } } },
                  { "span_term": { "name": { "value": "meyer" } } }
                ],
                "slop": 4,
                "in_order": true
              }
            }
          ],
          "must_not": [],
          "should": [
            {
              "span_first": {
                "match": { "span_term": { "name": "heinz" } },
                "end": 1
              }
            },
            {
              "span_first": {
                "match": { "span_term": { "name": "meyer" } },
                "end": 2
              }
            }
          ],
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "latestRevenueStatistics.revenue": {
                      "gte": "0",
                      "lte": "40000000"
                    }
                  }
                },
                {
                  "range": {
                    "latestRevenueStatistics.number_of_employees": {
                      "gte": "0",
                      "lte": "300"
                    }
                  }
                },
                {
                  "term": { "addresses.postal_code_length": 5 }
                }
              ]
            }
          }
        }
      },
      "from": 0,
      "size": 10
    }

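Best answer

You can combine the Span First, Span Term, and Span Near queries to get this ordering.

To keep things simple, I created a sample index with a single field, name, of type text, and indexed the following documents.

Documents: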

    POST sortindex/_doc/1
    {
      "name": "Heinz A. Meyer"
    }

    POST sortindex/_doc/2
    {
      "name": "Heinz Meyer GmbH Heizung-Sanitär"
    }

    POST sortindex/_doc/3
    {
      "name": "Heinz Meyer"
    }

    POST sortindex/_doc/4
    {
      "name": "Karl-Heinz Meyer GmbH"
    }

Query:

    POST sortindex/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "span_near": {
                "clauses": [
                  {
                    "span_term": {
                      "name": {
                        "value": "heinz"
                      }
                    }
                  },
                  {
                    "span_term": {
                      "name": {
                        "value": "meyer"
                      }
                    }
                  }
                ],
                "slop": 4,
                "in_order": true
              }
            }
          ],
          "should": [
            {
              "span_first": {
                "match": {
                  "span_term": {
                    "name": "heinz"
                  }
                },
                "end": 1
              }
            }
          ]
        }
      }
    }

Notes on the query:

  • span_near is the Span Near query, and each span_term inside it is a Span Term query.
  • "slop": 4 retrieves all documents that contain both heinz and meyer at a distance of <= 4 words.
  • "in_order": true means heinz must always come before meyer.
  • span_first is the Span First query; "end": 1 retrieves documents in which heinz ends at position <= 1, i.e. it is the first word.

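Note that Span Near goes in the must clause, while Span First goes in the should clause. That way, documents that also satisfy the should clause get a higher score than documents that do not.

Inside both of them we search with Span Term, which is nothing more than a term query, but one meant specifically for use inside span queries.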
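To make the position rules concrete, here is a minimal Python sketch that mimics what the two span clauses check for the four sample names. It is only an illustration under stated assumptions: `tokenize` is a rough stand-in for the standard analyzer (lowercase, split on non-word characters), and `tokenize`, `positions`, `span_first`, and `span_near_in_order` are hypothetical helper names, not Elasticsearch APIs. It does not model relevance scoring.

```python
import re


def tokenize(text):
    # Assumption: rough approximation of the standard analyzer
    # (lowercase, split on anything that is not a word character).
    return re.findall(r"\w+", text.lower())


def positions(tokens, term):
    # Zero-based token positions at which `term` occurs.
    return [i for i, tok in enumerate(tokens) if tok == term]


def span_first(tokens, term, end):
    # A single-term span at position p covers [p, p + 1);
    # it qualifies when it ends at or before `end`.
    return any(p + 1 <= end for p in positions(tokens, term))


def span_near_in_order(tokens, first, second, slop):
    # `first` must precede `second`, with at most `slop` tokens in between.
    return any(
        0 <= q - p - 1 <= slop
        for p in positions(tokens, first)
        for q in positions(tokens, second)
    )


names = [
    "Heinz A. Meyer",
    "Heinz Meyer GmbH Heizung-Sanitär",
    "Heinz Meyer",
    "Karl-Heinz Meyer GmbH",
]

results = {}
for name in names:
    tokens = tokenize(name)
    results[name] = (
        span_near_in_order(tokens, "heinz", "meyer", slop=4),  # must clause
        span_first(tokens, "heinz", end=1),                    # should clause
    )
    print(name, results[name])
```

In this sketch all four names satisfy the must clause, but "Karl-Heinz Meyer GmbH" fails the should clause because heinz is its second token, which is consistent with it being ranked last. The order among the other three comes from Elasticsearch's relevance scoring, which the sketch leaves out.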

If you want to learn more about span queries, I recommend reading the official documentation on them.

From the documentation:

    Span queries are low-level positional queries which provide expert control over the order and proximity of the specified terms. These are typically used to implement very specific queries on legal documents or patents.



Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : 0.38327998,
        "hits" : [
          {
            "_index" : "sortindex",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 0.38327998,
            "_source" : {
              "name" : "Heinz Meyer"
            }
          },
          {
            "_index" : "sortindex",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.26893127,
            "_source" : {
              "name" : "Heinz Meyer GmbH Heizung-Sanitär"
            }
          },
          {
            "_index" : "sortindex",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.25940484,
            "_source" : {
              "name" : "Heinz A. Meyer"
            }
          },
          {
            "_index" : "sortindex",
            "_type" : "_doc",
            "_id" : "4",
            "_score" : 0.19908611,
            "_source" : {
              "name" : "Karl-Heinz Meyer GmbH"
            }
          }
        ]
      }
    }

You can go ahead and add the query above to the query you already have.
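As one possible way to do that merge, the Python sketch below assembles a request body that keeps the filters from the question and adds the span clauses from this answer. The helper names `span_term` and `build_query` are hypothetical and exist only for this illustration. Two choices here are assumptions rather than part of the original posts: the filters are passed as a plain list under "filter" instead of a nested bool/must, and the wildcard clauses are left out for brevity.

```python
import json

# Filters copied from the question's original query.
FILTERS = [
    {"range": {"latestRevenueStatistics.revenue": {"gte": "0", "lte": "40000000"}}},
    {"range": {"latestRevenueStatistics.number_of_employees": {"gte": "0", "lte": "300"}}},
    {"term": {"addresses.postal_code_length": 5}},
]


def span_term(field, value):
    # Hypothetical helper: builds one span_term clause.
    return {"span_term": {field: {"value": value}}}


def build_query(field, first, second, slop=4):
    # Hypothetical helper: span_near is required (must),
    # span_first only boosts the score (should).
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "span_near": {
                            "clauses": [
                                span_term(field, first),
                                span_term(field, second),
                            ],
                            "slop": slop,
                            "in_order": True,
                        }
                    }
                ],
                "should": [
                    {"span_first": {"match": span_term(field, first), "end": 1}}
                ],
                "filter": FILTERS,
            }
        },
        "from": 0,
        "size": 10,
    }


body = build_query("name", "heinz", "meyer")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Because span_term matches exact terms, prefix matching such as heinz* is not covered by the span clauses; if you need it, keep the wildcard clauses in must as in the final solution above.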

Hope this helps!

Regarding "elasticsearch - How to sort the result set by the order of the matched words", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57891689/
