elasticsearch - Querying denormalized tree data in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-04 08:26:02

I'm storing tree data in Elasticsearch 7.9 using the structure described below. I'm trying to write a query that returns the top 10 children that have the most children below them.

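Data setup
Given this example tree:
[Image: example tree. A is the root with children B and C; D is a child of B; E is a child of D.]
It is represented in ES by the following documents: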

{ "id": "A", "name": "User A" }
{ "id": "B", "name": "User B", "parents": ["A"], "parent1": "A" }
{ "id": "C", "name": "User C", "parents": ["A"], "parent1": "A" }
{ "id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A" }
{ "id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent3": "A" }
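Every field is mapped as type keyword. The document fields are:
  • "id": the document ID, identical to _id
  • "parents": all ancestors of the document; empty for the root node
  • "parent1": the document's parent
  • "parent2": the document's grandparent
  • "parentN": the ancestor N levels up, up to parent5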
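For reference, here is a minimal Python sketch of how documents in this shape could be derived from a plain child-to-parent map before indexing. The denormalize function and the parent_of map are hypothetical illustrations and are not part of the question. The sketch assumes the map has no cycles and keeps the question's limit of five parentN fields.

```python
from typing import Dict, List, Optional

MAX_LEVELS = 5  # the question stores at most parent1 .. parent5

def denormalize(node_id: str, name: str, parent_of: Dict[str, Optional[str]]) -> dict:
    """Build one document in the question's shape (hypothetical helper)."""
    ancestors: List[str] = []  # nearest first: parent, grandparent, ...
    current = parent_of.get(node_id)
    while current is not None:  # assumes parent_of contains no cycles
        ancestors.append(current)
        current = parent_of.get(current)

    doc = {"id": node_id, "name": name}
    if ancestors:  # the root gets no parent fields at all
        # "parents" holds every ancestor, root first as in the sample data
        doc["parents"] = list(reversed(ancestors))
        for level, ancestor in enumerate(ancestors[:MAX_LEVELS], start=1):
            doc[f"parent{level}"] = ancestor
    return doc

# Hypothetical child -> parent map for the example tree (None marks the root)
parent_of = {"A": None, "B": "A", "C": "A", "D": "B", "E": "D"}
docs = [denormalize(node, f"User {node}", parent_of) for node in parent_of]
for doc in docs:
    print(doc)
```

Keyword doc values are stored sorted, so the order of the "parents" array only affects _source, not queries.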

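Expected result
Starting from User A, I want to find each of A's direct children together with the total count of children below it (all descendants, not only direct ones). B has D and E below it and C has none, so in this example the result would be: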
    User B - 2
    User C - 0
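As a cross-check of these numbers, here is a small Python sketch that ignores Elasticsearch entirely and just walks the parent1 links. expected_result is a hypothetical helper, and the recursion assumes a shallow, acyclic tree.

```python
from collections import defaultdict

# parent1 of each non-root document in the sample data (A is the root)
parent1 = {"B": "A", "C": "A", "D": "B", "E": "D"}

def expected_result(parent1, root_id, top_n=10):
    children = defaultdict(list)  # id -> direct children
    for child, parent in parent1.items():
        children[parent].append(child)

    def subtree_size(node):
        # Number of descendants below node (children, grandchildren, ...)
        return sum(1 + subtree_size(c) for c in children[node])

    # Rank root_id's direct children by descendant count, highest first
    ranked = sorted(children[root_id], key=subtree_size, reverse=True)
    return [(c, subtree_size(c)) for c in ranked[:top_n]]

print(expected_result(parent1, "A"))  # [('B', 2), ('C', 0)]
```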

Test it yourself
    PUT test_index

    PUT test_index/_mapping
    {
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "keyword" },
        "referred_by_sub": { "type": "keyword" },
        "parents": { "type": "keyword" },
        "parent1": { "type": "keyword" },
        "parent2": { "type": "keyword" },
        "parent3": { "type": "keyword" },
        "parent4": { "type": "keyword" },
        "parent5": { "type": "keyword" }
      }
    }

    POST _bulk
    { "index" : { "_index" : "test_index", "_id" : "A" } }
    { "id": "A", "name": "User A" }
    { "index" : { "_index" : "test_index", "_id" : "B" } }
    { "id": "B", "name": "User B", "parents": ["A"], "parent1": "A" }
    { "index" : { "_index" : "test_index", "_id" : "C" } }
    { "id": "C", "name": "User C", "parents": ["A"], "parent1": "A" }
    { "index" : { "_index" : "test_index", "_id" : "D" } }
    { "id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A" }
    { "index" : { "_index" : "test_index", "_id" : "E" } }
    { "id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent3": "A" }
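If the bulk body is generated instead of typed by hand, a small Python sketch can produce the NDJSON format the _bulk endpoint expects: one action line followed by one source line per document, with a trailing newline. to_bulk_body is a hypothetical helper. It only builds the string and does not send it.

```python
import json

INDEX = "test_index"  # index name from the question

docs = [
    {"id": "A", "name": "User A"},
    {"id": "B", "name": "User B", "parents": ["A"], "parent1": "A"},
    {"id": "C", "name": "User C", "parents": ["A"], "parent1": "A"},
    {"id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A"},
    {"id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent3": "A"},
]

def to_bulk_body(index, docs):
    """Return an NDJSON body for POST _bulk (hypothetical helper)."""
    lines = []
    for doc in docs:
        # Action line: index the document under its own id, as in the question
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        # Source line: the document itself, serialized on a single line
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

body = to_bulk_body(INDEX, docs)
print(body, end="")
```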

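Final result, extended from Joe's answer
For anyone who lands here later: I like to post my final result when it differs from the accepted answer. Mine also returns each matching child's document source and wraps the results in an array with a count. Those weren't part of the requirements because I was trying to keep the question as simple as possible.
Hopefully it helps someone.
Query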
    GET test_index/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "should": [
            { "term": { "id": "A" } },
            { "term": { "parents": "A" } }
          ]
        }
      },
      "aggs": {
        "children_counter": {
          "scripted_metric": {
            "init_script": "state.ids_vs_children = [:]; state.root_children = [:]",
            "map_script": """
              def current_id = doc['id'].value;
              if (!state.ids_vs_children.containsKey(current_id)) {
                state.ids_vs_children[current_id] = new ArrayList();
              }

              if (doc['parent1'].contains(params.id)) {
                state.root_children[current_id] = params._source;
              }

              def parents = doc['parents'];
              if (parents.size() > 0) {
                for (def p : parents) {
                  if (!state.ids_vs_children[current_id].contains(p)) {
                    if (!state.ids_vs_children.containsKey(p)) {
                      state.ids_vs_children[p] = new ArrayList();
                    }
                    state.ids_vs_children[p].add(current_id);
                  }
                }
              }
            """,
            "combine_script": """
              def results = [];
              for (def pair : state.ids_vs_children.entrySet()) {
                def uid = pair.getKey();
                if (!state.root_children.containsKey(uid)) {
                  continue;
                }

                def doc_map = [:];
                doc_map["doc"] = state.root_children[uid];
                doc_map["num_children"] = pair.getValue().size();
                results.add(doc_map);
              }

              def final_result = [:];
              final_result['count'] = results.length;
              final_result['results'] = results;
              return final_result;
            """,
            "reduce_script": "return states",
            "params": {
              "id": "A"
            }
          }
        }
      }
    }
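Output
(This output reports 4 hits and "num_children" : 1 for User B rather than 2. That is consistent with document E not having been indexed: as first posted, its source repeated the "parent2" key, and Elasticsearch rejects JSON with duplicate keys.)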
    {
      "took" : 9,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "children_counter" : {
          "value" : [
            {
              "count" : 2,
              "results" : [
                {
                  "num_children" : 1,
                  "doc" : {
                    "parent1" : "A",
                    "name" : "User B",
                    "id" : "B",
                    "parents" : [
                      "A"
                    ]
                  }
                },
                {
                  "num_children" : 0,
                  "doc" : {
                    "parent1" : "A",
                    "name" : "User C",
                    "id" : "C",
                    "parents" : [
                      "A"
                    ]
                  }
                }
              ]
            }
          ]
        }
      }
    }
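The original goal was the top 10 children, and the aggregation does not sort its results. Below is a minimal sketch of doing that sorting on the client from a response shaped like the one above. It assumes a single shard; otherwise each shard's combine_script result would only count the descendants stored on that shard. top_children is a hypothetical helper, and the embedded JSON is a trimmed copy of the output above.

```python
import json

# Trimmed copy of the response above (only the fields used here)
response = json.loads("""
{
  "aggregations": {
    "children_counter": {
      "value": [
        {
          "count": 2,
          "results": [
            {"num_children": 1, "doc": {"id": "B", "name": "User B"}},
            {"num_children": 0, "doc": {"id": "C", "name": "User C"}}
          ]
        }
      ]
    }
  }
}
""")

def top_children(response, n=10):
    # reduce_script is "return states", so "value" is a list with one
    # combine_script result per shard; flatten their "results" arrays.
    shard_results = response["aggregations"]["children_counter"]["value"]
    rows = [row for shard in shard_results for row in shard["results"]]
    rows.sort(key=lambda row: row["num_children"], reverse=True)
    return [(row["doc"]["name"], row["num_children"]) for row in rows[:n]]

print(top_children(response))  # [('User B', 1), ('User C', 0)]
```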

Best answer

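Your denormalized tree already contains everything this calculation needs. However, we need access to other documents' parents while iterating over the children and keeping track of the references, so this is a perfect use case for a scripted metric aggregation: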
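The script needs every matched document at once because a node's count is only known after the documents below it have been visited. Here is a rough plain-Python model of the map_script and combine_script in the request below, assuming a single shard that holds all five documents. map_phase and combine_phase are hypothetical names. setdefault creates a missing entry before appending, so the result does not depend on the order in which documents are visited.

```python
# The matched documents from the question; only "id" and "parents" matter here
docs = [
    {"id": "A"},
    {"id": "B", "parents": ["A"]},
    {"id": "C", "parents": ["A"]},
    {"id": "D", "parents": ["A", "B"]},
    {"id": "E", "parents": ["A", "B", "D"]},
]

def map_phase(docs):
    """Model of map_script: build id -> list of ids found below it."""
    ids_vs_children = {}  # plays the role of state.ids_vs_children
    for doc in docs:
        current_id = doc["id"]
        ids_vs_children.setdefault(current_id, [])
        for p in doc.get("parents", []):
            # Same guard as the script: skip p if it is already recorded below current_id
            if p not in ids_vs_children[current_id]:
                # Create the ancestor's entry if it has not been visited yet
                ids_vs_children.setdefault(p, []).append(current_id)
    return ids_vs_children

def combine_phase(ids_vs_children, exclude_users):
    """Model of combine_script: id -> number of descendants, minus excluded ids."""
    return {
        uid: len(children)
        for uid, children in ids_vs_children.items()
        if uid not in exclude_users
    }

print(combine_phase(map_phase(docs), {"A"}))  # {'B': 2, 'C': 0, 'D': 1, 'E': 0}
```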

    GET test_index/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "should": [
            { "term": { "id": "A" } },
            { "term": { "parents": "A" } }
          ]
        }
      },
      "aggs": {
        "children_counter": {
          "scripted_metric": {
            "init_script": "state.ids_vs_children = [:];",
            "map_script": """
              def current_id = doc['id'].value;
              if (!state.ids_vs_children.containsKey(current_id)) {
                state.ids_vs_children[current_id] = new ArrayList();
              }

              def parents = doc['parents'];
              if (parents.size() > 0) {
                for (def p : parents) {
                  if (!state.ids_vs_children[current_id].contains(p)) {
                    // The ancestor may not have been visited yet on this shard
                    if (!state.ids_vs_children.containsKey(p)) {
                      state.ids_vs_children[p] = new ArrayList();
                    }
                    state.ids_vs_children[p].add(current_id);
                  }
                }
              }
            """,
            "combine_script": """
              def final_map = [:];
              for (def pair : state.ids_vs_children.entrySet()) {
                def uid = pair.getKey();
                if (params.exclude_users != null && params.exclude_users.contains(uid)) {
                  continue;
                }

                final_map[uid] = pair.getValue().size();
              }

              return final_map;
            """,
            "reduce_script": "return states",
            "params": {
              "exclude_users": ["A"]
            }
          }
        }
      }
    }
This yields:
    ...
    "aggregations" : {
      "children_counter" : {
        "value" : [
          {
            "B" : 2,   <--
            "C" : 0,   <--
            "D" : 1,
            "E" : 0
          }
        ]
      }
    }
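Because reduce_script is just "return states", value holds one combine_script map per shard. On a single-shard index (the default for new indices in Elasticsearch 7.x) that is one map, as above. With several shards, each map only counts the descendants stored on that shard, so the maps would need to be added together. A minimal client-side sketch follows; merge_shard_counts and the two-shard split are hypothetical.

```python
from collections import Counter

def merge_shard_counts(states):
    """Add up the per-shard {id: count} maps that "return states" passes through."""
    total = Counter()
    for shard_map in states:
        total.update(shard_map)  # adds counts; ids with a count of 0 are kept
    return dict(total)

# Hypothetical split of the sample documents across two shards:
# shard 1 holds A, B, C, D and shard 2 holds E.
states = [
    {"B": 1, "C": 0, "D": 0},
    {"B": 1, "D": 1, "E": 0},
]
print(merge_shard_counts(states))  # {'B': 2, 'C': 0, 'D': 1, 'E': 0}
```

Summing is valid here because each (ancestor, descendant) pair is recorded only on the shard that stores the descendant.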
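I'd strongly recommend keeping the top-level query so you don't burn CPU, because scripts like this are notoriously resource-intensive.
The top-level query is also what restricts the aggregation to A and its descendants.
Tip: if you don't update these users very often, I'd suggest computing the child counts before indexing. You have to iterate somewhere, so why not do it outside of ES?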
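Following that tip, here is a minimal sketch of computing the count before indexing: each id in a document's "parents" array gains one descendant. The descendant_count field name is a hypothetical choice. It would need a numeric mapping (for example "integer"), and every new or moved node would require updating its ancestors' counts. With the field in place, the top-10 question becomes an ordinary sorted search, shown here as the request body the sketch assumes.

```python
from collections import Counter

docs = [
    {"id": "A", "name": "User A"},
    {"id": "B", "name": "User B", "parents": ["A"], "parent1": "A"},
    {"id": "C", "name": "User C", "parents": ["A"], "parent1": "A"},
    {"id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A"},
    {"id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent3": "A"},
]

def add_descendant_counts(docs, field="descendant_count"):
    """Return copies of docs with a precomputed count of everything below each node."""
    # Each id in a document's "parents" array gains one descendant
    counts = Counter(p for d in docs for p in d.get("parents", []))
    return [{**d, field: counts[d["id"]]} for d in docs]

enriched = add_descendant_counts(docs)

# Request body for the top 10 direct children of A, assuming descendant_count
# has been indexed with a numeric mapping
top10_request = {
    "size": 10,
    "query": {"term": {"parent1": "A"}},
    "sort": [{"descendant_count": "desc"}],
}

for d in enriched:
    print(d["id"], d["descendant_count"])
```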

A similar question about elasticsearch - querying denormalized tree data in Elasticsearch can be found on Stack Overflow: https://stackoverflow.com/questions/65255790/
