
arrays - Elasticsearch: transform data into an array

Reposted. Author: 行者123. Updated: 2023-12-02 22:34:21

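I want to use ES to calculate user retention:

  • 1. Event logs are written to the default index.
  • 2. Transform them into an intermediate index: entity-centric data, grouped by acc.
  • 3. Use an aggs filter (or adjacency_matrix) to compute the intersection for each day.

  • The problem is step 2: how do I write a good transform?
    Input event log: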
    POST _bulk
    {"index": {"_index": "test.u1"}}
    {"acc":1001, "event":"create", "timestamp":"2020-08-01 09:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1001, "event":"login", "timestamp":"2020-08-01 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1001, "event":"login", "timestamp":"2020-08-02 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1001, "event":"login", "timestamp":"2020-08-03 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1002, "event":"create", "timestamp":"2020-08-01 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1002, "event":"login", "timestamp":"2020-08-02 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1002, "event":"login", "timestamp":"2020-08-02 11:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1003, "event":"create", "timestamp":"2020-08-01 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1004, "event":"create", "timestamp":"2020-08-02 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1004, "event":"login", "timestamp":"2020-08-02 10:00"}
    {"index": {"_index": "test.u1"}}
    {"acc":1004, "event":"login", "timestamp":"2020-08-03 10:00"}
    Expected intermediate index:
    {"acc":1001, "create":"08-01", "login":["08-01", "08-02", "08-03"]}
    {"acc":1002, "create":"08-01", "login":["08-02"]}
    {"acc":1003, "create":"08-01", "login":[]}
    {"acc":1004, "create":"08-02", "login":["08-02", "08-03"]}
    How can I generate the "login" array?
    Any better design is also welcome.
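The grouping asked for here can be sketched outside Elasticsearch to pin down the expected result. The following is a minimal plain-Python illustration, not Elasticsearch code. The names `EVENTS` and `to_entity_docs` are hypothetical, and it assumes that repeated logins on the same day collapse into a single entry, as the expected output for acc 1002 suggests.

```python
# Sample events from the question: (acc, event, timestamp).
EVENTS = [
    (1001, "create", "2020-08-01 09:00"),
    (1001, "login", "2020-08-01 10:00"),
    (1001, "login", "2020-08-02 10:00"),
    (1001, "login", "2020-08-03 10:00"),
    (1002, "create", "2020-08-01 10:00"),
    (1002, "login", "2020-08-02 10:00"),
    (1002, "login", "2020-08-02 11:00"),
    (1003, "create", "2020-08-01 10:00"),
    (1004, "create", "2020-08-02 10:00"),
    (1004, "login", "2020-08-02 10:00"),
    (1004, "login", "2020-08-03 10:00"),
]


def to_entity_docs(events):
    """Group events by account: earliest create day plus distinct login days."""
    docs = {}
    for acc, event, timestamp in events:
        doc = docs.setdefault(acc, {"acc": acc, "create": None, "login": set()})
        # "2020-08-01 09:00" -> "08-01" (year dropped to match the question).
        day = timestamp[5:10]
        if event == "create":
            # Keep the earliest create day if an account has several.
            if doc["create"] is None or day < doc["create"]:
                doc["create"] = day
        elif event == "login":
            doc["login"].add(day)  # the set removes same-day duplicates
    # Emit one document per account, with login days as a sorted list.
    return [
        {"acc": d["acc"], "create": d["create"], "login": sorted(d["login"])}
        for _, d in sorted(docs.items())
    ]


for doc in to_entity_docs(EVENTS):
    print(doc)
```

Each printed dictionary corresponds to one document of the expected intermediate index, including the empty login list for acc 1003.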

    Best answer

    Got it working with aggs.scripted_metric:

    PUT _transform/tr-acc2-ar2
    {
      "source": {
        "index": [
          "mhlog2-*"
        ]
      },
      "pivot": {
        "group_by": {
          "msg.#account_id": {
            "histogram": {
              "field": "msg.#account_id",
              "interval": "1"
            }
          }
        },
        "aggregations": {
          "create": {
            "filter": {
              "term": {
                "msg.#event_name.keyword": "createRole"
              }
            },
            "aggs": {
              "time": {
                "min": {
                  "field": "@timestamp"
                }
              }
            }
          },
          "login": {
            "filter": {
              "term": {
                "msg.#event_name.keyword": "login"
              }
            },
            "aggs": {
              "days": {
                "scripted_metric": {
                  "init_script": "state.days=[:];",
                  "map_script": "state.days[doc['@timestamp'].value.toString('yyyy-MM-dd')]=1; ",
                  "combine_script": "return state",
                  "reduce_script": "def days = [:]; def array =[]; for (s in states) { for (d in s.days.keySet()) { days[d]=1; } } for (d in days.keySet()) { array.add(d);} return array; "
                }
              }
            }
          }
        }
      },
      "dest": {
        "index": "idx.tr.acc2.ar2"
      },
      "sync": {
        "time": {
          "field": "@timestamp",
          "delay": "60s"
        }
      }
    }
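To make the four scripted_metric phases easier to follow, here is a rough plain-Python simulation of what the Painless scripts above appear to do. It is not Elasticsearch code: the function names mirror the script keys, while `run_scripted_metric`, the two-shard split, and the sample timestamps are made up for illustration. Slicing the first 10 characters of an ISO timestamp stands in for `toString('yyyy-MM-dd')`.

```python
def init_script():
    # init_script: state.days = [:]  (one empty map per shard)
    return {"days": {}}


def map_script(state, timestamp):
    # map_script: runs once per matching document on a shard and stores
    # the document's day as a map key, so same-day logins collapse.
    state["days"][timestamp[:10]] = 1


def combine_script(state):
    # combine_script: return state  (the shard's partial result)
    return state


def reduce_script(states):
    # reduce_script: merge the per-shard maps, then return the keys as a list.
    days = {}
    for s in states:
        for d in s["days"]:
            days[d] = 1
    return list(days)


def run_scripted_metric(shards):
    """Hypothetical driver: run init/map/combine per shard, then reduce once."""
    states = []
    for shard_docs in shards:
        state = init_script()
        for timestamp in shard_docs:
            map_script(state, timestamp)
        states.append(combine_script(state))
    return reduce_script(states)


# Assumed example: one account's login timestamps spread over two shards.
shards = [
    ["2020-08-18T09:00:00Z", "2020-08-18T21:00:00Z", "2020-08-19T10:00:00Z"],
    ["2020-08-19T11:00:00Z", "2020-08-20T08:00:00Z"],
]
print(sorted(run_scripted_metric(shards)))
```

The result is sorted only for display. The Painless `[:]` literal creates a hash map, so the order of the array produced by the real reduce script should not be relied on.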
    Generated intermediate index:
    _id : AAAAAAAA
    _index : acc.array
    _score : 0
    _type : _doc
    create.time : Aug 18, 2020 @ 11:17:43.000
    login.days : 2020-08-18T00:00:00.000Z, 2020-08-19T00:00:00.000Z, 2020-08-20T00:00:00.000Z
    msg.#account_id : 12333212323
    Finally, retention on 2020-08-19 for users created on 2020-08-18 is easy to get with a KQL filter:
    create.time: 2020-08-18 AND login.days: 2020-08-19
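The filter above selects accounts whose create day and login days both match. As a hedged illustration of that intersection, the plain-Python sketch below applies the same condition to the intermediate documents expected in the question. The `DOCS` list and the `retention` helper are hypothetical names, and the day strings follow the question's shortened "MM-DD" notation.

```python
# Intermediate documents as expected in the question (one per account).
DOCS = [
    {"acc": 1001, "create": "08-01", "login": ["08-01", "08-02", "08-03"]},
    {"acc": 1002, "create": "08-01", "login": ["08-02"]},
    {"acc": 1003, "create": "08-01", "login": []},
    {"acc": 1004, "create": "08-02", "login": ["08-02", "08-03"]},
]


def retention(docs, create_day, login_day):
    """Return (retained, cohort_size) for accounts created on create_day."""
    # Cohort: accounts created on the given day (create: <create_day>).
    cohort = [d for d in docs if d["create"] == create_day]
    # Retained: cohort members that also logged in on login_day
    # (create: <create_day> AND login: <login_day>).
    retained = [d for d in cohort if login_day in d["login"]]
    return len(retained), len(cohort)


retained, cohort_size = retention(DOCS, "08-01", "08-02")
print(f"{retained} of {cohort_size} accounts created on 08-01 logged in on 08-02")
```

Because every account is a single document with an array of login days, the intersection reduces to a membership check per document, which is what makes the entity-centric design convenient for this kind of query.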

    On this topic, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63752220/
