ElasticSearch: map collapsed results / perform operations on grouped documents


There is a list of conversations, and each conversation has a list of messages. Every message has different fields plus an action field. Consider that in the first message of a conversation action A was used, a few messages later action A.1 was used, a while after that A.1.1, and so on (there is a list of chatbot intents).

Grouping the message actions of one conversation would look something like: A > A > A > A.1 > A > A.1 > A.1.1 ...

The problem:

I need to create a report with ElasticSearch that returns the actions group of every conversation; next, I need to group the similar actions groups and add a count; the final result would be a Map<actionsGroup, count>, e.g. 'A > A.1 > A > A.1 > A.1.1', 3.

When building the actions group I need to eliminate the consecutive duplicates in each group: instead of A > A > A > A.1 > A > A.1 > A.1.1 I need A > A.1 > A > A.1 > A.1.1.

The step I started with:

{
  "collapse": {
    "field": "context.conversationId",
    "inner_hits": {
      "name": "logs",
      "size": 10000,
      "sort": [
        {
          "@timestamp": "asc"
        }
      ]
    }
  },
  "aggs": {
  }
}

What I need next:
  • I need to map the collapsed results into a single result per conversation, e.g. A > A.1 > A > A.1 > A.1.1. I have seen cases where an aggregation can run a script over the results and build the list of actions I need, but the aggregation operates on all messages, not only on the messages grouped by my collapse. Is it possible to use an aggregation inside the collapse, or is there a similar solution?
  • I need to group all of the collapsed result values (A > A.1 > A > A.1 > A.1.1), add a count, and get the Map<actionsGroup, count>.

  • Or:
  • group the conversation messages by the conversationId field using an aggregation (I don't know how to do that),
  • use a script to iterate over all values and create the actions group for each conversation (not sure whether this is possible),
  • use another aggregation to group all the values and return the Map<actionsGroup, count>; a sketch of this alternative follows below.
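
For reference, a minimal sketch of that alternative, assuming the mapping shown in Update 1 below (context.conversationId and context.action as keyword fields): a terms aggregation groups the messages per conversation, and a top_hits sub-aggregation returns each conversation's actions in time order. The aggregation names (byConversation, orderedMessages) and the sizes are placeholders, not something taken from the question.

{
  "size": 0,
  "aggs": {
    "byConversation": {
      "terms": {
        "field": "context.conversationId",
        "size": 10000
      },
      "aggs": {
        "orderedMessages": {
          "top_hits": {
            "size": 100,
            "sort": [
              { "@timestamp": "asc" }
            ],
            "_source": [ "context.action" ]
          }
        }
      }
    }
  }
}

With this approach, removing the consecutive duplicates and counting the resulting paths into the Map<actionsGroup, count> still has to happen on the client side, which is what the answer below avoids.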

  • Update 2: I managed to get a partial result, but there is still one issue left; please see here what I still need to fix.

Update 1: Adding some extra details

The mapping:
"mappings": {
  "properties": {
    "@timestamp": {
      "type": "date",
      "format": "epoch_millis"
    },
    "context": {
      "properties": {
        "action": {
          "type": "keyword"
        },
        "conversationId": {
          "type": "keyword"
        }
      }
    }
  }
}

Sample conversation documents:
Conversation 1.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "A",
    "conversationId": "conv_id1"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "A.1",
    "conversationId": "conv_id1"
  }
},
{
  "@timestamp": 1579632745002,
  "context": {
    "action": "A.1.1",
    "conversationId": "conv_id1"
  }
}

Conversation 2.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "A",
    "conversationId": "conv_id2"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "A.1",
    "conversationId": "conv_id2"
  }
},
{
  "@timestamp": 1579632745002,
  "context": {
    "action": "A.1.1",
    "conversationId": "conv_id2"
  }
}

Conversation 3.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "B",
    "conversationId": "conv_id3"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "B.1",
    "conversationId": "conv_id3"
  }
}
Expected result:
{
  "A -> A.1 -> A.1.1": 2,
  "B -> B.1": 1
}
Something similar to this, or any other format, would be fine.

Since I am new to Elasticsearch, every hint is more than welcome.

Best Answer

I solved it using Elastic's scripted_metric aggregation. Also, the index was changed from its initial state.

The query:

{
  "size": 0,
  "aggs": {
    "intentPathsCountAgg": {
      "scripted_metric": {
        "init_script": "state.messagesList = new ArrayList();",
        "map_script": "long currentMessageTime = doc['messageReceivedEvent.context.timestamp'].value.millis; Map currentMessage = ['conversationId': doc['messageReceivedEvent.context.conversationId.keyword'], 'time': currentMessageTime, 'intentsPath': doc['brainQueryRequestEvent.brainQueryRequest.user_data.intentsHistoryPath.keyword'].value]; state.messagesList.add(currentMessage);",
        "combine_script": "return state",
        "reduce_script": "List messages = new ArrayList(); Map conversationsMap = new HashMap(); Map intentsMap = new HashMap(); String[] ifElseWorkaround = new String[1]; for (state in states) { messages.addAll(state.messagesList);} messages.stream().forEach((message) -> { Map existingMessage = conversationsMap.get(message.conversationId); if(existingMessage == null || message.time > existingMessage.time) { conversationsMap.put(message.conversationId, ['time': message.time, 'intentsPath': message.intentsPath]); } else { ifElseWorkaround[0] = ''; } }); conversationsMap.entrySet().forEach(conversation -> { if (intentsMap.containsKey(conversation.getValue().intentsPath)) { long intentsCount = intentsMap.get(conversation.getValue().intentsPath) + 1; intentsMap.put(conversation.getValue().intentsPath, intentsCount); } else {intentsMap.put(conversation.getValue().intentsPath, 1L);} }); return intentsMap.entrySet().stream().map(intentPath -> [intentPath.getKey().toString(): intentPath.getValue()]).collect(Collectors.toSet())"
      }
    }
  }
}

The scripts formatted for better readability (as .ts template strings):
scripted_metric: {
  // one list of collected messages per shard
  init_script: 'state.messagesList = new ArrayList();',
  // for every document, record its conversation id, timestamp and indexed intents path
  map_script: `
    long currentMessageTime = doc['messageReceivedEvent.context.timestamp'].value.millis;
    Map currentMessage = [
      'conversationId': doc['messageReceivedEvent.context.conversationId.keyword'],
      'time': currentMessageTime,
      'intentsPath': doc['brainQueryRequestEvent.brainQueryRequest.user_data.intentsHistoryPath.keyword'].value
    ];
    state.messagesList.add(currentMessage);`,
  combine_script: 'return state',
  reduce_script: `
    List messages = new ArrayList();
    Map conversationsMap = new HashMap();
    Map intentsMap = new HashMap();
    boolean[] ifElseWorkaround = new boolean[1];

    // merge the per-shard message lists
    for (state in states) {
      messages.addAll(state.messagesList);
    }

    // keep only the latest message of each conversation; its intentsPath is used as the conversation's path
    messages.stream().forEach(message -> {
      Map existingMessage = conversationsMap.get(message.conversationId);
      if (existingMessage == null || message.time > existingMessage.time) {
        conversationsMap.put(message.conversationId, ['time': message.time, 'intentsPath': message.intentsPath]);
      } else {
        ifElseWorkaround[0] = true;
      }
    });

    // count how many conversations share the same intents path
    conversationsMap.entrySet().forEach(conversation -> {
      if (intentsMap.containsKey(conversation.getValue().intentsPath)) {
        long intentsCount = intentsMap.get(conversation.getValue().intentsPath) + 1;
        intentsMap.put(conversation.getValue().intentsPath, intentsCount);
      } else {
        intentsMap.put(conversation.getValue().intentsPath, 1L);
      }
    });

    // return one entry per distinct intents path with its count
    return intentsMap.entrySet().stream().map(intentPath -> [
      'path': intentPath.getKey().toString(),
      'count': intentPath.getValue()
    ]).collect(Collectors.toSet())`
}
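
The reduce script above counts intentsHistoryPath values that are already indexed per message, so the removal of consecutive duplicates described in the question is presumably handled when that field is built. If the individual context.action values were collected per conversation instead, the consecutive duplicates could be removed with a small loop; a sketch in Painless, where actions is a hypothetical, time-ordered list of action strings for one conversation (not a variable that exists in the query above):

// 'actions' is assumed to be a time-ordered List of action strings for one conversation
List deduped = new ArrayList();
for (action in actions) {
  // keep an action only if it differs from the immediately preceding one
  if (deduped.isEmpty() || !deduped.get(deduped.size() - 1).equals(action)) {
    deduped.add(action);
  }
}
// join into the path format of the expected result, e.g. "A -> A.1 -> A.1.1"
String path = String.join(" -> ", deduped);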

The response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "intentPathsCountAgg": {
      "value": [
        {
          "smallTalk.greet -> smallTalk.greet2 -> smallTalk.greet3": 2
        },
        {
          "smallTalk.greet -> smallTalk.greet2 -> smallTalk.greet3 -> smallTalk.greet4": 1
        },
        {
          "smallTalk.greet -> smallTalk.greet2": 1
        }
      ]
    }
  }
}

A similar question about ElasticSearch collapse / performing operations on grouped documents can be found on Stack Overflow: https://stackoverflow.com/questions/60662222/
