elasticsearch - Count occurrences of object keys, grouped by other fields, in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 23:08:54

I have the following documents in ES:

[
{
"endpoint": "/abc",
"user": "John",
"method": "GET",
"params": {
"param1": 1,
"param2": 2
}
},
{
"endpoint": "/abc",
"user": "John",
"method": "GET",
"params": {
"param1": 4,
"param2": 5,
"param3": 100
}
},
{
"endpoint": "/xyz",
"user": "Jimmy",
"method": "POST",
"params": {
"param1": 99,
"param2": 88,
"param4": 65
}
},
{
"endpoint": "/xyz",
"user": "Jimmy",
"method": "POST",
"params": {
"param1": 4,
"param2": 2,
"param5": 3
}
}
]

I want to run a count aggregation grouped by (endpoint, user, method, param_name), where param_name is a key of the params object. For the documents above, the result would be:
endpoint: /abc, user: John, method: GET, param1: 2 (since param1 is used 2 times by user John on endpoint /abc with method GET)
endpoint: /abc, user: John, method: GET, param2: 2
endpoint: /abc, user: John, method: GET, param3: 1
endpoint: /xyz, user: Jimmy, method: POST, param1: 2
endpoint: /xyz, user: Jimmy, method: POST, param2: 2
endpoint: /xyz, user: Jimmy, method: POST, param4: 1
endpoint: /xyz, user: Jimmy, method: POST, param5: 1
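
To pin down the expected numbers independently of Elasticsearch, here is a minimal plain-Python sketch of the same grouping over the four sample documents. The helper name `count_param_keys` is hypothetical and only illustrates the intended semantics; it is not an Elasticsearch API.

```python
from collections import Counter

def count_param_keys(docs):
    """Count how often each key of `params` occurs per (endpoint, user, method)."""
    counts = Counter()
    for doc in docs:
        group = (doc["endpoint"], doc["user"], doc["method"])
        # Only the key names matter here; the parameter values are ignored.
        for param_name in doc.get("params", {}):
            counts[group + (param_name,)] += 1
    return counts

# The four sample documents from the question.
docs = [
    {"endpoint": "/abc", "user": "John", "method": "GET",
     "params": {"param1": 1, "param2": 2}},
    {"endpoint": "/abc", "user": "John", "method": "GET",
     "params": {"param1": 4, "param2": 5, "param3": 100}},
    {"endpoint": "/xyz", "user": "Jimmy", "method": "POST",
     "params": {"param1": 99, "param2": 88, "param4": 65}},
    {"endpoint": "/xyz", "user": "Jimmy", "method": "POST",
     "params": {"param1": 4, "param2": 2, "param5": 3}},
]

for (endpoint, user, method, param_name), n in sorted(count_param_keys(docs).items()):
    print(f"endpoint: {endpoint}, user: {user}, method: {method}, {param_name}: {n}")
```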

Any help on how to solve this would be greatly appreciated!

Best answer

If your mapping looks like the following (collapsed onto one line for brevity):

{"groups":{"mappings":{"properties":{"endpoint":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"groups":{"type":"nested","properties":{"group_id":{"type":"long"},"parent_group_id":{"type":"long"},"parent_group_title":{"type":"text","term_vector":"with_positions_offsets","fields":{"keyword":{"type":"keyword"}},"analyzer":"my_custom_analyzer"},"title":{"type":"text","term_vector":"with_positions_offsets","fields":{"keyword":{"type":"keyword"}},"analyzer":"my_custom_analyzer"}}},"method":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"params":{"properties":{"param1":{"type":"long"},"param2":{"type":"long"},"param3":{"type":"long"},"param4":{"type":"long"},"param5":{"type":"long"}}},"user":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}

you can chain several terms aggs and add a scripted_metric at the innermost level to collect the per-parameter counts:
GET groups/_search
{
"size": 0,
"aggs": {
"by_endpoint": {
"terms": {
"field": "endpoint.keyword"
},
"aggs": {
"by_user": {
"terms": {
"field": "user.keyword"
},
"aggs": {
"by_method": {
"terms": {
"field": "method.keyword"
},
"aggs": {
"by_params": {
"scripted_metric": {
"init_script": "state.params_map=[:]",
"map_script": """
def param_keys = ['param1', 'param2', 'param3', 'param4', 'param5'];

for (def key : param_keys) {

def param_path = 'params.' + key;
// Skip keys this document lacks; a 'return' here would also drop any later keys.
if (!doc.containsKey(param_path) || doc[param_path].size() == 0) continue;

if (state.params_map.containsKey(key)) {
state.params_map[key] += 1;
} else {
state.params_map[key] = 1;
}
}
""",
"combine_script": "return state",
"reduce_script": "return states"
}
}
}
}
}
}
}
}
}
}
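
The map_script above hard-codes the five parameter names. As a rough sketch, the list could instead be derived from the mapping, assuming the mapping response has the shape shown earlier (index name `groups`, sub-fields under `params.properties`). The function name `param_keys_from_mapping` is hypothetical, and the sample mapping below is trimmed to the relevant fields.

```python
def param_keys_from_mapping(mapping_response, index="groups", field="params"):
    """Return the sorted sub-field names of an object field in an index mapping."""
    properties = mapping_response[index]["mappings"]["properties"]
    # An object field lists its sub-fields under its own "properties" key.
    return sorted(properties.get(field, {}).get("properties", {}))

# Trimmed copy of the mapping from the answer (only the fields used here).
mapping_response = {
    "groups": {"mappings": {"properties": {
        "endpoint": {"type": "text"},
        "params": {"properties": {
            "param1": {"type": "long"},
            "param2": {"type": "long"},
            "param3": {"type": "long"},
            "param4": {"type": "long"},
            "param5": {"type": "long"},
        }},
    }}}
}

param_keys = param_keys_from_mapping(mapping_response)
print(param_keys)
```

The resulting list would still have to reach the script somehow, for example by templating it into the query body on the client side; that step is left out here.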

which yields:
...
{
"key":"/abc",
"doc_count":2,
"by_user":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"John",
"doc_count":2,
"by_method":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"GET",
"doc_count":2,
"by_params":{
"value":[
{
"params_map":{
"param3":1,
"param1":2,
"param2":2
}
}
]
}
}
]
}
}
]
}
}
...

This can easily be post-processed into the csv-ish format above.
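
A minimal Python sketch of that post-processing step follows. It assumes the search response has already been parsed into a dict with the buckets under the standard `aggregations` key, and that `by_params.value` is a list of states as in the excerpt. Because the reduce_script returns the raw list of states, an index with several shards may yield several `params_map` entries per bucket, so the sketch sums them. The function name `flatten_response` is hypothetical.

```python
from collections import Counter

def flatten_response(response):
    """Turn the nested terms buckets into 'endpoint, user, method, param: n' lines."""
    rows = []
    for endpoint in response["aggregations"]["by_endpoint"]["buckets"]:
        for user in endpoint["by_user"]["buckets"]:
            for method in user["by_method"]["buckets"]:
                totals = Counter()
                # One state per shard: add up the per-shard counts.
                for state in method["by_params"]["value"]:
                    totals.update((state or {}).get("params_map", {}))
                for name, count in sorted(totals.items()):
                    rows.append(
                        f"endpoint: {endpoint['key']}, user: {user['key']}, "
                        f"method: {method['key']}, {name}: {count}"
                    )
    return rows

# Response fragment rebuilt from the excerpt above (only the /abc bucket).
response = {"aggregations": {"by_endpoint": {"buckets": [
    {"key": "/abc", "doc_count": 2, "by_user": {"buckets": [
        {"key": "John", "doc_count": 2, "by_method": {"buckets": [
            {"key": "GET", "doc_count": 2, "by_params": {"value": [
                {"params_map": {"param3": 1, "param1": 2, "param2": 2}}
            ]}}
        ]}}
    ]}}
]}}}

for row in flatten_response(response):
    print(row)
```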

A similar question about counting occurrences of object keys, grouped by other fields, in Elasticsearch can be found on Stack Overflow: https://stackoverflow.com/questions/62329210/
