linux - How to parse a file under Linux

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:32:47

I have a file in the following format:

{"report":[{"call_time":"2018-03-31 00:10:13","number":"01232802624","CLI":"7941232455","name":null,"destination":null,"status":"Answered","duration":"27:30"}, {"call_time":"2018-03-31 00:12:21","number":"01233802632","CLI":"7831233003","name":null,"destination":null,"status":"Answered","duration":"7:48"}, {"call_time":"2018-03-31 00:51:16","number":"0123802642","CLI":"7711123367","name":null,"destination":null,"status":"Answered","duration":"0:57"}, {"call_time":"2018-03-31 01:50:33","number":"012342802624","CLI":"7812386544","name":null,"destination":null,"status":"Answered","duration":"9:54"}, {"call_time":"2018-03-31 16:29:38","number":"01232802642","CLI":"7741230002","name":null,"destination":null,"status":"Answered","duration":"0:13"}], "summary":{"Total_Calls":"3,862","Answered_Calls":"3,834","Answered":"3,922:58","Calls_Answered":"99.1%","ACD":"8:00"},"result":1}

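For each "number", I need to discard everything except the last ten entries (ideally the 10 most recent by time) and print the average duration.

The expected output would look something like this: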

2018-03-31 00:10:13 01232802624 27:30
01232802624 Average 27:30

2018-03-31 00:12:21 01233802632 7:48
01233802632 Average 7:48

2018-03-31 00:51:16 0123802642 0:57
2018-03-31 16:29:38 0123802642 0:13
0123802642 Average: 0:30

and so on.

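Any ideas are welcome. I have been trying with sed, grep and awk for hours without success; my code and results are all over the place, and I am struggling to find a solution online.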

Best answer

jq is a powerful tool for processing JSON. It is well documented in the jq Manual.

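jq's support for parsing durations is somewhat lacking, so you may have to use something else for that part. I am also not sure of the exact output format you want, so this is not a complete solution.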
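As one possible "something else" for the durations, here is a minimal Python sketch. It assumes the `duration` values in `report` are `minutes:seconds` strings such as `27:30`; that reading is a guess from the sample data, and the function names are made up for this illustration. It does not handle the thousands separator seen in the summary field (`3,922:58`).

```python
def parse_duration(text):
    """Convert 'M:SS' (or 'H:MM:SS') into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_duration(seconds):
    """Render whole seconds as 'M:SS', the style used in the report."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def average_duration(durations):
    """Average a non-empty list of duration strings, rounded to a whole second."""
    total = sum(parse_duration(d) for d in durations)
    return format_duration(round(total / len(durations)))


print(average_duration(["0:57", "0:13"]))  # prints 0:35
```

Converting to seconds first keeps the arithmetic simple; the result is only formatted back to `M:SS` at the end.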

Here is a jq example that may point you in the right direction:

$ jq '.report | group_by(.number) | .[][-10:] | [.] | map({number: .[0].number, calls: map({call_time: .call_time, duration: .duration})}) | .[]' < data
{
"number": "01232802624",
"calls": [
{
"call_time": "2018-03-31 00:10:13",
"duration": "27:30"
}
]
}
{
"number": "01232802642",
"calls": [
{
"call_time": "2018-03-31 16:29:38",
"duration": "0:13"
}
]
}
{
"number": "01233802632",
"calls": [
{
"call_time": "2018-03-31 00:12:21",
"duration": "7:48"
}
]
}
{
"number": "012342802624",
"calls": [
{
"call_time": "2018-03-31 01:50:33",
"duration": "9:54"
}
]
}
{
"number": "0123802642",
"calls": [
{
"call_time": "2018-03-31 00:51:16",
"duration": "0:57"
}
]
}

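Explanation:

  1. .report: take the report member of the root object
  2. group_by(.number): group the entries by the value of the number key
  3. .[][-10:]: for each group (.[]), keep only the last 10 items ([-10:])
  4. [.]: wrap the group in an array so that the following map has an array of groups to work on
  5. map(...): map the array of groups to an array of objects
  6. .[]: remove the nesting that is no longer needed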
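For readers more comfortable outside jq, the same steps can be sketched in Python. This is only an approximation of the pipeline above, not a translation the answer provides; the function name `last_calls_per_number` is hypothetical, and the sample is three records from the question's file cut down to the fields used.

```python
import json
from itertools import groupby

# Three records taken from the question's file, reduced to the fields used here.
raw = """{"report": [
  {"call_time": "2018-03-31 00:10:13", "number": "01232802624", "duration": "27:30"},
  {"call_time": "2018-03-31 00:12:21", "number": "01233802632", "duration": "7:48"},
  {"call_time": "2018-03-31 00:51:16", "number": "0123802642", "duration": "0:57"}
]}"""


def last_calls_per_number(document, keep=10):
    report = document["report"]                              # .report
    # group_by(.number): sort by the key (Python's sort is stable), then group
    ordered = sorted(report, key=lambda call: call["number"])
    result = []
    for number, group in groupby(ordered, key=lambda call: call["number"]):
        calls = list(group)[-keep:]                          # [-10:]
        result.append({                                      # map({...})
            "number": number,
            "calls": [
                {"call_time": c["call_time"], "duration": c["duration"]}
                for c in calls
            ],
        })
    return result


grouped = last_calls_per_number(json.loads(raw))
print(json.dumps(grouped, indent=2))
```

As with the jq version, the groups come out ordered by the number string, and the slice keeps the last entries in the order they appear in the file.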

Here is another variant, with tab-separated output:

$ jq -r '.report | group_by(.number) | .[][-10:] | map([.number, .call_time, .duration]) | .[], [] | join("\t")' < data
01232802624 2018-03-31 00:10:13 27:30

01232802642 2018-03-31 16:29:38 0:13

01233802632 2018-03-31 00:12:21 7:48

012342802624 2018-03-31 01:50:33 9:54

0123802642 2018-03-31 00:51:16 0:57

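Explanation:

  1. .report: take the report member of the root object
  2. group_by(.number): group the entries by the value of the number key
  3. .[][-10:]: for each group (.[]), keep only the last 10 items ([-10:])
  4. map(...): map each object to an array, one row per call
  5. .[], []: emit each row, followed by one extra empty array that produces a blank line between groups
  6. join("\t"): join the elements of each array with tabs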
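The slice in step 3 takes the last ten entries of each group in the order they are stored, while the question prefers the ten most recent by time. The Python sketch below sorts each group by `call_time` before slicing, assuming timestamps always use the `YYYY-MM-DD HH:MM:SS` form shown in the question, which sorts chronologically as plain text. The function name `latest_rows` and the sample records are illustrative only.

```python
from collections import defaultdict


def latest_rows(report, keep=10):
    """Return tab-separated lines: up to `keep` latest calls per number,
    with an empty line after each group."""
    groups = defaultdict(list)
    for call in report:
        groups[call["number"]].append(call)

    lines = []
    for number in sorted(groups):
        # 'YYYY-MM-DD HH:MM:SS' strings compare in chronological order.
        calls = sorted(groups[number], key=lambda c: c["call_time"])[-keep:]
        for call in calls:
            lines.append("\t".join([number, call["call_time"], call["duration"]]))
        lines.append("")  # blank separator, like the extra [] in the jq variant
    return lines


# Illustrative records, deliberately out of chronological order.
sample = [
    {"number": "0123802642", "call_time": "2018-03-31 16:29:38", "duration": "0:13"},
    {"number": "0123802642", "call_time": "2018-03-31 00:51:16", "duration": "0:57"},
    {"number": "01232802624", "call_time": "2018-03-31 00:10:13", "duration": "27:30"},
]
print("\n".join(latest_rows(sample)))
```

Passing a smaller `keep`, for example `latest_rows(sample, keep=1)`, leaves only the most recent call for each number.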

Regarding "linux - How to parse a file under Linux", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49589589/
