awk - Remove lines containing a huge number of possibilities from bash output

Reposted. Author: 行者123. Updated: 2023-12-04 12:06:02

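I'm trying to filter the lines of a large txt file (about 10 GB) by the prefix of the called number (clear_number), but only when the direction column equals 2.
This is the format of the lines I get from a pipe (the output of a different script):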

caller_number=34234234324, clear_number=982545345435, direction=1, ...
caller_number=83479234234, clear_number=348347384533, direction=2, ...
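Of course, this is just sample data; the real file contains many other columns, but I only want to filter on the clear_number column based on direction, so this is enough.
I want to remove the lines whose clear_number does not start with any prefix from a list. For example, with grep I would do the following: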
grep -vP 'clear_number=(?!(2207891|22034418|22074450|220201677|220240574|220272183|220722988|220723276|220751152|220774457|220794227|220799141|2202000425|2202000939|2202000967)).*direction=2'
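This works well. The only problem is that the number of prefixes I get is sometimes around 10K-50K, which is a lot, and if I try to do it with grep I get grep: regular expression is too large.
Any ideas how to solve this with Bash commands?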
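For the original requirement (the grep -vP command), one way around the regex size limit is to not build a regex at all: keep the prefixes as awk array keys and, for each direction=2 line, look up every leading substring of clear_number. This is a sketch under assumptions not stated in the question: the prefix file has one prefix per line, the lines to filter arrive on stdin, and filter_prefixes is a hypothetical helper name. It keeps every line except direction=2 lines whose clear_number starts with none of the prefixes; unlike the PCRE, it does not treat direction=20 as direction=2, so it is only roughly equivalent.

```shell
# Sketch: awk take on the grep -vP rule without compiling a giant regex.
# Assumptions: one prefix per line in the prefix file ($1), data on stdin.
# filter_prefixes is a hypothetical helper name.
filter_prefixes() {
  awk '
    FILENAME == ARGV[1] {               # records of the prefix file (safer than NR==FNR if it is empty)
      sub(/\r$/, "")                    # tolerate CRLF line endings
      if ($0 != "") {
        pfx[$0]                         # remember the prefix as an array key
        if (length($0) > maxlen) maxlen = length($0)
      }
      next
    }
    /(^|[ ,])direction=2([^0-9]|$)/ && match($0, /clear_number=[0-9]*/) {
      num = substr($0, RSTART + 13, RLENGTH - 13)   # digits after "clear_number="
      # one hash lookup per possible prefix length instead of a loop over every prefix
      for (len = 1; len <= maxlen && len <= length(num); len++)
        if (substr(num, 1, len) in pfx) { print; next }
      next                              # direction=2 and no listed prefix: drop
    }
    { print }                           # every other line is kept
  ' "$1" -
}
```

Usage: your_command | filter_prefixes list.txt, where your_command stands for whatever produces the lines. Per line this does at most one lookup per character of the longest prefix, instead of one comparison per prefix, which matters with 50K prefixes and a 10 GB input.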
Update
For example, suppose I have the following:
caller_number=34234234324,     clear_number=982545345435, direction=1
caller_number=83479234234, clear_number=348347384533, direction=2
caller_number=2342334324, clear_number=5555345435, direction=1
caller_number=034082394234324, clear_number=33335345435, direction=1
caller_number=83479234234, clear_number=348347384533, direction=2
caller_number=83479234234, clear_number=444447384533, direction=2
caller_number=83479234234, clear_number=64237384533, direction=2
and my list.txt contains:
642
3333
534234235
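then it should return only the line
caller_number=83479234234, clear_number=64237384533, direction=2
because its clear_number starts with 642 and its direction is 2. In my case it will run over a text file of more than 10 GB and return at least 100K results.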
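Another update
Sorry, there is one more thing I wasn't clear about: I get the lines from a piped command, so I need to run | awk ... on the output I receive from the previous command.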
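For the updated requirement (print only direction=2 lines whose clear_number starts with a listed prefix, reading from a pipe), another option is to let grep match fixed strings: prepend clear_number= to every entry of list.txt and pass the result to grep -F -f, so each entry is treated as a literal string rather than compiled into one huge regex. This is a swapped-in technique, not the answer below, and a sketch under assumptions: one prefix per line, data on stdin, and filter_with_grep as a hypothetical helper name. It has not been measured at the 50K-prefix, 10 GB scale.

```shell
# Sketch: grep with fixed strings instead of one huge regex.
# Assumptions: one prefix per line in the list file ($1), data on stdin.
# filter_with_grep is a hypothetical helper name.
filter_with_grep() {
  patterns=$(mktemp) || return 1
  # 642 -> "clear_number=642": the digits follow "=" directly, so a
  # fixed-string hit means clear_number starts with that prefix
  tr -d '\r' < "$1" | sed -e '/^$/d' -e 's/^/clear_number=/' > "$patterns"
  # keep direction=2 lines only (not direction=20), then keep lines that contain
  # one of the patterns; -F treats each pattern as a literal string, not a regex
  grep -E '(^|[ ,])direction=2([^0-9]|$)' | grep -F -f "$patterns"
  status=$?
  rm -f "$patterns"
  return "$status"
}
```

Usage: your_command | filter_with_grep list.txt. One caveat: because the match is a plain substring search, a field whose name merely ends in clear_number (say a hypothetical orig_clear_number) would also match.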

Best answer

With your shown samples, please try the following. Since the OP has changed the samples, the code below has been updated accordingly.

your_command | awk '
FNR==NR{
  if($0!="") arr[$0]
  next
}
/(^|[ ,])direction=2([^0-9]|$)/ && match($0,/clear_number=[^,]*/){
  val=substr($0,RSTART+13,RLENGTH-13)
  for(i in arr){
    if(index(val,i)==1){
      print
      next
    }
  }
}
' list.txt -
Explanation: adding a detailed explanation for the above.
your_command | awk '                  ##Starting awk program from here; your_command is a placeholder for the command whose output you pipe in.
FNR==NR{                              ##Checking condition if FNR==NR which will be TRUE when list.txt is being read.
  if($0!="") arr[$0]                  ##Creating arr array with index of current line (one prefix), skipping empty lines.
  next                                ##next will skip all further statements from here.
}
/(^|[ ,])direction=2([^0-9]|$)/ && match($0,/clear_number=[^,]*/){ ##Only for direction=2 lines (wherever that field is), match clear_number till 1st occurrence of comma.
  val=substr($0,RSTART+13,RLENGTH-13) ##Creating val which has the number after "clear_number=" (13 characters).
  for(i in arr){                      ##Traversing through arr here.
    if(index(val,i)==1){              ##Checking if val starts with prefix i; if so, do following.
      print                           ##Printing current line here.
      next                            ##next will skip all further statements from here.
    }
  }
}
' list.txt -                          ##list.txt is read first, then - (stdin), i.e. the piped lines.

A similar question, awk - Remove lines containing a huge number of possibilities from bash output, can be found on Stack Overflow: https://stackoverflow.com/questions/67886163/
