
awk - How do I match on a column field and group its values together?

Reposted. Author: 行者123. Updated: 2023-12-04 02:26:51

I'm sorting through some files I created with pdfgrep to list the page numbers of matches in some of my PDFs. It produced the following output:

./Buddhism in the Shadow of Brahmanism.pdf:111:      Then, rising from his seat, covering one shoulder with his robe, the king
./Buddhism in the Shadow of Brahmanism.pdf:182:branch who has adopted the yellow robes of Buddhism; he is sur-
./Buddhism in the Shadow of Brahmanism.pdf:229: resolve that his body, his bowl, and his monastic robe (which had been
./Buddhism in the Shadow of Brahmanism.pdf:230:robe. In this way, Mahākāśyapa (or at least his body) is to act as a sort
./Buddhism in the Shadow of Brahmanism.pdf:230:corpse to his disciples and displays to them the Buddha’s robe, and they
./Buddhism in the Shadow of Brahmanism.pdf:230:offer him the robe that the Buddha had confided to him. Only then will
./Introduction to the History of Indian Buddhism.pdf:31:the robes of a Buddhist monk in an effort to convert them, he was Sciequia. For
./Introduction to the History of Indian Buddhism.pdf:54:monks, and in particular on retreat, robes, and chastity, p. 308.—On the life of
./Introduction to the History of Indian Buddhism.pdf:97:are the Kat.hināvadāna, which deals with the bowl, the staff, and the robes of
./Introduction to the History of Indian Buddhism.pdf:111:of a sort of robe.
./Introduction to the History of Indian Buddhism.pdf:112:cover his nakedness, and who rejects all other robes as superfluous.
./Introduction to the History of Indian Buddhism.pdf:127:noon, after having taken his robe and his bowl,
./Introduction to the History of Indian Buddhism.pdf:127:bowl and his robe, he went to the place where the Cāpāla caitya6 was located,

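What I want is to group together the page numbers in the second column that belong to the same file name, so that the output looks like this: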

./Buddhism in the Shadow of Brahmanism.pdf:111, 182, 229, 230
./Introduction to the History of Indian Buddhism.pdf:31, 54, 97, 111, 112, 127

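I tried using awk to extract the first field, and then used those results against the same file to print only the page numbers, so that I could grep the results and append them after the file name later, like this: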

awk -F : '{print $1}' parsing_file | uniq | while read line; do awk -v number="$line" -F : '$1 == "$number" { print $2 }' parsing_file; done 

But that did not work. I suspect the uniq | while read part could be dropped, perhaps by just using an array in awk?
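A likely reason the loop prints nothing: inside an awk program, "$number" is a plain string literal, not the variable passed with -v, so the comparison never succeeds. A minimal sketch of the same loop with the comparison made against the awk variable number; the print_pages wrapper and the sample data are hypothetical and only there to make the snippet runnable:

```shell
# print_pages FILE: hypothetical wrapper around the corrected loop.
print_pages() {
    awk -F : '{print $1}' "$1" | uniq |
    while IFS= read -r line; do
        # number is an awk variable set by -v; compare against it directly,
        # without quotes and without a leading $.
        awk -v number="$line" -F : '$1 == number { print $2 }' "$1"
    done
}

# Made-up sample data in the same file:page:text layout.
sample=$(mktemp)
printf '%s\n' './A.pdf:1:x' './A.pdf:2:y' './B.pdf:7:z' > "$sample"
print_pages "$sample"    # prints 1, 2 and 7 on separate lines
rm -f "$sample"
```

This still rereads the file once per file name and does not remove repeated page numbers, which is why a single awk pass with an array is the better fit.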

I've seen something similar here:

https://unix.stackexchange.com/questions/167280/awk-group-by-and-sum-column-values

However, I don't want to sum the values in the column; I want to group them together.

Thanks

Best answer

With your shown samples, please try the following. Written and tested in GNU awk.

awk -v OFS=":" '
match($0, /^\.\/.*\.pdf:[0-9]+/) {
    value = substr($0, RSTART, RLENGTH)
    split(value, arr, ":")
    if (!seen[arr[1], arr[2]]++) {
        name[arr[1]] = (name[arr[1]] ? name[arr[1]] ", " : "") arr[2]
    }
}
END {
    for (key in name) {
        print key, name[key]
    }
}
' Input_file

Output for your shown samples is as follows:

./Buddhism in the Shadow of Brahmanism.pdf:111, 182, 229, 230
./Introduction to the History of Indian Buddhism.pdf:31, 54, 97, 111, 112, 127

Explanation: a detailed, commented version of the above.

awk -v OFS=":" '                          ## Start the awk program; OFS=":" joins the printed fields with a colon.
match($0, /^\.\/.*\.pdf:[0-9]+/) {        ## Match from the leading ./ through .pdf: and the digits, as in the shown samples.
    value = substr($0, RSTART, RLENGTH)   ## Store the matched substring in value.
    split(value, arr, ":")                ## Split value on : into array arr.
    if (!seen[arr[1], arr[2]]++) {        ## Proceed only the first time this file name and page number pair is seen.
        name[arr[1]] = (name[arr[1]] ? name[arr[1]] ", " : "") arr[2]   ## Append the page number to the entry for this file name, separated by ", ".
    }
}
END {                                     ## Start the END block of the program.
    for (key in name) {                   ## Loop over the name array.
        print key, name[key]              ## Print the file name and its page numbers.
    }
}
' Input_file                              ## Name of the input file.
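To see what match() leaves in RSTART and RLENGTH, here is a small sketch that runs the same regular expression against a single made-up line. The show_match function name and the sample line are assumptions for illustration only:

```shell
# show_match: hypothetical helper; reads lines on stdin and reports the match.
show_match() {
    awk 'match($0, /^\.\/.*\.pdf:[0-9]+/) {
        print RSTART, RLENGTH                 # start position and length of the match
        print substr($0, RSTART, RLENGTH)     # the matched prefix itself
    }'
}

printf '%s\n' './Some Book.pdf:230:robe. In this way: text' | show_match
# prints:
# 1 19
# ./Some Book.pdf:230
```

The match starts at position 1 because the pattern is anchored with ^, and the 19 characters cover the file name, the colon, and the page number, which is exactly what split() then separates.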

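Note: an earlier version of this solution did not handle duplicate page numbers coming from the same passage, so I edited it to handle that case after Ed's answer.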
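One thing to be aware of: awk does not define the order in which for (key in name) visits the keys, so the files may not be printed in input order. A minimal sketch of a variant that keeps first-seen order, assuming no file name contains a colon so that plain -F: splitting is enough. The group_pages function and the order, pages, and n names are made up for illustration and are not part of the answer above:

```shell
# group_pages [FILE...]: hypothetical helper; reads stdin when no file is given.
group_pages() {
    awk -F: '
    # Act only the first time a given file name and page number pair appears.
    !seen[$1, $2]++ {
        if ($1 in pages) {
            pages[$1] = pages[$1] ", " $2   # later page: append after a comma
        } else {
            order[++n] = $1                 # remember first-seen order of file names
            pages[$1] = $2                  # first page for this file name
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i] ":" pages[order[i]]
    }
    ' "$@"
}

printf '%s\n' './Z.pdf:5:a' './A.pdf:1:b' './Z.pdf:5:c' './Z.pdf:9:d' | group_pages
# prints:
# ./Z.pdf:5, 9
# ./A.pdf:1
```

Testing membership with ($1 in pages) rather than the truth value of the stored string also avoids treating a page number of 0 as an empty entry.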

Regarding "awk - How do I match on a column field and group its values together?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/66841703/
