loops - Using awk to get all values of the rows that share the same value in one column

Reposted. Author: 行者123. Updated: 2023-12-03 23:58:05

I have a dataset with three columns (test-file.csv):

node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com

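I'd like to extend the header with a count column and rename node to nodes. In addition, all entries should be sorted by the second column (mail). In the count column I want the number of occurrences of each mail value, and in the nodes column all nodes that share the same mail value should be printed (space-separated and sorted alphabetically).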

This is what I'm trying to achieve:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA

I have this awk command:

awk -F"," '
BEGIN{
FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
counts[$3]++; # Increment count of lines.
contact[$2]; # contact
}
END {
# Iterate over all third-column values.
for (x in counts) {
printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
}
}
' test-file.csv | sort --field-separator="," --key=2 -n

But this is my result :-( Only the number of occurrences comes out right.

,dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes
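The empty first column and the header at the bottom come from two details of the script above. First, contact[$2]; only creates an empty array element keyed by the contact name, while the END block looks the contact up by mail (contact[x]), so it always gets an empty string. Second, the header is printed inside the pipeline, so sort reorders it along with the data; --key=2 -n also asks for a numeric sort on a text field. A minimal sketch of just those two fixes, wrapped in a hypothetical shell function count_by_mail and assuming any POSIX awk and sort, still leaves the nodes column as the literal placeholder:

```shell
# Hypothetical wrapper: the question's script with only the contact and header bugs fixed.
count_by_mail() {
  # Print the header outside the pipeline so sort cannot move it.
  printf '%s\n' 'contact,mail,count,nodes'
  awk 'BEGIN { FS = OFS = "," }
       NR > 1 {
         counts[$3]++        # number of lines per mail address
         contact[$3] = $2    # store the contact under the same key (mail) used in END
       }
       END {
         for (x in counts)
           print contact[x], x, counts[x], "nodes"   # nodes column is still a placeholder
       }' "$1" |
  LC_ALL=C sort -t, -k2,2    # plain text sort on the mail field only
}
```

Usage: count_by_mail test-file.csv. Filling in the nodes column is what the answer below adds.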

Any help is appreciated!

Best answer

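You can use this GNU awk script (PROCINFO["sorted_in"] is a gawk extension; other awks treat PROCINFO as an ordinary array, so the output order would be unspecified):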

awk '
BEGIN {
FS = OFS = ","
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
++counts[$3] # Increment count of lines.
name[$3] = $2
map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
# Iterate over all third-column values.
PROCINFO["sorted_in"]="@ind_str_asc";
for (k in counts)
print name[k], k, counts[k], map[k]
}
' test-file.csv

Output:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
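The gawk answer lists each group's nodes in input order, which happens to be alphabetical for this sample, and it depends on PROCINFO["sorted_in"] for the order of the groups. A portable sketch, assuming only POSIX tail, sort and awk and using a hypothetical function name group_by_mail, sorts the rows by node before aggregating (so every node list comes out alphabetical) and sorts the groups by mail afterwards:

```shell
# Hypothetical portable variant: no gawk-only PROCINFO, and node lists are always sorted.
group_by_mail() {
  printf '%s\n' 'contact,mail,count,nodes'
  # Drop the input header and sort the rows by node so each group collects its
  # nodes alphabetically; for (m in count) has no defined order, so the
  # aggregated groups are sorted by mail at the end.
  tail -n +2 "$1" | LC_ALL=C sort -t, -k1,1 |
  awk 'BEGIN { FS = OFS = "," }
       {
         count[$3]++        # occurrences of each mail address
         name[$3] = $2      # contact name for that mail
         # Explicit if/else: add a separating space only after the first node.
         if ($3 in nodes) nodes[$3] = nodes[$3] " " $1
         else             nodes[$3] = $1
       }
       END { for (m in count) print name[m], m, count[m], nodes[m] }' |
  LC_ALL=C sort -t, -k2,2
}
```

Usage: group_by_mail test-file.csv. With the sample data it prints the same four lines as the gawk version; the extra sort only matters when a group's nodes are out of order in the input.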

For "loops - Using awk to get all values of the rows that share the same value in one column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67720710/