awk - How to use awk with an associative array to count occurrences of specific characters in a file

Reposted. Author: 行者123. Updated: 2023-12-02 04:32:08

I have a file like this:

manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in

I want to count the occurrences of each domain, like this:

Domain Name No of Email
-----------------------
com 1
in 3
uk 1

Best answer

Here is a POSIX-compliant awk solution (it invokes sort from inside the awk program):

awk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array, piping it through sort
    # (sorted by top-level domain).
    for (k in a) print k, a[k] | "sort"
  }
' file
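
If the result should be ordered by count rather than by domain, the same approach works with different `sort` options. The following is a minimal sketch, not part of the original answer: the function name `count_tlds_by_count` is hypothetical, the header lines are left out to keep the output easy to compare, and the sample addresses are illustrative values consistent with the expected counts in the question.

```shell
# Hypothetical helper: reads addresses on stdin and prints "tld<TAB>count"
# lines, highest count first; ties are ordered by top-level domain.
count_tlds_by_count() {
  awk -F. -v OFS='\t' '
    # Skip empty lines (NF is 0 there); count the last dot-separated field.
    NF { a[$NF]++ }

    END {
      # "\t" inside the awk string becomes a literal tab, which sort -t
      # then uses as its field separator.
      cmd = "sort -t \"\t\" -k2,2nr -k1,1"
      for (k in a) print k, a[k] | cmd
      close(cmd)
    }
  '
}

# Usage with illustrative sample addresses:
printf '%s\n' manish@yahoo.com Rajesh.patel@hotmail.in jkl@gmail.uk \
  New123@utu.ac.in qwe@gmail.co.in | count_tlds_by_count
```

Here `-k2,2nr` sorts numerically and in reverse on the count column, and `-k1,1` breaks ties alphabetically.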

If you have GNU awk 4.0 or later, you can do without the external sort, and can even easily control the sort field from inside the gawk program:

gawk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array in sorted order.
    # The special PROCINFO["sorted_in"] variable controls the order in
    # which `for (k in a)` traverses the array; e.g.:
    #  - Sort by top-level domain, ascending:  "@ind_str_asc"
    #  - Sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"]="@ind_str_asc"
    for (k in a) print k, a[k]
  }
' file
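
To illustrate the second traversal order mentioned in the comments, here is a minimal sketch that sorts by occurrence count, descending, using `"@val_num_desc"`. It assumes GNU awk 4.0 or later is installed as `gawk`; the function name `count_tlds_gawk` is hypothetical, and the header lines are omitted for brevity.

```shell
# Hypothetical helper; requires GNU awk (gawk) 4.0 or later.
# Reads addresses on stdin and prints "tld<TAB>count", highest count first.
count_tlds_gawk() {
  gawk -F. -v OFS='\t' '
    # Skip empty lines; count the last dot-separated field.
    NF { a[$NF]++ }

    END {
      # Traverse the array by value, compared numerically, largest first.
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (k in a) print k, a[k]
    }
  '
}

# Usage (reads the input file on stdin):
#   count_tlds_gawk < file
```

No external `sort` process is started here; only the traversal order of the `for (k in a)` loop changes.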

Regarding "awk - How to use awk with an associative array to count occurrences of specific characters in a file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22800531/
