linux - Excluding columns with awk

Reposted. Author: 太空狗. Updated: 2023-10-29 11:03:43

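I am trying to remove a few columns and then reduce the file contents to unique lines. The columns I want to remove are the month, day, time, and epoch time; they differ on every line, which prevents me from telling which lines are otherwise unique.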

Sample contents of sample.log:

Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun 5 05:13:14 AAA AAA AAAA 1433495594.306612 XXXX CCCC CCCC AAAA SDDDD DFFFFF222
Jun 5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun 5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun 5 05:13:16 AAA AAA AAAA XXXXX 1433495597.306615 XXXX CCCC CCCC AAAA SDDDD DFFFFF333
Jun 5 05:13:17 AAA AAA AAAA XXXXX 1433495598.306616 XXXX CCCC CCCC AAAA SDDDD DFFFFF444

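Problem:

The month, day, and time are in fixed columns, but the epoch time switches between column 7 and column 8. I would like to know how to handle this.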

Sample output:

Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun 5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun 5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

If the above is too much to ask, then something like the following:

AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

I have been trying along the following lines, but it has not helped much.

while read line
do
    seven=$(echo $line |awk '{print $7}')
    eight=$(echo $line |awk '{print $8}')

    if [[ "$seven" =~ "^[0-9]" ]];then
        #echo "seventh column starts with number"
        echo $line|awk '$1=$2=$3=$7=" " {print}'
    else
        #echo "Eighth column starts with number"
        echo $line|awk '$1=$2=$3=$8=" " {print}'
    fi
done < $1
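
One possible reason the loop above misbehaves, offered as a reading of the code rather than a confirmed diagnosis: in Bash 3.2 and later, quoting the right-hand side of `=~` makes it match as a literal string, so `"^[0-9]"` would not act as a regular expression. The sketch below sidesteps the issue with a POSIX `case` pattern. The function names `strip_line` and `strip_file` are hypothetical and not part of the original post.

```shell
# Hypothetical helper: drop month, day, time and the epoch field from one line.
strip_line() {
    line=$1
    # Field 7 decides where the epoch time sits.
    seven=$(printf '%s\n' "$line" | awk '{print $7}')
    case $seven in
        [0-9]*) epoch=7 ;;   # seventh field starts with a digit
        *)      epoch=8 ;;   # otherwise assume the epoch is in field 8
    esac
    # Print every field except 1-3 and the epoch column, joined by one space.
    printf '%s\n' "$line" | awk -v e="$epoch" '{
        out = ""
        for (i = 4; i <= NF; i++)
            if (i != e) out = (out == "" ? $i : out " " $i)
        print out
    }'
}

# Hypothetical driver: apply strip_line to every line of the file named in $1.
strip_file() {
    while IFS= read -r line; do
        strip_line "$line"
    done < "$1"
}
```

Calling `strip_file sample.log | sort -u` would then give the unique stripped lines, although starting two awk processes per line is slow on large files.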

More examples:

Input file contents:

Jun  5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
Jun 5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun 5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
Jun 5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

Output:

Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun 5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE GGGG
Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun 5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF

Output:

 AAA BBB CCC  DDD EEE FFFF
AAA BBB CCC DDD EEE GGGG
AAA BBB CCC XXX DDD EEE GGGG
AAA BBB CCC XXX DDD EEE FFFF

Best answer

If I understand the question correctly, you don't need Bash here, just Awk:

% awk '
{
    for (f = 4; f <= NF; ++f) {                 # Start at column 4
        if (f == 7 || f == 8) {                 # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) {     # Only print if non-numeric
                printf $f " "
            }
        } else {
            printf $f " "
        }
    }
    printf "\n"
}
' sample.log
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF222
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF333
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF444
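
If the epoch field could land in a column other than 7 or 8, a variant of the loop above might test every remaining field against the same pattern. This is only a sketch: it assumes no field you want to keep consists solely of digits, a dot, and more digits, and the function name `drop_epoch` is made up for illustration.

```shell
# Hypothetical wrapper around awk: skip fields 1-3 and any field that looks
# like an epoch timestamp (digits, a dot, digits), wherever it appears.
drop_epoch() {
    awk '{
        out = ""
        for (f = 4; f <= NF; ++f) {
            if ($f ~ /^[0-9]+\.[0-9]+$/) continue   # epoch-like field: skip it
            out = (out == "" ? $f : out " " $f)      # join kept fields with one space
        }
        print out
    }' "$@"
}

# Usage: drop_epoch sample.log
```

Building the line in `out` and printing it once also avoids the trailing space that a per-field `printf` leaves at the end of each line.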

To get the unique lines:

% awk '
{
    for (f = 4; f <= NF; ++f) {                 # Start at column 4
        if (f == 7 || f == 8) {                 # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) {     # Only print if non-numeric
                printf $f " "
            }
        } else {
            printf $f " "
        }
    }
    printf "\n"
}
' sample2.log | sort -u
AAA BBB CCC DDD EEE FFFF
AAA BBB CCC DDD EEE GGGG
AAA BBB CCC XXX DDD EEE FFFF
AAA BBB CCC XXX DDD EEE GGGG
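
Note that `sort -u` reorders the output. If the first-seen order matters, one common awk idiom is to keep a table of the lines already printed. The sketch below assumes the stripped lines arrive on standard input or in the named files; `uniq_in_order` is a hypothetical name.

```shell
# Hypothetical helper: print each distinct line once, in first-seen order.
uniq_in_order() {
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true only for the first occurrence and awk prints that line once.
    awk '!seen[$0]++' "$@"
}

# Usage: some_command | uniq_in_order
```

The trade-off is memory: awk keeps every distinct line in the `seen` array, whereas `sort -u` can spill to disk on very large inputs.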

On handling % signs...

If your input file contains % signs, as per your comment, you need to escape them before passing them to printf. You can do that with a function like this...

% awk '
function escape_percents(s)
{
    gsub("%", "%%", s)
    return s
}

{
    for (f = 4; f <= NF; ++f) {                 # Start at column 4
        if (f == 7 || f == 8) {                 # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) {     # Only print if non-numeric
                printf escape_percents($f) " "
            }
        } else {
            printf escape_percents($f) " "
        }
    }
    printf "\n"
}
' sample2.log | sort -u
AAA BBB CCC DDD %E%E%E FFFF
AAA BBB CCC DDD %E%E%E GGGG
AAA BBB CCC XXX DDD %E%E%E FFFF
AAA BBB CCC XXX DDD %E%E%E GGGG
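
An alternative to escaping, sketched here as a suggestion rather than as part of the original answer: give printf a fixed format string and pass the field as an argument, so a % in the data is never interpreted as a format specifier. The function name `strip_fields` is hypothetical.

```shell
# Hypothetical wrapper: same column logic as above, but the field is passed
# to printf as an argument instead of being used as the format string.
strip_fields() {
    awk '{
        for (f = 4; f <= NF; ++f) {
            # Columns 7 and 8 are dropped when they look like an epoch time.
            if ((f == 7 || f == 8) && $f ~ /^[0-9]+\.[0-9]+$/) continue
            printf "%s ", $f    # the data is an argument, not the format
        }
        printf "\n"
    }' "$@"
}

# Usage: strip_fields sample2.log | sort -u
```

With this form the `escape_percents` helper is no longer needed, since `%s` prints the field verbatim.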

Regarding "linux - Excluding columns with awk", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37389686/
