
awk: how to remove duplicates in a field only when the preceding fields are identical

Reposted. Author: 行者123. Updated: 2023-12-02 14:43:00

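I am trying to remove duplicates from a field (replacing them with blanks), but only when the preceding fields are also identical. For example:

Sample input: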

France  Paris      Museum of Fine Arts         blabala
France  Paris      Museum of Fine Arts         blajlk
France  Paris      Yet another museum          lqmsjdf
France  Paris      Museum of National History  mlqskjf
France  Bordeaux   Museum of Fine Arts         qsfsqf
France  Bordeaux   City Hall                   lmqjflqsk
France  Bordeaux   City Hall                   lqkjfqlskjflqskfj
Spain   Madrid     Museum of Fine Arts         lqksjfh
Spain   Madrid     Museum of Fine Arts         qlmfjlqsjf
Spain   Barcelona  City Hall                   nvqjvvnqk
Spain   Barcelona  Museum of Fine Arts         lmkqjflqksfj

Desired output:

France  Paris      Museum of Fine Arts         blabala
                                               blajlk
                   Yet another museum          lqmsjdf
                   Museum of National History  mlqskjf
        Bordeaux   Museum of Fine Arts         qsfsqf
                   City Hall                   lmqjflqsk
                                               lqkjfqlskjflqskfj
Spain   Madrid     Museum of Fine Arts         lqksjfh
                                               qlmfjlqsjf
        Barcelona  City Hall                   nvqjvvnqk
                   Museum of Fine Arts         lmkqjflqksfj

Thanks in advance for any help.

Best answer

Try this:

awk -F '\t' 'BEGIN {OFS=FS} {if ($1 == prev1) $1 = ""; else prev1 = $1; if ($2 == prev2) $2 = ""; else prev2 = $2; if ($3 == prev3) $3 = ""; else prev3 = $3; print}' inputfile

Here is a shorter version that works for any number of fields (the last field is always printed):

awk -F '\t' 'BEGIN {OFS=FS} {for (i=1; i<=NF-1;i++) if ($i == prev[i]) $i = ""; else prev[i] = $i; print}' inputfile

The output will not be aligned for on-screen viewing, but it will contain the correct number of tabs.

The output will look like this:

field1 TAB field2 TAB field3 TAB field4
TAB TAB TAB field4
TAB TAB field3 TAB field4
TAB field2 TAB field3 TAB field4
etc.
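
To see the tab layout concretely, here is a small sketch that runs the shorter command on a few rows taken from the question. The variable names `input` and `output` are only for illustration, and `tr` is used to turn each tab into a visible `|` so the blanked fields can be counted.

```shell
#!/bin/sh
# Build a small tab-separated sample (a subset of the rows from the question).
input=$(printf 'France\tParis\tMuseum of Fine Arts\tblabala\nFrance\tParis\tMuseum of Fine Arts\tblajlk\nFrance\tParis\tYet another museum\tlqmsjdf\nFrance\tBordeaux\tCity Hall\tlmqjflqsk\n')

# Blank every field except the last when it repeats the value
# last stored for that column.
output=$(printf '%s\n' "$input" | awk -F '\t' 'BEGIN {OFS=FS} {for (i=1; i<=NF-1; i++) if ($i == prev[i]) $i = ""; else prev[i] = $i; print}')

# Show each tab as "|" so the empty fields are visible.
printf '%s\n' "$output" | tr '\t' '|'
# France|Paris|Museum of Fine Arts|blabala
# |||blajlk
# ||Yet another museum|lqmsjdf
# |Bordeaux|City Hall|lmqjflqsk
```

Every line keeps three separators, so the remaining values stay in their original columns.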

If you need the columns aligned, that is possible too.

Edit:

This version lets you specify which fields to deduplicate:
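
One possible way to align the columns, sketched under the assumption that the whole input fits in memory: store every record, track the widest value per column, then pad each field in an END block. The function name `align_columns` and the two-space gutter are choices made for this example, not part of the original answer.

```shell
#!/bin/sh
# align_columns is a hypothetical helper name: it pads each tab-separated
# field to the widest value seen in its column, plus a two-space gutter.
align_columns() {
    awk -F '\t' '
    {
        lines[NR] = $0                      # keep every record for the second pass
        for (i = 1; i <= NF; i++)
            if (length($i) > width[i]) width[i] = length($i)
    }
    END {
        for (n = 1; n <= NR; n++) {
            nf = split(lines[n], f, "\t")   # empty fields are preserved by split
            out = ""
            for (i = 1; i < nf; i++)
                out = out sprintf("%-" (width[i] + 2) "s", f[i])
            print out f[nf]                 # the last field is printed unpadded
        }
    }'
}

# Example: feed it output that already has blanked fields.
aligned=$(printf 'France\tParis\tCity Hall\tx\n\t\t\ty\n\tBordeaux\tMuseum\tz\n' | align_columns)
printf '%s\n' "$aligned"
```

The deduplicated output can be piped straight into this function. Note that `length` counts bytes in some awk implementations, so non-ASCII text may not line up exactly.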

#!/usr/bin/awk -f
BEGIN {
    FS="\t"; OFS=FS
    deduplist=ARGV[1]
    ARGV[1]=""
    split(deduplist,tmp," ")
    for (i in tmp) dedup[tmp[i]]=1
}
{
    for (i=1; i<=NF;i++)
        if (i in dedup) {
            if ($i == prev[i])
                $i = ""
            else
                prev[i] = $i
        }
    # prevent printing lines that are completely blank because
    # it's an exact duplicate of the preceding line and all fields
    # are being deduplicated
    if ($0 !~ /^[[:blank:]]*$/)
        print
}

Run it like this: ./script.awk "2 3" inputfile to remove duplicates in fields 2 and 3.
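
One caveat: the commands above compare each column only with the last value stored for that same column, so a value can be blanked even when a column to its left has changed (for example, "City Hall" in two different cities on consecutive lines). If a field should be blanked only when every field to its left also repeats, a variant along these lines might help. This is a sketch, not part of the original answer, and `dedup_prefix` is a hypothetical name.

```shell
#!/bin/sh
# dedup_prefix is a hypothetical helper name, not part of the original answer.
dedup_prefix() {
    awk -F '\t' 'BEGIN {OFS=FS} {
        same = 1                           # true while every field to the left matched
        for (i = 1; i <= NF-1; i++) {
            cur = $i
            if (same && NR > 1 && cur == prev[i])
                $i = ""                    # repeats, and so does everything to its left
            else
                same = 0                   # first difference: stop blanking on this line
            prev[i] = cur                  # always remember the original value
        }
        print
    }'
}

result=$(printf 'France\tBordeaux\tCity Hall\ta\nSpain\tMadrid\tCity Hall\tb\nSpain\tMadrid\tCity Hall\tc\n' | dedup_prefix)
printf '%s\n' "$result" | tr '\t' '|'
# France|Bordeaux|City Hall|a
# Spain|Madrid|City Hall|b
# |||c
```

On the sample input from the question, this variant and the original commands give the same result; they differ only when a value repeats under a changed parent field.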

Regarding "awk: how to remove duplicates in a field only when the preceding fields are identical", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4785566/
