awk - Split a file into multiple files based on occurrences of a name in the first column

Reposted. Author: 行者123. Updated: 2023-12-01 08:40:46

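How can I split file.txt into sub-files, where each sub-file holds one run of consecutive lines starting with XX? For example, print the lines starting with XX to file1.txt; when the next line does not start with XX, close file1.txt and open file2.txt for the next run of XX lines.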

Input file: file.txt

some header information
XX 123 456 abc
XX 234 567 def
XX 456 345 ghi
END
XX 345 654 ijk
XX 567 789 klm
XX 678 asd mno
XX 567 thy mnb
YY 123 dft fty
XX 456 tfg tyg
XX 456 thu gtr
PAGE2
XX 345 dcf try

Desired output:

file1.txt

XX 123 456 abc
XX 234 567 def
XX 456 345 ghi

file2.txt

XX 345 654 ijk
XX 567 789 klm
XX 678 asd mno
XX 567 thy mnb

file3.txt

XX 456 tfg tyg
XX 456 thu gtr

file4.txt

XX 345 dcf try

Best answer

Using awk, as a one-liner:

$ awk '!/^XX/{if(f)close(f);f=sprintf("file%d.txt",++n);next}{print >f}' infile

Explanation:

awk '!/^XX/{                          # if the line (record) does not start with XX
    if(f)                             # if variable f was set earlier
        close(f);                     # close that file
    f=sprintf("file%d.txt",++n);      # pre-increment n and build the next file name
    next                              # skip to the next line
}
{
    print >f                          # lines starting with XX are written
                                      # to the file named in variable f
}
' infile
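
One edge case is worth noting: the one-liner only sets f when it sees a line that does not start with XX, so an input whose very first line starts with XX reaches print >f with f still empty. The following is a minimal sketch of one possible guard, not part of the original answer. It runs in a scratch directory (workdir is an illustrative name) on a made-up, shortened input.

```shell
# Sketch: same idea as the one-liner, but tolerant of input that begins with XX.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Shortened sample input with no header line (made up for this demo).
printf '%s\n' \
  'XX 123 456 abc' \
  'XX 234 567 def' \
  'END' \
  'XX 345 654 ijk' > infile

awk '
!/^XX/ {                      # separator line: close the current file, pick the next name
    if (f) close(f)
    f = sprintf("file%d.txt", ++n)
    next
}
{
    if (!f)                   # no separator seen yet: name the first file here
        f = sprintf("file%d.txt", ++n)
    print > f
}
' infile

ls
```

With this input the first two lines go to file1.txt and the last line goes to file2.txt.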

Test results:

Input:

$ cat infile
some header information
XX 123 456 abc
XX 234 567 def
XX 456 345 ghi
END
XX 345 654 ijk
XX 567 789 klm
XX 678 asd mno
XX 567 thy mnb
YY 123 dft fty
XX 456 tfg tyg
XX 456 thu gtr
PAGE2
XX 345 dcf try

Output:

$ cat file1.txt
XX 123 456 abc
XX 234 567 def
XX 456 345 ghi

$ cat file2.txt
XX 345 654 ijk
XX 567 789 klm
XX 678 asd mno
XX 567 thy mnb

$ cat file3.txt
XX 456 tfg tyg
XX 456 thu gtr

$ cat file4.txt
XX 345 dcf try

Comment:

If the input file has many header lines, the output file names start at a larger number. How can I make the output files start at output1.out and so on?

awk '/^XX/{if(!w)f=sprintf("file%d.txt",++n);w=1;print >f;next}{close(f);w=0}' infile

Explanation:

awk '/^XX/{                             # if the line starts with XX
    if(!w)                              # if w is unset or 0 (a new run begins)
        f=sprintf("file%d.txt",++n);    # pre-increment n and set the file name in f
    w=1;                                # mark that we are inside a run
    print >f;                           # write the line to the file
    next                                # skip to the next line
}
{                                       # for lines that do not start with XX
    close(f);                           # close the current file
    w=0                                 # reset w so the next XX line starts a new file
}
' infile
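
The comment asks for names like output1.out, and the title speaks of the first column, whereas /^XX/ matches any line that merely begins with the characters XX. As a hedged sketch of how both points could be handled, the variant below passes the key and the file name parts in with -v and compares $1 exactly. The names key, prefix, ext and workdir are illustrative, and the XXL line is made up to show the difference.

```shell
# Sketch: parameterised variant of the second one-liner, run in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' \
  'some header information' \
  'some header2' \
  'XX 123 456 abc' \
  'XXL 999 not a match' \
  'XX 234 567 def' > infile

awk -v key=XX -v prefix=output -v ext=out '
$1 == key {                   # first field is exactly the key
    if (!w) f = sprintf("%s%d.%s", prefix, ++n, ext)   # a new run starts: next file name
    w = 1
    print > f
    next
}
{
    if (w) close(f)           # the run just ended: close its file
    w = 0
}
' infile

ls
```

Here the XXL line counts as a separator, so the two XX lines end up in output1.out and output2.out respectively.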

Test results for the comment:

Modified input:

$ cat infile
some header information
some header2
some header 3
XX 123 456 abc
XX 234 567 def
XX 456 345 ghi
END
some more extra
wxxasa
extrasa
XX 345 654 ijk
XX 567 789 klm
XX 678 asd mno
XX 567 thy mnb
YY 123 dft fty
XX 456 tfg tyg
XX 456 thu gtr
PAGE2
XX 345 dcf try

Run:

$ awk '/^XX/{if(!w)f=sprintf("file%d.txt",++n); w=1; print >f; next}{close(f); w=0}' infile

Generated files:

$ ls -1 *.txt
file1.txt
file2.txt
file3.txt
file4.txt

Contents of each file:

$ for i in *.txt; do echo "File: $i"; cat "$i"; done
File: file1.txt
XX 123 456 abc
XX 234 567 def
XX 456 345 ghi
File: file2.txt
XX 345 654 ijk
XX 567 789 klm
XX 678 asd mno
XX 567 thy mnb
File: file3.txt
XX 456 tfg tyg
XX 456 thu gtr
File: file4.txt
XX 345 dcf try
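
A quick way to sanity-check a split like this is to compare the number of XX lines in the input with the total number of lines across the generated files. The sketch below is an illustration only: it uses a tiny made-up input in a scratch directory, and workdir, expected and actual are illustrative names.

```shell
# Sketch: verify that every XX line ended up in exactly one output file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' 'header' 'XX 1' 'XX 2' 'END' 'XX 3' > infile
awk '/^XX/{if(!w)f=sprintf("file%d.txt",++n);w=1;print >f;next}{close(f);w=0}' infile

expected=$(grep -c '^XX' infile)                 # XX lines in the input
actual=$(awk 'END { print NR }' file*.txt)       # total lines across all output files

if [ "$expected" -eq "$actual" ]; then
    echo "OK: $expected lines written"
else
    echo "MISMATCH: expected $expected, got $actual"
fi
```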

Regarding "awk - Split a file into multiple files based on occurrences of a name in the first column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48311604/
