awk - How do I convert only certain whitespace-separated columns to tabs?

Reposted. Author: 行者123. Updated: 2023-12-04 16:24:19
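I know I can use sed 's/[[:blank:]]/,/g' to turn every blank in a file into a comma (or any other character I choose), but is there a way to set it up so that only the first 5 runs of whitespace are converted to commas?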

The reason is that my last column contains a lot of free text, so it is annoying when sed also turns every space inside that column into a comma.

Sample input file:

sample1 gi|11| 123 33 97.23 This is a sentence
sample2 gi|22| 234 33 97.05 Keep these spaces

The output I am looking for:

sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces

Only the first 5 runs of whitespace are changed to commas.

Best answer

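With GNU awk, using the third argument to match() (a[1] captures the first five fields together with their trailing whitespace, a[3] captures the rest of the line):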

$ awk '{ match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,",",a[1]); print a[1] a[3] }' file
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
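
The three-argument form of match(), along with \s and \S, is a GNU awk extension. Where only a POSIX awk is available, the same idea can be sketched with the standard two-argument match() and the RSTART/RLENGTH variables. The function name first_n_blanks and its two arguments are hypothetical, introduced only for this illustration, and the sketch assumes that lines do not begin with whitespace:

```shell
# Hypothetical helper (not from the original answer): replace only the
# first N runs of spaces/tabs on each input line with SEP.
# Usage: first_n_blanks N SEP < file
first_n_blanks() {
    awk -v n="$1" -v sep="$2" '{
        out = ""      # converted part of the line
        rest = $0     # part of the line not yet examined
        # Two-argument match() is POSIX; it sets RSTART and RLENGTH.
        for (i = 0; i < n && match(rest, /[ \t]+/); i++) {
            out = out substr(rest, 1, RSTART - 1) sep
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest    # anything after the Nth run is left untouched
    }'
}

printf '%s\n' 'sample1 gi|11| 123 33 97.23 This is a sentence' | first_n_blanks 5 ,
```

Because the separator is appended by string concatenation rather than passed through sub() or gsub(), characters such as & in SEP are not treated specially.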

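But I would suggest actually converting it to valid CSV (i.e. CSV that conforms to RFC 4180), which can be read by MS-Excel and other tools, since "This is a sentence" (and possibly other fields) can presumably contain commas and double quotes: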

$ awk '{
gsub(/"/,"\"\"");
match($0,/((\S+\s+){5})(.*)/,a)
gsub(/\s+/,"\",\"",a[1])
print "\"" a[1] a[3] "\""
}' file
"sample1","gi|11|","123","33","97.23","This is a sentence"
"sample2","gi|22|","234","33","97.05","Keep these spaces"

For example, given this input:

$ cat file
sample1 gi|11| 123 33 97.23 This is a sentence
a,b,sample2 gi|22| 234 33 97.05 This is, "typically", a sentence

the output of the first script is not valid CSV:

$ awk '{ match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,",",a[1]); print a[1] a[3] }' file
sample1,gi|11|,123,33,97.23,This is a sentence
a,b,sample2,gi|22|,234,33,97.05,This is, "typically", a sentence

whereas the output of the second script is valid CSV:

$ awk '{ gsub(/"/,"\"\""); match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,"\",\"",a[1]); print "\"" a[1] a[3] "\"" }' file
"sample1","gi|11|","123","33","97.23","This is a sentence"
"a,b,sample2","gi|22|","234","33","97.05","This is, ""typically"", a sentence"
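
The quoting rule used above (double every embedded double quote, then wrap each field in double quotes) can also be sketched without GNU extensions. This is only an illustration under the same assumptions as the answer: the helper name to_csv, the awk function quote and the argument N are hypothetical, and lines are assumed not to start with whitespace:

```shell
# Hypothetical helper (not from the original answer): emit each line as a
# quoted CSV record, splitting only on the first N runs of spaces/tabs.
# Usage: to_csv N < file
to_csv() {
    awk -v n="$1" '
    # Double embedded quotes and wrap the field in quotes.
    function quote(s) {
        gsub(/"/, "\"\"", s)
        return "\"" s "\""
    }
    {
        out = ""
        rest = $0
        for (i = 0; i < n && match(rest, /[ \t]+/); i++) {
            out = out quote(substr(rest, 1, RSTART - 1)) ","
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out quote(rest)    # the remainder becomes the last field
    }'
}

printf '%s\n' 'a,b,sample2 gi|22| 234 33 97.05 This is, "typically", a sentence' | to_csv 5
```

Quoting each field separately keeps the escaping logic in one place, at the cost of a short loop instead of a single regular expression.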

Regarding "awk - How do I convert only certain whitespace-separated columns to tabs?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68512991/
