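First sample: quick and dirty:
If your commas are always followed by a space inside text strings, and never when they separate fields, then you could use: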
sed -e 's/,\([^ ]\)/\|\1/g'
"Chang, Yao-Jen"|33|MIS|"Taiwan, Taipei"|M
But you have to be sure about the character that follows each comma.
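To make that caveat concrete, here is a small sketch (assuming GNU sed; the helper name quick_dirty is only for illustration) showing two inputs where the shortcut goes wrong: a quoted comma with no space after it, and an empty field.

```shell
# Hypothetical wrapper around the quick-and-dirty substitution.
quick_dirty() {
  sed -e 's/,\([^ ]\)/|\1/g'
}

# The comma inside the quoted name has no space after it, so it is replaced too.
echo '"Chang,Yao-Jen",33,MIS' | quick_dirty   # "Chang|Yao-Jen"|33|MIS

# Two adjacent commas (an empty field): the second comma is consumed as the
# "next character" of the first match and is left unchanged.
echo 'a,,b' | quick_dirty                     # a|,b
```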
A more elaborate sample, with no requirement about spaces, and closest to your original idea:
sed -e ':a;s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/;ta'
echo '"Chang, Yao-Jen",33,MIS,"Taiwan, Taipei",M' |
sed -e ':a;s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/;ta'
"Chang, Yao-Jen"|33|MIS|"Taiwan, Taipei"|M
echo '"Chang,Yao-Jen",33,MIS,"Taiwan,Taipei",M' |
sed -e '1 { :a;s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/;ta }'
"Chang,Yao-Jen"|33|MIS|"Taiwan,Taipei"|M
Explanation:
sed -e '
:a
s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/
ta
'
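:a is a label, the target for branching (looping).
s/ searches, from the beginning of the line, for any run of unquoted text ([^",]*) or quoted strings ("...") followed by a comma, then replaces that comma with a vertical bar.
ta branches back to a if the previous s/ made a substitution.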
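Because the pattern is anchored with ^, a single substitution only ever converts the first remaining separator, which is why the loop is needed. A minimal sketch of the difference, assuming GNU sed (the helper names one_pass and looped are illustrative):

```shell
# Single substitution: only the first separator comma is converted.
one_pass() {
  sed -e 's/^\(\("[^"]*"\|[^",]*\)*\),/\1|/'
}

# Same substitution repeated via the :a ... ta loop until it stops matching.
looped() {
  sed -e ':a;s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/;ta'
}

echo '"Chang, Yao-Jen",33,MIS' | one_pass   # "Chang, Yao-Jen"|33,MIS
echo '"Chang, Yao-Jen",33,MIS' | looped     # "Chang, Yao-Jen"|33|MIS
```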
As you asked to operate on line 2 only, you will have to use:
sed -e '2 { :a; s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/; ta } '
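The same address syntax also accepts a range. As a variation that is not part of the original question, the sketch below assumes a file whose first line is a header to be left alone and converts every line from the second one onward (GNU sed; skip_header is a made-up name):

```shell
# Apply the loop to lines 2 through the last line ($); line 1 passes through.
skip_header() {
  sed -e '2,$ {
    :a
    s/^\(\("[^"]*"\|[^",]*\)*\),/\1|/
    ta
  }'
}

printf '%s\n' 'name,age,city' \
              '"Chang, Yao-Jen",33,"Taiwan, Taipei"' \
              'Doe,41,Paris' | skip_header
# name,age,city
# "Chang, Yao-Jen"|33|"Taiwan, Taipei"
# Doe|41|Paris
```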
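Edit: [Wrong! See Edit 3]
If you want to mix single quotes and double quotes, see this sample.
Here is a sample with mixed quoted and unquoted fields, and one field that contains a single quote but is itself double-quoted: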
cat <<eof >sample
A,B,"C,D",E,"F,G",H,"I,J,K"
"Chang, Yao-Jen",33,MIS,"Taiwan, Taipei",M
A,B,'C,D',E,'F,G',H,'I,J,K'
'Chang, Yao-Jen',33,MIS,'Taiwan, Taipei',M
"Chang, Yao-Jen",33,MIS,"Taiwan, Taipei",M,'Chang,Yao-Jen',34,MZZ,'Taiwan, Taipei',Z
"Chang's son: Yao-Lu",55,MAA,'Taiwan, too',z
eof
sed -e ':a;s/^\(\(\(['\''"]\)[^\3]*\3\|[^",'\'']*\)*\),/\1|/;ta' sample
A|B|"C,D"|E|"F,G"|H|"I,J,K"
"Chang, Yao-Jen"|33|MIS|"Taiwan, Taipei"|M
A|B|'C,D'|E|'F,G'|H|'I,J,K'
'Chang, Yao-Jen'|33|MIS|'Taiwan, Taipei'|M
"Chang, Yao-Jen"|33|MIS|"Taiwan, Taipei"|M|'Chang,Yao-Jen'|34|MZZ|'Taiwan, Taipei'|Z
"Chang's son: Yao-Lu"|55|MAA|'Taiwan, too'|z
The sed script can be put into a more readable script file, like this:
cat <<oesedscript >csvtopsv.sed
#!/bin/sed -f
# Comma Separated Values to Pipe Separated Values
:a
s/^\(\(\(['"]\)[^\3]*\3\|[^",']*\)*\),/\1|/;
ta
oesedscript
chmod +x csvtopsv.sed
./csvtopsv.sed sample
A|B|"C,D"|E|"F|G"|H|"I|J|K"
"Chang, Yao-Jen"|33|MIS|"Taiwan, Taipei"|M
A|B|'C,D'|E|'F|G'|H|'I|J|K'
'Chang, Yao-Jen'|33|MIS|'Taiwan, Taipei'|M
"Chang, Yao-Jen"|33|MIS|"Taiwan, Taipei"|M|'Chang,Yao-Jen'|34|MZZ|'Taiwan, Taipei'|Z
"Chang's son: Yao-Lu"|55|MAA|'Taiwan, too'|z
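Explanation:
s/ searches for a single or double quote, ['"], captured as the third parenthesized group, followed by zero or more characters other than that third group, and finally a second character identical to the third group ... or for a run of characters that are not a comma, single quote or double quote, [^,'"] ...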
Edit 3: Warning! The version above is wrong!
So the right answer is definitely this one:
sed -e ':a;s/^\(\("[^"]*"\|'\''[^'\'']*'\''\|[^",'\'']*\)*\),/\1|/;ta'
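You can see my error by adding ;L before ta for debugging (L is an old GNU sed extension that recent releases no longer accept; p prints the same trace there):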
sed -e ':a;s/^\(\(\(['\''"]\)[^\3]*\3\|[^",'\'']*\)*\),/\1|/;L;ta'
Whereas, with one alternative per quote character:
echo '1,"John Doe","6, rue Peuh",236,"B,-,F,H,P,-",-55' |
sed -e ':a;s/^\(\("[^"]*"\|'\''[^'\'']*'\''\|[^",'\'']*\)*\),/\1#/;L;ta'
1#"John Doe","6, rue Peuh",236,"B,-,F,H,P,-",-55
1#"John Doe"#"6, rue Peuh",236,"B,-,F,H,P,-",-55
1#"John Doe"#"6, rue Peuh"#236,"B,-,F,H,P,-",-55
1#"John Doe"#"6, rue Peuh"#236#"B,-,F,H,P,-",-55
1#"John Doe"#"6, rue Peuh"#236#"B,-,F,H,P,-"#-55
1#"John Doe"#"6, rue Peuh"#236#"B,-,F,H,P,-"#-55
1#"John Doe"#"6, rue Peuh"#236#"B,-,F,H,P,-"#-55
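As we can see, this is not so simple...
[^\3] does not have the expected effect: a backreference is not recognized inside a bracket expression, so it simply matches any character that is neither \ nor 3.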
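A quick way to check this behaviour is to test single characters against the bracket expression on its own. This is only a sketch, assuming GNU sed; matches_class is a hypothetical helper name:

```shell
# Prints the argument only if that single character is matched by [^\3].
matches_class() {
  printf '%s\n' "$1" | sed -n '/^[^\3]$/p'
}

matches_class '"'   # prints "  (the quote is not excluded at all)
matches_class 'a'   # prints a
matches_class '3'   # prints nothing: 3 is excluded
matches_class '\'   # prints nothing: the backslash is excluded
```

Since the double quote still matches, [^\3]* can run straight across a closing quote, which is what breaks the pattern.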
Finally, we have to search for each kind of delimiter separately:
:a;
s/^\(\("[^"]*"\|'[^']*'\|[^",']*\)*\),/\1\t/;
ta
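Note: from here on, I use csv2tsv, for comma separated values to tab separated values. If you really prefer the | pipe as separator, you can replace \t with | or any character you want.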
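One way to avoid editing the script each time is to pass the separator in as an argument. The function below is a hedged sketch, not part of the original answer: csv_sep is a made-up name, GNU sed is assumed, and the separator must not contain /, & or \ because it is pasted directly into the replacement.

```shell
# $1: the character (or string) that replaces each separator comma.
csv_sep() {
  sed -e ':a;s/^\(\("[^"]*"\|'\''[^'\'']*'\''\|[^",'\'']*\)*\),/\1'"$1"'/;ta'
}

echo '1,"John Doe","6, rue Peuh",236' | csv_sep '|'
# 1|"John Doe"|"6, rue Peuh"|236
echo 'a,"b,c",d' | csv_sep ';'
# a;"b,c";d
```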
Well, the command line is less sexy:
echo '1,"John Doe","6, rue Peuh",236,"B,-,F,H,P,-",-55' |
sed -e ':a;s/^\(\("[^"]*"\|'\''[^'\'']*'\''\|[^",'\'']*\)*\),/\1\t/;L;ta'
1 "John Doe","6, rue Peuh",236,"B,-,F,H,P,-",-55
1 "John Doe" "6, rue Peuh",236,"B,-,F,H,P,-",-55
1 "John Doe" "6, rue Peuh" 236,"B,-,F,H,P,-",-55
1 "John Doe" "6, rue Peuh" 236 "B,-,F,H,P,-",-55
1 "John Doe" "6, rue Peuh" 236 "B,-,F,H,P,-" -55
1 "John Doe" "6, rue Peuh" 236 "B,-,F,H,P,-" -55
1 "John Doe" "6, rue Peuh" 236 "B,-,F,H,P,-" -55
But it does what is needed:
echo '1,"John Doe","6, rue Peuh",236,"B,-,F,H,P,-",-55' |
sed -e ':a;s/^\(\("[^"]*"\|'\''[^'\'']*'\''\|[^",'\'']*\)*\),/\1\t/;ta'
1 "John Doe" "6, rue Peuh" 236 "B,-,F,H,P,-" -55
OK, let's create a sed script:
cat >csv2tsv.sed <<eof
#!/bin/sed -f
# Comma separated values to Tab separated values
:a
s/^\(\("[^"]*"\|'[^']*'\|[^",']*\)*\),/\1\t/;
ta
eof
chmod +x csv2tsv.sed
Now:
cat >file.csv <<eof
A,B,"C,D",E,"F,G",H,"I,J,K"
"Chang, Yao-Jen",33,MIS,"Taiwan, Taipei",M
1,"John Doe","6, rue Peuh",236,"B,-,F,H,P,-",-55
4,"hacker's string",'one quote: "I have no special talents. I am only passionat\
ely curious." - Albert Einstein',unquoted string,9,1,1,3
eof
./csv2tsv.sed file.csv
A B "C,D" E "F,G" H "I,J,K"
"Chang, Yao-Jen" 33 MIS "Taiwan, Taipei" M
1 "John Doe" "6, rue Peuh" 236 "B,-,F,H,P,-" -55
4 "hacker's string" 'one quote: "I have no special talents. I am only pa
ssionately curious." - Albert Einstein' unquoted string 9 1 1 3
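If sed does not live at /bin/sed on your system, the shebang line will fail. As a sketch under that assumption, the same script can be written without the shebang and run through sed -f. Quoting the heredoc delimiter ('eof') is an extra precaution so that the shell leaves every backslash in the script untouched (GNU sed is assumed for \| and \t):

```shell
# Write the script; nothing inside the quoted heredoc is expanded by the shell.
cat >csv2tsv.sed <<'eof'
# Comma separated values to Tab separated values
:a
s/^\(\("[^"]*"\|'[^']*'\|[^",']*\)*\),/\1\t/
ta
eof

# Run it explicitly with sed -f instead of relying on the shebang path.
echo 'A,B,"C,D",E' | sed -f csv2tsv.sed
```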