
awk - Escaping the separator within double quotes in awk

Reposted. Author: 行者123. Updated: 2023-12-03 10:15:06

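I am using awk to parse my data with "," as the separator, since the input is a CSV file. However, the data contains "," characters that are escaped by enclosing them in double quotes ("...").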


filed1,filed2,field3,"field4,FOO,BAR",field5

How can I ignore the commas inside the double quotes so that awk parses the line correctly? I know this can be done in Excel, but how do we do it in awk?

Best answer

With GNU awk 4 this is easy:

zsh-4.3.12[t]% awk '{
  for (i = 0; ++i <= NF;)
    printf "field %d => %s\n", i, $i
}' FPAT='([^,]+)|("[^"]+")' infile
field 1 => filed1
field 2 => filed2
field 3 => field3
field 4 => "field4,FOO,BAR"
field 5 => field5

Adding some comments at the OP's request.

From the GNU awk manual, "Defining Fields by Content":

The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field. In the case of CSV data as presented above, each field is either “anything that is not a comma,” or “a double quote, anything that is not a double quote, and a closing double quote.” If written as a regular expression constant, we would have /([^,]+)|("[^"]+")/. Writing this as a string requires us to escape the double quotes, leading to:

FPAT = "([^,]+)|(\"[^\"]+\")"



Because + is used twice, this does not work correctly for empty fields, but that can be fixed as well:

As written, the regexp used for FPAT requires that each field contain at least one character. A straightforward modification (changing the first ‘+’ to ‘*’) allows fields to be empty:

FPAT = "([^,]*)|(\"[^\"]+\")"

Regarding "awk - Escaping the separator within double quotes in awk", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/7804673/
