linux - Converting a "table-data" file to CSV format

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 01:50:25

I have a long text file containing some "tabular" data, i.e.:

12/10/2018  aaaa bbb     xxxxxxxxxxxxxxxxxxxxxxxxxxxxx      002424004234
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
12/11/2018 cccc dddd yyyyyyyyyyyyyyyyyyyyyy 0542121212122
yyyyyyyyyyyyyyyyyyyyyy
12/12/2018 eeee ffffff zzzzzzzzzzzzzzzzzzzzzzz 0639872651252
12/13/2018 ggggggg hhhhhh vvvvv vvvvvvvvvvvvvvvvv 1968745213648
vvvvvvvvvvvvvvvvvvvvvvv
12/14/2018 ....

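The file is the result of a scan, and some of its columns behave like spreadsheet "cells". How can I use command-line tools to convert it to a CSV file, for example: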

12/10/2018,aaaaaaaa,bbbbb,xxxxxx.......xxxx,002424004234
12/11/2018,ccccccc,dddddd,yyyyyy.......yyyy,0542121212122

and so on?

Thanks.

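Edit: I have a text file produced by scanning a document. The text presents the data in a "tabular" layout, i.e. the third column is "multi-line" text. I want to convert it to a simple CSV file in which all the text of a multi-line "cell" ends up on a single line. xxxxx...xxxx stands for the multi-line text of the third column.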

Edit 2: sample data

Date         AMOUNT      OP     DESCRIPTION                                 CODE
12/10/2018 $123,45 id01 payment for hotel in Las Vegas 005214875462
room
room service
dinner
golf club

12/11/2018 $400,00 id04 cash from ATM 0528158852687
located in L.A.
12/12/2018 $1000,00 id99 ACME tornado pill 854674852658

I would like to transform it into:

12/10/2018;$123,45;id01;payment for hotel in Las Vegas room room service dinner golf club;005214875462
12/11/2018;$400,00;id04;cash from ATM located in L.A.;0528158852687
12/12/2018;$1000,00;id99;ACME tornado pill;854674852658

Best answer

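You need to use runs of two or more spaces as the field separator (FS) and trim trailing whitespace from the input. Check the following code (save it as ip.awk):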

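BEGIN {
    # Two or more whitespace characters separate the columns.
    FS = "[[:space:]][[:space:]]+";
    line = 0;    # number of records stored so far
}
{
    # The header row also has five fields, so it is stored and printed
    # like any other record.
    if (NF == 5) {
        # A full row starts a new record.
        line = line + 1;
        op[line,"1"] = $1;
        op[line,"2"] = $2;
        op[line,"3"] = $3;
        op[line,"4"] = $4;
        op[line,"5"] = $5;
    }
    else {
        # Continuation line: the leading indentation leaves $1 empty, so the
        # text is in $2; append it to the description (column 4).
        # Blank lines also reach this branch and append a single space.
        op[line,"4"] = op[line,"4"] " " $2;
    }
}
END {
    OFS = ";";
    for (i = 1; i <= line; i++)
        print op[i,"1"], op[i,"2"], op[i,"3"], op[i,"4"], op[i,"5"];
}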
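
To see why the script tests the field count and reads $2 on the remaining lines, it can help to print NF for a full record and for a continuation line. The sketch below assumes the input keeps its original layout, with columns separated by at least two spaces and continuation lines indented under the description column. The sample lines and the helper name show_fields are made up for illustration.

```shell
# show_fields is a hypothetical helper: it prints the field count and $2
# for each input line, using the same field separator as ip.awk.
show_fields() {
  awk -F '[[:space:]][[:space:]]+' '{ printf "NF=%d  $2=[%s]\n", NF, $2 }'
}

# Made-up sample: one full record and one indented continuation line.
printf '%s\n' \
  '12/11/2018  $400,00  id04  cash from ATM     0528158852687' \
  '                           located in L.A.' | show_fields
```

The first line reports NF=5. The second reports NF=2, because the leading run of spaces counts as a separator and leaves $1 empty, so the continuation text ends up in $2.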

You can run the code like this, with 1.txt as the input file:

sed 's/[[:space:]]*$//' 1.txt | awk -f ip.awk

The output is:

Date;AMOUNT;OP;DESCRIPTION;CODE
12/10/2018;$123,45;id01;payment for hotel in Las Vegas room room service dinner golf club ;005214875462
12/11/2018;$400,00;id04;cash from ATM located in L.A.;0528158852687
12/12/2018;$1000,00;id99;ACME tornado pill ;854674852658
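
The trailing space before some semicolons in the output above appears because blank input lines have NF == 0 and still reach the else branch, which appends a single space. A possible variant, sketched here as a shell function, skips blank lines with next and appends the whole continuation line after stripping its indentation, so it does not depend on the text landing in $2. The function name table_to_csv and the array name row are hypothetical. Like the original, this sketch keeps the header row because it has five fields.

```shell
# table_to_csv is a hypothetical helper, not part of the original answer.
# It reads the files given as arguments, or standard input if there are none.
table_to_csv() {
  sed 's/[[:space:]]*$//' "$@" |
  awk '
    BEGIN { FS = "[[:space:]][[:space:]]+"; OFS = ";" }
    NF == 0 { next }                    # skip blank lines entirely
    NF == 5 {                           # a full row starts a new record
      n++
      for (i = 1; i <= 5; i++) row[n, i] = $i
      next
    }
    n > 0 {                             # continuation of the description
      text = $0
      sub(/^[[:space:]]+/, "", text)    # drop the leading indentation
      row[n, 4] = row[n, 4] " " text
    }
    END {
      for (r = 1; r <= n; r++)
        print row[r, 1], row[r, 2], row[r, 3], row[r, 4], row[r, 5]
    }
  '
}
```

It can be called as table_to_csv 1.txt, or placed at the end of a pipeline.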

Regarding "linux - Converting a "table-data" file to CSV format", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53728312/
