linux - Plain-text table to CSV on Linux

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 01:13:42

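I have some plain-text tables that I need to output in CSV format. If I run tr and replace the border characters, fields that span 2 lines come out broken.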

cat file.txt | tr -s '|' ' ' | tr -s '_' ' '

Original table:

 ____________________________________________________________________________
| Name | AB | DATA | SOME | IF | DATE |
|___________________________|_________|__________|_______|________|__(UTC)__|
| Marra Carolina Odoriz | | | | |2019-07- |
| Dolman |36737202 |098787267 | 45 | - |09T10:35:|
|____________________________|_________|__________|_______|________|_50.289Z_|
| | | | | |2019-07- |
| - |53959997 |098543650 | 30 | - |09T12:02:|
|____________________________|_________|__________|_______|________|_36.746Z_|
| | | | | |2019-07- |
| Vic Velazquez |33577915 |096638025 | - | 6000 |09T12:40:|
|____________________________|_________|__________|_______|________|_17.754Z_|
| Gabriela Letacia Cararallo | | | | |2019-07- |
| Vacchetzi |43132876 |091322398 | 30 | - |09T12:40:|
|____________________________|_________|__________|_______|________|_40.887Z_|

For this example table, I need the following CSV output:

NAME;AB;DATA;SOME;IF;DATE (UTC)
Marra Carolina Odoriz Dolman;36737202;098787267;45;-;2019-07-09T10:35:50.289Z
-;53959997;098543650;30;-;2019-07-09T12:02:36.746Z
Vic Velazquez;33577915;096638025;-;6000;2019-07-09T12:40:17.754Z
Gabriela Letacia Cararallo Vacchetzi;43132876;091322398;30;-;2019-07-09T12:40:40.887Z

If I have a raw multi-line input file without the "ASCII table" layout, can part of this solution be applied to that case? I tried:

    while(<>)
    {
        @vals = split /\ /; # split fields into the val array (now I take the blank space)
        $size = @vals;
        for( $i = 0 ; $i < $size ; $i++ )
        {
            #clean up the values: remove underscores and extra spaces
            #remove semicolons
            $vals[$i] =~ s/_/ /g;
            $vals[$i] =~ s/;/ /g;
            $vals[$i] =~ s/^ *//;
            $vals[$i] =~ s/ *$//;

            # append the value to the data record for this field
            $data[$i] .= $vals[$i];

            # special handling for first field: use spaces when joining
            $data[$i] .= " " if ($i==0);
        }
        if(/\R/) # previously: four underscores indicated the end of the record
                 # now: the line break is taken as the end of the record
        {
            # clean up the first record; trim spaces
            $data[0] =~ s/^ *//;
            $data[0] =~ s/ *$//;
            $data[3] =~ s/\..*//;

            # join the records with semicolons
            $line = join (";", @data);

            # collapse multiple spaces
            $line =~ s/ +/ /g;

            # print this line and start over
            print "$line\n" unless ($line eq '');
            @data = ();
        }
    }

The result with this solution is:

Name;Complete;;;;;;;;;AB;;;;;;DATA;;;SOME;;DATE;(UTC)Marra;Carolina;Odoriz;;;;;36737202;098787267;45;-;2019-07-09T10:35:50.289Z

Dolman;;;

Best answer

Multi-line processing is hard in the shell but easy in Perl.

blocktab2csv.pl:

    use strict;
    use warnings;

    my @data;   # accumulated text for each column of the current record

    while (<>)
    {
        chomp;      # remove newline
        s/^\|//;    # remove pipe at the start of the line

        my @vals = split /\|/;   # split fields into the val array
        for my $i (0 .. $#vals)
        {
            # clean up the values: remove underscores and extra spaces
            $vals[$i] =~ s/_//g;
            $vals[$i] =~ s/^ *//;
            $vals[$i] =~ s/ *$//;

            # append the value to the data record for this field
            $data[$i] .= $vals[$i];

            # special handling for first field: use spaces when joining
            $data[$i] .= " " if ($i == 0);
        }
        if (/____/)   # Taking four underscores to indicate the end of the record
        {
            # clean up the first record; trim spaces
            $data[0] =~ s/^ *//;
            $data[0] =~ s/ *$//;

            # join the records with semicolons
            my $line = join(";", @data);

            # collapse multiple spaces
            $line =~ s/ +/ /g;

            # print this line and start over
            print "$line\n" unless ($line eq '');
            @data = ();
        }
    }

Then:

$ perl blocktab2csv.pl intable.txt > output.csv

output.csv:

Name;AB;DATA;SOME;IF;DATE(UTC)
Marra Carolina Odoriz Dolman;36737202;098787267;45;-;2019-07-09T10:35:50.289Z
-;53959997;098543650;30;-;2019-07-09T12:02:36.746Z
Vic Velazquez;33577915;096638025;-;6000;2019-07-09T12:40:17.754Z
Gabriela Letacia Cararallo Vacchetzi;43132876;091322398;30;-;2019-07-09T12:40:40.887Z
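
One side effect of the cleanup step in blocktab2csv.pl is that `s/_//g` removes every underscore, so a value that contains one in the middle would lose it. The sample table has no such value, so the output above is unaffected. If your data does, a minimal sketch of a gentler cleanup might look like the following. The helper name `clean_cell` and the value `user_name` are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): clean one table cell,
# removing only the underscores at the edges (border filler) and keeping
# any underscores that sit inside the value.
sub clean_cell {
    my ($cell) = @_;
    $cell =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    $cell =~ s/^_+|_+$//g;     # drop border underscores at either edge
    $cell =~ s/^\s+|\s+$//g;   # trim whitespace exposed by the previous step
    return $cell;
}

print clean_cell('_50.289Z_'), "\n";     # prints 50.289Z
print clean_cell('__(UTC)__'), "\n";     # prints (UTC)
print clean_cell(' user_name '), "\n";   # prints user_name
print clean_cell('_________'), "\n";     # prints an empty line
```

A value that legitimately starts or ends with an underscore would still be trimmed, so this is only a partial improvement.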

The script assumes that your fields contain no semicolons. It is easy to modify to handle them, though.
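
One possible way to handle semicolons is to quote any field that contains the delimiter, following the usual CSV convention of wrapping the field in double quotes and doubling embedded quotes. The sketch below assumes that convention is acceptable to whatever reads the file. The helper names `csv_field` and `csv_line` are hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a field only when it contains the delimiter,
# a double quote, or a line break; otherwise return it unchanged.
sub csv_field {
    my ($value, $sep) = @_;
    $sep = ';' unless defined $sep;
    if ($value =~ /[\Q$sep\E"\r\n]/) {
        (my $escaped = $value) =~ s/"/""/g;   # double any embedded quotes
        return qq{"$escaped"};
    }
    return $value;
}

# Hypothetical helper: join already-cleaned fields into one delimited line.
sub csv_line {
    my ($sep, @fields) = @_;
    return join $sep, map { csv_field($_, $sep) } @fields;
}

# Illustrative values only; the first and third need quoting.
print csv_line(';', 'Vic; Velazquez', '33577915', 'say "hi"'), "\n";
# prints "Vic; Velazquez";33577915;"say ""hi"""
```

In blocktab2csv.pl, the `join` call that builds `$line` could be swapped for `csv_line(';', @data)`. A simpler alternative, used in the question's own attempt, is to replace semicolons inside values with spaces via `s/;/ /g`, at the cost of altering the data.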

A similar question to "linux - Plain-text table to CSV on Linux" can be found on Stack Overflow: https://stackoverflow.com/questions/56975212/
