
php - Sorting/parsing heterogeneous CSV-delimited text values in a MySQL row field

Reposted. Author: 行者123. Updated: 2023-11-29 14:13:49

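I have a dataset similar to the one below. I'm very rusty with regular expressions and, despite several attempts, have no idea how to "walk the tree". Excel's Text to Columns hasn't helped, because the individual terms' classes/tags in the EFFECT_DATA field are ordered haphazardly and manual edits have introduced errors.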

Sample data

ROW_ID|NAME   | UNORDERED_CSV_CONCATD_TAG_DATA_STRING

123456|Prod123|"Minoxidistuff [MoA], Direct [PE], Agonists [EPC]"
123457|Prod124|"Minoxion [Chem], InterferonA [EPC], Delayed [PE]"
123458|Prod125|"Anotherion [EPC], Direct [MoA], Agonists [EPC]"
123459|Prod126|"Competitor [PE], Progestin [EPC], Agonists [EPC]"
123460|Prod127|"Minoxidistuff [Chem]"

Example of the desired output:

PRODUCT|EPC      |
Prod125|Anotherion|
Prod125|Agonists |

PRODUCT|CMPD |
Prod127|Minoxidistuff|
Prod124|Minoxion |

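...and so on for every tag, i.e. product[i]tag[j] (if that makes sense). Essentially, each CSVD_TAG_DATA field is in jumbled order and contains multiple tags, each placed at the end of the term it labels.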
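Because every term carries its tag in square brackets at its end, one regular expression can pull out the (term, tag) pairs in a single pass. This is a minimal sketch, assuming each term is written as `term [TAG]`, terms are separated by commas, and no term itself contains a comma or `[`. The helper name `parse_tag_string` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tag string (outer quotes already removed)
# into [term, tag] pairs. Assumes every term looks like "term [TAG]".
sub parse_tag_string {
    my ($field) = @_;
    my @pairs;
    # In list context a /g match returns (term, tag, term, tag, ...).
    # The lazy +? stops the term before the spaces preceding "[".
    my @flat = $field =~ /\s*([^,\[]+?)\s*\[([^\]]+)\]/g;
    # Take the flat list two elements at a time.
    while (my ($term, $tag) = splice(@flat, 0, 2)) {
        push @pairs, [$term, $tag];
    }
    return @pairs;
}

for my $pair (parse_tag_string('Minoxidistuff [MoA], Direct [PE], Agonists [EPC]')) {
    print "$pair->[1]|$pair->[0]\n";    # prints TAG|term
}
```

With the first sample row this prints `MoA|Minoxidistuff`, `PE|Direct` and `EPC|Agonists`, one per line.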

I've just started on a multidimensional hash approach; please forgive my regex pseudocode.
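One way to shape the multidimensional hash is to key it by tag first and product second, with a list of terms at the bottom, so that each tag group can later be printed as its own `PRODUCT|TAG` table. This is a minimal sketch: a hard-coded two-row subset of the sample data stands in for the file, and the name `%by_tag` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory subset of the sample data: [product, tag string].
my @rows = (
    [ 'Prod123', 'Minoxidistuff [MoA], Direct [PE], Agonists [EPC]' ],
    [ 'Prod125', 'Anotherion [EPC], Direct [MoA], Agonists [EPC]' ],
);

# Layout: $by_tag{TAG}{PRODUCT} = [terms in original order].
my %by_tag;
for my $row (@rows) {
    my ($product, $field) = @$row;
    # Scalar-context /g walks through the string one "term [TAG]" at a time.
    while ($field =~ /\s*([^,\[]+?)\s*\[([^\]]+)\]/g) {
        # push autovivifies both hash levels and the array on first use.
        push @{ $by_tag{$2}{$product} }, $1;
    }
}

print join(',', @{ $by_tag{EPC}{Prod125} }), "\n";
```

Here `$by_tag{EPC}{Prod125}` ends up holding both EPC terms for that product, `Anotherion` and `Agonists`, in the order they appeared.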

Many thanks.

Best answer

Here is a Perl approach. Save the code below as parser.pl and run it as perl parser.pl data.csv, where data.csv is the name of your data file. (Or make it executable and run ./parser.pl data.csv.)

#!/usr/bin/perl

use strict;
use warnings;

# Take in the first argument as the file
die "Usage: $0 data.csv\n" unless @ARGV;
my $file = $ARGV[0];

# Open a filehandle, stopping with a clear message if that fails
open(my $fh, '<', $file) or die "Cannot open '$file': $!\n";

# We'll predefine a hashref: $products->{TAG}->{PRODUCT} = [values]
my $products = {};

# Loop through the file
while (<$fh>) {

    # Remove line breaks
    chomp;

    # Skip blank lines
    next unless /\S/;

    # Split into our primary sections
    my ($id, $product, $csv) = split(/\|/);

    # Skip the header line (any line whose ID is not purely numeric)
    next unless defined $id && $id =~ /^\d+$/;

    # Remove the quotes; skip the line if there is no quoted tag string
    next unless defined $csv && $csv =~ /"(.*)"/;
    $csv = $1;

    # Split the CSV on a comma possibly followed by a space
    my @items = split(/,\s*/, $csv);

    # Loop through each item in the CSV
    foreach my $item (@items) {

        # Our keys and values are reversed: each item is "value [KEY]"
        my ($value, $key) = ($item =~ /(.*)\[(.*)\]/);

        # Skip items that carry no [TAG]
        next unless defined $key;

        # Remove trailing whitespace
        $value =~ s/\s+$//;

        # push creates the arrayref the first time this key/product
        # pair is seen, and appends to it afterwards
        push(@{$products->{$key}->{$product}}, $value);
    }
}

close($fh);

# We have a nicely formed hashref now. Loop through and print how we want
foreach my $key (keys %$products) {

    # Header for this section
    print "PRODUCT|$key\n";

    # Go through each product and print the different values
    foreach my $product (keys %{$products->{$key}}) {

        # foreach (unlike while/shift) does not stop early on a value such as "0"
        foreach my $value (@{$products->{$key}->{$product}}) {
            print "$product|$value\n";
        }
    }

    # Add a blank line to divide the groups cleanly
    print "\n";
}
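The script walks the hash with `keys`, which returns entries in no particular order (from Perl 5.18 on, the order can even differ between runs), so the sample output below is not sorted. If a stable, alphabetical listing is wanted, one option is to sort the tags and products before printing. This is a sketch; the sub name `format_sorted` is hypothetical, and a hand-built hashref in the same `{TAG}{PRODUCT}` shape stands in for the parsed file.

```perl
use strict;
use warnings;

# Hypothetical formatter for a hashref shaped like the script's $products:
# {TAG}{PRODUCT} => [values]. Tags and products are sorted alphabetically.
sub format_sorted {
    my ($products) = @_;
    my @lines;
    for my $key (sort keys %$products) {
        push @lines, "PRODUCT|$key";
        for my $product (sort keys %{ $products->{$key} }) {
            # Values keep their original order within one product.
            push @lines, "$product|$_" for @{ $products->{$key}{$product} };
        }
        push @lines, '';    # blank line between groups
    }
    return @lines;
}

# Hand-built subset of the parsed sample data.
my $products = {
    EPC  => { Prod126 => ['Progestin', 'Agonists'], Prod124 => ['InterferonA'] },
    Chem => { Prod127 => ['Minoxidistuff'], Prod124 => ['Minoxion'] },
};

print "$_\n" for format_sorted($products);
```

This prints the `Chem` group (Prod124 before Prod127) followed by the `EPC` group, in the same order on every run. In the script itself, the equivalent change is to write `sort keys` in place of `keys` in the two printing loops.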

Sample output:

PRODUCT|MoA
Prod123|Minoxidistuff
Prod125|Direct

PRODUCT|Chem
Prod127|Minoxidistuff
Prod124|Minoxion

PRODUCT|PE
Prod123|Direct
Prod124|Delayed
Prod126|Competitor

PRODUCT|EPC
Prod123|Agonists
Prod124|InterferonA
Prod126|Progestin
Prod126|Agonists
Prod125|Anotherion
Prod125|Agonists
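The output above labels the compound group `PRODUCT|Chem`, whereas the question's desired output used the header `CMPD` for the same terms. If those exact headers matter, a small lookup table applied when printing is one way to rename them. This is a sketch; `%header_for` and `header_line` are hypothetical names, and only the Chem to CMPD mapping is implied by the question. Any tag without an entry keeps its own name.

```perl
use strict;
use warnings;

# Hypothetical mapping from data tags to the headers the question asked for.
my %header_for = (Chem => 'CMPD');

# Build a section header, falling back to the tag itself when unmapped.
sub header_line {
    my ($tag) = @_;
    my $label = exists $header_for{$tag} ? $header_for{$tag} : $tag;
    return "PRODUCT|$label";
}

print header_line($_), "\n" for qw(Chem EPC);
```

This prints `PRODUCT|CMPD` and `PRODUCT|EPC`. In the answer's script, `print "PRODUCT|$key\n";` would become `print header_line($key), "\n";`.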

Regarding php - Sorting/parsing heterogeneous CSV-delimited text values in a MySQL row field, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13039367/
