
linux - Getting elements of chunked data with Perl

Reposted. Author: 太空狗. Updated: 2023-10-29 11:27:25

I have data that looks like this:

some info
some info

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

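Note that the file ends with whitespace.

What I want to do is parse every chunk that starts with [Term] and extract its id, name, and namespace. At the end of the day, I want a hash of arrays like this: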

$VAR = {'GO:0000001' => ["mitochondrion inheritance","biological_process"],
'GO:0000002' => ["mitochondrial genome maintenance","biological_process"]};

How can I do this with Perl?

I'm stuck with this code:

#!/usr/bin/perl
use Data::Dumper;
my %bighash;
while(<DATA>) {
    chomp;
    my $line = $_;

    my $term = "";
    my $id = "";
    my $name ="";
    my $namespace ="";
    if ($line =~ /^\[Term/) {
        $term = $line;
    }
    elsif ($line =~ /^id: (.*)/) {
        $id = $1;
    }
    elsif ($line =~ /^name: (.*)/) {
        $name = $1;
    }
    elsif ($line =~ /^namespace: (.*)/) {
        $namespace = $1;
    }
    elsif ($line =~ /$/) {
        $bighash{$id}{$name} = $namespace;
    }

}

print Dumper \%bighash;



__DATA__
some info
some info

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

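Tested here: https://eval.in/80497

Best answer

If you set Perl's input record separator to '' ( local $/ = ''; ), the data is read in paragraph mode, that is, in blocks separated by blank lines. You can then use regular expressions to capture the parts you need from each block. For example: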

use strict;
use warnings;
use Data::Dumper;

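# Paragraph mode: each read returns one blank-line-separated block
local $/ = '';
my %hash;

while (<DATA>) {
    # Only process blocks that begin with a [Term] header
    next unless /^\[Term\]/;

    # /m lets ^ match at the start of each line inside the block,
    # so "name:" cannot match in the middle of some other line
    my ($id) = /^id:\s+(.+)/m;
    my ($name) = /^name:\s+(.+)/m;
    my ($namespace) = /^namespace:\s+(.+)/m;

    push @{ $hash{$id} }, ( $name, $namespace );
}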

print Dumper \%hash;

__DATA__
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

Output:

$VAR1 = {
'GO:0000001' => [
'mitochondrion inheritance',
'biological_process'
],
'GO:0000002' => [
'mitochondrial genome maintenance',
'biological_process'
]
};
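
If the data lives in a file rather than under __DATA__, the same paragraph-mode idea can be wrapped in a subroutine so that the change to $/ stays local to it. The following is only a sketch under some assumptions: the name parse_terms and the shortened sample text are made up for illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): parse [Term]
# stanzas from an already opened filehandle into a hash of arrays.
sub parse_terms {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode, restored when the sub returns
    my %terms;

    while ( my $block = <$fh> ) {
        next unless $block =~ /^\[Term\]/;

        # /m lets ^ match at the start of each line inside the block.
        my ($id)        = $block =~ /^id:\s+(.+)/m;
        my ($name)      = $block =~ /^name:\s+(.+)/m;
        my ($namespace) = $block =~ /^namespace:\s+(.+)/m;

        next unless defined $id;    # skip stanzas that have no id line
        $terms{$id} = [ $name, $namespace ];
    }
    return \%terms;
}

# Shortened sample input; opening a scalar reference gives an
# in-memory filehandle, so no file on disk is needed for the demo.
my $sample = <<'END';
some info

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process

[Typedef]
id: regulates
name: regulates

END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $terms = parse_terms($fh);
close $fh;

# Walk the hash of arrays: each value is [ name, namespace ].
for my $id ( sort keys %$terms ) {
    my ( $name, $namespace ) = @{ $terms->{$id} };
    print "$id\t$name\t$namespace\n";
}
```

For a real file, replace the in-memory open with something like open my $fh, '<', $path. Note that paragraph mode only splits on truly empty lines, so a separator line that contains spaces or tabs would not end a block.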

Hope this helps!

Regarding linux - Getting elements of chunked data with Perl, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20649054/
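
For comparison, the line-by-line approach from the question can also be made to work. In the question's code, $id, $name and $namespace are declared inside the while loop, so they are reset on every line, and the final pattern /$/ matches every line rather than only empty ones. A rough sketch of one way to address both points follows; the helper name flush_term and the shortened sample data are assumptions made for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

my %bighash;

# State is kept OUTSIDE the loop so it survives from line to line.
my $in_term = 0;                # true while inside a [Term] stanza
my ( $id, $name, $namespace );

# Hypothetical helper: store the current stanza if it was a [Term]
# with an id, then reset the state for the next stanza.
sub flush_term {
    if ( $in_term && defined $id ) {
        $bighash{$id} = [ $name, $namespace ];
    }
    ( $in_term, $id, $name, $namespace ) = (0);
}

# Shortened sample input read through an in-memory filehandle.
my $sample = <<'END';
some info

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process

[Typedef]
id: regulates
name: regulates
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if    ( $line =~ /^\[Term\]/ )          { flush_term(); $in_term = 1 }
    elsif ( $line =~ /^\[/ )                { flush_term() }    # other stanza, e.g. [Typedef]
    elsif ( $line =~ /^\s*$/ )              { flush_term() }    # blank line ends a stanza
    elsif ( $line =~ /^id:\s+(.*)/ )        { $id = $1 }
    elsif ( $line =~ /^name:\s+(.*)/ )      { $name = $1 }
    elsif ( $line =~ /^namespace:\s+(.*)/ ) { $namespace = $1 }
}
close $fh;
flush_term();    # the last stanza may not be followed by a blank line

print Dumper \%bighash;
```

This version is longer than the paragraph-mode one, but it also tolerates separator lines that contain only spaces or tabs, which paragraph mode would not treat as blank.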
