
perl - Using Perl to parse data from one tab-delimited file into another

Reposted. Author: 行者123. Updated: 2023-12-04 08:56:27

I have a tab-delimited file like this (DIVERGE in my script):

contig04730 contigK02622 0.3515
contig04733 contigK02622 0.3636
contig14757 contigK03055 0.4

I have a second tab-delimited file that looks like this (DATA):

contig04730 F GO:0000228 nuclear GO:0000783 telomere_cap
contig04730 F GO:0005528 reproduction GO:0001113 eggs
contig14757 P GO:0123456 immune GO:0003456 cells
contig14757 P GO:0000782 nuclear GO:0001891 DNA_binding
contig14757 C GO:0000001 immune GO:00066669 more_cells

I am trying to add columns 2 and 3 of the first file to the second file, so that I get (OUT):

contig04730 F GO:0000228 nuclear GO:0000783 telomere_cap contigK02622 0.3515
contig04730 F GO:0005528 reproduction GO:0001113 eggs contigK02622 0.3515
contig14757 P GO:0123456 immune GO:0003456 cells contigK03055 0.4
contig14757 P GO:0000782 nuclear GO:0001891 DNA_binding contigK03055 0.4
contig14757 C GO:0000001 immune GO:00066669 more_cells contigK03055 0.4

This is the Perl script I am trying to use (adapted from a script I found here; I am very new to Perl):

#!/usr/bin/env/perl

use strict;
use warnings;

#open the ortholog contig list
open (DIVERGE, "$ARGV[0]") or die "Error opening the input file with contig pairs";

#hash to store contig IDs
my ($espr, $liya, $divergence) = split("\t", $_);

#read through the ortho contig list and read into memory
while(<DIVERGE>){
chomp $_; #get rid of ending whitepace
($espr, $liya, $divergence)->{$_} = 1;
}
close(DIVERGE);

#open output file
open(OUT, ">$ARGV[2]") or die "Error opening the output file";

#open data file
open(DATA, "$ARGV[1]") or die "Error opening the sequence pairs file\n";

while(<DATA>){
chomp $_;

my ($contigs, $FPC, $GOslim, $slimdesc, $GOterm, $GOdesc) = split("\t", $_);
if (defined $espr->{$contigs}) {
print OUT "$_", "\t$liya\t$divergence", "\n";
}
}
close(DATA);
close(OUT);

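But I get errors about a useless use of a private variable at line 15 and about splitting an uninitialized value $_ at line 10. I have only a very basic understanding of Perl terminology and variables, so if anyone could point out where I am going wrong and how to fix it, that would be much appreciated.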

Best answer

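This is an opportunity to use the Text::CSV module. The benefit of using a proper parser for CSV data is, of course, that edge cases are less likely to corrupt your data.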
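As a small illustration of the kind of edge case meant here, the sketch below uses core Perl only. The input line is made up for the example and does not come from the question's files. It shows that a plain split on tabs miscounts fields when a quoted field contains a tab of its own, which is the situation a CSV parser is designed to handle.

```perl
use strict;
use warnings;

# Hypothetical input line: three tab-separated fields, the second one quoted
# and containing a tab of its own.
my $line = qq{contig04730\t"nuclear\tmembrane"\t0.3515};

# A plain split treats every tab as a separator, including the quoted one.
my @fields = split /\t/, $line;

printf "split produced %d fields\n", scalar @fields;
print "  [$_]\n" for @fields;
```

With this input, split yields four pieces instead of three, because the quoted field is cut in two. The sample data in the question contains no quoted fields, so this matters only if real input is messier than the excerpt shown.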

use strict;
use warnings;
use Text::CSV;

my $div  = "diverge.txt";   # file names could also come from the command line, e.g.
my $data = "data.txt";      # my ($div, $data) = @ARGV;
my $csv  = Text::CSV->new({
    binary   => 1,
    eol      => $/,
    sep_char => "\t",
});
my %div;

open my $fh, "<", $div or die $!;

while (my $row = $csv->getline($fh)) {
    my $key = shift @$row;           # first col is key
    $div{$key} = $row;               # store remaining row entries
}
close $fh;

open $fh, "<", $data or die $!;

while (my $row = $csv->getline($fh)) {
    my $key = $row->[0];             # first col is key (again)
    push @$row, @{ $div{$key} };     # add stored values to $row
    $csv->print(*STDOUT, $row);      # print using Text::CSV's method
}

Output:

contig04730	F	GO:0000228	nuclear	GO:0000783	telomere_cap	contigK02622	0.3515
contig04730	F	GO:0005528	reproduction	GO:0001113	eggs	contigK02622	0.3515
contig14757	P	GO:0123456	immune	GO:0003456	cells	contigK03055	0.4
contig14757	P	GO:0000782	nuclear	GO:0001891	DNA_binding	contigK03055	0.4
contig14757	C	GO:0000001	immune	GO:00066669	more_cells	contigK03055	0.4

Note that the output looks different because it is tab-delimited, whereas in the question it is space-delimited.
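For comparison, here is a minimal core-Perl sketch of the same lookup-and-append idea, shaped like the question's script with three command-line arguments. It is not the answer author's method. The subroutine names load_pairs and annotate are invented for this example, and it assumes plain tab-separated input with no quoted fields. It avoids the two problems reported in the question: split runs only inside the read loop, where a line is available, and the first file's columns are stored in a real hash keyed on the contig ID.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a lookup table from the pairs file:
# key = column 1, value = array ref holding the remaining columns.
sub load_pairs {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;            # skip empty lines
        my ($key, @rest) = split /\t/, $line;
        $pairs{$key} = \@rest;
    }
    close $fh;
    return \%pairs;
}

# Write each data row whose first column is a known key,
# with the stored columns appended; other rows are skipped.
sub annotate {
    my ($pairs, $data_path, $out_fh) = @_;
    open my $fh, '<', $data_path or die "Cannot open $data_path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key) = split /\t/, $line;
        next unless defined $key && exists $pairs->{$key};
        print {$out_fh} join("\t", $line, @{ $pairs->{$key} }), "\n";
    }
    close $fh;
}

# Run only when three file names are given: pairs file, data file, output file.
if (@ARGV == 3) {
    my ($pairs_path, $data_path, $out_path) = @ARGV;
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    annotate(load_pairs($pairs_path), $data_path, $out);
    close $out;
}
```

Saved under a name of your choosing, it would be run as: perl script.pl diverge.txt data.txt out.txt. Rows whose contig ID is missing from the first file are silently skipped here, which mirrors the `if (defined ...)` check in the question's script. The Text::CSV version above instead dies on such a row under strict, because it dereferences an undefined value.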

Regarding "perl - Using Perl to parse data from one tab-delimited file into another", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15364849/
