gpt4 book ai didi

Perl 通过动态规划在序列对齐中爆炸

转载 作者:行者123 更新时间:2023-12-01 19:32:37 26 4
gpt4 key购买 nike

我正在比较大小为 5500 个碱基的引用序列和大小为 3600 个碱基的查询序列,使用动态编程(半全局对齐),事实上我对复杂性和性能了解不多,并且代码正在爆炸并给出我的错误是“内存不足”。知道它在较小的序列上正常工作,我的问题是:这种行为是正常的,或者我可能在代码中遇到另一个问题?如果正常的话有解决这个问题的任何提示吗?提前致谢。

sub semiGlobal {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = @_;
# initialization: first row to 0 ;
my @matrix;
$matrix[0][0]{score} = 0;
$matrix[0][0]{pointer} = "none";
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j]{score} = 0;
$matrix[0][$j]{pointer} = "none";
}

for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0]{score} = $GAP * $i;
$matrix[$i][0]{pointer} = "up";
}

# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;

print "seq2: ".length($seq2);
print "seq1: ".length($seq1);

for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
if ( $letter1 eq $letter2 ) {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MATCH;
}
else {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MISMATCH;
}

# calculate gap scores
$up_score = $matrix[ $i - 1 ][$j]{score} + $GAP;
$left_score = $matrix[$i][ $j - 1 ]{score} + $GAP;

# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j]{score} = $diagonal_score;
$matrix[$i][$j]{pointer} = "diagonal";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j]{score} = $up_score;
$matrix[$i][$j]{pointer} = "up";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}

# set maximum score
if ( $matrix[$i][$j]{score} > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = $matrix[$i][$j]{score};
}
}
}

my $align1 = "";
my $align2 = "";
my $j = $max_j;
my $i = $max_i;

while (1) {
if ( $matrix[$i][$j]{pointer} eq "none" ) {
$stseq1 = $j;
last;
}

if ( $matrix[$i][$j]{pointer} eq "diagonal" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "left" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "up" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}

$align1 = reverse $align1;
$align2 = reverse $align2;
return ( $align1, $align2, $stseq1 ,$max_j);
}

最佳答案

可能解决该问题的一种方法是将@matrix 与文件绑定(bind)。然而,这将大大减慢程序的速度。考虑一下:

sub semiGlobal {

use Tie::Array::CSV;

tie my @matrix, 'Tie::Array::CSV', 'temp.txt'; # Don't forget to add your own error handler.


my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = @_;

# initialization: first row to 0 ;

$matrix[0][0] = '0 n';
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j] = '0 n';
}

for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {

my $score = $GAP * $i;
$matrix[$i][0] = join ' ',$score,'u';
}

#print Dumper(\@matrix);

# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;

print "seq2: ".length($seq2)."\n";
print "seq1: ".length($seq1)."\n";

for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );

# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
my $score = (split / /, $matrix[ $i - 1 ][ $j - 1 ])[0];
if ( $letter1 eq $letter2 ) {
$diagonal_score = $score + $MATCH;
}
else {
$diagonal_score = $score + $MISMATCH;
}

# calculate gap scores
$up_score = (split / /,$matrix[ $i - 1 ][$j])[0] + $GAP;
$left_score = (split / /,$matrix[$i][ $j - 1 ])[0] + $GAP;

# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j] = join ' ',$diagonal_score,'d';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j] = join ' ', $up_score, 'u';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}

# set maximum score
if ( (split / /, $matrix[$i][$j])[0] > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = (split / /, $matrix[$i][$j])[0];

}
}
}


my $align1 = "";
my $align2 = "";
my $stseq1;

my $j = $max_j;
my $i = $max_i;

while (1) {
my $pointer = (split / /, $matrix[$i][$j])[1];
if ( $pointer eq "n" ) {
$stseq1 = $j;
last;
}

if ( $pointer eq "d" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $pointer eq "l" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $pointer eq "u" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}

$align1 = reverse $align1;
$align2 = reverse $align2;

untie @matrix; # Don't forget to add your own error handler.

unlink 'temp.txt'; # Don't forget to add your own error handler.

return ( $align1, $align2, $stseq1 ,$max_j);
}

您仍然可以使用原来的子系统来处理短序列,并切换到此子系统来处理长序列。

关于Perl 通过动态规划在序列对齐中爆炸,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19502723/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com