python - Traceback in a dynamic-programming implementation of the Needleman-Wunsch algorithm

Reposted by: 太空狗. Updated: 2023-10-29 17:33:52

My Needleman-Wunsch implementation almost works, but I am confused about how to handle the traceback in one specific case.

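The idea is that, to reconstruct the sequence (the longest path), we recompute the scores to determine which matrix each score came from. The edge case I am having trouble with is when the bottom-right score is not in the match matrix but in the insert-column matrix (meaning that the traced-back sequence should contain an insertion).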

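The sequences are recorded in a2m format, where insertions in the sequence are written as lowercase characters. So in the final output, the alignment of ZZ to AAAC should be AAac. When I trace back by hand, I end up with AAAc, because I only visit the Ic matrix once. Here is a picture of my whiteboard. As you can see, I have three black arrows and one green arrow, which is why my traceback gives me AAAc. Should I count the first cell and then stop at position 1,1? I am not sure how I would change my implementation to do that.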
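
One quick way to see that AAAc cannot be right is to count columns. In the a2m convention used here, every uppercase letter or '-' consumes one character of the row sequence, while lowercase letters are insertions. The following is a minimal sketch of that check; the helper names `match_columns` and `is_consistent` are hypothetical and not part of the original code.

```python
def match_columns(a2m):
    """Count the characters of an a2m string that consume a row-sequence position.

    Uppercase letters (matches) and '-' (deletions) each use one row character;
    lowercase letters are insertions and use none.
    """
    return sum(1 for ch in a2m if ch.isupper() or ch == '-')


def is_consistent(row_seq, a2m):
    """Return True if the a2m string accounts for every row character exactly once."""
    return match_columns(a2m) == len(row_seq)


print(is_consistent('ZZ', 'AAac'))  # expected alignment: two match columns
print(is_consistent('ZZ', 'AAAc'))  # hand traceback: three match columns for a 2-letter row
```

A check like this only catches column-count errors; an alignment such as AaAc passes it while still being suboptimal.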

Note that the substitution matrix used here is BLOSUM62. The recurrence relations are

M(i,j) = max(Ic(i-1,j-1)+subst, M(i-1,j-1)+subst, Ir(i-1,j-1)+subst)
Ic(i,j) = max(Ic(i,j-1)-extend, M(i,j-1)-open, Ir(i,j-1)-double)
Ir(i,j) = max(Ic(i-1,j)-double, M(i-1,j)-open, Ir(i-1,j)-extend)
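
To make the recurrence concrete, here is a small sketch that updates a single cell and also records which predecessor matrix produced each maximum. The function name `update_cell`, the (M, Ic, Ir) tuple layout, and the substitution score of -1 are assumptions made for illustration; the gap penalties are the defaults from the class below.

```python
NEG_INF = float('-inf')
STATES = ('M', 'Ic', 'Ir')


def update_cell(diag, left, up, s, gap_open=12, extend=1, double=3):
    """Apply the three recurrences to one cell.

    diag, left and up are (M, Ic, Ir) score triples of the cells at
    (i-1, j-1), (i, j-1) and (i-1, j); s is the substitution score.
    Returns {state: (score, predecessor_state)}.
    """
    candidates = {
        'M': [diag[0] + s, diag[1] + s, diag[2] + s],
        'Ic': [left[0] - gap_open, left[1] - extend, left[2] - double],
        'Ir': [up[0] - gap_open, up[1] - double, up[2] - extend],
    }
    return {state: (max(vals), STATES[vals.index(max(vals))])
            for state, vals in candidates.items()}


# Cell (1,1): the neighbours are the initialised border cells.
cell = update_cell(diag=(0, NEG_INF, NEG_INF),    # (0,0)
                   left=(NEG_INF, NEG_INF, -12),  # (1,0): only Ir is finite
                   up=(NEG_INF, -12, NEG_INF),    # (0,1): only Ic is finite
                   s=-1)                          # assumed substitution score
print(cell)
```

The predecessor recorded for each of the three matrices is what a traceback needs: the next matrix to visit depends on the matrix you are currently in, not on the overall maximum of the neighbouring cell.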

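Edit: here is the traceback_col_seq function rewritten to be cleaner. Note that score_cell now returns thisM, thisC, thisR instead of the maximum of each. This version produces the alignment AaAc, so it still has the same problem, plus a new one: why does it enter Ic again at 1,2? The code is at least clearer, though.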

def traceback_col_seq(self):
    i, j = self.maxI-1, self.maxJ-1
    self.traceback = list()
    matrixDict = {0:'M',1:'Ic',2:'Ir',3:'M',4:'Ic',5:'Ir',6:'M',7:'Ic',8:'Ir'}
    while i > 0 or j > 0:
        chars = self.col_seq[j-1] + self.row_seq[i-1]
        thisM, thisC, thisR = self.score_cell(i, j, chars)
        cell = thisM + thisC + thisR
        prevMatrix = matrixDict[cell.index(max(cell))]
        print(cell, prevMatrix,i,j)
        if prevMatrix == 'M':
            i -= 1; j -= 1
            self.traceback.append(self.col_seq[j])
        elif prevMatrix == 'Ic':
            j -= 1
            self.traceback.append(self.col_seq[j].lower())
        elif prevMatrix == 'Ir':
            i -= 1
            self.traceback.append('-')
    return ''.join(self.traceback[::-1])

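Here is the Python class that builds the dynamic-programming matrices and traces back the alignment. It also has a scoring function used to check whether the resulting alignment is correct.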

from numpy import zeros

class global_aligner():
    def __init__(self, subst, open=12, extend=1, double=3):
        self.extend, self.open, self.double, self.subst = extend, open, double, subst
    def __call__(self, row_seq, col_seq):
        #add alphabet error checking?
        self.score_align(row_seq, col_seq)
        return self.traceback_col_seq()
    def init_array(self):
        """initialize three numpy arrays, set values in 1st column and row"""
        self.M = zeros((self.maxI, self.maxJ), float)
        self.Ic = zeros((self.maxI, self.maxJ), float)
        self.Ir = zeros((self.maxI, self.maxJ), float)
        for i in range(1,self.maxI):
            self.M[i][0], self.Ic[i][0], self.Ir[i][0] = \
                    -float('inf'), -float('inf'), -(self.open+self.extend*(i-1))
        for j in range(1,self.maxJ):
            self.M[0][j], self.Ir[0][j], self.Ic[0][j] = \
                    -float('inf'), -float('inf'), -(self.open+self.extend*(j-1))
        self.Ic[0][0] = self.Ir[0][0] = -float('inf')
    def score_cell(self, i, j, chars):
        """score a matrix cell based on the 9 total neighbors (3 each direction)"""
        thisM = [self.M[i-1][j-1]+self.subst[chars], self.Ic[i-1][j-1]+ \
                self.subst[chars], self.Ir[i-1][j-1]+self.subst[chars]]
        thisC = [self.M[i][j-1]-self.open, self.Ic[i][j-1]-self.extend, \
                        self.Ir[i][j-1]-self.double]
        thisR = [self.M[i-1][j]-self.open, self.Ic[i-1][j]-self.double, \
                self.Ir[i-1][j]-self.extend]
        return max(thisM), max(thisC), max(thisR)
    def score_align(self, row_seq, col_seq):
        """build dynamic programming matrices to align two sequences"""
        self.row_seq, self.col_seq = list(row_seq), list(col_seq)
        self.maxI, self.maxJ = len(self.row_seq)+1, len(self.col_seq)+1
        self.init_array() #initialize arrays
        for i in range(1, self.maxI):
            row_char = self.row_seq[i-1]
            for j in range(1, self.maxJ):
                chars = row_char+self.col_seq[j-1]
                self.M[i][j], self.Ic[i][j], self.Ir[i][j] = self.score_cell(i, j, chars)
    def traceback_col_seq(self):
        """trace back column sequence in matrices in a2m format"""
        i, j = self.maxI-1, self.maxJ-1
        self.traceback = list()
        #find which matrix to start in
        #THIS IS WHERE THE PROBLEM LIES I THINK
        cell = (self.M[i][j], self.Ic[i][j], self.Ir[i][j])
        prevMatrix = cell.index(max(cell))
        while i > 1 and j > 1:
            if prevMatrix == 0: #M matrix
                i -= 1; j -= 1 #step up diagonally
                prevChars = self.row_seq[i-1]+self.col_seq[j-1]
                diag = self.score_cell(i, j, prevChars) #re-score diagonal cell
                prevMatrix = diag.index(max(diag)) #determine which matrix that was
                self.traceback.append(self.col_seq[j])
            elif prevMatrix == 1: #Ic matrix
                j -= 1 
                prevChars = self.row_seq[i-1]+self.col_seq[j-1]
                left = self.score_cell(i, j, prevChars)
                prevMatrix = left.index(max(left))
                self.traceback.append(self.col_seq[j].lower())
            elif prevMatrix == 2: #Ir matrix
                i -= 1
                prevChars = self.row_seq[i-1]+self.col_seq[j-1]
                up = self.score_cell(i, j, prevChars)
                prevMatrix = up.index(max(up))
                self.traceback.append('-')
        for j in range(j,0,-1): #hit top of matrix before ending, add chars
            self.traceback.append(self.col_seq[j-1])
        for i in range(i,0,-1): #hit left of matrix before ending, add gaps
            self.traceback.append('-')
        return ''.join(self.traceback[::-1])
    def score_a2m(self, s1, s2):
        """scores an a2m alignment of two sequences. I believe this function correctly
        scores alignments and is used to test my alignments. The value produced by this
        function should be the same as the largest value in the bottom right of the three
        matrices"""
        s1, s2 = list(s1.strip('.')), list(s2.strip('.'))
        s1_pos, s2_pos = len(s1)-1, len(s2)-1
        score, gap = 0, False
        while s1_pos >= 0 and s2_pos >= 0:
            if s1[s1_pos].islower() and gap is False:
                score -= self.open; s1_pos -= 1; gap = True
            elif s1[s1_pos].islower() and gap is True:
                score -= self.extend; s1_pos -= 1
            elif s2[s2_pos].islower() and gap is False:
                score -= self.open; s2_pos -= 1; gap = True
            elif s2[s2_pos].islower() and gap is True:
                score -= self.extend; s2_pos -= 1
            elif s1[s1_pos] == '-' and gap is False:
                score -= self.open; s1_pos -= 1; s2_pos -= 1; gap = True
            elif s1[s1_pos] == '-' and gap is True:
                score -= self.extend; s1_pos -= 1; s2_pos -= 1
            elif s2[s2_pos] == '-' and gap is False:
                score -= self.open; s1_pos -= 1; s2_pos -= 1; gap = True
            elif s2[s2_pos] == '-' and gap is True:
                score -= self.extend; s1_pos -= 1; s2_pos -= 1
            elif gap is True:
                score += self.subst[s1[s1_pos].upper() + s2[s2_pos].upper()]
                s1_pos -= 1; s2_pos -= 1; gap = False
            else:
                score += self.subst[s1[s1_pos].upper() + s2[s2_pos].upper()]
                s1_pos -= 1; s2_pos -= 1
        if s1_pos >= 0 and gap is True:
            score -= self.extend*s1_pos
        elif s1_pos >= 0 and gap is False:
            score -= self.open+s1_pos*self.extend
        if s2_pos >= 0 and gap is True:
            score -= self.extend*s2_pos
        elif s2_pos >= 0 and gap is False:
            score -= self.open+s2_pos*self.extend
        return score


test = global_aligner(blosumMatrix)
s1,s2 = 'ZZ','AAAC'
test.score_align(s1, s2)
align = test.traceback_col_seq()
print('This score: ', test.score_a2m(s1,align))
print('Correct score: ', test.score_a2m(s1,'AAac'))
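
As a cross-check on the scoring function above, an a2m string can also be scored by walking it left to right and tracking which of the three states (M, Ic, Ir) each character belongs to, charging open, extend or double according to the recurrence. This is only a sketch: `score_a2m_forward` is a hypothetical name, and the two-entry substitution table is an assumed stand-in for the full BLOSUM62 dictionary.

```python
def score_a2m_forward(row_seq, a2m, subst, gap_open=12, extend=1, double=3):
    """Score an a2m column string against row_seq using the three-state model."""
    score, state, i = 0, 'M', 0
    for ch in a2m:
        # lowercase = insertion (Ic), '-' = deletion (Ir), uppercase = match (M)
        new_state = 'Ic' if ch.islower() else 'Ir' if ch == '-' else 'M'
        if new_state != 'Ic' and i >= len(row_seq):
            raise ValueError('a2m string has more match columns than row_seq')
        if new_state == 'M':
            score += subst[row_seq[i] + ch]
        elif state == new_state:
            score -= extend        # continuing the same kind of gap
        elif state == 'M':
            score -= gap_open      # opening a gap after a match (or at the start)
        else:
            score -= double        # switching directly between Ic and Ir
        if new_state != 'Ic':
            i += 1                 # matches and deletions consume a row character
        state = new_state
    if i != len(row_seq):
        raise ValueError('a2m string does not cover all of row_seq')
    return score


toy_subst = {'ZA': -1, 'ZC': -3}   # assumed scores, standing in for BLOSUM62
print(score_a2m_forward('ZZ', 'AAac', toy_subst))
print(score_a2m_forward('ZZ', 'AaAc', toy_subst))
```

With these assumed scores, AAac pays for one gap of length two, while AaAc pays the open penalty twice, which is why the latter scores much lower.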

BLOSUM parser

def parse_blosum(blosumFile):
    blosumMatrix, commentFlag = dict(), False
    for line in blosumFile:
        if not line.startswith('#') and not commentFlag:
            alphabet = line.rstrip().split()
            commentFlag = True
        elif commentFlag:
            line = line.rstrip().split()
            thisChar, line = line[0], line[1:]
            for x in range(len(line)):
                alphaChar, thisValue = alphabet[x], line[x]
                blosumMatrix[thisChar+alphaChar] = int(thisValue)
    return blosumMatrix
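
For reference, here is a compact variant of the same parsing idea together with a usage example on an in-memory file. It is a sketch, not the asker's code: `parse_matrix` is a hypothetical name, it skips every '#' line and blank line, and the three-letter sample is a hand-written excerpt rather than a real BLOSUM62 file.

```python
import io


def parse_matrix(handle):
    """Parse an NCBI-style substitution matrix into a dict keyed by letter pairs."""
    matrix, alphabet = {}, None
    for line in handle:
        if line.startswith('#') or not line.strip():
            continue                      # skip comments and blank lines
        fields = line.split()
        if alphabet is None:
            alphabet = fields             # first data line is the column header
            continue
        row_char, values = fields[0], fields[1:]
        for col_char, value in zip(alphabet, values):
            matrix[row_char + col_char] = int(value)
    return matrix


SAMPLE = """# hand-written three-letter excerpt for illustration
   A  C  Z
A  4  0 -1
C  0  9 -3
Z -1 -3  4
"""

toy = parse_matrix(io.StringIO(SAMPLE))
print(toy['ZA'], toy['ZC'])
```

The resulting dictionary is keyed by two-character strings, which is the shape the `subst[chars]` lookups in the aligner class expect.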

Best answer

def traceback_col_seq(self):
    """
    Traces back the column sequence to determine the final global alignment.
    Re-scores each cell with score_cell, which must return the three candidate
    lists (thisM, thisC, thisR) rather than their maxima.
    """
    i, j = self.maxI-1, self.maxJ-1
    self.traceback = list() #stores final sequence
    matrixDict = {0:'M',1:'Ic',2:'Ir'} #mapping between numeric value and matrix
    chars = self.col_seq[j-1] + self.row_seq[i-1] #store first characters
    thisM, thisC, thisR = self.score_cell(i,j, chars) 
    cell = max(thisM), max(thisC), max(thisR) #determine where to start
    prevMatrix = matrixDict[cell.index(max(cell))] #store where to go first
    while i > 0 or j > 0:
        #loop until the top left corner of the matrix is reached
        if prevMatrix == 'M':
            self.traceback.append(self.col_seq[j-1])
            i -= 1; j -= 1
            prevMatrix = matrixDict[thisM.index(max(thisM))]
        elif prevMatrix == 'Ic':
            self.traceback.append(self.col_seq[j-1].lower())
            j -= 1
            prevMatrix = matrixDict[thisC.index(max(thisC))]
        elif prevMatrix == 'Ir':
            self.traceback.append('-')
            i -= 1
            prevMatrix = matrixDict[thisR.index(max(thisR))]
        chars = self.col_seq[j-1] + self.row_seq[i-1] #store next characters
        thisM, thisC, thisR = self.score_cell(i,j, chars) #rescore next cell
    return ''.join(self.traceback[::-1])
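
An alternative to re-scoring cells during the traceback is to store, while filling the matrices, which predecessor matrix produced each maximum, and then simply follow those pointers. The sketch below is a standalone rewrite under that assumption, not the answer's method: `align_a2m` is a hypothetical name, plain lists replace the numpy arrays, and the two-entry substitution table stands in for BLOSUM62.

```python
NEG_INF = float('-inf')
STATES = ('M', 'Ic', 'Ir')


def align_a2m(row_seq, col_seq, subst, gap_open=12, extend=1, double=3):
    """Globally align col_seq to row_seq; return (a2m string, score)."""
    n, m = len(row_seq), len(col_seq)
    score = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    score['M'][0][0] = 0
    for i in range(1, n + 1):          # first column: only deletions possible
        score['Ir'][i][0] = -(gap_open + extend * (i - 1))
        back['Ir'][i][0] = 'Ir' if i > 1 else 'M'
    for j in range(1, m + 1):          # first row: only insertions possible
        score['Ic'][0][j] = -(gap_open + extend * (j - 1))
        back['Ic'][0][j] = 'Ic' if j > 1 else 'M'

    def best(cands):
        """Pick the (score, predecessor) pair with the highest score."""
        return max(cands, key=lambda c: c[0])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = subst[row_seq[i - 1] + col_seq[j - 1]]
            score['M'][i][j], back['M'][i][j] = best(
                [(score[k][i - 1][j - 1] + s, k) for k in STATES])
            score['Ic'][i][j], back['Ic'][i][j] = best([
                (score['M'][i][j - 1] - gap_open, 'M'),
                (score['Ic'][i][j - 1] - extend, 'Ic'),
                (score['Ir'][i][j - 1] - double, 'Ir')])
            score['Ir'][i][j], back['Ir'][i][j] = best([
                (score['M'][i - 1][j] - gap_open, 'M'),
                (score['Ic'][i - 1][j] - double, 'Ic'),
                (score['Ir'][i - 1][j] - extend, 'Ir')])

    # Start in whichever matrix holds the best bottom-right score.
    i, j = n, m
    state = max(STATES, key=lambda k: score[k][i][j])
    total, out = score[state][i][j], []
    while i > 0 or j > 0:
        prev = back[state][i][j]       # pointer stored for the CURRENT matrix
        if state == 'M':
            out.append(col_seq[j - 1].upper()); i -= 1; j -= 1
        elif state == 'Ic':
            out.append(col_seq[j - 1].lower()); j -= 1
        else:
            out.append('-'); i -= 1
        state = prev
    return ''.join(reversed(out)), total


alignment, total = align_a2m('ZZ', 'AAAC', {'ZA': -1, 'ZC': -3})
print(alignment, total)  # AAac -15
```

Storing pointers costs three extra arrays, but it removes the need to call the scoring routine again during traceback and makes the rule explicit: after emitting a character for the current matrix, the next matrix is the one recorded for that matrix at that cell.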

For python - Traceback in a dynamic-programming implementation of the Needleman-Wunsch algorithm, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20229817/
