java - 获得超过 100% 的 Smith-Waterman 计算-6ren

java - 获得超过 100% 的 Smith-Waterman 计算

转载作者：塔克拉玛干更新时间：2023-11-03 04:00:27

我编写了一些实现 Smith-Waterman 算法的 Java 代码。我有一个日记条目 here是这样说的

sim  (X, Y)  = 2 . SLength  (X, Y) / Length  ( X)  +  Length  (Y  ) 


X  and  Y are  the  sequences  for  comparison.  
SLength(X, Y) is  the  string length of the  maximum matching set.  
Length(X) is the  number  of  characters  in  the  sequence X.
Length(Y)  is  the     number  of  characters  in  the  sequence  Y.  
The  result  of  this function (sim)  is a real  number,  O<=sim<=1.  
The larger  SIM is, the  stronger  the  two  programs  similarity  is,
and  the  plagiarism possibility larger,  vice versa.

这是我的 Smith-Waterman 代码

public class SmithWaterman {

private final String one, two;
private final int matrix[][];
private int gap;
private final int match;
private final int o;
private int l;
private final int e;

public SmithWaterman(String one, String two) {
    this.one = "-" + one.toLowerCase();
    this.two = "-" + two.toLowerCase();
    this.match = 2;

    // Define affine gap starting values
    o = -2;
    l = -1;
    e = -1;

    // initialize matrix to 0
    matrix = new int[one.length() + 1][two.length() + 1];
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            matrix[i][j] = 0;
        }
    }

}

// returns the alignment score
/**
 * @return
 */
public double computeSmithWaterman() {
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            gap = o + (l - 1) * e;
            if (i != 0 && j != 0) {
                if (one.charAt(i) == two.charAt(j)) {
                    // match
                    // reset l
                    l = 0;
                    matrix[i][j] = Math.max(0, Math.max(
                            matrix[i - 1][j - 1] + match, Math.max(
                                    matrix[i - 1][j] + gap,
                                    matrix[i][j - 1] + gap)));
                } else {
                    // gap
                    l++;
                    matrix[i][j] = Math.max(0, Math.max(
                            matrix[i - 1][j - 1] + gap, Math.max(
                                    matrix[i - 1][j] + gap,
                                    matrix[i][j - 1] + gap)));
                }
            }
        }
    }

    // find the highest value
    double longest = 0;
    int iL = 0, jL = 0;
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            if (matrix[i][j] > longest) {
                longest = matrix[i][j];
                iL = i;
                jL = j;
            }
        }
    }

    // Backtrack to reconstruct the path
    int i = iL;
    int j = jL;
    Stack<String> actions = new Stack<String>();

    while (i != 0 && j != 0) {
        // diag case
        if (Math.max(matrix[i - 1][j - 1],
                Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i - 1][j - 1]) {
            actions.push("align");
            i = i - 1;
            j = j - 1;
            // left case
        } else if (Math.max(matrix[i - 1][j - 1],
                Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i][j - 1]) {
            actions.push("insert");
            j = j - 1;
            // up case
        } else {
            actions.push("delete");
            i = i - 1;
        }
    }

    int maxMatchSet = actions.size();

    String alignOne = new String();
    String alignTwo = new String();

    Stack<String> backActions = (Stack<String>) actions.clone();
    for (int z = 0; z < one.length(); z++) {
        alignOne = alignOne + one.charAt(z);
        if (!actions.empty()) {
            String curAction = actions.pop();

            if (curAction.equals("insert")) {
                alignOne = alignOne + "-";
                while (actions.peek().equals("insert")) {
                    alignOne = alignOne + "-";
                    actions.pop();
                }
            }
        }
    }

    for (int z = 0; z < two.length(); z++) {
        alignTwo = alignTwo + two.charAt(z);
        if (!backActions.empty()) {
            String curAction = backActions.pop();
            if (curAction.equals("delete")) {
                alignTwo = alignTwo + "-";
                while (backActions.peek().equals("delete")) {
                    alignTwo = alignTwo + "-";
                    backActions.pop();
                }
            }
        }
    }
    int minMatchSet = backActions.size();

    // print alignment
    double realLengthStringOne = one.length() - 1;
    double realLenghtStringTwo = two.length() - 1;
    double totalOfMatricesElement = realLengthStringOne + realLenghtStringTwo;

    double value = (2 * maxMatchSet / totalOfMatricesElement) * 100;

    System.out.println("2 * " + maxMatchSet + " / " + "( " + realLengthStringOne + " + " + realLenghtStringTwo + " ) " + "= " + value + "%");


    return value;
}

public void printMatrix() {

    for (int i = 0; i < one.length(); i++) {
        if (i == 0) {
            for (int z = 0; z < two.length(); z++) {
                if (z == 0) {
                    System.out.print("  \t");
                }
                System.out.print(two.charAt(z) + " \t");

                if (z == two.length() - 1) {
                    System.out.println();
                }
            }
        }

        for (int j = 0; j < two.length(); j++) {
            if (j == 0) {
                System.out.print(one.charAt(i) + " \t");
            }
            System.out.print(matrix[i][j] + " \t");
        }
        System.out.println();
    }
    System.out.println();
}

public static void main(String[] args) {
    // DNA sequence Test:

    SmithWaterman sw = new SmithWaterman("ahmad", "achmad");
    System.out.println("Alignment Score: " + sw.computeSmithWaterman());

    sw.printMatrix();

}
}

如果我有两个序列，比如 "ahmad", "ahmad"，输出是 = 100 %,

但是你知道，如果我有两个序列，如“ahmad”、“achmad”，输出是这样的:

2 * 6 / ( 5.0 + 6.0 ) = 109.09090909090908%

Alignment Score: 109.09090909090908
  -     a   c   h   m   a   d   
-   0   0   0   0   0   0   0   
a   0   2   1   0   0   2   1   
h   0   0   0   3   2   0   0   
m   0   0   0   0   5   4   2   
a   0   2   1   0   2   7   6   
d   0   0   0   0   0   1   9

我是否在执行过程中遇到了问题，我在代码中的哪里丢失了？

最佳答案

回答你为什么得到 150% 的直接问题

您的变量 longest = 6 并且您将值 one 和 two 设置为 '-' + one 和 '-'+ 两个 所以数学是 2 * 6/8 = 12/8 = 1.5 * 100 = 150%.

如果您使用one和two的原始长度，您可能会得到正确答案。

但是，我认为您的方法可能存在缺陷:

您的变量 longest 不是比赛的长度，而是矩阵中的最高分。这是 Smith-Waterman 比对分数。这现在可行，因为您正在对齐完美匹配并使用 +2 的匹配分数，但我不确定这是否适用于非完美匹配。

此值表示通过矩阵的最佳评分(部分)对齐路径。虽然这条路径通常是最长的路径，但它不一定是。其他地方可能会有更长但更糟糕的得分路径。

此外，你的开局缺口 -2 和扩展 -1 的比赛惩罚意味着多个连续的缺口将使你的比赛分数不再是偶数。

要实际查看您的对齐有多长，您实际上必须从得分最高的位置开始通过矩阵回溯，直到到达矩阵中得分为 0 的单元格。 (因为这是 Smith-Waterman，它允许局部比对而不是全长全局比对)。

您已经在您的 actions 代码块中执行了类似的操作。但是，您可能需要考虑如何将插入和删除视为长度的一部分。如果你想计算它们，那么最长的对齐长度就是 actions.size()

关于java - 获得超过 100% 的 Smith-Waterman 计算，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/24702888/

文章推荐： r - 反复从矩阵s.t.中删除列和行。行和列最小值的平均值最小

文章推荐： java - 如何操作数组。求平均值。初学者Java

文章推荐： java - 如何观察变量的变化

文章推荐： algorithm - 点与高度图之间的距离

algorithm - Smith-Waterman 算法路径检测说明
我将 Smith-Waterman 算法用于一本书的家庭作业，制作了一个值表。一旦我了解了如何获取值，构建表格就很容易了，但现在我很难从表格中确定最佳比对序列。表格示例是按照公式生成的 min(
algorithm - 关于 Smith-Waterman 算法的问题
我正在使用 Smith-Waterman 算法运行一些字符串匹配测试。我目前正在使用 SimMetrics(Java 开源项目)来运行测试。谁能解释为什么我比较“Bloggs J”。到“Bloggs
python - 我如何决定在 Smith–Waterman 算法中回溯的方式？
我正在尝试使用 Smith–Waterman algorithm 在 Python 中实现局部序列比对. 这是我目前所拥有的。它就构建了 similarity matrix : import sys,
python - 实现 Waterman-Eggert 算法
我正在尝试实现 Waterman-Eggert 算法来查找次优局部序列比对，但我正在努力了解如何“解聚”单独的比对组。我的基本 Smith-Waterman 算法运行良好。将以下序列与自身对齐的简单
python - 具有仿射间隙惩罚的 Smith-Waterman 算法中的回溯
我正在尝试使用仿射间隙罚函数实现用于局部序列比对的 Smith-Waterman 算法。我想我了解如何启动和计算计算比对分数所需的矩阵，但对于如何回溯以找到比对一无所知。要生成所需的 3 个矩阵，我有
php - Smith–Waterman 用于 PHP 中的字符串？
我需要 smith-waterman 在 php 中，你知道是否已经有一个实现吗？我需要这个算法来进行邻近搜索 ( Function that returns affinity between te
algorithm - 我如何修改 Smith and Waterman perl 脚本以加快速度？
我找到了这个 perl 脚本，但是我有太多的序列需要分析。我想知道是否可以优化它？我在上面启动了 NYTProf，发现“计算匹配分数”、“计算差距分数”和“选择最佳分数”部分花费了大量时间。我必须修改
algorithm - CUDA 中 Smith-Waterman 算法的矩阵填充
我需要有关如何在 CUDA 中优化 Smith-Waterman 算法实现的建议。我要优化的部分是填充矩阵。由于矩阵元素之间的数据依赖性(每个下一个元素都依赖于其他元素 - 左到它，到它，再到它)，
c - 直接从我的应用程序执行 BLAST/Smith Waterman 搜索
我正在开发一个小型应用程序，并考虑将 BLAST 或其他本地比对搜索集成到我的应用程序中。我的搜索只调出程序，需要作为外部程序安装和调用。从头开始实现它还有什么办法吗？可能有任何预制库吗？最佳答案
perl - 如何使用替换矩阵修改 Smith-Waterman 算法以比对 Perl 中的蛋白质？
如何修改 Smith-Waterman algorithm在 Perl 中使用替换矩阵比对蛋白质？ [需要引用] 最佳答案我实际上是一名生物信息学研究人员，并且正在等待他自己的生物信息学代码运行，所
java - 获得超过 100% 的 Smith-Waterman 计算
我编写了一些实现 Smith-Waterman 算法的 Java 代码。我有一个日记条目 here是这样说的 sim (X, Y) = 2 . SLength (X, Y) / Length

塔克拉玛干

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

java - 获得超过 100% 的 Smith-Waterman 计算