gpt4 book ai didi

java - 获得超过 100% 的 Smith-Waterman 计算

转载 作者:塔克拉玛干 更新时间:2023-11-03 04:00:27 25 4
gpt4 key购买 nike

我编写了一些实现 Smith-Waterman 算法的 Java 代码。我有一个日记条目 here是这样说的

sim  (X, Y)  = 2 . SLength  (X, Y) / Length  ( X)  +  Length  (Y  ) 


X and Y are the sequences for comparison.
SLength(X, Y) is the string length of the maximum matching set.
Length(X) is the number of characters in the sequence X.
Length(Y) is the number of characters in the sequence Y.
The result of this function (sim) is a real number, O<=sim<=1.
The larger SIM is, the stronger the two programs similarity is,
and the plagiarism possibility larger, vice versa.

这是我的 Smith-Waterman 代码

public class SmithWaterman {

private final String one, two;
private final int matrix[][];
private int gap;
private final int match;
private final int o;
private int l;
private final int e;

public SmithWaterman(String one, String two) {
this.one = "-" + one.toLowerCase();
this.two = "-" + two.toLowerCase();
this.match = 2;

// Define affine gap starting values
o = -2;
l = -1;
e = -1;

// initialize matrix to 0
matrix = new int[one.length() + 1][two.length() + 1];
for (int i = 0; i < one.length(); i++) {
for (int j = 0; j < two.length(); j++) {
matrix[i][j] = 0;
}
}

}

// returns the alignment score
/**
* @return
*/
public double computeSmithWaterman() {
for (int i = 0; i < one.length(); i++) {
for (int j = 0; j < two.length(); j++) {
gap = o + (l - 1) * e;
if (i != 0 && j != 0) {
if (one.charAt(i) == two.charAt(j)) {
// match
// reset l
l = 0;
matrix[i][j] = Math.max(0, Math.max(
matrix[i - 1][j - 1] + match, Math.max(
matrix[i - 1][j] + gap,
matrix[i][j - 1] + gap)));
} else {
// gap
l++;
matrix[i][j] = Math.max(0, Math.max(
matrix[i - 1][j - 1] + gap, Math.max(
matrix[i - 1][j] + gap,
matrix[i][j - 1] + gap)));
}
}
}
}

// find the highest value
double longest = 0;
int iL = 0, jL = 0;
for (int i = 0; i < one.length(); i++) {
for (int j = 0; j < two.length(); j++) {
if (matrix[i][j] > longest) {
longest = matrix[i][j];
iL = i;
jL = j;
}
}
}

// Backtrack to reconstruct the path
int i = iL;
int j = jL;
Stack<String> actions = new Stack<String>();

while (i != 0 && j != 0) {
// diag case
if (Math.max(matrix[i - 1][j - 1],
Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i - 1][j - 1]) {
actions.push("align");
i = i - 1;
j = j - 1;
// left case
} else if (Math.max(matrix[i - 1][j - 1],
Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i][j - 1]) {
actions.push("insert");
j = j - 1;
// up case
} else {
actions.push("delete");
i = i - 1;
}
}

int maxMatchSet = actions.size();

String alignOne = new String();
String alignTwo = new String();

Stack<String> backActions = (Stack<String>) actions.clone();
for (int z = 0; z < one.length(); z++) {
alignOne = alignOne + one.charAt(z);
if (!actions.empty()) {
String curAction = actions.pop();

if (curAction.equals("insert")) {
alignOne = alignOne + "-";
while (actions.peek().equals("insert")) {
alignOne = alignOne + "-";
actions.pop();
}
}
}
}

for (int z = 0; z < two.length(); z++) {
alignTwo = alignTwo + two.charAt(z);
if (!backActions.empty()) {
String curAction = backActions.pop();
if (curAction.equals("delete")) {
alignTwo = alignTwo + "-";
while (backActions.peek().equals("delete")) {
alignTwo = alignTwo + "-";
backActions.pop();
}
}
}
}
int minMatchSet = backActions.size();

// print alignment
double realLengthStringOne = one.length() - 1;
double realLenghtStringTwo = two.length() - 1;
double totalOfMatricesElement = realLengthStringOne + realLenghtStringTwo;

double value = (2 * maxMatchSet / totalOfMatricesElement) * 100;

System.out.println("2 * " + maxMatchSet + " / " + "( " + realLengthStringOne + " + " + realLenghtStringTwo + " ) " + "= " + value + "%");


return value;
}

public void printMatrix() {

for (int i = 0; i < one.length(); i++) {
if (i == 0) {
for (int z = 0; z < two.length(); z++) {
if (z == 0) {
System.out.print(" \t");
}
System.out.print(two.charAt(z) + " \t");

if (z == two.length() - 1) {
System.out.println();
}
}
}

for (int j = 0; j < two.length(); j++) {
if (j == 0) {
System.out.print(one.charAt(i) + " \t");
}
System.out.print(matrix[i][j] + " \t");
}
System.out.println();
}
System.out.println();
}

public static void main(String[] args) {
// DNA sequence Test:

SmithWaterman sw = new SmithWaterman("ahmad", "achmad");
System.out.println("Alignment Score: " + sw.computeSmithWaterman());

sw.printMatrix();

}
}

如果我有两个序列,比如 "ahmad", "ahmad",输出是 = 100 %,

但是你知道,如果我有两个序列,如“ahmad”、“achmad”,输出是这样的:

2 * 6 / ( 5.0 + 6.0 ) = 109.09090909090908%

Alignment Score: 109.09090909090908
- a c h m a d
- 0 0 0 0 0 0 0
a 0 2 1 0 0 2 1
h 0 0 0 3 2 0 0
m 0 0 0 0 5 4 2
a 0 2 1 0 2 7 6
d 0 0 0 0 0 1 9

我是否在执行过程中遇到了问题,我在代码中的哪里丢失了?

最佳答案

回答你为什么得到 150% 的直接问题

您的变量 longest = 6 并且您将值 onetwo 设置为 '-' + one'-'+ 两个 所以数学是 2 * 6/8 = 12/8 = 1.5 * 100 = 150%.

如果您使用onetwo的原始长度,您可能会得到正确答案。

但是,我认为您的方法可能存在缺陷:

您的变量 longest 不是比赛的长度,而是矩阵中的最高分。这是 Smith-Waterman 比对分数。这现在可行,因为您正在对齐完美匹配并使用 +2 的匹配分数,但我不确定这是否适用于非完美匹配。

此值表示通过矩阵的最佳评分(部分)对齐路径。虽然这条路径通常是最长的路径,但它不一定是。其他地方可能会有更长但更糟糕的得分路径。

此外,你的开局缺口 -2 和扩展 -1 的比赛惩罚意味着多个连续的缺口将使你的比赛分数不再是偶数。

要实际查看您的对齐有多长,您实际上必须从得分最高的位置开始通过矩阵回溯,直到到达矩阵中得分为 0 的单元格。 (因为这是 Smith-Waterman,它允许局部比对而不是全长全局比对)。

您已经在您的 actions 代码块中执行了类似的操作。但是,您可能需要考虑如何将插入和删除视为长度的一部分。如果你想计算它们,那么最长的对齐长度就是 actions.size()

关于java - 获得超过 100% 的 Smith-Waterman 计算,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24702888/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com