gpt4 book ai didi

algorithm - 使用优化的 Levenshtein 算法寻找最近的邻居

转载 作者:塔克拉玛干 更新时间:2023-11-03 03:15:47 26 4
gpt4 key购买 nike

我最近posted a question关于优化计算 Levenshtein 距离的算法,回复将我引到关于 Levenshtein Distance 的维基百科文章.

文章提到,如果给定查询的可能结果的最大距离有一个界限 k,那么运行时间可以从 O(mn)O(kn)mn 是字符串的长度。我查阅了算法,但我真的不知道如何实现它。我希望在这里得到一些线索。

优化是“可能的改进”下的#4。

让我感到困惑的部分是说我们只需要计算宽度为 2k+1 的对角线条纹,以主对角线为中心(主对角线定义为坐标 (i ,i)).

如果有人可以提供一些帮助/见解,我将不胜感激。如果需要,我可以在此处发布书中算法的完整描述作为答案。

最佳答案

我已经做过很多次了。我这样做的方法是对可能发生变化的游戏树进行递归深度优先树遍历。有一个预算 k 的变化,我用它来修剪树。有了这个例程,首先我用 k=0 运行它,然后 k=1,然后 k=2,直到我被击中或我不想再高了。

char* a = /* string 1 */;
char* b = /* string 2 */;
int na = strlen(a);
int nb = strlen(b);
bool walk(int ia, int ib, int k){
/* if the budget is exhausted, prune the search */
if (k < 0) return false;
/* if at end of both strings we have a match */
if (ia == na && ib == nb) return true;
/* if the first characters match, continue walking with no reduction in budget */
if (ia < na && ib < nb && a[ia] == b[ib] && walk(ia+1, ib+1, k)) return true;
/* if the first characters don't match, assume there is a 1-character replacement */
if (ia < na && ib < nb && a[ia] != b[ib] && walk(ia+1, ib+1, k-1)) return true;
/* try assuming there is an extra character in a */
if (ia < na && walk(ia+1, ib, k-1)) return true;
/* try assuming there is an extra character in b */
if (ib < nb && walk(ia, ib+1, k-1)) return true;
/* if none of those worked, I give up */
return false;
}

添加以解释 trie 搜索:

// definition of trie-node:
struct TNode {
TNode* pa[128]; // for each possible character, pointer to subnode
};

// simple trie-walk of a node
// key is the input word, answer is the output word,
// i is the character position, and hdis is the hamming distance.
void walk(TNode* p, char key[], char answer[], int i, int hdis){
// If this is the end of a word in the trie, it is marked as
// having something non-null under the '\0' entry of the trie.
if (p->pa[0] != null){
if (key[i] == '\0') printf("answer = %s, hdis = %d\n", answer, hdis);
}
// for every actual subnode of the trie
for(char c = 1; c < 128; c++){
// if it is a real subnode
if (p->pa[c] != null){
// keep track of the answer word represented by the trie
answer[i] = c; answer[i+1] = '\0';
// and walk that subnode
// If the answer disagrees with the key, increment the hamming distance
walk(p->pa[c], key, answer, i+1, (answer[i]==key[i] ? hdis : hdis+1));
}
}
}
// Note: you have to edit this to handle short keys.
// Simplest is to just append a lot of '\0' bytes to the key.

现在,为了将其限制在预算范围内,如果 hdis 太大,则拒绝下降。

关于algorithm - 使用优化的 Levenshtein 算法寻找最近的邻居,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/3195269/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com