gpt4 book ai didi

javascript - Damerau-Levenshtein 距离实现

转载 作者:塔克拉玛干 更新时间:2023-11-03 03:43:39 25 4
gpt4 key购买 nike

我正在尝试在 JS 中创建一个 damerau-levenshtein 距离函数。

我在 WIkipedia 上找到了关于该算法的描述,但它们没有实现。它说:

To devise a proper algorithm to calculate unrestrictedDamerau–Levenshtein distance note that there always exists an optimalsequence of edit operations, where once-transposed letters are nevermodified afterwards. Thus, we need to consider only two symmetric waysof modifying a substring more than once: (1) transpose letters andinsert an arbitrary number of characters between them, or (2) delete asequence of characters and transpose letters that become adjacentafter deletion. The straightforward implementation of this idea givesan algorithm of cubic complexity: O\left (M \cdot N \cdot \max(M, N)\right ), where M and N are string lengths. Using the ideas ofLowrance and Wagner,[7] this naive algorithm can be improved to beO\left (M \cdot N \right) in the worst case. It is interesting thatthe bitap algorithm can be modified to process transposition. See theinformation retrieval section of[1] for an example of such anadaptation.

https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance

第 [1] 节指向 http://acl.ldc.upenn.edu/P/P00/P00-1037.pdf这对我来说更复杂。

如果我没有理解错的话,要基于它创建一个实现并不是那么容易。

这是我目前使用的 levenshtein 实现:

levenshtein=function (s1, s2) {
// discuss at: http://phpjs.org/functions/levenshtein/
// original by: Carlos R. L. Rodrigues (http://www.jsfromhell.com)
// bugfixed by: Onno Marsman
// revised by: Andrea Giammarchi (http://webreflection.blogspot.com)
// reimplemented by: Brett Zamir (http://brett-zamir.me)
// reimplemented by: Alexander M Beedie
// example 1: levenshtein('Kevin van Zonneveld', 'Kevin van Sommeveld');
// returns 1: 3

if (s1 == s2) {
return 0;
}

var s1_len = s1.length;
var s2_len = s2.length;
if (s1_len === 0) {
return s2_len;
}
if (s2_len === 0) {
return s1_len;
}

// BEGIN STATIC
var split = false;
try {
split = !('0')[0];
} catch (e) {
// Earlier IE may not support access by string index
split = true;
}
// END STATIC
if (split) {
s1 = s1.split('');
s2 = s2.split('');
}

var v0 = new Array(s1_len + 1);
var v1 = new Array(s1_len + 1);

var s1_idx = 0,
s2_idx = 0,
cost = 0;
for (s1_idx = 0; s1_idx < s1_len + 1; s1_idx++) {
v0[s1_idx] = s1_idx;
}
var char_s1 = '',
char_s2 = '';
for (s2_idx = 1; s2_idx <= s2_len; s2_idx++) {
v1[0] = s2_idx;
char_s2 = s2[s2_idx - 1];

for (s1_idx = 0; s1_idx < s1_len; s1_idx++) {
char_s1 = s1[s1_idx];
cost = (char_s1 == char_s2) ? 0 : 1;
var m_min = v0[s1_idx + 1] + 1;
var b = v1[s1_idx] + 1;
var c = v0[s1_idx] + cost;
if (b < m_min) {
m_min = b;
}
if (c < m_min) {
m_min = c;
}
v1[s1_idx + 1] = m_min;
}
var v_tmp = v0;
v0 = v1;
v1 = v_tmp;
}
return v0[s1_len];
}

对于构建这样的算法,您有什么想法?如果您认为它太复杂,我可以做些什么来区分“l”(L 小写)和“I”(i 大写)。

最佳答案

要点@doukremt 给出:https://gist.github.com/doukremt/9473228

在 Javascript 中给出以下内容。

您可以在权重对象中更改操作的权重。

var levenshteinWeighted= function(seq1,seq2)
{
var len1=seq1.length;
var len2=seq2.length;
var i, j;
var dist;
var ic, dc, rc;
var last, old, column;

var weighter={
insert:function(c) { return 1.; },
delete:function(c) { return 0.5; },
replace:function(c, d) { return 0.3; }
};

/* don't swap the sequences, or this is gonna be painful */
if (len1 == 0 || len2 == 0) {
dist = 0;
while (len1)
dist += weighter.delete(seq1[--len1]);
while (len2)
dist += weighter.insert(seq2[--len2]);
return dist;
}

column = []; // malloc((len2 + 1) * sizeof(double));
//if (!column) return -1;

column[0] = 0;
for (j = 1; j <= len2; ++j)
column[j] = column[j - 1] + weighter.insert(seq2[j - 1]);

for (i = 1; i <= len1; ++i) {
last = column[0]; /* m[i-1][0] */
column[0] += weighter.delete(seq1[i - 1]); /* m[i][0] */
for (j = 1; j <= len2; ++j) {
old = column[j];
if (seq1[i - 1] == seq2[j - 1]) {
column[j] = last; /* m[i-1][j-1] */
} else {
ic = column[j - 1] + weighter.insert(seq2[j - 1]); /* m[i][j-1] */
dc = column[j] + weighter.delete(seq1[i - 1]); /* m[i-1][j] */
rc = last + weighter.replace(seq1[i - 1], seq2[j - 1]); /* m[i-1][j-1] */
column[j] = ic < dc ? ic : (dc < rc ? dc : rc);
}
last = old;
}
}

dist = column[len2];
return dist;
}

关于javascript - Damerau-Levenshtein 距离实现,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22308014/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com