gpt4 book ai didi

python - Jellyfish 的 Damerau–Levenshtein 距离计算有问题吗?

转载 作者:行者123 更新时间:2023-11-28 18:45:55 26 4
gpt4 key购买 nike

我正在尝试使用 Jellyfish使用模糊字符串。我注意到 Damerau–Levenshtein distance 的一些奇怪行为算法。例如:

import jellyfish as jf
In [0]: jf.damerau_levenshtein_distance('ZX', 'XYZ')
Out[0]: 3
In [1]: jf.damerau_levenshtein_distance('BADC', 'ABCD')
Out[1]: 3

在我看来,两者都应该得 2 分。

在第一个例子中:

  1. ZXXZ(转置相邻字符)
  2. XZXYZ(插入 Y)

在第二个例子中:

  1. BACDABDC(转置相邻的BA字符)
  2. ABDCABCD(转置相邻的DC字符)

这是算法有问题,还是我误解了度量?任何指导将不胜感激。

编辑

为了让事情更奇特,我还观察到以下几点:

In [3]: jf.damerau_levenshtein_distance('jellyifhs', 'jellyfish')
Out[3]: 2
In [4]: jf.damerau_levenshtein_distance('ifhs', 'fish')
Out[4]L 3

这特别奇怪,因为两个示例中的编辑次数不仅应该是两次,而且它们是完全相同的编辑:

在第三个例子中:

  1. jellyifhsjellyfihs(转置相邻字符if)
  2. jellyfihsjellyfish(转置相邻字符hs)

在第四个例子中:

  1. ifhsfihs(转置相邻字符if)
  2. fihsfish(转置相邻字符hs)

最佳答案

来自 wikipedia

In information theory and computer science, the Damerau–Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein)[citation needed] is a "distance" (string metric) between two strings, i.e., finite sequence of symbols, given by counting the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters.

但如果你进一步阅读,

Take for example the edit distance between CA and ABC. The Damerau–Levenshtein distance LD(CA,ABC) = 2 because CA -> AC -> ABC, but the optimal string alignment distance OSA(CA,ABC) = 3 because if the operation CA -> AC is used, it is not possible to use AC -> ABC because that would require the substring to be edited more than once, which is not allowed in OSA, and therefore the shortest sequence of operations is CA -> A -> AB -> ABC. Note that for the optimal string alignment distance, the triangle inequality does not hold: OSA(CA,AC) + OSA(AC,ABC) < OSA(CA,ABC), and so it is not a true metric.

编辑:

查看 source 后,很明显该函数计算的是 OSA 而不是 Damerau–Levenshtein 距离

关于python - Jellyfish 的 Damerau–Levenshtein 距离计算有问题吗?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20258800/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com