
python - String comparison in Python, but not Levenshtein distance (I think)

Reposted. Author: 行者123. Updated: 2023-11-28 19:25:59

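I came across a rough string comparison in a paper I am reading, which works as follows:

The equation they use is given below (excerpted from the paper and slightly modified to make it more general and readable). Because the authors' description is not very clear, I have tried to explain it further in my own words, using the authors' example.

For example, for the 2 sequences ABCDE and BCEFA, there are two possible graphs:

Graph 1) connects B with B, C with C, and E with E

Graph 2) connects A with A

Once I connect the other three (graph 1), I cannot also connect A with A, because that line would cross the others (imagine drawing lines between B-B, C-C and E-E); that is, the A-A line would pass through the lines connecting B-B, C-C and E-E. So these two sequences yield 2 possible graphs: one with 3 connections (BB, CC and EE) and another with only one (AA). I then compute the score d according to the equation below.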

Consequently, to define the degree of similarity between two penta-strings we calculate the distance d between them. Aligning the two penta-strings, we look for all the identities between their characters, wherever these may be located. If each identity is represented by a link between both penta-strings, we define a graph for this pair. We call any part of this graph a configuration.

Next, we retain all of those configurations in which there is no character cross pairing (the meaning is explained in my example above: links between identical characters must not cross, and only such graphs are retained). Each of these is then evaluated as a function of the number p of characters related to the graph, the shifting Δi for the corresponding pairs and the gap δij between connected characters of each penta-string. The minimum value is chosen as characteristic and is called distance d: d = Min(50 – 10p + ΣΔi + Σδij). Although very rough, this measure is generally in good agreement with the qualitative eye-guided estimation. For instance, the distance between abcde and abcfg is 20, whereas that between abcde and abfcg is 23 = (50 – 30 + 1 + 2).
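One possible reading of this formula can be sketched in Python. The quoted text does not define Δi or δij precisely, so the definitions below are assumptions chosen only because they reproduce the two worked values (20 and 23): Δ is taken as |i - j| for a link joining position i of the first string to position j of the second, and δ is taken, for every pair of links in a configuration, as the absolute difference between their spacing in the first string and their spacing in the second. The function name penta_distance is hypothetical, and this is not necessarily the authors' method.

```python
from itertools import combinations


def penta_distance(s1, s2):
    """Brute-force sketch of d = Min(50 - 10p + sum(shift) + sum(gap)).

    ASSUMPTIONS (not confirmed by the paper):
      shift of a link (i, j)  = |i - j|
      gap of two links        = |(i2 - i1) - (j2 - j1)|, over all pairs of links
    """
    # Every identity between the two strings is a candidate link (i, j).
    links = [(i, j)
             for i, a in enumerate(s1)
             for j, b in enumerate(s2)
             if a == b]

    best = 50  # the empty configuration: p = 0, no shifts, no gaps
    max_links = min(len(s1), len(s2))  # each character is linked at most once

    for p in range(1, max_links + 1):
        # links is sorted by (i, j) and combinations() preserves that order.
        for config in combinations(links, p):
            # Non-crossing: both indices strictly increase along the links.
            crossing = any(i1 >= i2 or j1 >= j2
                           for (i1, j1), (i2, j2) in zip(config, config[1:]))
            if crossing:
                continue
            shift = sum(abs(i - j) for i, j in config)
            gap = sum(abs((i2 - i1) - (j2 - j1))
                      for (i1, j1), (i2, j2) in combinations(config, 2))
            best = min(best, 50 - 10 * p + shift + gap)
    return best


print(penta_distance("abcde", "abcfg"))  # 20, as quoted from the paper
print(penta_distance("abcde", "abfcg"))  # 23, as quoted from the paper
```

Under these assumed definitions, the ABCDE and BCEFA example evaluates to 26, with the minimum coming from the three-link graph (B-B, C-C, E-E). Other definitions of the gap term also reproduce 20 and 23, so this value should be checked against the paper before relying on it.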

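I am confused about how to go about implementing this. Any suggestions that could help would be greatly appreciated.

I have tried Levenshtein as well as the simple sequence alignment used for protein sequence comparison. The link to the paper is: http://peds.oxfordjournals.org/content/16/2/103.long

I could not find any information about the first author, Alain Figureau, and my email to MA Soto has not been answered (as of today).

Thanks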

Best answer

Well, this is definitely not Levenshtein:

>>> from nltk import metrics
>>> metrics.distance.edit_distance('abcde','abcfg')
2
>>> metrics.distance.edit_distance('abcde','abfcg')
3
>>> help(metrics.distance.edit_distance)
Help on function edit_distance in module nltk.metrics.distance:

edit_distance(s1, s2)
    Calculate the Levenshtein edit-distance between two strings.
    The edit distance is the number of characters that need to be
    substituted, inserted, or deleted, to transform s1 into s2.  For
    example, transforming "rain" to "shine" requires three steps,
    consisting of two substitutions and one insertion:
    "rain" -> "sain" -> "shin" -> "shine".  These operations could have
    been done in other orders, but at least three steps are needed.

    @param s1, s2: The strings to be analysed
    @type s1: C{string}
    @type s2: C{string}
    @rtype C{int}
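
To reproduce the comparison above without installing NLTK, here is a minimal pure-Python sketch of the standard dynamic-programming Levenshtein distance. The function name levenshtein is hypothetical and the code is not NLTK's implementation; it only illustrates the definition quoted in the help text.

```python
def levenshtein(s1, s2):
    """Number of single-character substitutions, insertions and deletions
    needed to turn s1 into s2 (classic two-row dynamic programming)."""
    # prev[j] holds the distance between the current prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, b in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s1
                curr[j - 1] + 1,         # insert b into s1
                prev[j - 1] + (a != b),  # substitute a -> b (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("abcde", "abcfg"))  # 2
print(levenshtein("abcde", "abfcg"))  # 3
```

The results 2 and 3 match the NLTK session above and are far from the 20 and 23 quoted from the paper, which is consistent with the paper's measure being something other than Levenshtein distance.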

Regarding "python - String comparison in Python, but not Levenshtein distance (I think)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13166089/
