python - String comparison in Python, but not Levenshtein distance (I think)

Reposted by: 行者123 Updated: 2023-11-28 19:25:59

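I came across a rough string comparison in a paper I am reading, described as follows:

The equation they use is given below (quoted from the paper, with small wording changes to make it more general and readable). Since the authors' description is not very clear, I will try to explain it in my own words, using the authors' example.

For example, for the two sequences ABCDE and BCEFA there are two possible graphs:

Graph 1) connects B with B, C with C, and E with E

Graph 2) connects A with A

Once I have connected the other three (Graph 1), I cannot also connect A with A, because that would be a crossing line (imagine drawing lines between B-B, C-C and E-E); that is, the A-A line would cross the lines connecting B-B, C-C and E-E. So these two sequences yield 2 possible graphs: one with 3 connections (BB, CC and EE) and another with only one (AA). I then compute the score d according to the equation below.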
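The "no crossing lines" rule above can be sketched in a few lines of Python. This is only an illustration of my reading of the example, not code from the paper: the function names are made up, a link is stored as a pair of positions (i, j), and two links are treated as crossing when their order in the first string disagrees with their order in the second.

```python
from itertools import combinations


def identity_links(s, t):
    """Return one (i, j) link for every identity s[i] == t[j]."""
    return [(i, j) for i, a in enumerate(s) for j, b in enumerate(t) if a == b]


def is_non_crossing(links):
    """True if no two links cross or share a character.

    After sorting by position in the first string, positions in the
    second string must be strictly increasing as well.
    """
    ordered = sorted(links)
    return all(
        i1 < i2 and j1 < j2
        for (i1, j1), (i2, j2) in zip(ordered, ordered[1:])
    )


def maximal_configurations(s, t):
    """Non-crossing link sets that cannot be extended by another link."""
    links = identity_links(s, t)
    valid = [
        set(subset)
        for size in range(1, len(links) + 1)
        for subset in combinations(links, size)
        if is_non_crossing(subset)
    ]
    # Keep only sets that are not a proper subset of another valid set.
    return [c for c in valid if not any(c < other for other in valid)]


if __name__ == "__main__":
    s, t = "ABCDE", "BCEFA"
    for config in maximal_configurations(s, t):
        print([s[i] + t[j] for i, j in sorted(config)])
```

For ABCDE and BCEFA this leaves exactly the two graphs described above: A-A on its own, and B-B, C-C, E-E together. The brute-force enumeration over all subsets is fine for five-character strings but would not scale to long sequences.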

Consequently, to define the degree of similarity between two penta-strings we calculate the distance d between them. Aligning the two penta-strings, we look for all the identities between their characters, wherever these may be located. If each identity is represented by a link between both penta-strings, we define a graph for this pair. We call any part of this graph a configuration.

Next, we retain all of those configurations in which there is no character cross pairing (the meaning is explained in my example above, i.e., no crossings of links between identical characters and only those graphs are retained). Each of these is then evaluated as a function of the number p of characters related to the graph, the shifting Δi for the corresponding pairs and the gap δij between connected characters of each penta-string. The minimum value is chosen as characteristic and is called distance d: d = Min(50 – 10p + ΣΔi + Σδij). Although very rough, this measure is generally in good agreement with the qualitative eye-guided estimation. For instance, the distance between abcde and abcfg is 20, whereas that between abcde and abfcg is 23 = (50 – 30 + 1 + 2).
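The excerpt does not spell out how Δi and δij are measured, so the following is only a guess that happens to reproduce both quoted values (20 and 23). Assumptions, none of them confirmed by the paper: p is the number of links in a configuration; Δi is the absolute difference between a linked character's positions in the two strings; δij counts the unlinked characters lying between linked characters i and j, summed over every pair of linked characters in each string. All function names are hypothetical, and other readings may fit the two examples equally well.

```python
from itertools import combinations


def links_between(s, t):
    """One (i, j) link for every identity s[i] == t[j]."""
    return [(i, j) for i, a in enumerate(s) for j, b in enumerate(t) if a == b]


def non_crossing(config):
    """True if the links neither cross nor share a character."""
    ordered = sorted(config)
    return all(
        i1 < i2 and j1 < j2
        for (i1, j1), (i2, j2) in zip(ordered, ordered[1:])
    )


def unlinked_between(positions):
    """ASSUMED gap term: for every pair of linked positions in one
    string, count the unlinked characters that lie between them."""
    linked = set(positions)
    total = 0
    for lo, hi in combinations(sorted(positions), 2):
        total += sum(1 for k in range(lo + 1, hi) if k not in linked)
    return total


def score(config):
    """50 - 10p + sum of shifts + sum of gaps, under the assumed reading."""
    p = len(config)
    shift = sum(abs(i - j) for i, j in config)  # ASSUMED shift term
    gap = (unlinked_between([i for i, _ in config])
           + unlinked_between([j for _, j in config]))
    return 50 - 10 * p + shift + gap


def distance(s, t):
    """Minimum score over all non-crossing configurations (brute force)."""
    links = links_between(s, t)
    configs = (
        subset
        for size in range(len(links) + 1)
        for subset in combinations(links, size)
        if non_crossing(subset)
    )
    return min(score(c) for c in configs)


if __name__ == "__main__":
    print(distance("abcde", "abcfg"))
    print(distance("abcde", "abfcg"))
```

Under this reading, abcde vs abcfg links a, b and c with no shift and no gap, giving 50 – 30 = 20, while abcde vs abfcg has a shift of 1 for c and a gap total of 2 in the second string (f sits between a and c, and between b and c), giving 23. Two data points are not enough to pin down the definition, so treat this as a starting point for checking against more examples from the paper.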

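I am confused about how to go about implementing this. Any suggestions that could help me would be greatly appreciated.

I have tried Levenshtein as well as the simple sequence alignment used for protein sequence comparison. The link to the paper is: http://peds.oxfordjournals.org/content/16/2/103.long

I could not find any information about the first author, Alain Figureau, and my email to MA Soto has not been answered (as of today).

Thanks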

Best answer

Well, this is definitely not Levenshtein:

>>> from nltk import metrics
>>> metrics.distance.edit_distance('abcde','abcfg')
2
>>> metrics.distance.edit_distance('abcde','abfcg')
3
>>> help(metrics.distance.edit_distance)
Help on function edit_distance in module nltk.metrics.distance:

edit_distance(s1, s2)
    Calculate the Levenshtein edit-distance between two strings.
    The edit distance is the number of characters that need to be
    substituted, inserted, or deleted, to transform s1 into s2.  For
    example, transforming "rain" to "shine" requires three steps,
    consisting of two substitutions and one insertion:
    "rain" -> "sain" -> "shin" -> "shine".  These operations could have
    been done in other orders, but at least three steps are needed.

    @param s1, s2: The strings to be analysed
    @type s1: C{string}
    @type s2: C{string}
    @rtype C{int}
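If nltk is not installed, the same check can be done with a small stdlib-only sketch of the usual dynamic-programming recurrence for Levenshtein distance. The function name `levenshtein` is my own; I am assuming it agrees with `edit_distance` called with default arguments, which it does for the two pairs shown above.

```python
def levenshtein(s, t):
    """Edit distance with unit-cost insert, delete and substitute."""
    # previous[j] holds the distance between the prefix of s processed
    # so far and the first j characters of t.
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete a
                current[j - 1] + 1,          # insert b
                previous[j - 1] + (a != b),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("abcde", "abcfg"))
    print(levenshtein("abcde", "abfcg"))
```

The results are 2 and 3, nowhere near the 20 and 23 quoted from the paper, so the paper's d is measuring something else.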

Regarding "python - String comparison in Python, but not Levenshtein distance (I think)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13166089/
