
python - How does Python's SequenceMatcher work?

Reposted. Author: 太空狗. Updated: 2023-10-29 18:27:35

I'm a bit confused that SequenceMatcher returns two different answers depending on the order of its arguments. Why does this happen?

Example

SequenceMatcher is not commutative:

>>> from difflib import SequenceMatcher
>>> SequenceMatcher(None, "Ebojfm Mzpm", "Ebfo ef Mfpo").ratio()
0.6086956521739131

>>> SequenceMatcher(None, "Ebfo ef Mfpo", "Ebojfm Mzpm").ratio()
0.5217391304347826

Best answer

SequenceMatcher.ratio uses SequenceMatcher.get_matching_blocks internally to compute the ratio. I'll walk you through the steps so you can see how the difference arises:

SequenceMatcher.get_matching_blocks

Return list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j.

The last triple is a dummy, and has the value (len(a), len(b), 0). It is the only triple with n == 0. If (i, j, n) and (i', j', n') are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'; in other words, adjacent triples always describe non-adjacent equal blocks.
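A quick way to see this in practice is to print the blocks for both argument orders. This is a small sketch; the helper name blocks is only for illustration and is not part of difflib. Each triple is checked against the documented invariant x[i:i+n] == y[j:j+n].

```python
from difflib import SequenceMatcher


def blocks(x, y):
    """Return get_matching_blocks() as plain (i, j, n) tuples (illustrative helper)."""
    return [tuple(m) for m in SequenceMatcher(None, x, y).get_matching_blocks()]


a, b = "Ebojfm Mzpm", "Ebfo ef Mfpo"

for x, y in ((a, b), (b, a)):
    found = blocks(x, y)
    for i, j, n in found:
        # Every triple (including the size-0 dummy) satisfies x[i:i+n] == y[j:j+n].
        assert x[i:i + n] == y[j:j + n]
        print((i, j, n), repr(x[i:i + n]))
    print("total matched:", sum(n for _, _, n in found))
```

With "Ebojfm Mzpm" first, the blocks are 'Eb', 'o', 'f', ' M', 'p' plus the dummy (7 characters in total); with the arguments swapped they are 'Eb', 'f', ' M', 'p' plus the dummy (6 characters). The "o" block is the one that goes missing.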

Internally, ratio takes the result of SequenceMatcher.get_matching_blocks and sums the sizes of all the matching blocks it returns. This is the exact line from the difflib.py source:

matches = sum(triple[-1] for triple in self.get_matching_blocks())

This line is key, because its result is what the ratio is computed from. We'll see shortly how it affects the calculation.


>>> m1 = SequenceMatcher(None, "Ebojfm Mzpm", "Ebfo ef Mfpo")
>>> m2 = SequenceMatcher(None, "Ebfo ef Mfpo", "Ebojfm Mzpm")

>>> matches1 = sum(triple[-1] for triple in m1.get_matching_blocks())
>>> matches1
7
>>> matches2 = sum(triple[-1] for triple in m2.get_matching_blocks())
>>> matches2
6

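As you can see, we get 7 and 6. These are simply the summed sizes of the matching blocks returned by get_matching_blocks. Why does that matter? Because the ratio is computed as follows (again from the difflib source):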

def _calculate_ratio(matches, length):
    if length:
        return 2.0 * matches / length
    return 1.0

Here length is len(a) + len(b), where a is the first sequence and b is the second.
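Plugging this example's numbers into that formula: both orders share the same length, 11 + 12 = 23, so the only thing that differs is matches.

```latex
\begin{aligned}
\text{ratio} &= \frac{2 \cdot \text{matches}}{\text{length}}, \qquad \text{length} = 11 + 12 = 23 \\
\text{m1:}\quad \frac{2 \cdot 7}{23} &= \frac{14}{23} \approx 0.6087 \\
\text{m2:}\quad \frac{2 \cdot 6}{23} &= \frac{12}{23} \approx 0.5217
\end{aligned}
```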

Enough talk, let's see it in action:

>>> length = len("Ebojfm Mzpm") + len("Ebfo ef Mfpo") 
>>> m1.ratio()
0.6086956521739131
>>> (2.0 * matches1 / length) == m1.ratio()
True

The same holds for m2:

>>> 2.0 * matches2 / length
0.5217391304347826
>>> (2.0 * matches2 / length) == m2.ratio()
True
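Why do the two orders find different blocks in the first place? A sketch of the mechanism, based on the documented behaviour of SequenceMatcher.find_longest_match: get_matching_blocks takes the longest matching block in the current range and then recurses on the pieces to its left and right, and when several blocks tie for longest, find_longest_match returns the one that starts earliest in a (then earliest in b). In both orders "Eb" and " M" are matched first, leaving the middle stretch "ojfm" vs "fo ef". With "Ebojfm Mzpm" as a, the tie among one-character matches goes to "o" (earliest in a), and "f" still fits to its right, so 2 characters match there. With the arguments swapped, "f" comes first in a, but in b it sits after the "o", so once "f" is taken the "o" can no longer be matched in order and only 1 character matches. That one character is the gap between 7 and 6. The snippet queries that middle range directly; the variable names are illustrative.

```python
from difflib import SequenceMatcher

s1, s2 = "Ebojfm Mzpm", "Ebfo ef Mfpo"
m1 = SequenceMatcher(None, s1, s2)
m2 = SequenceMatcher(None, s2, s1)

# Middle stretch left over once "Eb" and " M" are matched:
# s1[2:6] == "ojfm" and s2[2:7] == "fo ef".
first = m1.find_longest_match(2, 6, 2, 7)   # a = s1, b = s2
second = m2.find_longest_match(2, 7, 2, 6)  # a = s2, b = s1

# Ties between 1-character matches go to the earliest position in a.
print(first, repr(s1[first.a]))    # Match(a=2, b=3, size=1) 'o'
print(second, repr(s2[second.a]))  # Match(a=2, b=4, size=1) 'f'

# Characters matched inside that stretch alone: 2 ("o", "f") vs 1 ("f").
middle1 = sum(m.size for m in SequenceMatcher(None, s1[2:6], s2[2:7]).get_matching_blocks())
middle2 = sum(m.size for m in SequenceMatcher(None, s2[2:7], s1[2:6]).get_matching_blocks())
print(middle1, middle2)
```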

Note: SequenceMatcher(None, a, b).ratio() == SequenceMatcher(None, b, a).ratio() is not always False; sometimes it is True:

>>> s1 = SequenceMatcher(None, "abcd", "bcde").ratio()
>>> s2 = SequenceMatcher(None, "bcde", "abcd").ratio()
>>> s1 == s2
True

If you're wondering why, it's because

sum(triple[-1] for triple in self.get_matching_blocks())

evaluates to the same value, 3, for both SequenceMatcher(None, "abcd", "bcde") and SequenceMatcher(None, "bcde", "abcd").
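A small sketch to check the symmetric case; total_matched is a hypothetical helper, not a difflib function. Here "bcd" is the unique longest match in either order, so there is no tie to break: both orders match 3 characters and the ratio is 2 * 3 / 8 = 0.75 either way.

```python
from difflib import SequenceMatcher


def total_matched(x, y):
    """Sum of block sizes from get_matching_blocks() (the value ratio() is built from)."""
    return sum(m.size for m in SequenceMatcher(None, x, y).get_matching_blocks())


for x, y in (("abcd", "bcde"), ("bcde", "abcd")):
    sm = SequenceMatcher(None, x, y)
    # "bcd" is the only real block in either order (plus the size-0 dummy).
    print(sm.get_matching_blocks(), total_matched(x, y), sm.ratio())
```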

A similar question, "python - How does Python's SequenceMatcher work?", can be found on Stack Overflow: https://stackoverflow.com/questions/35517353/
