python - Determining the differences between two large binary files?

Reposted. Author: 行者123. Updated: 2023-12-01 03:12:14

I need to compare two large binary files (up to 100 MB each) and get the differences between them.

For ASCII text files, I can use this:

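import difflib

# Read both text files as lists of lines; the with block closes them afterwards.
with open('large1.txt', 'r') as file1, open('large2.txt', 'r') as file2:
    diff = difflib.ndiff(file1.readlines(), file2.readlines())
    # Keep only the lines unique to the first file ('- ' prefix), without the 2-character marker.
    difference = ''.join(x[2:] for x in diff if x.startswith('- '))
print(difference)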

How can I make this work for binary files? I have tried different encodings and binary read mode, but nothing has worked so far.

Edit: I am working with .vcl binary files.

Best answer

difflib will be very slow on large files, and 100 MB counts as very large. From the difflib documentation:

Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. SequenceMatcher is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear.
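Given that quadratic cost, a cheaper first pass can sometimes be enough. The following is a minimal sketch of a different, simpler technique than difflib, and it is not part of the original answer: it streams both files in chunks and reports the byte ranges that differ position by position. It assumes that offsets line up between the two files, so an inserted or deleted byte makes everything after it count as different. The function name `differing_ranges` and the `chunk_size` default are illustrative choices.

```python
def differing_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return (start, end) byte offsets where the two files differ.

    Comparison is purely positional: byte i of one file against byte i
    of the other. If one file is longer, its tail is reported as a
    differing range.
    """
    ranges = []
    offset = 0
    start = None  # start offset of the currently open differing run
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk_size), fb.read(chunk_size)
            if not ca and not cb:
                break
            if ca == cb:
                # Fast path: identical chunks close any open run.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                common = min(len(ca), len(cb))
                for i in range(common):
                    if ca[i] != cb[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
                # One file ended inside this chunk: the rest differs.
                if len(ca) != len(cb) and start is None:
                    start = offset + common
            offset += max(len(ca), len(cb))
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Identical chunks are skipped with a single bytes comparison, so the per-byte Python loop only runs on chunks that actually contain a difference.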

If you can live with the slowness, try difflib.SequenceMatcher, which works on almost any kind of data:

This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable.

Python Doc - class difflib.SequenceMatcher
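As a rough illustration of that suggestion, here is a minimal sketch of running SequenceMatcher directly on bytes objects. Indexing or iterating a bytes object yields ints, which are hashable, so bytes qualify as a sequence here. The helper names `binary_diff` and `read_bytes` and the sample data are made up for this example. Passing `autojunk=False` is an assumption about what suits binary data: the default heuristic treats very frequent elements as junk, and with only 256 possible byte values that is likely to hurt matching on long inputs.

```python
import difflib


def read_bytes(path):
    """Read a whole file in binary mode (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read()


def binary_diff(a: bytes, b: bytes):
    """Return the non-equal opcodes between two byte strings.

    Each opcode is (tag, a_start, a_end, b_start, b_end), where tag is
    'replace', 'delete', or 'insert'.
    """
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    # Small in-memory sample; for real files use read_bytes(path).
    old = b"\x00\x01\x02\x03\x04\x05"
    new = b"\x00\x01\xff\x03\x04\x05\x06"
    for tag, i1, i2, j1, j2 in binary_diff(old, new):
        print(tag, f"a[{i1}:{i2}]={old[i1:i2].hex()}", f"b[{j1}:{j2}]={new[j1:j2].hex()}")
```

On inputs near 100 MB this will probably be impractically slow and memory hungry, as the timing note above warns, so it is best tried on small files or on slices first.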

Source: the original question on Stack Overflow, "python - Determining the differences between two large binary files?": https://stackoverflow.com/questions/42769750/
