
python - What is _md5.md5, and why is hashlib.md5 so much slower?

Reposted. Author: 行者123. Updated: 2023-12-03 14:23:33

I found this undocumented _md5 module when I got frustrated with the slow stdlib hashlib.md5 implementation.

On a MacBook:

>>> timeit hashlib.md5(b"hello world")
597 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"hello world")
224 ns ± 3.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> _md5
<module '_md5' from '/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_md5.cpython-37m-darwin.so'>

On a Windows box:
>>> timeit hashlib.md5(b"stonk overflow")
328 ns ± 21.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"stonk overflow")
110 ns ± 12.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> _md5
<module '_md5' (built-in)>

On a Linux machine:
>>> timeit hashlib.md5(b"https://adventofcode.com/2016/day/5")
259 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"https://adventofcode.com/2016/day/5")
102 ns ± 0.0576 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> _md5
<module '_md5' from '/usr/local/lib/python3.8/lib-dynload/_md5.cpython-38-x86_64-linux-gnu.so'>

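For hashing short messages, _md5.md5 is roughly 2.5 to 3 times faster in the timings above. For long messages, the performance is similar.

Why is it hidden away in an underscore extension module, and why isn't this faster implementation used by default in hashlib? What is the _md5 module, and why doesn't it have a public API?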
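The comparison above can be reproduced outside IPython with the standard timeit module. The following is a minimal sketch, not the original benchmark: the helper names time_constructor and compare, the iteration counts, and the 1 MB "long" payload are assumptions made for illustration, and _md5 is treated as optional because it is a private module.

```python
import hashlib
import timeit

try:
    import _md5  # private CPython extension module; may be absent on some builds
except ImportError:
    _md5 = None


def time_constructor(ctor, payload, number=100_000):
    """Return the average seconds per call of ctor(payload)."""
    total = timeit.timeit(lambda: ctor(payload), number=number)
    return total / number


def compare(payload, number=100_000):
    """Time hashlib.md5 and, if importable, _md5.md5 on one payload."""
    results = {"hashlib.md5": time_constructor(hashlib.md5, payload, number)}
    if _md5 is not None:
        results["_md5.md5"] = time_constructor(_md5.md5, payload, number)
    return results


if __name__ == "__main__":
    for label, payload in [("short", b"hello world"), ("long", b"x" * 1_000_000)]:
        # Fewer iterations for the long payload so the run stays quick.
        number = 100_000 if label == "short" else 200
        for name, seconds in compare(payload, number).items():
            print(f"{label:5s} {name:12s} {seconds * 1e9:12.1f} ns per call")
```

Absolute numbers will vary by machine and Python build; only the ratio between the two rows for the same payload is meaningful.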

Best answer

It is common for public Python modules to delegate their functionality to hidden modules.

For example, the full code of the collections.abc module is:

from _collections_abc import *
from _collections_abc import __all__
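A quick way to see this delegation is to compare the objects exposed by the public and private modules. This is only an illustrative check; the variable names same_object and all_exported are made up for the example.

```python
import collections.abc
import _collections_abc  # private module that holds the real definitions

# The public name and the private name refer to the very same class object.
same_object = collections.abc.Mapping is _collections_abc.Mapping

# Every name exported by the private module is reachable from the public one.
all_exported = all(
    hasattr(collections.abc, name) for name in _collections_abc.__all__
)

print(same_object, all_exported)
```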

The functions of hashlib are created dynamically:
for __func_name in __always_supported:
    # try them all, some may not work due to the OpenSSL
    # version not supporting that algorithm.
    try:
        globals()[__func_name] = __get_hash(__func_name)
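The same technique, binding functions to module-level names through globals(), can be tried in isolation. This is a toy sketch rather than hashlib's code: the tuple _NAMES, the helper _make_constructor, and the my_ prefix are all hypothetical.

```python
import hashlib

# Hypothetical subset of algorithm names, chosen only for this illustration.
_NAMES = ("md5", "sha1", "sha256")


def _make_constructor(name):
    """Return a function that builds a hash object for the given algorithm."""
    def constructor(data=b""):
        return hashlib.new(name, data)
    constructor.__name__ = name
    return constructor


for _name in _NAMES:
    # Bind each constructor as a module-level name, e.g. my_sha256.
    globals()["my_" + _name] = _make_constructor(_name)

print(my_sha256(b"abc").hexdigest() == hashlib.sha256(b"abc").hexdigest())
```

Because the names only come into existence when the loop runs, static tools cannot see them, which is one reason it is not obvious from reading hashlib where md5 actually comes from.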

The definition of __always_supported is:
__always_supported = ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512',
                      'blake2b', 'blake2s',
                      'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512',
                      'shake_128', 'shake_256')

And __get_hash is either __get_openssl_constructor or __get_builtin_constructor:
try:
    import _hashlib
    new = __hash_new
    __get_hash = __get_openssl_constructor
    algorithms_available = algorithms_available.union(
            _hashlib.openssl_md_meth_names)
except ImportError:
    new = __py_new
    __get_hash = __get_builtin_constructor

__get_builtin_constructor is a fallback for the (again) hidden _hashlib module:
def __get_openssl_constructor(name):
    if name in __block_openssl_constructor:
        # Prefer our blake2 and sha3 implementation.
        return __get_builtin_constructor(name)
    try:
        f = getattr(_hashlib, 'openssl_' + name)
        # Allow the C module to raise ValueError.  The function will be
        # defined but the hash not actually available thanks to OpenSSL.
        f()
        # Use the C function directly (very fast)
        return f
    except (AttributeError, ValueError):
        return __get_builtin_constructor(name)
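The "probe the preferred backend, otherwise fall back" logic can be modelled with plain dictionaries. The sketch below is a simplified analogy, not hashlib's implementation: get_constructor, broken_md5, and the two dictionaries are hypothetical stand-ins for the OpenSSL-backed and builtin constructors.

```python
import hashlib


def get_constructor(name, preferred, fallback):
    """Return preferred[name] if it exists and works, else fallback[name]."""
    try:
        ctor = preferred[name]
        ctor()  # probe call: a constructor may exist yet refuse to run
        return ctor
    except (KeyError, ValueError):
        return fallback[name]


def broken_md5(data=b""):
    """Hypothetical stand-in for a backend that exposes md5 but rejects it."""
    raise ValueError("md5 disabled in this backend")


preferred = {"md5": broken_md5, "sha256": hashlib.sha256}
fallback = {"md5": hashlib.md5, "sha256": hashlib.sha256}

md5_ctor = get_constructor("md5", preferred, fallback)
sha256_ctor = get_constructor("sha256", preferred, fallback)
print(md5_ctor is hashlib.md5, sha256_ctor is hashlib.sha256)
```

The probe call is the important detail: the preferred constructor is selected only if calling it with no arguments succeeds.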

Above that, in the hashlib code, you have this:
def __get_builtin_constructor(name):
    cache = __builtin_constructor_cache
    ...
        elif name in {'MD5', 'md5'}:
            import _md5
            cache['MD5'] = cache['md5'] = _md5.md5

But md5 is not in __block_openssl_constructor, so the _hashlib/openssl version is preferred over the _md5/builtin version.

Confirmation in the REPL:
>>> hashlib.md5
<built-in function openssl_md5>
>>> _md5.md5
<built-in function md5>

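These functions are different implementations of the MD5 algorithm, and openssl_md5 calls into a dynamic system library. That is why you see a performance difference. The first version is defined in https://github.com/python/cpython/blob/master/Modules/_hashopenssl.c and the other in https://github.com/python/cpython/blob/master/Modules/md5module.c, if you want to check the differences.

So why is the _md5.md5 function defined but never used? I guess the idea is to ensure that some algorithms are always available, even if openssl is absent: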

Constructors for hash algorithms that are always present in this module are sha1(), sha224(), sha256(), sha384(), sha512(), blake2b(), and blake2s(). (https://docs.python.org/3/library/hashlib.html)
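hashlib exposes this distinction at runtime through two documented attributes, algorithms_guaranteed and algorithms_available. A short sketch of how one might inspect them (the variable names are only for illustration):

```python
import hashlib

guaranteed = hashlib.algorithms_guaranteed
available = hashlib.algorithms_available

print("guaranteed:", sorted(guaranteed))
print("only via the linked OpenSSL:", sorted(available - guaranteed))
```

The second line lists names that hashlib.new() may accept on this interpreter but that are not promised on every platform.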

Regarding "python - What is _md5.md5, and why is hashlib.md5 so much slower?", the original question is on Stack Overflow: https://stackoverflow.com/questions/59955854/
