python - Why is collections.Counter faster than running its source code directly?

Reposted. Author: 行者123. Updated: 2023-12-01 01:04:19

I use collections.Counter to count the words in a string:

from collections import Counter

s = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""
lorem = s.lower().split()

Note that this is smaller than the real strings I tried, but the conclusions generalize.

%%timeit
dcomp = Counter(lorem)

# 8 µs ± 329 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

If I use this instead (identical to part of the source in cpython/Lib/collections/__init__.py):

%%timeit
d = dict()
get = d.get
for w in lorem:
    d[w] = get(w, 0) + 1

# 15.4 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Edit: using a function:

def count():
    d = dict()
    get = d.get
    for w in lorem:
        d[w] = get(w, 0) + 1
    return d

%%timeit
count()
# Still significantly slower; the function definition is outside the timeit loop.
# 14 µs ± 763 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Larger strings give similar results: the second approach takes roughly 1.8 to 2 times as long as the first.
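The standard-library timeit module can reproduce the comparison outside IPython, where %%timeit is not available. This is a minimal sketch that assumes a shortened sample text. The function names with_counter and with_dict_get are hypothetical. Absolute timings will vary by machine and Python version.

```python
import timeit
from collections import Counter

# Shortened sample text (assumption); any list of words works.
s = ("Lorem Ipsum is simply dummy text of the printing and typesetting industry. "
     "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s.")
lorem = s.lower().split()


# with_counter / with_dict_get are hypothetical names used only in this sketch.
def with_counter(words):
    # Counter hands the tallying loop to collections._count_elements.
    return Counter(words)


def with_dict_get(words):
    # Pure-Python copy of the loop from collections/__init__.py.
    d = {}
    get = d.get
    for w in words:
        d[w] = get(w, 0) + 1
    return d


if __name__ == "__main__":
    # Both approaches must produce the same counts before timing them.
    assert with_counter(lorem) == with_dict_get(lorem)
    number = 10_000
    for func in (with_counter, with_dict_get):
        # Best of 5 repeats, each calling the function `number` times.
        best = min(timeit.repeat(lambda: func(lorem), number=number, repeat=5))
        print(f"{func.__name__}: {best / number * 1e6:.2f} µs per call")
```

Taking the minimum of several repeats reduces noise from other processes. This is the same idea as the best-of-N figures that %%timeit reports.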

The relevant part of the source is here:

def _count_elements(mapping, iterable):
    'Tally elements from the iterable.'
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1

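Here mapping is the Counter instance itself, which was initialized via super(Counter, self).__init__() -> dict(). The speed difference persists when I put the second approach into a function and call that function. I don't understand where this difference comes from. Does the Python standard library get special treatment, or am I overlooking some caveat?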
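One way to see that mapping is the Counter instance itself is to call the helper directly on an empty Counter. For a plain (non-mapping) iterable, this is roughly what Counter.update does. This sketch assumes a CPython where the private, undocumented name collections._count_elements is importable. Because it is private, it may change between versions.

```python
# _count_elements is a private, undocumented CPython helper (assumption:
# importable from collections on this interpreter).
from collections import Counter, _count_elements

words = "the quick brown fox jumps over the lazy dog the end".split()

# An empty Counter is simply an empty dict-subclass instance.
c = Counter()
# Tally directly into the Counter, as Counter.update does for plain iterables.
_count_elements(c, words)

print(c["the"])             # 3
print(c == Counter(words))  # True
```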

Best answer

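Look more closely at the code in collections/__init__.py. As you showed, it does define _count_elements, but right after that it tries from _collections import _count_elements. That imports the function from a C extension module, which is optimized and therefore faster. The Python implementation is used only when the C version cannot be found.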
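To check which version a given interpreter uses, inspect collections._count_elements. In CPython it is normally a built-in function from the _collections C module rather than a Python function. This is a minimal sketch. _count_elements and _collections are private CPython internals, so the result may differ on other implementations such as PyPy. The name py_count_elements is a hypothetical copy of the Python fallback, used here for comparison.

```python
import collections
import inspect
import timeit


# py_count_elements is a hypothetical name: a copy of the pure-Python
# fallback defined in collections/__init__.py.
def py_count_elements(mapping, iterable):
    'Tally elements from the iterable.'
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1


# Whatever collections ended up binding after its try/except ImportError.
helper = collections._count_elements
# inspect.isbuiltin is True for functions implemented in C.
is_c_helper = inspect.isbuiltin(helper)
print("C helper in use:", is_c_helper)

words = ("lorem ipsum dolor sit amet lorem ipsum " * 50).split()

if __name__ == "__main__":
    number = 2_000
    for name, func in (("C helper", helper), ("Python copy", py_count_elements)):
        # A fresh dict per call so counts do not accumulate across runs.
        t = min(timeit.repeat(lambda: func({}, words), number=number, repeat=5))
        print(f"{name}: {t / number * 1e6:.2f} µs per call")
```

Both functions fill the dict with identical counts. Only the speed differs, because the C loop avoids interpreting bytecode for each element.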

The original question, "python - Why is collections.Counter faster than running its source code directly?", is on Stack Overflow: https://stackoverflow.com/questions/55523572/
