python - Why is creating a set from concatenated lists faster than using `.update`?

Reposted. Author: 太空狗. Updated: 2023-10-30 01:37:48

While trying to answer "What is the preferred way to compose a set from multiple lists in Python", I did some performance analysis and came to a somewhat surprising conclusion.

Using

python -m timeit -s '
import itertools
import random
n=1000000
random.seed(0)
A = [random.randrange(1<<30) for _ in xrange(n)]
B = [random.randrange(1<<30) for _ in xrange(n)]
C = [random.randrange(1<<30) for _ in xrange(n)]'

for setup, I timed the following snippets:

> $TIMEIT 'set(A+B+C)'
10 loops, best of 3: 872 msec per loop

> $TIMEIT 's = set(A); s.update(B); s.update(C)'
10 loops, best of 3: 930 msec per loop

> $TIMEIT 's = set(itertools.chain(A,B,C))'
10 loops, best of 3: 941 msec per loop

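To my surprise, set(A+B+C) is the fastest, even though it creates an intermediate list containing 3000000 elements. .update and itertools.chain are both slower, even though neither of them copies any lists.

What's going on here?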
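
For reference, the three constructions can be checked for equivalence on a small input. This is a minimal sketch, assuming Python 3 (so `range` instead of the `xrange` used above) and a much smaller `n` so that it runs instantly. It only shows that the three snippets build the same set and says nothing about their timing.

```python
import itertools
import random

random.seed(0)
n = 1000  # far smaller than the 1000000 used in the question
A = [random.randrange(1 << 30) for _ in range(n)]
B = [random.randrange(1 << 30) for _ in range(n)]
C = [random.randrange(1 << 30) for _ in range(n)]

# 1. Concatenate first: A + B + C builds a temporary list of 3*n elements.
s_concat = set(A + B + C)

# 2. Build from A, then add B and C in place; no temporary list is created.
s_update = set(A)
s_update.update(B)
s_update.update(C)

# 3. Feed the set constructor a lazy iterator over all three lists.
s_chain = set(itertools.chain(A, B, C))

print(s_concat == s_update == s_chain)
```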


EDIT: On a second machine (OS X 10.10.5, Python 2.7.10, 2.5GHz Core i7), I ran the following script (which runs the tests forwards and backwards to avoid ordering effects):

SETUP='import itertools
import random
n=1000000
random.seed(0)
A = [random.randrange(1<<30) for _ in xrange(n)]
B = [random.randrange(1<<30) for _ in xrange(n)]
C = [random.randrange(1<<30) for _ in xrange(n)]'

python -m timeit -s "$SETUP" 'set(A+B+C)'
python -m timeit -s "$SETUP" 's = set(A); s.update(B); s.update(C)'
python -m timeit -s "$SETUP" 's = set(itertools.chain(A,B,C))'

python -m timeit -s "$SETUP" 's = set(itertools.chain(A,B,C))'
python -m timeit -s "$SETUP" 's = set(A); s.update(B); s.update(C)'
python -m timeit -s "$SETUP" 'set(A+B+C)'

and got the following results:

10 loops, best of 3: 579 msec per loop
10 loops, best of 3: 726 msec per loop
10 loops, best of 3: 775 msec per loop
10 loops, best of 3: 761 msec per loop
10 loops, best of 3: 737 msec per loop
10 loops, best of 3: 555 msec per loop

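Now set(A+B+C) is clearly faster, and the results are quite stable; it is hard to chalk this up to mere measurement error. Running this script repeatedly produces similar results.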
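
The forwards-and-backwards ordering can also be reproduced from within Python using the `timeit` module. The following is a rough sketch rather than the script used above: it assumes Python 3, uses a reduced `n` so that it finishes quickly, and the helper name `time_in_both_orders` is hypothetical. Unlike the shell script, all measurements share one interpreter process, which may itself influence the numbers.

```python
import timeit

SETUP = """
import itertools
import random
n = 10000  # reduced from 1000000 so the sketch finishes quickly
random.seed(0)
A = [random.randrange(1 << 30) for _ in range(n)]
B = [random.randrange(1 << 30) for _ in range(n)]
C = [random.randrange(1 << 30) for _ in range(n)]
"""

STATEMENTS = [
    "set(A+B+C)",
    "s = set(A); s.update(B); s.update(C)",
    "s = set(itertools.chain(A,B,C))",
]


def time_in_both_orders(statements, setup, number=10, repeat=3):
    """Time each statement forwards, then backwards.

    Returns a list of (statement, best seconds per loop) pairs.
    """
    results = []
    for stmt in statements + statements[::-1]:
        best = min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
        results.append((stmt, best / number))
    return results


for stmt, per_loop in time_in_both_orders(STATEMENTS, SETUP):
    print("{:<40} {:.3f} msec per loop".format(stmt, per_loop * 1e3))
```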

Best answer

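On my Win 7 SP1 machine I get results that differ from yours and are not surprising: set(A+B+C) appears to be the slowest way, as one would expect. Re-enabling garbage collection and using Python 3.4.3 gave similar results.

I used my own timeit-based test bed for performance evaluation and got the following results: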
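
The remark about garbage collection refers to the fact that `timeit` turns off the cyclic garbage collector while timing, unless the setup code switches it back on with `gc.enable()`. A minimal sketch of that behavior, assuming Python 3.5 or later (for the `globals` argument of `timeit.timeit`); the helper name `gc_state_during_timing` is hypothetical:

```python
import timeit


def gc_state_during_timing(setup):
    """Return whether the cyclic GC is enabled while timeit runs the statement."""
    state = []
    # timeit executes code in its own namespace, so the observation is
    # passed back out through a list supplied via the globals argument.
    timeit.timeit(
        "state.append(gc.isenabled())",
        setup=setup,
        number=1,
        globals={"state": state},
    )
    return state[0]


# Default: the collector is disabled for the duration of the timed run.
print(gc_state_during_timing("import gc"))

# Calling gc.enable() in the setup code re-enables it for the timed run.
print(gc_state_during_timing("import gc; gc.enable()"))
```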

fastest to slowest execution speeds (Python 2.7.10)
(10 executions, best of 3 repetitions)

set(A); s.update(B); s.update(C) : 4.787919 secs, rel speed 1.00x, 0.00% slower
set(A).update(B,C) : 6.463666 secs, rel speed 1.35x, 35.00% slower
set(itertools.chain(A,B,C)) : 6.743028 secs, rel speed 1.41x, 40.83% slower
set(A+B+C) : 8.030483 secs, rel speed 1.68x, 67.72% slower
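
The "rel speed" and "% slower" columns follow directly from the raw timings: each time is divided by the fastest one. As a quick check, the arithmetic can be redone from the figures in the table (a small sketch; the numbers are copied from the output above, nothing is re-measured):

```python
# Timings copied from the results table (seconds for 10 executions).
timings = {
    "set(A); s.update(B); s.update(C)": 4.787919,
    "set(A).update(B,C)": 6.463666,
    "set(itertools.chain(A,B,C))": 6.743028,
    "set(A+B+C)": 8.030483,
}

fastest = min(timings.values())
for label, secs in sorted(timings.items(), key=lambda item: item[1]):
    ratio = secs / fastest              # e.g. 8.030483 / 4.787919
    percent_slower = (ratio - 1) * 100  # how much longer than the fastest
    print("{:>32} : rel speed {:4.2f}x, {:6.2f}% slower".format(
        label, ratio, percent_slower))
```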

Benchmark code:

from __future__ import print_function
import sys
from textwrap import dedent
import timeit

N = 10  # Number of executions of each "algorithm"
R = 3  # Number of repetitions of executions

# common setup for all algorithms (not timed)
setup = dedent("""
    import itertools
    import gc
    import random

    try:
        xrange
    except NameError:
        xrange = range

    random.seed(0)
    n = 1000000  # number of elements in each list
    A = [random.randrange(1<<30) for _ in xrange(n)]
    B = [random.randrange(1<<30) for _ in xrange(n)]
    C = [random.randrange(1<<30) for _ in xrange(n)]

    # gc.enable()  # to (re)enable garbage collection if desired
""")

algorithms = {
    "set(A+B+C)": dedent("""
        s = set(A+B+C)
    """),

    "set(A); s.update(B); s.update(C)": dedent("""
        s = set(A); s.update(B); s.update(C)
    """),

    "set(itertools.chain(A,B,C))": dedent("""
        s = set(itertools.chain(A,B,C))
    """),

    # note: set.update() returns None, so s is None here; the timed work is unchanged
    "set(A).update(B,C)": dedent("""
        s = set(A).update(B,C)
    """),
}

# execute and time algorithms, collecting results
timings = [
    (label,
     min(timeit.repeat(algorithms[label], setup=setup, repeat=R, number=N)),
    ) for label in algorithms
]

print('fastest to slowest execution speeds (Python {}.{}.{})\n'.format(
        *sys.version_info[:3]),
      '  ({:,d} executions, best of {:d} repetitions)\n'.format(N, R))

longest = max(len(timing[0]) for timing in timings)  # length of longest label
ranked = sorted(timings, key=lambda t: t[1])  # ascending sort by execution time
fastest = ranked[0][1]
for timing in ranked:
    print("{:>{width}} : {:9.6f} secs, rel speed {:4.2f}x, {:6.2f}% slower".
          format(timing[0], timing[1], round(timing[1]/fastest, 2),
                 round((timing[1]/fastest - 1) * 100, 2), width=longest))
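
One detail of the `set(A).update(B,C)` variant: `set.update` accepts several iterables at once, modifies the set in place, and returns `None`, so `s` ends up bound to `None` in that statement. The work being timed is the same, but the resulting set is discarded. A minimal illustration with small made-up values:

```python
s = set([1, 2])
result = s.update([2, 3], [4])  # update() accepts several iterables at once

print(result)     # the return value of update()
print(sorted(s))  # the set itself was modified in place

# Chaining the call onto the constructor therefore keeps only the return value.
chained = set([1, 2]).update([2, 3], [4])
print(chained)
```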

Regarding python - Why is creating a set from concatenated lists faster than using `.update`?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32483539/
