gpt4 book ai didi

python - 为什么 Python 中的连接看起来越来越慢?

转载 作者:太空宇宙 更新时间:2023-11-03 14:44:26 25 4
gpt4 key购买 nike

为什么 Python 3 中的串联在某些情况下比 Python 2 慢?

受影响最大的连接方法似乎是 bytes 对象的连续连接,它的操作复杂度从 O(n) 变为 O(n²)。

我的大部分分析代码都在这里:

#!/usr/bin/env python

from operator import concat
from sys import version, version_info
from timeit import timeit # Compatibility: ver >= 2.6

# ver = version.partition('\n')[0].rstrip()
ver = '.'.join(str(v) for v in version_info[:3])
print(ver)

if version_info[0] == 2:
from StringIO import StringIO
else:
from io import StringIO
from functools import reduce
xrange = range


def build_plus():
output = ''
for _ in xrange(input_len):
output += 'a'
return output


def build_join():
return ''.join('a' for _ in xrange(input_len))


def build_bytes_plus():
output = b''
for _ in xrange(input_len):
output += b'a'
return output


def build_stringio():
output = StringIO()
for _ in xrange(input_len):
output.write('a')
return output.getvalue()


def build_reduce():
return reduce(concat, ('a' for _ in xrange(input_len)))


builds = {'str+': build_plus,
'join': build_join,
'reduce': build_reduce,
'bytes+': build_bytes_plus,
'StringIO': build_stringio}

if version_info[0] == 2:
import cStringIO

def build_cstringio():
output = cStringIO.StringIO()
for _ in xrange(input_len):
output.write('a')
return output.getvalue()

builds['cStringIO'] = build_cstringio
else:
from io import BytesIO

def build_bytesio():
output = BytesIO()
for _ in xrange(input_len):
output.write(b'a')
return output.getvalue()

builds['BytesIO'] = build_bytesio

resfile = open('times.csv', 'a')
size_range = 50 # Number of points over the size axis
min_order = 1.0 # 10^x byte input min
max_order = 5.0 # 10^x byte input max

for allow_gc in (False, True):
setup = 'gc.enable()' if allow_gc else 'pass'

for build_name, build_fun in builds.items():
# For a roughly constant confidence interval, aim for uniform sample density across the
# (logarithmic) input size axis.
for size_index in range(size_range+1):
input_len = int(10**((max_order-min_order)*size_index/size_range + min_order))

# Rather than repeating many measurements at one input size, perform one measurement
# per input size for a continuous range of input sizes and apply smoothing later.
dur = timeit(build_fun, setup, number=1)

resfile.write('"%s",%s,"%s",%d,%.6g\n' % (ver, str(allow_gc).upper(), build_name,
input_len, dur))

此处显示了我的 R 脚本中的一些图表:

String append across Python versions

String append time complexity

最佳答案

在循环中用 ++= 连接字符串从来都不是一个好主意。它看起来很有效,因为有一个 weird, controversial special case在字节码解释器循环中,如果它能证明没有其他人引用它正在处理的字符串,它将尝试可变地连接字符串。没有有效的调整大小政策;它只是called realloc并抱最好的希望,所以如果 realloc 需要复制,它仍然可能以 O(n^2) 结束。

在 Python 3 中,weird special case现在处理 unicode 字符串而不是字节串。字节串连接每次都返回构建一个新的字符串对象,因此您的循环返回到 O(n^2)。

关于python - 为什么 Python 中的连接看起来越来越慢?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50954839/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com