
python - How to reduce the time it takes to load a pickle file in python


I have created a dictionary in Python and dumped it into a pickle. Its size has reached 300 MB. Now I want to load the same pickle:

output = open('myfile.pkl', 'rb')
mydict = pickle.load(output)

Loading this pickle takes about 15 seconds. How can I reduce this time?

Hardware specs: Ubuntu 14.04, 4 GB RAM

The code below shows the time it takes to dump or load the file using json, pickle, and cPickle.

After dumping, the file size is around 300 MB.

import json, pickle, cPickle
import os, timeit

mydict = {all values to be added}

def dump_json():
    output = open('myfile1.json', 'wb')
    json.dump(mydict, output)
    output.close()

def dump_pickle():
    output = open('myfile2.pkl', 'wb')
    pickle.dump(mydict, output, protocol=cPickle.HIGHEST_PROTOCOL)
    output.close()

def dump_cpickle():
    output = open('myfile3.pkl', 'wb')
    cPickle.dump(mydict, output, protocol=cPickle.HIGHEST_PROTOCOL)
    output.close()

def load_json():
    output = open('myfile1.json', 'rb')
    mydict = json.load(output)
    output.close()

def load_pickle():
    output = open('myfile2.pkl', 'rb')
    mydict = pickle.load(output)
    output.close()

def load_cpickle():
    output = open('myfile3.pkl', 'rb')
    mydict = pickle.load(output)  # note: this calls pickle.load, not cPickle.load
    output.close()


if __name__ == '__main__':
    print "Json dump: "
    t = timeit.Timer(stmt="pickle_wr.dump_json()", setup="import pickle_wr")
    print t.timeit(1),'\n'

    print "Pickle dump: "
    t = timeit.Timer(stmt="pickle_wr.dump_pickle()", setup="import pickle_wr")
    print t.timeit(1),'\n'

    print "cPickle dump: "
    t = timeit.Timer(stmt="pickle_wr.dump_cpickle()", setup="import pickle_wr")
    print t.timeit(1),'\n'

    print "Json load: "
    t = timeit.Timer(stmt="pickle_wr.load_json()", setup="import pickle_wr")
    print t.timeit(1),'\n'

    print "pickle load: "
    t = timeit.Timer(stmt="pickle_wr.load_pickle()", setup="import pickle_wr")
    print t.timeit(1),'\n'

    print "cPickle load: "
    t = timeit.Timer(stmt="pickle_wr.load_cpickle()", setup="import pickle_wr")
    print t.timeit(1),'\n'

Output:

Json dump: 
42.5809804916

Pickle dump:
52.87407804489

cPickle dump:
1.1903790187836

Json load:
12.240660209656

pickle load:
24.48748306274

cPickle load:
24.4888298893

I found that cPickle takes less time to dump and to load, but loading the file still takes a long time.

Best Answer

Try using the json library instead of pickle. It should be an option in your case, since you are dealing with a dictionary of relatively simple objects.
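A minimal sketch of that approach (my own illustration, assuming the dictionary only holds JSON-serializable values such as strings, numbers, lists and nested dicts; keep in mind that JSON keys always come back as strings):

import json

# hypothetical stand-in for the 300 MB dictionary from the question
mydict = {"alpha": 1, "beta": [2.5, 3.5], "gamma": {"nested": "value"}}

# dump to a text file
with open('myfile1.json', 'w') as f:
    json.dump(mydict, f)

# load it back
with open('myfile1.json', 'r') as f:
    mydict = json.load(f)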

According to this website,

JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps).

See also this question: What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?

Upgrading Python, or using the marshal module with a fixed Python version, also helps with speed (code adapted from here):

try: import cPickle
except: import pickle as cPickle
import pickle
import json, marshal, random
from time import time
from hashlib import md5

test_runs = 1000

if __name__ == "__main__":
    payload = {
        "float": [(random.randrange(0, 99) + random.random()) for i in range(1000)],
        "int": [random.randrange(0, 9999) for i in range(1000)],
        "str": [md5(str(random.random()).encode('utf8')).hexdigest() for i in range(1000)]
    }
    modules = [json, pickle, cPickle, marshal]

    for payload_type in payload:
        data = payload[payload_type]
        for module in modules:
            start = time()
            if module.__name__ in ['pickle', 'cPickle']:
                for i in range(test_runs): serialized = module.dumps(data, protocol=-1)
            else:
                for i in range(test_runs): serialized = module.dumps(data)
            w = time() - start
            start = time()
            for i in range(test_runs):
                unserialized = module.loads(serialized)
            r = time() - start
            print("%s %s W %.3f R %.3f" % (module.__name__, payload_type, w, r))

Results:

C:\Python27\python.exe -u "serialization_benchmark.py"
json int W 0.125 R 0.156
pickle int W 2.808 R 1.139
cPickle int W 0.047 R 0.046
marshal int W 0.016 R 0.031
json float W 1.981 R 0.624
pickle float W 2.607 R 1.092
cPickle float W 0.063 R 0.062
marshal float W 0.047 R 0.031
json str W 0.172 R 0.437
pickle str W 5.149 R 2.309
cPickle str W 0.281 R 0.156
marshal str W 0.109 R 0.047

C:\pypy-1.6\pypy-c -u "serialization_benchmark.py"
json int W 0.515 R 0.452
pickle int W 0.546 R 0.219
cPickle int W 0.577 R 0.171
marshal int W 0.032 R 0.031
json float W 2.390 R 1.341
pickle float W 0.656 R 0.436
cPickle float W 0.593 R 0.406
marshal float W 0.327 R 0.203
json str W 1.141 R 1.186
pickle str W 0.702 R 0.546
cPickle str W 0.828 R 0.562
marshal str W 0.265 R 0.078

c:\Python34\python -u "serialization_benchmark.py"
json int W 0.203 R 0.140
pickle int W 0.047 R 0.062
pickle int W 0.031 R 0.062
marshal int W 0.031 R 0.047
json float W 1.935 R 0.749
pickle float W 0.047 R 0.062
pickle float W 0.047 R 0.062
marshal float W 0.047 R 0.047
json str W 0.281 R 0.187
pickle str W 0.125 R 0.140
pickle str W 0.125 R 0.140
marshal str W 0.094 R 0.078
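In the runs above, marshal is consistently the fastest writer and reader. A minimal round-trip sketch for a dictionary like the one in the question (my own illustration, not code from the answer; marshal only handles built-in types, its file format is specific to the Python version that wrote it, and it is not hardened against untrusted input, so it is best reserved for short-lived caches):

import marshal

# placeholder for the real dictionary; marshal supports built-in types only
mydict = {"ints": [1, 2, 3], "strs": ["a", "b"]}

with open('myfile.marshal', 'wb') as f:
    marshal.dump(mydict, f)

with open('myfile.marshal', 'rb') as f:
    mydict = marshal.load(f)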

Python 3.4 uses pickle protocol 3 as default, which made no difference compared to protocol 4. Python 2 has protocol 2 as its highest pickle protocol (selected if a negative value is passed to dump), which is twice as slow as protocol 3.
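As a concrete sketch of that protocol advice (again my own illustration): pass a negative protocol to dump so the highest available protocol is used, and make sure both dumping and loading go through the C implementation (cPickle on Python 2; on Python 3 the pickle module uses its C accelerator automatically):

try:
    import cPickle as pickle   # Python 2: C implementation
except ImportError:
    import pickle              # Python 3: C accelerator is used automatically

mydict = {"example": [1, 2, 3]}  # placeholder for the 300 MB dictionary

# protocol=-1 selects the highest protocol the running interpreter supports
with open('myfile.pkl', 'wb') as f:
    pickle.dump(mydict, f, protocol=-1)

with open('myfile.pkl', 'rb') as f:
    mydict = pickle.load(f)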

Regarding python - How to reduce the time it takes to load a pickle file in python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26860051/
