
python - Speed of writing a numpy array to a text file

Reposted. Author: 行者123. Updated: 2023-12-04 23:36:15

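I need to write a very "tall" two-column array to a text file, and it is very slow. I found that if I reshape the array into a wider one, writing is much faster.
For example: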

import time
import numpy as np

# Three arrays with the same number of elements but different shapes
dataMat1 = np.random.rand(1000, 1000)
dataMat2 = np.random.rand(2, 500000)
dataMat3 = np.random.rand(500000, 2)

# Square array
start = time.perf_counter()
with open('test1.txt', 'w') as f:
    np.savetxt(f, dataMat1, fmt='%g', delimiter=' ')
end = time.perf_counter()
print(end - start)

# Wide array: 2 rows
start = time.perf_counter()
with open('test2.txt', 'w') as f:
    np.savetxt(f, dataMat2, fmt='%g', delimiter=' ')
end = time.perf_counter()
print(end - start)

# Tall array: 500000 rows
start = time.perf_counter()
with open('test3.txt', 'w') as f:
    np.savetxt(f, dataMat3, fmt='%g', delimiter=' ')
end = time.perf_counter()
print(end - start)

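Given that the three data matrices contain the same number of elements, why does the last one take so much longer than the other two? Is there any way to speed up writing a "tall" data array?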

Best answer

As hpaulj pointed out, savetxt loops through the rows of X and formats each row individually:

for row in X:
    try:
        v = format % tuple(row) + newline
    except TypeError:
        raise TypeError("Mismatch between array dtype ('%s') and "
                        "format specifier ('%s')"
                        % (str(X.dtype), format))
    fh.write(v)
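
To make the cost of that loop concrete, here is a minimal sketch that mimics it and counts the interpolation calls. The helper name format_rows is hypothetical and is not part of numpy; the shapes are scaled down from the question's so the demo runs quickly.

```python
import numpy as np

def format_rows(X, fmt="%g", delimiter=" ", newline="\n"):
    """Format a 2-D array one row at a time, like the loop above.

    Returns the formatted lines and the number of '%' calls made.
    (Hypothetical helper for illustration only.)
    """
    row_format = delimiter.join([fmt] * X.shape[1])
    lines = []
    calls = 0
    for row in X:
        lines.append(row_format % tuple(row) + newline)
        calls += 1
    return lines, calls

# Same number of elements (10000) in each array, very different call counts.
for shape in [(100, 100), (2, 5000), (5000, 2)]:
    _, calls = format_rows(np.random.rand(*shape))
    print(shape, "->", calls, "interpolation calls")
```

The number of calls equals the number of rows, so a tall array pays the per-call overhead far more often than a wide array of the same size.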

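I think the main time-killer here is all the string interpolation calls: one per row, so 500000 of them for the tall array.
If we pack all the string interpolation into a single call, things go much faster: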
with open('/tmp/test4.txt', 'w') as f:
    # Build one format string covering the whole array, then interpolate once
    fmt = ' '.join(['%g'] * dataMat3.shape[1])
    fmt = '\n'.join([fmt] * dataMat3.shape[0])
    data = fmt % tuple(dataMat3.ravel())
    f.write(data)

The full benchmark:

import time
import numpy as np

dataMat1 = np.random.rand(1000, 1000)
dataMat2 = np.random.rand(2, 500000)
dataMat3 = np.random.rand(500000, 2)

# test1: square array with savetxt
start = time.perf_counter()
with open('/tmp/test1.txt', 'w') as f:
    np.savetxt(f, dataMat1, fmt='%g', delimiter=' ')
end = time.perf_counter()
print(end - start)

# test2: wide array with savetxt
start = time.perf_counter()
with open('/tmp/test2.txt', 'w') as f:
    np.savetxt(f, dataMat2, fmt='%g', delimiter=' ')
end = time.perf_counter()
print(end - start)

# test3: tall array with savetxt
start = time.perf_counter()
with open('/tmp/test3.txt', 'w') as f:
    np.savetxt(f, dataMat3, fmt='%g', delimiter=' ')
end = time.perf_counter()
print(end - start)

# test4: tall array, all interpolation packed into one call
start = time.perf_counter()
with open('/tmp/test4.txt', 'w') as f:
    fmt = ' '.join(['%g'] * dataMat3.shape[1])
    fmt = '\n'.join([fmt] * dataMat3.shape[0])
    data = fmt % tuple(dataMat3.ravel())
    f.write(data)
end = time.perf_counter()
print(end - start)

which reports (times in seconds, in the order test1 to test4):
0.1604848340011813
0.17416274400056864
0.6634929459996783
0.16207673999997496
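
One caveat with the single-call approach is that it builds the format string, the tuple of values, and the output string for the whole array in memory at once, and it does not write a newline after the last row the way savetxt does. A possible middle ground, sketched here under assumptions, is to interpolate a block of rows per call. The helper write_rows_chunked and its chunk_rows default are hypothetical choices, not anything from numpy, and this variant has not been timed against the numbers above.

```python
import io
import numpy as np

def write_rows_chunked(f, X, fmt="%g", delimiter=" ", chunk_rows=10000):
    """Write 2-D array X to the text file object f.

    Rows are formatted chunk_rows at a time, so there is one '%' call
    per chunk instead of one per row. (Hypothetical helper.)
    """
    # Format for a single row, including the trailing newline.
    row_format = delimiter.join([fmt] * X.shape[1]) + "\n"
    for start in range(0, X.shape[0], chunk_rows):
        chunk = X[start:start + chunk_rows]
        # Repeat the row format once per row in this chunk, interpolate once.
        f.write((row_format * chunk.shape[0]) % tuple(chunk.ravel()))

# Compare against np.savetxt on a small array, using in-memory buffers.
X = np.random.rand(25, 2)
buf_fast, buf_ref = io.StringIO(), io.StringIO()
write_rows_chunked(buf_fast, X, chunk_rows=10)
np.savetxt(buf_ref, X, fmt="%g", delimiter=" ")
print(buf_fast.getvalue() == buf_ref.getvalue())
```

Because every row ends with a newline here, the output should match what savetxt writes for the same fmt and delimiter. Whether this is faster than the single call will depend on the chunk size and the machine.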

Regarding python - Speed of writing a numpy array to a text file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53820891/
