
python - Why does a numpy array read from a file consume so much memory?


The file contains 2,000,000 lines; each line has 208 columns, separated by commas, like this:

0.0863314058048,0.0208767447842,0.03358010485,0.0,1.0,0.0,0.314285714286,0.336293217457,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0

The program reads this file into a numpy ndarray, and I expected it to consume roughly (2,000,000 * 208 * 8 B) = 3.2 GB of memory. However, while reading the file the program actually uses about 20 GB.
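For reference, the expected footprint can be computed without allocating anything; a minimal sketch, with the shape and float64 dtype taken from the question:

import numpy as np

# Theoretical size of a (2000000, 208) float64 array, computed without allocating it
nrows, ncols = 2000000, 208
itemsize = np.dtype(np.float64).itemsize   # 8 bytes per element
print(nrows * ncols * itemsize / 1e9)      # ~3.3 GB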

I'm confused: why does my program consume so much more memory than expected?

Best Answer

I'm using Numpy 1.9.0, and the memory inefficiency of np.loadtxt() and np.genfromtxt() seems to be directly related to the fact that they store the data in temporary lists:

  • see here for np.loadtxt()
  • and here for np.genfromtxt()
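A minimal sketch of why those temporary lists blow up the footprint (assuming a 64-bit CPython build; exact figures vary by interpreter): each parsed value is first held as a separate Python float object inside nested lists, which costs several times the 8 bytes a float64 occupies in the final array:

import sys

# Cost of holding one 208-column row as a Python list of float objects,
# compared with 208 * 8 bytes of raw float64 storage (64-bit CPython figures)
row = [0.0] * 208
per_float = sys.getsizeof(0.0)       # ~24 bytes per float object
list_overhead = sys.getsizeof(row)   # list header plus one pointer per element
ratio = (per_float * 208 + list_overhead) / (208 * 8)
print(per_float, list_overhead, ratio)  # roughly 4x the raw array cost

That overhead, on top of the final array itself, goes a long way toward explaining a ~3.2 GB array ballooning to ~20 GB while it is being loaded.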

Knowing the shape of the array beforehand, you can write a file reader that consumes an amount of memory very close to the theoretical amount (3.2 GB in this case), because it stores the data directly with the corresponding dtype:

import numpy as np

def read_large_txt(path, delimiter=None, dtype=None):
    with open(path) as f:
        # First pass: count the rows
        nrows = sum(1 for line in f)
        f.seek(0)
        # Peek at the first line to get the number of columns
        ncols = len(next(f).split(delimiter))
        # Pre-allocate the full array with the target dtype
        out = np.empty((nrows, ncols), dtype=dtype)
        f.seek(0)
        # Second pass: parse each line directly into the pre-allocated array
        for i, line in enumerate(f):
            out[i] = line.split(delimiter)
    return out
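A hypothetical call for the file described in the question (the filename data.csv is a placeholder):

# 'data.csv' stands in for the 2,000,000-line comma-separated file
data = read_large_txt('data.csv', delimiter=',', dtype=np.float64)
print(data.shape, data.nbytes / 1e9)  # (2000000, 208) and ~3.3 GB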

Regarding "python - Why does a numpy array read from a file consume so much memory?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26569852/
