
python - Tensorflow Dataset.from_tensor_slices takes too long


I have the following code:

import numpy as np
import tensorflow as tf

data = np.load("data.npy")
print(data)  # Makes sure the array gets loaded in memory
dataset = tf.contrib.data.Dataset.from_tensor_slices(data)

The file "data.npy" is 3.3 GB. Reading the file with numpy takes a few seconds, but the next line, which creates the tensorflow dataset object, takes a very long time to execute. Why is that? What is it doing under the hood?

Best answer

Quoting from this answer:

np.load of a npz just returns a file loader, not the actual data. It's a 'lazy loader', loading the particular array only when accessed.

That is why it is fast.
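To illustrate the difference between the lazy and eager load paths, here is a minimal numpy sketch (the archive.npz file name is hypothetical; only data.npy appears in the question):

import numpy as np

# An .npz archive: np.load returns an NpzFile, a lazy dict-like loader
# that reads nothing from disk until an array is accessed.
archive = np.load("archive.npz")   # hypothetical .npz file; returns almost instantly
x = archive["arr_0"]               # the array data is only read from disk here

# A plain .npy file: np.load reads the whole array eagerly...
data = np.load("data.npy")

# ...unless a memory-mapped view is requested, which is also lazy:
data_mapped = np.load("data.npy", mmap_mode="r")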

Edit 1: To expand on this answer further, quoting from tensorflow's documentation:

If all of your input data fit in memory, the simplest way to create a Dataset from them is to convert them to tf.Tensor objects and use Dataset.from_tensor_slices().

This works well for a small dataset, but wastes memory---because the contents of the array will be copied multiple times---and can run into the 2GB limit for the tf.GraphDef protocol buffer.

The linked documentation also shows how to do this efficiently.
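Concretely, the pattern that guide describes (for the TF 1.x API used in the question) is to define the dataset in terms of a tf.placeholder and feed the array in once when the iterator is initialized, so the data is never serialized into the graph as a constant. A minimal sketch under those assumptions:

import numpy as np
import tensorflow as tf

data = np.load("data.npy")

# Define the dataset on a placeholder rather than on the array itself,
# so the 3.3 GB of data is not embedded in the GraphDef.
data_placeholder = tf.placeholder(data.dtype, data.shape)
dataset = tf.contrib.data.Dataset.from_tensor_slices(data_placeholder)

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    # The array is fed exactly once, at iterator initialization time.
    sess.run(iterator.initializer, feed_dict={data_placeholder: data})
    first_slice = sess.run(next_element)

This keeps the graph small, which is why from_tensor_slices returns quickly here: the slow step in the question is the copy of the whole array into a graph constant.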

On the topic of "python - Tensorflow Dataset.from_tensor_slices takes too long", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46856003/
