
python - What is the most scalable way to use generators with tf.data? The tf.data guide says `from_generator` has limited scalability


tf.data has a from_generator initializer, but it does not appear to be scalable. From the official guide:

Caution: While this is a convenient approach it has limited portability and scalability. It must run in the same python process that created the generator, and is still subject to the Python GIL.



https://www.tensorflow.org/guide/data#consuming_python_generators

And in the official documentation:

NOTE: The current implementation of Dataset.from_generator() uses tf.numpy_function and inherits the same constraints. In particular, it requires the Dataset- and Iterator-related operations to be placed on a device in the same process as the Python program that called Dataset.from_generator(). The body of generator will not be serialized in a GraphDef, and you should not use this method if you need to serialize your model and restore it in a different environment.

NOTE: If generator depends on mutable global variables or other external state, be aware that the runtime may invoke generator multiple times (in order to support repeating the Dataset) and at any time between the call to Dataset.from_generator() and the production of the first element from the generator. Mutating global variables or external state can cause undefined behavior, and we recommend that you explicitly cache any external state in generator before calling Dataset.from_generator().



https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator
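For reference, this is the pattern the warnings are about: a plain Python generator wrapped with `Dataset.from_generator`. The generator, shapes, and dtypes below are a made-up toy example, not taken from the original question; the point is that the resulting dataset stays bound to this Python process and its GIL.

```python
import numpy as np
import tensorflow as tf

def sample_generator():
    # Hypothetical stand-in for a real data-loading generator.
    for i in range(1000):
        features = np.random.rand(4).astype(np.float32)
        label = np.int64(i % 2)
        yield features, label

# The from_generator pattern the guide cautions about.
dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(4,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

for features, label in dataset.take(2):
    print(features.numpy(), label.numpy())
```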

However, generators are a fairly common way to train on very large amounts of data, so there must be some alternative best practice, yet the official TensorFlow data guide does not offer one.

Best Answer

Iterate over your generator and write the data out to TFRecord files, then use TFRecordDataset. Here is the guide:
https://www.tensorflow.org/tutorials/load_data/tfrecord
TF is designed to consume this kind of dataset efficiently across multiple GPUs.
Sharding the data across files on disk also improves shuffling.
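A minimal sketch of that workflow, assuming the same toy generator and a simple (features, label) record layout (the file names, shard count, and feature keys are illustrative, not from the answer): drain the generator once into sharded TFRecord files, then build the input pipeline from TFRecordDataset.

```python
import numpy as np
import tensorflow as tf

def sample_generator():
    # Hypothetical stand-in for the real data generator.
    for i in range(1000):
        yield np.random.rand(4).astype(np.float32), np.int64(i % 2)

def serialize_example(features, label):
    # Pack one (features, label) pair into a tf.train.Example proto.
    example = tf.train.Example(features=tf.train.Features(feature={
        "features": tf.train.Feature(
            float_list=tf.train.FloatList(value=features.tolist())),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label)])),
    }))
    return example.SerializeToString()

# Step 1: drain the generator once and write sharded TFRecord files.
num_shards = 4
writers = [tf.io.TFRecordWriter(f"data-{i:05d}-of-{num_shards:05d}.tfrecord")
           for i in range(num_shards)]
for i, (features, label) in enumerate(sample_generator()):
    writers[i % num_shards].write(serialize_example(features, label))
for w in writers:
    w.close()

# Step 2: read the shards back with TFRecordDataset.
feature_spec = {
    "features": tf.io.FixedLenFeature([4], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    return parsed["features"], parsed["label"]

files = tf.data.Dataset.list_files("data-*.tfrecord")
dataset = (files.interleave(tf.data.TFRecordDataset,
                            num_parallel_calls=tf.data.AUTOTUNE)
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(1024)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))
```

Because the records now live in ordinary files, the same pipeline can be reloaded in a different process or on multiple workers, and interleaving over shards gives coarse-grained shuffling on top of the in-memory shuffle buffer.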

Regarding this question, a similar discussion can be found on Stack Overflow: https://stackoverflow.com/questions/59131008/
