python - TFRecordReader seems very slow, and multi-threaded reading doesn't work

Reposted by: 太空狗 | Updated: 2023-10-29 20:44:43

My training process uses TFRecord-format datasets for both training and evaluation.

I benchmarked the reader and got only 8,000 records/second, and the I/O throughput (as reported by iotop) is only 400-500 KB/s.

I'm using the C++ implementation of protobuf, as described here:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#protobuf-library-related-issues

If possible, provide a minimal reproducible example (we usually don't have time to read hundreds of lines of your code):

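```python
import tensorflow as tf  # TensorFlow 1.x queue-based input API

# filename_queue, batch_size, thread_number, capacity and min_after_dequeue
# are defined elsewhere in the benchmark (not shown in the question).

def read_and_decode(filename_queue):
    # Returns one serialized tf.train.Example string per read; parsing is
    # deferred until after batching (see tf.parse_example below).
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    return serialized_example

serialized_example = read_and_decode(filename_queue)
batch_serialized_example = tf.train.shuffle_batch(
    [serialized_example],
    batch_size=batch_size,
    num_threads=thread_number,
    capacity=capacity,
    min_after_dequeue=min_after_dequeue)
features = tf.parse_example(
    batch_serialized_example,
    features={
        "label": tf.FixedLenFeature([], tf.float32),
        "ids": tf.VarLenFeature(tf.int64),
        "values": tf.VarLenFeature(tf.float32),
    })
```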

What other solutions have you tried?

I tried setting num_threads in tf.train.shuffle_batch, but it didn't help.

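With 2 threads it runs at about 8,000 records/second, and as I increase the number of threads it gets slower. (I removed all CPU-heavy ops; the pipeline only reads data.)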

My server has 24 CPU cores.

Best answer

The problem here is that every session.run has a fixed overhead, so filling a queue with many small examples, one per call, is slow.

Specifically, each session.run takes about 100-200 µs, so you can only make about 5k-10k session.run calls per second.
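As a rough back-of-the-envelope model (an assumption: it ignores per-item work, queue contention and I/O), a fixed per-call cost caps the call rate at 1/overhead, and moving k items per call raises the item ceiling by a factor of k. The function names and the batch size of 100 below are hypothetical, chosen only for illustration:

```python
def max_calls_per_second(overhead_us: float) -> float:
    """Upper bound on calls per second when each call costs `overhead_us` microseconds."""
    return 1_000_000 / overhead_us


def max_items_per_second(overhead_us: float, items_per_call: int) -> float:
    """Upper bound on items per second when every call moves `items_per_call` items.

    Only the fixed per-call overhead is modeled; per-item work is ignored.
    """
    return items_per_call * max_calls_per_second(overhead_us)


for overhead in (100, 200):  # the 100-200 us range quoted above
    print(f"{overhead} us/call: {max_calls_per_second(overhead):,.0f} calls/s")

# Hypothetical: enqueueing 100 examples per call at 200 us/call.
print(f"{max_items_per_second(200, 100):,.0f} items/s upper bound")
```

The 100-200 µs range maps to 10,000 and 5,000 calls per second, which matches the 5k-10k figure; the batched line only shows how the ceiling scales, not a throughput you should expect in practice.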

The problem is obvious if you profile in Python (python -m cProfile), but hard to see if you start from a timeline profile or a CPU profile.
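To see why Python-level profiling exposes this, here is a small pure-Python sketch that does not use TensorFlow. fake_session_run is a hypothetical stand-in with an assumed fixed cost per call; in the profile, its ncalls grows with the number of records when items are sent one at a time, while a grouped variant makes only a handful of calls:

```python
import cProfile
import pstats
import time


def fake_session_run(items):
    """Hypothetical stand-in for session.run: a fixed cost per call, regardless of len(items)."""
    time.sleep(0.0001)  # assumed ~100 us of fixed overhead, not a measured value
    return len(items)


def enqueue_one_at_a_time(records):
    for record in records:
        fake_session_run([record])


def enqueue_in_batches(records, batch_size):
    for start in range(0, len(records), batch_size):
        fake_session_run(records[start:start + batch_size])


records = list(range(500))
profiler = cProfile.Profile()
profiler.enable()
enqueue_one_at_a_time(records)
enqueue_in_batches(records, batch_size=100)
profiler.disable()

# ncalls for fake_session_run (500 + 5) and the cumulative times of the two
# callers make the per-call overhead visible at a glance.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats("fake_session_run|enqueue_")
```

In the printed table, enqueue_one_at_a_time accounts for almost all of the cumulative time, which is the pattern a cProfile run of a queue-feeding loop would reveal.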

The workaround is to use enqueue_many to add items to your queue in batches. I took your benchmark from https://gist.github.com/ericyue/7705407a88e643f7ab380c6658f641e8 and modified it to enqueue many items per .run call, which gave a 10x speedup.

The change is to the batching call (tf.train.shuffle_batch), as follows:

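```python
if enqueue_many:
    # Read ZLIB-compressed TFRecord files, as in the benchmark gist.
    reader = tf.TFRecordReader(options=tf.python_io.TFRecordOptions(
        tf.python_io.TFRecordCompressionType.ZLIB))
    # Build enqueue_many_size read ops, so that a single enqueue (one
    # session.run by a queue-runner thread) pushes a whole list of examples.
    queue_batch = []
    for _ in range(enqueue_many_size):
        _, serialized_example = reader.read(filename_queue)
        queue_batch.append(serialized_example)
    batch_serialized_example = tf.train.shuffle_batch(
        [queue_batch],
        batch_size=batch_size,
        num_threads=thread_number,
        capacity=capacity,
        min_after_dequeue=min_after_dequeue,
        # Treat the first dimension of queue_batch as separate examples.
        enqueue_many=True)
```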

For the full source code, see: https://github.com/yaroslavvb/stuff/blob/master/ericyue-slowreader/benchmark.py
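The semantics of enqueue_many=True can be modeled with a toy queue in plain Python. ToyExampleQueue is a hypothetical class, not TensorFlow's queue API: one enqueue_many call adds a whole slice of examples, yet dequeue_many still returns individual examples, so the consumer sees the same data while the producer makes far fewer enqueue calls. The value enqueue_many_size = 100 is likewise an arbitrary choice:

```python
from collections import deque


class ToyExampleQueue:
    """Toy FIFO of examples (hypothetical, not the TensorFlow API) that counts enqueue calls."""

    def __init__(self):
        self._items = deque()
        self.enqueue_calls = 0

    def enqueue(self, example):
        # One call adds one example, like enqueue_many=False.
        self.enqueue_calls += 1
        self._items.append(example)

    def enqueue_many(self, examples):
        # One call adds every element along the first dimension as a separate
        # example, which is how enqueue_many=True interprets its input.
        self.enqueue_calls += 1
        self._items.extend(examples)

    def dequeue_many(self, n):
        # The consumer still receives individual examples, in FIFO order.
        return [self._items.popleft() for _ in range(n)]


single = ToyExampleQueue()
for example in range(1000):
    single.enqueue(example)

batched = ToyExampleQueue()
enqueue_many_size = 100  # hypothetical value
for start in range(0, 1000, enqueue_many_size):
    batched.enqueue_many(range(start, start + enqueue_many_size))

print(single.enqueue_calls, batched.enqueue_calls)
print(batched.dequeue_many(5))
```

The toy queue does no shuffling; it only shows that batching the enqueue side changes the number of calls, not what the consumer receives.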

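It's hard to optimize this to run much faster, because most of the time is now spent in queue operations. A stripped-down version that only adds integers to a queue reaches a similar speed, and its timeline shows that the time is spent in dequeue ops.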

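Each dequeue op takes about 60 µs, but on average 5 run in parallel, so the effective cost is 12 µs per dequeue. This means that, in the best case, you will get fewer than 200k examples per second.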
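The bound follows from simple arithmetic on the quoted figures, assuming the 5 parallel dequeues overlap perfectly: 60 µs / 5 = 12 µs per dequeue, or roughly 83k dequeue operations per second, which sits below the 200k-per-second ceiling stated above. The variable names are illustrative:

```python
dequeue_latency_us = 60.0  # per dequeue op, as quoted above
parallel_dequeues = 5      # average number of dequeue ops running concurrently

effective_us = dequeue_latency_us / parallel_dequeues  # effective cost per dequeue
dequeues_per_second = 1_000_000 / effective_us

print(f"{effective_us:.0f} us per dequeue -> {dequeues_per_second:,.0f} dequeues/s")
```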

For "python - TFRecordReader seems very slow, and multi-threaded reading doesn't work", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41647784/
