TensorFlow : Enqueuing and dequeuing a queue from multiple threads

Reposted by 行者123. Last updated: 2023-12-03 00:43:07

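The problem I am trying to solve is as follows: I have a list, trainimgs, of filenames. I have defined a tf.RandomShuffleQueue with capacity=len(trainimgs) and min_after_dequeue=0.

This tf.RandomShuffleQueue is expected to be filled with trainimgs a specified number of times, epochlimit.
A number of threads are expected to work in parallel. Each thread dequeues an element from the tf.RandomShuffleQueue, performs some operations on it, and enqueues the result to another queue. I have got that part right.
However, once 1 epoch of trainimgs has been processed and the tf.RandomShuffleQueue is empty, then, provided that the current epoch e < epochlimit, the queue must be filled up again and the threads must resume work.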
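
To make the intended behaviour concrete, here is a minimal single-threaded sketch in plain Python, with no TensorFlow involved. The function name epoch_shuffled and the sample filenames are hypothetical; the sketch only illustrates the semantics described above: every filename is handed out exactly once per epoch, in shuffled order, for epochlimit epochs.

```python
import random


def epoch_shuffled(filenames, epochlimit, seed=0):
    """Yield (epoch, filename) pairs: each filename once per epoch, shuffled."""
    rng = random.Random(seed)
    for epoch in range(1, epochlimit + 1):
        batch = list(filenames)   # "refill the queue" for this epoch
        rng.shuffle(batch)        # stand-in for the random dequeue order
        for name in batch:
            yield epoch, name


if __name__ == "__main__":
    trainimgs = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical filenames
    for epoch, name in epoch_shuffled(trainimgs, epochlimit=2):
        print(name, epoch)
```

The multithreaded version has to preserve exactly this property while many workers pull from the queue at once, which is where the difficulty lies.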

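The good news is: I have got it working for a certain scenario (see the P.S. at the end!)

The bad news is: I think there is a better way of doing this.

The method I currently use is as follows (I have simplified the functions and removed the image-processing-based preprocessing and subsequent enqueuing, but the heart of the processing remains the same!):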

import threading

import tensorflow as tf  # TensorFlow 1.x API

# trainimgs (the list of filenames) and work() (shown below) are assumed to be defined.
with tf.Session() as sess:
    # Queue of filenames; its capacity holds exactly one epoch.
    train_filename_queue = tf.RandomShuffleQueue(capacity=len(trainimgs), min_after_dequeue=0, dtypes=tf.string, seed=0)
    queue_size = train_filename_queue.size()
    trainimgtensor = tf.constant(trainimgs)
    close_queue = train_filename_queue.close()
    # Epoch counter shared by all threads.
    epoch = tf.Variable(initial_value=1, trainable=False, dtype=tf.int32)
    incrementepoch = tf.assign(epoch, epoch + 1, use_locking=True)
    supplyimages = train_filename_queue.enqueue_many(trainimgtensor)
    value = train_filename_queue.dequeue()

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init_op)
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(sess, coord)
    sess.run(supplyimages)  # fill the queue for the first epoch
    lock = threading.Lock()
    threads = [threading.Thread(target=work, args=(coord, value, sess, epoch, incrementepoch, supplyimages, queue_size, lock, close_queue)) for i in range(200)]
    for t in threads:
        t.start()
    coord.join(threads)

The work function is as follows:

def work(coord, val, sess, epoch, incrementepoch, supplyimg, q, lock, close_op):
    while not coord.should_stop():
        if sess.run(q) > 0:
            filename, currepoch = sess.run([val, epoch])
            filename = filename.decode(encoding='UTF-8')
            print(filename + ' ' + str(currepoch))
        elif sess.run(epoch) < 2:
            lock.acquire()
            try:
                if sess.run(q) == 0:
                    print("The previous epoch = %d"%(sess.run(epoch)))
                    sess.run([incrementepoch, supplyimg])
                    sz = sess.run(q)
                    print("The new epoch = %d"%(sess.run(epoch)))
                    print("The new queue size = %d"%(sz))
            finally:
                lock.release()
        else:
            try:
                sess.run(close_op)
            except tf.errors.CancelledError:
                print('Queue already closed.')
            coord.request_stop()
    return None

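So, although this works, I have a feeling that there is a better and cleaner way to achieve this. In a nutshell, my questions are:

Is there a simpler and cleaner way of achieving this task in TensorFlow?
Is there any problem with this code's logic? I am not very experienced with multithreading, so it would help me a lot to hear about any obvious faults that have escaped my attention.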

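P.S.: It seems this code is not perfect after all. When I ran it with 1.2 million images and 200 threads, it ran. However, when I ran it with 10 images and 20 threads, it gave the following error: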

CancelledError (see above for traceback): RandomShuffleQueue '_0_random_shuffle_queue' is closed.
     [[Node: random_shuffle_queue_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](random_shuffle_queue, Const)]]

I thought that was covered by except tf.errors.CancelledError. What is going on here?
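
One possible reading of the traceback, offered as an interpretation rather than something stated in the post: the failing node is random_shuffle_queue_EnqueueMany, so the exception would come out of sess.run([incrementepoch, supplyimg]) in the refill branch, whereas the try/except only wraps sess.run(close_op). A thread can pass the epoch check, wait for the lock, and then attempt the refill after another thread has already closed the queue. The sketch below replays that ordering by hand with a plain-Python stand-in; ClosableQueue, QueueClosedError and replay_interleaving are hypothetical names, not TensorFlow APIs.

```python
class QueueClosedError(Exception):
    """Stand-in for tf.errors.CancelledError in this sketch."""


class ClosableQueue:
    """Tiny single-threaded stand-in for a closable queue."""

    def __init__(self):
        self.items = []
        self.closed = False

    def size(self):
        return len(self.items)

    def enqueue_many(self, values):
        if self.closed:
            raise QueueClosedError("queue is closed")
        self.items.extend(values)

    def dequeue(self):
        return self.items.pop()

    def close(self):
        self.closed = True


def replay_interleaving(filenames, maxepochs=2):
    """Replay one problematic ordering of threads A, B and C by hand."""
    q = ClosableQueue()
    q.enqueue_many(filenames)      # initial fill, epoch 1
    epoch = 1
    while q.size() > 0:            # epoch 1 is consumed
        q.dequeue()

    # Thread A: sees an empty queue and epoch < maxepochs, then waits for the lock.
    a_passed_epoch_check = q.size() == 0 and epoch < maxepochs

    # Thread B: holds the lock, refills the queue and bumps the epoch.
    epoch += 1
    q.enqueue_many(filenames)
    while q.size() > 0:            # a small queue drains almost immediately
        q.dequeue()

    # Thread C: sees an empty queue and epoch == maxepochs, so it closes the queue.
    if q.size() == 0 and epoch >= maxepochs:
        q.close()

    # Thread A: finally gets the lock; it re-checks the size but not the epoch.
    if a_passed_epoch_check and q.size() == 0:
        try:
            # The question's code has no except clause around the refill;
            # the error is caught here only so the sketch can report it.
            q.enqueue_many(filenames)
        except QueueClosedError as err:
            return "refill failed: %s" % err
    return "no error"


if __name__ == "__main__":
    print(replay_interleaving(["img%d.jpg" % i for i in range(10)]))
```

If this reading is right, it would also fit the observation that the run with 1.2 million images succeeded: a long epoch leaves far less opportunity for a thread to be parked between the epoch check and the lock while the final epoch drains.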

Best answer

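I finally found the answer. The problem was that multiple threads were clashing with each other at various points in the work() function. The following work() function works perfectly.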

def work(coord, val, sess, epoch, maxepochs, incrementepoch, supplyimg, q, lock, close_op):
    print('I am thread number %s' % threading.current_thread().name)
    print('I can see a queue with size %d' % sess.run(q))
    while not coord.should_stop():
        # Hold the lock for the whole check-then-act step so that no other
        # thread can dequeue or refill between the check and the action.
        # "with" also releases the lock if sess.run() raises.
        with lock:
            if sess.run(q) > 0:
                filename, currepoch = sess.run([val, epoch])
                filename = filename.decode(encoding='UTF-8')
                tid = threading.current_thread().name
                print(filename + ' ' + str(currepoch) + ' thread ' + str(tid))
            elif sess.run(epoch) < maxepochs:
                # Queue is empty and epochs remain: refill for the next epoch.
                print('Thread %s has acquired the lock' % threading.current_thread().name)
                print("The previous epoch = %d" % sess.run(epoch))
                sess.run([incrementepoch, supplyimg])
                sz = sess.run(q)
                print("The new epoch = %d" % sess.run(epoch))
                print("The new queue size = %d" % sz)
            else:
                # All epochs are done: ask every thread to stop.
                coord.request_stop()

    return None
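
For readers without a TensorFlow 1.x environment, the locking pattern of this answer can be tried out with the standard library alone. This is a sketch under assumptions, not the answer's code: a collections.deque shuffled with random stands in for tf.RandomShuffleQueue, a threading.Event stands in for tf.train.Coordinator, and run_epochs and worker are hypothetical names. As in the answer, the size check, the dequeue and the refill all happen while the lock is held, so no thread can act on a stale size or epoch.

```python
import collections
import random
import threading


def run_epochs(filenames, maxepochs, num_threads=8, seed=0):
    """Process every filename once per epoch using lock-guarded workers."""
    rng = random.Random(seed)
    lock = threading.Lock()
    stop = threading.Event()            # stand-in for the Coordinator
    state = {"epoch": 1}
    first = list(filenames)
    rng.shuffle(first)
    pending = collections.deque(first)  # stand-in for the shuffle queue
    processed = []                      # (filename, epoch) pairs

    def worker():
        while not stop.is_set():
            with lock:                  # check, dequeue and refill are one atomic step
                if pending:
                    processed.append((pending.popleft(), state["epoch"]))
                elif state["epoch"] < maxepochs:
                    state["epoch"] += 1
                    refill = list(filenames)
                    rng.shuffle(refill)
                    pending.extend(refill)
                else:
                    stop.set()          # last epoch finished: ask everyone to stop

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    result = run_epochs(["img%d.jpg" % i for i in range(10)], maxepochs=3)
    print(len(result))
```

Because the dequeue happens under the lock, the workers take items one at a time; any expensive per-item work would have to run after the lock is released for the threads to overlap.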

Regarding "TensorFlow : Enqueuing and dequeuing a queue from multiple threads", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42514206/
