multithreading - Running hyperparameter optimization on parallel GPUs with tensorflow

Reposted by: 行者123 Updated: 2023-12-03 12:47:53

I have a training function that trains a tf model end to end (contrived here purely for illustration):

import os
import tensorflow as tf

def opt_fx(params, gpu):
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu

    sess = tf.Session()
    # Run some training on a particular gpu...
    sess.run(...)

I want to run hyperparameter optimization over 20 trials, with one model per GPU:

from threading import Thread
exp_trials = list(hyperparams.trials(num=20))
train_threads = []
for gpu_num, trial_params in zip(['0', '1', '2', '3']*5, exp_trials):
    t = Thread(target=opt_fx, args=(trial_params, gpu_num,))
    train_threads.append(t)

# Start the threads, and block on their completion.
for t in train_threads:
    t.start()

for t in train_threads:
    t.join()

However, this fails... What is the correct way to do this?
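
The question does not say how the run fails. One plausible cause, offered as an assumption rather than a diagnosis, is that os.environ belongs to the whole process, so setting CUDA_VISIBLE_DEVICES inside each thread cannot give each thread its own GPU. The sketch below uses only the standard library (no TensorFlow) to show two threads ending up with the same value; the names worker, seen and barrier are illustrative and not from the question.

```python
import os
import threading

seen = {}
barrier = threading.Barrier(2)


def worker(name, gpu):
    # This writes to the environment of the whole process, not of this thread.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu
    # Wait until both threads have written before either one reads.
    barrier.wait()
    seen[name] = os.environ["CUDA_VISIBLE_DEVICES"]


threads = [
    threading.Thread(target=worker, args=("t%d" % i, str(i)))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads report the same value: whichever write happened last.
print(seen)
```

Both entries in seen hold the same GPU id, so the per-thread assignment is lost before any session is created.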

Best answer

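I'm not sure this is the best way, but what I ended up doing was defining one graph per device and training each graph in a separate session. This can be parallelized. I tried to reuse the same graph across different devices, but that didn't work. Here is what my version looks like in code (a complete example):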

# Written against the TensorFlow 1.x API (tf.Session, tf.placeholder).
import threading
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Get the data
mnist = input_data.read_data_sets("data/mnist", one_hot=True)
train_x_all = mnist.train.images
train_y_all = mnist.train.labels
test_x = mnist.test.images
test_y = mnist.test.labels

# Define the graphs per device
devices = ['/gpu', '/cpu']        # just one GPU on this machine...
learning_rates = [0.01, 0.03]
jobs = []
for device, learning_rate in zip(devices, learning_rates):
  with tf.Graph().as_default() as graph:
    # Pin this graph's ops to its device; without tf.device the strings in
    # `devices` are only labels and TensorFlow chooses the placement itself.
    with tf.device(device):
      x = tf.placeholder(tf.float32, [None, 784], name='x')
      y = tf.placeholder(tf.float32, [None, 10], name='y')
      W = tf.Variable(tf.zeros([784, 10]))
      b = tf.Variable(tf.zeros([10]))
      pred = tf.nn.softmax(tf.matmul(x, W) + b)
      accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)), tf.float32))
      cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1), name='cost')
      optimize = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, name='optimize')
  jobs.append(graph)

# Train a graph on a device
def train(device, graph):
  print("Start training on %s" % device)
  with tf.Session(graph=graph) as session:
    x = graph.get_tensor_by_name('x:0')
    y = graph.get_tensor_by_name('y:0')
    cost = graph.get_tensor_by_name('cost:0')
    optimize = graph.get_operation_by_name('optimize')

    session.run(tf.global_variables_initializer())
    batch_size = 500
    for epoch in range(5):
      total_batch = int(train_x_all.shape[0] / batch_size)
      for i in range(total_batch):
        batch_x = train_x_all[i * batch_size:(i + 1) * batch_size]
        batch_y = train_y_all[i * batch_size:(i + 1) * batch_size]
        _, c = session.run([optimize, cost], feed_dict={x: batch_x, y: batch_y})
        if i % 20 == 0:
          print("Device %s: epoch #%d step=%d cost=%f" % (device, epoch, i, c))

# Start threads in parallel
train_threads = []
for i, graph in enumerate(jobs):
  train_threads.append(threading.Thread(target=train, args=(devices[i], graph)))
for t in train_threads:
  t.start()
for t in train_threads:
  t.join()

Note that the train function works with the tensors and operations of the graph in context, i.e. each cost and optimize is different.

This produces the following output, which shows that the two models are trained in parallel:

Start training on /gpu
Start training on /cpu
Device /cpu: epoch #0 step=0 cost=2.302585
Device /cpu: epoch #0 step=20 cost=1.788247
Device /cpu: epoch #0 step=40 cost=1.400490
Device /cpu: epoch #0 step=60 cost=1.271820
Device /gpu: epoch #0 step=0 cost=2.302585
Device /cpu: epoch #0 step=80 cost=1.128214
Device /gpu: epoch #0 step=20 cost=2.105802
Device /cpu: epoch #0 step=100 cost=0.927004
Device /cpu: epoch #1 step=0 cost=0.905336
Device /gpu: epoch #0 step=40 cost=1.908744
Device /cpu: epoch #1 step=20 cost=0.865687
Device /gpu: epoch #0 step=60 cost=1.808407
Device /cpu: epoch #1 step=40 cost=0.754765
Device /gpu: epoch #0 step=80 cost=1.676024
Device /cpu: epoch #1 step=60 cost=0.794201
Device /gpu: epoch #0 step=100 cost=1.513800
Device /gpu: epoch #1 step=0 cost=1.451422
Device /cpu: epoch #1 step=80 cost=0.786958
Device /gpu: epoch #1 step=20 cost=1.415125
Device /cpu: epoch #1 step=100 cost=0.643715
Device /cpu: epoch #2 step=0 cost=0.674683
Device /gpu: epoch #1 step=40 cost=1.273473
Device /cpu: epoch #2 step=20 cost=0.658424
Device /gpu: epoch #1 step=60 cost=1.300150
Device /cpu: epoch #2 step=40 cost=0.593681
Device /gpu: epoch #1 step=80 cost=1.242193
Device /cpu: epoch #2 step=60 cost=0.640543
Device /gpu: epoch #1 step=100 cost=1.105950
Device /gpu: epoch #2 step=0 cost=1.089900
Device /cpu: epoch #2 step=80 cost=0.664947
Device /gpu: epoch #2 step=20 cost=1.088389
Device /cpu: epoch #2 step=100 cost=0.535446
Device /cpu: epoch #3 step=0 cost=0.580295
Device /gpu: epoch #2 step=40 cost=0.983053
Device /cpu: epoch #3 step=20 cost=0.566510
Device /gpu: epoch #2 step=60 cost=1.044966
Device /cpu: epoch #3 step=40 cost=0.518787
Device /gpu: epoch #2 step=80 cost=1.025607
Device /cpu: epoch #3 step=60 cost=0.562461
Device /gpu: epoch #2 step=100 cost=0.897545
Device /gpu: epoch #3 step=0 cost=0.907381
Device /cpu: epoch #3 step=80 cost=0.600475
Device /gpu: epoch #3 step=20 cost=0.911914
Device /cpu: epoch #3 step=100 cost=0.477412
Device /cpu: epoch #4 step=0 cost=0.527233
Device /gpu: epoch #3 step=40 cost=0.827964
Device /cpu: epoch #4 step=20 cost=0.513356
Device /gpu: epoch #3 step=60 cost=0.897128
Device /cpu: epoch #4 step=40 cost=0.474257
Device /gpu: epoch #3 step=80 cost=0.898960
Device /cpu: epoch #4 step=60 cost=0.514083
Device /gpu: epoch #3 step=100 cost=0.774140
Device /gpu: epoch #4 step=0 cost=0.799004
Device /cpu: epoch #4 step=80 cost=0.559898
Device /gpu: epoch #4 step=20 cost=0.802869
Device /cpu: epoch #4 step=100 cost=0.440813
Device /gpu: epoch #4 step=40 cost=0.732562
Device /gpu: epoch #4 step=60 cost=0.801020
Device /gpu: epoch #4 step=80 cost=0.815830
Device /gpu: epoch #4 step=100 cost=0.692840

You can try it yourself with the standard MNIST data.

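This is not ideal if there are many hyperparameters to tune, but you should be able to write an outer loop that iterates over the possible hyperparameter tuples, assigns a particular graph to a device, and runs them as shown above.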
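
One way such an outer loop might look is sketched below. This is not part of the original answer: it assumes one worker thread per device, each pulling the next hyperparameter tuple from a shared queue so that a device starts a new trial as soon as it finishes the previous one. The names run_trials and fake_train and the device strings '/gpu:0' to '/gpu:3' are hypothetical; fake_train stands in for building a graph for the device and training it in its own session, as in the example above.

```python
import queue
import threading


def run_trials(devices, trials, train_fn):
    """Run every trial once, with at most one trial per device at a time."""
    pending = queue.Queue()
    for trial in trials:
        pending.put(trial)

    results = []
    results_lock = threading.Lock()

    def worker(device):
        while True:
            try:
                trial = pending.get_nowait()
            except queue.Empty:
                return  # no trials left for this device
            outcome = train_fn(device, trial)
            with results_lock:
                results.append((device, trial, outcome))

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_train(device, params):
    # Hypothetical stand-in: a real version would build a tf.Graph for
    # `device` from `params`, train it in its own session, and return a metric.
    return params["learning_rate"] * 2


trials = [{"learning_rate": 0.01 * (i + 1)} for i in range(20)]
results = run_trials(["/gpu:0", "/gpu:1", "/gpu:2", "/gpu:3"], trials, fake_train)
print(len(results))
```

A queue is used instead of a fixed round-robin assignment so that a slow trial on one device does not hold up the trials waiting behind it.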

For "multithreading - Running hyperparameter optimization on parallel GPUs with tensorflow", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46712272/
