tensorflow - How to force TensorFlow to use all available GPUs?


Reposted by: 行者123  Updated: 2023-12-01 23:16:12

I have an 8-GPU cluster. When I run a piece of TensorFlow code from Kaggle (pasted below), it uses only one GPU instead of all 8. I confirmed this with nvidia-smi.

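# Imports (not shown in the original post; assumes Keras 2.x on TensorFlow 1.x,
# which the tf.metrics / K.get_session() calls below require)
import os
import sys
import random
import warnings

import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tqdm import tqdm
from skimage.io import imread, imshow
from skimage.transform import resize
from keras import backend as K
from keras import optimizers
from keras.models import Model
from keras.layers import Input, Lambda, Conv2D, Conv2DTranspose, MaxPooling2D, concatenate
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Set some parameters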
IMG_WIDTH = 256
IMG_HEIGHT = 256
IMG_CHANNELS = 3
TRAIN_IM = './train_im/'
TRAIN_MASK = './train_mask/'
TEST_PATH = './test/'

warnings.filterwarnings('ignore', category=UserWarning, module='skimage')
num_training = len(os.listdir(TRAIN_IM))
num_test = len(os.listdir(TEST_PATH))
# Get and resize train images
X_train = np.zeros((num_training, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
# np.bool was removed in NumPy 1.24; the built-in bool works on all versions
Y_train = np.zeros((num_training, IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool)
print('Getting and resizing train images and masks ... ')
sys.stdout.flush()

#load training images
for count, filename in tqdm(enumerate(os.listdir(TRAIN_IM)), total=num_training):
    img = imread(os.path.join(TRAIN_IM, filename))[:,:,:IMG_CHANNELS]
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
    X_train[count] = img
    name, ext = os.path.splitext(filename)
    mask_name = name + '_mask' + ext
    mask = cv2.imread(os.path.join(TRAIN_MASK, mask_name))[:,:,:1]
    mask = resize(mask, (IMG_HEIGHT, IMG_WIDTH))
    Y_train[count] = mask

# Check if training data looks all right
ix = random.randint(0, num_training-1)
print(ix)
imshow(X_train[ix])
plt.show()
imshow(np.squeeze(Y_train[ix]))
plt.show()
# Define IoU metric
def mean_iou(y_true, y_pred):
    prec = []
    for t in np.arange(0.5, 1.0, 0.05):
        y_pred_ = tf.to_int32(y_pred > t)
        score, up_opt = tf.metrics.mean_iou(y_true, y_pred_, 2)
        K.get_session().run(tf.local_variables_initializer())
        with tf.control_dependencies([up_opt]):
            score = tf.identity(score)
        prec.append(score)
    return K.mean(K.stack(prec), axis=0)

# Build U-Net model
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255) (inputs)
width = 64
c1 = Conv2D(width, (3, 3), activation='relu', padding='same') (s)
c1 = Conv2D(width, (3, 3), activation='relu', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)

c2 = Conv2D(width*2, (3, 3), activation='relu', padding='same') (p1)
c2 = Conv2D(width*2, (3, 3), activation='relu', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)

c3 = Conv2D(width*4, (3, 3), activation='relu', padding='same') (p2)
c3 = Conv2D(width*4, (3, 3), activation='relu', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)

c4 = Conv2D(width*8, (3, 3), activation='relu', padding='same') (p3)
c4 = Conv2D(width*8, (3, 3), activation='relu', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)

c5 = Conv2D(width*16, (3, 3), activation='relu', padding='same') (p4)
c5 = Conv2D(width*16, (3, 3), activation='relu', padding='same') (c5)

u6 = Conv2DTranspose(width*8, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = concatenate([u6, c4])
c6 = Conv2D(width*8, (3, 3), activation='relu', padding='same') (u6)
c6 = Conv2D(width*8, (3, 3), activation='relu', padding='same') (c6)

u7 = Conv2DTranspose(width*4, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = concatenate([u7, c3])
c7 = Conv2D(width*4, (3, 3), activation='relu', padding='same') (u7)
c7 = Conv2D(width*4, (3, 3), activation='relu', padding='same') (c7)

u8 = Conv2DTranspose(width*2, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = concatenate([u8, c2])
c8 = Conv2D(width*2, (3, 3), activation='relu', padding='same') (u8)
c8 = Conv2D(width*2, (3, 3), activation='relu', padding='same') (c8)

u9 = Conv2DTranspose(width, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = concatenate([u9, c1], axis=3)
c9 = Conv2D(width, (3, 3), activation='relu', padding='same') (u9)
c9 = Conv2D(width, (3, 3), activation='relu', padding='same') (c9)

outputs = Conv2D(1, (1, 1), activation='sigmoid') (c9)

model = Model(inputs=[inputs], outputs=[outputs])

sgd = optimizers.SGD(lr=0.03, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=[mean_iou])
model.summary()
    
# Fit model
earlystopper = EarlyStopping(patience=20, verbose=1)
checkpointer = ModelCheckpoint('nuclei_only.h5', verbose=1, save_best_only=True)
results = model.fit(X_train, Y_train, validation_split=0.05, batch_size = 32, verbose=1, epochs=100, 
                callbacks=[earlystopper, checkpointer])

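I want to run this code on all available GPUs, using mxnet or some other method, but I'm not sure how to do it. All the resources I found only show how to do this on the MNIST dataset. I have my own dataset, which I read in a different way, so I'm not sure how to modify the code.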

Best answer

TL;DR: Use tf.distribute.MirroredStrategy() as a scope, like this:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # ...create (and compile) the model as you would otherwise...
    ...

If you don't pass any arguments, tf.distribute.MirroredStrategy() uses all available GPUs. If needed, you can also specify which ones to use, like this: mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"]).

See the Distributed training with TensorFlow guide for implementation details and other strategies.
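Applied to the code in the question, the pattern might look roughly like the sketch below. It assumes TensorFlow 2.x with tf.keras; `build_model()` is a hypothetical, much smaller stand-in for the U-Net, the arrays are random placeholders shaped like `X_train`/`Y_train`, and the question's TF1-style `mean_iou` metric (which relies on `tf.metrics` and `K.get_session()`) is left out. Note that your own NumPy arrays can be passed to `model.fit()` directly; nothing MNIST-specific is needed.

```python
import numpy as np
import tensorflow as tf

# Placeholder sizes; the question uses 256x256x3 images.
IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS = 64, 64, 3


def build_model():
    """Hypothetical, much smaller stand-in for the U-Net built in the question."""
    inputs = tf.keras.Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
    x = tf.keras.layers.Rescaling(1.0 / 255)(inputs)
    x = tf.keras.layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
    outputs = tf.keras.layers.Conv2D(1, (1, 1), activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)


# No arguments: use every GPU TensorFlow can see (falls back to the CPU if there are none).
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Model weights and optimizer state must be created inside the scope.
    model = build_model()
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.03, momentum=0.9, nesterov=True),
        loss="binary_crossentropy",
    )

# Random placeholder arrays in the same layout as X_train / Y_train in the question.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, (16, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS)).astype("float32")
Y_train = (rng.random((16, IMG_HEIGHT, IMG_WIDTH, 1)) > 0.5).astype("float32")

# fit() treats batch_size as the global batch and splits it across replicas,
# so scale it to keep the per-GPU batch size constant.
per_replica_batch = 4
global_batch = per_replica_batch * strategy.num_replicas_in_sync

history = model.fit(X_train, Y_train, batch_size=global_batch, epochs=1, verbose=0)
print(history.history["loss"])
```

Scaling the batch size by `num_replicas_in_sync` is one common choice; you may also need to revisit the learning rate when the effective batch grows.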

Older answer (now outdated: deprecated, removed as of April 1, 2020):
Use multi_gpu_model() from Keras.

TS;WM:

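TensorFlow 2.0 now has the tf.distribute module, "a library for running a computation across multiple devices". It is built around the concept of "distribution strategies": you pick a distribution strategy and then use it as a scope. TensorFlow then splits the input, parallelizes the computation, and joins the outputs, largely transparently. Backpropagation is handled the same way. Because all of this now happens behind the scenes, you may want to familiarize yourself with the available strategies and their parameters, since they can greatly affect your training speed. For example, do you want your variables to reside on the CPU? Then use tf.distribute.experimental.CentralStorageStrategy(). See the Distributed training with TensorFlow guide for more information.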
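As a small sketch of switching strategies (assuming TensorFlow 2.x, where this class still lives under `tf.distribute.experimental`), the snippet below first checks which GPUs TensorFlow itself can see, which is a quick cross-check against nvidia-smi, and then creates a CentralStorageStrategy. The `counter` variable is only a placeholder for whatever your model creates.

```python
import tensorflow as tf

# GPUs that TensorFlow itself can see (useful to compare against nvidia-smi).
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", len(gpus))

# Computation is replicated on the local GPUs (or the CPU if there are none),
# while variables live on one parameter device: the CPU when several GPUs are used.
strategy = tf.distribute.experimental.CentralStorageStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Anything that creates variables (e.g. building and compiling a Keras model)
    # goes inside the scope, exactly as with MirroredStrategy.
    counter = tf.Variable(0.0)
```

Whether keeping variables on the CPU is faster than mirroring them depends on model size and interconnect, so it is worth measuring both on your cluster.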

Older answer (now outdated, left here for reference):

From the TensorFlow Guide:

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default.

If you want to use multiple GPUs, unfortunately you have to manually specify which tensors to place on each GPU, e.g.

with tf.device('/device:GPU:2'):
    # Ops created inside this block are placed on GPU 2
    ...

For more information, see the TensorFlow Guide section Using Multiple GPUs.
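A minimal sketch of manual placement in TensorFlow 2.x eager mode, assuming nothing beyond TensorFlow itself: the batch is split into one slice per visible GPU, the same matrix multiply runs on each slice under `tf.device`, and the results are concatenated. It falls back to the CPU when no GPU is visible; `x`, `w`, and the sizes are made-up placeholders.

```python
import tensorflow as tf

# Full logical device names (ending in e.g. "device:GPU:0"); CPU fallback if no GPU.
devices = [d.name for d in tf.config.list_logical_devices("GPU")] or ["/device:CPU:0"]

per_device_batch = 4
x = tf.random.uniform((per_device_batch * len(devices), 16))
w = tf.random.uniform((16, 8))

# Split the batch into one equal slice per device.
shards = tf.split(x, len(devices))

outputs = []
for device, shard in zip(devices, shards):
    # Ops created inside this block are pinned to `device`.
    with tf.device(device):
        outputs.append(tf.matmul(shard, w))

# Gather the per-device results back into a single batch.
y = tf.concat(outputs, axis=0)
print(y.shape)
```

This only covers the forward computation; for training you would also have to combine gradients yourself, which is exactly the bookkeeping that tf.distribute automates.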

There are two main ways to distribute a network across multiple GPUs.

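1. You distribute the network's layers across the GPUs. This is easier to implement, but it yields little performance benefit, because the GPUs wait on each other to finish their operations.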

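2. You create a separate copy of the network, called a "tower", on each GPU. When you feed the eight-tower network, you split the input batch into 8 parts and distribute them, one per GPU. Each tower runs the forward and backward pass on its slice, the gradients are then summed (or averaged) across towers, and a single update is applied. This results in an almost-linear speedup with the number of GPUs. However, it is much harder to implement, because you also have to deal with the complexities of batch normalization, and it is highly advisable to make sure your batches are properly randomized. There is a nice tutorial here. You should also look at the Inception V3 code referenced there for ideas on how to structure such a thing, in particular _tower_loss(), _average_gradients(), and the part of train() starting with for i in range(FLAGS.num_gpus):.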
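To make the tower arithmetic concrete, here is a framework-free NumPy illustration (not the Inception code): a linear model with mean-squared-error loss stands in for the network, the batch is split into 8 equal shards as in the 8-GPU case, each hypothetical "tower" computes the gradient on its shard, and averaging those gradients reproduces the full-batch gradient. All names and data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
num_towers = 8
X = rng.normal(size=(64, 5))  # global batch: 64 examples, 5 features
y = rng.normal(size=64)
w = rng.normal(size=5)         # shared weights, identical on every tower


def mse_grad(X, y, w):
    """Gradient of mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)


# Each tower gets an equal slice of the batch and computes its own gradient.
tower_grads = [
    mse_grad(X_shard, y_shard, w)
    for X_shard, y_shard in zip(np.split(X, num_towers), np.split(y, num_towers))
]

# Averaging the per-tower gradients (the role _average_gradients() plays) ...
avg_grad = np.mean(tower_grads, axis=0)
# ... gives the same result as computing the gradient on the whole batch.
full_grad = mse_grad(X, y, w)
print(np.allclose(avg_grad, full_grad))  # True

# One shared SGD step with the averaged gradient keeps all towers in sync.
learning_rate = 0.03
w = w - learning_rate * avg_grad
```

The equality holds because the shards are equal-sized; with unequal shards you would weight each tower's gradient by its share of the batch.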

If you want to give Keras a try, it now greatly simplifies multi-GPU training with multi_gpu_model(), which does all the heavy lifting for you.

Original question on Stack Overflow: https://stackoverflow.com/questions/50032721/

