python - Efficient example implementation of GPU training of a simple feed-forward NN in TensorFlow? Maybe with tf.data?

Reposted. Author: 行者123. Updated: 2023-11-28 18:59:32

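I have just started using the GPU version of TensorFlow, hoping it would speed up the training of my feed-forward neural networks. I can train on my GPU (GTX1080ti), but unfortunately it is not noticeably faster than running the same training on my CPU (i7-8700K) with my current implementation. During training the GPU appears to be barely used, which makes me suspect that the bottleneck in my implementation is how data is copied from host to device with feed_dict.

I have heard that TensorFlow has something called the "tf.data" pipeline, which is supposed to make feeding data to GPUs and other devices easier and faster. However, I have not found any simple example that shows this concept used in multilayer perceptron training as a replacement for feed_dict.

Does anyone know of such an example and can point me to it? Preferably as simple as possible, since I am new to TensorFlow in general. Or is there something else I should change in my current implementation to make it more efficient? I am pasting my code here: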

    import time

    import numpy as np
    import tensorflow as tf
    from sklearn import datasets
    from sklearn.model_selection import train_test_split

    tf.reset_default_graph()

    # Function for iris dataset.
    def get_iris_data():
        iris = datasets.load_iris()
        data = iris["data"]
        target = iris["target"]

        # Convert to one-hot vectors
        num_labels = len(np.unique(target))
        all_Y = np.eye(num_labels)[target]
        return train_test_split(data, all_Y, test_size=0.33, random_state=89)

    # Function which initializes tensorflow weights & biases for feed-forward NN.
    def InitWeights(LayerSizes):
        with tf.device('/gpu:0'):
            # Make tf placeholders for network inputs and outputs.
            X = tf.placeholder(shape=(None, LayerSizes[0]),
                               dtype=tf.float32,
                               name='InputData')
            y = tf.placeholder(shape=(None, LayerSizes[-1]),
                               dtype=tf.float32,
                               name='OutputData')
            # Initialize weights and biases.
            W = {}
            b = {}
            for ii in range(len(LayerSizes) - 1):
                layername = 'layer%s' % ii
                with tf.variable_scope(layername):
                    ny = LayerSizes[ii]
                    nx = LayerSizes[ii + 1]
                    # Weights (initialized with Xavier initialization).
                    W['Weights_' + layername] = tf.get_variable(
                        name='Weights_' + layername,
                        shape=(ny, nx),
                        initializer=tf.contrib.layers.xavier_initializer(),
                        dtype=tf.float32
                    )
                    # Bias (initialized with Xavier initialization).
                    b['Bias_' + layername] = tf.get_variable(
                        name='Bias_' + layername,
                        shape=(nx,),
                        initializer=tf.contrib.layers.xavier_initializer(),
                        dtype=tf.float32
                    )
        return W, b, X, y

    # Function for forward propagation of NN.
    def FeedForward(X, W, b):
        with tf.device('/gpu:0'):
            # Initialize 'a' of first layer to the placeholder of the network input.
            a = X
            # Loop all layers of the network.
            for ii in range(len(W)):
                # Use name of each layer as index.
                layername = 'layer%s' % ii
                ## Weighted sum: z = input*W + b
                z = tf.add(tf.matmul(a, W['Weights_' + layername],
                                     name='WeightedSum_z_' + layername),
                           b['Bias_' + layername])
                ## Passed through activation function: a = h(z)
                ## (the last layer stays linear).
                if ii == len(W) - 1:
                    a = z
                else:
                    a = tf.nn.relu(z, name='activation_a_' + layername)
        return a

    if __name__ == "__main__":
        # Import data
        train_X, test_X, train_y, test_y = get_iris_data()
        # Define network size [ninputs-by-256-by-outputs]
        LayerSizes = [4, 256, 3]
        # Initialize weights and biases.
        W, b, X, y = InitWeights(LayerSizes)

        # Define loss function to optimize.
        yhat = FeedForward(X, W, b)
        loss = tf.reduce_sum(tf.square(y - yhat), axis=[0])

        # Define optimizer to use when minimizing loss function.
        all_variables = tf.trainable_variables()
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)
        train_op = optimizer.minimize(loss, var_list=all_variables)

        # Start tf session and initialize variables.
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())

        # Train 10000 minibatches and time how long it takes.
        t0 = time.time()
        for i in range(10000):
            ObservationsToUse = np.random.choice(len(train_X), 32)
            X_minibatch = train_X[ObservationsToUse, :]
            y_minibatch = train_y[ObservationsToUse, :]
            sess.run(train_op, feed_dict={X: X_minibatch, y: y_minibatch})
        t1 = time.time()

        print('Training took %0.2f seconds' % (t1 - t0))
        sess.close()

Best answer

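It is probably slow because:

  • You are creating placeholders. Every sess.run call with feed_dict passes NumPy arrays from Python into these placeholders, where they are converted into tensors of the graph.

With tf.data.Dataset you can build a direct pipeline in which the data flows straight into the graph, with no placeholders. Such pipelines are fast and scalable, and they offer many functions to work with.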

    # Key-based access requires an .npz archive (a plain .npy file loads as a
    # single array), so the file is assumed to be saved with np.savez.
    with np.load("/var/data/training_data.npz") as data:
        features = data["features"]
        labels = data["labels"]

    # Assume that each row of `features` corresponds to the same row as `labels`.
    assert features.shape[0] == labels.shape[0]

    dataset = tf.data.Dataset.from_tensor_slices((features, labels))

Some useful functions:

    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(32)           # create batches of 32 rows
    dataset = dataset.repeat(num_epochs)  # repeat the dataset num_epochs times
    iterator = dataset.make_one_shot_iterator()  # create an iterator to retrieve batches of data

    X, Y = iterator.get_next()
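The order of `batch` and `repeat` in the snippet above affects the batch sizes the model sees. The following plain-Python sketch only mimics those semantics; it is not how tf.data is implemented, and the helper names `batch_then_repeat` and `repeat_then_batch` are hypothetical. It assumes a dataset of 100 rows and the default behaviour of keeping the final partial batch.

```python
def batch_then_repeat(num_rows, batch_size, num_epochs):
    """Batch sizes produced by dataset.batch(batch_size).repeat(num_epochs)."""
    # Each epoch is batched on its own, so every epoch ends with a short batch.
    one_epoch = [min(batch_size, num_rows - start)
                 for start in range(0, num_rows, batch_size)]
    return one_epoch * num_epochs


def repeat_then_batch(num_rows, batch_size, num_epochs):
    """Batch sizes produced by dataset.repeat(num_epochs).batch(batch_size)."""
    # The epochs are concatenated first, so batches may span epoch boundaries.
    total = num_rows * num_epochs
    return [min(batch_size, total - start)
            for start in range(0, total, batch_size)]


print(batch_then_repeat(100, 32, 2))  # [32, 32, 32, 4, 32, 32, 32, 4]
print(repeat_then_batch(100, 32, 2))  # [32, 32, 32, 32, 32, 32, 8]
```

With batch before repeat, as in the answer, each epoch ends in a small batch of 4 rows. Calling repeat first keeps every batch at 32 rows until the data runs out.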

In `dataset.batch(32)`, 32 is the batch size. In your case,

    dataset = tf.data.Dataset.from_tensor_slices((data, targets))

So no placeholders are needed. Just run,

    session.run(train_op)  # no feed_dict!!
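Putting the pieces together, a minimal end-to-end sketch of the question's network trained from a tf.data pipeline might look like the following. Several things here are assumptions and not part of the original post: it imports `tensorflow.compat.v1` (so it needs a TensorFlow release that ships that module), it uses synthetic random data with the same shape as the iris split in place of the real dataset, it swaps `tf.contrib.layers.xavier_initializer` for `tf.glorot_uniform_initializer` with zero-initialized biases, it adds `prefetch(1)`, and the helper names `build_training_graph` and `train` are hypothetical.

```python
import time

import numpy as np
import tensorflow.compat.v1 as tf  # assumption: the v1 compatibility module is available

tf.disable_v2_behavior()  # graph mode and sessions, as in the question


def build_training_graph(features, labels, layer_sizes, batch_size=32):
    """Build the input pipeline and the MLP in one graph (hypothetical helper)."""
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(buffer_size=len(features))
    dataset = dataset.repeat()          # loop forever; the step count ends training
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)       # prepare the next batch during the current step
    iterator = tf.data.make_one_shot_iterator(dataset)
    X, y = iterator.get_next()          # these tensors replace the placeholders

    a = X
    num_layers = len(layer_sizes) - 1
    for ii in range(num_layers):
        with tf.variable_scope('layer%d' % ii):
            W = tf.get_variable('Weights',
                                shape=(layer_sizes[ii], layer_sizes[ii + 1]),
                                initializer=tf.glorot_uniform_initializer())
            b = tf.get_variable('Bias', shape=(layer_sizes[ii + 1],),
                                initializer=tf.zeros_initializer())
        z = tf.matmul(a, W) + b
        a = z if ii == num_layers - 1 else tf.nn.relu(z)  # last layer stays linear

    loss = tf.reduce_sum(tf.square(y - a))
    train_op = tf.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(loss)
    return train_op, loss


def train(num_steps=1000):
    """Train on synthetic data and return the first and last batch loss."""
    tf.reset_default_graph()
    rng = np.random.RandomState(0)
    # Synthetic stand-in for the iris training split: 100 rows, 4 features, 3 classes.
    features = rng.rand(100, 4).astype(np.float32)  # float32 to match the weights
    labels = np.eye(3, dtype=np.float32)[rng.randint(0, 3, size=100)]

    train_op, loss = build_training_graph(features, labels, [4, 256, 3])
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        first_loss = sess.run(loss)
        last_loss = first_loss
        for _ in range(num_steps):
            _, last_loss = sess.run([train_op, loss])  # no feed_dict
    return first_loss, last_loss


if __name__ == "__main__":
    t0 = time.time()
    first_loss, last_loss = train()
    print('Training took %0.2f seconds' % (time.time() - t0))
    print('Loss on first batch: %.3f, on last batch: %.3f' % (first_loss, last_loss))
```

The sketch leaves out the explicit `tf.device('/gpu:0')` blocks so that it also runs on machines without a GPU. Whether the pipeline gives a measurable speed-up for a network this small is something to time on your own hardware, since the sketch itself does not demonstrate it.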

Regarding "python - Efficient example implementation of GPU training of a simple feed-forward NN in TensorFlow? Maybe with tf.data?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54254353/
