python - How to limit GPU memory usage in TFLearn?

Reposted. Author: 行者123. Updated: 2023-11-30 09:34:03

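I am using TFLearn and AlexNet to build a self-driving car in GTA V. I have trained the network, but when I try to run GTA and the network at the same time I get the error CUBLAS_STATUS_ALLOC_FAILED, which I assume means the GPU has run out of memory.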

Here is my AlexNet file:

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.normalization import local_response_normalization


def alexnet(width, height, lr):
    network = input_data(shape=[None, width, height, 1], name='input')
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 3, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=lr, name='targets')

    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log')

    return model

I tried adding this:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)
session.run(tf.global_variables_initializer())

and then passing session=session to the tflearn.DNN call, like this:

model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                    max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log', session=session)

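But that did not work either: I found that some variables were left uninitialized.

Specifically, when I try to use the model from this file: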

import numpy as np
from alexnet import alexnet

WIDTH = 80
HEIGHT = 60
LR = 1e-3
EPOCHS = 8
MODEL_NAME = 'pygta5-car-{}-{}-{}-epochs.model'. \
    format(LR, 'alexnet', EPOCHS)

model = alexnet(WIDTH, HEIGHT, LR)

train_data = np.load('training_data.npy')

train = train_data[:-100]
test = train_data[-100:]

train_x = np.array([i[0] for i in train]).reshape([-1, WIDTH, HEIGHT, 1])  # take only the images
train_y = np.array([i[1] for i in train])  # take only the labels

test_x = np.array([i[0] for i in test]).reshape([-1, WIDTH, HEIGHT, 1])  # take only the images
test_y = np.array([i[1] for i in test])  # take only the labels

model.fit({'input': train_x}, {'targets': train_y},
          n_epoch=EPOCHS, validation_set=({'input': test_x}, {'targets': test_y}),
          snapshot_step=500, run_id=MODEL_NAME, show_metric=True)


model.save('models/model.tfl')

I get this error while model.fit() is running:

"C:\Program Files\Python36\python.exe" C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py
WARNING:tensorflow:From C:\Program Files\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
2018-01-09 23:49:30.486827: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-01-09 23:49:30.947896: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:23:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-01-09 23:49:30.948297: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:23:00.0, compute capability: 6.1)
2018-01-09 23:49:32.382017: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:23:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-0.001-alexnet-8-epochs.model
Log directory: log/
---------------------------------
Training samples: 7775
Validation samples: 100
--
2018-01-09 23:49:34.924216: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Failed precondition: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
[[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
[... the same "Failed precondition: Attempting to use uninitialized value Crossentropy/Mean/moving_avg" warning is logged 15 more times ...]
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
status, run_metadata)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
[[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
[[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py", line 26, in <module>
snapshot_step=500, run_id=MODEL_NAME, show_metric=True)
File "C:\Program Files\Python36\lib\site-packages\tflearn\models\dnn.py", line 216, in fit
callbacks=callbacks)
File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 339, in fit
show_metric)
File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 818, in _train
feed_batch)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
[[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
[[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Crossentropy/Mean/moving_avg/read', defined at:
File "C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py", line 11, in <module>
model = alexnet(WIDTH, HEIGHT, LR)
File "C:\Users\Elia\PycharmProjects\SelfDrivingGrandTheftAutoV\v2\alexnet.py", line 37, in alexnet
max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log', session=session)
File "C:\Program Files\Python36\lib\site-packages\tflearn\models\dnn.py", line 65, in __init__
best_val_accuracy=best_val_accuracy)
File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 131, in __init__
clip_gradients)
File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 693, in initialize_training_ops
ema_num_updates=self.training_steps)
File "C:\Program Files\Python36\lib\site-packages\tflearn\summaries.py", line 239, in add_loss_summaries
loss_averages_op = loss_averages.apply([loss] + other_losses)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\moving_averages.py", line 401, in apply
colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 151, in create_slot_with_initializer
dtype)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 67, in _create_slot_var
validate_shape=validate_shape)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1203, in get_variable
constraint=constraint)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1092, in get_variable
constraint=constraint)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 425, in get_variable
constraint=constraint)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 805, in _get_single_variable
constraint=constraint)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 213, in __init__
constraint=constraint)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 356, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 125, in identity
return gen_array_ops.identity(input, name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 2070, in identity
"Identity", input=input, name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Crossentropy/Mean/moving_avg
[[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
[[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


Process finished with exit code 1

Is there a way to fix this, or a better way to limit GPU usage in tflearn?

Best answer

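I found this question when I ran into the same problem. I don't think it is still relevant to you, but it may be to others.

The problem occurs when you try to load the model into video RAM and the load fails because there is not enough room for both GTA 5 and your model.

I'm new to tflearn, so I can't explain why your solution didn't work.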
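One possible explanation, inferred from the traceback in the question rather than confirmed by anyone in this thread: the variable `Crossentropy/Mean/moving_avg` is created inside `tflearn.DNN.__init__` (via `initialize_training_ops`), which runs after the question's snippet has already called `tf.global_variables_initializer()`. If TFLearn does not run its own initialization for a session it is handed (an assumption about its behavior), that variable would never be initialized. The untested sketch below, written against the TensorFlow 1.x API, simply runs the initializer after the model has been built. The function name `build_model_with_session` is made up for illustration.

```python
def build_model_with_session(network, gpu_memory_fraction=0.4):
    """Wrap `network` in a tflearn.DNN that uses a memory-capped session.

    `network` is the tensor returned by the final regression(...) layer.
    Imports are local so this module can be loaded without TensorFlow.
    """
    import tensorflow as tf  # TensorFlow 1.x API (ConfigProto, Session)
    import tflearn

    # Same session setup as in the question.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = gpu_memory_fraction
    session = tf.Session(config=config)

    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='log', session=session)

    # Assumption: initializing only now covers the variables that DNN itself
    # created (such as the loss moving average named in the traceback).
    session.run(tf.global_variables_initializer())
    return model
```

The only change from the question's attempt is the order: the initializer runs last instead of before `tflearn.DNN(...)`.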

To limit GPU memory usage, you can add the following line in alexnet, before model = tflearn.DNN(...):

tflearn.init_graph(num_cores=4, gpu_memory_fraction=0.5)

TFLearn Documentation

I don't think num_cores=4 is actually necessary, but I have not tested without it.
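In context, the placement described in this answer might look like the following sketch. The helper name `build_limited_model` is hypothetical, and `num_cores` is left out on the assumption that its default is acceptable, which matches the guess above but was not tested here either.

```python
def build_limited_model(network, gpu_memory_fraction=0.5):
    """Cap TFLearn's GPU memory share, then build the model.

    `network` is the tensor returned by the final regression(...) layer.
    The import is local so this module can be loaded without TFLearn.
    """
    import tflearn

    # Call this before tflearn.DNN(...), as the answer describes.
    tflearn.init_graph(gpu_memory_fraction=gpu_memory_fraction)

    return tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                       max_checkpoints=1, tensorboard_verbose=2,
                       tensorboard_dir='log')
```

Inside `alexnet(width, height, lr)` this would replace the existing `model = tflearn.DNN(...)` statement with `model = build_limited_model(network)`.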

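Also, you need to monitor your VRAM usage without alexnet running to see how much the game needs on its own, because the line above only works if the game uses less than 50% (you can change that value).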
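As a rough way to pick the value, the game's VRAM usage can be read with `nvidia-smi` while the game is running and the network is not, and the remaining share used as the fraction. The sketch below assumes an NVIDIA driver with `nvidia-smi` on the PATH. The function names and the 10% headroom are illustrative choices, not something taken from this answer.

```python
import subprocess


def parse_nvidia_smi_memory(line):
    """Parse one 'used, total' CSV line (values in MiB) into two ints."""
    used_str, total_str = line.split(",")
    return int(used_str), int(total_str)


def query_gpu_memory(gpu_index=0):
    """Ask nvidia-smi for (used_mib, total_mib) of one GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_nvidia_smi_memory(out.strip().splitlines()[0])


def suggest_memory_fraction(used_mib, total_mib, headroom=0.1):
    """Share of VRAM left for the network, minus a safety margin."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    free_fraction = (total_mib - used_mib) / total_mib
    return max(0.0, round(free_fraction - headroom, 2))


# Example with made-up numbers: the game alone uses 3072 MiB of 6144 MiB.
used, total = parse_nvidia_smi_memory("3072, 6144")
fraction = suggest_memory_fraction(used, total)
```

With these example numbers `fraction` comes out as 0.4. On a real machine, `query_gpu_memory()` would supply the two values in place of the hard-coded string.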

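I am trying something similar in Forza Horizon 3 (which is poorly optimized for PC), and by turning settings down I could reduce its usage from 60% to 40%.

I have gotten it to work with an 8GB 2080, so it should also work with your 6GB 1060.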

Regarding "python - How to limit GPU memory usage in TFLearn?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48177832/
