
python - tf.GPUOptions not applied via set_session() in Keras

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 04:15:35

I am trying to increase the per_process_gpu_memory_fraction value in my tf.GPUOptions() and then swap the Keras session with set_session(), but the memory fraction never actually changes. After the first pass of the while loop, 319 MB is reserved, as shown by nvidia-smi, which

a) is never released when clear_session() is called, and

b) does not go up on the next iteration of the while loop.

import GPUtil
import time

import tensorflow as tf
import numpy as np

from keras.backend.tensorflow_backend import set_session, clear_session, get_session
from tensorflow.python.framework.errors_impl import ResourceExhaustedError, UnknownError
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical


def model_trainer():
    y_pred = None
    errors = 0
    total_ram = GPUtil.getGPUs()[0].memoryTotal
    total_ram_allowed = GPUtil.getGPUs()[0].memoryTotal * 0.90
    mem_amount = 0.005  # intentionally allocated a small amount so it needs to
                        # increment the mem_amount

    x_train = np.empty((10000, 100))
    y_train = np.random.randint(0, 9, size=10000)
    y_train = to_categorical(y_train, 10)

    while y_pred is None:
        print("mem", mem_amount)
        if total_ram_allowed > total_ram * mem_amount and GPUtil.getGPUs()[0].memoryFree > total_ram * mem_amount:
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=mem_amount)
            config = tf.ConfigProto(
                intra_op_parallelism_threads=2,
                inter_op_parallelism_threads=2,
                gpu_options=gpu_options)

            sess = tf.Session(config=config)
            set_session(sess)
            model = Sequential()
            model.add(Dense(units=64, activation='relu', input_dim=100))
            model.add(Dense(units=1024, activation='relu'))
            model.add(Dense(units=1024, activation='relu'))
            model.add(Dense(units=1024, activation='relu'))
            model.add(Dense(units=1024, activation='relu'))
            model.add(Dense(units=1024, activation='relu'))
            model.add(Dense(units=10, activation='softmax'))
            model.compile(loss='categorical_crossentropy',
                          optimizer='sgd',
                          metrics=['accuracy'])

            try:
                print(sess)

                model.fit(x_train, y_train, epochs=5, batch_size=32)
                y_pred = model.predict(x_train)

            except (ResourceExhaustedError, UnknownError) as e:
                if mem_amount > 1.0:
                    raise ValueError('model too large for vram')
                else:
                    mem_amount += 0.05

                clear_session()
                errors += 1
        else:
            clear_session()


if __name__ == "__main__":
    model_trainer()

Puzzlingly, Keras happily accepts the new session (as a call to get_session() shows), but it never applies the new GPUOptions.

Besides the example above, I have also tried:

clear_session()
del model
clear_session()
del model
gc.collect()

None of these had any effect on releasing the VRAM.

My overall goal is to use "trial and error" until the process has enough VRAM to train, because there seems to be no good way to figure out how much VRAM a Keras model needs without simply running it, so that I can run multiple models in parallel on a single GPU. When a ResourceExhaustedError occurs, I want to release the VRAM held by Keras and then retry with a larger amount of VRAM. Is there any way to accomplish this?
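(For reference, the increment logic I have in mind is purely linear: per_process_gpu_memory_fraction times the card's total VRAM, bumped by a fixed step after each OOM. Isolated as a small sketch, with a hypothetical 8192 MB card just as an example:)

```python
# Hypothetical helpers illustrating only the retry arithmetic -- no TensorFlow involved.
def fraction_to_mb(fraction, total_vram_mb):
    """Approximate VRAM in MB that per_process_gpu_memory_fraction should reserve."""
    return fraction * total_vram_mb

def next_fraction(fraction, step=0.05, limit=1.0):
    """Bump the fraction after an OOM; give up once the whole card is exceeded."""
    if fraction > limit:
        raise ValueError('model too large for vram')
    return fraction + step

# e.g. on a hypothetical 8192 MB card, a 0.05 fraction maps to ~410 MB
mb = fraction_to_mb(0.05, 8192)
```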

Best Answer

After searching for a while, I found that TensorFlow only ever takes VRAM and never releases it until the process dies, even after del model and clear_session(). I also tried the approach shown here (https://github.com/keras-team/keras/issues/9379), which uses:

from keras import backend as K
K.clear_session()

from numba import cuda
cuda.select_device(0)
cuda.close()

This produced an error for me, because when TensorFlow tried to access the GPU again, its pointer into the memory space was invalid (it had been killed by cuda.close()). So the only way around it is to use processes, not threads (I tried threads too, with the same problem as before).
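To isolate why processes work where threads do not: the CUDA driver reclaims every allocation a process made when that process exits, so running each attempt in a child process guarantees the VRAM comes back. A minimal sketch of that retry loop, with a stand-in worker (the 0.15 threshold is a hypothetical placeholder that simulates a ResourceExhaustedError instead of actually training):

```python
import multiprocessing
import sys

def train_worker(mem_fraction):
    # Stand-in for the real training job: a real worker would build
    # tf.Session(config=...) with per_process_gpu_memory_fraction=mem_fraction
    # and call model.fit() here. Exit code 1 simulates an OOM failure.
    sys.exit(0 if mem_fraction >= 0.15 else 1)

def train_with_retries(start=0.05, step=0.05):
    mem_fraction = start
    while True:
        if mem_fraction > 1.0:
            raise ValueError("model too large for vram")
        p = multiprocessing.Process(target=train_worker, args=(mem_fraction,))
        p.start()
        p.join()  # once the child exits, the driver reclaims all of its VRAM
        if p.exitcode == 0:
            return mem_fraction  # this fraction was enough
        mem_fraction += step     # OOM: retry with a larger slice
```

In the full version below, a Manager dict replaces the exit code so the worker can hand results back to the parent.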

Another thing I found is that although there are ways to try to estimate how much VRAM a Keras model will use, none of them are very accurate (see: How to determine needed memory of Keras model?). I also tried computing it directly from the Keras layers, and the results varied wildly, so that was not accurate either. So this really leaves you with trial and error, catching ResourceExhaustedError and retrying.
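For what it's worth, the weights-only part of such an estimate is easy to write down; it is everything else (activations, gradients, optimizer slots, cuDNN workspaces) that makes the real footprint unpredictable. A rough sketch, assuming float32 parameters and the dense stack from the code below:

```python
def dense_param_count(layer_units):
    """Parameters in a stack of Dense layers, given
    [input_dim, units_1, units_2, ...]: weight matrices plus bias vectors."""
    total = 0
    for fan_in, fan_out in zip(layer_units, layer_units[1:]):
        total += fan_in * fan_out + fan_out  # weights + biases
    return total

def rough_vram_mb(layer_units, bytes_per_param=4):
    """Weights-only estimate in MB; actual usage is far higher because of
    activations, gradients, optimizer state, and framework workspaces."""
    return dense_param_count(layer_units) * bytes_per_param / (1024 ** 2)

# the model in the code below: 100 -> 64 -> 1024 -> 1024 -> 2048 -> 10
params = dense_param_count([100, 64, 1024, 1024, 2048, 10])  # ~3.24M parameters
```

The weights alone come to roughly 12 MB here, while the observed usage is hundreds of MB, which is why this kind of estimate is only a lower bound.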

Below is my code for running multiple different Keras models on a single GPU.

import GPUtil
import time
import multiprocessing

import tensorflow as tf
import numpy as np

from keras.backend.tensorflow_backend import set_session, clear_session, get_session
from tensorflow.python.framework.errors_impl import ResourceExhaustedError, UnknownError
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical


def model_trainer():
    mem_amount = 0.05

    x_train = np.empty((100000, 100))
    y_train = np.random.randint(0, 9, size=100000)
    y_train = to_categorical(y_train, 10)

    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    def worker(mem_amount, return_dict):
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=mem_amount)
        config = tf.ConfigProto(
            intra_op_parallelism_threads=2,
            inter_op_parallelism_threads=2,
            gpu_options=gpu_options)
        sess = tf.Session(config=config)
        set_session(sess)

        model = Sequential()
        model.add(Dense(units=64, activation='relu', input_dim=100))
        model.add(Dense(units=1024, activation='relu'))
        model.add(Dense(units=1024, activation='relu'))
        model.add(Dense(units=2048, activation='relu'))
        model.add(Dense(units=10, activation='softmax'))
        model.compile(loss='categorical_crossentropy',
                      optimizer='sgd',
                      metrics=['accuracy'])

        try:
            get_session()

            model.fit(x_train, y_train, epochs=5, batch_size=1000)

            return_dict["valid"] = True

        except (ResourceExhaustedError, UnknownError) as e:
            return

    while "valid" not in list(return_dict.keys()):
        print("mem", mem_amount)

        total_ram = GPUtil.getGPUs()[0].memoryTotal
        total_ram_allowed = GPUtil.getGPUs()[0].memoryTotal * 0.90

        # can add in a for loop to have multiple models
        if total_ram_allowed > total_ram * mem_amount and GPUtil.getGPUs()[0].memoryFree > total_ram * mem_amount:
            p = multiprocessing.Process(target=worker, args=(mem_amount, return_dict))
            p.start()
            p.join()

            print(return_dict.values())

            if "valid" not in list(return_dict.keys()):
                if mem_amount > 1.0:
                    raise ValueError('model too large for vram')
                else:
                    mem_amount += 0.05
            else:
                break
        else:
            time.sleep(10)


if __name__ == "__main__":
    model_trainer()

Regarding "python - tf.GPUOptions not applied via set_session() in Keras", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55500819/
