gpt4 book ai didi

python - Keras - 未知错误 : Failed to get convolution algorithm

转载 作者:行者123 更新时间:2023-12-01 08:25:30 24 4
gpt4 key购买 nike

在使用 Keras 和 Jupyter Notebook 时,一旦开始训练模型,我偶尔会收到错误(请参阅下面的完整错误日志)。而Failed to get convolution algorithm. This is probably because cuDNN failed to initialize,表明这与版本冲突有关,但它似乎不适用于我的情况。就我而言,我的版本似乎工作正常,因为我在大多数情况下都可以很好地运行训练过程,但是一旦出现此错误,我需要关闭所有正在运行的 python 进程并重新启动 Anaconda,以便继续无错误。

由于每次出现此错误时都重新启动 Anaconda 非常不方便,我想知道除了版本冲突之外,是否有任何修复或建议来解释为什么会出现此错误?

这是我收到的整个错误:

---------------------------------------------------------------------------
UnknownError Traceback (most recent call last)
<ipython-input-23-5d485feb54c5> in <module>
1 K.clear_session()
2 model_all = define_model(train_data)
----> 3 model_all = train_bild(train_generator_all,validation_generator_all, model_all)
4 model_all.save(subdir+cat+"/"+cat+"_model_all_inception.h5")
5

<ipython-input-17-afb528e9309d> in train_bild(train_generator, validation_generator, model)
25 epochs=num_epochs,
26 validation_data=validation_generator,
---> 27 validation_steps=VALID_STEPS, workers=16,callbacks=[checker,early, reduce_lr],class_weight=class_weights)#,class_weight=class_weights)
28
29 model = load_model(filepath)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your `' + object_name + '` call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1416 use_multiprocessing=use_multiprocessing,
1417 shuffle=shuffle,
-> 1418 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
215 outs = model.train_on_batch(x, y,
216 sample_weight=sample_weight,
--> 217 class_weight=class_weight)
218
219 outs = to_list(outs)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1215 ins = x + y + sample_weights
1216 self._make_train_function()
-> 1217 outputs = self.train_function(ins)
1218 return unpack_singleton(outputs)
1219

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
2673 fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(*array_vals)
2676 return fetched[:len(self.outputs)]
2677

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
1437 ret = tf_session.TF_SessionRunCallable(
1438 self._session._session, self._handle, args, status,
-> 1439 run_metadata_ptr)
1440 if run_metadata:
1441 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
526 None, None,
527 compat.as_text(c_api.TF_Message(self.status.status)),
--> 528 c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_1/cond_1/FusedBatchNorm/Switch"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv2d_1/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d_1/kernel/read)]]
[[{{node loss/mul/_4005}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4855_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

最佳答案

我多次遇到此问题,所有这些都是由于 Saver 试图恢复的脏日志文件造成的 - 唯一的解决方案是删除最后一个模型检查点文件并从上一个模型重新启动(同时删除引用 checkpoint.txt 文件中最后一行的行)。

当模型保存期间发生某些事情时,可能会发生这种情况(保存程序已处理死亡 - 文件仍在写入时发生了某些更改,...)

关于python - Keras - 未知错误 : Failed to get convolution algorithm,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54278698/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com