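I want to train and evaluate ssd_mobilenet_v1_coco on my own dataset at the same time, using the Object Detection API.
However, when I try this, the GPU memory is almost completely used, so the evaluation script cannot start. These are the commands I use for training and evaluation:
The training script is launched in one terminal pane like this: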
python3 train.py \
--logtostderr \
--train_dir=training_ssd_mobile_caltech \
--pipeline_config_path=ssd_mobilenet_v1_coco_2017_11_17/ssd_mobilenet_v1_focal_loss_coco.config
This runs fine and training works. Then I try to run the evaluation script in a second terminal pane:
python3 eval.py \
--logtostderr \
--checkpoint_dir=training_ssd_mobile_caltech \
--eval_dir=eval_caltech \
--pipeline_config_path=ssd_mobilenet_v1_coco_2017_11_17/ssd_mobilenet_v1_focal_loss_coco.config
It fails with the following error:
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
2018-02-28 18:40:00.302271: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-28 18:40:00.412808: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-28 18:40:00.413217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 93.00MiB
2018-02-28 18:40:00.413424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-02-28 18:40:00.957090: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 43.00M (45088768 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-02-28 18:40:00.957919: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 38.70M (40580096 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
INFO:tensorflow:Restoring parameters from training_ssd_mobile_caltech/model.ckpt-4775
INFO:tensorflow:Restoring parameters from training_ssd_mobile_caltech/model.ckpt-4775
2018-02-28 18:40:02.274830: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 8.17M (8566528 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-02-28 18:40:02.278599: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 8.17M (8566528 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-02-28 18:40:12.280515: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 8.17M (8566528 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-02-28 18:40:12.281958: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 8.17M (8566528 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-02-28 18:40:12.282082: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.75MiB. Current allocation summary follows.
2018-02-28 18:40:12.282160: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (256): Total Chunks: 190, Chunks in use: 190. 47.5KiB allocated for chunks. 47.5KiB in use in bin. 11.8KiB client-requested in use in bin.
2018-02-28 18:40:12.282251: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (512): Total Chunks: 70, Chunks in use: 70. 35.0KiB allocated for chunks. 35.0KiB in use in bin. 35.0KiB client-requested in use in bin.
[.......................................]
2018-02-28 18:40:12.290959: I tensorflow/core/common_runtime/bfc_allocator.cc:684] Sum Total of in-use chunks: 29.83MiB
2018-02-28 18:40:12.290971: I tensorflow/core/common_runtime/bfc_allocator.cc:686] Stats:
Limit: 45088768
InUse: 31284736
MaxInUse: 32368384
NumAllocs: 808
MaxAllocSize: 5796864
2018-02-28 18:40:12.291022: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **********************xx*********xx**_*__****______***********************************************xx
2018-02-28 18:40:12.291044: W tensorflow/core/framework/op_kernel.cc:1198] Resource exhausted: OOM when allocating tensor with shape[1,32,150,150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
WARNING:root:The following classes have no ground truth examples: 1
/home/mm/models/research/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
num_images_correctly_detected_per_class / num_gt_imgs_per_class)
/home/mm/models/research/object_detection/utils/object_detection_evaluation.py:710: RuntimeWarning: Mean of empty slice
mean_ap = np.nanmean(self.average_precision_per_class)
/home/mm/models/research/object_detection/utils/object_detection_evaluation.py:711: RuntimeWarning: Mean of empty slice
mean_corloc = np.nanmean(self.corloc_per_class)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,32,150,150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/sub, FeatureExtractor/MobilenetV1/Conv2d_0/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Gather/Gather_1/_469 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1068_Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Gather/Gather_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "eval.py", line 146, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "eval.py", line 142, in main
FLAGS.checkpoint_dir, FLAGS.eval_dir)
File "/home/mm/models/research/object_detection/evaluator.py", line 240, in evaluate
save_graph_dir=(eval_dir if eval_config.save_graph else ''))
File "/home/mm/models/research/object_detection/eval_util.py", line 407, in repeated_checkpoint_run
save_graph_dir)
File "/home/mm/models/research/object_detection/eval_util.py", line 286, in _run_checkpoint_once
result_dict = batch_processor(tensor_dict, sess, batch, counters)
File "/home/mm/models/research/object_detection/evaluator.py", line 183, in _process_batch
result_dict = sess.run(tensor_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,32,150,150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/sub, FeatureExtractor/MobilenetV1/Conv2d_0/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Gather/Gather_1/_469 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1068_Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Gather/Gather_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D', defined at:
File "eval.py", line 146, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "eval.py", line 142, in main
FLAGS.checkpoint_dir, FLAGS.eval_dir)
File "/home/mm/models/research/object_detection/evaluator.py", line 161, in evaluate
ignore_groundtruth=eval_config.ignore_groundtruth)
File "/home/mm/models/research/object_detection/evaluator.py", line 72, in _extract_prediction_tensors
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
File "/home/mm/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 334, in predict
preprocessed_inputs)
File "/home/mm/models/research/object_detection/models/ssd_mobilenet_v1_feature_extractor.py", line 112, in extract_features
scope=scope)
File "/home/mm/models/research/slim/nets/mobilenet_v1.py", line 232, in mobilenet_v1_base
scope=end_point)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1057, in convolution
outputs = layer.apply(inputs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 762, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 652, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/convolutional.py", line 167, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 838, in __call__
return self.conv_op(inp, filter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 502, in __call__
return self.call(inp, filter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 190, in __call__
name=self.name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 639, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,32,150,150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/sub, FeatureExtractor/MobilenetV1/Conv2d_0/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Gather/Gather_1/_469 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1068_Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Gather/Gather_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
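The figures in the log above already hint at the cause. A minimal sketch that redoes the arithmetic, assuming float32 tensors (4 bytes per element, matching "type float" in the log), shows that the allocation that fails is only about 2.75 MiB, and that the eval process's GPU allocator limit was 43 MiB. That is consistent with only 93 MiB of the 7.92 GiB card being free when eval.py started: the evaluation graph is small, the card is simply occupied. The helper name tensor_bytes is illustrative.

```python
from math import prod

MIB = 2 ** 20  # bytes per MiB

def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor; float32 uses 4 bytes per element."""
    return prod(shape) * bytes_per_element

# Values copied from the log above.
failing_tensor = tensor_bytes([1, 32, 150, 150])  # Conv2d_0 output that hit OOM (float32)
allocator_limit = 45088768                        # "Limit:" reported by GPU_0_bfc

print(f"failing tensor: {failing_tensor / MIB:.2f} MiB")
print(f"eval allocator limit: {allocator_limit / MIB:.2f} MiB")
```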
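All GPU memory has already been allocated by the TF training process before eval.py starts, so I can't figure out how to run the two concurrently, or at least how to get the ODA to run evaluation at specific intervals.
So, first of all, is it possible to run evaluation at the same time as training? If so, how?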
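System information
What is the top-level directory of the model you are using: object_detection
Have I written custom code: not yet...
OS platform and distribution: Linux Ubuntu 16.04 LTS
TensorFlow installed from (source or binary): pip3 tensorflow-gpu
TensorFlow version: 1.5.0
CUDA/cuDNN version: 9.0/7.0
GPU model and memory: GTX 1080, 8 GB
Best answer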
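One simple approach is to prefix the command with CUDA_VISIBLE_DEVICES:
CUDA_VISIBLE_DEVICES="" python eval.py --logtostderr --pipeline_config_path=multires.config --checkpoint_dir=/train_dir/ --eval_dir=eval_dir/
This prevents the evaluation script from seeing any GPU, so it should automatically fall back to the CPU.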
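If you would rather launch evaluation from a Python wrapper than from the shell, the same idea can be expressed by passing a modified environment to the child process. This is a minimal sketch, assuming eval.py sits in the current directory and reusing the paths from the question; the helper names cpu_only_env, eval_command and launch_eval are hypothetical and not part of the Object Detection API. The variable is set in the child's environment at launch because it has to be in place before that process initializes CUDA.

```python
import os
import subprocess
import sys

def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA GPU hidden."""
    env = dict(os.environ if base_env is None else base_env)  # copy, never mutate the parent's env
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty device list: TensorFlow should fall back to the CPU
    return env

def eval_command(pipeline_config, checkpoint_dir, eval_dir):
    """Build the eval.py command line (paths are placeholders from the question)."""
    return [
        sys.executable, "eval.py",
        "--logtostderr",
        f"--pipeline_config_path={pipeline_config}",
        f"--checkpoint_dir={checkpoint_dir}",
        f"--eval_dir={eval_dir}",
    ]

def launch_eval(pipeline_config, checkpoint_dir, eval_dir):
    """Run evaluation on the CPU while train.py keeps the GPU."""
    cmd = eval_command(pipeline_config, checkpoint_dir, eval_dir)
    return subprocess.run(cmd, env=cpu_only_env(), check=True)
```

For example, launch_eval("ssd_mobilenet_v1_coco_2017_11_17/ssd_mobilenet_v1_focal_loss_coco.config", "training_ssd_mobile_caltech", "eval_caltech") would mirror the second terminal pane from the question. Expect CPU evaluation to be noticeably slower than on the GPU.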
Regarding "tensorflow - How to train and evaluate simultaneously in the Object Detection API?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49033008/