
machine-learning - How to prevent weight updates in Caffe

Reposted · Author: 行者123 · Updated: 2023-11-30 08:39:51

Some layers of my network are loaded from a pretrained model. I want to fix their parameters and train only the other layers.

I followed this page and set lr_mult and decay_mult to 0, added propagate_down: false, and even set base_lr: 0 and weight_decay: 0 in the solver. Yet the test loss (computed over all test images at every test) still changes slowly at each iteration, and after thousands of iterations the accuracy drops to 0 (it is 80% right after loading the pretrained model).

Here is a two-layer example in which I simply initialize the weights and set all of the parameters above to 0. I want to freeze every layer in this example, but once training starts the loss keeps changing...

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.017
    mirror: true
    crop_size: 32
    mean_value: 115
    mean_value: 126
    mean_value: 130
    color: true
    contrast: true
    brightness: true
  }
  image_data_param {
    source: "/data/zhuhao5/data/cifar100/cifar100_train_replicate.txt"
    batch_size: 64
    shuffle: true
    #pair_size: 3
  }
}
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.017
    mirror: false
    crop_size: 32
    mean_value: 115
    mean_value: 126
    mean_value: 130
  }
  image_data_param {
    source: "/data/zhuhao5/data/cifar100/cifar100_test.txt"
    batch_size: 100
    shuffle: false
  }
}
#-------------- TEACHER --------------------
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  propagate_down: false
  top: "conv1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 16
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
  }
}
layer {
  name: "res2_1a_1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  propagate_down: false
  top: "res2_1a_1_bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "res2_1a_1_scale"
  type: "Scale"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "res2_1a_1_bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "res2_1a_1_relu"
  type: "ReLU"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "res2_1a_1_bn"
}
layer {
  name: "pool_5"
  type: "Pooling"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "pool_5"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}
layer {
  name: "fc100"
  type: "InnerProduct"
  bottom: "pool_5"
  propagate_down: false
  top: "fc100"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  inner_product_param {
    num_output: 100
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
#---------------------------------
layer {
  name: "tea_soft_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc100"
  bottom: "label"
  propagate_down: false
  propagate_down: false
  top: "tea_soft_loss"
  loss_weight: 0
}

##----------- ACCURACY----------------

layer {
  name: "teacher_accuracy"
  type: "Accuracy"
  bottom: "fc100"
  bottom: "label"
  top: "teacher_accuracy"
  accuracy_param {
    top_k: 1
  }
}
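For the Convolution and InnerProduct parameters, lr_mult: 0 with decay_mult: 0 really does zero out the solver's update. A minimal NumPy sketch of a plain SGD-with-momentum step (a simplification for illustration, not Caffe's actual code) shows why those blobs stay frozen:

```python
import numpy as np

def sgd_step(w, grad, lr, momentum, weight_decay, v):
    """One SGD-with-momentum step in the style of Caffe's SGDSolver (sketch)."""
    g = grad + weight_decay * w        # L2 weight decay folds into the gradient
    v = momentum * v - lr * g          # lr = 0 makes the velocity update zero
    return w + v, v

w = np.array([0.5, -1.2, 3.0])
v = np.zeros_like(w)
grad = np.array([10.0, -4.0, 2.0])     # arbitrary nonzero gradient

# With lr (i.e. base_lr * lr_mult) = 0 and weight_decay = 0,
# the parameters do not move at all.
w_new, v = sgd_step(w, grad, lr=0.0, momentum=0.9, weight_decay=0.0, v=v)
print(np.array_equal(w, w_new))  # True
```

So whatever is changing the network's behavior here cannot be a solver update to these blobs; it has to come from somewhere outside the solver.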

Here is the solver:

test_iter: 100
test_interval: 10
base_lr: 0
momentum: 0
weight_decay: 0
lr_policy: "poly"
power: 1
display: 10000
max_iter: 80000
snapshot: 5000
type: "SGD"
solver_mode: GPU
random_seed: 10086

And the log:

I0829 16:31:39.363433 14986 net.cpp:200] teacher_accuracy does not need backward computation.
I0829 16:31:39.363438 14986 net.cpp:200] tea_soft_loss does not need backward computation.
I0829 16:31:39.363442 14986 net.cpp:200] fc100_fc100_0_split does not need backward computation.
I0829 16:31:39.363446 14986 net.cpp:200] fc100 does not need backward computation.
I0829 16:31:39.363451 14986 net.cpp:200] pool_5 does not need backward computation.
I0829 16:31:39.363454 14986 net.cpp:200] res2_1a_1_relu does not need backward computation.
I0829 16:31:39.363458 14986 net.cpp:200] res2_1a_1_scale does not need backward computation.
I0829 16:31:39.363462 14986 net.cpp:200] res2_1a_1_bn does not need backward computation.
I0829 16:31:39.363466 14986 net.cpp:200] conv1 does not need backward computation.
I0829 16:31:39.363471 14986 net.cpp:200] label_data_1_split does not need backward computation.
I0829 16:31:39.363485 14986 net.cpp:200] data does not need backward computation.
I0829 16:31:39.363490 14986 net.cpp:242] This network produces output tea_soft_loss
I0829 16:31:39.363494 14986 net.cpp:242] This network produces output teacher_accuracy
I0829 16:31:39.363507 14986 net.cpp:255] Network initialization done.
I0829 16:31:39.363559 14986 solver.cpp:56] Solver scaffolding done.
I0829 16:31:39.363852 14986 caffe.cpp:248] Starting Optimization
I0829 16:31:39.363862 14986 solver.cpp:272] Solving WRN_22_12_to_WRN_18_4_v5_net
I0829 16:31:39.363865 14986 solver.cpp:273] Learning Rate Policy: poly
I0829 16:31:39.365981 14986 solver.cpp:330] Iteration 0, Testing net (#0)
I0829 16:31:39.366190 14986 blocking_queue.cpp:49] Waiting for data
I0829 16:31:39.742347 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 85.9064
I0829 16:31:39.742437 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0113
I0829 16:31:39.749806 14986 solver.cpp:218] Iteration 0 (0 iter/s, 0.385886s/10000 iters), loss = 0
I0829 16:31:39.749862 14986 solver.cpp:237] Train net output #0: tea_soft_loss = 4.97483
I0829 16:31:39.749877 14986 solver.cpp:237] Train net output #1: teacher_accuracy = 0
I0829 16:31:39.749908 14986 sgd_solver.cpp:105] Iteration 0, lr = 0
I0829 16:31:39.794306 14986 solver.cpp:330] Iteration 10, Testing net (#0)
I0829 16:31:40.171447 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.9119
I0829 16:31:40.171510 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0115
I0829 16:31:40.219133 14986 solver.cpp:330] Iteration 20, Testing net (#0)
I0829 16:31:40.596911 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91862
I0829 16:31:40.596971 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0116
I0829 16:31:40.645246 14986 solver.cpp:330] Iteration 30, Testing net (#0)
I0829 16:31:41.021711 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.92105
I0829 16:31:41.021772 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:41.069464 14986 solver.cpp:330] Iteration 40, Testing net (#0)
I0829 16:31:41.447345 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91916
I0829 16:31:41.447407 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:41.495157 14986 solver.cpp:330] Iteration 50, Testing net (#0)
I0829 16:31:41.905607 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.9208
I0829 16:31:41.905654 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:41.952659 14986 solver.cpp:330] Iteration 60, Testing net (#0)
I0829 16:31:42.327942 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91936
I0829 16:31:42.328025 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:42.374279 14986 solver.cpp:330] Iteration 70, Testing net (#0)
I0829 16:31:42.761359 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91859
I0829 16:31:42.761430 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:42.807821 14986 solver.cpp:330] Iteration 80, Testing net (#0)
I0829 16:31:43.232321 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91668
I0829 16:31:43.232398 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:43.266436 14986 solver.cpp:330] Iteration 90, Testing net (#0)
I0829 16:31:43.514633 14986 blocking_queue.cpp:49] Waiting for data
I0829 16:31:43.638617 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91836
I0829 16:31:43.638684 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:43.685451 14986 solver.cpp:330] Iteration 100, Testing net (#0)

I wonder what I am missing in Caffe's update process :(

Best Answer

Found the cause.

The BatchNorm layer behaves differently in the TRAIN and TEST phases: use_global_stats controls whether it normalizes with the stored (global) statistics or with the current mini-batch statistics.

In my case, I had to set use_global_stats: true during training as well.

And don't forget the Scale layer.

The modified layers should be:

layer {
  name: "res2_1a_1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "res2_1a_1_bn"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "res2_1a_1_scale"
  type: "Scale"
  bottom: "res2_1a_1_bn"
  top: "res2_1a_1_bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
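The reason lr_mult: 0 alone was not enough: in the TRAIN phase (use_global_stats: false) Caffe's BatchNorm normalizes with the current mini-batch statistics and updates its running mean/variance during the forward pass itself, outside the solver, so lr_mult has no effect on them. A hedged NumPy sketch of the two behaviors (scale/shift omitted, since they live in the Scale layer) illustrates why the "frozen" network's outputs kept drifting:

```python
import numpy as np

def batchnorm_forward(x, running_mean, running_var,
                      use_global_stats, momentum=0.999, eps=1e-5):
    """Sketch of Caffe-style BatchNorm forward (no learned scale/shift)."""
    if use_global_stats:
        # TEST-phase behavior: normalize with the stored statistics.
        mean, var = running_mean, running_var
    else:
        # TRAIN-phase behavior: normalize with batch statistics AND update
        # the running statistics in-place during forward. This happens on
        # every iteration regardless of lr_mult, which only scales the
        # solver's gradient update.
        mean, var = x.mean(axis=0), x.var(axis=0)
        running_mean[:] = momentum * running_mean + (1 - momentum) * mean
        running_var[:] = momentum * running_var + (1 - momentum) * var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(64, 16))   # one mini-batch
pre_mean = np.zeros(16)                              # "pretrained" stats
pre_var = np.ones(16)

y_train = batchnorm_forward(x, pre_mean.copy(), pre_var.copy(),
                            use_global_stats=False)
y_test = batchnorm_forward(x, pre_mean.copy(), pre_var.copy(),
                           use_global_stats=True)

# The two phases give different outputs for the same input, which is why
# the test loss drifted even with every lr_mult set to 0.
print(np.allclose(y_train, y_test))  # False
```

Forcing use_global_stats: true in both phases makes the layer use the pretrained statistics everywhere and stops the in-forward updates, which is exactly what the fix above does.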

Regarding "machine-learning - How to prevent weight updates in Caffe", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45934342/
