
python - Loss does not decrease when fine-tuning Inception v3 for im2txt


I am having trouble fine-tuning the pretrained Inception v3 model for im2txt. For some reason, the initial training did not reduce the loss much, and fine-tuning Inception v3 did not reduce the loss on my training data at all. I am trying to figure out why; any insight would be helpful.

im2txt is a model that takes an image as input and prints a caption as output. Originally, im2txt prints the caption as a coherent sentence describing the image. To fit my project, I changed the code and the labels in the training data so that it instead prints a list of words related to the image.

For example, one of my images looks like this. Note that it contains more objects than an average ImageNet image:

[image: lady shopping for clothes]

And my label caption for it looks like this:

 female lady woman clothes shop customer

In total I have 400,000 images with corresponding label captions. I ran the initial training for 130,000 steps and fine-tuning for 170,000 steps. The vocabulary contains 750 words in total. The loss curve for initial training + fine-tuning (fine-tuning starts at step 130,000) looks like this: [loss curve image]

Precision and recall are around 0.35-0.40.

The training configuration file is as follows:

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Image-to-text model and training configurations."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


class ModelConfig(object):
  """Wrapper class for model hyperparameters."""

  def __init__(self):
    """Sets the default model hyperparameters."""
    # File pattern of sharded TFRecord file containing SequenceExample protos.
    # Must be provided in training and evaluation modes.
    self.input_file_pattern = None

    # Image format ("jpeg" or "png").
    self.image_format = "jpeg"

    # Approximate number of values per input shard. Used to ensure sufficient
    # mixing between shards in training.
    self.values_per_input_shard = 2300
    # Minimum number of shards to keep in the input queue.
    self.input_queue_capacity_factor = 2
    # Number of threads for prefetching SequenceExample protos.
    self.num_input_reader_threads = 1

    # Name of the SequenceExample context feature containing image data.
    self.image_feature_name = "image/data"
    # Name of the SequenceExample feature list containing integer captions.
    self.caption_feature_name = "image/caption_ids"

    # Number of unique words in the vocab (plus 1, for <UNK>).
    # The default value is larger than the expected actual vocab size to allow
    # for differences between tokenizer versions used in preprocessing. There is
    # no harm in using a value greater than the actual vocab size, but using a
    # value less than the actual vocab size will result in an error.
    self.vocab_size = 750

    # Number of threads for image preprocessing. Should be a multiple of 2.
    self.num_preprocess_threads = 4

    # Batch size.
    self.batch_size = 32

    # File containing an Inception v3 checkpoint to initialize the variables
    # of the Inception model. Must be provided when starting training for the
    # first time.
    self.inception_checkpoint_file = None

    # Dimensions of Inception v3 input images.
    self.image_height = 299
    self.image_width = 299

    # Scale used to initialize model variables.
    self.initializer_scale = 0.08

    # LSTM input and output dimensionality, respectively.
    self.embedding_size = 512
    self.num_lstm_units = 512

    # If < 1.0, the dropout keep probability applied to LSTM variables.
    self.lstm_dropout_keep_prob = 0.7


class TrainingConfig(object):
  """Wrapper class for training hyperparameters."""

  def __init__(self):
    """Sets the default training hyperparameters."""
    # Number of examples per epoch of training data.
    self.num_examples_per_epoch = 100000

    # Optimizer for training the model.
    self.optimizer = "SGD"

    # Learning rate for the initial phase of training.
    self.initial_learning_rate = 2.0
    self.learning_rate_decay_factor = 0.5
    self.num_epochs_per_decay = 1.0

    # Learning rate when fine tuning the Inception v3 parameters.
    self.train_inception_learning_rate = 0.005

    # If not None, clip gradients to this value.
    self.clip_gradients = 5.0

    # How many model checkpoints to keep.
    self.max_checkpoints_to_keep = 5
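
For reference, the decay hyperparameters above are combined into an exponential learning-rate schedule in im2txt's train.py, roughly like the sketch below (simplified and only illustrative, not my exact training code):

# Rough sketch of how the values above become a learning-rate schedule
# (simplified from im2txt's train.py; illustrative only).
import tensorflow as tf

model_config = ModelConfig()
training_config = TrainingConfig()

global_step = tf.train.get_or_create_global_step()

# Decay the learning rate by learning_rate_decay_factor once per
# num_epochs_per_decay epochs of training data.
num_batches_per_epoch = (training_config.num_examples_per_epoch /
                         model_config.batch_size)
decay_steps = int(num_batches_per_epoch * training_config.num_epochs_per_decay)

learning_rate = tf.train.exponential_decay(
    training_config.initial_learning_rate,
    global_step,
    decay_steps=decay_steps,
    decay_rate=training_config.learning_rate_decay_factor,
    staircase=True)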

Any suggestions, insights, or observations would be greatly appreciated.

Best Answer

Note that im2txt works as well as it does because it generates readable sentences: in a sentence, each word is related to its neighbors, and that structure is what the model learns. In your case, you changed the model to generate a set of labels whose order carries no meaning. In the im2txt setting, the words "female", "woman" and "lady" are essentially the same concept, and im2txt can produce sentences that slide between them; for example, "this lady is wearing a beautiful pink dress" and "the dress of this lady is pink" are the same, or at least very similar, captions. In your case, if you do not impose some ordering rule on the words, the model gets confused and may fail to learn.

If you want to get a list of labels from an image, you should just use the Inception model with multi-label classification (replace the softmax layer with a sigmoid layer and use sigmoid cross-entropy as the loss function), as in the sketch below.
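
A minimal sketch of that idea, assuming a TF-Slim Inception v3 (tf.contrib.slim, TF 1.x) and a tag vocabulary of 750 words; the placeholder names and the 0.5 threshold are illustrative assumptions, not the exact im2txt code:

import tensorflow as tf
from tensorflow.contrib.slim.nets import inception

slim = tf.contrib.slim

num_labels = 750  # size of the tag vocabulary (assumption)

images = tf.placeholder(tf.float32, [None, 299, 299, 3])
# Multi-hot targets: labels[i, j] = 1.0 if tag j applies to image i.
labels = tf.placeholder(tf.float32, [None, num_labels])

with slim.arg_scope(inception.inception_v3_arg_scope()):
  logits, _ = inception.inception_v3(
      images, num_classes=num_labels, is_training=True)

# Sigmoid cross-entropy treats every tag as an independent yes/no decision,
# so the order of the tags no longer matters.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

# At inference time, threshold the per-tag probabilities to get the tag list.
probabilities = tf.sigmoid(logits)
predicted_tags = probabilities > 0.5

You can initialize the Inception v3 variables from the same checkpoint you already use for im2txt and fine-tune only the new logits layer at first.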

Regarding "python - Loss does not decrease when fine-tuning Inception v3 for im2txt", see the original question on Stack Overflow: https://stackoverflow.com/questions/46637864/
