
keras - InceptionResnetV2 STEM block Keras implementation does not match the original paper?


I have been comparing the model summary of the Keras implementation of InceptionResnetV2 with the architecture specified in the original paper, and it shows very little resemblance when it comes to the filter_concat blocks.
The first lines of the model's summary() are shown below. (In my case the input was changed to 512x512, but as far as I know this does not affect the number of filters per layer, so we can still use it to follow the paper's translation into code):
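For reference, a summary like the one below can be produced along these lines (a minimal sketch, assuming tf.keras; the exact import path and layer naming scheme depend on the Keras/TensorFlow version):

from tensorflow.keras.applications import InceptionResNetV2

# include_top=False accepts the non-default 512x512 input; weights=None skips downloading ImageNet weights
model = InceptionResNetV2(include_top=False, weights=None, input_shape=(512, 512, 3))
model.summary()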

Model: "inception_resnet_v2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 512, 512, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 255, 255, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 255, 255, 32) 96 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 255, 255, 32) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 253, 253, 32) 9216 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 253, 253, 32) 96 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 253, 253, 32) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 253, 253, 64) 18432 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 253, 253, 64) 192 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 253, 253, 64) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 126, 126, 64) 0 activation_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 126, 126, 80) 5120 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 126, 126, 80) 240 conv2d_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 126, 126, 80) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 124, 124, 192 138240 activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 124, 124, 192 576 conv2d_5[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 124, 124, 192 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 61, 61, 192) 0 activation_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 61, 61, 64) 12288 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 61, 61, 64) 192 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 61, 61, 64) 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 61, 61, 48) 9216 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 61, 61, 96) 55296 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 61, 61, 48) 144 conv2d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 61, 61, 96) 288 conv2d_10[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 61, 61, 48) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 61, 61, 96) 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 61, 61, 192) 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
.
.
.
many more lines
Figure 3 of their paper (attached below) shows how the STEM block of InceptionV4 and InceptionResnetV2 is built. In Figure 3 there are three filter concatenations in the STEM block, but in the output I showed above the concatenations seem to have been replaced by sequential max-pooling or something similar (the first concatenation should appear right after max_pooling2d_1). The number of filters grows the way a concatenation would make it grow, yet no concatenation is actually performed; the filters seem to be stacked sequentially! Does anyone know what is going on in this output? Is it doing the same thing described in the paper?
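For clarity, the first filter_concat junction that Figure 3 prescribes would look roughly like this in Keras (a hypothetical sketch using the paper's filter counts, not the library's actual code; BatchNormalization and Activation layers are omitted for brevity):

from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(512, 512, 3))
x = layers.Conv2D(32, 3, strides=2, padding='valid', use_bias=False)(inputs)  # 3x3/2 V, 32 -> 255x255
x = layers.Conv2D(32, 3, padding='valid', use_bias=False)(x)                  # 3x3 V, 32   -> 253x253
x = layers.Conv2D(64, 3, padding='same', use_bias=False)(x)                   # 3x3, 64     -> 253x253

# Figure 3: two parallel branches whose outputs are concatenated along the channel axis
branch_pool = layers.MaxPooling2D(3, strides=2, padding='valid')(x)                # 3x3/2 V maxpool, 64 ch
branch_conv = layers.Conv2D(96, 3, strides=2, padding='valid', use_bias=False)(x)  # 3x3/2 V conv, 96 ch
x = layers.Concatenate()([branch_pool, branch_conv])  # filter_concat: 64 + 96 = 160 channels at 126x126

# The Keras InceptionResnetV2 summary above instead continues sequentially after max_pooling2d_1
# (1x1 conv 80 -> 3x3 conv 192 -> 3x3/2 maxpool), i.e. the Figure 14 (Inception-ResNet-v1) stem.
stem = Model(inputs, x)
stem.summary()

Note that the 64 + 96 = 160 channels of this concatenation match the concatenate_1 output shape (None, 126, 126, 160) in the InceptionV4 summary further below.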
As a comparison, I found an InceptionV4 Keras implementation, and it does perform a filter_concat (concatenate_1) for the first concatenation in the STEM block. This is the output of the first lines of its summary():
Model: "inception_v4"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 512, 512, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 255, 255, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 255, 255, 32) 96 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 255, 255, 32) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 253, 253, 32) 9216 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 253, 253, 32) 96 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 253, 253, 32) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 253, 253, 64) 18432 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 253, 253, 64) 192 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 253, 253, 64) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 126, 126, 96) 55296 activation_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 126, 126, 96) 288 conv2d_4[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 126, 126, 64) 0 activation_3[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 126, 126, 96) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 126, 126, 160 0 max_pooling2d_1[0][0]
activation_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 126, 126, 64) 10240 concatenate_1[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 126, 126, 64) 192 conv2d_7[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 126, 126, 64) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 126, 126, 64) 28672 activation_7[0][0]
__________________________________________________________________________________________________
.
.
.
and many more lines
So, as shown in the paper, the first layers of both architectures should be identical. Or am I missing something?
Edit: I found that the Keras implementation of InceptionResnetV2 does not follow the STEM block of InceptionResnetV2, but instead implements the STEM block of InceptionResnetV1 (Figure 14 of their paper, attached below). After the STEM block it does seem to follow the rest of the InceptionResnetV2 blocks faithfully.
InceptionResnetV1 performs worse than InceptionResnetV2 (Figure 25), so I am skeptical about using a block from V1 instead of the full V2 in Keras. I will try to take the STEM from the InceptionV4 implementation I found and continue with InceptionResnetV2 from there.
The same issue was closed without explanation in the tf-models GitHub repo. I leave it here in case anyone is interested: https://github.com/tensorflow/models/issues/1235
Edit 2: For some reason, Google AI (the creators of the Inception architectures) showed an image of "inception-resnet-v2" in their blog post when the code was released, but its STEM block is the one from InceptionV3, not the one from InceptionV4 as specified in the paper. So either the paper is wrong, or for some reason the code does not follow the paper.
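Incidentally, a quick (and admittedly rough) way to check which stem a given Keras implementation follows is to locate the first Concatenate layer in the graph; this is only a heuristic sketch, not an official diagnostic:

from tensorflow.keras import layers
from tensorflow.keras.applications import InceptionResNetV2

model = InceptionResNetV2(include_top=False, weights=None, input_shape=(512, 512, 3))

# With the Figure 3 stem the first Concatenate should appear within the first ~15-20 layers;
# with the Figure 14 (v1-style) stem it only shows up later, inside the first Inception-A/block35 module.
idx, layer = next((i, l) for i, l in enumerate(model.layers) if isinstance(l, layers.Concatenate))
print(f"first Concatenate layer is #{idx}: {layer.name}")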
Figure 3 of the original paper. The schema for stem of the pure Inception-v4 and Inception-ResNet-v2 networks. [...]
Figure 14 of the original paper. The stem of the Inception-ResNet-v1 network.
Figure 25. Top-5 error evolution of all four models (single model, single crop). Showing the improvement due to larger model size. Although the residual version converges faster, the final accuracy seems to mainly depend on the model size.

Best Answer

It achieves similar results.
I just received an email confirming the mismatch from Alex Alemi, senior research scientist at Google and author of the blog post regarding the release of the InceptionResnetV2 code. It seems that during internal experiments the stems got switched at some point, and the release simply stayed that way.
Quoting:

Dani Azemar,

It seems you're right. Not entirely sure what happened but the code is obviously the source of truth in the sense that the released checkpoint is for the code that is also released. When we were developing the architecture we did a whole slew of internal experiments and I imagine at some point the stems were switched. Not sure I have the time to dig deeper at the moment, but like I said, the released checkpoint is a checkpoint for the released code as you can verify yourself by running the evaluation pipeline. I agree with you that it seems like this is using the original Inception V1 stem. Best Regards,

Alex Alemi


I will update this post with any changes regarding this topic.
Update: Christian Szegedy, also an author of the original paper, just tweeted me:

The original experiments and model was created in DistBelief, a completely different framework pre-dating Tensorflow.

The TF version was added a year later and might have had discrepancies from the original model, however it was made sure to achieve similar results.


So, since it achieves similar results, your experiments will be roughly equivalent either way.

Regarding "keras - InceptionResnetV2 STEM block Keras implementation does not match the original paper?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64488034/
