python - Keras autoencoder: Tying Weights from Encoder To Decoder not working


Reposted by: 太空狗. Updated: 2023-10-30 01:30:43

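I am building an autoencoder as part of my full model for a Kaggle competition. I am trying to tie the encoder's weights, transposed, to the decoder. Before the first epoch the weights are correctly synced; after that, the decoder weights stay stuck and do not follow the encoder weights as gradient descent updates them.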

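I spent 12 hours going through nearly every post about this problem that I could find on Google, but nobody seems to have an answer for my case. The closest one is Tying Autoencoder Weights in a Dense Keras Layer, but there the problem was solved by not using a Variable tensor as the kernel. I am already not using that type of tensor as my decoder kernel, so it did not help.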

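I am using the DenseTied custom Keras layer class defined in this article: https://towardsdatascience.com/build-the-right-autoencoder-tune-and-optimize-using-pca-principles-part-ii-24b9cca69bd6 . It is exactly the same, except that I changed how I reference the Keras backend to suit my import style.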

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

Here is the custom layer definition:

class DenseTied(tf.keras.layers.Layer):

    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 tied_to=None,
                 **kwargs):
        self.tied_to = tied_to
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = tf.keras.initializers.get(kernel_initializer)
        self.bias_initializer = tf.keras.initializers.get(bias_initializer)
        self.kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer)
        self.bias_regularizer = tf.keras.regularizers.get(bias_regularizer)
        self.activity_regularizer = tf.keras.regularizers.get(activity_regularizer)
        self.kernel_constraint = tf.keras.constraints.get(kernel_constraint)
        self.bias_constraint = tf.keras.constraints.get(bias_constraint)
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        if self.tied_to is not None:
            self.kernel = tf.keras.backend.transpose(self.tied_to.kernel)
            self.non_trainable_weights.append(self.kernel)
        else:
            self.kernel = self.add_weight(shape=(input_dim, self.units),
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=2, axes={-1: input_dim})
        self.built = True

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def call(self, inputs):
        output = tf.keras.backend.dot(inputs, self.kernel)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, self.bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output

Here is the model training and test using a dummy dataset:

rand_samples = np.random.rand(16, 51)
dummy_ds = tf.data.Dataset.from_tensor_slices((rand_samples, rand_samples)).shuffle(16).batch(16)

encoder = tf.keras.layers.Dense(1, activation="linear", input_shape=(51,), use_bias=True)
decoder = DenseTied(51, activation="linear", tied_to=encoder, use_bias=True)

autoencoder = tf.keras.Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)

autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='sgd')

autoencoder.summary()

print("Encoder Kernel Before 1 Epoch", encoder.kernel[0])
print("Decoder Kernel Before 1 Epoch", decoder.kernel[0][0])

autoencoder.fit(dummy_ds, epochs=1)

print("Encoder Kernel After 1 Epoch", encoder.kernel[0])
print("Decoder Kernel After 1 Epoch", decoder.kernel[0][0])

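The expected output is that the two kernels are exactly identical in their first element (only one weight is printed, for simplicity).

The current output shows that the decoder kernel is not updated to match the transposed encoder kernel: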

2019-09-06 14:55:42.070003: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-09-06 14:55:42.984580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
2019-09-06 14:55:43.088109: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.        
2019-09-06 14:55:43.166145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-06 14:55:43.203865: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-09-06 14:55:43.277988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
2019-09-06 14:55:43.300888: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.        
2019-09-06 14:55:43.309040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-06 14:55:44.077814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-06 14:55:44.094542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-06 14:55:44.099411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-06 14:55:44.103424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4712 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 1)                 52
_________________________________________________________________
dense_tied (DenseTied)       (None, 51)                103
=================================================================
Total params: 103
Trainable params: 103
Non-trainable params: 0
_________________________________________________________________
Encoder Kernel Before 1 Epoch tf.Tensor([0.20486075], shape=(1,), dtype=float32)
Decoder Kernel Before 1 Epoch tf.Tensor(0.20486075, shape=(), dtype=float32)
1/1 [==============================] - 1s 657ms/step - loss: 0.3396 - accuracy: 0.0000e+00
Encoder Kernel After 1 Epoch tf.Tensor([0.20530733], shape=(1,), dtype=float32)
Decoder Kernel After 1 Epoch tf.Tensor(0.20486075, shape=(), dtype=float32)
PS C:\Users\whitm\Desktop\CodeProjects\ForestClassifier-DEC>

I don't understand what I am doing wrong.

Best answer

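To tie the weights, I suggest using the Keras functional API, which lets you share layers. That said, here is an alternative implementation that ties the weights between the encoder and the decoder: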

class TransposableDense(tf.keras.layers.Dense):

    def __init__(self, units, **kwargs):
        super().__init__(units, **kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]
        self.t_output_dim = input_dim

        self.kernel = self.add_weight(shape=(int(input_dim), self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
            self.bias_t = self.add_weight(shape=(input_dim,),
                                          initializer=self.bias_initializer,
                                          name='bias_t',
                                          regularizer=self.bias_regularizer,
                                          constraint=self.bias_constraint)
        else:
            self.bias = None
            self.bias_t = None
        # self.input_spec = tf.keras.layers.InputSpec(min_ndim=2, axes={-1: input_dim})
        self.built = True

    def call(self, inputs, transpose=False):
        bs, input_dim = inputs.get_shape()

        kernel = self.kernel
        bias = self.bias
        if transpose:
            assert input_dim == self.units
            kernel = tf.keras.backend.transpose(kernel)
            bias = self.bias_t

        output = tf.keras.backend.dot(inputs, kernel)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output

    def compute_output_shape(self, input_shape):
        bs, input_dim = input_shape
        output_dim = self.units
        if input_dim == self.units:
            output_dim = self.t_output_dim
        return bs, output_dim

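You can transpose this dense layer's kernel by calling the layer with transpose=True. Note that this may break some basic Keras principles (for example, the layer has multiple output shapes), but it should work for your case.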
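As a minimal illustration of why the transpose has to happen each time the layer is called rather than once in build: in eager TensorFlow 2.x, tf.transpose applied to a tf.Variable returns an ordinary tensor holding the values at that moment, so later updates to the variable are not reflected in it. This is likely what happens to self.kernel in the DenseTied layer from the question. The variable names below are illustrative only.

```python
import tensorflow as tf

# A stand-in for the encoder kernel (illustrative values).
v = tf.Variable([[1.0, 2.0]])

# Transposing once produces a plain tensor: a snapshot of the current values.
snapshot = tf.transpose(v)

# Simulate an optimizer step updating the variable.
v.assign([[3.0, 4.0]])

print(snapshot.numpy())         # still [[1.], [2.]]: the snapshot is stale
print(tf.transpose(v).numpy())  # [[3.], [4.]]: transposing again sees the update
```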

Here is an example showing how to use this layer to define a model:

a = tf.keras.layers.Input((51,))
dense = TransposableDense(1, activation='linear', use_bias=True)
encoder_out = dense(a)
decoder_out = dense(encoder_out, transpose=True)
encoder = tf.keras.Model(a, encoder_out)
autoencoder = tf.keras.Model(a, decoder_out)
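If you would rather keep a separate decoder layer, as in the question, one possible sketch is to hold a reference to the encoder and transpose its kernel inside call. TiedDecoder is a hypothetical name, not part of Keras and not the class from the answer above. The check at the end only confirms that, after one epoch on random data, the decoder reads the encoder's updated kernel.

```python
import numpy as np
import tensorflow as tf


class TiedDecoder(tf.keras.layers.Layer):
    """Hypothetical decoder that reuses the kernel of an existing Dense layer."""

    def __init__(self, tied_to, **kwargs):
        super().__init__(**kwargs)
        self.tied_to = tied_to  # encoder Dense layer whose kernel is shared

    def build(self, input_shape):
        # This layer owns only a bias; the kernel belongs to the encoder,
        # which must already be built (it is called first in the model below).
        out_dim = self.tied_to.kernel.shape[0]
        self.bias = self.add_weight(shape=(out_dim,), initializer="zeros", name="bias")

    def call(self, inputs):
        # Transpose at call time so every forward pass reads the live variable.
        return tf.matmul(inputs, self.tied_to.kernel, transpose_b=True) + self.bias


inputs = tf.keras.Input(shape=(51,))
encoder = tf.keras.layers.Dense(1, activation="linear")
decoder = TiedDecoder(tied_to=encoder)
autoencoder = tf.keras.Model(inputs, decoder(encoder(inputs)))
autoencoder.compile(optimizer="sgd", loss="mean_squared_error")

x = np.random.rand(16, 51).astype("float32")
kernel_before = encoder.kernel.numpy().copy()
autoencoder.fit(x, x, epochs=1, batch_size=16, verbose=0)
kernel_after = encoder.kernel.numpy()

# Recompute the forward pass by hand with the *updated* encoder kernel.
manual = (x @ kernel_after + encoder.bias.numpy()) @ kernel_after.T + decoder.bias.numpy()
reconstructed = autoencoder.predict(x, verbose=0)

print("kernel changed:", not np.array_equal(kernel_before, kernel_after))
print("decoder uses updated kernel:", np.allclose(reconstructed, manual, atol=1e-4))
```

Because the shared kernel is used twice in the forward pass, its gradient receives a contribution from both the encoder and the decoder, which is the usual meaning of tied weights.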

Regarding python - Keras autoencoder: Tying Weights from Encoder To Decoder not working, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57827274/
