python - CNN, GAN: how does the generator know which class it should draw? - 6ren

Reposted by: 行者123 | Updated: 2023-12-01 02:25:21

I have a GAN. The generator draws MNIST digits, and it works well. But I don't understand how it knows which digit it should draw. Here is the generator:

def build_generator(latent_size):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 1, 28, 28)
    cnn = Sequential()

    cnn.add(Dense(1024, input_dim=latent_size, activation='relu'))
    cnn.add(Dense(128 * 7 * 7, activation='relu'))
    cnn.add(Reshape((128, 7, 7)))

    # upsample to (..., 14, 14)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(256, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))

    # upsample to (..., 28, 28)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(128, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))

    # take a channel axis reduction
    cnn.add(Conv2D(1, 2, padding='same',
                   activation='tanh',
                   kernel_initializer='glorot_normal'))

    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))

    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')

    cls = Flatten()(Embedding(num_classes, latent_size,
                              embeddings_initializer='glorot_normal')(image_class))

    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])

    fake_image = cnn(h)

    return Model([latent, image_class], fake_image)

The input is a latent array:

noise = np.random.uniform(-1, 1, (batch_size, latent_size))

The labels are generated randomly.
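To make the conditioning step concrete, here is a NumPy sketch of what the `Embedding` + `Flatten` + `layers.multiply` part of `build_generator` computes for one batch. The embedding table is a random stand-in for the learned weights (an assumption; in the real model its rows are trained), and the batch size of 4 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, latent_size, num_classes = 4, 100, 10

# Latent noise, sampled the same way as in the training loop
noise = rng.uniform(-1, 1, (batch_size, latent_size))

# Random integer labels, shaped (batch_size, 1) like the Keras `image_class` input
labels = rng.integers(0, num_classes, batch_size).reshape(-1, 1)

# Hypothetical stand-in for the learned Embedding weights: one row per class
embedding_table = rng.normal(0.0, 0.1, (num_classes, latent_size))

# Embedding lookup gives (batch_size, 1, latent_size); Flatten drops the length-1 axis
cls = embedding_table[labels].reshape(batch_size, latent_size)

# layers.multiply computes the element-wise (Hadamard) product
h = noise * cls
print(h.shape)  # (4, 100)
```

Each row of `h` is the noise vector rescaled, dimension by dimension, by the embedding row of its label; that is the only way the label reaches the `Dense`/`Conv2D` stack.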

So my question is this: after the network embeds the labels, they should look something like this:

[Image: Embedding Labels]

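So now, if I give the network more latent arrays and labels, it multiplies the latent array (the noise) by the embedding of the label. So my expectation is: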

[Image: What I expect]

That way the network would know which new array represents which digit.

But the output of np.multiply(noise, embedded_label) looks like this:

[Image: What is reality]
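A small NumPy sketch of why the "reality" plot looks scattered, assuming a hypothetical random 2-D embedding table: multiplying a fixed embedding `e` by noise drawn from U(-1, 1) fills a box from `-|e_i|` to `|e_i|` on each axis, centred on the origin. Every label's cloud therefore sits around the origin and the clouds overlap, instead of forming separate clusters near each embedding.

```python
import numpy as np

rng = np.random.default_rng(1)
num_classes, latent_size, n = 10, 2, 1000

# Hypothetical 2-D embedding table (stand-in for learned weights)
embedding_table = rng.normal(0.0, 1.0, (num_classes, latent_size))

for label in range(num_classes):
    e = embedding_table[label]
    noise = rng.uniform(-1, 1, (n, latent_size))
    points = noise * e  # what np.multiply(noise, embedded_label) produces
    # Every point stays inside the box [-|e_i|, |e_i|] on each axis ...
    assert np.all(np.abs(points) <= np.abs(e))
    # ... and the cloud is centred near the origin, not near e itself
    print(label, np.round(points.mean(axis=0), 2), np.round(e, 2))
```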

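So how does the network know which digit it should draw?

Edit:

Here is the full code. It works. But why? In the code, latent_size is 100. In my pictures latent_size is 2, because I wanted to visualize them. But I don't think it makes any difference whether I multiply the noise in a 2-dimensional or a 100-dimensional space. In the end, the new points with label "1" are not close to the other points with label "1", and the same holds for the other digits ("0", "1", "2", "3", ...).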

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Train an Auxiliary Classifier Generative Adversarial Network (ACGAN) on the
MNIST dataset. See https://arxiv.org/abs/1610.09585 for more details.

You should start to see reasonable images after ~5 epochs, and good images
by ~15 epochs. You should use a GPU, as the convolution-heavy operations are
very slow on the CPU. Prefer the TensorFlow backend if you plan on iterating,
as the compilation time can be a blocker using Theano.

Timings:

Hardware           | Backend | Time / Epoch
-------------------------------------------
 CPU               | TF      | 3 hrs
 Titan X (maxwell) | TF      | 4 min
 Titan X (maxwell) | TH      | 7 min

Consult https://github.com/lukedeo/keras-acgan for more information and
example output
"""
from __future__ import print_function

from collections import defaultdict
try:
    import cPickle as pickle
except ImportError:
    import pickle
from PIL import Image

from six.moves import range

import keras.backend as K
from keras.datasets import mnist
from keras import layers
from keras.layers import Input, Dense, Reshape, Flatten, Embedding, Dropout
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils.generic_utils import Progbar
import numpy as np
import time, os
np.random.seed(1337)

K.set_image_data_format('channels_first')

num_classes = 10


def build_generator(latent_size):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 1, 28, 28)
    cnn = Sequential()

    cnn.add(Dense(1024, input_dim=latent_size, activation='relu'))
    cnn.add(Dense(128 * 7 * 7, activation='relu'))
    cnn.add(Reshape((128, 7, 7)))

    # upsample to (..., 14, 14)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(256, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))

    # upsample to (..., 28, 28)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(128, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))

    # take a channel axis reduction
    cnn.add(Conv2D(1, 2, padding='same',
                   activation='tanh',
                   kernel_initializer='glorot_normal'))

    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))

    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')

    cls = Flatten()(Embedding(num_classes, latent_size,
                              embeddings_initializer='glorot_normal')(image_class))

    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])

    fake_image = cnn(h)

    return Model([latent, image_class], fake_image)


def build_discriminator():
    # build a relatively standard conv net, with LeakyReLUs as suggested in
    # the reference paper
    cnn = Sequential()

    cnn.add(Conv2D(32, 3, padding='same', strides=2,
                   input_shape=(1, 28, 28)))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(64, 3, padding='same', strides=1))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(128, 3, padding='same', strides=2))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(256, 3, padding='same', strides=1))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))

    cnn.add(Flatten())

    image = Input(shape=(1, 28, 28))

    features = cnn(image)

    # first output (name=generation) is whether or not the discriminator
    # thinks the image that is being shown is fake, and the second output
    # (name=auxiliary) is the class that the discriminator thinks the image
    # belongs to.
    fake = Dense(1, activation='sigmoid', name='generation')(features) # fake or not fake
    aux = Dense(num_classes, activation='softmax', name='auxiliary')(features) # which class is it

    return Model(image, [fake, aux])

if __name__ == '__main__':
    start_time_string = time.strftime("%Y_%m_%d_%H_%M_%S", time.gmtime())
    os.mkdir('history/' + start_time_string)
    os.mkdir('images/' + start_time_string)
    os.mkdir('acgan/' + start_time_string)
    # batch and latent size taken from the paper
    epochs = 50
    batch_size = 100
    latent_size = 100

    # Adam parameters suggested in https://arxiv.org/abs/1511.06434
    adam_lr = 0.00005
    adam_beta_1 = 0.5

    # build the discriminator
    discriminator = build_discriminator()
    discriminator.compile(
        optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
        loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
    )

    # build the generator
    generator = build_generator(latent_size)
    generator.compile(optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
                      loss='binary_crossentropy')

    latent = Input(shape=(latent_size, ))
    image_class = Input(shape=(1,), dtype='int32')

    # get a fake image
    fake = generator([latent, image_class])

    # we only want to be able to train generation for the combined model
    discriminator.trainable = False
    fake, aux = discriminator(fake)
    combined = Model([latent, image_class], [fake, aux])

    combined.compile(
        optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
        loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
    )

    # get our mnist data, and force it to be of shape (..., 1, 28, 28) with
    # range [-1, 1]
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = (x_train.astype(np.float32) - 127.5) / 127.5
    x_train = np.expand_dims(x_train, axis=1)

    x_test = (x_test.astype(np.float32) - 127.5) / 127.5
    x_test = np.expand_dims(x_test, axis=1)

    num_train, num_test = x_train.shape[0], x_test.shape[0]

    train_history = defaultdict(list)
    test_history = defaultdict(list)

    for epoch in range(1, epochs + 1):
        print('Epoch {}/{}'.format(epoch, epochs))

        num_batches = int(x_train.shape[0] / batch_size)
        progress_bar = Progbar(target=num_batches)

        epoch_gen_loss = []
        epoch_disc_loss = []

        for index in range(num_batches):
            # generate a new batch of noise
            noise = np.random.uniform(-1, 1, (batch_size, latent_size))

            # get a batch of real images
            image_batch = x_train[index * batch_size:(index + 1) * batch_size]
            label_batch = y_train[index * batch_size:(index + 1) * batch_size]

            # sample some labels from p_c
            sampled_labels = np.random.randint(0, num_classes, batch_size)

            # generate a batch of fake images, using the generated labels as a
            # conditioner. We reshape the sampled labels to be
            # (batch_size, 1) so that we can feed them into the embedding
            # layer as a length one sequence
            generated_images = generator.predict(
                [noise, sampled_labels.reshape((-1, 1))], verbose=0)

            x = np.concatenate((image_batch, generated_images))
            y = np.array([1] * batch_size + [0] * batch_size)
            aux_y = np.concatenate((label_batch, sampled_labels), axis=0)

            # see if the discriminator can figure itself out...
            epoch_disc_loss.append(discriminator.train_on_batch(x, [y, aux_y]))

            # make new noise. we generate 2 * batch size here such that we have
            # the generator optimize over an identical number of images as the
            # discriminator
            noise = np.random.uniform(-1, 1, (2 * batch_size, latent_size))
            sampled_labels = np.random.randint(0, num_classes, 2 * batch_size)

            # we want to train the generator to trick the discriminator
            # For the generator, we want all the {fake, not-fake} labels to say
            # not-fake
            trick = np.ones(2 * batch_size)

            epoch_gen_loss.append(combined.train_on_batch(
                [noise, sampled_labels.reshape((-1, 1))],
                [trick, sampled_labels]))

            progress_bar.update(index + 1)

        print('Testing for epoch {}:'.format(epoch))

        # evaluate the testing loss here

        # generate a new batch of noise
        noise = np.random.uniform(-1, 1, (num_test, latent_size))

        # sample some labels from p_c and generate images from them
        sampled_labels = np.random.randint(0, num_classes, num_test)
        generated_images = generator.predict(
            [noise, sampled_labels.reshape((-1, 1))], verbose=False)

        x = np.concatenate((x_test, generated_images))
        y = np.array([1] * num_test + [0] * num_test)
        aux_y = np.concatenate((y_test, sampled_labels), axis=0)

        # see if the discriminator can figure itself out...
        discriminator_test_loss = discriminator.evaluate(
            x, [y, aux_y], verbose=False)

        discriminator_train_loss = np.mean(np.array(epoch_disc_loss), axis=0)

        # make new noise
        noise = np.random.uniform(-1, 1, (2 * num_test, latent_size))
        sampled_labels = np.random.randint(0, num_classes, 2 * num_test)

        trick = np.ones(2 * num_test)

        generator_test_loss = combined.evaluate(
            [noise, sampled_labels.reshape((-1, 1))],
            [trick, sampled_labels], verbose=False)

        generator_train_loss = np.mean(np.array(epoch_gen_loss), axis=0)

        # generate an epoch report on performance
        train_history['generator'].append(generator_train_loss)
        train_history['discriminator'].append(discriminator_train_loss)

        test_history['generator'].append(generator_test_loss)
        test_history['discriminator'].append(discriminator_test_loss)

        print('{0:<22s} | {1:4s} | {2:15s} | {3:5s}'.format(
            'component', *discriminator.metrics_names))
        print('-' * 65)

        ROW_FMT = '{0:<22s} | {1:<4.2f} | {2:<15.2f} | {3:<5.2f}'
        print(ROW_FMT.format('generator (train)',
                             *train_history['generator'][-1]))
        print(ROW_FMT.format('generator (test)',
                             *test_history['generator'][-1]))
        print(ROW_FMT.format('discriminator (train)',
                             *train_history['discriminator'][-1]))
        print(ROW_FMT.format('discriminator (test)',
                             *test_history['discriminator'][-1]))

        # save weights every epoch
        generator.save_weights(
            'acgan/'+ start_time_string +'/params_generator_epoch_{0:03d}.hdf5'.format(epoch), True)
        discriminator.save_weights(
            'acgan/'+ start_time_string +'/params_discriminator_epoch_{0:03d}.hdf5'.format(epoch), True)

        # generate some digits to display
        noise = np.random.uniform(-1, 1, (100, latent_size))

        sampled_labels = np.array([
            [i] * num_classes for i in range(num_classes)
        ]).reshape(-1, 1)

        # get a batch to display
        generated_images = generator.predict(
            [noise, sampled_labels], verbose=0)

        # arrange them into a grid
        img = (np.concatenate([r.reshape(-1, 28)
                               for r in np.split(generated_images, num_classes)
                               ], axis=-1) * 127.5 + 127.5).astype(np.uint8)

        Image.fromarray(img).save(
            'images/'+ start_time_string +'/plot_epoch_{0:03d}_generated.png'.format(epoch))

    pickle.dump({'train': train_history, 'test': test_history},
                open('history/'+ start_time_string +'/acgan-history.pkl', 'wb'))
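One place where the requested class enters training is the `combined` model: it is compiled with `binary_crossentropy` and `sparse_categorical_crossentropy` and trained with targets `[trick, sampled_labels]`, so the generator is penalized whenever the discriminator's `auxiliary` head assigns its image to a different digit than the one requested. The sketch below uses simplified NumPy versions of those two losses (not Keras' own implementations), sums them with equal weight (the Keras default when no `loss_weights` are given), and feeds them made-up discriminator outputs.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    # Simplified mean binary cross-entropy over the batch
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # Mean negative log-probability assigned to the integer target class
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

num_classes = 10
sampled_labels = np.array([3, 7])  # digits the generator was asked to draw
trick = np.ones(2)                 # generator wants every fake judged "real"
fake_prob = np.array([0.9, 0.9])   # made-up 'generation' head outputs

def aux_output(predicted):
    # Made-up 'auxiliary' head output: 0.91 on the predicted class, 0.01 elsewhere
    probs = np.full((len(predicted), num_classes), 0.01)
    probs[np.arange(len(predicted)), predicted] = 0.91
    return probs

def combined_loss(aux_probs):
    # Equal-weight sum of the two output losses
    return (binary_crossentropy(trick, fake_prob)
            + sparse_categorical_crossentropy(sampled_labels, aux_probs))

right = combined_loss(aux_output(np.array([3, 7])))  # images read as the requested digits
wrong = combined_loss(aux_output(np.array([5, 1])))  # images read as other digits
print(round(right, 3), round(wrong, 3))
```

The realism term is identical in both cases; only the auxiliary term differs, and its gradient is what pushes the generator to make each embedding produce its own digit.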

Best answer

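Your noise is too large, and it has negative values.

You shouldn't multiply by the noise; you should add it (and make it much smaller). Multiplying by factors anywhere between -1 and +1 can completely change the input, including flipping its sign. That is why the completely scattered picture appears in "reality".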
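A sketch of the additive alternative the answer suggests, under assumed values: a random embedding table, a noise scale of 0.1, and a nearest-embedding lookup standing in for whatever the generator's first layers would learn. This is not what the ACGAN code above does; it only illustrates why small additive noise keeps each class code recognizable.

```python
import numpy as np

rng = np.random.default_rng(2)
num_classes, latent_size, batch_size = 10, 100, 1000
noise_scale = 0.1  # "much smaller" noise; the exact value is an assumption

# Hypothetical class codes (stand-in for learned embeddings)
embedding_table = rng.normal(0.0, 1.0, (num_classes, latent_size))
labels = rng.integers(0, num_classes, batch_size)
noise = rng.uniform(-1, 1, (batch_size, latent_size))

# Additive conditioning: class code plus a small perturbation
h = embedding_table[labels] + noise_scale * noise

# Nearest-embedding decoding: which class code is each h closest to?
dists = np.linalg.norm(h[:, None, :] - embedding_table[None, :, :], axis=-1)
recovered = dists.argmin(axis=1)
print("accuracy:", (recovered == labels).mean())
```

With these numbers the noise moves each point by at most 1 in Euclidean distance, while distinct class codes are roughly 14 apart, so every label is recovered.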

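If the model can still tell which digit you want even from such oddly scattered input, then it is probably relying on certain dimensions of the latent vector more than on their actual values.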

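If you look closely at the scatter plot, it does have some interesting patterns, for example:

0 - a vertical line. Only one particular dimension is zero.
4 - another vertical line.
7 - a horizontal line.
3 - seems to be a diagonal line, not sure.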
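The vertical and horizontal lines have a simple explanation under multiplicative conditioning: if one component of a label's embedding is zero, every noisy point for that label has that coordinate equal to zero. A NumPy sketch with two hypothetical 2-D embeddings:

```python
import numpy as np

rng = np.random.default_rng(3)
noise = rng.uniform(-1, 1, (500, 2))  # 2-D latent noise, as in the plots

# Hypothetical 2-D embeddings: one with x-component 0, one with y-component 0
e_vertical = np.array([0.0, 0.8])
e_horizontal = np.array([0.5, 0.0])

vertical = noise * e_vertical      # x is always 0 -> points on a vertical line
horizontal = noise * e_horizontal  # y is always 0 -> points on a horizontal line

print(np.all(vertical[:, 0] == 0), np.all(horizontal[:, 1] == 0))  # True True
```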

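If we can see a pattern (even in a 2D plot, which hides the actual 100 dimensions), the model can see a pattern too. If we could see all 100 dimensions, the pattern might be very obvious.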

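So your embedding is probably compensating for the wild random factor by zeroing out certain groups of dimensions, where the random factor then has no effect. That is what makes the lines follow specific axes. Different combinations of zeroed and non-zeroed dimensions can then identify the label.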

Example:

For label 0, your embedding might create [0,0,0,0,1,1,1,1,1,1,1,1,...]
For label 1, it might create [1,1,1,1,0,0,0,0,1,1,1,1,1,...]
For label 2, it might create [1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,...]

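Then the random factor will never change those zeros, and the model can identify the digit by checking where the group of four zeros is in the input.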
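A NumPy sketch of this hypothesis, using mask-like embeddings in the spirit of the example above (hypothetical values, and a 40-dimensional latent space instead of 100 so that ten groups of four fit exactly). Because the noise is multiplied in, the zero groups survive, and checking which group is all zeros recovers the label.

```python
import numpy as np

num_classes, group = 10, 4
latent_size = num_classes * group  # 40 dims here; the real model uses 100

# Hypothetical mask-like embeddings: label k zeroes out dimensions
# [k*group, (k+1)*group) and leaves every other dimension at 1.
embedding_table = np.ones((num_classes, latent_size))
for k in range(num_classes):
    embedding_table[k, k * group:(k + 1) * group] = 0.0

rng = np.random.default_rng(4)
labels = rng.integers(0, num_classes, 1000)
noise = rng.uniform(-1, 1, (1000, latent_size))
h = noise * embedding_table[labels]  # multiplicative conditioning, as in the code

# The noise cannot turn those zeros into anything else, so the label is
# the index of the group whose entries are all exactly zero.
group_is_zero = np.all(h.reshape(-1, num_classes, group) == 0, axis=2)
recovered = group_is_zero.argmax(axis=1)
print((recovered == labels).mean())
```

A trained network would not learn exact zeros like these; the sketch only shows that multiplicative noise can leave a recoverable label signal.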

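Of course, this is only a hypothesis. The model may have many other ways to cope with the random factor, but if even one such way exists, that is enough to show there is a solution for the model to find.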

A similar question, "python - CNN, GAN: how does the generator know which class it should draw?", can be found on Stack Overflow: https://stackoverflow.com/questions/47477371/
