python - Modifying a neural network to classify a single example

Reposted by: 太空狗 · Updated: 2023-10-29 21:38:39

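This is my custom extension of one of Andrew Ng's neural networks from the deep learning course: instead of producing 0 or 1 for binary classification, I am trying to classify multiple examples.

Both the inputs and the outputs are one-hot encoded.

Without much training I get 'train accuracy: 67.51658067499625 %'.
How can I classify a single training example instead of classifying all training examples?

I think there is a bug in my implementation, because one issue with this network is that the training examples (train_set_x) and the output values (train_set_y) both need to have the same dimensions, otherwise an error related to matrix dimensions is raised.
For example, using: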

train_set_x = np.array([
    [1,1,1,1],[0,1,1,1],[0,0,1,1]
])

train_set_y = np.array([
    [1,1,1],[1,1,0],[1,1,1]
])

returns the error:

ValueError                                Traceback (most recent call last)
<ipython-input-11-0d356e8d66f3> in <module>()
     27 print(A)
     28 
---> 29 np.multiply(train_set_y,A)
     30 
     31 def initialize_with_zeros(numberOfTrainingExamples):

ValueError: operands could not be broadcast together with shapes (3,3) (1,4)

Network code:

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from scipy import ndimage
import pandas as pd
%matplotlib inline

train_set_x = np.array([
    [1,1,1,1],[0,1,1,1],[0,0,1,1]
])

train_set_y = np.array([
    [1,1,1,0],[1,1,0,0],[1,1,1,1]
])

numberOfFeatures = 4
numberOfTrainingExamples = 3

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))  
    return s

w = np.zeros((numberOfTrainingExamples , 1))
b = 0
A = sigmoid(np.dot(w.T , train_set_x))    
print(A)

np.multiply(train_set_y,A)

def initialize_with_zeros(numberOfTrainingExamples):
    w = np.zeros((numberOfTrainingExamples , 1))
    b = 0
    return w, b

def propagate(w, b, X, Y):

    m = X.shape[1]

    A = sigmoid(np.dot(w.T , X) + b)    

    cost = -(1/m)*np.sum(np.multiply(Y,np.log(A)) + np.multiply((1-Y),np.log(1-A)), axis=1)

    dw =  ( 1 / m ) *   np.dot( X, ( A - Y ).T )    # consumes ( A - Y )
    db =  ( 1 / m ) *   np.sum( A - Y )    # consumes ( A - Y ) again

#     cost = np.squeeze(cost)

    grads = {"dw": dw,
             "db": db}

    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = True):

    costs = []

    for i in range(num_iterations):

        grads, cost = propagate(w, b, X, Y)

        dw = grads["dw"]
        db = grads["db"]

        w = w - (learning_rate * dw)
        b = b - (learning_rate * db)

        if i % 100 == 0:
            costs.append(cost)

        if print_cost and i % 10000 == 0:
            print(cost)

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs

def model(X_train, Y_train, num_iterations, learning_rate = 0.5, print_cost = False):

    w, b = initialize_with_zeros(numberOfTrainingExamples)

    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = True)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_train = sigmoid(np.dot(w.T , X_train) + b) 

    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))

model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.0001, print_cost = True)

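Update: there is a bug in this implementation in that the training example pairs (train_set_x, train_set_y) must have the same dimensions. Can someone point me in the direction of how the linear algebra should be modified?

Update 2:

I modified @Paul Panzer's answer so that the learning rate is 0.001 and the train_set_x, train_set_y pairs are unique: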

train_set_x = np.array([
    [1,1,1,1,1],[0,1,1,1,1],[0,0,1,1,0],[0,0,1,0,1]
])

train_set_y = np.array([
    [1,0,0],[0,0,1],[0,1,0],[1,0,1]
])

grads = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True)

# To classify single training example : 

print(sigmoid(dw @ [0,0,1,1,0] + db))

This update produces the following output:

-2.09657359028
-3.94918577439
[[ 0.74043089  0.32851512  0.14776077  0.77970162]
 [ 0.04810012  0.08033521  0.72846174  0.1063849 ]
 [ 0.25956911  0.67148488  0.22029838  0.85223923]]
[[1 0 0 1]
 [0 0 1 0]
 [0 1 0 1]]
train accuracy: 79.84462279013312 %
[[ 0.51309252  0.48853845  0.50945862]
 [ 0.5110232   0.48646923  0.50738869]
 [ 0.51354109  0.48898712  0.50990734]]

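Should print(sigmoid(dw @ [0,0,1,1,0] + db)) produce a vector that, once rounded, matches the corresponding train_set_y value: [0,1,0]?

Modifying it to produce a vector (wrapping [0,0,1,1,0] in a numpy array and taking the transpose):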

print(sigmoid(dw @ np.array([[0,0,1,1,0]]).T + db))

array([[ 0.51309252],
       [ 0.48646923],
       [ 0.50990734]])

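Again, rounding these values to the nearest whole number produces the vector [1,0,1] when [0,1,0] is expected.

Are these the wrong operations for producing a prediction for a single training example?

Best answer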

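Your difficulties come from mismatched dimensions, so let's walk through the problem and get them straight.

Your network has a number of inputs, the features; let's call their number N_in (numberOfFeatures in your code). It also has a number of outputs which correspond to the different classes; let's call their number N_out. Inputs and outputs are connected by the weights w.

Now here is the problem. The connections are all-to-all, so we need one weight for each of the N_out x N_in pairs of outputs and inputs. Therefore, in your code the shape of w must be changed to (N_out, N_in). You probably also want an offset b for each output, so b should be a vector of size (N_out,), or rather (N_out, 1) so that it plays well with the 2d terms.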
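The shape bookkeeping described above can be checked in isolation. The following is a minimal sketch, assuming illustrative sizes (5 features, 3 classes, 4 samples) and one sample per column; the variable names X and Z are not from the original code.

```python
import numpy as np

# Assumed sizes, for illustration only: 5 features, 3 classes, 4 samples.
N_in, N_out, m = 5, 3, 4

X = np.random.randint(0, 2, (N_in, m))  # inputs, one sample per column
w = np.zeros((N_out, N_in))             # one weight per (output, input) pair
b = np.zeros((N_out, 1))                # one offset per output, kept 2d

Z = w @ X + b                           # b broadcasts across the m columns
print(Z.shape)                          # (3, 4): one row per class, one column per sample
```

With w of shape (N_out, N_in), the number of samples no longer has to match any dimension of the weights, which is what caused the broadcasting error in the question.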

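I've fixed that in the modified code below and tried to make it very explicit. I've also thrown a mock data creator into the bargain.

Regarding the one-hot encoded categorical output: I'm not an expert on neural networks, but I think most people understand it to mean that the classes are mutually exclusive, so each sample in your mock output should have a single one and the rest zeros.

Side note:

At one point a competing answer advised you to get rid of the 1-... terms in the cost function. While that looks like an interesting idea to me, my gut feeling is that it won't work just like that, because the cost will then fail to penalize false positives (see below for a detailed explanation). (Edit: now confirmed using a gradient-free minimizer; use activation="hybrid" in the code below. The solver simply maximizes all outputs that are active in at least one training example.) To make it work you would have to add some kind of regularization. One method that appears to work is using softmax instead of sigmoid. The softmax is to one-hot what the sigmoid is to binary. It makes sure the output is "fuzzy one-hot".

Therefore my recommendation is:

If you want to stick with sigmoid and not explicitly enforce one-hot predictions, keep the 1-... term.

If you want to use the shorter cost function, enforce one-hot predictions, for example by using softmax instead of sigmoid.

I've added an activation="sigmoid"|"softmax"|"hybrid" parameter to the code that switches between the models. I've also made the scipy general-purpose minimizer available, which may be useful when the gradient of the cost is not at hand.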

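Recap on how the cost function works:

The cost is a sum over all classes and all training samples of the term

-y log (y') - (1-y) log (1-y')

where y is the expected response, i.e. the one given by the "y" training sample for the input (the "x" training sample), and y' is the prediction, i.e. the response that the network generates with its current weights and biases. Now, because the expected response is either 0 or 1, the cost for a single class and a single training sample can be written

-log (y')   if   y = 1
-log(1-y')  if   y = 0

because in the first case (1-y) is zero, so the second term vanishes, and in the second case y is zero, so the first term vanishes.
One can now convince oneself that the cost is high if

the expected response y is 1 and the network prediction y' is close to zero

the expected response y is 0 and the network prediction y' is close to one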
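The two cases can be verified numerically. This is a small sketch; cost_term is a hypothetical helper name, and the prediction values 0.01 and 0.99 are chosen only for illustration.

```python
import numpy as np

def cost_term(y, y_pred):
    """Cross-entropy term -y log(y') - (1-y) log(1-y') for one class and one sample."""
    return -y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred)

# y = 1: only -log(y') survives; y = 0: only -log(1 - y') survives.
print(cost_term(1, 0.01), -np.log(0.01))
print(cost_term(0, 0.99), -np.log(1 - 0.99))

# Confidently wrong predictions cost far more than confidently right ones.
print(cost_term(1, 0.01), "vs", cost_term(1, 0.99))
print(cost_term(0, 0.99), "vs", cost_term(0, 0.01))
```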

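In other words, the cost does its job of penalizing wrong predictions. Now, if we drop the second term (1-y) log (1-y'), half of this mechanism is gone. If the expected response is 1, a low prediction will still incur a cost, but if the expected response is 0, the cost will be zero regardless of the prediction; in particular, a high prediction (a false positive) will go unpunished.

Now, because the total cost is a sum over all training samples, there are three possibilities.

All training samples prescribe that the class be zero:
then the cost is completely independent of the predictions for this class, and no learning can take place

Some training samples put the class at zero, some at one:
then, because "false negatives" or "misses" are still penalized but false positives are not, the network will find the easiest way to minimize the cost, which is to indiscriminately increase the prediction of the class for all samples

All training samples prescribe that the class be one:
essentially the same as in the second scenario, only here it is no problem, because that is the correct behavior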
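The second scenario can be illustrated with a few numbers. The sketch below assumes one class with mixed labels and gives every sample the same prediction; full_cost and truncated_cost are hypothetical names for the cost with and without the 1-... term.

```python
import numpy as np

# Expected responses for a single class across four samples (mixed case).
y = np.array([1, 0, 0, 1])

def full_cost(y, p):
    # both terms: penalizes misses and false positives
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()

def truncated_cost(y, p):
    # second term dropped: only misses are penalized
    return -(y * np.log(p)).sum()

levels = (0.5, 0.9, 0.99)
truncated = [truncated_cost(y, np.full(y.shape, lv)) for lv in levels]
full = [full_cost(y, np.full(y.shape, lv)) for lv in levels]

for lv, t, f in zip(levels, truncated, full):
    print(lv, t, f)
```

Raising the prediction for every sample keeps lowering the truncated cost, while the full cost grows because of the two samples whose expected response is 0.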

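And finally, why does it work if we use softmax instead of sigmoid? False positives will still be invisible to the cost. However, it is easy to see that the softmax outputs sum to one over all classes. So the prediction for one class can only be increased if at least one other class is reduced to compensate. In particular, there can be no false positive without a false negative, and the false negative is something the cost will detect.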
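The sum-to-one property is easy to check. A minimal sketch follows, using the same column-wise softmax as the code below and made-up pre-activation values z; it shows that pushing one class up pulls the others down.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical pre-activations for three classes and one sample.
z = np.array([[1.0], [2.0], [0.5]])
p = softmax(z)

# Push class 0 up; the other classes have to give way.
z2 = z.copy()
z2[0, 0] += 3.0
p2 = softmax(z2)

print(p.sum(axis=0), p2.sum(axis=0))        # both sum to one
print(p2[0, 0] > p[0, 0], (p2[1:] < p[1:]).all())
```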

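On how to get a binary prediction:

For binary expected responses, rounding is indeed the appropriate procedure. For one-hot, I would rather find the largest value, set it to one and set all the others to zero. I've added a convenience function, predict, that implements this.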
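The difference between the two procedures shows up on outputs that sit close to 0.5. This is a standalone sketch with hypothetical probabilities (one sample per column); one_hot_argmax is an illustrative name that mirrors what sof_ml does in the code below.

```python
import numpy as np

def one_hot_argmax(p):
    """Set the largest entry of each column to 1 and everything else to 0."""
    hot = np.argmax(p, axis=0)
    out = np.zeros(p.shape, dtype=int)
    out[hot, np.arange(p.shape[1])] = 1
    return out

# Hypothetical network outputs: 3 classes (rows), 2 samples (columns).
p = np.array([[0.40, 0.51],
              [0.35, 0.49],
              [0.25, 0.52]])

rounded = np.round(p).astype(int)
print(rounded)            # first column has no 1 at all, second column has two
print(one_hot_argmax(p))  # exactly one 1 per column
```

Rounding treats each class independently, so it can produce zero or several active classes per sample; taking the argmax always yields a valid one-hot vector.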

import numpy as np
from scipy import optimize as opt
from collections import namedtuple

# First, a few structures to keep ourselves organized

Problem_Size = namedtuple('Problem_Size', 'Out In Samples')
Data = namedtuple('Data', 'Out In')
Network = namedtuple('Network', 'w b activation cost gradient most_likely')

def get_dims(Out, In, transpose=False):
    """extract dimensions and ensure everything is 2d
    return Data, Dims"""
    # gracefully accept lists etc.
    Out, In = np.asanyarray(Out), np.asanyarray(In)
    if transpose:
        Out, In = Out.T, In.T
    # if it's a single sample make sure it's n x 1
    Out = Out[:, None] if len(Out.shape) == 1 else Out
    In = In[:, None] if len(In.shape) == 1 else In
    Dims = Problem_Size(Out.shape[0], *In.shape)
    if Dims.Samples != Out.shape[1]:
        raise ValueError("number of samples must be the same for Out and In")
    return Data(Out, In), Dims


def sigmoid(z):
    s = 1 / (1 + np.exp(-z))  
    return s

def sig_cost(Net, data):
    A = process(data.In, Net)
    # cross-entropy -y log(A) - (1-y) log(1-A), summed over classes, averaged over samples
    return -(data.Out * np.log(A) + (1-data.Out) * np.log(1-A)).sum(axis=0).mean()

def sig_grad (Net, Dims, data):
    A = process(data.In, Net)
    return dict(dw =  (A - data.Out) @ data.In.T / Dims.Samples,
                db =  (A - data.Out).mean(axis=1, keepdims=True))

def sig_ml(z):
    return np.round(z).astype(int)

def sof_ml(z):
    hot = np.argmax(z, axis=0)
    z = np.zeros(z.shape, dtype=int)
    z[hot, np.arange(len(hot))] = 1
    return z

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)
    z = np.exp(z)
    return z / z.sum(axis=0, keepdims=True)

def sof_cost(Net, data):
    A = process(data.In, Net)
    logA = np.log(A)
    return -(data.Out * logA).sum(axis=0).mean()

sof_grad = sig_grad

def get_net(Dims, activation='softmax'):
    activation, cost, gradient, ml = {
        'sigmoid': (sigmoid, sig_cost, sig_grad, sig_ml),
        'softmax': (softmax, sof_cost, sof_grad, sof_ml),
        'hybrid': (sigmoid, sof_cost, None, sig_ml)}[activation]
    return Network(w=np.zeros((Dims.Out, Dims.In)),
                   b=np.zeros((Dims.Out, 1)),
                   activation=activation, cost=cost, gradient=gradient,
                   most_likely=ml)

def process(In, Net):
    return Net.activation(Net.w @ In + Net.b)

def propagate(data, Dims, Net):
    return Net.gradient(Net, Dims, data), Net.cost(Net, data)

def optimize_no_grad(Net, Dims, data):
    def f(x):
        Net.w[...] = x[:Net.w.size].reshape(Net.w.shape)
        Net.b[...] = x[Net.w.size:].reshape(Net.b.shape)
        return Net.cost(Net, data)
    x = np.r_[Net.w.ravel(), Net.b.ravel()]
    res = opt.minimize(f, x, options=dict(maxiter=10000)).x
    Net.w[...] = res[:Net.w.size].reshape(Net.w.shape)
    Net.b[...] = res[Net.w.size:].reshape(Net.b.shape)

def optimize(Net, Dims, data, num_iterations, learning_rate, print_cost = True):

    w, b = Net.w, Net.b
    costs = []

    for i in range(num_iterations):

        grads, cost = propagate(data, Dims, Net)

        dw = grads["dw"]
        db = grads["db"]

        w -= learning_rate * dw
        b -= learning_rate * db

        if i % 100 == 0:
            costs.append(cost)

        if print_cost and i % 10000 == 0:
            print(cost)

    return grads, costs

def model(X_train, Y_train, num_iterations, learning_rate = 0.5, print_cost = False, activation='sigmoid'):

    data, Dims = get_dims(Y_train, X_train, transpose=True)
    Net = get_net(Dims, activation)

    if Net.gradient is None:
        optimize_no_grad(Net, Dims, data)
    else:
        grads, costs = optimize(Net, Dims, data, num_iterations, learning_rate, print_cost = print_cost)

    Y_prediction_train = process(data.In, Net)

    print(Y_prediction_train)
    print(data.Out)
    print(Y_prediction_train.sum(axis=0))
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - data.Out)) * 100))
    return Net

def predict(In, Net, probability=False):
    In = np.asanyarray(In)
    is1d = In.ndim == 1
    if is1d:
        In = In.reshape(-1, 1)
    Out = process(In, Net)
    if not probability:
        Out = Net.most_likely(Out)
    if is1d:
        Out = Out.reshape(-1)
    return Out

def create_data(Dims):
    Out = np.zeros((Dims.Out, Dims.Samples), dtype=int)
    Out[np.random.randint(0, Dims.Out, (Dims.Samples,)), np.arange(Dims.Samples)] = 1
    In = np.random.randint(0, 2, (Dims.In, Dims.Samples))
    return Data(Out, In)

train_set_x = np.array([
    [1,1,1,1,1],[0,1,1,1,1],[0,0,1,1,0],[0,0,1,0,1]
])

train_set_y = np.array([
    [1,0,0],[1,0,0],[0,0,1],[0,0,1]
])

Net1 = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True, activation='sigmoid')

Net2 = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True, activation='softmax')

Net3 = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True, activation='hybrid')

Dims = Problem_Size(8, 100, 50)
data = create_data(Dims)
model(data.In.T, data.Out.T, num_iterations = 40000, learning_rate = 0.001, print_cost = True, activation='softmax') 
model(data.In.T, data.Out.T, num_iterations = 40000, learning_rate = 0.001, print_cost = True, activation='sigmoid')
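
To classify a single example, the trained parameters w and b (not the gradients dw and db) are applied to the example as a column vector. The following is a self-contained sketch of that idea, separate from the code above: it assumes softmax activation, plain gradient descent, and illustrative hyperparameters (learning rate 0.1, 5000 iterations); predict_one is a hypothetical helper name.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Training data from the answer, transposed so that each column is one sample.
X = np.array([[1,1,1,1,1],[0,1,1,1,1],[0,0,1,1,0],[0,0,1,0,1]]).T   # (N_in, m)
Y = np.array([[1,0,0],[1,0,0],[0,0,1],[0,0,1]]).T                   # (N_out, m)

w = np.zeros((Y.shape[0], X.shape[0]))   # (N_out, N_in)
b = np.zeros((Y.shape[0], 1))            # (N_out, 1)

# Plain gradient descent on the softmax cross-entropy; settings are assumptions.
for _ in range(5000):
    A = softmax(w @ X + b)
    w -= 0.1 * (A - Y) @ X.T / X.shape[1]
    b -= 0.1 * (A - Y).mean(axis=1, keepdims=True)

def predict_one(x, w, b):
    """Return a one-hot prediction for a single 1-D feature vector x."""
    p = softmax(w @ np.asarray(x).reshape(-1, 1) + b)
    out = np.zeros(p.shape[0], dtype=int)
    out[np.argmax(p)] = 1
    return out

print(predict_one([0, 0, 1, 1, 0], w, b))
```

The example is reshaped to (N_in, 1) so that it has the same layout as one column of the training matrix, and the argmax turns the softmax output into a one-hot vector.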

Regarding "python - Modifying a neural network to classify a single example", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47308181/
