Why does the cost in my implementation of a deep neural network increase after a few iterations?

I am a beginner in machine learning and neural networks. Recently, after watching Andrew Ng's lectures on deep learning, I tried to implement a binary classifier using deep neural networks on my own.

The cost function is expected to decrease after each iteration. In my program, however, it decreases slightly at the beginning and then increases rapidly. I tried changing the learning rate and the number of iterations, but to no avail. I am very confused.

Here is my code:



1. Neural network classifier class




import numpy as np

# The activation helpers (sigmoid, sigmoid_prime, relu, relu_prime) are defined in section 3 below.

class NeuralNetwork:
    def __init__(self, X, Y, dimensions, alpha=1.2, iter=3000):
        self.X = X
        self.Y = Y
        self.dimensions = dimensions  # Including input layer and output layer. Let example be dimensions=4
        self.alpha = alpha            # Learning rate
        self.iter = iter              # Number of iterations
        self.length = len(self.dimensions) - 1
        self.params = {}              # To store parameters W and b for each layer
        self.cache = {}               # To store cache Z and A for each layer
        self.grads = {}               # To store dA, dZ, dW, db
        self.cost = 1                 # Initial value does not matter

    def initialize(self):
        np.random.seed(3)
        # If dimensions is 4, then layers 0 and 3 are the input and output layers
        # So we only need to initialize W1, W2 and W3
        # There is no need of W0 for the input layer
        for l in range(1, len(self.dimensions)):
            self.params['W'+str(l)] = np.random.randn(self.dimensions[l], self.dimensions[l-1]) * 0.01
            self.params['b'+str(l)] = np.zeros((self.dimensions[l], 1))

    def forward_propagation(self):
        self.cache['A0'] = self.X
        # For the last layer, i.e. the output layer, activate with sigmoid
        # For the hidden layers, use relu
        for l in range(1, len(self.dimensions)-1):
            self.cache['Z'+str(l)] = np.dot(self.params['W'+str(l)], self.cache['A'+str(l-1)]) + self.params['b'+str(l)]
            self.cache['A'+str(l)] = relu(self.cache['Z'+str(l)])
        l = len(self.dimensions)-1
        self.cache['Z'+str(l)] = np.dot(self.params['W'+str(l)], self.cache['A'+str(l-1)]) + self.params['b'+str(l)]
        self.cache['A'+str(l)] = sigmoid(self.cache['Z'+str(l)])

    def compute_cost(self):
        m = self.Y.shape[1]
        A = self.cache['A'+str(len(self.dimensions)-1)]
        self.cost = -1/m * np.sum(np.multiply(self.Y, np.log(A)) + np.multiply(1-self.Y, np.log(1-A)))
        self.cost = np.squeeze(self.cost)

    def backward_propagation(self):
        A = self.cache['A'+str(len(self.dimensions)-1)]
        m = self.X.shape[1]
        self.grads['dA'+str(len(self.dimensions)-1)] = -(np.divide(self.Y, A) - np.divide(1-self.Y, 1-A))
        # Sigmoid derivative for the final layer
        l = len(self.dimensions)-1
        self.grads['dZ'+str(l)] = self.grads['dA'+str(l)] * sigmoid_prime(self.cache['Z'+str(l)])
        self.grads['dW'+str(l)] = 1/m * np.dot(self.grads['dZ'+str(l)], self.cache['A'+str(l-1)].T)
        self.grads['db'+str(l)] = 1/m * np.sum(self.grads['dZ'+str(l)], axis=1, keepdims=True)
        self.grads['dA'+str(l-1)] = np.dot(self.params['W'+str(l)].T, self.grads['dZ'+str(l)])
        # Relu derivative for the hidden layers
        for l in range(len(self.dimensions)-2, 0, -1):
            self.grads['dZ'+str(l)] = self.grads['dA'+str(l)] * relu_prime(self.cache['Z'+str(l)])
            self.grads['dW'+str(l)] = 1/m * np.dot(self.grads['dZ'+str(l)], self.cache['A'+str(l-1)].T)
            self.grads['db'+str(l)] = 1/m * np.sum(self.grads['dZ'+str(l)], axis=1, keepdims=True)
            self.grads['dA'+str(l-1)] = np.dot(self.params['W'+str(l)].T, self.grads['dZ'+str(l)])

    def update_parameters(self):
        for l in range(1, len(self.dimensions)):
            self.params['W'+str(l)] = self.params['W'+str(l)] - self.alpha * self.grads['dW'+str(l)]
            self.params['b'+str(l)] = self.params['b'+str(l)] - self.alpha * self.grads['db'+str(l)]

    def train(self):
        np.random.seed(1)
        self.initialize()
        for i in range(self.iter):
            # print(self.params)
            self.forward_propagation()
            self.compute_cost()
            self.backward_propagation()
            self.update_parameters()
            if i % 100 == 0:
                print('Cost after {} iterations is {}'.format(i, self.cost))





2. Testing code for an odd/even number classifier



import numpy as np
from main import NeuralNetwork
X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
Y = np.array([[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]])
clf = NeuralNetwork(X, Y, [1, 1, 1], alpha=0.003, iter=7000)
clf.train()





3. Helper Code




import math
import numpy as np

def sigmoid_scalar(x):
    return 1/(1+math.exp(-x))

def sigmoid_prime_scalar(x):
    return sigmoid_scalar(x)*(1-sigmoid_scalar(x))

def relu_scalar(x):
    if x > 0:
        return x
    else:
        return 0

def relu_prime_scalar(x):
    if x > 0:
        return 1
    else:
        return 0

sigmoid = np.vectorize(sigmoid_scalar)
sigmoid_prime = np.vectorize(sigmoid_prime_scalar)
relu = np.vectorize(relu_scalar)
relu_prime = np.vectorize(relu_prime_scalar)
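
As an aside, np.vectorize just loops over the array in Python, so the helpers above are slow on larger inputs. A purely NumPy version of the same four functions (an equivalent sketch, not part of the original post) could look like this:

import numpy as np

def sigmoid(Z):
    # Element-wise logistic function 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-Z))

def sigmoid_prime(Z):
    s = sigmoid(Z)
    return s * (1.0 - s)

def relu(Z):
    # Element-wise max(0, z)
    return np.maximum(0.0, Z)

def relu_prime(Z):
    # Derivative of relu: 1 where z > 0, else 0
    return (Z > 0).astype(float)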





Output


[Screenshot of the training log: the printed cost decreases slightly at first and then increases rapidly.]


More replies

Two things come to mind: 1. Have you tried a lower learning rate (e.g. 1e-5)? 2. Have you tried scaling your input? Maybe something as simple as X = X / 10 might be enough for your use case.
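
As a concrete illustration of this suggestion (hypothetical values, not from the original thread), the test script from section 2 could be changed like this:

import numpy as np
from main import NeuralNetwork

X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
Y = np.array([[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]])

X = X / 10.0  # Scale the inputs into [0.1, 1.0]
clf = NeuralNetwork(X, Y, [1, 1, 1], alpha=1e-5, iter=7000)  # Much lower learning rate
clf.train()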

I tried lowering the learning rate.

The cost decreases to a point and then becomes constant.

@rodrigo-silveira please read my comment.

@rodrigo-silveira I also scaled the input and tried again; the cost initially decreases from 0.69 to 0.58 and then becomes constant.

Recommended answers

I believe your cross-entropy derivative is wrong. Instead of this:




# WRONG!
self.grads['dA'+str(len(self.dimensions)-1)] = -(np.divide(self.Y, A) - np.divide(1-self.Y, A))


... do this:




# CORRECT
self.grads['dA'+str(len(self.dimensions)-1)] = np.divide(A - self.Y, (1 - A) * A)


See these lecture notes for the details. I think you meant formula (5), but forgot the 1-A term. In any case, use formula (6).
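
For reference, this is the standard derivative of the binary cross-entropy loss with a sigmoid output (a worked step, not quoted from the linked notes):

\[
\frac{\partial \mathcal{L}}{\partial A} = -\left(\frac{Y}{A} - \frac{1-Y}{1-A}\right) = \frac{A - Y}{A\,(1-A)},
\qquad
\frac{\partial \mathcal{L}}{\partial Z} = \frac{\partial \mathcal{L}}{\partial A}\,\sigma'(Z) = \frac{A - Y}{A\,(1-A)}\cdot A\,(1-A) = A - Y.
\]

Multiplying dA by sigmoid_prime(Z) in backward_propagation therefore reduces to the familiar dZ = A - Y for the output layer.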



I struggled for hours with the same cost-function problem in an IRIS dataset classification. For me the problem was the regularization parameter, which was way too high. After setting it to 0 and tuning a bit here and there, I got 100% accuracy.
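
For context, the code in the question has no regularization at all; the following is only a hypothetical sketch of the kind of L2-regularized cost this answer refers to, with lambd as the regularization strength:

import numpy as np

def compute_cost_l2(A, Y, params, num_layers, lambd):
    # Cross-entropy term, as in compute_cost from the question
    m = Y.shape[1]
    cross_entropy = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # L2 penalty over all weight matrices; a large lambd makes this term dominate
    l2 = lambd / (2 * m) * sum(np.sum(np.square(params['W' + str(l)]))
                               for l in range(1, num_layers + 1))
    return np.squeeze(cross_entropy + l2)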


More replies

I did correct it. It still isn't working. I mean, the gradient descent is working, but the results aren't satisfactory. The error decreases initially, but very slowly, and then becomes constant at a significantly high value. I tried different combinations of learning rate and number of iterations, but to no avail.

Mod 2 isn't easy for a neural network to learn, especially with 1 neuron in each layer. Your next step is to make it "wide", not only "deep".
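
As a concrete illustration of "wide" (hypothetical layer sizes, not taken from the thread), the constructor in section 2 could be given more units per hidden layer:

import numpy as np
from main import NeuralNetwork

X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]) / 10.0  # Scaled input, as discussed above
Y = np.array([[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]])

# 1 input unit, two hidden layers of 8 units each, 1 output unit
clf = NeuralNetwork(X, Y, [1, 8, 8, 1], alpha=0.1, iter=7000)
clf.train()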

Yes. Then I trained it on another example, a data set from the week 2 assignment of Andrew Ng's course. In the assignment, the cost began at 0.69 and got very near to 0. But with the same example, my cost only gets down to about 0.23. I used 1 hidden layer with 4 neurons.
