
Why does the cost in my implementation of a deep neural network increase after a few iterations?

Reposted | Author: bug小助手 | Updated: 2023-10-28 13:40:23

I am a beginner in machine learning and neural networks. Recently, after watching Andrew Ng's lectures on deep learning, I tried to implement a binary classifier using deep neural networks on my own.

The cost function is expected to decrease after each iteration.
In my program, however, it decreases slightly at the beginning and then increases rapidly. I tried changing the learning rate and the number of iterations, but to no avail. I am very confused.
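One possible contributor to a cost that suddenly shoots up (an assumption, not confirmed for this particular code): once a sigmoid output saturates to exactly 0.0 or 1.0 in floating point, `np.log` in the cross-entropy returns `-inf`, and the cost becomes `inf` or `nan`. A minimal sketch of a clipped cross-entropy follows; the function name `cross_entropy_cost` and the `eps` value are illustrative choices, not part of the question's code.

```python
import numpy as np

def cross_entropy_cost(A, Y, eps=1e-12):
    """Binary cross-entropy averaged over the m examples (columns).

    A and Y have shape (1, m). A is clipped into [eps, 1 - eps] so that a
    saturated sigmoid output (exactly 0.0 or 1.0) gives a large but finite
    cost instead of inf/nan. eps is an illustrative choice.
    """
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    return float(np.squeeze(cost))

Y = np.array([[1, 0]])
A = np.array([[1.0, 1.0]])  # the second prediction is confidently wrong

# The unclipped formula from the question: log(1 - A) is log(0) = -inf,
# and 0 * -inf is nan, so the whole cost becomes nan.
with np.errstate(divide="ignore", invalid="ignore"):
    unclipped = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / Y.shape[1]

print(unclipped)                 # nan
print(cross_entropy_cost(A, Y))  # large but finite
```

Clipping only keeps the number finite; if outputs saturate this early, the usual remedies are a smaller learning rate or scaled inputs.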

Here is my code:


1. Neural network classifier class


import numpy as np
# sigmoid, sigmoid_prime, relu and relu_prime come from the helper code in section 3;
# define them in this module or import them before using the class.


class NeuralNetwork:
    def __init__(self, X, Y, dimensions, alpha=1.2, iter=3000):
        self.X = X
        self.Y = Y
        self.dimensions = dimensions  # Layer sizes, including input and output layers (e.g. 4 entries)
        self.alpha = alpha  # Learning rate
        self.iter = iter  # Number of iterations
        self.length = len(self.dimensions)-1
        self.params = {}  # Parameters W and b for each layer
        self.cache = {}  # Cached Z and A for each layer
        self.grads = {}  # Gradients dA, dZ, dW, db
        self.cost = 1  # Initial value does not matter

    def initialize(self):
        # With 4 entries in dimensions, layers 0 and 3 are the input and output layers,
        # so only W1, W2 and W3 need to be initialized; the input layer has no W0
        for l in range(1, len(self.dimensions)):
            self.params['W'+str(l)] = np.random.randn(self.dimensions[l], self.dimensions[l-1])*0.01
            self.params['b'+str(l)] = np.zeros((self.dimensions[l], 1))

    def forward_propagation(self):
        self.cache['A0'] = self.X
        # Hidden layers (1 and 2 in the 4-entry example) use ReLU
        for l in range(1, len(self.dimensions)-1):
            self.cache['Z'+str(l)] = np.dot(self.params['W'+str(l)], self.cache['A'+str(l-1)]) + self.params['b'+str(l)]
            self.cache['A'+str(l)] = relu(self.cache['Z'+str(l)])
        # The last (output) layer, layer 3 in the example, uses sigmoid
        l = len(self.dimensions)-1
        self.cache['Z'+str(l)] = np.dot(self.params['W'+str(l)], self.cache['A'+str(l-1)]) + self.params['b'+str(l)]
        self.cache['A'+str(l)] = sigmoid(self.cache['Z'+str(l)])

    def compute_cost(self):
        # Binary cross-entropy averaged over the m examples (columns)
        m = self.Y.shape[1]
        A = self.cache['A'+str(len(self.dimensions)-1)]
        self.cost = -1/m*np.sum(np.multiply(self.Y, np.log(A)) + np.multiply(1-self.Y, np.log(1-A)))
        self.cost = np.squeeze(self.cost)

    def backward_propagation(self):
        A = self.cache['A' + str(len(self.dimensions) - 1)]
        m = self.X.shape[1]
        # Derivative of the cross-entropy cost with respect to the output activation
        self.grads['dA'+str(len(self.dimensions)-1)] = -(np.divide(self.Y, A) - np.divide(1-self.Y, 1-A))
        # Sigmoid derivative for the final layer
        l = len(self.dimensions)-1
        self.grads['dZ' + str(l)] = self.grads['dA' + str(l)] * sigmoid_prime(self.cache['Z' + str(l)])
        self.grads['dW' + str(l)] = 1 / m * np.dot(self.grads['dZ' + str(l)], self.cache['A' + str(l - 1)].T)
        self.grads['db' + str(l)] = 1 / m * np.sum(self.grads['dZ' + str(l)], axis=1, keepdims=True)
        self.grads['dA' + str(l - 1)] = np.dot(self.params['W' + str(l)].T, self.grads['dZ' + str(l)])
        # ReLU derivative for the previous (hidden) layers
        for l in range(len(self.dimensions)-2, 0, -1):
            self.grads['dZ'+str(l)] = self.grads['dA'+str(l)] * relu_prime(self.cache['Z'+str(l)])
            self.grads['dW'+str(l)] = 1/m*np.dot(self.grads['dZ'+str(l)], self.cache['A'+str(l-1)].T)
            self.grads['db'+str(l)] = 1/m*np.sum(self.grads['dZ'+str(l)], axis=1, keepdims=True)
            self.grads['dA'+str(l-1)] = np.dot(self.params['W'+str(l)].T, self.grads['dZ'+str(l)])

    def update_parameters(self):
        # One gradient descent step for every layer
        for l in range(1, len(self.dimensions)):
            self.params['W'+str(l)] = self.params['W'+str(l)] - self.alpha*self.grads['dW'+str(l)]
            self.params['b'+str(l)] = self.params['b'+str(l)] - self.alpha*self.grads['db'+str(l)]

    def train(self):
        for i in range(self.iter):
            self.forward_propagation()
            self.compute_cost()
            self.backward_propagation()
            self.update_parameters()
            if i % 100 == 0:
                print('Cost after {} iterations is {}'.format(i, self.cost))
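A standard way to tell whether backpropagation (rather than the learning rate) is at fault is gradient checking: compare the analytic gradients with central finite differences of the cost. The sketch below is self-contained and only mirrors the question's layout (examples as columns, a ReLU hidden layer, a sigmoid output, cross-entropy cost); it is not the asker's class, and the names `forward`, `backward`, `cost` and `gradient_check` are illustrative. It uses `dZ = A - Y` for the output layer, which is what `dA * sigmoid_prime(Z)` reduces to when `dA` is correct.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(params, X):
    """One ReLU hidden layer followed by a sigmoid output layer."""
    Z1 = params["W1"] @ X + params["b1"]
    A1 = relu(Z1)
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    return Z1, A1, A2

def cost(params, X, Y):
    """Binary cross-entropy averaged over the columns of X."""
    _, _, A2 = forward(params, X)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / Y.shape[1]

def backward(params, X, Y):
    """Analytic gradients; dZ2 = A2 - Y for sigmoid + cross-entropy."""
    m = X.shape[1]
    Z1, A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    grads = {"W2": dZ2 @ A1.T / m, "b2": np.sum(dZ2, axis=1, keepdims=True) / m}
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)  # ReLU derivative
    grads["W1"] = dZ1 @ X.T / m
    grads["b1"] = np.sum(dZ1, axis=1, keepdims=True) / m
    return grads

def gradient_check(params, X, Y, eps=1e-6):
    """Relative difference between analytic and finite-difference gradients."""
    grads = backward(params, X, Y)
    numeric, analytic = [], []
    for name, P in params.items():
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + eps
            plus = cost(params, X, Y)
            P[idx] = old - eps
            minus = cost(params, X, Y)
            P[idx] = old  # restore the parameter
            numeric.append((plus - minus) / (2 * eps))
            analytic.append(grads[name][idx])
    numeric, analytic = np.array(numeric), np.array(analytic)
    diff = np.linalg.norm(numeric - analytic)
    return diff / (np.linalg.norm(numeric) + np.linalg.norm(analytic))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
params = {
    "W1": rng.standard_normal((3, 2)), "b1": rng.standard_normal((3, 1)),
    "W2": rng.standard_normal((1, 3)), "b2": rng.standard_normal((1, 1)),
}
print(gradient_check(params, X, Y))  # a very small number if backward() is correct
```

If a hand-written backward pass is wrong, this ratio is typically far larger, which separates a gradient bug from a tuning problem.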

2. Test code for an odd/even number classifier


import numpy as np
from main import NeuralNetwork  # the class above, saved as main.py

X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
Y = np.array([[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]])  # 1 for odd, 0 for even
clf = NeuralNetwork(X, Y, [1, 1, 1], alpha=0.003, iter=7000)
clf.initialize()
clf.train()

3. Helper code


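import math
import numpy as np

def sigmoid_scalar(x):
    # Note: math.exp raises OverflowError when x is below about -709
    return 1/(1+math.exp(-x))
def sigmoid_prime_scalar(x):
    return sigmoid_scalar(x)*(1-sigmoid_scalar(x))
def relu_scalar(x):
    if x > 0:
        return x
    return 0.0
def relu_prime_scalar(x):
    if x > 0:
        return 1.0
    return 0.0
# otypes=[float] matters: without it, np.vectorize infers the output dtype from the
# first element, so relu(np.array([-1.0, 2.7])) would come back as integers (0 and 2)
sigmoid = np.vectorize(sigmoid_scalar, otypes=[float])
sigmoid_prime = np.vectorize(sigmoid_prime_scalar, otypes=[float])
relu = np.vectorize(relu_scalar, otypes=[float])
relu_prime = np.vectorize(relu_prime_scalar, otypes=[float])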
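The helpers above wrap scalar functions in `np.vectorize`, which NumPy documents as a convenience loop rather than a performance tool, and which infers its output dtype from the first element unless `otypes` is given. `sigmoid_scalar` also relies on `math.exp`, which raises `OverflowError` for large arguments, so `sigmoid_scalar(-1000)` fails. A minimal sketch of drop-in NumPy replacements, assuming the same names and the same derivative convention (0 at z == 0 for ReLU):

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function, applied element-wise."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so this branch cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)), where exp(z) < 1.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    # 0 at z == 0, matching relu_prime_scalar in the question.
    return (np.asarray(z) > 0).astype(float)

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # no OverflowError; values 0, 0.5 and 1
```

Because these operate on whole arrays, they keep the (n, m) shapes used by the class unchanged.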





Two things come to mind: 1. Have you tried a lower learning rate (1E-5)? 2. Have you tried scaling your input? Something as simple as X = X / 10 might be enough for your use case.
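Both suggestions are small changes: the learning rate is just the `alpha` argument, and scaling is a transform applied to `X` before constructing the classifier. A sketch of two common ways to rescale the question's input; the division by 10 is the comment's suggestion, while standardization is an alternative added here for comparison:

```python
import numpy as np

X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], dtype=float)

# Option 1: the simple rescaling suggested in the comment (values now in [0.1, 1]).
X_div = X / 10

# Option 2: standardize each feature (row) to zero mean and unit variance.
# Keep mu and sigma so exactly the same transform can be applied to new inputs.
mu = X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)
X_std = (X - mu) / sigma
```

Either scaled array keeps the (1, 10) shape, so it can be passed to `NeuralNetwork` in place of `X`.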


I tried lowering the learning rate.


The cost decreases to a point and then becomes constant.


@rodrigo-silveira please read my comment


@rodrigo-silveira I also scaled the input and tried again, it initially decreases from 0.69 to 0.58 and then becomes constant



I believe your cross-entropy derivative is wrong. Instead of this:


self.grads['dA'+str(len(self.dimensions)-1)] = -(np.divide(self.Y, A) - np.divide(1-self.Y, A))

... do this:


self.grads['dA'+str(len(self.dimensions)-1)] = np.divide(A - self.Y, (1 - A) * A)

See these lecture notes for the details. I think you meant formula (5) but forgot the 1-A in the second denominator. Either way, use formula (6).
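A quick numerical check of this answer, assuming the sigmoid output layer from the question: the corrected `dA` and the answer's form are algebraically identical, and multiplying either by the sigmoid derivative A(1 - A) gives the familiar dZ = A - Y. The random `Z` values are arbitrary test inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((1, 6))
Y = np.array([[1, 0, 1, 0, 1, 0]], dtype=float)
A = 1.0 / (1.0 + np.exp(-Z))  # sigmoid output
sig_prime = A * (1 - A)       # derivative of the sigmoid at Z

dA_correct = -(Y / A - (1 - Y) / (1 - A))  # derivative of the cross-entropy w.r.t. A
dA_answer = (A - Y) / ((1 - A) * A)        # the answer's form
dA_buggy = -(Y / A - (1 - Y) / A)          # the original line, 1 - A missing

print(np.allclose(dA_correct, dA_answer))          # True
print(np.allclose(dA_correct * sig_prime, A - Y))  # True: dZ = A - Y
print(np.allclose(dA_buggy * sig_prime, A - Y))    # False
```

Working through the algebra, the buggy version still gives A - Y for examples with Y = 1 but gives 1 - A instead of A for Y = 0, so half of this dataset received the wrong gradient.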


I struggled for hours with the same cost-function problem on the IRIS dataset classification. For me, the problem was a regularization parameter that was way too high. After setting it to 0 and tuning a bit here and there, I got 100% accuracy.
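The question's code has no regularization term, so this only applies to setups like the commenter's. A sketch of the common L2 (weight decay) form shows why an oversized parameter hurts: the penalty can dominate the cost, and its gradient keeps pulling the weights toward zero. The λ/(2m) scaling is one common convention, and the function names are illustrative.

```python
import numpy as np

def l2_regularized_cost(cross_entropy, weights, lambd, m):
    """Cross-entropy plus the L2 penalty (lambd / (2 m)) * sum of squared weights."""
    penalty = sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + lambd / (2 * m) * penalty

def l2_regularized_grad(dW, W, lambd, m):
    """Backprop gradient for one weight matrix plus the L2 term (lambd / m) * W."""
    return dW + lambd / m * W

weights = [np.ones((4, 1))]  # made-up weights for illustration
print(l2_regularized_cost(0.3, weights, lambd=0.0, m=10))    # 0.3
print(l2_regularized_cost(0.3, weights, lambd=100.0, m=10))  # about 20.3: penalty dominates
```

With lambd = 100 the penalty term is roughly 67 times the data term here, so gradient descent mostly shrinks the weights instead of fitting the data.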



I did correct it, but it still isn't working. Gradient descent runs, but the results aren't satisfactory: the error decreases very slowly at first and then levels off at a significantly high value. I tried different combinations of learning rate and number of iterations, but to no avail.


Mod 2 isn't easy for an NN to learn, especially with 1 neuron in each layer. Your next step is to make it "wide", not only "deep".
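One way to see why width matters here: with dimensions [1, 1, 1], every layer maps a scalar to a scalar through an affine map followed by ReLU or sigmoid. All of these are monotone, and a composition of monotone functions is monotone, so the network's output can only rise or only fall as x goes from 1 to 10, whatever the weights. Alternating odd/even labels are therefore out of reach. The sketch below checks this for random weights and contrasts it with a hand-picked (hypothetical) two-unit hidden layer whose output rises and then falls. It does not show that a wider network will learn mod 2, only that it is no longer restricted to monotone outputs.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def narrow_net(x, w1, b1, w2, b2):
    """Output of a [1, 1, 1] network: sigmoid(w2 * relu(w1 * x + b1) + b2)."""
    return sigmoid(w2 * relu(w1 * x + b1) + b2)

def wide_net(x):
    """A [1, 2, 1] network with hand-picked weights (illustrative only)."""
    hidden = np.stack([relu(x), relu(x - 5.0)])  # two hidden ReLU units
    return sigmoid(np.array([1.0, -2.0]) @ hidden - 3.0)

def is_monotone(values):
    d = np.diff(values)
    return bool(np.all(d >= 0) or np.all(d <= 0))

x = np.arange(1, 11, dtype=float)
rng = np.random.default_rng(0)
all_monotone = all(
    is_monotone(narrow_net(x, *rng.normal(scale=3.0, size=4)))
    for _ in range(1000)
)
print(all_monotone)            # True for every sampled weight setting
print(is_monotone(wide_net(x)))  # False: rises up to x = 5, then falls
```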


Yes. I then trained it on another example, the dataset from the week 2 assignment of Andrew Ng's course. In the assignment, the cost started at 0.69 and got very close to 0, but on the same dataset my cost only gets down to about 0.23. I used 1 hidden layer with 4 neurons.


As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

