
java - Modifying a perceptron into gradient descent


According to this video, the substantive difference between the perceptron and the gradient descent algorithm is very small. They specify it as essentially:

Perceptron: Δwᵢ = η(y − ŷ)xᵢ

Gradient descent: Δwᵢ = η(y − α)xᵢ

where ŷ is the thresholded prediction and α is the raw, unthresholded activation (the weighted sum before the threshold is applied).
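To make the difference concrete, here is a minimal sketch of one weight update under each rule; updateOnce, w, x, y, and eta are all assumed names for illustration, not part of the question's code:

// One training step under each rule; the only difference is which
// prediction feeds the (y - prediction) * x_i error term.
static void updateOnce( double[] w, double[] x, double y, double eta, boolean usePerceptronRule )
{
    // raw activation: alpha = w . x
    double alpha = 0.0;
    for (int i = 0; i < w.length; i++)
        alpha += w[i] * x[i];

    // perceptron rule thresholds first (y_hat); gradient descent uses alpha directly
    double prediction = usePerceptronRule ? ((alpha >= 0) ? 1.0 : 0.0) : alpha;

    for (int i = 0; i < w.length; i++)
        w[i] += eta * (y - prediction) * x[i];
}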

I have implemented a working version of the perceptron algorithm, but I don't understand which parts I need to change to turn it into gradient descent.

Below is the core of my perceptron code; I assume these are the components I need to modify. But where? What do I need to change? I don't understand.

(This is left in for pedagogical reasons. I have since figured this part out, but I'm still confused about the gradient; see the UPDATE below.)

iteration = 0;
do
{
    iteration++;
    globalError = 0;
    // loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // calculate predicted class
        output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
        // difference between predicted and actual class values
        localError = outputs__train[p] - output;
        // update weights and bias
        for (int i = 0; i < globo_dict_size; i++)
        {
            weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );
        }
        weights[ globo_dict_size ] += ( LEARNING_RATE * localError );

        // summation of squared error (error value for all instances)
        globalError += (localError * localError);
    }

    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError / number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError / number_of_files__train ) );
}
while (globalError != 0 && iteration <= MAX_ITER);

This is the crux of my perceptron:

static int calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
    double sum = 0;

    for (int i = 0; i < globo_dict_size; i++)
    {
        sum += ( weights[i] * feature_matrix[file_index][i] );
    }
    // bias
    sum += weights[ globo_dict_size ];

    return (sum >= theta) ? 1 : 0;
}

Is it just that I replace my calculateOutput method with something like this?

public static double[] gradientDescent( final double[] theta_in, final double alpha, final int num_iters, double[][] data )
{
    final double m = data.length;
    double[] theta = theta_in;
    double theta0 = 0;
    double theta1 = 0;
    for (int i = 0; i < num_iters; i++)
    {
        final double sum0 = gradientDescentSumScalar0( theta, alpha, data );
        final double sum1 = gradientDescentSumScalar1( theta, alpha, data );
        theta0 = theta[0] - ( (alpha / m) * sum0 );
        theta1 = theta[1] - ( (alpha / m) * sum1 );
        theta = new double[] { theta0, theta1 };
    }
    return theta;
}

UPDATE


At this point I think I'm very close.

I understand how to calculate the hypothesis, and I think I've done that correctly, but nevertheless this code is still badly broken. I'm fairly sure it is related to my calculation of the gradient. When I run it, the error fluctuates wildly, then goes to infinity, then to NaN.

double cost, error, hypothesis;
double[] gradient;
int p, iteration;

iteration = 0;
do
{
    iteration++;
    error = 0.0;
    cost = 0.0;

    // loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // 1. Calculate the hypothesis h = X * theta
        hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );

        // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
        cost = hypothesis - outputs__train[p];

        // 3. Calculate the gradient = X' * loss / m
        gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, cost, number_of_files__train );

        // 4. Update the parameters theta = theta - alpha * gradient
        for (int i = 0; i < globo_dict_size; i++)
        {
            theta[i] = theta[i] - LEARNING_RATE * gradient[i];
        }

        // summation of squared error (error value for all instances)
        error += (cost * cost);
    }

    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( error / number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( error / number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );
}
while (cost != 0 && iteration <= MAX_ITER);

static double calculateHypothesis( double[] theta, double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;

    for (int i = 0; i < globo_dict_size; i++)
    {
        hypothesis += ( theta[i] * feature_matrix[file_index][i] );
    }
    // bias
    hypothesis += theta[ globo_dict_size ];

    return hypothesis;
}

static double[] calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double cost, int number_of_files__train )
{
    double m = number_of_files__train;

    double[] gradient = new double[ globo_dict_size ]; // one for bias?

    for (int i = 0; i < gradient.length; i++)
    {
        gradient[i] = (1.0 / m) * cost * feature_matrix[ file_index ][ i ];
    }

    return gradient;
}
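Two things in this version are worth flagging. First, gradient has only globo_dict_size entries and the theta update loop stops at i < globo_dict_size, so the bias theta[globo_dict_size] that calculateHypothesis adds in is never adjusted. Second, an error that swings wildly and then reaches infinity and NaN is the classic symptom of a learning rate that is too large for the data. Below is a minimal sketch of a per-example gradient that also covers the bias; calculateGradientWithBias is a hypothetical name, and the 1/m scaling is kept from the original even though a pure per-example (stochastic) step would usually drop it:

static double[] calculateGradientWithBias( double[][] feature_matrix, int file_index,
                                           int globo_dict_size, double cost, int number_of_files__train )
{
    double m = number_of_files__train;

    // one slot per feature, plus one for the bias term
    double[] gradient = new double[ globo_dict_size + 1 ];

    for (int i = 0; i < globo_dict_size; i++)
    {
        gradient[i] = (1.0 / m) * cost * feature_matrix[ file_index ][ i ];
    }
    // the bias behaves like a feature whose value is always 1
    gradient[ globo_dict_size ] = (1.0 / m) * cost;

    return gradient;
}

With this shape, the parameter update loop would also have to run one slot further, i.e. for (int i = 0; i <= globo_dict_size; i++).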

Best Answer

The perceptron rule is only an approximation of gradient descent when you have a non-differentiable activation function like (sum >= theta) ? 1 : 0. As they ask at the end of the video, you cannot use gradients there, because this threshold function is not differentiable (well, its gradient is not defined for x = 0, and the gradient is zero everywhere else). If instead of this threshold you had a smooth function like the sigmoid, you could compute the actual gradient.

In that case your weight update would be LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]. For the sigmoid case, the link I sent you also shows how to compute the output_gradient.

In summary, to go from a perceptron to gradient descent you have to:

1. Use an activation function whose derivative (gradient) is not zero everywhere.
2. Apply the chain rule to derive the new update rule (see the sketch below).
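As a rough illustration of both steps, here is a minimal sketch of a sigmoid unit and the chain-rule update described above; the derivative output * (1 - output) is the standard sigmoid result, and the method and parameter names are assumptions for illustration:

// A smooth activation in place of the hard (sum >= theta) ? 1 : 0 threshold
static double sigmoid( double sum )
{
    return 1.0 / (1.0 + Math.exp( -sum ));
}

// One chain-rule weight update for training instance p
static void updateWeights( double[] weights, double[][] x, double[] y,
                           int p, int numFeatures, double learningRate )
{
    // forward pass: weighted sum plus bias, squashed by the sigmoid
    double sum = weights[ numFeatures ];                  // bias term
    for (int i = 0; i < numFeatures; i++)
        sum += weights[i] * x[p][i];
    double output = sigmoid( sum );                       // smooth prediction in (0, 1)

    double localError     = y[p] - output;                // y - y_hat
    double outputGradient = output * (1.0 - output);      // d(sigmoid)/d(sum)

    // LEARNING_RATE * localError * x_i * output_gradient, as in the answer
    for (int i = 0; i < numFeatures; i++)
        weights[i] += learningRate * localError * x[p][i] * outputGradient;
    weights[ numFeatures ] += learningRate * localError * outputGradient;  // bias input is 1
}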

Regarding "java - Modifying a perceptron into gradient descent", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28913062/
