
java - Why does Alpha/Beta pruning have no effect on my MiniMax algorithm?


First of all, I'm sorry the title is slightly inaccurate; I just didn't want it to be 30 words long. The alpha/beta pruning I implemented drastically reduced the number of evaluations when I applied it to my TicTacToe game, see below.

Each pair of evaluation counts was measured with the same game state as input.

[Image: evaluation counts, pruning vs. no pruning]

The problem appeared when I tried to apply pruning to the checkers implementation I have been working on, which is played by a neural network. That is the goal of the whole thing; I only built the TicTacToe game to experiment with MiniMax + Alpha/Beta, since I had never dealt with these algorithms before.

Here is the same kind of experiment run with the neural network.

[Images: checkers minimax evaluation counts, without and with pruning]

Now for the code (the checkers version; let me know if you want to see the TicTacToe one, although they are nearly identical).

I'll paste the start of the two methods only once, since it is exactly the same, and I'll show both signatures because they differ slightly.

A small note to make the code clearer:

Board is the object that keeps track of the pieces, the available moves, whose turn it is, whether the game has been won/drawn, etc.

Move is the object that contains all information pertinent to a move; when I create the clone on the first line of each method, I simply copy the given board, and the constructor applies the given move to it.
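
For context, here is a minimal sketch of the API the snippets below assume. The question never shows these classes, so everything beyond the members the code actually calls is a guess.

import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the question's classes; only the members
// used by the snippets below are sketched, with placeholder bodies.
enum Color { WHITE, BLACK }

class Move { /* all information pertinent to one move */ }

class Board {
    boolean isGameOver()  { /* ... */ return false; }        // won or drawn?
    Color getLoser()      { /* ... */ return null; }          // null means a draw
    Color getTurn()       { /* ... */ return Color.WHITE; }   // side to move
    List<Move> getMoves() { /* ... */ return new ArrayList<>(); } // legal moves
}

// A copy of a board with one move applied to it
class Testboard extends Board {
    Testboard(Board original, Move move) { /* apply move to a copy of original */ }
}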

private double miniMax(Board b, Move m, int depth) {

private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {

The start of both methods:

Testboard clone = new Testboard(b, m);
// Making a clone of the board in order to
// avoid making changes to the original one

if (clone.isGameOver()) {

    if (clone.getLoser() == null)
        // It's a draw, evaluation = 0
        return 0;

    if (clone.getLoser() == Color.BLACK)
        // White (Max) won, evaluation = 1
        return 1;

    // Black (Min) won, evaluation = -1
    return -1;
}

if (depth == 0)
    // Reached the end of the search, returning the current evaluation of the board
    return getEvaluation(clone);

The regular MiniMax continuation:

// If it's not game over
if (clone.getTurn() == Color.WHITE) {

    // It's white's turn (Max player)
    double max = -1;
    for (Move move : clone.getMoves()) {
        // For each child node (available move),
        // its minimax value is calculated
        double score = miniMax(clone, move, depth - 1);
        // Only the highest score is stored
        if (score > max)
            max = score;
    }
    // And is returned
    return max;
}

// It's black's turn (Min player)
double min = 1;
for (Move move : clone.getMoves()) {
    // For each child node (available move),
    // its minimax value is calculated
    double score = miniMax(clone, move, depth - 1);
    // Only the lowest score is stored
    if (score < min)
        min = score;
}
// And is returned
return min;
}
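
For a rough sense of scale (standard minimax analysis, not measurements from the question): with branching factor b and search depth d, plain minimax evaluates on the order of b^d leaves, while alpha/beta with perfect move ordering needs only about b^(d/2). For example, with b = 8 and d = 6 that is 262,144 leaves versus roughly 1,000, which is the kind of reduction the TicTacToe numbers above show.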

The MiniMax with Alpha/Beta pruning continuation:

// If it's not game over
if (clone.getTurn() == Color.WHITE) {

    // It's white's turn (Max player)
    for (Move move : clone.getMoves()) {

        // For each child node (available move),
        // its minimax value is calculated
        double score = alphaBeta(clone, move, depth - 1, alpha, beta);

        if (score > alpha)
            // If this score is greater than alpha,
            // it becomes the new highest score
            alpha = score;
        if (alpha >= beta)
            // The loop is cut short as soon as alpha equals or exceeds beta
            break;
    }
    // The alpha value is returned
    return alpha;
}

// It's black's turn (Min player)
for (Move move : clone.getMoves()) {

    // For each child node (available move),
    // its minimax value is calculated
    double score = alphaBeta(clone, move, depth - 1, alpha, beta);

    if (score < beta)
        // If this score is lower than beta,
        // it becomes the new lowest score
        beta = score;
    if (alpha >= beta)
        // The loop is cut short as soon as alpha equals or exceeds beta
        break;
}
// The beta value is returned
return beta;
}
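
The question never shows the root call. A plausible sketch, assuming white (the maximizing player) is to move, with board as the current position and MAX_DEPTH as a hypothetical depth limit, would seed the window with the full evaluation range [-1, 1]:

// Hypothetical root call (not shown in the question): pick the move whose
// subtree has the best alpha/beta value, starting from the widest window
Move best = null;
double bestScore = Double.NEGATIVE_INFINITY;
for (Move move : board.getMoves()) {
    double score = alphaBeta(board, move, MAX_DEPTH - 1, -1, 1);
    if (score > bestScore) {
        bestScore = score;
        best = move;
    }
}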

Honestly, I'm stuck and not sure what I can do to figure out what's going on. I've tried MiniMax + A/B with several different and even randomly generated neural networks, but I've never seen any improvement in the number of evaluations. I hope someone here can shed some light on this situation, thanks!

Best Answer

Thanks to @maraca for helping me figure this out; I'm answering my own question since he only replied with a comment.

There was nothing wrong with the code I posted; the problem was the evaluation function I used once the search reached the depth limit.

I was using a still-untrained neural network that essentially just spat out random values. Those inconsistent answers forced MiniMax + A/B to traverse all of the nodes; consistent evaluations, it turns out, are necessary for pruning to happen.
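
To illustrate why, here is a self-contained toy sketch (my own construction, not the code from the question): it runs the same fail-hard alpha/beta scheme as above over a uniform game tree, once with a coarse, consistent evaluation like TicTacToe's -1/0/1 terminal scores and once with uniformly random values like an untrained network, and counts the leaf evaluations. The tree shape, the seed and all of the names are made up for the demonstration; the random evaluator typically allows far fewer cutoffs.

import java.util.Random;

public class PruningDemo {

    interface Eval { double value(long leaf); }

    static long evals; // leaf evaluations performed so far

    // Fail-hard alpha/beta over a uniform tree of the given depth and
    // branching factor; leaves are identified by a long so an Eval can score them
    static double alphaBeta(long node, int depth, int branching,
                            boolean maxTurn, double alpha, double beta, Eval eval) {
        if (depth == 0) {
            evals++;
            return eval.value(node);
        }
        for (int i = 0; i < branching; i++) {
            double score = alphaBeta(node * branching + i, depth - 1, branching,
                                     !maxTurn, alpha, beta, eval);
            if (maxTurn && score > alpha) alpha = score;
            if (!maxTurn && score < beta) beta = score;
            if (alpha >= beta) break; // cutoff, same condition as above
        }
        return maxTurn ? alpha : beta;
    }

    static long countEvals(Eval eval) {
        evals = 0;
        // Depth 8, branching 4: a full traversal would evaluate 4^8 = 65536 leaves
        alphaBeta(1, 8, 4, true, -1, 1, eval);
        return evals;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // Coarse, consistent scores (-1, 0 or 1): values hit the window bounds
        // often, so cutoffs fire early and frequently
        System.out.println("discrete eval: " + countEvals(leaf -> (leaf % 3) - 1.0));
        // Random scores, like an untrained network: values rarely line up,
        // so far fewer branches are cut
        System.out.println("random eval:   " + countEvals(leaf -> random.nextDouble() * 2 - 1));
    }
}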

The original question, "java - Why does Alpha/Beta pruning have no effect on my MiniMax algorithm?", can be found on Stack Overflow: https://stackoverflow.com/questions/42988523/
