gpt4 book ai didi

c# - 蒙特卡洛树搜索 : Implementation for Tic-Tac-Toe

转载 作者:IT王子 更新时间:2023-10-29 04:50:00 25 4
gpt4 key购买 nike

编辑:如果您想看看是否能让 AI 表现得更好,请上传完整的源代码:https://www.dropbox.com/s/ous72hidygbnqv6/MCTS_TTT.rar

编辑:搜索搜索空间并找到导致损失的移动。但是由于 UCT 算法,导致损失的移动并不经常被访问。

为了了解 MCTS(蒙特卡洛树搜索),我使用该算法为经典的井字游戏制作了 AI。我使用以下设计实现了该算法:

MCTS stages树策略基于 UCT,默认策略是执行随机移动直到游戏结束。我在实现过程中观察到,计算机有时会做出错误的举动,因为它无法“看到”特定的举动会直接导致损失。

例如: Tic Tac Toe example请注意行动 6(红色方 block )的值(value)如何略高于蓝色方 block ,因此计算机标记了这个位置。我认为这是因为游戏策略是基于随机移动的,因此人类很有可能不会在蓝色框中输入“2”。如果玩家没有在蓝色框中输入 2,则计算机将获胜。

我的问题

1) 这是 MCTS 的已知问题还是实现失败的结果?

2) 可能的解决方案是什么?我正在考虑在选择阶段限制移动,但我不确定 :-)

核心MCTS的代码:

    //THE EXECUTING FUNCTION
public unsafe byte GetBestMove(Game game, int player, TreeView tv)
{

//Setup root and initial variables
Node root = new Node(null, 0, Opponent(player));
int startPlayer = player;

helper.CopyBytes(root.state, game.board);

//four phases: descent, roll-out, update and growth done iteratively X times
//-----------------------------------------------------------------------------------------------------
for (int iteration = 0; iteration < 1000; iteration++)
{
Node current = Selection(root, game);
int value = Rollout(current, game, startPlayer);
Update(current, value);
}

//Restore game state and return move with highest value
helper.CopyBytes(game.board, root.state);

//Draw tree
DrawTree(tv, root);

//return root.children.Aggregate((i1, i2) => i1.visits > i2.visits ? i1 : i2).action;
return BestChildUCB(root, 0).action;
}

//#1. Select a node if 1: we have more valid feasible moves or 2: it is terminal
public Node Selection(Node current, Game game)
{
while (!game.IsTerminal(current.state))
{
List<byte> validMoves = game.GetValidMoves(current.state);

if (validMoves.Count > current.children.Count)
return Expand(current, game);
else
current = BestChildUCB(current, 1.44);
}

return current;
}

//#1. Helper
public Node BestChildUCB(Node current, double C)
{
Node bestChild = null;
double best = double.NegativeInfinity;

foreach (Node child in current.children)
{
double UCB1 = ((double)child.value / (double)child.visits) + C * Math.Sqrt((2.0 * Math.Log((double)current.visits)) / (double)child.visits);

if (UCB1 > best)
{
bestChild = child;
best = UCB1;
}
}

return bestChild;
}

//#2. Expand a node by creating a new move and returning the node
public Node Expand(Node current, Game game)
{
//Copy current state to the game
helper.CopyBytes(game.board, current.state);

List<byte> validMoves = game.GetValidMoves(current.state);

for (int i = 0; i < validMoves.Count; i++)
{
//We already have evaluated this move
if (current.children.Exists(a => a.action == validMoves[i]))
continue;

int playerActing = Opponent(current.PlayerTookAction);

Node node = new Node(current, validMoves[i], playerActing);
current.children.Add(node);

//Do the move in the game and save it to the child node
game.Mark(playerActing, validMoves[i]);
helper.CopyBytes(node.state, game.board);

//Return to the previous game state
helper.CopyBytes(game.board, current.state);

return node;
}

throw new Exception("Error");
}

//#3. Roll-out. Simulate a game with a given policy and return the value
public int Rollout(Node current, Game game, int startPlayer)
{
Random r = new Random(1337);
helper.CopyBytes(game.board, current.state);
int player = Opponent(current.PlayerTookAction);

//Do the policy until a winner is found for the first (change?) node added
while (game.GetWinner() == 0)
{
//Random
List<byte> moves = game.GetValidMoves();
byte move = moves[r.Next(0, moves.Count)];
game.Mark(player, move);
player = Opponent(player);
}

if (game.GetWinner() == startPlayer)
return 1;

return 0;
}

//#4. Update
public unsafe void Update(Node current, int value)
{
do
{
current.visits++;
current.value += value;
current = current.parent;
}
while (current != null);
}

最佳答案

我认为您的回答不应标记为已接受。对于 Tic-Tac-Toe,搜索空间相对较小,应该在合理的迭代次数内找到最佳 Action 。

看起来您的更新函数(反向传播)向不同树级别的节点添加了相同数量的奖励。这是不正确的,因为当前玩家在不同的树级别上是不同的。

我建议您从这个示例中看一下 UCT 方法中的反向传播: http://mcts.ai/code/python.html

您应该根据先前玩家在特定级别(示例中的 node.playerJustMoved)计算的奖励来更新节点的总奖励。

关于c# - 蒙特卡洛树搜索 : Implementation for Tic-Tac-Toe,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/23803186/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com