c# - Monte Carlo Tree Search: Implementation for Tic-Tac-Toe

Reposted by: IT王子. Updated: 2023-10-29 04:50:00

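Edit: I have uploaded the full source code in case you want to see whether you can get the AI to perform better: https://www.dropbox.com/s/ous72hidygbnqv6/MCTS_TTT.rar

Edit: The search space is searched, and moves that result in a loss are found. However, because of the UCT algorithm, moves that result in a loss are not visited very often.

To learn about MCTS (Monte Carlo Tree Search), I used the algorithm to build an AI for the classic game of tic-tac-toe. I implemented the algorithm using the following design: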

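[Figure: MCTS stages]

The tree policy is based on UCT, and the default policy is to make random moves until the game ends. With my implementation I have observed that the computer sometimes makes erroneous moves, because it fails to "see" that a particular move leads directly to a loss.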

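For example:

[Figure: Tic Tac Toe example]

Notice how action 6 (red square) is valued slightly higher than the blue square, so the computer marks that spot. I think this is because the game policy is based on random moves, so there is a good chance that the human will not put a "2" in the blue square. If the player does not put a 2 in the blue square, the computer wins.

My questions

1) Is this a known issue with MCTS, or is it the result of a faulty implementation?

2) What are possible solutions? I am thinking about restricting the moves in the selection phase, but I am not sure :-)

The code for the core MCTS: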

    //THE EXECUTING FUNCTION
    public unsafe byte GetBestMove(Game game, int player, TreeView tv)
    {

        //Setup root and initial variables
        Node root = new Node(null, 0, Opponent(player));
        int startPlayer = player;

        helper.CopyBytes(root.state, game.board);

        //four phases: descent, roll-out, update and growth done iteratively X times
        //-----------------------------------------------------------------------------------------------------
        for (int iteration = 0; iteration < 1000; iteration++)
        {
            Node current = Selection(root, game);
            int value = Rollout(current, game, startPlayer);
            Update(current, value);
        }

        //Restore game state and return move with highest value
        helper.CopyBytes(game.board, root.state);

        //Draw tree
        DrawTree(tv, root);

        //return root.children.Aggregate((i1, i2) => i1.visits > i2.visits ? i1 : i2).action;
        return BestChildUCB(root, 0).action;
    }

    //#1. Select a node if 1: we have more valid feasible moves or 2: it is terminal 
    public Node Selection(Node current, Game game)
    {
        while (!game.IsTerminal(current.state))
        {
            List<byte> validMoves = game.GetValidMoves(current.state);

            if (validMoves.Count > current.children.Count)
                return Expand(current, game);
            else
                current = BestChildUCB(current, 1.44);
        }

        return current;
    }

    //#1. Helper
    public Node BestChildUCB(Node current, double C)
    {
        Node bestChild = null;
        double best = double.NegativeInfinity;

        foreach (Node child in current.children)
        {
            double UCB1 = ((double)child.value / (double)child.visits) + C * Math.Sqrt((2.0 * Math.Log((double)current.visits)) / (double)child.visits);

            if (UCB1 > best)
            {
                bestChild = child;
                best = UCB1;
            }
        }

        return bestChild;
    }

    //#2. Expand a node by creating a new move and returning the node
    public Node Expand(Node current, Game game)
    {
        //Copy current state to the game
        helper.CopyBytes(game.board, current.state);

        List<byte> validMoves = game.GetValidMoves(current.state);

        for (int i = 0; i < validMoves.Count; i++)
        {
            //We already have evaluated this move
            if (current.children.Exists(a => a.action == validMoves[i]))
                continue;

            int playerActing = Opponent(current.PlayerTookAction);

            Node node = new Node(current, validMoves[i], playerActing);
            current.children.Add(node);

            //Do the move in the game and save it to the child node
            game.Mark(playerActing, validMoves[i]);
            helper.CopyBytes(node.state, game.board);

            //Return to the previous game state
            helper.CopyBytes(game.board, current.state);

            return node;
        }

        throw new Exception("Error");
    }

    //#3. Roll-out. Simulate a game with a given policy and return the value
    public int Rollout(Node current, Game game, int startPlayer)
    {
        Random r = new Random(1337);
        helper.CopyBytes(game.board, current.state);
        int player = Opponent(current.PlayerTookAction);

        //Do the policy until a winner is found for the first (change?) node added
        while (game.GetWinner() == 0)
        {
            //Random
            List<byte> moves = game.GetValidMoves();
            byte move = moves[r.Next(0, moves.Count)];
            game.Mark(player, move);
            player = Opponent(player);
        }

        if (game.GetWinner() == startPlayer)
            return 1;

        return 0;
    }

    //#4. Update
    public unsafe void Update(Node current, int value)
    {
        do
        {
            current.visits++;
            current.value += value;
            current = current.parent;
        }
        while (current != null);
    }
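
One detail of Rollout above is worth noting separately from the question itself: `new Random(1337)` is constructed on every call, so each roll-out starts from the same seed and draws the same sequence of numbers. Roll-outs from the same node therefore always play out the same game. A minimal sketch of the effect and of one possible change (a single generator shared by all roll-outs); the class and method names are hypothetical and not part of the original code:

```csharp
using System;
using System.Collections.Generic;

public static class RolloutRandomDemo
{
    // Mirrors the question's Rollout: a fresh generator with a fixed seed per call.
    // Every call returns the same list of draws.
    public static List<int> DrawReseeded(int count)
    {
        Random r = new Random(1337);
        var draws = new List<int>();
        for (int i = 0; i < count; i++)
            draws.Add(r.Next(0, 9));
        return draws;
    }

    // Possible alternative (an assumption, not the author's code):
    // one generator created once and reused across roll-outs.
    private static readonly Random shared = new Random();

    public static List<int> DrawShared(int count)
    {
        var draws = new List<int>();
        for (int i = 0; i < count; i++)
            draws.Add(shared.Next(0, 9));
        return draws;
    }
}
```

Calling `DrawReseeded` twice yields two identical lists, whereas successive calls to `DrawShared` continue one sequence. In the question's code, the equivalent change would be to make `r` a field of the class instead of a local variable in Rollout.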

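Best answer

I don't think your answer should be marked as accepted. For Tic-Tac-Toe the search space is relatively small, and the optimal action should be found within a reasonable number of iterations.

It looks like your update function (backpropagation) adds the same amount of reward to nodes at different tree levels. This is incorrect, because the current player is different at different tree levels.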
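
To spell out why this matters, here is the score that BestChildUCB in the question computes for child i, written as a formula. The symbols are introduced here for illustration: w_i stands for child.value, n_i for child.visits, and N for current.visits.

```latex
\[
\mathrm{UCB1}(i) = \frac{w_i}{n_i} + C \sqrt{\frac{2 \ln N}{n_i}}
\]
```

Selection uses C = 1.44 and the final move choice uses C = 0. Since Rollout returns 1 only when startPlayer wins and Update adds that value at every level, w_i always counts startPlayer's wins. At nodes where the opponent is to move, maximizing this score makes the tree descend into the opponent replies that are best for startPlayer, so replies that punish a bad move are rarely explored.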

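I suggest you take a look at backpropagation in the UCT method in this example: http://mcts.ai/code/python.html

You should update each node's total reward based on the reward computed for the player who previously moved at that level (node.playerJustMoved in the example).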
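
A minimal sketch of what such a per-player update could look like in C#. This is not the code from the linked example. It assumes that Rollout is changed to return the winner (1 or 2, with 0 for a draw) instead of 1/0 for startPlayer, that value becomes a double, and that a draw scores 0.5. The trimmed-down Node class is hypothetical and keeps only the fields the update needs:

```csharp
using System;

// Hypothetical minimal node. Field names mirror the question's Node class,
// whose full definition is not shown in the post.
public class Node
{
    public Node parent;
    public byte action;
    public int PlayerTookAction; // player who made the move leading to this node
    public int visits;
    public double value;         // total reward from PlayerTookAction's point of view

    public Node(Node parent, byte action, int playerTookAction)
    {
        this.parent = parent;
        this.action = action;
        this.PlayerTookAction = playerTookAction;
    }
}

public static class Backpropagation
{
    // winner: 1 or 2 for a win, 0 for a draw (assumed encoding).
    public static void Update(Node current, int winner)
    {
        while (current != null)
        {
            current.visits++;

            // Credit the reward to the player who moved into this node.
            if (winner == current.PlayerTookAction)
                current.value += 1.0;
            else if (winner == 0)
                current.value += 0.5; // assumed draw reward

            current = current.parent;
        }
    }
}
```

With the root created as `new Node(null, 0, Opponent(player))`, as in the question, the root's children carry `PlayerTookAction == player`, so `BestChildUCB(root, 0)` still returns the move with the best win rate for the player to move, while deeper levels now favour the replies that are best for whoever moves there.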

Regarding c# - Monte Carlo Tree Search: Implementation for Tic-Tac-Toe, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23803186/
