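I have been trying to learn MCTS (Monte Carlo tree search) from YouTube videos and papers such as this one:
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf
However, beyond the high-level theoretical explanation, I have not had much luck understanding the details. Below are some quotes from the paper above, along with my questions.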
Best answer
- Selection Phase: MCTS iteratively selects the highest scoring child node of the current state. If the current state is the root node, where did these children come from in the first place? Wouldn't you have a tree with just a single root node to begin with? With just a single root node, do you get into Expansion and Simulation phase right away?
- If MCTS selects the highest scoring child node in Selection phase, you never explore other children or possibly even a brand new child whilst going down the levels of the tree?
W(s, a) / N(s, a) is the exploitation part (the simple average score), and B(s, a) is the exploration part.
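To make the trade-off concrete, here is a minimal sketch of such a score. The text above only names B(s, a) as the exploration part; the UCB1-style bonus c * sqrt(ln N(s) / N(s, a)) used below is an assumption (a common choice, not necessarily the one in the paper), and `selection_score`, `parent_n` and `c` are hypothetical names chosen for illustration.

```python
import math

def selection_score(w, n, parent_n, c=1.4):
    """Score of action a in state s: W(s, a) / N(s, a) + B(s, a)."""
    if n == 0:
        return math.inf  # an action that was never tried is picked first
    exploitation = w / n  # simple average score
    # Assumed UCB1-style bonus: shrinks as this action is visited more often.
    exploration = c * math.sqrt(math.log(parent_n) / n)
    return exploitation + exploration

# An action with a lower average but few visits can still outrank
# a well-explored action with a higher average.
well_explored = selection_score(w=60, n=100, parent_n=110)
rarely_tried = selection_score(w=4, n=10, parent_n=110)
```

Under this assumption, selection does not always follow the child with the best average: the bonus keeps pulling the search back to children that have been visited rarely.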
- How does the Expansion phase happen for a node? In the diagram above, why did it not choose leaf node but decided to add a sibling to the leaf node?
- During the Simulation phase, a stochastic policy is used to select legal moves for both players until the game terminates. Is this stochastic policy a hard-coded behavior, so that you are basically rolling a die in the simulation to choose one of the possible moves, taking turns between the players until the end?
- The way I understand this is you start at a single root node and by repeating the above phases you construct the tree to a certain depth. Then you choose the child with the best score at the second level as your next move. The size of the tree you are willing to construct is basically your hard AI responsiveness requirement, right? Since while the tree is being constructed the game will stall and compute this tree.
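The four phases and the time budget asked about above can be sketched as one loop. This is an illustrative sketch, not the paper's implementation. It assumes a toy game (a single pile of stones, each player removes 1 to 3, and whoever takes the last stone wins), a uniformly random rollout policy, a UCB1-style exploration bonus, and a final move chosen by visit count. All names (`Node`, `select_child`, `rollout`, `mcts`, `budget_s`, `max_iters`) are hypothetical.

```python
import math
import random
import time

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move in this state (0 or 1)
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0.0           # W: wins for the player who moved into this node
        self.visits = 0           # N

def select_child(node, c=1.4):
    # W / N is exploitation; the square-root term is the assumed exploration bonus.
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player, rng):
    # Both players pick uniformly random legal moves until the pile is empty.
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player         # this player took the last stone and wins
        player = 1 - player

def mcts(stones, player, max_iters=3000, budget_s=1.0, seed=0):
    # Assumes stones >= 1, so the root has at least one legal move.
    rng = random.Random(seed)
    root = Node(stones, player)   # the tree starts as a single root node
    deadline = time.monotonic() + budget_s
    iters = 0
    while iters < max_iters and time.monotonic() < deadline:
        iters += 1
        node = root
        # 1. Selection: descend only through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = select_child(node)
        # 2. Expansion: add one child for a move that has not been tried yet.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: play the game out from the new node.
        if node.stones == 0:
            winner = 1 - node.player  # the previous mover took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move

best_move = mcts(stones=5, player=0)
```

In this sketch the root begins with no children, so the first iterations go straight to expansion and simulation. The budget (`max_iters` or `budget_s`) is what bounds how long the game waits for a move.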
Regarding "montecarlo - How does Monte Carlo tree search work?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44230911/