algorithm - Monte Carlo Tree Search: Tree Policy for two player games


Reposted by: 塔克拉玛干. Updated: 2023-11-03 03:23:41

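I am a bit confused about how to implement the "tree policy" in MCTS. Every paper or article I have read talks about descending the tree from the current game state (in MCTS terms: the root, where the player is about to make a move). My question is how I select the best child when I am at a MIN player level (assuming I am the MAX player).

Even if I select some particular action that MIN might take, and my search tree grows deeper through that node, the MIN player may well choose a different node when its turn comes. (If the MIN player is an amateur human, it may well pick a node that is not necessarily the best one.) Since MIN has chosen a different node, all the work MAX did propagating through that node is wasted.

The steps I am referring to are described at https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ and the tree policy is illustrated at https://jeffbradberry.com/images/mcts_selection.png, which leads me to believe that the selection is being carried out from a single player's perspective.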

Best answer

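To implement MCTS for a two-player game, you only need to flip the sign of the reward at every step of backpropagation, which is a one-line change in the code.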

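This means that we try to maximize the reward at every layer, but as the reward is propagated up the tree, a reward that is positive for your opponent becomes negative for you once it reaches your layer.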
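The sign flip described above can be sketched as follows. This is a minimal illustration and not code from the answer: the names `Node`, `backpropagate`, and `uct_select` are hypothetical, the game is assumed to be zero-sum with rewards such as +1, 0, and -1, and `uct_select` uses the common UCB1 formula and assumes every child has been visited at least once.

```python
import math


class Node:
    """Hypothetical search-tree node (not from the original answer)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        # Total reward seen from the perspective of the player
        # who made the move leading INTO this node.
        self.value = 0.0
        if parent is not None:
            parent.children.append(self)


def backpropagate(node, reward):
    """Propagate a rollout result from `node` up to the root.

    `reward` is given from the perspective of the player who moved
    into `node`. Assumes a zero-sum game, so one player's gain is
    exactly the other player's loss.
    """
    while node is not None:
        node.visits += 1
        node.value += reward
        reward = -reward  # the one-line change: flip the sign per layer
        node = node.parent


def uct_select(node, c=math.sqrt(2)):
    """Pick the child with the highest UCB1 score.

    Because each child's value is stored from the perspective of the
    player choosing at `node`, the same maximization applies at both
    MAX and MIN levels. Assumes every child has visits > 0.
    """
    log_n = math.log(node.visits)
    return max(
        node.children,
        key=lambda ch: ch.value / ch.visits + c * math.sqrt(log_n / ch.visits),
    )


# Usage: a three-level chain root -> child -> grandchild.
root = Node()
child = Node(parent=root)
grandchild = Node(parent=child)
backpropagate(grandchild, 1.0)
print(grandchild.value, child.value, root.value)
```

With this convention, a rollout that is good for the player who moved into `grandchild` is recorded as bad for the player who moved into `child`, so selection at every level can use the same maximizing rule.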

A similar question about "algorithm - Monte Carlo Tree Search: Tree Policy for two player games" can be found on Stack Overflow: https://stackoverflow.com/questions/42302142/
