I am using RLlib's PPOTrainer with a custom environment. I call trainer.train() twice: the first call completes successfully, but when I run it a second time it crashes with this error:
lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call (pid=15248) raise type(e)(node_def, op, message) (pid=15248)
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Received a label value of 5 which is outside the valid range of [0, 5). Label values: 5 5
(pid=15248) [[node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /tensorflow_core/python/framework/ops.py:1751) ]]
Here is my code:
main.py
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.models import ModelCatalog

from MyEnv import MyEnv

# TreeObsPreprocessor is defined elsewhere in the project
ModelCatalog.register_custom_preprocessor("tree_obs_prep", TreeObsPreprocessor)
ray.init()
trainer = PPOTrainer(env=MyEnv, config={
    "train_batch_size": 4000,
    "model": {
        "custom_preprocessor": "tree_obs_prep"
    }
})

for i in range(2):
    print(trainer.train())
MyEnv.py
import gym
import numpy as np
from ray import rllib

class MyEnv(rllib.env.MultiAgentEnv):
    def __init__(self, env_config):
        self.n_agents = 2
        self.env = *CREATES ENV*
        self.action_space = gym.spaces.Discrete(5)
        self.observation_space = np.zeros((1, 12))

    def reset(self):
        self.agents_done = []
        obs = self.env.reset()
        return obs[0]

    def step(self, action_dict):
        obs, rewards, dones, infos = self.env.step(action_dict)
        d = dict()
        r = dict()
        o = dict()
        i = dict()
        for i_agent in range(len(self.env.agents)):
            if i_agent not in self.agents_done:
                o[i_agent] = obs[i_agent]
                r[i_agent] = rewards[i_agent]
                d[i_agent] = dones[i_agent]
                i[i_agent] = infos[i_agent]
        d['__all__'] = dones['__all__']
        for agent, done in dones.items():
            if done and agent != '__all__':
                self.agents_done.append(agent)
        return o, r, d, i
I don't know where the problem is. Any suggestions? What does this error mean?
Best answer
This comment was very helpful to me:
FWIW, I think such issues can happen if NaNs appear in the policy output. When that happens, you can get out of range errors.
Usually it's due to the observation or reward somehow becoming NaN, though it could be the policy diverging as well.
In my case, I had to fix my observations: the agent was unable to learn a policy, and at some point during training (at a random timestep) the actions it returned were NaN.
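In line with the comment quoted above, a cheap way to catch this early is to check observations and rewards for non-finite values before they leave the environment's step() method. Below is a minimal sketch; the helper name `sanitize` and the use of `np.nan_to_num` are my own suggestion, not part of the original code:

```python
import numpy as np

def sanitize(x):
    """Warn about NaN/Inf in an observation or reward and replace
    them with finite values so training does not silently diverge."""
    arr = np.asarray(x, dtype=np.float64)
    if not np.all(np.isfinite(arr)):
        # Raise here instead if you prefer to stop at the failing timestep.
        print("warning: non-finite values detected:", arr)
    # NaN -> 0.0, +/-Inf -> large finite numbers
    return np.nan_to_num(arr)
```

For example, inside step() one could assign `o[i_agent] = sanitize(obs[i_agent])` and `r[i_agent] = sanitize(rewards[i_agent])`, which makes the first NaN show up in the logs at the exact timestep it appears instead of surfacing later as an out-of-range label in the loss.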
Regarding python - RLLib - Tensorflow - InvalidArgumentError: Received a label value of N which is outside the valid range of [0, N), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59272939/