
python - RLLib - Tensorflow - InvalidArgumentError: Received a label value of N which is outside the valid range of [0, N)


I am using RLlib's PPOTrainer with a custom environment. I call trainer.train() twice; the first call completes successfully, but the second one crashes with the following error:

lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
(pid=15248)     raise type(e)(node_def, op, message)
(pid=15248)
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Received a label value of 5 which is outside the valid range of [0, 5). Label values: 5 5
(pid=15248) [[node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /tensorflow_core/python/framework/ops.py:1751) ]]

Here is my code:

main.py

import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.models import ModelCatalog

# TreeObsPreprocessor is a custom preprocessor defined elsewhere in the project
ModelCatalog.register_custom_preprocessor("tree_obs_prep", TreeObsPreprocessor)
ray.init()

trainer = PPOTrainer(env=MyEnv, config={
    "train_batch_size": 4000,
    "model": {
        "custom_preprocessor": "tree_obs_prep"
    }
})

for i in range(2):
    print(trainer.train())

MyEnv.py

import gym
import numpy as np
from ray import rllib


class MyEnv(rllib.env.MultiAgentEnv):
    def __init__(self, env_config):
        self.n_agents = 2

        self.env = *CREATES ENV*
        self.action_space = gym.spaces.Discrete(5)
        self.observation_space = np.zeros((1, 12))

    def reset(self):
        self.agents_done = []
        obs = self.env.reset()
        return obs[0]

    def step(self, action_dict):
        obs, rewards, dones, infos = self.env.step(action_dict)

        d = dict()
        r = dict()
        o = dict()
        i = dict()
        for i_agent in range(len(self.env.agents)):
            if i_agent not in self.agents_done:
                o[i_agent] = obs[i_agent]
                r[i_agent] = rewards[i_agent]
                d[i_agent] = dones[i_agent]
                i[i_agent] = infos[i_agent]
        d['__all__'] = dones['__all__']

        for agent, done in dones.items():
            if done and agent != '__all__':
                self.agents_done.append(agent)

        return o, r, d, i

I don't know where the problem is. Any suggestions? What does this error mean?

Best Answer

This comment was very helpful to me:

FWIW, I think such issues can happen if NaNs appear in the policy output. When that happens, you can get out of range errors.

Usually it's due to the observation or reward somehow becoming NaN, though it could be the policy diverging as well.
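The message itself comes from TensorFlow's sparse softmax cross-entropy op, which requires every label to lie in [0, num_classes); with Discrete(5) actions, a NaN in the policy output can end up producing an out-of-range "label" like the 5 in the traceback. A minimal sketch that reproduces the same message (assuming TensorFlow 2.x with eager execution on CPU; on GPU the op may return NaN instead of raising):

import tensorflow as tf

logits = tf.constant([[0.1, 0.2, 0.3, 0.2, 0.2]])  # 5 classes, so valid labels are 0..4
labels = tf.constant([5])  # 5 is outside [0, 5) -> InvalidArgumentError
tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)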

In my case, I had to modify my observations: the agent was unable to learn a policy, and at some point during training (at a random timestep) the actions it returned were NaN.
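One way to catch this early is to validate observations before they leave the environment, so training fails at the source of the NaN rather than inside the loss. A minimal sketch (the _sanitize helper is illustrative, not part of the original code):

import numpy as np

def _sanitize(obs_dict):
    # Fail fast if any agent's observation contains NaN or inf, so the
    # policy never trains on invalid inputs.
    for agent_id, obs in obs_dict.items():
        if not np.all(np.isfinite(np.asarray(obs, dtype=np.float32))):
            raise ValueError("non-finite observation for agent %s" % agent_id)
    return obs_dict

In MyEnv.step() this would wrap the returned observation dict, e.g. return _sanitize(o), r, d, i.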

Regarding "python - RLLib - Tensorflow - InvalidArgumentError: Received a label value of N which is outside the valid range of [0, N)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59272939/
