
python - Stablebaselines3: logging reward from a custom Gym environment

Reposted | Author: 行者123 | Updated: 2023-12-05 05:50:46

I have this custom callback to log the reward from my custom vectorized environment, but the reward always shows up in the console as [0] and never gets logged to TensorBoard at all:

from stable_baselines3.common.callbacks import BaseCallback

class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard.
    """

    def __init__(self, verbose=0):
        super(TensorboardCallback, self).__init__(verbose)

    def _on_step(self) -> bool:
        self.logger.record('reward', self.training_env.get_attr('total_reward'))
        return True

Here is the relevant part of the main function:

model = PPO(
    "MlpPolicy", env,
    learning_rate=3e-4,
    policy_kwargs=policy_kwargs,
    verbose=1,
    tensorboard_log="./tensorboard/")

# as the environment is not serializable, we need to set a new instance of the environment
loaded_model = model = PPO.load("model", env=env)
loaded_model.set_env(env)

# and continue training
loaded_model.learn(int(1e6), callback=TensorboardCallback())

Best answer

You need to add [0] as an index.

So where you wrote self.logger.record('reward', self.training_env.get_attr('total_reward')), you just need to replace it with self.logger.record('reward', self.training_env.get_attr('total_reward')[0]).
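The indexing is needed because get_attr on a vectorized environment returns a list with one entry per sub-environment, while logger.record expects a scalar. A minimal plain-Python sketch (the MockVecEnv class here is hypothetical, standing in for a real Stable-Baselines3 VecEnv) illustrates the shape of the return value:

```python
class MockVecEnv:
    """Hypothetical stand-in for a VecEnv wrapping one sub-environment."""
    def __init__(self, rewards):
        self._rewards = rewards  # one total_reward per sub-environment

    def get_attr(self, name):
        # Mirrors VecEnv.get_attr: collects the attribute from every sub-env,
        # always returning a list, even when there is only one sub-env.
        if name == 'total_reward':
            return list(self._rewards)
        raise AttributeError(name)

env = MockVecEnv([12.5])
values = env.get_attr('total_reward')
print(values)      # [12.5] -- a one-element list, which is what got logged before
print(values[0])   # 12.5   -- a scalar, which is what logger.record expects
```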

class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard.
    """

    def __init__(self, verbose=0):
        super(TensorboardCallback, self).__init__(verbose)

    def _on_step(self) -> bool:
        self.logger.record('reward', self.training_env.get_attr('total_reward')[0])

        return True
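Note that with several parallel sub-environments, indexing with [0] logs only the first one. A common alternative (an assumption on my part, not part of the original answer) is to average the list that get_attr returns before logging it:

```python
import statistics

# Hypothetical values, as training_env.get_attr('total_reward') might return
# them when the VecEnv wraps four parallel sub-environments.
total_rewards = [10.0, 12.0, 8.0, 14.0]

# Log the mean across sub-envs instead of just the first one's reward:
mean_reward = statistics.mean(total_rewards)
print(mean_reward)  # 11.0
```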

Regarding python - Stablebaselines3: logging reward from a custom Gym environment, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70468394/
