python - Can returning False in env.step somehow return True? (gym)

Reposted. Author: 行者123. Updated: 2023-12-02 05:47:55

While trying to figure out the reset condition of the flocking env (from gym-flock), I ran into this question: can 'return False' somehow return True?

The core code is:

1: test_model.py in https://github.com/katetolstaya/multiagent_gnn_policies#available-algorithms

```python
import random

import gym
import gym_flock
import numpy as np
import torch

# DAGGER and MultiAgentStateWithDelay are defined in the multiagent_gnn_policies repo.


def test(args, actor_path, render=True):
    # initialize gym env
    env_name = args.get('env')
    env = gym.make(env_name)
    if isinstance(env.env, gym_flock.envs.FlockingRelativeEnv):
        env.env.params_from_cfg(args)

    # use seed
    seed = args.getint('seed')
    env.seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # initialize params tuple
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    learner = DAGGER(device, args)
    n_test_episodes = args.getint('n_test_episodes')
    learner.load_model(actor_path, device)

    # The loop in question: each episode ends only when done becomes True.
    for _ in range(n_test_episodes):
        episode_reward = 0
        state = MultiAgentStateWithDelay(device, args, env.reset(), prev_state=None)
        done = False
        while not done:
            action = learner.select_action(state)
            next_state, reward, done, _ = env.step(action.cpu().numpy())
            next_state = MultiAgentStateWithDelay(device, args, next_state, prev_state=state)
            episode_reward += reward
            state = next_state
            if render:
                env.render()
        print(episode_reward)
    env.close()
```
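The loop above stops only through done; it keeps no step count of its own. A minimal sketch of a variant that also caps the number of steps locally, so an episode cannot run forever if done never becomes True. NeverDoneEnv and run_episode are hypothetical names for illustration, not part of gym, gym-flock, or multiagent_gnn_policies; the stub uses the same old 4-tuple step return as the code above.

```python
class NeverDoneEnv:
    """Hypothetical env stub whose step() never reports done=True."""

    def reset(self):
        return 0.0

    def step(self, action):
        # (observation, reward, done, info), as in the old gym step API
        return 0.0, 1.0, False, {}


def run_episode(env, policy, max_steps=200):
    """Same shape as the test loop, plus a local cap on the number of steps."""
    state = env.reset()
    episode_reward = 0.0
    steps = 0
    done = False
    while not done and steps < max_steps:
        state, reward, done, _ = env.step(policy(state))
        episode_reward += reward
        steps += 1
    return episode_reward, steps


reward, steps = run_episode(NeverDoneEnv(), policy=lambda s: 0.0, max_steps=200)
print(reward, steps)  # 200.0 200
```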

2: gym environment code: flocking_relative.py in https://github.com/katetolstaya/gym-flock/tree/stable/gym_flock/envs/flocking

```python
def step(self, u):
    # u = np.reshape(u, (-1, 2))
    assert u.shape == (self.n_agents, self.nu)
    # u = np.clip(u, a_min=-self.max_accel, a_max=self.max_accel)
    self.u = u * self.action_scalar

    # x position
    self.x[:, 0] = self.x[:, 0] + self.x[:, 2] * self.dt + self.u[:, 0] * self.dt * self.dt * 0.5
    # y position
    self.x[:, 1] = self.x[:, 1] + self.x[:, 3] * self.dt + self.u[:, 1] * self.dt * self.dt * 0.5
    # x velocity
    self.x[:, 2] = self.x[:, 2] + self.u[:, 0] * self.dt
    # y velocity
    self.x[:, 3] = self.x[:, 3] + self.u[:, 1] * self.dt

    self.compute_helpers()

    # (observation, reward, done, info): done is always False here
    return (self.state_values, self.state_network), self.instant_cost(), False, {}
```
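The position and velocity updates in step are the standard constant-acceleration (double-integrator) equations applied to every agent over one time step dt: positions are advanced with the velocities from before the update, and the velocities then change by u * dt. A minimal pure-Python sketch for a single agent, assuming the state layout (x, y, vx, vy) used above; integrate is a hypothetical helper, not part of gym-flock.

```python
def integrate(state, accel, dt, action_scalar=1.0):
    """One constant-acceleration step for a single agent.

    state: (x, y, vx, vy); accel: (ax, ay) before scaling by action_scalar.
    Follows the update order in step(): positions use the old velocities,
    then the velocities are advanced.
    """
    x, y, vx, vy = state
    ax, ay = accel[0] * action_scalar, accel[1] * action_scalar
    new_x = x + vx * dt + ax * dt * dt * 0.5
    new_y = y + vy * dt + ay * dt * dt * 0.5
    new_vx = vx + ax * dt
    new_vy = vy + ay * dt
    return (new_x, new_y, new_vx, new_vy)


# approximately (0.11, 0.0, 1.2, 0.0)
print(integrate((0.0, 0.0, 1.0, 0.0), (2.0, 0.0), dt=0.1))
```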

For the while loop in test_model.py to break and reset the env, done has to become True at some point. However, the code in env.step (code part 2) always returns False in the done position.

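How does this loop ever break when env.step always returns False? I have tested it and confirmed that the code works, but I can't figure out how.

Could someone with RL and gym experience please help? Thank you very much in advance.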

Best answer

https://github.com/katetolstaya/gym-flock/blob/stable/gym_flock/__init__.py#L65

In the file above:

```python
from gym.envs.registration import register

register(
    id='FlockingLeader-v0',
    entry_point='gym_flock.envs.flocking:FlockingLeaderEnv',
    max_episode_steps=200,
)
```

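When the step count reaches max_episode_steps, the False returned by step becomes True. gym.make wraps an environment registered with max_episode_steps in gym's TimeLimit wrapper, which counts the steps and sets done to True once the limit is reached, even though the underlying step still returns False. This is consistent with test_model.py reaching the underlying FlockingRelativeEnv through env.env rather than env.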
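A simplified, self-contained sketch of the time-limit mechanism: a wrapper counts calls to step and overrides the inner env's done once the limit is hit. It is modeled on the old 4-tuple step API (before gym 0.26 split done into terminated and truncated); TimeLimitSketch and AlwaysFalseEnv are hypothetical names, not gym's or gym-flock's actual classes.

```python
class AlwaysFalseEnv:
    """Hypothetical env that, like FlockingRelativeEnv.step, never sets done."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, -1.0, False, {}


class TimeLimitSketch:
    """Simplified stand-in for the idea behind gym.wrappers.TimeLimit."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the wrapped env, reachable as wrapper.env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = None

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        if self._elapsed_steps is None:
            raise RuntimeError("call reset() before step()")
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            # Older gym versions also record that the episode was cut short.
            info["TimeLimit.truncated"] = not done
            done = True  # the inner env's False is overridden here
        return obs, reward, done, info


# Same loop shape as test_model.py: it exits only when done becomes True.
env = TimeLimitSketch(AlwaysFalseEnv(), max_episode_steps=200)
env.reset()
done, steps = False, 0
while not done:
    _, _, done, info = env.step(action=None)
    steps += 1
print(steps, info)  # 200 {'TimeLimit.truncated': True}
```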

A similar question on Stack Overflow: https://stackoverflow.com/questions/67734924/
