
machine-learning - Reinforcement learning: TD learning from afterstates

Reposted. Author: 行者123. Updated: 2023-11-30 09:22:04

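I'm writing a program that teaches 2 players to play a simple board game using reinforcement learning and the afterstate-based temporal-difference learning method (TD(λ)). Learning is done by training a neural network; I use Sutton's NonLinear TD/Backprop neural network. I'd really like to hear your opinion on the following dilemma. The basic algorithm/pseudocode for playing a round between the two opponents is this: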

WHITE.CHOOSE_ACTION(GAME_STATE); // White decides on its next move by evaluating the current game state (TD(λ) learning)
GAME_STATE = WORLD.APPLY(WHITE_PLAYERS_ACTION); // Apply White's chosen action to the environment; a new game state emerges
IF (GAME_STATE != FINAL) { // If the new state is not final (not a winning state for White), do the same for Black
    BLACK.CHOOSE_ACTION(GAME_STATE);
    GAME_STATE = WORLD.APPLY(BLACK_PLAYERS_ACTION); // Apply Black's chosen action to the environment; a new game state emerges
}
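A minimal Python sketch of this loop, for illustration only. The game (`TakeAway`, a stone-removal game in which taking the last stone wins) and the `Player` class are hypothetical stand-ins, and a plain dictionary of afterstate values replaces the neural network. `choose_action` scores the afterstate each legal move would produce and picks the best one, with occasional random exploration.

```python
import random


class TakeAway:
    """Hypothetical toy game standing in for the board game: players alternately
    remove 1 or 2 stones, and whoever takes the last stone wins."""

    def __init__(self, stones=7):
        self.stones = stones

    def legal_actions(self):
        return [a for a in (1, 2) if a <= self.stones]

    def apply(self, action):
        self.stones -= action          # the result is the new afterstate
        return self.stones

    def is_final(self):
        return self.stones == 0


class Player:
    """Chooses moves by scoring the afterstate each legal action would produce.
    A plain dict of afterstate values stands in for the TD(lambda) network."""

    def __init__(self, name, epsilon=0.1):
        self.name, self.epsilon = name, epsilon
        self.values = {}               # afterstate -> estimated chance of winning

    def choose_action(self, game):
        actions = game.legal_actions()
        if random.random() < self.epsilon:   # occasional exploratory move
            return random.choice(actions)
        return max(actions, key=lambda a: self.values.get(game.stones - a, 0.5))


def play_round(game, white, black):
    """One pass of the loop above; returns the winner's name, or None to continue."""
    game.apply(white.choose_action(game))
    if game.is_final():
        return white.name
    game.apply(black.choose_action(game))
    if game.is_final():
        return black.name
    return None
```

Calling `play_round` repeatedly until it returns a name plays one full game; learning is deliberately left out here, since where to put it is exactly the question.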

When should each player call its learning method, PLAYER.LEARN(GAME_STATE)? That is the dilemma.

Option A. After each player's move, once the new afterstate has emerged, as follows:

WHITE.CHOOSE_ACTION(GAME_STATE);
GAME_STATE = WORLD.APPLY(WHITE_PLAYERS_ACTION);
WHITE.LEARN(GAME_STATE); // White learns from the afterstate that emerged right after its action
IF (GAME_STATE != FINAL) {
    BLACK.CHOOSE_ACTION(GAME_STATE);
    GAME_STATE = WORLD.APPLY(BLACK_PLAYERS_ACTION);
    BLACK.LEARN(GAME_STATE); // Black learns from the afterstate that emerged right after its action
}

Option B. After each player's move, once the new afterstate has emerged, and also after the opponent's move if the opponent made a winning move:

WHITE.CHOOSE_ACTION(GAME_STATE);
GAME_STATE = WORLD.APPLY(WHITE_PLAYERS_ACTION);
WHITE.LEARN(GAME_STATE);
IF (GAME_STATE == FINAL) // If White won
    BLACK.LEARN(GAME_STATE); // Make Black learn from White's winning afterstate
IF (GAME_STATE != FINAL) { // If White's move did not produce a winning/final afterstate
    BLACK.CHOOSE_ACTION(GAME_STATE);
    GAME_STATE = WORLD.APPLY(BLACK_PLAYERS_ACTION);
    BLACK.LEARN(GAME_STATE);
    IF (GAME_STATE == FINAL) // If Black won
        WHITE.LEARN(GAME_STATE); // Make White learn from Black's winning afterstate
}
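To make the difference between the two options concrete, here is a minimal sketch under stated assumptions: a hypothetical tabular TD(0) learner over afterstates (`AfterstateTD0`) replaces the TD(λ) network, the game is a toy stone-removal game where taking the last stone wins, a win is rewarded with 1 and a loss with 0, and the helper `replay` (also hypothetical) replays a short scripted game under each option.

```python
class AfterstateTD0:
    """Hypothetical tabular TD(0) learner over afterstates (a stand-in for the
    TD(lambda) network). Values estimate this player's chance of winning."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.values = {}     # afterstate -> value, default 0.5
        self.prev = None     # afterstate produced by this player's previous move

    def v(self, s):
        return self.values.get(s, 0.5)

    def learn(self, afterstate, reward=None):
        # Mid-game: bootstrap from V(afterstate). Game over: the reward is the target.
        if self.prev is not None:
            target = self.v(afterstate) if reward is None else reward
            self.values[self.prev] = self.v(self.prev) + self.alpha * (target - self.v(self.prev))
        self.prev = afterstate if reward is None else None


def replay(moves, stones, option):
    """Replay scripted moves of a take-away game (taking the last stone wins),
    calling LEARN as in Option "A" or Option "B". Returns (white, black)."""
    players = (AfterstateTD0(), AfterstateTD0())
    for i, take in enumerate(moves):
        me, opponent = players[i % 2], players[1 - i % 2]
        stones -= take
        if stones == 0:                              # winning afterstate
            me.learn(stones, reward=1.0)
            if option == "B":
                opponent.learn(stones, reward=0.0)   # the loser learns from it too
            break
        me.learn(stones)
    return players
```

In the replay `replay([1, 1, 2], 4, option)` White wins. Under Option A, Black's table stays empty: the afterstate that led to its loss (2 stones left) is never updated toward the losing outcome. Under Option B its value moves from 0.5 to 0.25, while White's update is the same in both cases.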

I think Option B is more reasonable.

Best answer

Typically, with TD learning, an agent has 3 functions:

  • start(observation) → action
  • step(observation, reward) → action
  • finish(reward)
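The three functions can be sketched in Python as below. This is a hedged illustration, not the answerer's code: a linear value function over afterstate features stands in for Sutton's TD/backprop network, and `actions_fn`, `afterstate_fn` and `features_fn` are hypothetical helpers the caller would supply. `step` picks the next move and, in the same call, applies a TD(λ) update to the afterstate produced by the previous move; `finish` performs the last update with the final reward as the target instead of a bootstrapped value.

```python
import random


class TDLambdaAgent:
    """start/step/finish agent that learns afterstate values with TD(lambda).
    Assumed, not from the answer: a linear value w . phi(afterstate) replaces the
    neural network, and actions_fn / afterstate_fn / features_fn are hypothetical
    helpers supplied by the caller."""

    def __init__(self, actions_fn, afterstate_fn, features_fn, n_features,
                 alpha=0.1, gamma=1.0, lam=0.7, epsilon=0.1):
        self.actions_fn = actions_fn        # observation -> list of legal actions
        self.afterstate_fn = afterstate_fn  # (observation, action) -> afterstate
        self.features_fn = features_fn      # afterstate -> list of n_features floats
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.w = [0.0] * n_features
        self.trace = [0.0] * n_features     # eligibility trace
        self.prev = None                    # afterstate produced by our last move

    def value(self, afterstate):
        return sum(w * x for w, x in zip(self.w, self.features_fn(afterstate)))

    def _choose(self, observation):
        # Epsilon-greedy over the afterstates each legal action would produce.
        actions = self.actions_fn(observation)
        if random.random() < self.epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: self.value(self.afterstate_fn(observation, a)))
        return action, self.afterstate_fn(observation, action)

    def _learn(self, target):
        # Semi-gradient TD(lambda) for a linear value: decay the trace, add the
        # gradient phi(prev), then move w by alpha * delta * trace.
        phi = self.features_fn(self.prev)
        delta = target - self.value(self.prev)
        self.trace = [self.gamma * self.lam * z + x for z, x in zip(self.trace, phi)]
        self.w = [w + self.alpha * delta * z for w, z in zip(self.w, self.trace)]

    def start(self, observation):
        self.trace = [0.0] * len(self.w)    # new episode, empty trace
        action, self.prev = self._choose(observation)
        return action

    def step(self, observation, reward):
        action, afterstate = self._choose(observation)
        # Bootstrapped target: reward + gamma * V(new afterstate), computed before the update.
        self._learn(reward + self.gamma * self.value(afterstate))
        self.prev = afterstate
        return action

    def finish(self, reward):
        if self.prev is not None:           # terminal: the final reward is the target
            self._learn(reward)
        self.prev = None
```

With λ = 1 the terminal reward passed to `finish` also credits earlier afterstates through the trace; with λ = 0 only the last afterstate is updated.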

Acting is combined with learning, and some additional learning happens when the game ends.
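One way to drive two such agents in a turn-based game, sketched under the same caveats: the `TakeAway` game, the random `RecordingAgent` stub and the +1/-1 rewards are hypothetical. Each player's `step` is called on its next turn with the state left by the opponent's reply (reward 0 mid-game), and when the game ends both players get `finish`, so the loser also learns from the final outcome, much as in Option B.

```python
import random


class TakeAway:
    """Hypothetical toy game: remove 1 or 2 stones per turn; taking the last stone wins."""

    def __init__(self, stones):
        self.stones = stones

    def legal_actions(self):
        return [a for a in (1, 2) if a <= self.stones]

    def apply(self, action):
        self.stones -= action

    def is_final(self):
        return self.stones == 0


class RecordingAgent:
    """Stub with the start/step/finish interface: plays randomly and logs each call.
    A learning agent would update its value estimates inside step and finish."""

    def __init__(self, game):
        self.game, self.log = game, []

    def start(self, observation):
        self.log.append(("start", observation))
        return random.choice(self.game.legal_actions())

    def step(self, observation, reward):
        self.log.append(("step", observation, reward))
        return random.choice(self.game.legal_actions())

    def finish(self, reward):
        self.log.append(("finish", reward))


def play_episode(game, white, black, win=1.0, loss=-1.0):
    """Alternate turns; returns 0 if White wins, 1 if Black wins."""
    players, started, turn = (white, black), [False, False], 0
    while True:
        agent, observation = players[turn], game.stones
        # First move of the episode -> start; every later move -> step (reward 0 mid-game).
        action = agent.step(observation, 0.0) if started[turn] else agent.start(observation)
        started[turn] = True
        game.apply(action)
        if game.is_final():
            agent.finish(win)
            if started[1 - turn]:      # the opponent may not have moved yet
                players[1 - turn].finish(loss)
            return turn
        turn = 1 - turn
```

Swapping `RecordingAgent` for a learning agent with the same three methods leaves the loop unchanged.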

For machine-learning - Reinforcement learning: TD learning from afterstates, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31227273/
