RL (Chapter 6): Temporal-Difference Learning (TD Learning)

Contents:
TD Prediction
Advantages of TD Prediction Methods
Optimality of TD(0)
Sarsa: On-policy TD Control
Q-learning: Off-policy TD Control
Expected Sarsa
Maximization Bias and Double Learning
Games,
This post is a set of reinforcement learning notes, based mainly on the following:
Reinforcement Learning: An Introduction
All code comes from GitHub.
Exercise answers refer to GitHub.

TD Prediction

TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas. Like Monte Carlo methods, TD methods can learn directly from ra