Table of contents
Q-learning for Continuous Actions
Solution 1: Sample actions
Solution 2: Gradient ascent
Solution 3: Design a network
Solution 4: Don't use Q-learning

Q-learning for Continuous Actions

Q: Why does Q-learning tend to train better and more stably than policy-gradient-based methods?

A: As long as we can estimate Q