[BM95] Boyan and Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, 1995.
Problems that arise when a multilayer perceptron is used to represent the value function.
[EPG05] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.
NFQ is a special realisation of the 'Fitted Q Iteration' framework described here.
[Gor95] G. J. Gordon. Stable function approximation in dynamic programming. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning (ICML), San Francisco, CA, 1995.
The fitted value iteration algorithm, on which NFQ is based.
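The batch-mode loop that fitted value iteration and NFQ share can be sketched as follows. This is a minimal illustration only: the `regressor_factory` argument and the encoding scheme are assumptions, and any supervised regressor with `fit`/`predict` can stand in for the multilayer perceptron that NFQ actually uses.

```python
import numpy as np

def fitted_q_iteration(transitions, n_actions, regressor_factory,
                       gamma=0.95, n_iterations=20):
    """Fitted Q iteration (sketch).

    transitions: list of (state, action, reward, next_state) tuples,
    with states as 1-D numpy arrays. regressor_factory() must return
    a fresh supervised regressor exposing fit(X, y) and predict(X);
    in NFQ this role is played by a multilayer perceptron trained
    with Rprop.
    """
    def encode(s, a):
        # Input pattern: state concatenated with a one-hot action code.
        one_hot = np.zeros(n_actions)
        one_hot[a] = 1.0
        return np.concatenate([s, one_hot])

    X = np.array([encode(s, a) for s, a, _, _ in transitions])
    q = None
    for _ in range(n_iterations):
        targets = []
        for s, a, r, s_next in transitions:
            if q is None:
                targets.append(r)  # first pass: Q is the immediate cost/reward
            else:
                q_next = [q.predict(encode(s_next, b)[None, :])[0]
                          for b in range(n_actions)]
                # Cost-minimisation convention, as in NFQ.
                targets.append(r + gamma * min(q_next))
        q = regressor_factory()
        q.fit(X, np.array(targets))  # supervised regression on Bellman targets
    return q
```

The key point of the scheme is that each iteration reduces to an ordinary supervised regression problem over the whole batch of transitions.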
[Lin92] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293–321, 1992.
A successful example of representing the value function with a multilayer perceptron; introduces the 'experience replay' technique.
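A minimal sketch of the 'experience replay' idea from [Lin92]: past transitions are stored and replayed to the learner instead of being used once and discarded. The class name, capacity, and buffer layout below are illustrative assumptions, not details from the cited paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions for repeated reuse."""

    def __init__(self, capacity=10000, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Replay a random batch of stored experiences for a learning update.
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

Reusing stored transitions in this way is also what makes batch methods such as NFQ data-efficient: every collected experience can contribute to many updates.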
[LP03] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.
Source of the samples, system equations, and parameters needed for the inverted pendulum (Section 5.1); the LSPI method and its results.
[RB93] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In H. Ruspini, editor, Proceedings of the IEEE International Conference on Neural Networks (ICNN), pages 586–591, San Francisco, 1993.
The Rprop algorithm, a supervised batch-learning method, used here to train the Q-function.
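The Rprop update rule can be sketched as follows: per-weight step sizes are adapted from the sign of successive gradients, while gradient magnitudes are ignored. This is the simplified variant without weight backtracking, shown on a generic gradient function; all names and the default constants are illustrative, not taken from [RB93] verbatim.

```python
import numpy as np

def rprop_minimize(grad_fn, w0, n_steps=200,
                   eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Rprop sketch (variant without weight backtracking)."""
    w = np.array(w0, dtype=float)
    delta = np.full_like(w, delta0)      # per-weight step sizes
    prev_grad = np.zeros_like(w)
    for _ in range(n_steps):
        g = grad_fn(w)
        sign_change = prev_grad * g
        # Same sign twice: grow the step; sign flip: shrink and skip the move.
        delta = np.where(sign_change > 0,
                         np.minimum(delta * eta_plus, delta_max), delta)
        delta = np.where(sign_change < 0,
                         np.maximum(delta * eta_minus, delta_min), delta)
        g = np.where(sign_change < 0, 0.0, g)  # suppress update after a flip
        w -= np.sign(g) * delta
        prev_grad = g
    return w
```

Because only gradient signs are used, Rprop is insensitive to gradient scaling, which makes it a robust choice for the batch training inside NFQ.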
[Rie00] M. Riedmiller. Concepts and facilities of a neural reinforcement learning control architecture for technical process control. Journal of Neural Computing and Application, 8:323–338, 2000.
A successful example of representing the value function with a multilayer perceptron.
[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
The mountain-car model; the cart-pole model.
[Tes92] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.
A successful example of representing the value function with a multilayer perceptron.