Reinforcement Learning
-
goal: learn how to take actions that maximize reward
-
agent and environment
-
environment → state → agent → action → environment → reward & next state → agent
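The loop above can be sketched in a few lines of Python. This is a minimal sketch, not any library's API: `ToyEnv` is a hypothetical 1-D corridor, and its `reset()`/`step()` methods are assumptions loosely modeled on the common "reset, then step until done" convention.

```python
import random


class ToyEnv:
    """Hypothetical environment: a corridor of states 0..n-1, start at 0, goal at n-1."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, 1 = move right; the walls clamp the position
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: state -> action -> (reward, next state) -> ..."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment returns reward and next state
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    def always_right(state):
        return 1

    def random_policy(state):
        return random.choice([0, 1])

    print(run_episode(ToyEnv(), always_right))  # 1.0
    print(run_episode(ToyEnv(), random_policy))
```

The policy is just a function from state to action, so swapping in a learned policy later does not change the loop.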
-
examples (for each, list the objective, state, action, and reward):
- cart-pole problem (inverted pendulum)
- robot locomotion
- Atari games
- Go
-
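Markov decision process (Markov property, i.e. memoryless: the next state depends only on the current state and action)
(S, A, R, P, γ): S = set of states, A = set of actions, R = distribution of reward given a (state, action) pair, P = transition probability (distribution over the next state given a (state, action) pair), γ = discount factor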
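A minimal sketch of the tuple as a Python data structure, assuming finite state and action sets and a deterministic reward table R(s, a) (a simplification of the reward distribution). The two-state MDP `toy` is hypothetical; `discounted_return` shows what γ does to a sequence of rewards.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MDP:
    states: List[int]                                    # S
    actions: List[int]                                   # A
    reward: Dict[Tuple[int, int], float]                 # R(s, a), deterministic here (assumption)
    transition: Dict[Tuple[int, int], Dict[int, float]]  # P(s' | s, a), keyed only by (s, a)
    gamma: float                                         # discount factor γ


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma**t * r_t, accumulated backwards via G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical example: in state 0, action 1 moves to state 1 and earns reward 1.
toy = MDP(
    states=[0, 1],
    actions=[0, 1],
    reward={(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0},
    transition={
        (0, 0): {0: 1.0},
        (0, 1): {1: 1.0},
        (1, 0): {0: 1.0},
        (1, 1): {1: 1.0},
    },
    gamma=0.9,
)
```

Because `transition` is indexed only by the current (s, a) and not by the history, the memoryless property is built into the representation.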
definitions: value function and Q-value function
- value function V(s): how good is a state?
- Q-value function Q(s, a): how good is a state-action pair?
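In the standard notation (a policy π, rewards r_t, discount factor γ; these are the usual textbook definitions rather than something these notes spell out), both functions are expected cumulative discounted rewards:

```latex
V^{\pi}(s) = \mathbb{E}\left[\sum_{t \ge 0} \gamma^{t} r_t \,\middle|\, s_0 = s,\ \pi\right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{t \ge 0} \gamma^{t} r_t \,\middle|\, s_0 = s,\ a_0 = a,\ \pi\right]
```

V conditions only on the starting state; Q additionally fixes the first action.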
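Bellman equation: if every choice along the way is the maximizing one (act optimally from the next state onward, and pick the best action now), then the overall result is also optimal.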
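For reference, the Bellman equation in its usual form for the optimal Q-value function Q* (r the immediate reward, s' the next state drawn from P, γ the discount factor):

```latex
Q^{*}(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]
```

The best value of taking a in s is the immediate reward plus the discounted best value achievable from the next state.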
The optimal policy is one that takes the optimal action at every step, i.e. in each state it picks the action with the highest optimal Q-value.
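One standard way to solve the Bellman equation on a small, fully known MDP is Q-value iteration, followed by reading off the greedy policy argmax_a Q*(s, a). This technique is added here for illustration and is not named in the notes; `corridor` is a hypothetical deterministic MDP with an absorbing goal state.

```python
from typing import Dict, Tuple

QTable = Dict[Tuple[int, int], float]


def corridor(n: int = 4):
    """Hypothetical MDP: states 0..n-1, goal n-1 is absorbing.
    Actions: 0 = left, 1 = right. Reward 1 for the step that enters the goal."""
    states, actions, goal = list(range(n)), [0, 1], n - 1
    reward, transition = {}, {}
    for s in states:
        for a in actions:
            nxt = goal if s == goal else min(max(s + (1 if a == 1 else -1), 0), goal)
            transition[(s, a)] = {nxt: 1.0}
            reward[(s, a)] = 1.0 if (s != goal and nxt == goal) else 0.0
    return states, actions, reward, transition


def q_value_iteration(states, actions, reward, transition, gamma, n_iters=100) -> QTable:
    """Repeat the Bellman optimality update for every (s, a):
    Q(s, a) <- R(s, a) + gamma * sum_{s'} P(s'|s, a) * max_{a'} Q(s', a')."""
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(n_iters):
        new_q = {}
        for s in states:
            for a in actions:
                future = sum(
                    p * max(q[(s2, a2)] for a2 in actions)
                    for s2, p in transition[(s, a)].items()
                )
                new_q[(s, a)] = reward[(s, a)] + gamma * future
        q = new_q
    return q


def greedy_policy(q: QTable, states, actions):
    """pi*(s) = argmax_a Q*(s, a); ties go to the first action listed."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


states, actions, reward, transition = corridor()
q = q_value_iteration(states, actions, reward, transition, gamma=0.9)
pi = greedy_policy(q, states, actions)
print(pi)  # {0: 1, 1: 1, 2: 1, 3: 0}  (state 3 is the goal, where both actions tie)
```

Each iteration propagates the goal reward one step further back, so the greedy action becomes "move right" in every non-goal state.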
The function Q can be very complex (and the state space too large to store as a table), so we approximate it with a neural network, the Q-network.
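A minimal sketch of how a parameterized Q-function is trained with the Q-learning target. To stay dependency-free it swaps in a *linear* approximator Q(s, a; θ) = θ_a · φ(s) where a real Q-network would use a neural network; the feature vectors φ(s) and the class `LinearQ` are hypothetical.

```python
from typing import List


class LinearQ:
    """Q(s, a; theta) = theta[a] . phi(s). A linear stand-in for the Q-network (assumption)."""

    def __init__(self, n_features: int, n_actions: int):
        self.theta = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, phi: List[float], a: int) -> float:
        return sum(w * x for w, x in zip(self.theta[a], phi))

    def max_q(self, phi: List[float]) -> float:
        return max(self.q(phi, a) for a in range(len(self.theta)))


def td_update(model, phi, a, r, phi_next, done, gamma=0.9, lr=0.1):
    """One gradient step on (y - Q(s, a; theta))^2 with target
    y = r + gamma * max_a' Q(s', a'; theta), treated as a constant (semi-gradient).
    The factor 2 from the squared error is folded into lr."""
    y = r if done else r + gamma * model.max_q(phi_next)
    error = y - model.q(phi, a)
    # d/d theta[a] of 0.5 * error^2 is -error * phi, so step along +error * phi
    for i, x in enumerate(phi):
        model.theta[a][i] += lr * error * x
    return error ** 2
```

Only the weights of the action actually taken are updated, since Q(s, a; θ) for other actions does not depend on θ_a.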
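Training the Q-network: experience replay
Store the transitions the agent experiences in a set (the replay memory), then sample a random batch from it and use that batch as the training set.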
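A minimal replay memory in Python, assuming each transition is stored as (state, action, reward, next_state, done); the field names and the class `ReplayBuffer` are illustrative, not taken from the notes.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-capacity replay memory; once full, the oldest transitions are dropped first."""

    def __init__(self, capacity: int):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        """Store one transition as the agent plays."""
        self.memory.append(Transition(*args))

    def sample(self, batch_size: int):
        """Uniform random minibatch (without replacement), which breaks up the
        correlation between consecutive steps."""
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```

`deque(maxlen=...)` discards the oldest entry automatically, so the memory always holds the most recent `capacity` transitions.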
Papers on Q-learning
-
Spiking NN (spiking neural networks)
Minority Report