Goal
Control a quadrotor with a neural network trained using reinforcement learning.
The policy network is a function that maps a state directly to rotor thrusts.
Related Work
Guided Policy Search with an MPC Controller
This work uses a policy that maps the raw sensor data to the rotor velocities.
Contribution
Propose a deterministic on-policy method that uses zero-bias, zero-variance samples.
Use a small number of high-quality samples, so the burden on the neural network is small.
Network Structure
input: {orientation (rotation matrix), position, angular velocity, linear velocity} --> 18-dimensional state vector (9 + 3 + 3 + 3)
output: 4-dimensional action vector
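The input/output shapes above can be illustrated with a minimal sketch. It only shows how a 3x3 rotation matrix and three 3-vectors flatten into an 18-element state, and how a small feed-forward network turns that state into a 4-element action. The helper names (`make_state`, `init_layer`, `dense`, `policy`), the hidden widths, the tanh activation, and the random initialization are all assumptions made for illustration; these notes do not specify them, and no training is performed here.

```python
import math
import random

STATE_DIM = 18   # 9 (rotation matrix) + 3 (position) + 3 (angular vel.) + 3 (linear vel.)
ACTION_DIM = 4   # one output per rotor


def make_state(rotation, position, angular_velocity, linear_velocity):
    """Flatten a 3x3 rotation matrix and three 3-vectors into one 18-element list."""
    flat_rotation = [float(x) for row in rotation for x in row]
    state = (flat_rotation
             + [float(x) for x in position]
             + [float(x) for x in angular_velocity]
             + [float(x) for x in linear_velocity])
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(state)}")
    return state


def init_layer(rng, n_in, n_out, scale=0.1):
    """Hypothetical initializer: small Gaussian weights, zero biases."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Affine map: one output per weight row."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def policy(layers, state):
    """Deterministic policy: tanh hidden layers, linear output layer."""
    h = state
    for layer in layers[:-1]:
        h = [math.tanh(z) for z in dense(layer, h)]
    return dense(layers[-1], h)


rng = random.Random(0)
sizes = [STATE_DIM, 64, 64, ACTION_DIM]  # hidden widths are an assumption
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Example state: level attitude (identity rotation), 1 m above the origin, at rest.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = make_state(identity, [0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
action = policy(layers, state)
print(len(state), len(action))
```

Using the full rotation matrix rather than Euler angles or a quaternion is what makes the state 18-dimensional instead of 12 or 13.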
Exploration Strategy
TRPO