Better imitation learning with self-correcting policies by negative sampling.
Pre-training using a generative model of pre-recorded trajectories and bias correction.
Beginning a new series highlighting a few interesting RL papers from arXiv each week. This week: simple curriculum learning, learning to interact with humans, and warm-starting RL with propositional logic.