Keeping to the Narrow Path
Better imitation learning with self-correcting policies via negative sampling.
Way Off-Policy Batch DRL
Pre-training using a generative model of pre-recorded trajectories and bias correction.
A New Series: arXiv Sampler
Beginning a new series highlighting a few interesting RL papers on the arXiv each week. This week: Simple curriculum learning, learning to interact with humans, and warm starting RL with propositional logic.