Why DRL may not be superhuman on Atari after all, and how to avoid similar mistakes in the future.
Traffic signal control: comparing supervised learning, random search, and deep reinforcement learning.
Hierarchical RL for concurrent discovery of compound and composable policies.
Efficient exploration with self-imitation learning.
Better imitation learning with self-correcting policies by negative sampling.
Way Off-Policy Batch DRL using a generative model of pre-recorded trajectories, and bias correction.
Beginning a new series highlighting a few interesting RL papers on the arXiv each week. This week: simple curriculum learning, learning to interact with humans, and warm-starting RL with propositional logic.