Why DRL may not be superhuman on Atari after all, and how to avoid similar mistakes in the future.
An introduction and statement of purpose for a series on the basics of deep reinforcement learning.
Inspecting the gradients of entropy-augmented policy updates to show their equivalence.
Comparing supervised learning, random search, and deep reinforcement learning for traffic signal control.
Hierarchical RL for concurrent discovery of compound and composable policies.
Efficient exploration with self-imitation learning.
Extending DQN to produce estimates of return distributions, and an exploration of why this helps learning.
Better imitation learning with self-correcting policies by negative sampling.
CNNs trained in "the usual way" tend to learn something different from what you might expect: they learn to recognize textures (local structure) rather than shapes (global structure).
Way Off-Policy Batch DRL using a generative model of pre-recorded trajectories, and bias correction.