Cox's theorem is arguably the strongest argument for the use of standard probability theory. Here we examine its axioms to establish a firm foundation for interpreting probability theory as the unique extension of true-false logic to degrees of belief.
An unfocused sweep of eight abstracts from a very busy week in AI research: emergent tool use, why hierarchical learning can work so well, brain-inspired hardware for artificial neural networks, pretraining and transfer learning for RL, chromatic network compression, semi-supervised reward shaping, WGAN model imitation for model-based RL, and navigation in turbulent flows!
Accumulating evidence about peers to discriminate potential threats.
Learning more like a human, and more like a scientist, by actively seeking useful auxiliary questions during learning.
Long-term learning of multiple tasks without forgetting old skills, using a new technique called Pseudo-Rehearsal.
Improving safety and control by preventing all manner of reward tampering by the agent itself.
Why DRL may not be superhuman on Atari after all, and how to avoid similar mistakes in the future.
An introduction and statement of purpose for a series on the basics of deep reinforcement learning.
Inspecting the gradients of entropy-augmented policy updates to show their equivalence.
Comparing supervised learning, random search, and deep reinforcement learning on traffic signal control.