
Demis Hassabis

Chronological feed of everything captured from Demis Hassabis.

Machine Learning's High-Impact Applications for Mitigating and Adapting to Climate Change

Machine learning experts can address climate change by applying ML to reduce greenhouse gas emissions and to help society adapt to unavoidable change. Key application areas include smart grids and disaster management, where ML can fill critical gaps, though meaningful impact requires interdisciplinary collaboration. The paper outlines concrete research questions and business opportunities and urges the ML community to prioritize these efforts.

Population-Based RL Achieves Human-Level Play in Multiplayer Quake III Capture the Flag

A population of independent RL agents, trained concurrently across thousands of parallel matches in randomized Quake III Arena Capture the Flag environments, attains human-level performance using only pixel and score inputs. The approach employs a two-tier optimization with self-learned internal rewards supplementing sparse win signals, paired with a temporally hierarchical action representation for multi-timescale reasoning. Agents exhibit human-like behaviors including navigation, following, and defending via encoded high-level game knowledge, outperforming strong humans and prior bots in tournament evaluations.
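The exploit-and-explore loop that evolves each agent's internal-reward weights can be illustrated with a generic population-based training step. This is a minimal sketch, not the paper's implementation: `pbt_step`, the `frac` and `noise` parameters, and the multiplicative perturbation rule are all illustrative choices.

```python
import numpy as np

def pbt_step(fitness, hyper, rng, frac=0.25, noise=0.1):
    """One exploit-and-explore step of population-based training.

    The bottom fraction of agents (by fitness, e.g. tournament win rate)
    copy the hyperparameters of agents sampled from the top fraction --
    here standing in for learned internal-reward weights -- and then
    perturb them to keep exploring the hyperparameter space.
    """
    n = len(fitness)
    order = np.argsort(fitness)              # ascending fitness
    k = max(1, int(frac * n))
    bottom, top = order[:k], order[-k:]
    hyper = hyper.copy()
    for b in bottom:
        parent = rng.choice(top)             # exploit: inherit from a strong agent
        perturb = 1.0 + noise * rng.standard_normal(hyper.shape[1])
        hyper[b] = hyper[parent] * perturb   # explore: jitter the copy
    return hyper
```

In the paper this outer loop sits above inner-loop RL training, so each agent's internal rewards are themselves optimized for tournament performance.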

MERLIN: Predictive Memory Enables RL Agents to Conquer Severe Partial Observability

Standard RL algorithms with deep networks fail on simple tasks under partial observability, even with extensive memory, because they store irrelevant information in suboptimal formats. The MERLIN architecture integrates memory formation guided by predictive modeling, allowing a single agent to maintain long-duration memories and solve partially observable tasks in 3D VR environments. This unifies RL with inference to tackle canonical psychology and neurobiology benchmarks without assumptions on input dimensionality or episode length.

Memory-Based Adaptation Enables Fast, Stable Neural Network Updates Without Catastrophic Forgetting

Memory-based Parameter Adaptation (MbPA) stores training examples in an episodic memory and performs context-based lookups to locally adapt the network's weights at evaluation time, enabling much higher effective learning rates than standard gradient updates. This approach accelerates adaptation to distribution shifts, avoids performance degradation on prior data, and mitigates issues like catastrophic forgetting, imbalanced labels, and slow evaluation-time learning. Demonstrated on large-scale image classification and language modeling, it supports fast, stable knowledge acquisition.
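The lookup-then-adapt idea can be sketched for a linear softmax classifier. This is a simplified illustration, not the paper's architecture: `MbPAClassifier`, the inverse-distance weighting, and the fixed step counts are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class MbPAClassifier:
    """Sketch of memory-based adaptation: retrieve neighbours from an
    episodic memory, then take a few weighted gradient steps on a *copy*
    of the weights before predicting, leaving the base weights untouched."""
    def __init__(self, dim, n_classes, lr=0.5, k=3):
        self.W = np.zeros((n_classes, dim))
        self.keys, self.labels = [], []          # episodic memory
        self.lr, self.k = lr, k

    def remember(self, x, y):
        self.keys.append(x)
        self.labels.append(y)

    def predict(self, x, steps=5):
        K = np.stack(self.keys)
        d = np.linalg.norm(K - x, axis=1)
        idx = np.argsort(d)[: self.k]            # k nearest stored examples
        w = 1.0 / (d[idx] + 1e-3)                # closer memories weigh more
        w = w / w.sum()
        W = self.W.copy()                        # temporary, local adaptation
        for _ in range(steps):
            for wi, i in zip(w, idx):
                p = softmax(W @ K[i])
                p[self.labels[i]] -= 1.0         # cross-entropy gradient
                W -= self.lr * wi * np.outer(p, K[i])
        return int(np.argmax(W @ x))
```

Because the adaptation is discarded after each prediction, old knowledge in the base weights is never overwritten, which is the mechanism behind the forgetting resistance described above.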

State-Space Generative Models Accelerate Model-Based RL with Pixel-Level Atari Dynamics

State-space generative models learn compact representations from raw pixels to predict action sequence outcomes in Atari games, drastically cutting computational costs versus standard models. These models maintain high accuracy on Arcade Learning Environment dynamics. In RL, agents querying these models for planning outperform model-free baselines on Ms. Pac-Man.

Psychlab Enables Psychological Testing of RL Agents, Revealing UNREAL's Size Bias and Foveal Fix

Psychlab integrates classical psychology experiments into DeepMind Lab for testing both human and RL agents via a flexible API, with implementations for visual search, change detection, motion discrimination, and object tracking. Analysis of the UNREAL agent shows it learns faster for larger target stimuli than smaller ones. Adding a foveal vision model corrects this bias and boosts UNREAL's performance on Psychlab and standard DMLab tasks.

AlphaZero Masters Chess and Shogi Tabula Rasa in 24 Hours via Generalized Self-Play RL

AlphaZero generalizes AlphaGo Zero's tabula rasa reinforcement learning to a single algorithm achieving superhuman performance in chess, shogi, and Go. Starting from random play with only game rules, it reaches superhuman levels within 24 hours and defeats world-champion programs in each domain. This eliminates reliance on human expertise, search heuristics, or handcrafted evaluations traditional in chess AI.
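At the heart of the self-play search is the PUCT rule for choosing which move to explore next inside the tree. A minimal sketch of that selection step, with `puct_select` and its `stats` layout as illustrative assumptions:

```python
import math

def puct_select(stats, c_puct=1.5):
    """Pick the action maximizing Q(s,a) + c * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).

    `stats` maps each action to (prior P from the policy network,
    visit count N, total simulation value W). Unvisited moves with a
    high prior get a large exploration bonus; heavily visited moves
    are judged mostly by their mean value Q = W / N.
    """
    total_n = sum(n for _, n, _ in stats.values())
    best, best_score = None, -float("inf")
    for action, (p, n, w) in stats.items():
        q = w / n if n > 0 else 0.0
        u = c_puct * p * math.sqrt(total_n) / (1 + n)
        if q + u > best_score:
            best, best_score = action, q + u
    return best
```

Repeating this selection down the tree, then backing up the network's value estimate, is what lets the same algorithm play chess, shogi, and Go without game-specific heuristics.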

Parallel WaveNet Distills Sequential High-Fidelity Speech Synthesis into 20x Real-Time Feed-Forward Generation

WaveNet achieves state-of-the-art realistic speech synthesis across languages but is limited by sequential autoregressive generation, unsuitable for real-time parallel computing. Probability Density Distillation trains a parallel feed-forward network from a pretrained WaveNet, preserving audio quality. The distilled model generates speech over 20x faster than real-time and powers Google Assistant's English and Japanese voices.
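The distillation objective for a single output distribution can be written as the KL divergence from student to teacher. The sketch below computes it for discrete distributions; the function name and the clipping constant are assumptions for the example, and the real system works with continuous mixture outputs.

```python
import numpy as np

def distillation_loss(student_probs, teacher_probs, eps=1e-12):
    """Sketch of a probability-density-distillation objective:
    KL(student || teacher) = H(student, teacher) - H(student).

    Minimising it pulls the parallel student toward the autoregressive
    teacher's distribution while the entropy term discourages the
    student from collapsing onto a single mode.
    """
    s = np.clip(student_probs, eps, 1.0)
    t = np.clip(teacher_probs, eps, 1.0)
    cross_entropy = -np.sum(s * np.log(t))
    entropy = -np.sum(s * np.log(s))
    return cross_entropy - entropy
```

Crucially, this loss needs only the teacher's density evaluated at student samples, so the slow sequential WaveNet is used solely at training time and never at synthesis time.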

Imagination-Augmented Agents Boost Data Efficiency in Deep RL via Flexible Model Integration

Imagination-Augmented Agents (I2As) fuse model-free and model-based deep RL by feeding learned environment model predictions as context into policy networks, enabling arbitrary implicit planning without prescribed model usage. This contrasts with traditional model-based methods that rigidly dictate model-to-policy translation. I2As demonstrate superior data efficiency, performance, and robustness to model errors over baselines.

SCAN Enables Unsupervised Discovery and Symbolic Recombination of Hierarchical Visual Concepts

SCAN learns compositional, hierarchical visual concepts through fast symbol association grounded in unsupervised disentangled primitives, requiring minimal symbol-image pairings without assumptions on symbol form. It supports bi-directional multimodal inference, generating diverse images from symbols and symbols from images. Learned logical operations allow traversal, manipulation, and recombination of concepts to produce novel visuals beyond training distributions.

NoisyNets Enable Superior Exploration in Deep RL via Learned Parametric Noise

NoisyNet introduces parametric noise to deep RL agent weights, inducing stochastic policies that enhance exploration without relying on traditional heuristics. Noise parameters are optimized via gradient descent alongside network weights, maintaining low computational overhead. It outperforms entropy rewards in A3C and ε-greedy in DQN/dueling architectures across Atari games, often achieving superhuman performance from subhuman baselines.
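The core component is a linear layer whose weights are perturbed by learned, factorised Gaussian noise. A minimal numpy sketch, assuming the factorised-noise variant; the class name, initialisation bounds, and `sigma0` default are illustrative (and here the sigma parameters are fixed rather than trained):

```python
import numpy as np

class NoisyLinear:
    """Sketch of a NoisyNet-style linear layer with factorised Gaussian noise.

    y = (mu_w + sigma_w * eps_w) @ x + (mu_b + sigma_b * eps_b),
    where eps is resampled on every forward pass and mu/sigma are the
    learnable parameters (trained by ordinary gradient descent in the paper).
    """
    def __init__(self, in_dim, out_dim, sigma0=0.5, rng=None):
        self.rng = rng or np.random.default_rng(0)
        bound = 1.0 / np.sqrt(in_dim)
        self.mu_w = self.rng.uniform(-bound, bound, (out_dim, in_dim))
        self.mu_b = self.rng.uniform(-bound, bound, out_dim)
        self.sigma_w = np.full((out_dim, in_dim), sigma0 * bound)
        self.sigma_b = np.full(out_dim, sigma0 * bound)
        self.in_dim, self.out_dim = in_dim, out_dim

    @staticmethod
    def _f(x):
        return np.sign(x) * np.sqrt(np.abs(x))   # noise-shaping function

    def __call__(self, x):
        # Factorised noise: one vector per input unit, one per output unit,
        # combined by an outer product so only in+out samples are drawn.
        eps_in = self._f(self.rng.standard_normal(self.in_dim))
        eps_out = self._f(self.rng.standard_normal(self.out_dim))
        w = self.mu_w + self.sigma_w * np.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return w @ x + b
```

Because the same input yields different outputs on successive passes, the induced policy is stochastic, and the agent can learn to anneal the sigma values per-weight instead of following a hand-tuned ε schedule.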

Agent Masters Grounded Language via RL in Simulated 3D World, Generalizing to Novel Instructions

An AI agent learns to interpret natural language instructions in a simulated 3D environment using reinforcement learning combined with unsupervised learning, starting from minimal priors. It grounds linguistic symbols to emergent perceptual representations and action sequences, enabling execution of unseen instructions in novel situations. Learning efficiency for new words accelerates with growing semantic knowledge, demonstrating semantic bootstrapping.

Neural Episodic Control Accelerates Deep RL Learning via Semi-Tabular Value Function

Neural Episodic Control (NEC) introduces a deep RL agent that rapidly assimilates new experiences using a semi-tabular value function representation: a buffer of slowly changing state embeddings paired with rapidly updated value estimates. This design enables sample efficiency far superior to standard deep RL methods, which require orders of magnitude more data than humans. Empirical results demonstrate NEC learns significantly faster than state-of-the-art general-purpose deep RL agents across diverse environments.
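The buffer described above can be sketched as a per-action dictionary of (key, Q) pairs with kernel-weighted reads and fast tabular writes. The class name, the squared-distance kernel, and the `alpha`/`delta` defaults are assumptions for this example; the paper additionally makes the lookup differentiable so gradients flow into the embedding network.

```python
import numpy as np

class DND:
    """Sketch of NEC's dictionary for one action: reads return a
    kernel-weighted average over the k nearest stored keys; writes on an
    exact key hit move the stored Q toward the new return at a fast rate."""
    def __init__(self, k=5, alpha=0.5, delta=1e-3):
        self.keys, self.values = [], []
        self.k, self.alpha, self.delta = k, alpha, delta

    def read(self, key):
        K = np.stack(self.keys)
        d = np.sum((K - key) ** 2, axis=1)
        idx = np.argsort(d)[: self.k]
        w = 1.0 / (d[idx] + self.delta)          # inverse-distance kernel
        w = w / w.sum()
        return float(w @ np.asarray(self.values)[idx])

    def write(self, key, q):
        for i, k_i in enumerate(self.keys):
            if np.allclose(k_i, key):
                # Fast, tabular-style update on an exact hit.
                self.values[i] += self.alpha * (q - self.values[i])
                return
        self.keys.append(key.copy())
        self.values.append(float(q))
```

The split of timescales is the point: embeddings (the keys) drift slowly with the network, while the attached value estimates can jump after a single surprising return.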

DeepMind Lab: 3D Game Platform for AI Agent Research in Complex Environments

DeepMind Lab is a first-person 3D game platform built for advancing general AI and ML through agent training in large, partially observed, visually diverse worlds. It features a simple, flexible API that supports rapid iteration on creative task designs and novel AI architectures. Powered by a high-performance game engine, it is optimized for research community use.

Elastic Weight Consolidation Prevents Catastrophic Forgetting in Sequential Task Learning

Neural networks suffer from catastrophic forgetting when learning sequential tasks, overwriting prior knowledge. The proposed method overcomes this by selectively slowing learning rates on weights critical to old tasks using elastic weight consolidation. It demonstrates scalability on permuted MNIST classification sequences and effectiveness on sequential Atari 2600 games, maintaining long-term expertise.

Hippocampus-Inspired Episodic Control Outperforms Deep RL in Sample Efficiency and Reward

A model-free episodic control algorithm mimicking hippocampal episodic memory rapidly exploits highly rewarding experiences after a single encounter, in contrast with deep RL's need for millions of interactions to reach human-level performance. It solves difficult sequential decision tasks faster than state-of-the-art deep RL methods. On challenging domains, it achieves both accelerated strategy acquisition and superior total rewards.
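The one-shot flavour of episodic control can be sketched as a table that records, for each state-action pair, the best discounted return ever observed from it. The class name and the greedy tie-breaking are assumptions for this example; the paper additionally uses nearest-neighbour lookup for unseen states.

```python
class EpisodicControl:
    """Sketch of model-free episodic control: remember the highest return
    ever obtained from each (state, action), so a successful strategy can
    be re-enacted immediately after a single experience."""
    def __init__(self, actions, gamma=0.99):
        self.q = {}                      # (state, action) -> best observed return
        self.actions, self.gamma = actions, gamma

    def act(self, state):
        # Greedy over stored returns; unseen pairs default to 0.
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, episode):
        # episode: list of (state, action, reward); accumulate returns backward.
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + self.gamma * g
            key = (state, action)
            self.q[key] = max(self.q.get(key, float("-inf")), g)
```

The `max` in `update` is the non-parametric trick: a single lucky trajectory immediately sets the table's value, where a gradient-based critic would need many repetitions to move its estimate.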

Hubel-Wiesel Modules Model Perception-Memory Interface with View-Invariance and Episodic Recall

This work proposes an algorithmic framework that models the perception-memory interface using approximate Hubel-Wiesel modules to build hierarchical, invariant object representations. It accounts for both view-dependence and view-invariance along the visual cortical hierarchy (V1 to IT) and for episodic interference effects in the medial temporal lobe. A novel interpretation of the Hubel-Wiesel conjecture on complex receptive fields unifies the fast learning of the hippocampus with the slow learning of neocortex in a single scheme.