absorb.md — A knowledge graph of what AI thinkers are actually saying

paper / demishassabis / Dec 5

AlphaZero Masters Chess and Shogi Tabula Rasa in 24 Hours via Generalized Self-Play RL

AlphaZero generalizes AlphaGo Zero's tabula rasa reinforcement learning into a single algorithm that achieves superhuman performance in chess, shogi, and Go. Starting from random play and given only the game rules, it reaches a superhuman level within 24 hours and defeats a world-champion program in each game. This removes the reliance on human expertise, search heuristics, and handcrafted evaluation functions that traditional chess engines depend on.

alphazero reinforcement-learning self-play chess shogi go tabula-rasa

“AlphaZero achieves superhuman performance in chess, shogi, and Go starting from random play with no domain knowledge except game rules”

paper / demishassabis / Nov 28

Parallel WaveNet Distills Sequential High-Fidelity Speech Synthesis into 20x Real-Time Feed-Forward Generation

WaveNet achieves state-of-the-art realistic speech synthesis across many languages, but it generates audio sequentially, one sample at a time, which maps poorly onto parallel hardware and makes real-time deployment difficult. Probability Density Distillation trains a parallel feed-forward network from a pretrained WaveNet while preserving audio quality. The distilled model generates speech more than 20x faster than real time and powers Google Assistant's English and Japanese voices.

speech-synthesis wavenet parallel-generation probability-density-distillation machine-learning deepmind

“WaveNet is the state-of-the-art in realistic speech synthesis, rated more natural than previous systems for many languages.”
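
The distillation objective described in this entry can be illustrated on a toy discrete case. This is a minimal sketch, assuming the loss is the KL divergence from the student to the teacher, written as cross-entropy minus the student's entropy. The three-symbol distributions and all function names are made up for illustration; the real model works on distributions over audio samples, not a lookup table.

```python
import math

def entropy(p):
    """Shannon entropy H(p) of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) between two discrete distributions, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student, teacher):
    """KL(student || teacher) = H(student, teacher) - H(student)."""
    return cross_entropy(student, teacher) - entropy(student)

# Hypothetical next-sample distributions over three symbols.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]

loss = distillation_loss(student, teacher)     # positive while the two differ
matched = distillation_loss(teacher, teacher)  # zero once the student matches
print(loss, matched)
```

The entropy term rewards the student for staying spread out, while the cross-entropy term pulls it toward samples the teacher considers likely.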

paper / demishassabis / Jul 19

Imagination-Augmented Agents Boost Data Efficiency in Deep RL via Flexible Model Integration

Imagination-Augmented Agents (I2As) combine model-free and model-based deep RL by feeding predictions from a learned environment model into the policy network as additional context. The agent learns to interpret these predictions and can build implicit plans in arbitrary ways, whereas most model-based methods prescribe exactly how a model is used to derive a policy. I2As show better data efficiency, performance, and robustness to model errors than several baselines.

deep-reinforcement-learning imagination-augmented-agents model-based-rl reinforcement-learning arxiv-paper demis-hassabis

“I2As combine model-free and model-based reinforcement learning in a novel architecture.”

paper / demishassabis / Jul 11

SCAN Enables Unsupervised Discovery and Symbolic Recombination of Hierarchical Visual Concepts

SCAN learns compositional, hierarchical visual concepts through fast symbol association grounded in unsupervised disentangled primitives. It needs only a few symbol-image pairings and makes no assumptions about the form of the symbols. It supports bi-directional multimodal inference, generating diverse images from symbols and symbols from images. Learned logical operations allow concepts to be traversed, manipulated, and recombined to produce novel visual concepts beyond the training distribution.

hierarchical-concepts symbol-concept-association unsupervised-learning visual-abstractions multimodal-inference concept-recombination disentangled-primitives

“SCAN discovers visual primitives in an unsupervised manner”

paper / demishassabis / Jun 30

NoisyNets Enable Superior Exploration in Deep RL via Learned Parametric Noise

NoisyNet adds parametric noise to the weights of a deep RL agent, and the resulting stochastic policy aids exploration without relying on conventional heuristics. The noise parameters are learned by gradient descent together with the remaining network weights, which keeps the computational overhead low. Replacing entropy reward in A3C and ε-greedy in DQN and dueling agents with NoisyNet yields higher scores across Atari games, in some cases moving the agent from sub-human to super-human performance.

noisy-nets reinforcement-learning exploration-strategies deep-rl atari-games a3c dqn

“NoisyNet adds parametric noise to agent weights, with noise parameters learned by gradient descent”
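
The noisy layer described in this entry can be sketched as a plain linear layer whose weights and biases are perturbed on every forward pass. This is a minimal sketch, assuming independent Gaussian noise per parameter. The function name, the example values, and the fixed `mu`/`sigma` parameters are illustrative assumptions; in a real agent both would be trained by gradient descent.

```python
import random

def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """Compute y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b).

    A fresh standard-normal eps is drawn for every weight and bias on each
    call, so repeated calls with the same input give different outputs.
    """
    out = []
    for row_mu, row_sigma, b_mu, b_sigma in zip(mu_w, sigma_w, mu_b, sigma_b):
        total = b_mu + b_sigma * rng.gauss(0.0, 1.0)
        for xi, m, s in zip(x, row_mu, row_sigma):
            total += (m + s * rng.gauss(0.0, 1.0)) * xi
        out.append(total)
    return out

# Hypothetical 2-input, 2-output layer.
x = [1.0, 2.0]
mu_w = [[0.5, -0.5], [1.0, 0.0]]
mu_b = [0.0, 0.1]
sigma_w = [[0.5, 0.5], [0.5, 0.5]]
sigma_b = [0.5, 0.5]

rng = random.Random(0)
y_noisy_a = noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng)
y_noisy_b = noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng)
print(y_noisy_a, y_noisy_b)  # two different outputs for the same input
```

Setting every `sigma` entry to zero recovers an ordinary deterministic linear layer, so the amount of exploration is controlled by the noise scale parameters.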

paper / demishassabis / Jun 20

Agent Masters Grounded Language via RL in Simulated 3D World, Generalizing to Novel Instructions

An AI agent learns to interpret natural language instructions in a simulated 3D environment using reinforcement learning combined with unsupervised learning, starting from minimal priors. It grounds linguistic symbols to emergent perceptual representations and action sequences, enabling execution of unseen instructions in novel situations. Learning efficiency for new words accelerates with growing semantic knowledge, demonstrating semantic bootstrapping.

grounded-language 3d-simulation reinforcement-learning language-understanding embodied-ai semantic-generalization

“Agent learns to execute written instructions successfully in simulated 3D world via rewards”

paper / demishassabis / Mar 6

Neural Episodic Control Accelerates Deep RL via Semi-Tabular Value Function

Neural Episodic Control (NEC) is a deep RL agent that rapidly assimilates new experiences using a semi-tabular representation of the value function: a buffer of slowly changing state embeddings paired with rapidly updated value estimates. The design targets the sample inefficiency of standard deep RL methods, which require orders of magnitude more data than humans. Empirically, NEC learns significantly faster than state-of-the-art general-purpose deep RL agents across a wide range of environments.

neural-episodic-control deep-reinforcement-learning value-function semi-tabular-representation sample-efficiency machine-learning arxiv-paper

“Standard deep RL methods require orders of magnitude more data than humans to achieve reasonable performance.”
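
The semi-tabular value function in this entry can be pictured as a memory of (embedding, value) pairs that is read with a kernel-weighted average over the nearest stored embeddings. This is a minimal sketch, assuming an inverse-distance kernel of the form 1 / (squared distance + delta). The function name, the choice of `p` neighbours, and the stored keys and values are illustrative assumptions, not the paper's settings.

```python
def memory_lookup(query, keys, values, p=2, delta=1e-3):
    """Estimate a value as a kernel-weighted mean over the p nearest keys.

    keys   : stored state embeddings (slowly changing in the real agent)
    values : value estimates paired with those keys (rapidly updated)
    """
    # Squared Euclidean distance from the query to every stored key.
    nearest = sorted(
        (sum((q - k) ** 2 for q, k in zip(query, key)), value)
        for key, value in zip(keys, values)
    )[:p]
    # Inverse-distance kernel; delta avoids division by zero on exact hits.
    kernels = [1.0 / (dist + delta) for dist, _ in nearest]
    total = sum(kernels)
    return sum(k / total * value for k, (_, value) in zip(kernels, nearest))

# Hypothetical memory with three stored embeddings.
keys = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
values = [1.0, 3.0, 100.0]

print(memory_lookup([0.0, 0.0], keys, values))  # close to 1.0 (near-exact hit)
print(memory_lookup([0.5, 0.0], keys, values))  # midway between 1.0 and 3.0
```

Because a new experience only needs to be written into the memory, its value is available on the next lookup without waiting for many gradient steps.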

paper / demishassabis / Dec 12

DeepMind Lab: 3D Game Platform for AI Agent Research in Complex Environments

DeepMind Lab is a first-person 3D game platform built for advancing general AI and ML through agent training in large, partially observed, visually diverse worlds. It features a simple, flexible API that supports rapid iteration on creative task designs and novel AI architectures. Powered by a high-performance game engine, it is optimized for research community use.

deepmind-lab ai-research reinforcement-learning 3d-game-environment deepmind arxiv-paper ai-platform

“DeepMind Lab is designed for research and development of general artificial intelligence and machine learning systems”

paper / demishassabis / Dec 2

Elastic Weight Consolidation Prevents Catastrophic Forgetting in Sequential Task Learning

Neural networks suffer from catastrophic forgetting when trained on tasks sequentially: learning a new task overwrites what was learned before. Elastic weight consolidation (EWC) counters this by selectively slowing down learning on the weights that matter most for earlier tasks. The method scales to a sequence of permuted MNIST classification tasks and remains effective when learning several Atari 2600 games sequentially, retaining expertise on tasks not seen for a long time.

catastrophic-forgetting neural-networks continual-learning elastic-weight-consolidation machine-learning deep-learning reinforcement-learning

“Neural networks generally cannot learn tasks sequentially without catastrophic forgetting.”
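
The idea of slowing learning on important weights can be written as a quadratic penalty added to the new task's loss. This is a minimal sketch, assuming a per-weight importance score (a diagonal Fisher information estimate) and a strength parameter `lam`. The function names and the numbers are illustrative assumptions.

```python
def ewc_penalty(theta, theta_old, fisher, lam):
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, fisher)
    )

def ewc_loss(new_task_loss, theta, theta_old, fisher, lam):
    """Loss on the new task plus the penalty anchoring important weights."""
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)

# Hypothetical two-weight model: weight 0 mattered a lot for the old task,
# weight 1 barely mattered.
theta_old = [1.0, -2.0]
fisher = [10.0, 0.1]

move_important = ewc_penalty([1.5, -2.0], theta_old, fisher, lam=1.0)
move_unimportant = ewc_penalty([1.0, -1.5], theta_old, fisher, lam=1.0)
print(move_important, move_unimportant)  # the first is 100x larger
```

The same 0.5 shift costs far more on the important weight, so gradient descent on the combined loss prefers to adapt the weights the old task does not depend on.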

paper / demishassabis / Jun 14

Hippocampus-Inspired Episodic Control Outperforms Deep RL in Sample Efficiency and Reward

A model-free episodic control algorithm, modeled on hippocampal episodic memory, lets an agent quickly exploit highly rewarding nuances of an environment, whereas state-of-the-art deep RL needs many millions of interactions to reach human-level performance. It solves difficult sequential decision-making tasks faster than state-of-the-art deep RL methods. On challenging domains, it both learns a rewarding strategy sooner and collects higher overall reward.

deep-reinforcement-learning model-free-episodic-control hippocampus-episodic-memory sequential-decision-making arxiv-paper demis-hassabis

“State-of-the-art deep reinforcement learning algorithms require many millions of interactions to reach human-level performance”
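
The rapid exploitation described in this entry can be sketched as a table that remembers the best return ever obtained from each state-action pair. This is a minimal sketch, assuming exact state matching and a discount `gamma`. The function name, state labels, and rewards are illustrative assumptions, and it omits the nearest-neighbour estimate a full agent would need for states it has never seen.

```python
def update_from_episode(table, episode, gamma=1.0):
    """Write an episode's returns into the table, keeping the best seen.

    episode : list of (state, action, reward) tuples in time order
    table   : dict mapping (state, action) to the highest return observed
    """
    ret = 0.0
    # Walk backwards so each step's return includes all later rewards.
    for state, action, reward in reversed(episode):
        ret = reward + gamma * ret
        key = (state, action)
        table[key] = max(table.get(key, ret), ret)
    return table

table = {}
update_from_episode(table, [("s0", "left", 0.0), ("s1", "left", 1.0)])
update_from_episode(table, [("s0", "left", 0.0), ("s1", "right", 5.0)])
update_from_episode(table, [("s0", "left", 0.0), ("s1", "left", 0.5)])
print(table)
```

A single lucky episode is enough to raise a stored value, and later worse episodes never lower it, which is what lets the agent reuse a good strategy right after first finding it.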

paper / demishassabis / Dec 28

Hubel-Wiesel Modules Model Perception-Memory Interface with View-Invariance and Episodic Recall

The paper proposes an algorithmic framework that models the interface between perception and memory using approximate Hubel-Wiesel modules, which build hierarchical, invariant object representations. The framework accounts for view-dependence and view-invariance in the visual cortex (V1 to IT) and for episodic interference in the medial temporal lobe. It unifies two-speed learning, from neocortex to hippocampus, through a novel interpretation of the Hubel-Wiesel conjecture on complex receptive fields.

neural-computation hubel-wiesel receptive-fields view-invariance episodic-memory visual-cortex arxiv-paper

“Framework models interface between perception and memory at algorithmic level”
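
The invariance such a module provides can be illustrated with a toy version: "simple cells" match the input against every transformed copy of a stored template, and a "complex cell" pools over those responses. This is a minimal sketch, assuming cyclic shifts as the transformation and max as the pooling function. The function names, vectors, and templates are illustrative assumptions, not the paper's construction.

```python
def shifts(v):
    """All cyclic shifts of a list, standing in for transformed views."""
    return [v[i:] + v[:i] for i in range(len(v))]

def hw_module(x, template):
    """Pool (max) over dot products with every shifted copy of the template."""
    simple = [sum(a * b for a, b in zip(x, t)) for t in shifts(template)]
    return max(simple)

def signature(x, templates):
    """One pooled response per template; unchanged when x is shifted."""
    return [hw_module(x, t) for t in templates]

# Hypothetical input and two stored templates.
x = [1, 0, 2, 0, 3]
templates = [[1, 2, 0, 0, 0], [1, 0, 0, 1, 0]]

x_shifted = x[2:] + x[:2]  # the same pattern seen under a different shift
print(signature(x, templates), signature(x_shifted, templates))
```

Shifting the input only permutes the simple-cell responses, so any pooling over the full set of shifts returns the same signature for both views.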

Demis Hassabis