Chronological feed of everything captured from Demis Hassabis.
paper / demishassabis / Jun 10
Machine learning experts can address climate change by applying ML to reduce greenhouse gas emissions and enhance societal adaptation. Key areas include smart grids and disaster management, where ML fills critical gaps through interdisciplinary collaboration. The paper outlines research questions and business opportunities, urging the ML community to prioritize these efforts.
climate-change machine-learning arxiv-paper ai-applications sustainability demis-hassabis greenhouse-emissions
“Machine learning can be a powerful tool in reducing greenhouse gas emissions”
paper / demishassabis / Jul 3
A population of independent RL agents, trained concurrently across thousands of parallel matches in randomized Quake III Arena Capture the Flag environments, attains human-level performance using only pixel and score inputs. The approach employs a two-tier optimization with self-learned internal rewards supplementing sparse win signals, paired with a temporally hierarchical action representation for multi-timescale reasoning. Agents exhibit human-like behaviors including navigation, following, and defending via encoded high-level game knowledge, outperforming strong humans and prior bots in tournament evaluations.
reinforcement-learning multi-agent-rl deep-rl population-based-training multiplayer-games quake-iii-ctf
“RL agent achieves human-level performance in 3D multiplayer first-person game Quake III Arena Capture the Flag using only pixels and game points as input.”
paper / demishassabis / Mar 28
Standard RL algorithms with deep networks fail on simple tasks under partial observability, even with extensive memory, because they store irrelevant information in suboptimal formats. The MERLIN architecture integrates memory formation guided by predictive modeling, allowing a single agent to maintain long-duration memories and solve partially observable tasks in 3D VR environments. This unifies RL with inference to tackle canonical psychology and neurobiology benchmarks without assumptions on input dimensionality or episode length.
reinforcement-learning partial-observability predictive-memory deep-rl memory-augmented-agents goal-directed-agents merlin-model
“Contemporary RL algorithms struggle to solve simple tasks when information is concealed from the agent's sensors due to partial observability.”
paper / demishassabis / Feb 28
Memory-based Parameter Adaptation (MbPA) stores training examples in memory and performs context-based lookups to directly modify neural network weights, enabling much higher learning rates than standard gradient updates. This approach accelerates adaptation to distribution shifts, avoids performance degradation on prior data, and mitigates issues like catastrophic forgetting, imbalanced labels, and slow evaluation-time learning. Demonstrated on large-scale image classification and language modeling, it supports fast, stable knowledge acquisition.
memory-based-adaptation parameter-adaptation catastrophic-forgetting neural-networks deep-learning machine-learning arxiv-paper
“MbPA uses context-based lookup from stored examples to directly modify neural network weights”
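A minimal numpy sketch of the retrieve-then-adapt idea behind Memory-based Parameter Adaptation, reduced here to a linear regressor rather than the paper's deep networks; the function name `mbpa_adapt` and all hyperparameter values are illustrative, not from the paper:

```python
import numpy as np

def mbpa_adapt(w, memory_x, memory_y, query, k=5, lr=0.5, steps=20):
    """Sketch of memory-based parameter adaptation for a linear regressor.

    Retrieve the k training examples nearest the query, then take a few
    fast gradient steps on just those examples to locally tune a copy of
    the weights before predicting. The global weights stay untouched, so
    knowledge about the rest of the data is not overwritten.
    """
    d = np.sum((memory_x - query) ** 2, axis=1)   # context-based lookup
    idx = np.argsort(d)[:k]
    xs, ys = memory_x[idx], memory_y[idx]
    w_local = w.copy()                            # adapt a local copy only
    for _ in range(steps):
        err = xs @ w_local - ys                   # residuals on retrieved examples
        w_local -= lr * xs.T @ err / k            # fast squared-error gradient step
    return w_local
```

Because adaptation touches only a temporary copy of the weights, old-task performance cannot degrade, which is how the scheme sidesteps catastrophic forgetting at evaluation time.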
paper / demishassabis / Feb 8
State-space generative models learn compact representations from raw pixels to predict action sequence outcomes in Atari games, drastically cutting computational costs versus standard models. These models maintain high accuracy on Arcade Learning Environment dynamics. In RL, agents querying these models for planning outperform model-free baselines on Ms. Pac-Man.
reinforcement-learning generative-models state-space-models model-based-rl atari-games machine-learning
“State-space models substantially reduce computational costs for predicting sequences of actions compared to other generative models.”
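The cost saving comes from rolling the model forward in a compact latent state instead of predicting future pixels. A toy sketch of that planning loop, with random matrices standing in for learned parameters (nothing here is from the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic state-space model: an 8-dim latent state, 4 discrete
# actions. A, B, r are random stand-ins for learned parameters.
A = rng.normal(size=(8, 8)) * 0.1   # latent transition
B = rng.normal(size=(8, 4))         # per-action effect on the latent state
r = rng.normal(size=8)              # linear reward readout

def rollout_return(z0, actions):
    """Predicted return of an action sequence, computed entirely in
    latent space -- no pixel reconstruction needed during planning."""
    z, total = z0, 0.0
    for a in actions:
        z = np.tanh(A @ z + B[:, a])
        total += r @ z
    return total

# Planning: score candidate action sequences by predicted return.
z0 = rng.normal(size=8)
candidates = [[0, 1], [2, 3], [1, 1]]
best = max(candidates, key=lambda seq: rollout_return(z0, seq))
```

An agent querying such a model evaluates many candidate action sequences per step at the cost of a few small matrix products each, rather than a full image-generation pass per simulated frame.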
paper / demishassabis / Jan 24
Psychlab integrates classical psychology experiments into DeepMind Lab for testing both human and RL agents via a flexible API, with implementations for visual search, change detection, motion discrimination, and object tracking. Analysis of the UNREAL agent shows it learns faster for larger target stimuli than smaller ones. Adding a foveal vision model corrects this bias and boosts UNREAL's performance on Psychlab and standard DMLab tasks.
psychlab deep-reinforcement-learning deepmind-lab visual-psychophysics cognitive-science unreal-agent foveal-vision
“UNREAL agent learns more quickly about larger target stimuli than smaller ones”
paper / demishassabis / Dec 5
AlphaZero generalizes AlphaGo Zero's tabula rasa reinforcement learning to a single algorithm achieving superhuman performance in chess, shogi, and Go. Starting from random play with only game rules, it reaches superhuman levels within 24 hours and defeats world-champion programs in each domain. This eliminates reliance on human expertise, search heuristics, or handcrafted evaluations traditional in chess AI.
alphazero reinforcement-learning self-play chess shogi go tabula-rasa
“AlphaZero achieves superhuman performance in chess, shogi, and Go starting from random play with no domain knowledge except game rules”
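At the heart of AlphaZero's search is PUCT action selection, which trades off the network's value estimate against its policy prior and visit counts. A minimal sketch of that selection rule (the `+ 1` under the square root is an illustrative guard for the zero-visit case, and `c_puct=1.5` is an arbitrary choice, not the paper's tuned constant):

```python
import math

def puct_select(stats, prior, c_puct=1.5):
    """Pick the child action maximising Q(s,a) + U(s,a).

    stats: dict action -> (visit count N, total value W)
    prior: dict action -> network policy prior P(s,a)
    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N_b) / (1 + N_a)
    """
    total_n = sum(n for n, _ in stats.values())

    def score(a):
        n, w = stats[a]
        q = w / n if n > 0 else 0.0              # mean value of this action
        u = c_puct * prior[a] * math.sqrt(total_n + 1) / (1 + n)
        return q + u                             # exploit + explore bonus

    return max(stats, key=score)
```

Rarely visited actions with a high prior get a large exploration bonus, so the search is steered by the network alone, with no handcrafted evaluations or domain heuristics.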
paper / demishassabis / Nov 28
WaveNet achieves state-of-the-art realistic speech synthesis across languages, but its sequential, sample-by-sample autoregressive generation is poorly suited to parallel hardware and real-time deployment. Probability Density Distillation trains a parallel feed-forward network from a pretrained WaveNet while preserving audio quality. The distilled model generates speech more than 20x faster than real time and powers Google Assistant's English and Japanese voices.
speech-synthesis wavenet parallel-generation probability-density-distillation machine-learning deepmind
“WaveNet is the state-of-the-art in realistic speech synthesis, rated more natural than previous systems for many languages.”
paper / demishassabis / Jul 19
Imagination-Augmented Agents (I2As) fuse model-free and model-based deep RL by feeding learned environment model predictions as context into policy networks, enabling arbitrary implicit planning without prescribed model usage. This contrasts with traditional model-based methods that rigidly dictate model-to-policy translation. I2As demonstrate superior data efficiency, performance, and robustness to model errors over baselines.
deep-reinforcement-learning imagination-augmented-agents model-based-rl reinforcement-learning arxiv-paper demis-hassabis
“I2As combine model-free and model-based reinforcement learning in a novel architecture.”
paper / demishassabis / Jul 11
SCAN learns compositional, hierarchical visual concepts through fast symbol association grounded in unsupervised disentangled primitives, requiring minimal symbol-image pairings without assumptions on symbol form. It supports bi-directional multimodal inference, generating diverse images from symbols and symbols from images. Learned logical operations allow traversal, manipulation, and recombination of concepts to produce novel visuals beyond training distributions.
hierarchical-concepts symbol-concept-association unsupervised-learning visual-abstractions multimodal-inference concept-recombination disentangled-primitives
“SCAN discovers visual primitives in an unsupervised manner”
paper / demishassabis / Jun 30
NoisyNet introduces parametric noise to deep RL agent weights, inducing stochastic policies that enhance exploration without relying on traditional heuristics. Noise parameters are optimized via gradient descent alongside network weights, maintaining low computational overhead. It outperforms entropy rewards in A3C and ε-greedy in DQN/dueling architectures across Atari games, often achieving superhuman performance from subhuman baselines.
noisy-nets reinforcement-learning exploration-strategies deep-rl atari-games a3c dqn
“NoisyNet adds parametric noise to agent weights, with noise parameters learned by gradient descent”
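A numpy sketch of the core building block, a noisy linear layer with the factorised Gaussian noise variant described in the paper; the class name, initialisation bounds, and `sigma0` default are illustrative choices, and a real agent would train `mu` and `sigma` by backpropagation rather than hold them fixed:

```python
import numpy as np

def f(x):
    # Factorised-noise transform from the NoisyNet paper: sgn(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Linear layer whose weights carry learned parametric noise.

    y = (mu_w + sigma_w * eps_w) @ x + (mu_b + sigma_b * eps_b)
    mu_* and sigma_* would be optimised by gradient descent alongside
    the rest of the network; eps_* is resampled on each forward pass,
    so the induced policy is stochastic and drives exploration.
    """
    def __init__(self, n_in, n_out, sigma0=0.5, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(n_in)
        self.mu_w = rng.uniform(-bound, bound, (n_out, n_in))
        self.mu_b = rng.uniform(-bound, bound, n_out)
        self.sigma_w = np.full((n_out, n_in), sigma0 / np.sqrt(n_in))
        self.sigma_b = np.full(n_out, sigma0 / np.sqrt(n_in))
        self.rng, self.n_in, self.n_out = rng, n_in, n_out

    def forward(self, x):
        # Factorised noise: one vector per input side, one per output side,
        # combined by an outer product -- cheap compared to fresh noise
        # for every individual weight.
        eps_in = f(self.rng.standard_normal(self.n_in))
        eps_out = f(self.rng.standard_normal(self.n_out))
        w = self.mu_w + self.sigma_w * np.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return w @ x + b
```

Because `sigma` is learned, the network itself decides how much exploration noise each weight should carry, replacing hand-tuned schedules like ε-greedy annealing.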
paper / demishassabis / Jun 20
An AI agent learns to interpret natural language instructions in a simulated 3D environment using reinforcement learning combined with unsupervised learning, starting from minimal priors. It grounds linguistic symbols to emergent perceptual representations and action sequences, enabling execution of unseen instructions in novel situations. Learning efficiency for new words accelerates with growing semantic knowledge, demonstrating semantic bootstrapping.
grounded-language 3d-simulation reinforcement-learning language-understanding embodied-ai semantic-generalization
“Agent learns to execute written instructions successfully in simulated 3D world via rewards”
paper / demishassabis / Mar 6
Neural Episodic Control (NEC) introduces a deep RL agent that rapidly assimilates new experiences using a semi-tabular value function representation: a buffer of slowly changing state embeddings paired with rapidly updated value estimates. This design enables sample efficiency far superior to standard deep RL methods, which require orders of magnitude more data than humans. Empirical results demonstrate NEC learns significantly faster than state-of-the-art general-purpose deep RL agents across diverse environments.
neural-episodic-control deep-reinforcement-learning value-function semi-tabular-representation sample-efficiency machine-learning arxiv-paper
“Standard deep RL methods require orders of magnitude more data than humans to achieve reasonable performance.”
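The semi-tabular value function in NEC is a differentiable neural dictionary per action: slowly changing embeddings as keys, rapidly updated value estimates as values. A numpy sketch of the read/write mechanics (the class name, `delta` default, and the simple list-backed storage are illustrative; the paper uses an approximate nearest-neighbour index at scale):

```python
import numpy as np

class DND:
    """Sketch of a differentiable neural dictionary.

    Keys are state embeddings (changed slowly by the encoder network);
    values are Q estimates (updated rapidly after each episode), which
    is what lets the agent assimilate new experience quickly.
    """
    def __init__(self, delta=1e-3):
        self.keys, self.values = [], []
        self.delta = delta  # kernel smoothing constant

    def write(self, key, value):
        self.keys.append(np.asarray(key, float))
        self.values.append(float(value))

    def lookup(self, query, k=5):
        # Inverse-distance kernel over the k nearest stored keys.
        keys = np.stack(self.keys)
        d2 = np.sum((keys - query) ** 2, axis=1)
        idx = np.argsort(d2)[:k]
        w = 1.0 / (d2[idx] + self.delta)
        w /= w.sum()
        return float(np.dot(w, np.asarray(self.values)[idx]))
```

A single rewarding episode written into the dictionary immediately changes the value estimate for nearby states, without waiting for many gradient steps to propagate through network weights.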
paper / demishassabis / Dec 12
DeepMind Lab is a first-person 3D game platform built for advancing general AI and ML through agent training in large, partially observed, visually diverse worlds. It features a simple, flexible API that supports rapid iteration on creative task designs and novel AI architectures. Powered by a high-performance game engine, it is optimized for research community use.
deepmind-lab ai-research reinforcement-learning 3d-game-environment deepmind arxiv-paper ai-platform
“DeepMind Lab is designed for research and development of general artificial intelligence and machine learning systems”
paper / demishassabis / Dec 2
Neural networks suffer from catastrophic forgetting when learning sequential tasks, overwriting prior knowledge. The proposed method overcomes this by selectively slowing learning rates on weights critical to old tasks using elastic weight consolidation. It demonstrates scalability on permuted MNIST classification sequences and effectiveness on sequential Atari 2600 games, maintaining long-term expertise.
catastrophic-forgetting neural-networks continual-learning elastic-weight-consolidation machine-learning deep-learning reinforcement-learning
“Neural networks generally cannot learn tasks sequentially without catastrophic forgetting.”
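The elastic weight consolidation penalty can be sketched in a few lines: a quadratic term anchoring the current weights to the old-task optimum, scaled per weight by an estimate of that weight's importance (the diagonal Fisher information). Function names and the `lam` default are illustrative:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC regulariser added to the new task's loss.

    theta_star: weights after training the old task.
    fisher: diagonal Fisher information of the old task -- weights that
    were important (high Fisher) are pulled back strongly, unimportant
    ones are left free to change, slowing learning selectively.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def ewc_grad(theta, theta_star, fisher, lam=1.0):
    # Gradient of the penalty, added to the new-task loss gradient
    # during each optimisation step.
    return lam * fisher * (theta - theta_star)
```

The elastic metaphor is literal: each weight is tethered to its old value by a spring whose stiffness is its Fisher importance, so new-task gradients bend unimportant weights freely while critical ones barely move.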
paper / demishassabis / Jun 14
A model-free episodic control algorithm, inspired by hippocampal episodic memory, rapidly exploits rewarding nuances of an environment, in contrast to deep RL methods that need millions of interactions to reach human-level performance. On challenging sequential decision tasks it learns successful strategies faster than state-of-the-art deep RL methods while also attaining higher total rewards.
deep-reinforcement-learning model-free-episodic-control hippocampus-episodic-memory sequential-decision-making arxiv-paper demis-hassabis
“State-of-the-art deep reinforcement learning algorithms require many millions of interactions to reach human-level performance”
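A numpy sketch of the episodic controller's core data structure: one buffer of (state embedding, best observed return) per action, written with a max on revisits and read by averaging the k nearest neighbours. The class name, exact-match write rule, and `k=3` default are illustrative simplifications of the paper's scheme:

```python
import numpy as np

class QECTable:
    """Sketch of a model-free episodic control value table.

    Writing keeps the highest return ever observed from a state-action
    pair -- one good episode is bound in a single shot, mimicking rapid
    storage in hippocampal episodic memory. Reading generalises to
    unseen states by averaging the k nearest stored embeddings.
    """
    def __init__(self, n_actions, k=3):
        self.buffers = [([], []) for _ in range(n_actions)]
        self.k = k

    def update(self, state, action, ret):
        keys, rets = self.buffers[action]
        for i, s in enumerate(keys):
            if np.allclose(s, state):
                rets[i] = max(rets[i], ret)  # keep best return seen
                return
        keys.append(np.asarray(state, float))
        rets.append(float(ret))

    def estimate(self, state, action):
        keys, rets = self.buffers[action]
        if not keys:
            return 0.0
        d = np.sum((np.stack(keys) - state) ** 2, axis=1)
        idx = np.argsort(d)[: self.k]
        return float(np.mean(np.asarray(rets)[idx]))
```

Acting greedily with respect to these estimates lets the agent repeat a rewarding trajectory after experiencing it once, rather than after millions of gradient updates.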
paper / demishassabis / Dec 28
Proposes an algorithmic framework that models the interface between perception and memory using approximate Hubel-Wiesel modules to build hierarchical, invariant object representations. It accounts for the progression from view-dependence to view-invariance along the visual cortex (V1 to IT) and for episodic interference in the medial temporal lobe, and it unifies fast and slow learning across neocortex and hippocampus via a novel interpretation of the Hubel-Wiesel conjecture on complex receptive fields.
neural-computation hubel-wiesel receptive-fields view-invariance episodic-memory visual-cortex arxiv-paper
“Framework models interface between perception and memory at algorithmic level”