Demis Hassabis

Chronological feed of everything captured from Demis Hassabis.

Demis Hassabis Discusses AlphaGo and AGI on DeepMind Podcast

Demis Hassabis, CEO of Google DeepMind, recently appeared on the Google DeepMind Podcast alongside host Hannah Fry to discuss the Alpha series (including AlphaGo) and Artificial General Intelligence (AGI). The discussion likely covered advancements in AI for science and the broader implications of these technologies.

AlphaGo: A Decade of AI Advancement and Its AGI Implications

Ten years after AlphaGo's victory, the AI community reflects on its pivotal role in initiating the modern AI era. The technological advances it demonstrated, epitomized by "Move 37," proved AI's readiness for complex scientific problem-solving. These methods are now considered foundational for the development of Artificial General Intelligence (AGI).

Aletheia: Advancing AI in Mathematical Research from Olympiad to PhD-level

Aletheia, an advanced math research agent powered by Gemini Deep Think, demonstrates robust capabilities in mathematical problem-solving. It excels at iteratively generating, verifying, and revising solutions in natural language, extending beyond Olympiad-level problems to PhD-level exercises. The system leverages intensive tool use to navigate complex mathematical research and has achieved milestones such as autonomously generating research papers and solving open problems.
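
A loop of the kind described, generating a candidate solution, checking it, and revising on failure, can be sketched with placeholder callables standing in for model calls (nothing below reflects Aletheia's actual implementation):

```python
def solve(problem, generate, verify, revise, max_rounds=5):
    """Generate-verify-revise loop of the kind described above.

    `generate`, `verify`, and `revise` are placeholders for model calls;
    `verify` returns (ok, critique), and the critique feeds the next revision.
    """
    solution = generate(problem)
    for _ in range(max_rounds):
        ok, critique = verify(problem, solution)
        if ok:
            return solution
        solution = revise(problem, solution, critique)
    return None  # unresolved within the revision budget

# Toy instance: "solve" by incrementing until divisible by the target.
result = solve(
    problem=7,
    generate=lambda p: 1,
    verify=lambda p, s: (s % p == 0, "not divisible"),
    revise=lambda p, s, c: s + 1,
    max_rounds=10,
)
print(result)  # → 7
```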

Demis Hassabis on the Path to AGI and the Impact of AI on Society

Demis Hassabis, CEO of Google DeepMind, discusses the current state and future of AI, emphasizing the need for breakthroughs in continual learning, memory, long-term reasoning, and planning to achieve Artificial General Intelligence (AGI). He defines AGI as a system possessing all human cognitive capabilities, including creativity and physical intelligence, and estimates it to be 5-10 years away. Hassabis also highlights the potential of AI in various product applications like smart glasses and addresses the economic and societal implications of widespread AI adoption, stressing adaptation and the evolving nature of human purpose.

Google DeepMind Bolsters Robotics Team with Boston Dynamics Veteran

Aaron Saunders, former CTO of Boston Dynamics, has joined Google DeepMind as VP of hardware engineering. This strategic hire significantly strengthens DeepMind's robotics team, signaling an increased focus on the intersection of robotics and AI. The company is actively recruiting to further expand its capabilities in this domain.

Google DeepMind Partners with Boston Dynamics, Expands Robotics Team for AGI Development

Google DeepMind is advancing its Gemini Robotics initiative to integrate AI into physical systems, a crucial step for achieving Artificial General Intelligence (AGI). This effort includes a strategic partnership with Boston Dynamics, leveraging DeepMind's robotics models with Boston Dynamics' Atlas humanoid hardware. Concurrently, Google DeepMind is expanding its internal robotics team, notably by hiring former Boston Dynamics CTO Aaron Saunders, to strengthen its hardware engineering capabilities.

Demis Hassabis on the Future of AI: AGI, Multimodality, and Societal Impact

Demis Hassabis, CEO of Google DeepMind, discusses the rapid advancements in AI, emphasizing the imminent arrival of Artificial General Intelligence (AGI) within 5-10 years. He highlights the critical role of multimodal AI, especially in video understanding, and the development of reliable agent-based systems as key short-term developments. Hassabis also addresses the societal implications of AGI, including the need for careful consideration of AI safety, responsible use, and humanity's adaptation to a potentially post-scarcity future.

SIMA 2: A Generalist Embodied Agent Powered by Gemini Achieves Near-Human Performance and Open-Ended Learning in Virtual Worlds

SIMA 2 is an embodied AI agent utilizing a Gemini foundation model, demonstrating advanced interaction capabilities in diverse 3D virtual environments. It surpasses previous iterations by moving beyond simple command execution to engage in goal-directed reasoning, conversation, and multimodal instruction interpretation. This agent exhibits near-human performance in various games and generalizes to novel environments, while also possessing the capacity for autonomous skill acquisition through self-generated tasks and rewards.

Demis Hassabis on DeepMind's AI Advancements and Future Outlook

Demis Hassabis discusses Google DeepMind's role as the AI engine for Alphabet, integrating advanced models like Gemini across various Google products. He highlights the development of "world models" such as Genie for interactive environment generation, crucial for AGI and robotics. Hassabis also touches upon the application of AI in scientific discovery through Isomorphic, aiming to revolutionize drug discovery and accelerate breakthroughs in fields like material science and health.

Demis Hassabis on World Models, Jagged Intelligence, and the Road to AGI Benchmarks

Google DeepMind is converging its specialized models (Gemini, Genie, Veo) into a unified "omni model" capable of handling multimodal tasks at parity with specialized systems — a trajectory Hassabis frames as necessary for AGI. Current frontier models exhibit "jagged intelligence": superhuman on narrow benchmarks (e.g., 99.2% on AIME, IMO gold medal) yet brittle on simple reasoning tasks, pointing to unresolved gaps in consistency, planning, and memory. To address benchmark saturation and measure progress toward AGI more rigorously, DeepMind is launching Game Arena with Kaggle — a self-scaling, adversarial evaluation environment where model capability determines test difficulty. Genie 3's world model architecture (persistent, physics-consistent world generation) is being used to generate synthetic training data for robotics and general AGI systems, with a SIMA agent already operating inside Genie-generated environments.

Hassabis's "Learnable Natural Systems" Conjecture: Classical AI May Model All of Nature's Structured Patterns

In his Nobel Prize lecture, Demis Hassabis proposed that any pattern generated or found in nature can be efficiently discovered and modeled by a classical learning algorithm — a conjecture grounded in the observation that natural systems carry learned structure imposed by evolutionary and physical selection processes. This "survival of the stablest" principle means that proteins, planetary orbits, geological formations, and biological systems all inhabit lower-dimensional manifolds that neural networks can exploit via gradient following. The paradigm is validated empirically by AlphaFold, AlphaGo, and Veo's emergent physics modeling, and Hassabis suggests it points toward a new complexity class — analogous to P and NP — defining problems solvable by neural-network-based classical systems. He views this as a physics question as much as a computer science one, framing information as the most fundamental unit of the universe and P=NP as a core question about the informational structure of reality.

Gemini 2.5 Pro Achieves SoTA on Coding/Reasoning While Spanning Full Capability-Cost Pareto Frontier

Google DeepMind's Gemini 2.X model family introduces a tiered architecture — 2.5 Pro, 2.5 Flash, 2.0 Flash, and Flash-Lite — designed to cover the full capability-vs-cost tradeoff spectrum. Gemini 2.5 Pro is positioned as a "thinking model" achieving state-of-the-art on frontier coding and reasoning benchmarks, with native support for up to 3 hours of video input and long-context multimodal processing. The combination of extended context, multimodal understanding, and reasoning is explicitly framed as an enabler for next-generation agentic workflows. The family's architecture reflects a deliberate design philosophy: match model capability to deployment constraints rather than optimizing for a single frontier point.
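
The "capability-vs-cost tradeoff spectrum" can be made concrete with a small Pareto-frontier helper; the model names, costs, and scores below are illustrative placeholders, not real Gemini pricing or benchmark numbers:

```python
def pareto_frontier(models):
    """Keep models that are not dominated on the capability-vs-cost tradeoff.

    A model is dominated if another model is at least as capable and cheaper.
    Sorting by cost lets us keep only entries that raise the best capability
    seen so far.
    """
    frontier = []
    for name, cost, capability in sorted(models, key=lambda m: m[1]):
        if not frontier or capability > frontier[-1][2]:
            frontier.append((name, cost, capability))
    return frontier

# Illustrative, made-up numbers (cost, capability score).
models = [
    ("flash-lite", 1, 60),
    ("flash", 3, 75),
    ("mid-tier", 5, 74),   # dominated: costs more than flash, less capable
    ("pro", 10, 90),
]
print([m[0] for m in pareto_frontier(models)])  # → ['flash-lite', 'flash', 'pro']
```

A tiered family that spans the frontier means every deployment budget has a non-dominated option, which is the design philosophy the summary describes.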

Google I/O 2024: Shifting AI Strategy and the Road to AGI

Google I/O 2024 revealed a significant shift in the company's AI strategy, emphasizing practical applications and a more confident stance in the AI race. Google is integrating AI across its product ecosystem, notably in Search with "AI mode" and the widespread adoption of Gemini. Despite a focus on product integration, discussions with Demis Hassabis highlight Google DeepMind's continued pursuit of AGI, viewing current advancements as building blocks for future generalized intelligence while acknowledging challenges in productizing rapidly evolving AI capabilities.

Gemma 3: Multimodal, Efficient, and Scalable Language Models

Gemma 3 introduces a multimodal architecture with integrated vision understanding, expanded language support, and significantly longer context windows (up to 128K tokens). Architectural changes, specifically in attention mechanisms, optimize KV-cache memory usage. The models achieve superior performance over Gemma 2 through distillation and a refined post-training recipe, making them competitive with larger, state-of-the-art models like Gemini-1.5-Pro.
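
The KV-cache saving from interleaving local (sliding-window) and global attention layers can be illustrated with a back-of-the-envelope estimate; the 5:1 local-to-global ratio and 1024-token window below are assumptions for illustration, not necessarily Gemma 3's published configuration:

```python
def kv_cache_tokens(n_layers, seq_len, local_ratio, window):
    """Tokens cached per attention head, summed across layers.

    Global layers cache keys/values for the full sequence; local
    (sliding-window) layers cache at most `window` tokens.
    `local_ratio` is the number of local layers per global layer.
    """
    n_local = n_layers * local_ratio // (local_ratio + 1)
    n_global = n_layers - n_local
    return n_global * seq_len + n_local * min(window, seq_len)

# All-global baseline vs. an interleaved 5:1 local/global stack
# (illustrative layer count and window size).
seq_len, n_layers, window = 128_000, 48, 1024
baseline = kv_cache_tokens(n_layers, seq_len, 0, seq_len)
interleaved = kv_cache_tokens(n_layers, seq_len, 5, window)
print(f"cache reduced to {interleaved / baseline:.1%} of baseline")  # → 17.3%
```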

Demis Hassabis on AI-Driven Scientific Advancement and Personal Philosophy

Demis Hassabis, drawing from a lifelong engagement with games and AI, advocates for AI as the ultimate tool for scientific discovery. He asserts that AI's pattern recognition and data insight capabilities can profoundly advance fields like biology, exemplified by AlphaFold's impact on protein folding. Hassabis emphasizes the importance of tackling ambitious problems at the opportune moment and embracing interdisciplinary collaboration to unlock significant breakthroughs.

AlphaFold: A Case Study in AI-Driven Scientific Discovery

Demis Hassabis, CEO of Google DeepMind, discusses the journey from AI in games to its application in scientific grand challenges, exemplified by AlphaFold. He highlights how DeepMind's approach of leveraging self-learning in combinatorial search spaces, initially perfected in games like Go, was successfully adapted to solve complex problems in structural biology, particularly protein folding. This success, marked by AlphaFold's atomic-level accuracy and subsequent open-sourcing, signals a new era of "digital biology" and AI-accelerated scientific discovery across various fields.

AlphaProteo Achieves 3-300x Higher Binding Affinities in De Novo Protein Binder Design

AlphaProteo, a new family of ML models, enables de novo design of protein binders with 3- to 300-fold improved affinities over prior methods across seven targets. It delivers higher experimental success rates, allowing ready-to-use binders via one round of medium-throughput screening without optimization. This addresses the challenge of on-demand high-affinity binder generation for biomedical applications.

Demis Hassabis on the Future of AI: From General Intelligence to Societal Impact

Demis Hassabis discusses the current state and future trajectory of AI, emphasizing Google DeepMind's role in developing advanced AI models like Gemini and Project Astra. He highlights the distinction between near-term overhype and long-term underappreciation of AGI's transformative potential, advocating for careful development and international cooperation to mitigate risks and ensure beneficial outcomes for humanity.

Imagen 3: Google's Latent Diffusion Model Outperforms SOTA in Text-to-Image Generation

Imagen 3 is a latent diffusion model from Google that produces high-quality images from text prompts. It surpasses other state-of-the-art models in blind preference evaluations conducted at the time of assessment. The work includes detailed quality evaluations alongside safety and representation analyses with mitigations to reduce potential harms.

Gemma 2 Achieves SOTA Performance in 2B-27B Scale via Architectural Tweaks and Distillation

Gemma 2 introduces lightweight open models from 2B to 27B parameters, outperforming peers and rivaling models 2-3x larger. Key enhancements include interleaving local-global attentions and group-query attention in the Transformer architecture. The 2B and 9B variants use knowledge distillation over next-token prediction for training. All models are released openly to the community.
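
Distillation of the kind described, training the student against the teacher's full next-token distribution rather than a one-hot target, reduces in its simplest form to a KL divergence between the two softmax outputs (a minimal NumPy sketch; the temperature value is an illustrative assumption):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary, averaged over positions.

    Unlike one-hot next-token prediction, the student is pushed toward the
    teacher's full distribution, which carries more signal per token.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1)
    return kl.mean()

# A student that matches the teacher exactly incurs zero loss.
logits = np.random.randn(4, 32)  # (positions, vocab)
assert np.isclose(distillation_loss(logits, logits), 0.0)
```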

Demis Hassabis on the Future of AI

Demis Hassabis discusses the evolution and future of AI, emphasizing the journey of DeepMind from its inception when AI was not a mainstream topic to its current status at the forefront of global AI research. He highlights the strategic milestones, breakthroughs in areas like protein folding and mathematical problem-solving, and the ongoing development of multimodal AI agents. Hassabis also addresses critical challenges, including the need for advanced planning and reasoning capabilities in AI, responsible development, and the UK's role in fostering deep tech innovation.

Med-Gemini Achieves State-of-the-Art in 10 Medical Benchmarks, Outperforming GPT-4

Med-Gemini, a specialized multimodal family of Gemini models for medicine, integrates web search and custom encoders for novel modalities. It sets new SoTA on 10 of 14 medical benchmarks, surpassing GPT-4 across all comparable tasks with relative margins of up to 44.5% on multimodal benchmarks like NEJM Image Challenges. Long-context capabilities enable SoTA in health record retrieval and video QA via in-context learning alone, indicating utility in summarization, dialogue, research, and education.

RecurrentGemma Leverages Griffin Architecture for Transformer-Free Efficiency in Open Language Models

RecurrentGemma introduces Google's Griffin architecture, combining linear recurrences with local attention to deliver strong language modeling performance. It maintains a fixed-size state, minimizing memory usage and enabling efficient long-sequence inference without transformers. The 2B and 9B parameter models, available in pre-trained and instruction-tuned variants, match Gemma baselines despite training on fewer tokens.
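
The fixed-size state comes from the linear recurrence: each channel keeps a single scalar of state that is updated once per token, so memory does not grow with sequence length the way a transformer KV-cache does. A simplified gated linear recurrence (not Griffin's exact RG-LRU, whose gates are input-dependent) shows the idea:

```python
import numpy as np

def gated_linear_recurrence(x, a, b):
    """h_t = a * h_{t-1} + b * x_t, applied independently per channel.

    The state `h` has a fixed size (one value per channel) regardless of
    sequence length. Here the decay `a` and input gate `b` are fixed per
    channel; in Griffin's RG-LRU they are computed from the input.
    """
    seq_len, dim = x.shape
    h = np.zeros(dim)
    out = np.empty_like(x)
    for t in range(seq_len):
        h = a * h + b * x[t]
        out[t] = h
    return out

x = np.ones((5, 3))
y = gated_linear_recurrence(x, a=np.full(3, 0.5), b=np.full(3, 1.0))
# With a=0.5, b=1 and constant input 1, the state approaches b/(1-a) = 2.
```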

Gemma: Lightweight Open Models Rivaling Proprietary Tech with Strong Benchmarks and Safety Focus

The Gemma family consists of 2B and 7B parameter open models derived from Gemini's research and technology, delivering state-of-the-art performance in language understanding, reasoning, and safety. These models outperform comparable open models on 11 of 18 text-based tasks. Pretrained and fine-tuned checkpoints are released alongside comprehensive safety evaluations to advance responsible LLM development.

Gemini 1.5 Achieves Near-Perfect Recall and Reasoning Over 10M Token Contexts Across Modalities

Gemini 1.5 Pro and Flash models process millions of tokens, including long documents, hours of video, and audio, with near-perfect (>99%) retrieval up to 10M tokens and continued next-token prediction gains. They surpass prior versions and rivals like Claude 3.0 (200k) and GPT-4 Turbo (128k) in long-context retrieval, QA, and ASR tasks while matching or exceeding Gemini 1.0 Ultra on broad benchmarks. Real-world applications show 26-75% time savings in professional tasks and rapid learning of rare languages from grammar manuals.

Demis Hassabis on the Future of AGI and AI Development

Demis Hassabis, CEO of DeepMind, discusses the rapid advancements in AI, predicting AGI-like systems within a decade. He emphasizes the importance of multimodal models for understanding the real world, the necessity of combining large models with planning mechanisms, and the critical role of responsible development and safety in achieving beneficial AI.

Gemini Ultra Sets New Multimodal AI Benchmarks, Achieving Human-Expert MMLU Performance

Google's Gemini family introduces Ultra, Pro, and Nano multimodal models natively handling text, images, audio, and video. Gemini Ultra leads 30 of 32 benchmarks, marking the first model to reach human-expert levels on MMLU and topping all 20 multimodal benchmarks evaluated. The models support scalable deployment from complex reasoning to on-device applications, with responsible post-training emphasized.

AlphaZero Extracts Novel, Human-Learnable Chess Concepts Beyond Existing Knowledge

Researchers developed a method to extract novel chess concepts from AlphaZero, a self-play trained AI that achieved superhuman performance without human data. Analysis reveals AlphaZero encodes knowledge extending human understanding yet accessible for learning. In a human study, four top grandmasters improved at solving prototype positions embodying these concepts, demonstrating AI's potential to advance human expertise.

TacticAI: AI Assistant Outperforms Human Football Corner Kick Tactics in Expert Blind Tests

TacticAI is a geometric deep learning system for analyzing and generating football corner kick tactics, developed with Liverpool FC experts. It predicts outcomes like receivers and shots while generating alternative player positions, achieving data efficiency despite scarce gold-standard data. In blind evaluations by Liverpool FC coaches, TacticAI's suggestions were preferred over real tactics 90% of the time and indistinguishable from actual setups.

Transformer-Based Recurrent Neural Network Outperforms Conventional Decoders on Google's Surface Code Hardware

A recurrent transformer neural network decodes the surface code, surpassing state-of-the-art algorithmic decoders on real data from Google's Sycamore processor for distance-3 and distance-5 codes. It retains superior performance on simulated data with realistic noise—including cross-talk, leakage, and analog readouts—up to distance 11. The decoder generalizes beyond its 25-cycle training, demonstrating ML's potential to exceed human-designed quantum error correction algorithms.

AlphaZero Diversity League Doubles Puzzle-Solving Capacity and Boosts Elo via Specialized Agents

Researchers extend AlphaZero into AZ_db, a latent-conditioned architecture enabling a league of diverse agents trained with behavioral diversity techniques and sub-additive planning for idea selection. AZ_db generates broader chess move ideas, solves twice as many challenging puzzles—including Penrose positions—as baseline AlphaZero, and achieves 50 Elo improvement by specializing agents to openings. Findings indicate diversity bonuses in AI teams mirror human teams for computationally hard tasks.

Targeted Human Judgments and Evidence Provision Boost RLHF for Helpful, Correct, Harmless Dialogue Agents

Sparrow, an information-seeking dialogue agent, uses RLHF with decomposed natural language rules for targeted rater judgments on helpfulness and harmlessness, enabling efficient rule-conditional reward models. It provides source evidence supporting factual claims, validating 78% of responses. Sparrow outperforms prompted LM baselines in preferences, resists adversarial probes (violating rules only 8% of the time), but shows distributional biases despite rule adherence.

Demis Hassabis: From Games to AGI, AlphaFold Breakthroughs Chart Path to Simulating Biology and Physics

Demis Hassabis recounts his journey from child chess prodigy and game developer to DeepMind CEO, emphasizing games as ideal benchmarks for AI due to clear metrics, self-play efficiency, and human-level challenges like AlphaGo. AlphaFold 2 solves the 50-year protein folding problem by predicting 3D structures from amino acid sequences in seconds, enabling proteome-scale simulations and accelerating drug discovery via end-to-end deep learning with physics constraints. He envisions scaling to virtual cells, fusion plasma control via RL, quantum simulations, and AGI to probe fundamentals of the universe such as the origins of life, where humanity likely stands alone, having passed great filters such as multicellularity.

DeepNash Masters Imperfect-Information Stratego via Model-Free RL, Surpassing Human Experts

DeepNash, a model-free multiagent RL agent, learns Stratego from scratch using self-play and achieves human expert level without search. Stratego's game tree exceeds 10^535 nodes, 10^175 times larger than Go's, with imperfect information and long episodes complicating decisions. The R-NaD algorithm enables convergence to approximate Nash equilibrium by regularizing multiagent dynamics, outperforming prior AI and ranking top-3 against humans on Gravon.

Gopher: 280B Parameter LM Excels on Diverse Tasks with Scale-Driven Gains in Comprehension but Limited Reasoning

DeepMind's Gopher, a 280B parameter Transformer LM, achieves SOTA performance across 152 diverse tasks when scaled from tens of millions to billions of parameters. Scaling yields largest improvements in reading comprehension, fact-checking, and toxic language detection, while logical and mathematical reasoning show smaller benefits. The analysis examines training data impacts on bias/toxicity and discusses LM applications to AI safety and harm mitigation.

AlphaZero Acquires Human Chess Concepts During Self-Training

Probing reveals that AlphaZero's neural network learns representations of human chess concepts as it trains from scratch on chess. The study identifies when and where these concepts emerge in the network via targeted probes across a broad range. Behavioral analysis of opening play, endorsed by Grandmaster Vladimir Kramnik, complements representational insights, with low-level details made publicly available online.
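
Probing studies of this kind typically fit a simple linear classifier on frozen activations; above-chance probe accuracy at a given layer is taken as evidence that the concept is linearly encoded there. A toy sketch on synthetic activations (no AlphaZero weights involved; the "concept" here is a planted linear direction):

```python
import numpy as np

def train_linear_probe(activations, labels, lr=0.1, epochs=200):
    """Logistic-regression probe: can a linear readout of the frozen
    activations predict the binary concept label (e.g. "king safety")?"""
    n, d = activations.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(activations @ w + b)))  # sigmoid
        grad = p - labels                                  # dL/dlogits
        w -= lr * activations.T @ grad / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(activations, labels, w, b):
    preds = (activations @ w + b) > 0
    return (preds == labels).mean()

# Synthetic activations where dimension 0 linearly encodes the concept.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 16))
labels = (acts[:, 0] > 0).astype(float)
w, b = train_linear_probe(acts, labels)
acc = probe_accuracy(acts, labels, w, b)  # well above the 0.5 chance level
```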

Graph Neural Networks Leverage Simulation Data to Boost Scarce Experimental Materials Predictions

Graph Neural Networks trained on large-scale simulated formation energies produce atomistic descriptors that enhance predictions of experimental formation energies for solids. These learned structural features outperform composition-based methods, especially under data scarcity or when extrapolating to novel chemical spaces. The approach bridges abundant simulation data with limited experimental datasets to improve ML-driven materials property prediction.

Alchemy Benchmark Exposes Fundamental Failures in Meta-Reinforcement Learning

Alchemy introduces a Unity-based 3D benchmark for meta-RL featuring procedurally resampled latent causal structures across episodes, enabling tasks like structure learning, online inference, hypothesis testing, and abstract action sequencing. Evaluation of powerful RL agents reveals specific meta-learning failures, validating its challenge. The benchmark is released publicly with analysis tools and agent trajectories for in-depth research.

Football Analytics: A Bidirectional Catalyst for AI and Sports Science

Football analytics leverages advances in statistical learning, game theory, and computer vision to model individual player behaviors and team coordination, addressing challenges like predictive and prescriptive analysis. This intersection creates a unique microcosm for AI research, enabling innovations such as counterfactual simulations and game-theoretic penalty kick analysis. The domain mutually benefits AI by providing real-world benchmarks and sports by enhancing performance analytics, with extensions to other sports anticipated.

AlphaZero Enables Rapid Assessment and Design of Balanced Chess Variants

AlphaZero learns near-optimal strategies from scratch for chess variants, allowing in silico evaluation of rule changes without human supervision. The study explores nine atomic modifications to chess rules, including comparisons to Fischer Random Chess, revealing novel strategic patterns while maintaining proximity to classical chess. Analytic results show variant-specific piece valuations and higher decisiveness in some variants compared to standard chess, which has high draw rates.

Single Macaque IT Neurons Encode Interpretable Semantic Factors Disentangled by Unsupervised Beta-VAE

Beta-VAE, an unsupervised generative model, disentangles face images into interpretable latent factors like gender and hair length, revealing strong correspondence with responses of individual macaque inferotemporal (IT) neurons. Unlike high-dimensional distributed codes in supervised classifiers, single IT neurons encode these low-dimensional semantic factors. Face images can be reconstructed from signals of just a few cells, indicating the ventral stream optimizes for semantic disentanglement at the single-unit level.
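
The disentanglement pressure in beta-VAE comes from upweighting the KL term of the standard VAE objective. A minimal sketch of the loss, assuming a diagonal-Gaussian posterior and unit-Gaussian prior (the beta value is illustrative):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction error plus beta-weighted KL to a unit Gaussian prior.

    beta > 1 increases pressure on the latent code to match the factorized
    prior, which encourages the kind of disentangled, interpretable factors
    (e.g. gender, hair length) described above.
    """
    recon = ((x - x_recon) ** 2).sum()
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    kl = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var).sum()
    return recon + beta * kl

# A perfect reconstruction with a latent code at the prior has zero loss.
x = np.random.randn(16)
mu, log_var = np.zeros(8), np.zeros(8)
assert beta_vae_loss(x, x, mu, log_var) == 0.0
```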

MEMO Network Enables Long-Distance Associative Reasoning via Separated Memories and Adaptive Hops

Existing memory-augmented neural networks fail on classic associative inference tasks requiring reasoning over distant relationships across multiple memories, as well as shortest-path tasks. MEMO addresses this by separating stored facts from their constituent items in external memory and using an adaptive retrieval mechanism that supports a variable number of memory hops. MEMO solves these novel long-distance reasoning tasks and matches state-of-the-art on bAbI.

MuZero Masters Complex Games via Learned Model Planning Without Dynamics Knowledge

MuZero integrates tree-based search with a learned model to achieve superhuman performance in Atari, Go, chess, and shogi, without prior knowledge of environment dynamics. The model iteratively predicts rewards, action policies, and value functions—quantities essential for planning. On 57 Atari games, it sets a new state-of-the-art; on board games, it matches AlphaZero's superhuman level despite being given no knowledge of the game rules.
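
MuZero's learned model factors into three functions, a representation, a dynamics, and a prediction network, which planning composes without ever touching the real environment. An interface sketch with placeholder computations (the shapes and update rules here are illustrative, not MuZero's actual networks):

```python
import numpy as np

class MuZeroModel:
    """Skeleton of MuZero's three learned functions.

    The planner unrolls `dynamics` from a root state produced by
    `represent`, scoring each imagined state with `predict`; no
    environment rules or simulator are required.
    """
    def __init__(self, state_dim=8, n_actions=4):
        self.state_dim, self.n_actions = state_dim, n_actions

    def represent(self, observation):
        # h: raw observation -> initial hidden state (placeholder transform)
        return np.tanh(observation[: self.state_dim])

    def dynamics(self, state, action):
        # g: (state, action) -> (next hidden state, predicted reward)
        next_state = np.tanh(state + 0.1 * action)
        reward = float(state.mean())
        return next_state, reward

    def predict(self, state):
        # f: state -> (policy over actions, value estimate)
        policy = np.ones(self.n_actions) / self.n_actions
        value = float(state.sum())
        return policy, value

def rollout_return(model, observation, actions, discount=0.997):
    """Accumulate predicted rewards along an imagined action sequence,
    bootstrapping with the predicted value at the final state."""
    state = model.represent(observation)
    total = 0.0
    for k, a in enumerate(actions):
        state, reward = model.dynamics(state, a)
        total += (discount ** k) * reward
    _, value = model.predict(state)
    return total + (discount ** len(actions)) * value
```

A real planner would run MCTS over `dynamics`/`predict` rather than scoring a single action sequence, but the three-function decomposition is the core idea.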
