
Google DeepMind

Chronological feed of everything captured from Google DeepMind.

Google CEO Sundar Pichai on AI's Impact and Google's Vision

Sundar Pichai reflects on Google's decade of AI leadership, highlighting the company's foundational role in AI advancements like Transformers, which were developed internally to solve product challenges. He addresses the perception of Google falling behind in the "AI race," asserting that Google had advanced AI products like LaMDA internally but exercised caution in public release due to quality and safety concerns. Pichai outlines Google's strategic focus on speed and efficiency in AI products, the evolution of Search into an agentic future, and the company's long-term investments in AI infrastructure and various cutting-edge projects.

The Empirical Revolution in AI: From Rule-Based Systems to Emergent Intelligence

AI development has shifted significantly from symbolic, rule-based approaches to empiricist, data-driven learning. Early AI struggled to codify common sense and handle the messy, exception-filled nature of the real world, unlike modern large language models. These models, by processing massive datasets and leveraging architectural innovations like the Transformer, achieve complex reasoning and generalization through statistical prediction, mirroring aspects of biological intelligence.

Redefining Intelligence & AI Collaboration

AI's rapid development necessitates a re-evaluation of fundamental philosophical questions regarding mathematics and human thought. The paper "Mathematical Methods and Human Thought in the Age of AI" by Terence Tao and Tanya Klowden proposes a "Copernican view of intelligence," advocating for a collaborative approach with AI rather than focusing on a singular, linear progression of intelligence. This perspective emphasizes appreciating diverse forms of intelligence—human, computer, and collaborative—to unlock novel possibilities and overcome current limitations.

Persuasive AI: Understanding and Mitigating Manipulation Risks

AI models capable of persuasion pose significant manipulation risks, necessitating robust safety research. Google DeepMind’s research framework defines manipulation by intent and method, distinguishing beneficial persuasion (fact-based) from harmful manipulation (emotion/bias exploitation). Findings emphasize context-specific manipulation efficacy and the critical role of explicit manipulative goals in AI behavior, underscoring the need for continuous, iterative safety evaluations before widespread deployment.

Reflection AI's Strategy for Frontier Open-Weight Agentic Intelligence

Reflection AI aims to challenge the dominance of closed-source labs and Chinese open models by developing frontier-scale, open-weight agentic AI. By leveraging MoE architectures and deep reinforcement learning, they intend to provide a transparent, customizable alternative that enables complex tool use and autonomous task execution. Their strategy hinges on attracting top-tier talent from DeepMind and OpenAI to implement 'open science' principles at a frontier scale.

Demis Hassabis: Guiding AI Towards AGI While Mitigating Risks

Demis Hassabis, co-founder and CEO of Google DeepMind, is leading the charge toward Artificial General Intelligence (AGI), aiming for systems with human-like versatility but superhuman speed and knowledge within the next 5-10 years. DeepMind is developing multimodal AI models like Project Astra, which can interpret and interact with the world through vision and hearing, and Gemini, which will act in the world by performing tasks like booking tickets. Hassabis acknowledges the exponential progress in AI and the challenge of ensuring these increasingly autonomous systems remain aligned with human values and safety guardrails, especially given the competitive landscape of AI development.

Google DeepMind's Strategic Roadmap: Multimodality, AGI Timelines, and AI for Science

Google DeepMind is leveraging native multimodality to develop real-world digital assistants and robotics, while positioning Gemini as the engine for Google's broader product ecosystem. The organization is pivoting toward 'unequivocal goods'—AI for medicine and material science—to mitigate societal tech-lash and establish long-term value beyond the current AI investment cycle.

The Era of Experience: Moving Beyond Human Data in AI

David Silver introduces the "era of experience," a new phase for AI development focusing on systems generating their own data through interaction with the world, rather than relying solely on human-generated data. This approach, exemplified by AlphaGo and AlphaZero, allows AI to discover novel solutions and overcome the limitations inherent in human knowledge. The shift aims to achieve superhuman intelligence by fostering continuous, self-generated learning.

The Shift from Coding to Agent Orchestration

The paradigm of software engineering is evolving from manual code authorship to the orchestration of multiple autonomous agents. While benchmark performance (e.g., SWE-bench) has surged due to concurrent improvements in pre-training, reinforcement learning, and tool-use, the primary value of the human developer is shifting toward high-level architectural 'taste' and strategic decision-making. Future gains depend on moving beyond simple code generation toward multimodal environmental interaction (e.g., browser and OS actuation) and solving the problem of continuous, deductive learning.

Gemma 4: Enhanced Open Models for Advanced AI Development

Gemma 4, developed by Google DeepMind, represents a new family of open models designed for advanced reasoning and agentic workflows. These models are available through various platforms, including Google AI Studio, Hugging Face, Kaggle, and Ollama, under an Apache 2.0 license, facilitating broad access and integration for developers.

Gemma 4: Enhanced Open Models for Local AI and Agentic Workflows

Google DeepMind has launched Gemma 4, an open-model family under the Apache 2.0 license, designed for advanced local reasoning, agentic workflows, and on-device AI. The models offer enhanced context capabilities and are available in various sizes optimized for different applications, from large-scale code analysis to real-time mobile processing. This release facilitates the development of autonomous agents with native tool use.

Gemma 4: Next-Gen Open Models for Advanced AI and Edge Applications

Gemma 4 introduces a new family of open models, available in multiple sizes to cater to diverse computational needs. These models are designed for advanced reasoning, agentic workflows, and efficient on-device processing, featuring enhanced context windows and native tool-use capabilities. The Apache 2.0 licensing facilitates broad adoption and integration.

Gemma 4: Next-Gen Open Models for Advanced Local AI and Agentic Workflows

Gemma 4 is a new family of open models from Google DeepMind, available under an Apache 2.0 license. These models are designed for advanced local reasoning and agentic workflows, offering various sizes optimized for different applications, from complex coding assistance to real-time mobile processing. Key advancements include enhanced context windows and native tool use capabilities, facilitating the development of sophisticated autonomous agents.

Gemma 4: Google DeepMind's Most Capable Open Models for Advanced AI Development

Google DeepMind has released Gemma 4, a new family of open models designed for advanced reasoning and agentic workflows. These models prioritize intelligence-per-parameter, offering cutting-edge capabilities with reduced hardware requirements. Gemma 4 includes diverse model sizes (E2B, E4B, 26B MoE, 31B Dense) and is distributed under an Apache 2.0 license, fostering broad accessibility and developer control.

Gemma 4: Google DeepMind's Most Capable Open Models

Gemma 4 represents Google DeepMind's latest iteration of open models, emphasizing intelligence-per-parameter for advanced reasoning and agentic workflows. Released under an Apache 2.0 license, this family of models aims to democratize access to frontier AI capabilities, enabling deployment across a wide spectrum of hardware from mobile devices to data centers. The release includes diverse model sizes, from mobile-first versions to larger, highly performant models, designed for versatility and efficient fine-tuning.

Demis Hassabis: From AI Visionary to Google AI Leader

Demis Hassabis, founder of DeepMind, foresaw AI's transformative potential early on, securing funding when skepticism was high. He navigated the competitive landscape, initially pursuing a collaborative approach, but shifted to a competitive mindset with the advent of OpenAI. Hassabis's unique blend of scientific depth and product-shipping discipline, honed during his game design career, proved crucial in DeepMind's success, particularly in the successful merger with Google Brain and the rapid development of Gemini.

DeepMind Develops Toolkit for Measuring AI Manipulation

DeepMind has created a novel and empirically validated toolkit to measure AI manipulation in real-world scenarios. This toolkit is designed to enhance understanding of how AI manipulation occurs and to provide protective measures for individuals. The initiative is detailed further in a blog post.

AI Manipulation Risks and Mitigation Factors Across Domains

New research highlights the domain-specific nature of AI manipulation, with high influence observed in finance but limitations in healthcare due to existing safeguards. The study emphasizes the need for identifying manipulative tactics, such as exploiting fear, to develop robust protection mechanisms. A newly developed, empirically validated toolkit offers a method to measure real-world AI manipulation and inform protective strategies.

Understanding AI Manipulation Risks and Mitigation Strategies

New research highlights the differential impact of AI-driven manipulation across various domains, with high influence observed in finance and limited influence in health due to existing safeguards. The study identifies specific "red flag" tactics, such as the use of fear, that contribute to effective manipulation. An empirically validated toolkit has been developed to measure and counter AI manipulation, offering a pathway to building stronger protective mechanisms against harmful AI applications.

Gemini 1.5 Flash Expands Access and Developer Tooling

Gemini 1.5 Flash is now live in both the Gemini App and Google Search Live, enhancing accessibility for general users. Concurrently, Google AI Studio has integrated Gemini 1.5 Flash, providing developers with immediate access to its capabilities for building and experimentation.

Gemini 3.1 Flash Live Enhances Conversational AI with Improved Function Calling and Robustness

Gemini 3.1 Flash Live is a new audio model designed to improve conversational AI through enhanced function calling and better performance in challenging auditory conditions. Key advancements include increased accuracy in task completion and detail comprehension within noisy environments, alongside the ability to maintain context over extended conversations. This model is being integrated into Google's consumer-facing AI products and is accessible to developers for integration into their applications.

Gemini 3.1 Flash Live Enhances Conversational AI with Improved Function Calling and Robustness

Gemini 3.1 Flash Live is an updated audio model from Google DeepMind designed for more natural conversations. Key improvements include enhanced function calling capabilities, better performance in noisy environments, and the ability to maintain context over long conversations. This model is being integrated into Gemini Live and Google Search Live, with developer access available via Google AI Studio.

DeepMind Develops Toolkit to Measure AI-Driven Harmful Manipulation

DeepMind has created an empirically validated toolkit to measure AI's potential for harmful manipulation, defined as exploiting vulnerabilities to trick people into making harmful choices. This research involved nine studies with over 10,000 participants across three countries, focusing on high-stakes areas like finance and health. The toolkit assesses both the efficacy (success in changing minds) and propensity (frequency of attempting manipulative tactics) of AI, providing a foundation for developing targeted mitigations and informing future AI safety frameworks.

Lyria 3 Pro: Advanced AI Music Generation with Structured Composition and Extended Length

Lyria 3 Pro enhances AI music generation by enabling the creation of longer, more structured compositions. It allows users to define musical segments like intros, verses, and choruses, and arrange them into tracks up to three minutes in length. This capability is accessible to developers via the Google AI Studio API and to paid subscribers within the Gemini App.

Gemini 3.1 Flash-Lite Demonstrates Real-time Website Generation

Gemini 3.1 Flash-Lite showcases rapid, on-demand website creation, generating pages dynamically and in real time as users interact, search, and navigate.

Google DeepMind Partners with Agile Robots to Advance Robotics with Gemini AI

Google DeepMind and Agile Robots are collaborating to integrate Gemini foundation models into robotic hardware. This partnership aims to develop more helpful and useful robots by leveraging advanced AI for enhanced robotic intelligence and functionality.

Google DeepMind Challenges Community to Develop AGI Cognitive Evaluations

Google DeepMind is launching a global hackathon in partnership with Kaggle to foster the development of novel cognitive evaluations for AI. This initiative seeks to crowdsource new benchmarks to measure progress toward Artificial General Intelligence (AGI), leveraging community expertise and a competitive framework with $200,000 in prizes. The project aims to put DeepMind's existing evaluation framework to the test and gather diverse approaches to AGI assessment.

From Move 37 to AGI: The Architectural Legacy of AlphaGo in Scientific Discovery

The architectural breakthroughs of AlphaGo—specifically the integration of reinforcement learning and Monte Carlo tree search—served as a scalable blueprint for solving high-dimensional search problems beyond gaming. This framework has evolved into specialized scientific systems (AlphaFold, AlphaProof) and is now being integrated with multimodal world models (Gemini) to transition from narrow task optimization toward Artificial General Intelligence (AGI).
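The search half of that blueprint rests on a simple selection rule. As a minimal sketch, here is the classic UCT score used in Monte Carlo tree search (the exploration constant is a conventional textbook choice; AlphaGo's actual selection rule additionally weights a learned policy prior):

```python
import math

def uct_score(total_value, visits, parent_visits, c=1.41):
    """Upper-confidence score for picking a child node during tree search:
    exploitation (mean value so far) plus an exploration bonus that grows
    for rarely-tried moves."""
    if visits == 0:
        return float("inf")  # always try unvisited moves first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# A well-explored strong move vs. a barely-tried alternative:
# the barely-tried move wins selection thanks to its exploration bonus.
print(uct_score(18.0, 30, 100))
print(uct_score(1.0, 2, 100))
```

The balance between the two terms is what lets the search discover moves like AlphaGo's famous "Move 37" instead of greedily exploiting known-good lines.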

Gemini 3.1 Flash Live Enhances Real-time Audio AI with Improved Performance and Multilingual Capabilities

Google DeepMind's Gemini 3.1 Flash Live is a new audio and voice model designed for real-time dialogue. It offers improved speed, naturalness, and reliability for developers, enterprises, and end-users. The model demonstrates significant advancements in complex task execution, multilingual support, and enhanced tonal understanding, making voice-first AI more intuitive and robust across various applications.

Gemini 3.1 Flash-Lite: High-Performance, Cost-Efficient AI for Developers

Google DeepMind has introduced Gemini 3.1 Flash-Lite, an AI model optimized for high-volume developer workloads. It offers a balance of speed, cost-efficiency, and quality, outperforming prior flash models on key benchmarks. The model includes "thinking levels" for adjustable reasoning, supporting diverse applications from basic translation to complex UI generation.

Lyria 3 Pro Expands AI Music Generation with Enhanced Features and Broader Access

Lyria 3 Pro significantly advances AI music generation, enabling tracks up to 3 minutes with granular control over musical composition elements like intros, verses, choruses, and bridges. The enhanced model is being integrated into various Google products and platforms, including Vertex AI, Google AI Studio, Google Vids, and the Gemini app, offering scalable and customizable music creation for professionals and developers. Additionally, it is available in ProducerAI, a collaborative music creation tool, and emphasizes responsible AI development through partnerships with artists, intellectual property protection, and content identification via SynthID.

Cognitive Framework for AGI Evaluation

DeepMind proposes a cognitive taxonomy to empirically measure progress toward Artificial General Intelligence (AGI). This framework, drawing from psychology and neuroscience, identifies 10 key cognitive abilities critical for general intelligence in AI. A three-stage evaluation protocol is outlined to benchmark AI system performance against human capabilities, addressing the current lack of empirical tools for AGI assessment.

Gemini 3.1 Flash-Lite: Optimized for High-Volume, Cost-Efficient AI Workloads

Gemini 3.1 Flash-Lite is a new large language model designed for high-volume developer applications, prioritizing speed and cost-efficiency. It offers competitive performance for its price tier, outperforming previous Flash models and some competitors in speed while maintaining strong quality on various benchmarks. The model provides adaptive intelligence through "thinking levels," allowing developers to control its reasoning depth for diverse tasks, from content moderation to UI generation.

Lyria 3 Pro: Advanced Music Generation and Broader Integration

Google DeepMind has introduced Lyria 3 Pro, an advanced music generation model offering extended track lengths (up to 3 minutes) and enhanced compositional control, including specific elements like intros and choruses. This model is being integrated across various Google products and platforms, including Vertex AI, Google AI Studio, Gemini API, Google Vids, the Gemini app, and ProducerAI, to provide scalable, high-fidelity music generation capabilities for diverse users from app developers to individual creators. The development prioritizes responsible AI, with features to prevent artist mimicry and protect intellectual property, alongside imperceptible watermarking for AI-generated content.

Gemini 3.1 Flash Live: Enhanced Audio AI for Real-time Dialogue and Complex Tasks

Gemini 3.1 Flash Live is Google DeepMind's latest audio and voice model, designed for real-time dialogue and complex task execution. It demonstrates significant improvements in reasoning, instruction following, and tonal understanding, making it suitable for developers, enterprises, and general users across various Google products. The model also features an imperceptible audio watermark for AI-generated content detection.

Cognitive Framework for AGI Evaluation

DeepMind proposes a cognitive taxonomy to empirically assess AGI progress, identifying 10 key cognitive abilities crucial for general intelligence in AI. This framework moves beyond theoretical discussions by establishing a structured evaluation protocol. It compares AI system performance against human baselines across diverse cognitive tasks, employing a three-stage evaluation process to map AI capabilities relative to human performance distributions.
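Mapping AI capabilities "relative to human performance distributions" can be illustrated with a small sketch (the scores below are invented for illustration and are not DeepMind's taxonomy or data): given a sample of human scores on one cognitive task, an AI system's score is converted to the fraction of humans it meets or exceeds.

```python
from bisect import bisect_right

def human_percentile(ai_score, human_scores):
    """Place an AI system's score within a human score distribution,
    returning the fraction of human scores it meets or exceeds."""
    ranked = sorted(human_scores)
    return bisect_right(ranked, ai_score) / len(ranked)

# Hypothetical human scores on one cognitive task (0-100 scale).
humans = [35, 42, 50, 55, 61, 68, 72, 80, 88, 95]
print(human_percentile(76, humans))  # meets or exceeds 7 of 10 -> 0.7
```

Repeating this per ability in the taxonomy yields a capability profile rather than a single headline number, which is the point of benchmarking against human distributions instead of a pass/fail threshold.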

Quantifying AI Manipulation: A Framework for Measuring Behavioral Efficacy and Propensity

Google DeepMind has developed a standardized evaluation framework to quantify AI's capacity for 'harmful manipulation'—defined as exploiting cognitive vulnerabilities to induce harmful choices. By measuring both propensity (frequency of tactics) and efficacy (actual behavioral change) across diverse cohorts and domains, the research establishes that manipulation capabilities are domain-specific and significantly amplified by explicit prompting.
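The two measurements can be sketched as simple rates (the transcripts, tactic labels, and cohort data here are invented for illustration, not the study's actual instrument): propensity is the fraction of conversations in which the model attempts a flagged tactic, while efficacy compares harmful-choice rates between an exposed cohort and a control cohort.

```python
def propensity(transcripts, flagged_tactics):
    """Fraction of conversations in which at least one flagged tactic appears."""
    hits = sum(1 for t in transcripts
               if any(tag in flagged_tactics for tag in t["tactics"]))
    return hits / len(transcripts)

def efficacy(exposed_choices, control_choices):
    """Difference in harmful-choice rate (1 = harmful choice made)
    between the AI-exposed cohort and the control cohort."""
    rate = lambda xs: sum(xs) / len(xs)
    return rate(exposed_choices) - rate(control_choices)

# Invented example data: tactic tags per conversation, then cohort outcomes.
transcripts = [
    {"tactics": ["fear"]}, {"tactics": []},
    {"tactics": ["urgency", "fear"]}, {"tactics": []},
]
print(propensity(transcripts, {"fear"}))     # 2 of 4 conversations -> 0.5
print(efficacy([1, 1, 0, 1], [0, 1, 0, 0]))  # 0.75 - 0.25 = 0.5
```

Separating the two matters because a model can attempt manipulative tactics often without changing behavior (high propensity, low efficacy), or vice versa, and the mitigations differ.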

Google DeepMind Expands AI Initiatives in India for Scientific and Educational Advancement

Google DeepMind is broadening its National Partnerships for AI initiative to India, focusing on integrating advanced AI capabilities into the country's science and education sectors. This strategic collaboration involves providing frontier AI models, fostering research through initiatives like the Google.org Impact Challenge: AI for Science, and transforming educational practices with AI-powered learning tools. The goal is to accelerate scientific discovery, enhance learning outcomes, and address India's national priorities in AI adoption.

Gemini 3 Deep Think: Advancing Frontier Reasoning in STEM and Engineering

Gemini 3 Deep Think is a specialized reasoning mode designed for high-complexity scientific research and engineering. It demonstrates state-of-the-art performance across rigorous benchmarks in mathematics, competitive programming (Codeforces Elo 3455), and abstract reasoning (ARC-AGI-2), while showing practical utility in identifying logical flaws in peer-reviewed research and optimizing physical material fabrication.

Gemini Deep Think Advances Human-AI Collaboration in Scientific Discovery

Gemini Deep Think, leveraging agentic reasoning, has advanced beyond Olympiad-level problem solving to contribute to professional research in mathematics, physics, and computer science. This involves autonomous research, AI-guided collaboration, and semi-autonomous evaluation of open problems, demonstrating its utility in complex, open-ended scientific challenges. The system is described as a "force multiplier" for human intellect, handling knowledge retrieval and rigorous verification, enabling scientists to focus on conceptual depth and creative direction.

Nano Banana 2: Google’s New Flash Image Model Combines Pro-Level Capabilities with High Speed

Google has released Nano Banana 2 (Gemini 3.1 Flash Image), a new image generation model that integrates the advanced intelligence and creative controls of Nano Banana Pro with the high-speed processing of Gemini Flash. This model significantly expands access to sophisticated image manipulation features, offering advanced world knowledge, precise text rendering, enhanced creative control, and robust provenance tools. Nano Banana 2 aims to provide a versatile solution for diverse workflows, from rapid iterative design to highly accurate, production-ready visual content across various Google platforms.

Gemini 3.1 Pro: Scaling Core Reasoning for Complex Synthesis and Creative Coding

Gemini 3.1 Pro provides a significant leap in core reasoning capabilities over its predecessor, specifically targeting complex problem-solving and agentic workflows. It demonstrates advanced proficiency in code-based creative generation, such as animated SVGs and interactive 3D interfaces, while showing a marked performance increase in logic-pattern benchmarks.

Nano Banana 2: Google DeepMind's Latest Image Model Prioritizes Speed and Accessibility

Google DeepMind has launched Nano Banana 2 (Gemini 3.1 Flash Image), an advanced image generation model that integrates the high-speed intelligence of Gemini Flash with the advanced capabilities previously exclusive to Nano Banana Pro. This release aims to democratize access to sophisticated image generation features, such as advanced world knowledge, precise text rendering, and enhanced creative control, while maintaining rapid processing speeds. The model is being rolled out across various Google products, including the Gemini app, Google Search, AI Studio, and Google Cloud, demonstrating a broad integration strategy. Furthermore, Google DeepMind continues to emphasize content provenance through the integration of SynthID and C2PA Content Credentials for AI-generated media.

D4RT: Unified 4D Scene Reconstruction via Parallelizable Spatial-Temporal Queries

D4RT is a unified encoder-decoder Transformer designed for Dynamic 4D Reconstruction and Tracking, replacing fragmented specialized models with a single query-based framework. By mapping 2D pixels to 3D space and time via parallelizable queries, it achieves significant latency reductions (up to 300x) while performing point tracking, point cloud reconstruction, and camera pose estimation. This efficiency enables potential real-time deployment in robotics and AR spatial computing.

Google DeepMind

Google DeepMind, led by Demis Hassabis, is at the forefront of AI development, aiming for Artificial General Intelligence (AGI). The company has successfully integrated its AI research into Google's product ecosystem, demonstrating significant advancements in models like Gemini. Despite the competitive landscape and concerns about an "AI bubble," DeepMind emphasizes responsible AI development and a scientific approach to achieve breakthroughs in various fields.

Project Genie: Advancing World Models for Interactive Environment Generation

Project Genie is an experimental research prototype powered by Genie 3, Nano Banana Pro, and Gemini, enabling users to create, explore, and remix interactive virtual worlds. This platform advances world model capabilities by providing real-time environment generation, dynamic physics simulation, and diverse interaction possibilities, moving beyond static 3D snapshots. It aims to broaden access to and gather user feedback on world models for AI research and generative media.

Gemma Scope 2: Open-Source Interpretability Tools for Large Language Models

Gemma Scope 2 is an open-source suite of interpretability tools designed to enhance understanding of large language model (LLM) internal processes. It provides full coverage for the Gemma 3 family of models, from 270M to 27B parameters, enabling researchers to debug emergent behaviors, audit AI agents, and develop safety interventions. The toolkit utilizes sparse autoencoders (SAEs) and transcoders to visualize and analyze model decision-making, addressing critical issues like jailbreaks, hallucinations, and sycophancy.
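The core mechanism named above can be sketched generically (this is a textbook sparse-autoencoder forward pass in NumPy with random weights, not Gemma Scope 2's trained SAEs): a model activation is projected into a wider, overcomplete feature space and then reconstructed, so each feature can be inspected individually.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 16, 64  # SAEs are overcomplete: more features than dims

W_enc = rng.normal(0, 0.1, (d_model, d_features))
W_dec = rng.normal(0, 0.1, (d_features, d_model))
b_enc = np.zeros(d_features)

def sae_forward(x):
    """Encode an activation vector into non-negative features, then
    reconstruct it. ReLU zeroes inactive features; training with an L1
    penalty on `f` is what drives most features to zero (sparsity)."""
    f = np.maximum(0.0, x @ W_enc + b_enc)  # feature activations
    x_hat = f @ W_dec                        # reconstruction
    return f, x_hat

x = rng.normal(size=d_model)                 # stand-in for a model activation
f, x_hat = sae_forward(x)
print(f"active features: {int((f > 0).sum())} of {d_features}")
```

Interpretability work then consists of naming what each trained feature fires on and intervening on `f` to test causal hypotheses about behaviors like sycophancy or jailbreaks.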
