absorb.md — A knowledge graph of what AI thinkers are actually saying

tweet / @GoogleDeepMind / Apr 20

Gemini 3.1 Flash TTS Enables Precise Voice Control via Text-Based Audio Tags

Google DeepMind's Gemini 3.1 Flash TTS introduces Audio Tags for fine-grained control over vocal style, delivery, and pace using simple text commands. It produces more natural-sounding speech, supports over 70 languages including Hindi, Japanese, and German, and embeds SynthID watermarks in all outputs. Access is available via Gemini API and Google AI Studio for developers, Vertex AI preview for enterprises, and Google Vids for general users.

gemini-3.1-flash text-to-speech tts-audio-tags multilingual-support synthid-watermarking google-deepmind

“Gemini 3.1 Flash TTS is the most controllable text-to-speech model from Google DeepMind”

tweet / @GoogleDeepMind / Apr 20

Gemini 3.1 Flash TTS Introduces Audio Tags for Precise Voice Control Across 70+ Languages

Gemini 3.1 Flash TTS enables fine-grained control over vocal style, delivery, and pace using simple text-based Audio Tags. It delivers more natural-sounding speech in over 70 languages including Hindi, Japanese, and German, with SynthID watermarking embedded in all outputs. Developers can preview it via Gemini API and Google AI Studio, enterprises through Vertex AI, and general users via Google Vids.

gemini-tts text-to-speech audio-tags multilingual-support synthid-watermarking google-deepmind ai-api

“Gemini 3.1 Flash TTS is the most controllable text-to-speech model from Google DeepMind”

tweet / @GoogleDeepMind / Apr 20

Gemini 3.1 Flash TTS Launches with Platform-Specific Previews and Broad Accessibility

Gemini 3.1 Flash TTS, which Google DeepMind calls its most controllable TTS model, is now rolling out. It features Audio Tags for style, delivery, and pace control, more natural speech, support for 70+ languages, and SynthID watermarking. Developers can access the preview via Gemini API and Google AI Studio, enterprises via Vertex AI, and general users via Google Vids, so the rollout covers developer, enterprise, and consumer platforms.

gemini-tts text-to-speech google-deepmind ai-models audio-generation tts-features

“Gemini 3.1 Flash TTS is available in preview via Gemini API and Google AI Studio for developers.”

tweet / @GoogleDeepMind / Apr 20

Gemini Robotics Models Enable Natural Language Control of Boston Dynamics' Spot Robot

Google DeepMind integrated Gemini Robotics embodied reasoning models with Boston Dynamics' Spot robot, allowing it to perceive surroundings, identify objects, and execute tasks like room tidying via plain English commands. A software bridge provides Spot with tools for mobility, imaging, and manipulation, bypassing complex coding. This setup demonstrates end-to-end embodied AI for real-world robotics.

google-deepmind boston-dynamics robotics gemini-model embodied-reasoning spot-robot ai-robotics

“Google DeepMind collaborated with Boston Dynamics to integrate Gemini Robotics embodied reasoning models into Spot robot”

tweet / @GoogleDeepMind / Apr 20

Gemini Robotics Models Enable Plain English Control of Spot Robot via DeepMind-Boston Dynamics Integration

Google DeepMind integrated Gemini Robotics embodied reasoning models with Boston Dynamics' Spot robot, allowing it to perceive surroundings, identify objects, and execute tasks like room tidying from plain English commands. This replaces complex coding with natural language interaction through a bridge providing Spot with mobility, imaging, and manipulation tools. The setup demonstrates end-to-end embodied AI for complex real-world tasks without custom programming.

deepmind gemini-robotics boston-dynamics spot-robot embodied-reasoning robotics-ai natural-language-control

“Google DeepMind partnered with Boston Dynamics to integrate Gemini Robotics embodied reasoning models into the Spot robot.”

youtube / GoogleDeepMind / Apr 13

Demis Hassabis on AI's Dual Future: Accelerating Progress and Mitigating Catastrophic Risks

Demis Hassabis envisions AI accelerating scientific discovery, particularly in drug development, through self-improving algorithmic loops. He outlines a process where AI designs and virtually tests compounds, dramatically increasing efficiency. However, he also raises concerns about the dual-use nature of advanced AI, highlighting risks from malicious actors and the challenge of ensuring AI alignment and control as systems become more capable and autonomous.

demis-hassabis deepmind agi-safety drug-discovery ai-capabilities future-of-ai ai-ethics

“AI can significantly expedite drug discovery by simulating compound interactions and optimizing for efficacy and safety.”

youtube / GoogleDeepMind / Apr 13

AI to Democratize Filmmaking and Personalize Content

AI is poised to revolutionize filmmaking by drastically reducing production costs and enabling greater creative control for independent creators. This shift will lead to a renaissance in indie film, particularly in documentaries, and facilitate highly personalized content experiences. Google DeepMind is actively contributing to this future through initiatives like Google Flow, which offers accessible video generation tools, and Project Genie, focused on interactive world models crucial for advancing AGI.

ai-creative-tools generative-ai film-production google-deepmind ai-models deepmind-products

“AI will significantly lower filmmaking costs, democratizing the industry and fostering a new era of independent and auteur filmmaking.”

youtube / GoogleDeepMind / Apr 13

Robot Constitutions for Aligned AI Behavior

Modern AI, particularly large language models, can interpret and adhere to "robot constitutions": high-level principles governing behavior that were previously hard to implement. This approach to AI alignment uses textual constitutions to guide robot actions, and the resulting behavior aligns with human preferences significantly better than the AI behavior depicted in science fiction. The research indicates that automatically generated and optimized constitutions, drawing on diverse sources such as sci-fi scenarios, images, and injury reports, can effectively safeguard against undesirable AI behaviors and offer a scalable solution for ethical AI deployment.

robot-ethics ai-safety robot-ai llm-constitution ai-alignment moral-philosophy human-robot-interaction

“Science fiction generally misrepresents AI behavior, often portraying misalignment with human preferences due to plot devices like misinterpreting directives or lacking common sense.”

youtube / GoogleDeepMind / Apr 12

DeepMind and NVIDIA Chiefs Chart Path to Autonomous AI Agents via Low-Latency Inference and Self-Improving Architectures

ML models have advanced dramatically on verifiable tasks like math and coding, earning gold medals at the IMO and ICPC, while agentic workflows now enable hours-long autonomous operation with self-correction. NVIDIA targets 10k-20k tokens/sec per user by pushing on-chip and off-chip communication latency toward speed-of-light limits through static scheduling and simplified PHYs. Self-improvement emerges via natural-language-directed experiments in neural architecture search (NAS) and distillation. Inference dominates workloads (90% of power), demanding specialized hardware for the prefill, attention, and decode stages. Future scaling draws on untapped video and robotics data, synthetic generation, and action-interleaved pretraining beyond Chinchilla scaling laws.

machine-learning-advances llm-inference hardware-architecture ai-agents ethical-ai-use ai-hardware-codesign education-technology

“Gemini model achieved a gold medal in the IMO contest and ICPC coding contest”
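
To put the 10k-20k tokens/sec per-user target in this entry's summary into perspective, the following minimal sketch converts a decode rate into the time available per generated token. The function name `per_token_budget_us` is hypothetical and the calculation is plain arithmetic (one second divided by the token rate); it is not taken from the talk and says nothing about how NVIDIA's hardware actually spends that time.

```python
def per_token_budget_us(tokens_per_sec: float) -> float:
    """Return the time available per generated token, in microseconds.

    Illustrative helper (not from the source): 1 second = 1,000,000 us,
    so the per-token budget is simply 1,000,000 / tokens_per_sec.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return 1_000_000 / tokens_per_sec


# The two ends of the per-user range quoted in the summary.
for rate in (10_000, 20_000):
    print(f"{rate} tokens/sec -> {per_token_budget_us(rate):.0f} us per token")
```

At these rates, each token for a single user has to be produced within roughly 100 to 50 microseconds, which suggests why communication latency, rather than raw compute alone, is singled out in the summary.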

youtube / GoogleDeepMind / Apr 12

Navigating the Agentic AI Landscape: Speed, Quality, and Human-Agent Collaboration

This talk explores the rapidly evolving field of agentic AI, focusing on the tension between AI-driven speed and the need for human-centric quality and control. Key themes include the shift in software engineering bottlenecks from intelligence to human attention, the emergence of faster AI models, and strategies for effective human-agent collaboration in complex software development workflows. The emphasis is on building agent-legible codebases, leveraging agents for tasks like refactoring and documentation, and rethinking evaluation and control mechanisms to ensure high-quality, tasteful software in an increasingly agent-driven world.

ai-engineering-conference llm-development agentic-systems software-development-lifecycle ai-ethics developer-experience

“The traditional bottleneck in software engineering has shifted from intelligence to human attention, making it crucial to manage agent interactions effectively to scale development.”

youtube / GoogleDeepMind / Apr 9

Google CEO Sundar Pichai on AI's Impact and Google's Vision

Sundar Pichai reflects on Google's decade of AI leadership, highlighting the company's foundational role in AI advancements like Transformers, which were developed internally to solve product challenges. He addresses the perception of Google falling behind in the "AI race," asserting that Google had advanced AI products like LaMDA internally but exercised caution in public release due to quality and safety concerns. Pichai outlines Google's strategic focus on speed and efficiency in AI products, the evolution of Search into an agentic future, and the company's long-term investments in AI infrastructure and various cutting-edge projects.

ai-strategy llm-development product-innovation google-search capital-allocation ai-infrastructure organizational-change

“Transformers were invented at Google to solve specific product needs, such as improving translation and solving inference for speech recognition at scale.”

youtube / GoogleDeepMind / Apr 9

The Empirical Revolution in AI: From Rule-Based Systems to Emergent Intelligence

AI development has shifted significantly from symbolic, rule-based approaches to empiricist, data-driven learning. Early AI struggled to codify common sense and handle the messy, exception-filled nature of the real world, unlike modern large language models. These models, by processing massive datasets and leveraging architectural innovations like the Transformer, achieve complex reasoning and generalization through statistical prediction, mirroring aspects of biological intelligence.

ai-safety neural-networks nlp cognitive-neuroscience machine-learning-history llm-capabilities agentic-ai

“Early AI research was predominantly based on a "rationalist" or "symbolic" school of thought, attempting to create structured systems based on predefined rules.”

youtube / GoogleDeepMind / Apr 9

Redefining Intelligence & AI Collaboration

AI's rapid development necessitates a re-evaluation of fundamental philosophical questions regarding mathematics and human thought. The paper "Mathematical Methods and Human Thought in the Age of AI" by Terence Tao and Tanya Klowden proposes a "Copernican view of intelligence," advocating a collaborative approach with AI rather than a focus on a singular, linear progression of intelligence. This perspective emphasizes appreciating diverse forms of intelligence (human, computer, and collaborative) to unlock novel possibilities and overcome current limitations.

ai-ethics human-computer-interaction philosophy-of-ai ai-collaboration interdisciplinary-research mathematics-and-ai

“The development of AI compels a philosophical re-evaluation of mathematics and science.”

youtube / GoogleDeepMind / Apr 8

Persuasive AI: Understanding and Mitigating Manipulation Risks

AI models capable of persuasion pose significant manipulation risks, necessitating robust safety research. Google DeepMind’s research framework defines manipulation by intent and method, distinguishing beneficial persuasion (fact-based) from harmful manipulation (emotion/bias exploitation). Findings emphasize context-specific manipulation efficacy and the critical role of explicit manipulative goals in AI behavior, underscoring the need for continuous, iterative safety evaluations before widespread deployment.

ai-safety ai-ethics persuasive-ai human-computer-interaction safety-research

“Harmful AI manipulation is distinct from beneficial persuasion, primarily differing in intent and method.”

youtube / GoogleDeepMind / Apr 8

Reflection AI's Strategy for Frontier Open-Weight Agentic Intelligence

Reflection AI aims to challenge the dominance of closed-source labs and Chinese open models by developing frontier-scale, open-weight agentic AI. By leveraging MoE architectures and deep reinforcement learning, they intend to provide a transparent, customizable alternative that enables complex tool use and autonomous task execution. Their strategy hinges on attracting top-tier talent from DeepMind and OpenAI to implement 'open science' principles at a frontier scale.

open-source-ai agentic-models deepmind-alumni ai-startups frontier-ai ai-policy model-architectures

“Reflection AI is developing a family of open-weight, frontier agentic models capable of multi-step reasoning and end-to-end task completion.”

youtube / GoogleDeepMind / Apr 7

Demis Hassabis: Guiding AI Towards AGI While Mitigating Risks

Demis Hassabis, co-founder and CEO of Google DeepMind, is leading the charge toward Artificial General Intelligence (AGI), aiming for systems with human-like versatility but superhuman speed and knowledge within the next 5-10 years. DeepMind is developing multimodal AI models like Project Astra, which can interpret and interact with the world through vision and hearing, and Gemini, which will act in the world by performing tasks like booking tickets. Hassabis acknowledges the exponential progress in AI and the challenge of ensuring these increasingly autonomous systems remain aligned with human values and safety guardrails, especially given the competitive landscape of AI development.

agi demis-hassabis google-deepmind ai-safety robotics artificial-intelligence

“DeepMind aims to achieve Artificial General Intelligence (AGI) within 5-10 years, creating systems as versatile as humans but with superhuman speed and knowledge.”

youtube / GoogleDeepMind / Apr 7

Google DeepMind's Strategic Roadmap: Multimodality, AGI Timelines, and AI for Science

Google DeepMind is leveraging native multimodality to develop real-world digital assistants and robotics, while positioning Gemini as the engine for Google's broader product ecosystem. The organization is pivoting toward 'unequivocal goods'—AI for medicine and material science—to mitigate societal tech-lash and establish long-term value beyond the current AI investment cycle.

ai-models google-deepmind gemini ai-development multimodal-ai agi ai-ethics material-science-ai drug-discovery-ai ai-strategy

“Gemini's native multimodality is the foundational requirement for real-world AI assistants and robotics.”

youtube / GoogleDeepMind / Apr 7

The Era of Experience: Moving Beyond Human Data in AI

David Silver introduces the "era of experience," a new phase for AI development focusing on systems generating their own data through interaction with the world, rather than relying solely on human-generated data. This approach, exemplified by AlphaGo and AlphaZero, allows AI to discover novel solutions and overcome the limitations inherent in human knowledge. The shift aims to achieve superhuman intelligence by fostering continuous, self-generated learning.

ai-advancements reinforcement-learning human-data-limitations superhuman-ai deepmind-podcast alpha-go mathematical-proof

“AI is transitioning from an 'era of human data' to an 'era of experience.'”

youtube / GoogleDeepMind / Apr 3

The Shift from Coding to Agent Orchestration

The paradigm of software engineering is evolving from manual code authorship to the orchestration of multiple autonomous agents. While benchmark performance (e.g., SWE-bench) has surged due to concurrent improvements in pre-training, reinforcement learning, and tool-use, the primary value of the human developer is shifting toward high-level architectural 'taste' and strategic decision-making. Future gains depend on moving beyond simple code generation toward multimodal environmental interaction (e.g., browser and OS actuation) and solving the problem of continuous, deductive learning.

ai-code-generation llm-agents developer-tools software-development-lifecycle google-deepmind ai-future llm-capabilities

“Software development is shifting from a 'coding' primary activity to 'agent management,' where developers multiplex between 10-20 parallel agents.”

tweet / @GoogleDeepMind / Apr 2

Gemma 4: Enhanced Open Models for Advanced AI Development

Gemma 4, developed by Google DeepMind, represents a new family of open models designed for advanced reasoning and agentic workflows. These models are available through various platforms, including Google AI Studio, Hugging Face, Kaggle, and Ollama, under an Apache 2.0 license, facilitating broad access and integration for developers.

gemma open-models llm-development google-deepmind ai-tools

“Gemma 4 models are available for immediate use in Google AI Studio.”

tweet / @GoogleDeepMind / Apr 2

Gemma 4: Enhanced Open Models for Local AI and Agentic Workflows

Google DeepMind has launched Gemma 4, an open-model family under the Apache 2.0 license, designed for advanced local reasoning, agentic workflows, and on-device AI. The models offer enhanced context capabilities and are available in various sizes optimized for different applications, from large-scale code analysis to real-time mobile processing. This release facilitates the development of autonomous agents with native tool use.

gemma-models open-models llm-development agentic-ai ai-infrastructure model-deployment

“Gemma 4 is an open-model family released under an Apache 2.0 license.”

tweet / @GoogleDeepMind / Apr 2

Gemma 4: Next-Gen Open Models for Advanced AI and Edge Applications

Gemma 4 introduces a new family of open models, available in multiple sizes to cater to diverse computational needs. These models are designed for advanced reasoning, agentic workflows, and efficient on-device processing, featuring enhanced context windows and native tool-use capabilities. The Apache 2.0 licensing facilitates broad adoption and integration.

gemma-4 llm-release open-models edge-ai agentic-workflows code-generation model-weights

“Gemma 4 models are released under an Apache 2.0 license, facilitating open development and deployment.”

tweet / @GoogleDeepMind / Apr 2

Gemma 4: Next-Gen Open Models for Advanced Local AI and Agentic Workflows

Gemma 4 is a new family of open models from Google DeepMind, available under an Apache 2.0 license. These models are designed for advanced local reasoning and agentic workflows, offering various sizes optimized for different applications, from complex coding assistance to real-time mobile processing. Key advancements include enhanced context windows and native tool use capabilities, facilitating the development of sophisticated autonomous agents.

gemma-4 llm-models open-models ai-agents local-llms apache-2.0-license tool-use

“Gemma 4 models are released under an Apache 2.0 license.”

blog / GoogleDeepMind / Apr 2

Gemma 4: Google DeepMind's Most Capable Open Models for Advanced AI Development

Google DeepMind has released Gemma 4, a new family of open models designed for advanced reasoning and agentic workflows. These models prioritize intelligence-per-parameter, offering cutting-edge capabilities with reduced hardware requirements. Gemma 4 includes diverse model sizes (E2B, E4B, 26B MoE, 31B Dense) and is distributed under an Apache 2.0 license, fostering broad accessibility and developer control.

gemma-4 large-language-models open-models ai-agents on-device-ai google-deepmind apache-2.0-license

“Gemma 4 models offer industry-leading intelligence-per-parameter, enabling frontier-level capabilities with less hardware.”

youtube / GoogleDeepMind / Apr 2 / failed

AI is Already Building AI — Google DeepMind’s Mostafa Dehghani - The MAD Podcast with Matt Turck

youtube / GoogleDeepMind / Apr 1

Demis Hassabis: From AI Visionary to Google AI Leader

Demis Hassabis, founder of DeepMind, foresaw AI's transformative potential early on, securing funding when skepticism was high. He initially pursued a collaborative approach to the field but shifted to a competitive mindset with the advent of OpenAI. His blend of scientific depth and product-shipping discipline, honed during his game design career, proved crucial to DeepMind's success, particularly in the merger with Google Brain and the rapid development of Gemini.

demis-hassabis deepmind google-ai ai-leadership ai-history book-review ai-competition

“Demis Hassabis founded DeepMind in 2010, significantly before AI was widely recognized as a major technological force.”

blog / GoogleDeepMind / Apr 1

Gemma 4: Google DeepMind's Most Capable Open Models

Gemma 4 represents Google DeepMind's latest iteration of open models, emphasizing intelligence-per-parameter for advanced reasoning and agentic workflows. Released under an Apache 2.0 license, this family of models aims to democratize access to frontier AI capabilities, enabling deployment across a wide spectrum of hardware from mobile devices to data centers. The release includes diverse model sizes, from mobile-first versions to larger, highly performant models, designed for versatility and efficient fine-tuning.

gemma-4 llm-development open-models ai-agents on-device-ai apache-2.0-license multimodal-ai

“Gemma 4 models offer industry-leading intelligence-per-parameter, outperforming larger models in some benchmarks.”