May 1 PM: DiLoCo self-healing tested by real failures & Self-reviewing agents & Builders' GitHub priorities
This morning we flagged Developer Code Pushes and Open Source Stars. Here's how it resolved.
DiLoCo Real-World Resilience
DeepMind says AI training can survive hardware failures without ever stopping. The counters say the tests were too clean.
Google DeepMind published several updates describing how Decoupled DiLoCo integrates Pathways and DiLoCo. The method lets training continue across four US regions on low-bandwidth networks while mixing TPUv5p and TPUv6e hardware. They demonstrated it on a 12B Gemma model with no performance loss, isolating disruptions and reintegrating recovered units automatically. [1]

The explicit counters cut directly against the self-healing narrative: "The tests relied on simulated 'artificial' hardware failures rather than unpredictable real-world ones, which could involve correlated failures, data corruption, or network issues not addressed. Continuous operation may come at the cost of slower effective training throughput during degraded states, and reintegration could introduce synchronization." [2] Other counters note risks of model divergence or compensatory steps that recreate the original bottlenecks.

The positions add up to an emerging view: asynchronous, self-healing training is promising for cost and uptime but remains unproven at frontier scale against genuine failures. A smart non-specialist should care because uninterrupted training across cheap, heterogeneous hardware could lower the barrier for smaller labs and change the economics of scaling. Think of it as shifting from a single expensive data-center power plant to a self-repairing national grid. Yet the moderate-strength counters mean we should not bet entire roadmaps on it yet. This connects to agent work because reliable infrastructure makes experimentation with complex self-reviewing agents cheaper. [3][4]
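To make the mechanism concrete, here is a toy sketch of the pattern DiLoCo builds on: workers take many cheap local SGD steps between infrequent "outer" synchronizations, and a failed worker simply drops out of the outer average instead of halting the run. This is not DeepMind's code; the quadratic loss, shard values, and failure probability are invented for illustration.

```python
import random

random.seed(0)  # make the simulated failures reproducible

def local_sgd(theta, data, steps=50, lr=0.05):
    """Run `steps` of SGD on f(x) = (x - d)^2 over this worker's shard."""
    for _ in range(steps):
        d = random.choice(data)
        theta -= lr * 2 * (theta - d)  # gradient of (theta - d)^2
    return theta

shards = [[1.0, 1.2], [0.8, 1.1], [0.9, 1.0], [1.3, 0.7]]  # four "regions"
theta = 0.0  # shared (outer) parameters

for outer_step in range(20):
    # Simulate an unreliable fleet: any worker may fail this round.
    healthy = [s for s in shards if random.random() > 0.25]
    if not healthy:
        continue  # nobody reported back; keep old params and retry
    # Each healthy worker returns a parameter delta from its local run.
    deltas = [local_sgd(theta, s) - theta for s in healthy]
    # Outer update averages only the survivors, so training never halts;
    # a recovered worker just picks up the current theta next round.
    theta += sum(deltas) / len(deltas)

print(theta)  # settles near the global data mean (~1.0)
```

The key property the sketch shows is that the outer step is tolerant by construction: failures shrink the average rather than block it. What it does not capture is the counters' concern, namely correlated failures, stale deltas from slow workers, and the synchronization cost of reintegration at real scale.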
“Decoupled DiLoCo integrates Pathways and DiLoCo to enable continuous AI model training across multiple data centers without halting due to chip failures or synchronization issues.” — Google DeepMind [1]
Sources (4)
- X post 2026-04-25 — Google DeepMind: “Decoupled DiLoCo integrates Pathways and DiLoCo to enable continuous AI model training across multiple data centers without halting due to chip failures or synchronization issues.”
- Counter analysis 2026-04-25 — Analysis of Claims: “The tests relied on simulated 'artificial' hardware failures rather than unpredictable real-world ones, which could involve correlated failures, data corruption, or network issues not addressed.”
- X post 2026-04-25 — Google DeepMind: “Demonstrated by training a 12B Gemma model over four US regions on low-bandwidth networks and mixing TPUv5p with TPUv6e without performance loss.”
- Counter analysis 2026-04-25 — Analysis of Claims: “Decoupling may reduce some stalls but could introduce model divergence or require compensatory synchronization steps that create equivalent bottlenecks.”
Self-Reviewing Agent Architectures
Harrison Chase describes frameworks that are both batteries-included and deeply customizable, including self-reviewing subagents.
Harrison Chase highlighted DeepAgents as a complete framework that ships with pre-built components yet offers precise hooks, so developers do not start from scratch. He also pointed to ListenLabs' architecture with self-reviewing feedback subagents, sandboxed environments, and purpose-built abstractions for large-scale response analysis. [1] A dedicated LangChain skills repository at langchain-ai/langchain-skills was confirmed, with config/skills content for standardized definitions. [3]

The counters note that phrases like "it's all you need" are common marketing and that claims of completeness lack independent benchmarks. [4] Karpathy starring LlamaIndex (48k stars; focused on document agents and OCR) adds weight to the view that mature, extensible agent platforms are now table stakes. [2]

The positions converge: the winning agent stacks will be opinionated enough for speed yet open enough for deep surgery. No one is arguing against customization anymore; the emerging pattern is self-review loops and sandboxes as standard primitives for making agents reliable at scale. For a founder shipping customer-facing tools, this means you can build sophisticated agents faster without hiring a full agent research team, but you still must validate that the self-review actually catches hallucinations. Analogy: it is like getting both a high-level framework and the ability to drop to raw CUDA when needed. This is substantively deeper than prior agent-rebuild-frenzy coverage because we now have concrete mentions of self-review subagents and a live skills repo. [5]
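The self-review pattern described above can be sketched in a few lines: a worker drafts, a reviewer subagent critiques, and the draft is revised until the reviewer approves or a retry budget runs out. This is a hypothetical illustration, not the DeepAgents or ListenLabs API; `draft` and `review` are stand-ins for LLM calls, and the keyword check is a deliberately trivial placeholder for a real critique model.

```python
# Hypothetical self-review loop; all function names are invented stand-ins.

def draft(task, feedback=None):
    """Stand-in for an LLM call; notes the critique it revised against."""
    if feedback:
        return f"answer({task}) [revised per: {feedback}]"
    return f"answer({task})"

def review(candidate):
    """Stand-in for a reviewer subagent. Here: a trivial marker check;
    a real reviewer would be another model call with its own prompt."""
    if "[revised" not in candidate:
        return False, "cite sources"
    return True, None

def run_with_self_review(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        candidate = draft(task, feedback)
        ok, feedback = review(candidate)
        if ok:
            return candidate
    # Bounded retries keep a broken reviewer from looping forever.
    raise RuntimeError("self-review budget exhausted; escalate to a human")

print(run_with_self_review("summarize churn survey"))
```

The design point is the bounded loop: self-review only improves reliability if the reviewer's failure mode is "reject too much" rather than "approve everything," which is exactly the validation gap the counters flag.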
“DeepAgents is positioned as an all-in-one solution for agent development, providing batteries-included convenience alongside extensive hooks for customization.” — Harrison Chase [1]
Sources (5)
- X post 2026-04-25 — Harrison Chase: “DeepAgents is positioned as an all-in-one solution for agent development, providing batteries-included convenience alongside extensive hooks for customization.”
- GitHub star 2026-04-25 — Andrej Karpathy: “LlamaIndex is the leading document agent and OCR platform.”
- X post 2026-04-25 — Harrison Chase: “Harrison Chase confirmed the availability of LangChain skills configurations in a dedicated GitHub repository.”
- Counter analysis 2026-04-25 — Analysis of Claims: “The slogan 'it’s all you need' is typical marketing language seen in many incomplete frameworks and does not constitute evidence of being truly complete or standalone.”
- X post 2026-04-25 — Harrison Chase: “ListenLabs employs self-reviewing feedback subagents, sandboxed environments, and purpose-built abstractions in their AI agent architecture.”
Builders' GitHub Priorities
What top AI figures actually push and star overnight tells us more than announcements.
Overnight we saw Garry Tan land several commits to gbrain, consistent with his noted position shift toward software development and developer tooling. [1] Jim Fan starred Mamba (a state-space model architecture for efficient sequence modeling), Kornia (geometric computer vision for spatial AI), and Fiddle (a configuration library). [2] Karpathy starred ThunderKittens (tile primitives for fast kernels, already used in production at places like Cursor) and reinforced his agent interest via LlamaIndex. [3] Ben Thompson starred whitenoise, which radically simplifies static file serving in Python web apps.

These are not random. They cluster on performance (kernels, SSMs), spatial understanding, practical agent indexing, and dev ergonomics. The overnight insights explicitly flag Garry Tan's move from higher-level platforms to hands-on code updates and nvm-style tooling. The synthesis: public GitHub activity has become a clearer signal than posts, and top builders are doubling down on the low-level pieces that make both training and agents faster and more reliable in production.

A founder should care because these repos are where libraries and techniques will ship into products first; investing time or contributions here is effectively placing early bets on the stack that will dominate the next 12 months. Analogy: it is like watching which AWS primitives the best startups were using in 2012. This thread resolves the AM open question by showing the activity was focused, not scattered. [4][5]
“karpathy starred HazyResearch/ThunderKittens: Tile primitives for speedy kernels” — Andrej Karpathy [3]
Sources (5)
- GitHub push 2026-04-25 — Garry Tan: “garrytan pushed to garrytan/gbrain: code update”
- GitHub star 2026-04-27 — Jim Fan: “drjimfan starred state-spaces/mamba: Mamba SSM architecture”
- GitHub star 2026-04-25 — Andrej Karpathy: “karpathy starred HazyResearch/ThunderKittens: Tile primitives for speedy kernels”
- GitHub star 2026-04-30 — Ben Thompson: “benthompson starred evansd/whitenoise: Radically simplified static file serving for Python web apps”
- GitHub push 2026-05-01 — Yulun Wang: “yulunwang pushed to yulunwang/Citadels: code update”
The open question: If training never halts, agents self-review before acting, and builders converge on the same performance kernels, does the main bottleneck for progress simply move from hardware reliability to something we are not measuring yet?
Transcript
REZA: This morning we flagged Developer Code Pushes and Open Source Stars. Here's how it resolved.
MARA: Multiple pushes to gbrain, stars on ThunderKittens, Mamba and LlamaIndex.
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: DeepMind dropped multiple posts on Decoupled DiLoCo. They say it enables continuous AI model training without stopping due to hardware failures.
MARA: So if that's true then training clusters become far more efficient. Smaller labs without perfect hardware could keep iterating.
REZA: Hold on. The claim is self-healing that isolates disruptions and reintegrates units on mixed TPUs across four regions.
MARA: But one counter says the tests relied on simulated artificial hardware failures rather than unpredictable real-world ones.
REZA: Exactly. The crux is whether those artificial tests match real correlated failures, data corruption or network issues.
MARA: And it notes continuous operation may come at the cost of slower effective training throughput during degraded states.
REZA: They showed a 12B Gemma model with no performance loss on low bandwidth. But no frontier-scale data yet on real outages.
MARA: No real counter on the heterogeneous TPU mixing part. If even half of it holds then multi data center training changes for everyone.
REZA: The reintegration without full resync is the part that could ship soon. That alone would cut idle time dramatically.
MARA: Which honestly shifts the timeline on who can afford to train at scale. The simulation gap still matters though.
REZA: Yeah, we need real-world benchmarks before calling it production ready. The moderate-strength counters are fair.
REZA: Harrison Chase posted on DeepAgents and ListenLabs. He positions them as all-in-one yet full of customization hooks.
MARA: So if that's true developers get batteries-included speed without being locked in. That changes how fast teams can ship agents.
REZA: He specifically called out self-reviewing feedback subagents and sandboxed executions for large-scale analysis.
MARA: But the counter calls it typical marketing. The slogan "it's all you need" does not prove the framework is truly standalone.
REZA: Karpathy starred LlamaIndex the same day. Leading document agent and OCR platform with nearly 49k stars.
MARA: Okay, but if the self-review actually works then agents become safer for real customer use. No more constant human oversight.
REZA: The LangChain skills repo also went live with config definitions. That standardizes what agents can do across teams.
MARA: The evidence is thin on benchmarks. Still, the convergence on both convenience and hooks is notable.
REZA: The actual claim here is not zero to hero but rather adjustable components without starting from scratch.
MARA: Which for founders means faster product cycles. The sandbox part especially could limit damage from bad agent steps.
REZA: No one is debating the need for customization anymore. The discussion has moved to how good the self-review loops get.
REZA: This morning's open thread on developer pushes resolved with concrete activity. Garry Tan pushed multiple updates to gbrain.
MARA: Jim Fan starred Mamba for SSMs, Kornia for spatial AI and Fiddle for configs. That cluster is not random.
REZA: Karpathy starred ThunderKittens for tile primitives that speed up kernels. Also LlamaIndex again.
MARA: So if that's true the real velocity is in low-level performance and practical agent tooling. Not new models.
REZA: Ben Thompson starred whitenoise for simplified Python static files. Yulun Wang pushed Citadels updates too.
MARA: The position shift on Garry Tan from platforms to direct code updates tracks with what we saw overnight.
REZA: These stars and pushes are clearer signals than tweets. ThunderKittens is already used in production inference.
MARA: Which means founders should watch or contribute to exactly these repos. They show where libraries will ship first.
REZA: The pattern adds up to focus on speed, spatial understanding and agent plumbing. That matches the resilient infra thread.
MARA: No real counter on the signal value. Quiet GitHub work often precedes the next big capability jump.
REZA: The open thread resolved with targeted activity. Not noise. This is where the next velocity comes from.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.



