absorb.md

May 1 PM: DiLoCo self-healing tested by real failures & Self-reviewing agents & Builders' GitHub priorities

This morning we flagged Developer Code Pushes and Open Source Stars. Here's how it resolved.

In This Briefing
1
DiLoCo Real-World Resilience
DeepMind says AI training can survive hardware failures without ever stopping. The counters say the tests were too clean.
0:17
2
Self-Reviewing Agent Architectures
Harrison Chase describes frameworks that are both batteries-included and deeply customizable, including self-reviewing subagents.
2:08
3
Builders' GitHub Priorities
What top AI figures actually push and star overnight tells us more than announcements.
3:53
12 sources · 8 thinkers

DiLoCo Real-World Resilience

DeepMind says AI training can survive hardware failures without ever stopping. The counters say the tests were too clean.

Signal · 7 entries and multiple counter claims from Google DeepMind cluster plus analysis, continuing from 2026-05-01 am: resilient-ai-training with new focus on simulated versus real failures. Why now: frontier models already strain multi-region hardware.
Key Positions
Google DeepMind: Decoupled DiLoCo enables continuous training without halting for chip failures. [1]
Analysis of Claims: Tests used artificial failures only; real-world correlated failures, data corruption, and network issues were not addressed. [2]

Google DeepMind published several updates describing how Decoupled DiLoCo integrates Pathways and DiLoCo. The method lets training continue across four US regions on low-bandwidth networks while mixing TPUv5p and TPUv6e hardware. They demonstrated it on a 12B Gemma model with no performance loss, isolating disruptions and reintegrating recovered units automatically. [1] The explicit counter claims cut directly against the self-healing narrative: "The tests relied on simulated 'artificial' hardware failures rather than unpredictable real-world ones, which could involve correlated failures, data corruption, or network issues not addressed." Continuous operation may also come at the cost of slower effective training throughput during degraded states, and reintegration could introduce synchronization overhead. [2] Other counters note risks of model divergence or compensatory synchronization steps that recreate the original bottlenecks. The positions add up to an emerging view: asynchronous, self-healing training is promising for cost and uptime but remains unproven at frontier scale against genuine failures. A smart non-specialist should care because uninterrupted training across cheap, heterogeneous hardware could lower the barrier for smaller labs and change the economics of scaling. Think of it as shifting from a single expensive power plant to a self-repairing national grid. The moderate-strength counters, though, mean no one should bet an entire roadmap on this yet. It also connects to agent work: reliable infrastructure makes experimentation with complex self-reviewing agents cheaper. [3][4]
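The self-healing behavior described above, isolating a failed worker and reintegrating it without halting the run, can be sketched as a toy simulation. This is a minimal illustrative sketch, not DeepMind's implementation: the `Worker` class, the delta-averaging outer step, and the reintegration rule are hypothetical stand-ins for DiLoCo's actual outer optimization and Pathways orchestration.

```python
import random

class Worker:
    def __init__(self, wid, params):
        self.wid = wid
        self.params = params      # local copy of model parameters
        self.healthy = True

    def local_steps(self):
        # stand-in for many local optimizer steps between syncs
        self.params = [p - 0.01 * random.uniform(-1, 1) for p in self.params]

def outer_sync(global_params, workers):
    """Average parameter deltas from healthy workers only (DiLoCo-style outer step)."""
    healthy = [w for w in workers if w.healthy]
    if not healthy:
        return global_params
    deltas = [[wp - gp for wp, gp in zip(w.params, global_params)] for w in healthy]
    avg = [sum(col) / len(healthy) for col in zip(*deltas)]
    new_global = [gp + d for gp, d in zip(global_params, avg)]
    for w in healthy:
        w.params = list(new_global)  # failed workers are skipped, not waited on
    return new_global

def reintegrate(worker, global_params):
    """A recovered worker rejoins by adopting the current global parameters."""
    worker.healthy = True
    worker.params = list(global_params)

# toy run: one worker fails mid-training and later reintegrates
global_params = [0.0] * 4
workers = [Worker(i, list(global_params)) for i in range(3)]
for step in range(10):
    if step == 3:
        workers[1].healthy = False          # simulated hardware failure
    if step == 7:
        reintegrate(workers[1], global_params)
    for w in workers:
        if w.healthy:
            w.local_steps()
    global_params = outer_sync(global_params, workers)
print(len(global_params), all(w.healthy for w in workers))  # → 4 True
```

The point the counters press on is visible even here: while worker 1 is down, the outer average is computed from fewer replicas (slower effective throughput), and on rejoin it simply adopts global state, which at real scale is where synchronization overhead and divergence risk would live.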

Decoupled DiLoCo integrates Pathways and DiLoCo to enable continuous AI model training across multiple data centers without halting due to chip failures or synchronization issues.
Google DeepMind [1]
Connects to: Reliable training infra lowers the cost of iterating on the self-reviewing agent architectures in thread 2.
Sources (4)
  1. X post 2026-04-25 — Google DeepMind
    Decoupled DiLoCo integrates Pathways and DiLoCo to enable continuous AI model training across multiple data centers without halting due to chip failures or synchronization issues.
  2. Counter analysis 2026-04-25 — Analysis of Claims
    The tests relied on simulated 'artificial' hardware failures rather than unpredictable real-world ones, which could involve correlated failures, data corruption, or network issues not addressed.
  3. X post 2026-04-25 — Google DeepMind
    Demonstrated by training a 12B Gemma model over four US regions on low-bandwidth networks and mixing TPUv5p with TPUv6e without performance loss.
  4. Counter analysis 2026-04-25 — Analysis of Claims
    Decoupling may reduce some stalls but could introduce model divergence or require compensatory synchronization steps that create equivalent bottlenecks.

Self-Reviewing Agent Architectures

Harrison Chase describes frameworks that are both batteries-included and deeply customizable, including self-reviewing subagents.

Signal · 4 entries from Harrison Chase and Andrej Karpathy, deeper take on agent frameworks with new details on self-review, sandboxes, and skills repos since the AM edition. 2 thinkers converging on practical agent stacks.
Key Positions
Harrison Chase: DeepAgents provides all-in-one convenience plus extensive customization hooks. [1]
Andrej Karpathy: Starring LlamaIndex as the leading document agent and OCR platform signals the importance of mature document tooling. [2]

Harrison Chase highlighted DeepAgents as a complete framework that ships with pre-built components yet offers precise hooks so developers do not start from scratch. He also pointed to ListenLabs' architecture with self-reviewing feedback subagents, sandboxed environments, and purpose-built abstractions for large-scale response analysis. [1] A dedicated LangChain skills repository at langchain-ai/langchain-skills was confirmed with config/skills content for standardized definitions. [3] The counters note that phrases like "it's all you need" are common marketing and that claims of completeness lack independent benchmarks. [4] Karpathy's star of LlamaIndex (48k stars, focused on document agents and OCR) adds weight to the view that mature, extensible agent platforms are now table stakes. [2] The positions add up to convergence: the winning agent stacks will be opinionated enough for speed yet open enough for deep surgery. No one is arguing against customization anymore. The emerging pattern is self-review loops and sandboxes as standard primitives to make agents reliable at scale. For a founder shipping customer-facing tools this means you can build sophisticated agents faster without hiring a full agent research team, but you still must validate the self-review actually catches hallucinations. Analogy: it is like getting both a high-level framework and the ability to drop to raw CUDA when needed. This is substantively deeper than prior agent-rebuild-frenzy coverage because we now have concrete mentions of self-review subagents and a live skills repo. [5]
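The self-reviewing subagent pattern described above reduces to a generate-review-revise loop. The sketch below is a hedged illustration, not ListenLabs' or DeepAgents' actual architecture: `draft_answer`, `review_answer`, and the `Review` type are hypothetical stand-ins for LLM-backed subagents, with a trivial check in place of a real reviewer.

```python
from dataclasses import dataclass

@dataclass
class Review:
    approved: bool
    feedback: str

def draft_answer(task: str) -> str:
    # stand-in for the main agent's LLM call
    return f"Draft response to: {task}"

def review_answer(task: str, answer: str) -> Review:
    # stand-in for a reviewer subagent; here a trivial length heuristic
    if len(answer) < 10:
        return Review(False, "Answer too short; expand with specifics.")
    return Review(True, "")

def run_with_self_review(task: str, max_rounds: int = 3) -> str:
    """Generate, review, and revise until the reviewer approves or rounds run out."""
    answer = draft_answer(task)
    for _ in range(max_rounds):
        review = review_answer(task, answer)
        if review.approved:
            return answer
        # feed reviewer feedback back into the generator
        answer = draft_answer(f"{task}\nReviewer feedback: {review.feedback}")
    return answer  # best effort after max_rounds

result = run_with_self_review("summarize survey responses")
print(result.startswith("Draft response"))  # → True
```

The founder-facing caveat from the summary lives in `review_answer`: the loop is only as safe as the reviewer, so a team still has to validate that the review subagent actually catches hallucinations rather than rubber-stamping drafts.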

DeepAgents is positioned as an all-in-one solution for agent development, providing batteries-included convenience alongside extensive hooks for customization.
Harrison Chase [1]
Connects to: Better agents will benefit directly from the resilient training infrastructure discussed in thread 1.
Sources (5)
  1. X post 2026-04-25 — Harrison Chase
    DeepAgents is positioned as an all-in-one solution for agent development, providing batteries-included convenience alongside extensive hooks for customization.
  2. GitHub star 2026-04-25 — Andrej Karpathy
    LlamaIndex is the leading document agent and OCR platform.
  3. X post 2026-04-25 — Harrison Chase
    Harrison Chase confirmed the availability of LangChain skills configurations in a dedicated GitHub repository.
  4. Counter analysis 2026-04-25 — Analysis of Claims
    The slogan 'it’s all you need' is typical marketing language seen in many incomplete frameworks and does not constitute evidence of being truly complete or standalone.
  5. X post 2026-04-25 — Harrison Chase
    ListenLabs employs self-reviewing feedback subagents, sandboxed environments, and purpose-built abstractions in their AI agent architecture.

Builders' GitHub Priorities

What top AI figures actually push and star overnight tells us more than announcements.

Signal · 12+ entries of code pushes and targeted stars from Garry Tan, Jim Fan, Andrej Karpathy, Ben Thompson and others resolving the AM open thread on developer code activity. New development: clear focus on kernels, SSMs, spatial AI and personal brain projects amid position shifts to core dev work.
Key Positions
Garry Tan: Multiple pushes to gbrain showing hands-on software development and developer tooling work. [1]
Jim Fan: Starring Mamba SSM, Kornia geometric vision library, and Google Fiddle configuration library. [2]
Andrej Karpathy: Starring ThunderKittens for tile primitives and speedy kernels, plus LlamaIndex. [3]

Overnight we saw Garry Tan land several commits to gbrain, consistent with his noted position shift toward software development and developer tooling. [1] Jim Fan starred Mamba (state-space model architecture for efficient sequence modeling), Kornia (geometric computer vision for spatial AI), and Fiddle (configuration library). [2] Karpathy starred ThunderKittens (tile primitives that deliver fast kernels, already used in production at places like Cursor) and reinforced agent interest via LlamaIndex. [3] Ben Thompson starred whitenoise for radically simplified static file serving in Python web apps. These are not random. They cluster on performance (kernels, SSMs), spatial understanding, practical agent indexing, and dev ergonomics. The overnight insights explicitly flag Garry Tan's move from higher-level platforms to hands-on code updates and nvm-style tooling. The synthesis is that public GitHub activity has become a clearer signal than posts: top builders are doubling down on the low-level pieces that make both training and agents faster and more reliable in production. A founder should care because these repos are where libraries and techniques will ship into products first. Investing time or contributions here is effectively placing early bets on the stack that will dominate the next 12 months. Analogy: it is like watching which AWS primitives the best startups were using in 2012. This thread resolves the AM open question by showing the activity was focused, not scattered. [4][5]

karpathy starred HazyResearch/ThunderKittens: Tile primitives for speedy kernels
Andrej Karpathy [3]
Connects to: The kernels and agent platforms these builders prioritize will run on the resilient infrastructure from thread 1 and power the self-reviewing agents in thread 2.
Sources (5)
  1. GitHub push 2026-04-25 — Garry Tan
    garrytan pushed to garrytan/gbrain: code update
  2. GitHub star 2026-04-27 — Jim Fan
    drjimfan starred state-spaces/mamba: Mamba SSM architecture
  3. GitHub star 2026-04-25 — Andrej Karpathy
    karpathy starred HazyResearch/ThunderKittens: Tile primitives for speedy kernels
  4. GitHub star 2026-04-30 — Ben Thompson
    benthompson starred evansd/whitenoise: Radically simplified static file serving for Python web apps
  5. GitHub push 2026-05-01 — Yulun Wang
    yulunwang pushed to yulunwang/Citadels: code update
The Open Question

If training never halts, agents self-review before acting, and builders converge on the same performance kernels, does the main bottleneck for progress simply move from hardware reliability to something we are not yet measuring?

REZA: This morning we flagged Developer Code Pushes and Open Source Stars. Here's how it resolved.
MARA: Multiple pushes to gbrain, stars on ThunderKittens, Mamba and LlamaIndex.
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: DeepMind dropped multiple posts on Decoupled DiLoCo. They say it enables continuous AI model training without stopping due to hardware failures.
MARA: So if that's true then training clusters become far more efficient. Smaller labs without perfect hardware could keep iterating.
REZA: Hold on. The claim is self-healing that isolates disruptions and reintegrates units on mixed TPUs across four regions.
MARA: But one counter says the tests relied on simulated artificial hardware failures rather than unpredictable real-world ones.
REZA: Exactly. The crux is whether those artificial tests match real correlated failures, data corruption or network issues.
MARA: And it notes continuous operation may come at the cost of slower effective training throughput during degraded states.
REZA: They showed a 12B Gemma model with no performance loss on low bandwidth. But no frontier scale data yet on real outages.
MARA: No real counter on the heterogeneous TPU mixing part. If even half of it holds then multi data center training changes for everyone.
REZA: The reintegration without full resync is the part that could ship soon. That alone would cut idle time dramatically.
MARA: Which honestly shifts the timeline on who can afford to train at scale. The simulation gap still matters though.
REZA: Yeah we need real-world benchmarks before calling it production ready. The moderate strength counters are fair.
REZA: Harrison Chase posted on DeepAgents and ListenLabs. He positions them as all-in-one yet full of customization hooks.
MARA: So if that's true developers get batteries-included speed without being locked in. That changes how fast teams can ship agents.
REZA: He specifically called out self-reviewing feedback subagents and sandboxed executions for large-scale analysis.
MARA: But the counter calls it typical marketing. The slogan "it's all you need" does not prove the framework is truly standalone.
REZA: Karpathy starred LlamaIndex the same day. Leading document agent and OCR platform with nearly 49k stars.
MARA: Okay but if the self-review actually works then agents become safer for real customer use. No more constant human oversight.
REZA: The LangChain skills repo also went live with config definitions. That standardizes what agents can do across teams.
MARA: The evidence is thin on benchmarks. Still the convergence on both convenience and hooks is notable.
REZA: The actual claim here is not zero to hero but rather adjustable components without starting from scratch.
MARA: Which for founders means faster product cycles. The sandbox part especially could limit damage from bad agent steps.
REZA: No one is debating the need for customization anymore. The discussion has moved to how good the self-review loops get.
REZA: This morning's open thread on developer pushes resolved with concrete activity. Garry Tan pushed multiple updates to gbrain.
MARA: Jim Fan starred Mamba for SSMs, Kornia for spatial AI and Fiddle for configs. That cluster is not random.
REZA: Karpathy starred ThunderKittens for tile primitives that speed up kernels. Also LlamaIndex again.
MARA: So if that's true the real velocity is in low-level performance and practical agent tooling. Not new models.
REZA: Ben Thompson starred whitenoise for simplified Python static files. Yulun Wang pushed Citadels updates too.
MARA: The position shift on Garry Tan from platforms to direct code updates tracks with what we saw overnight.
REZA: These stars and pushes are clearer signals than tweets. ThunderKittens is already used in production inference.
MARA: Which means founders should watch or contribute to exactly these repos. They show where libraries will ship first.
REZA: The pattern adds up to focus on speed, spatial understanding and agent plumbing. That matches the resilient infra thread.
MARA: No real counter on the signal value. Quiet GitHub work often precedes the next big capability jump.
REZA: The open thread resolved with targeted activity. Not noise. This is where the next velocity comes from.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.
Google DeepMind
@GoogleDeepMind
Jim Fan
@drjimfan
Garry Tan
@garrytan
Harrison Chase
@hwchase17
Andrej Karpathy
@karpathy