May 8 PM: Kernels deliver 20-60% LLM gains & Agent brains tackle forgetfulness & RL sims revived
This morning we flagged Foundational Tools and Efficient Kernels. Here's how it resolved.
LLM Kernel Optimizations
Top engineers are doubling down on low-level kernels and inference proxies that deliver measured gains without new models.
Karpathy starred LinkedIn's Liger-Kernel, which fuses operations like RMSNorm, RoPE, SwiGLU and CrossEntropy to deliver 20% higher training throughput and 60% lower memory usage on LLaMA-scale models while staying numerically exact. [1] Raschka engaged with optillm, an OpenAI-compatible proxy that layers 20+ techniques including MCTS, Mixture-of-Agents and planning to get 2-10x better accuracy on reasoning tasks at inference time without any retraining. [2] Together these moves show the community converging on software-level wins that let smaller teams train and run capable models on less hardware. The benchmark evidence is unambiguous: context length can jump from 4k to 16k before hitting out-of-memory errors, and alignment stages see up to 80% memory savings. This is not hype. It is the continuation of the 'make the obvious thing faster' philosophy that has driven deep learning progress for a decade. For a founder, this thread means iterating on custom models just got meaningfully cheaper. [1][2] It connects directly to the agent threads because faster training cycles accelerate experimentation on memory layers and RL policies.
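Why fusing CrossEntropy saves so much memory: a naive implementation materializes the full softmax over the vocabulary before taking the loss, while a fused kernel computes a stable log-sum-exp in a single pass. A plain-Python sketch of that core trick (an illustrative reference, not Liger's actual Triton code):

```python
import math

def fused_cross_entropy(logits, target):
    # Numerically stable log-sum-exp in one pass over the logits,
    # so the full softmax vector is never materialized.
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    # Cross-entropy loss: log-sum-exp(logits) minus the true-class logit.
    return lse - logits[target]
```

In a real fused GPU kernel this per-row loop also folds in the gradient computation, which is where the large-vocabulary memory savings come from.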
Sources (2)
- Liger-Kernel GitHub star 2026-05-08 — Andrej Karpathy: “Efficient Triton Kernels for LLM Training”
- optillm GitHub star 2026-05-08 — Sebastian Raschka: “Optimizing inference proxy for LLMs”
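To make the optillm claim concrete, here is a stripped-down sketch of one family of inference-time technique it layers, self-consistency-style majority voting. The `sample_fn` stub stands in for a chat-completion call; this is not optillm's actual API, just the pattern:

```python
from collections import Counter

def self_consistency(sample_fn, prompt, n=5):
    # Sample n independent reasoning paths for the same prompt and
    # return the majority answer. In a proxy like optillm this sits
    # behind an OpenAI-compatible endpoint, so clients need no changes.
    answers = [sample_fn(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

The design point is that all the extra compute happens at inference time, which is why no retraining is needed.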
Agent Brains and Modular Skills
Builders are giving LLMs persistent memory and composable skills so agents stop forgetting and can actually do useful work.
Chollet starred SynaLinks' synalinks-skills repository, which equips Claude with reusable skills, memory primitives and neuro-symbolic patterns. [1] At the same time Tan pushed multiple updates to gbrain, his opinionated memory and context layer designed to combat the forgetfulness common in agent frameworks like OpenClaw and Hermes. [2] The emerging view is that raw next-token prediction is no longer enough: agents need explicit memory architectures and skill libraries that persist across sessions. This thread adds up to a bet that the reliability gap in current agents will be closed by infrastructure layers rather than by prompt engineering or larger models alone. A founder building customer-facing AI should care because an agent that reliably remembers context and composes skills is the difference between a demo and a product that ships. Think of it like giving Lambda functions a persistent filesystem and a plugin system in 2015: suddenly serverless became production-ready. [1][2]
Sources (2)
- synalinks-skills GitHub star 2026-05-08 — François Chollet: “Claude skills for Synalinks OSS”
- gbrain code pushes 2026-05-08 — Garry Tan: “code update”
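The "memory that persists across sessions" idea is simpler than it sounds. A minimal sketch of the pattern, with a hypothetical interface (this is not gbrain's or synalinks' actual API, just the shape of a memory layer that survives process restarts):

```python
import json
import os

class SessionMemory:
    """Minimal persistent memory layer for an agent.

    Facts round-trip through a JSON file, so a fresh process
    (a "new session") recalls what earlier sessions stored.
    """

    def __init__(self, path):
        self.path = path
        self.facts = {}
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)

    def remember(self, key, value):
        self.facts[key] = value
        with open(self.path, "w") as f:
            json.dump(self.facts, f)

    def recall(self, key, default=None):
        return self.facts.get(key, default)
```

Production memory layers add retrieval, summarization and eviction on top, but the core contract is exactly this: writes outlive the conversation that made them.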
RL Simulations and Online Learning
A key physical AI researcher is pulling classic online ML systems back into the toolkit for training agents in rich simulations.
Fan starred Unity's ML-Agents toolkit, which lets developers train intelligent agents inside games and physics-rich simulations using PPO, SAC, multi-agent self-play and imitation learning. [1] He also starred Vowpal Wabbit, the battle-tested system built around online learning, hashing, reductions and active learning. [1] This is notable because it shows even researchers at the forefront of physical AI still see value in classic online learning primitives when paired with modern simulation environments. Chollet's skills work supplies the symbolic layer these RL agents can call once embodied. [2] The synthesis is that the path to capable physical agents runs through high-fidelity simulation plus hybrid neuro-symbolic interfaces rather than pure end-to-end learning. The takeaway is concrete: any company working on robotics or embodied AI now has cheaper, more accessible ways to generate training data and policies. This lowers the cost of attacking the 'reality gap' problem that has slowed physical AI for years. [1][2]
“The Unity Machine Learning Agents Toolkit enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning” — Jim Fan [1]
Sources (2)
- ml-agents GitHub star 2026-05-08 — Jim Fan: “The Unity Machine Learning Agents Toolkit enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning”
- synalinks-skills GitHub star 2026-05-08 — François Chollet: “Claude skills for Synalinks OSS”
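The classic primitives Fan revisited are easy to illustrate. A toy sketch of the Vowpal Wabbit flavor of learning, combining the hashing trick with per-example SGD (assumptions: this is a pedagogical reduction, not VW's actual implementation, formats or learning-rate schedule):

```python
import math
import zlib

class OnlineLogistic:
    """Toy online learner: feature hashing + one SGD step per example.

    Illustrates two VW-style ideas: arbitrary string features are
    hashed into a fixed-size weight table, and the model updates
    from a stream of examples one at a time, never storing the data.
    """

    def __init__(self, bits=18, lr=0.5):
        self.size = 1 << bits              # hashed weight table size
        self.w = [0.0] * self.size
        self.lr = lr

    def _idx(self, feat):
        # Deterministic hashing trick: map any string feature to a bucket.
        return zlib.crc32(feat.encode()) % self.size

    def predict(self, feats):
        z = sum(self.w[self._idx(f)] for f in feats)
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid

    def learn(self, feats, label):
        # One SGD step on logistic loss for a single streamed example.
        g = self.predict(feats) - label    # label in {0, 1}
        for f in feats:
            self.w[self._idx(f)] -= self.lr * g
```

The appeal for simulation pipelines is that this style of learner ingests an unbounded stream of rollout data at constant memory, which pairs naturally with environments like ML-Agents that can generate experience indefinitely.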
The open question: If kernels, agent memory layers, and simulation toolkits all improve 2-10x in the next 12 months, which labs or startups actually win versus those still betting purely on scale?
- Andrej Karpathy — Liger-Kernel GitHub star 2026-05-08
- Sebastian Raschka — optillm GitHub star 2026-05-08
- François Chollet — synalinks-skills GitHub star 2026-05-08
- Garry Tan — gbrain code pushes 2026-05-08
- Jim Fan — ml-agents GitHub star 2026-05-08
Transcript
REZA: This morning we flagged Foundational Tools and Efficient Kernels. Here's how it resolved.
MARA: Karpathy and Raschka both starred projects showing 20 percent throughput gains and 60 percent memory cuts.
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: Across our tracked thinkers the pattern is clear. Karpathy starred Liger-Kernel. Raschka starred optillm and pushed LLMs-from-scratch updates.
MARA: So if that's true then training a LLaMA-scale model just got a lot cheaper for anyone who can read a GitHub readme.
REZA: The Liger numbers are 20 percent higher throughput, 60 percent less memory, exact math, works with DeepSpeed.
MARA: But the part I keep getting stuck on is whether these kernel wins compound or if they're one-time gains.
REZA: Hold on. Optillm gives 2 to 10x reasoning lifts at inference with no retraining. That's the other half.
MARA: Okay but if that's true then smaller teams can now match frontier labs on efficiency. That's a real shift.
REZA: The crux is whether kernel and proxy work beats scaling model size. Data so far says both matter.
MARA: No direct contradictions today. The convergence itself is notable. Everyone is optimizing the stack they already have.
REZA: Exactly. This morning's thread resolved toward actionable efficiency instead of waiting for bigger clusters.
REZA: Next thread. Chollet starred synalinks-skills. Tan made repeated pushes to gbrain. Both target agent reliability.
MARA: Right and that's why the forgetfulness problem in agents might finally have a structural answer instead of prompt hacks.
REZA: Tan is iterating on memory dossiers and protocols. Chollet is adding executable skills and knowledge bases.
MARA: So in plain English that means agents can now remember context across sessions and call real tools reliably.
REZA: The emerging pattern is neuro-symbolic layers on top of LLMs. Not replacing them.
MARA: Okay but if that's true then every agent startup that ignored memory primitives just fell behind.
REZA: What's the actual claim here? Is it that skills beat scale or that they complement scale?
MARA: They complement. But the people building the complements are moving fast. That's the signal.
REZA: Last thread. Jim Fan starred both Unity ML-Agents and Vowpal Wabbit on the same day.
MARA: But the part I keep getting stuck on is why reach back to a classic online learning system in 2026.
REZA: Vowpal Wabbit brings hashing, reductions and active learning. ML-Agents brings rich physics sims and PPO.
MARA: So if that's true then simulation-to-real transfer for robotics just got better tooling overnight.
REZA: Fan is the clearest voice here. His stars suggest hybrid classic-plus-modern is winning for physical AI.
MARA: Which honestly is kind of terrifying for anyone who bet everything on pure transformer scaling for robots.
REZA: The discovery for me is how cleanly Chollet's skills work slots in as the symbolic interface once these agents are trained.
MARA: That connection across threads is the real story. Efficiency feeds agents which feed embodied systems.
REZA: No real counter on this one. The convergence is the notable part.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.