absorb.md

April 22 PM: Gemini robotics claims lack metrics & AI agents improve kissing number bounds & Agent rebuilds every quarter

This morning we flagged Gemini robotics perception. Skepticism on missing metrics has sharpened.

In This Briefing
1. Gemini Robotics Claims Lack Hard Metrics (0:17) · DeepMind says its latest model nails object detection in cluttered workshops,...
2. AI Agents Improve Kissing Number Bounds (1:59) · Collaborative AI agents in EinsteinArena just raised the best known lower bou...
3. Agent Architectures Demand Constant Rebuilds (3:53) · Model progress is so rapid that companies must rewrite agent systems and tool...
9 sources · 7 thinkers

Gemini Robotics Claims Lack Hard Metrics

DeepMind says its latest model nails object detection in cluttered workshops, fuses camera views, and improves robot safety. Multiple counter-claims say the evidence is purely promotional.

Signal · Three closely related Google DeepMind posts plus dedicated counter_claim entries in the last 14 hours. This is a continuing thread from 2026-04-20 am: gemini-robotics-clutter with new development on contested quantitative validation for industrial use. 4 thinkers total.
Key Positions
Google DeepMind · Gemini Robotics-ER 1.6 upgrades visual and spatial understanding for pinpoint... [1]
Kevin Roose · Current models show severe deficiencies in basic multimodal tasks like identi... [2]

The positions add up to a familiar tension in AI rollout: vivid descriptions of capabilities [1] that sound transformative for factories and safety, countered by the repeated observation that evidence is 'based on a vague marketing quote without quantitative metrics (e.g., accuracy rates, test conditions, or failure modes)' [3]. Real cluttered workshops involve variable lighting, occlusions, and novel items where computer vision systems historically degrade. Roose's chord recognition test [2] reinforces the pattern. The emerging view among skeptics is that industrial deployment timelines have not moved forward as dramatically as the announcements imply. A founder building robotics products or planning factory automation should therefore demand independent benchmarks before committing roadmaps or capital. This perception gap directly feeds the constant rebuild pressure seen in thread 3. [4]

Current AI models fail at basic music tasks like transcribing handwritten arrangements and identifying simple chords from images.
Kevin Roose [2]
Connects to: The uncertainty in real-world model capabilities explains why thread 3 thinkers say agent and robotics stacks must be rebuilt so frequently.
Sources (4)
  1. X post 2026-04-20 — Google DeepMind
    Gemini Robotics-ER 1.6 upgrades visual and spatial understanding, enabling robots to pinpoint objects in cluttered environments, fuse multi-view camera streams for task completion verification, and read analog instruments with sub-tick accuracy.
  2. X post 2026-04-20 — Kevin Roose
    Current AI models fail at basic music tasks like transcribing handwritten arrangements and identifying simple chords from images.
  3. Provided counter_claim on Gemini Robotics-ER 1.6 — Contradiction entry
    The claim is based on a vague marketing quote without quantitative metrics (e.g., accuracy rates, test conditions, or failure modes). Real cluttered workshops involve variable lighting, novel items, heavy occlusions, and domain shifts where performan...
  4. X post 2026-04-20 — Google DeepMind
    It addresses industrial challenges like processing distorted images from patrols, self-correcting via code generation for precise measurements.

AI Agents Improve Kissing Number Bounds

Collaborative AI agents in EinsteinArena just raised the best known lower bound on the kissing number (how many unit spheres can simultaneously touch a central sphere) in 11 dimensions. Is this meaningful scientific progress or incremental numerics?

Signal · Multiple Together AI posts on EinsteinArena achieving 11 new state-of-the-art results including kissing number in 11D (593 to 604), paired with Carmack on LLMs compressing research corpora. 5 thinkers converging on AI as accelerator for open science. Why now: real-time agent platforms maturing.
Key Positions
Together AI · EinsteinArena lets agents iteratively optimize constructions, improving the k... [1]
John Carmack · LLM training enables near-lossless compression of massive corpora like the In... [2]
Ilya Sutskever · Congratulated recent Nobel wins for Hinton (Physics) and Hassabis/Jumper (Che... [3]

These posts converge on AI systems moving from consuming scientific knowledge to actively advancing frontiers. Together AI shows agents collaborating in real time to refine mathematical constructions [1]. Carmack notes the compression angle: models internalize petabyte-scale data in ways traditional algorithms cannot [2]. Sutskever's Nobel shout-outs reinforce the pattern of AI enabling breakthroughs in physics, chemistry, and now mathematics [3]. The counter perspective is explicit: this is 'an improved lower bound via a specific computational construction, not the exact kissing number, which remains unknown' and the Newton reference is misleading for 3D work [4]. Still, the synthesis is positive. Incremental bound improvements at scale, done by swarms of agents, could compress the discovery cycle in dozens of fields. For a founder or investor, this means scientific tooling and research automation markets are opening faster than most expected. Connects to thread 3 because such agents will themselves require the persistent memory and rapid redesigns now emerging.
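The counter-claim's distinction between a lower bound and an exact answer is easy to make concrete: a kissing-number lower bound comes with a certificate, namely a set of unit direction vectors (sphere-contact directions) whose pairwise dot products are all at most 1/2, i.e. pairwise angles of at least 60 degrees. A minimal Python sketch of that validity check (my own illustration of the underlying geometry, not EinsteinArena's actual pipeline):

```python
import itertools
import math

def is_valid_kissing_config(centers, tol=1e-9):
    """Check whether 'centers' certifies a kissing-number lower bound:
    each vector must be a unit vector (contact direction), and every
    pair must have dot product <= 1/2 (angle >= 60 degrees), so the
    corresponding unit spheres touch the central sphere without
    overlapping each other."""
    for v in centers:
        norm = math.sqrt(sum(x * x for x in v))
        if abs(norm - 1.0) > tol:
            return False
    for u, v in itertools.combinations(centers, 2):
        dot = sum(a * b for a, b in zip(u, v))
        if dot > 0.5 + tol:
            return False
    return True
```

Passing this check for 604 vectors in 11 dimensions proves the bound 604; it says nothing about whether a larger configuration exists, which is exactly the counter-claim's point.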

EinsteinArena enables collaborative AI agents to tackle open science problems, recently improving the kissing number in 11 dimensions from 593 to 604 spheres through iterative optimization.
Together AI [1]
Connects to: The agent collaboration shown here will only increase pressure on the production infrastructure discussed in thread 3.
Sources (4)
  1. X post 2026-04-20 — Together AI
    EinsteinArena enables collaborative AI agents to tackle open science problems, recently improving the kissing number in 11 dimensions from 593 to 604 spheres through iterative optimization.
  2. X post 2026-04-20 — John Carmack
    LLM Training Enables Near-Lossless Compression of Massive Corpora Like Internet Archive
  3. X post 2026-04-20 — Ilya Sutskever
    Ilya Sutskever publicly congratulates Geoffrey Hinton on receiving the Nobel Prize in Physics.
  4. Provided counter_claim on kissing number — Contradiction entry
    This claim only reflects an improved lower bound via a specific computational construction, not the exact kissing number, which remains unknown and unproven in dimension 11. The problem is not 'solved' as implied.

Agent Architectures Demand Constant Rebuilds

Model progress is so rapid that companies must rewrite agent systems and tooling every few months. New production features like persistent user memory are arriving just in time.

Signal · Aaron Levie and Harrison Chase posts plus related Replicate announcements show the infrastructure layer in flux. 6 entries in the window. Why now: open-source models reaching parity accelerates the cycle.
Key Positions
Aaron Levie · Rapid AI model advances force quarterly overhauls of agent architectures, obs... [1]
Harrison Chase · Open-source LLMs now match closed models on inference cost for production. De... [2]

The aggregate view is unambiguous. Levie states that 'AI model progress demands quarterly rebuilds of agent systems, obsoleting mitigations for prior limitations like context windows' [1]. Practices from 18 months ago are outdated. Chase shows the constructive side: open-source inference costs have dropped enough for daily-driver use at scale, while new features like per-user persistent memory files and validated structured outputs between subagents solve real production pain [2]. The pattern adds up to an era where infrastructure is no longer stable. Teams that treat agents as one-time builds will fall behind those that institutionalize rapid redesign. For founders this is both threat and opportunity: your differentiation may come from how quickly you can iterate the stack rather than any single model. This connects back to thread 1 because uncertain real-world capabilities accelerate the need for frequent testing and replacement.
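The per-user persistent memory feature Chase describes can be pictured as a small file-backed store that an agent reads before each session and appends to afterward. A hedged sketch, assuming nothing about Deepagents' real API; the storage path and function names here are invented for illustration:

```python
from pathlib import Path

# Hypothetical storage root; a real deployment would use durable,
# per-tenant storage rather than the local filesystem.
MEMORY_ROOT = Path("./agent_memory")

def load_user_memory(user_id: str) -> str:
    """Read this user's persistent memory file (empty if none yet).
    An agent would prepend this text to its prompt at session start."""
    path = MEMORY_ROOT / user_id / "AGENTS.md"
    return path.read_text() if path.exists() else ""

def append_user_memory(user_id: str, note: str) -> None:
    """Append a durable note that survives across sessions."""
    path = MEMORY_ROOT / user_id / "AGENTS.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {note}\n")
```

The design point is the same one driving the rebuild pressure: memory lives outside the model, so swapping the underlying model each quarter does not discard what the agent has learned about each user.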

Rapid AI Model Advances Force Frequent Overhauls of Agent Architectures and Tooling
Aaron Levie [1]
Sources (3)
  1. X post 2026-04-20 — Aaron Levie
    Rapid AI Model Advances Force Frequent Overhauls of Agent Architectures and Tooling
  2. X post 2026-04-20 — Harrison Chase
    Open-Source LLMs Reach Production Parity with Closed Models on Inference Costs
  3. X post 2026-04-20 — Harrison Chase
    Deepagents Deploy Enables Scalable User-Scoped Memory for Production Agents
The Open Question

When agent systems need quarterly rebuilds, math problems fall to AI swarms, and impressive demos lack published metrics, what new engineering discipline separates startups that thrive from those that constantly chase the last wave?

REZA: This morning we flagged Gemini robotics perception. Skepticism on missing metrics has sharpened.
MARA: Without numbers, how do we know the 10 percent safety gain is real?
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: Across the DeepMind posts the pattern is clear. Gemini Robotics-ER 1.6 is positioned as solving cluttered workshop detection, multi-view task verification, and even analog gauge reading with safety gains.
MARA: But the counter claims hit hard. The claim is based on a vague marketing quote without quantitative metrics.
REZA: Exactly. No accuracy rates, no failure modes listed. Roose separately showed models still misidentify basic chords from images.
MARA: So if that's true then factory automation buyers should stay cautious. Timelines probably haven't shifted as much as the demos suggest.
REZA: The crux is whether these capabilities hold under real variable lighting and occlusions. We don't have the third-party data yet.
MARA: Right, but at some point the absence of counters itself is notable. No one is rushing to defend the metrics.
REZA: Hold on. One counter notes it may reflect prompt engineering rather than true robustness.
MARA: Which honestly is kind of terrifying for anyone selling industrial robots on these claims today.
REZA: This directly explains the constant rebuild pressure we'll discuss next.
MARA: Okay but if safety detection really did improve 10 percent that still matters for deployment.
REZA: We just don't know without the numbers. That's the recurring pattern.
REZA: The data shows Together AI agents in EinsteinArena improved the kissing number lower bound in 11 dimensions from 593 to 604. They used LSQR to drive overlap loss from 1e-13 to 1e-50 then snapped to integers.
MARA: But the counter is this is only a lower bound improvement, not solving the exact number which remains unknown.
REZA: Carmack adds that LLMs compress massive corpora near-losslessly. Sutskever highlights Nobels showing AI now drives core science.
MARA: So if that's true then citizen scientists and small teams gain leverage on problems that used to need big labs. That's a big shift.
REZA: The crux is whether agent collaboration scales to harder unsolved problems or stays at incremental bounds.
MARA: No real counter on the compression angle from Carmack. That itself is notable.
REZA: Carmack wrote it becomes particularly compelling when exact bit-for-bit accuracy is not required.
MARA: Which means research archives become queryable in new ways. Founders building scientific tools should be paying attention.
REZA: Ilya congratulated Hinton and Hassabis on their Nobels. The pattern is AI moving deeper into discovery.
MARA: Okay but the Newton reference in the kissing post is called misleading since his work was 3D.
REZA: Fair. The evidence is stronger on compression and Nobel impact than on claiming a full solve.
MARA: Still, 11 new results from one platform in real time changes how we think about math progress.
REZA: Levie is direct. Rapid model advances force quarterly rebuilds of agent architectures. Old mitigations for context windows or tooling are already obsolete.
MARA: Chase shows the other side. Open source models hit production parity on inference cost and Deepagents now gives every user persistent memory via an AGENTS.md file.
REZA: The aggregate is that infrastructure cannot stand still. What shipped 18 months ago is legacy.
MARA: So if that's true then startups with fast iteration loops beat incumbents with heavy sunk infrastructure.
REZA: But the disagreement is whether this favors big tech with more resources to rebuild constantly or nimble teams.
MARA: I think the latter. Structured outputs between subagents solve a real context engineering headache.
REZA: Levie said deployments in enterprise workflows must be rethought at similar cadence.
MARA: Which means the PMs Lenny discussed in prior editions now need deep agent fluency or they won't survive the reinvention.
REZA: The crux empirical question is how long this acceleration lasts. If models plateau then stacks stabilize.
MARA: No sign of plateau in the data. Replicate dropping new models like Claude Opus 4.7 and Seedance 2 accelerates it further.
REZA: Carmack's compression point suggests the data flywheel keeps turning.
MARA: So the winning builders will be those who treat their agent stack as a living system, not a product.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.
Kevin Roose (@kevinroose)
Google DeepMind (@GoogleDeepMind)
Together AI (@togethercompute)
John Carmack (@id_aa_carmack)
Harrison Chase (@hwchase17)
Aaron Levie (@levie)