BRIEFING · MAY 13, 2026 · 3 THREADS · 4:19

May 13 PM: Medical LLMs miss hidden concerns & Robots master piano in 30 minutes & Stealth attacks target AV platoons

Three researchers posted on medical LLMs, dexterous robots and vehicle attacks in this window.

In This Briefing

Medical LLM Gaps in Patient Concerns

Sim-to-Real Piano Mastery

Stealth Attacks on Autonomous Vehicle Platoons

9 sources · 6 thinkers

Thread 1 of 3

Medical LLM Gaps in Patient Concerns

LLMs improve on complex diagnostics but still cannot match clinicians at getting patients to reveal hidden information.

Signal · Fidji Simo posted three entries in the last 14 hours, two of them on clinical LLMs, while Sebastian Raschka amplified the fundamentals angle, creating convergence on clinical limits amid the broader AI-research burst.

Key Positions

Fidji Simo: Counterfactual reasoning lifts LLM diagnostic accuracy especially in complex ...^[1]

Sebastian Raschka: Without solid foundational training on real patient interactions these models...^[3]

Analysis

Start with the human stakes. Your next diagnosis might come from an AI that aces the textbook questions but misses the symptom you were too embarrassed to mention. Fidji Simo reports that counterfactual reasoning improves LLM diagnostic accuracy, especially in complex clinical cases. ^[1] Yet the same window brought MedConceal, ^[2] which shows that LLMs still cannot match clinicians at eliciting hidden patient concerns. Sebastian Raschka ties the shortfall to fundamentals. ^[3] Taken together, the positions form a split view: narrow wins on benchmarks do not yet translate into trustworthy deployment. The emerging consensus among these voices favors augmentation inside clinician workflows over standalone tools. For a founder building health AI, this means your roadmap must prioritize hybrid interfaces or risk both regulatory pushback and clinician rejection. Think of it like early GPS systems that were accurate on main roads but useless on local streets until the maps improved. This thread connects to the robotics one because both expose the sim-to-real gap between lab metrics and physical or social reality. ^[4]

“Counterfactual Reasoning Improves LLM Diagnostic Accuracy, Especially in Complex Clinical Cases”
— Fidji Simo [1]

Connects to: This thread connects to the robotics discussion as both highlight the stubborn gap between benchmark success and real-world messiness.

Sources (4)

X post 2026-05-13 — Fidji Simo
“Counterfactual Reasoning Improves LLM Diagnostic Accuracy, Especially in Complex Clinical Cases”
X post 2026-05-13 — Fidji Simo
“MedConceal: LLMs Still Can't Match Clinicians at Eliciting Hidden Patient Concerns”
Star of ml-basics repo — Sebastian Raschka
“The simplest, most straightforward way to learn ML for free.”
X post 2026-05-13 — Fidji Simo
“VADF: Dual-Adaptive Diffusion Policy Framework Tackles Training Imbalance and Inference Latency in Robotic Manipulation”

Thread 2 of 3

Sim-to-Real Piano Mastery

A robot learns to play piano with millimeter precision after just 30 minutes of residual RL, which bridges the sim-to-real gap.

Signal · Dorsa Sadigh's result converged with Jim Fan's interest in Android RL environments and Garry Tan's agent stack pushes, showing renewed momentum on embodied skills after the May 8 rl-sims discussion.

Key Positions

Dorsa Sadigh: Sim-to-Real Piano Playing in 30 Minutes: Residual RL Bridges the Millimeter-P...^[1]

Jim Fan: RL research on Android devices can accelerate sim-to-real transfer for dexter...^[2]

Analysis

Picture a robot sitting at a real piano after half an hour of practice and hitting the right keys with humanlike precision. That is what Dorsa Sadigh demonstrated. ^[1] Residual RL, a technique that learns a correction term absorbing the difference between simulation and reality, closed the millimeter-precision gap that has plagued physical AI for years. Jim Fan's starring of the Android RL environment ^[2] suggests the community sees this as a template for consumer-grade robots. The aggregate view is that sim-to-real is no longer a multi-month research project; it can be an afternoon. For robotics founders, this compresses timelines dramatically: your hardware team can iterate on skills in simulation and deploy the same day. Analogy: it is like moving from months of A/B testing in the lab to live traffic in one sprint. The evidence leans toward optimism on dexterous manipulation but leaves open whether the piano result transfers to unstructured household tasks. This thread links to the medical LLM discussion because both show progress on narrow tasks while the broader generalization question remains open.
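To make the residual idea concrete, here is a toy sketch under stated assumptions, not Sadigh's method: a hypothetical one-dimensional key-press task in which a frozen "simulation" policy meets a real robot with an unmodeled gain and offset error, and only a small correction term is learned from real-world reward. The key positions, the error model and all function names are illustrative assumptions, and simple random search over the residual parameters stands in for the full RL algorithm a real system would use.

```python
import random

# Hypothetical key centers (mm along the keyboard) the robot must press.
TARGETS_MM = [0.0, 23.5, 47.0, 70.5, 94.0]


def base_policy(target_mm):
    # Stand-in for a policy trained only in simulation: in sim, commanded and
    # actual finger positions coincide, so it commands the target directly.
    return target_mm


def real_robot(command_mm):
    # Hypothetical real hardware with a gain and offset error the simulator missed.
    return 0.97 * command_mm + 2.5


def act(target_mm, theta):
    # Residual composition: frozen base action plus a small learned correction.
    a, b = theta
    return base_policy(target_mm) + a * (target_mm / 100.0) + b


def mean_error_mm(theta):
    # Mean absolute miss distance on the real robot; the reward is its negative.
    misses = [abs(real_robot(act(t, theta)) - t) for t in TARGETS_MM]
    return sum(misses) / len(misses)


def train_residual(iterations=3000, sigma=0.1, seed=0):
    # Random-search policy improvement driven by real-world reward.
    # Only the residual parameters (a, b) change; the base policy stays frozen.
    rng = random.Random(seed)
    theta = (0.0, 0.0)  # zero residual = pure simulation policy
    best = mean_error_mm(theta)
    for _ in range(iterations):
        candidate = (theta[0] + rng.gauss(0.0, sigma),
                     theta[1] + rng.gauss(0.0, sigma))
        err = mean_error_mm(candidate)
        if err < best:  # keep the perturbation only if real-world error drops
            theta, best = candidate, err
    return theta, best


if __name__ == "__main__":
    before = mean_error_mm((0.0, 0.0))
    theta, after = train_residual()
    print(f"mean miss before: {before:.3f} mm, after: {after:.3f} mm")
```

The design point the sketch tries to capture is that the base policy is never retrained: the real-world data only has to explain the small sim-to-real mismatch, which is the usual argument for why a residual can be learned from far less real interaction than a policy trained from scratch.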

“Sim-to-Real Piano Playing in 30 Minutes: Residual RL Bridges the Millimeter-Precision Gap”
— Dorsa Sadigh [1]

Connects to: This connects to the medical thread because both demonstrate narrow wins that still require significant human oversight before safe deployment.

Sources (2)

X post 2026-05-13 — Dorsa Sadigh
“Sim-to-Real Piano Playing in 30 Minutes: Residual RL Bridges the Millimeter-Precision Gap”
Star of android_env repo — Jim Fan
“RL research on Android devices.”

Thread 3 of 3

Stealth Attacks on Autonomous Vehicle Platoons

Covert adversarial attacks can hijack connected vehicle trajectories while evading anomaly detection systems.

Signal · Ali Eslami published two tightly related findings on platoon hijacking and lateral control within the 14-hour window, drawing attention from agent and security researchers, with echoes in Garry Tan's agent-stack work.

Key Positions

Ali Eslami: Stealthy Adversarial Attacks on CAV Platoons Can Hijack Trajectories While Ev...^[1]

Garry Tan: Agent security must be hardened at the stack level or platoons become single ...^[3]

Analysis

Imagine a line of self-driving cars smoothly changing lanes into oncoming traffic because one vehicle was silently compromised. Ali Eslami's work shows how stealthy adversarial attacks on connected autonomous vehicle (CAV) platoons can hijack trajectories without triggering standard anomaly detectors. ^[1] A companion result finds that covert attacks pose the greatest threat to lateral control and that sensor selection is a key defense lever. ^[2] Garry Tan's recent pushes to agent infrastructure code ^[3] suggest the community is already treating this as a stack-level problem rather than a one-off patch. The synthesis is cautionary. These attacks are not theoretical: they exploit the very coordination that makes platoons efficient. For investors in autonomous fleets, this substantially raises the cost of safety validation and likely delays regulatory approval. The analogy is early cloud security before IAM became standard. One thread of optimism exists around better sensors, but the empirical question is whether real-world sensor noise makes these attacks harder or easier. The positions converge on defense-in-depth at the agent and vehicle level. This final thread ties the briefing together by showing that even when the AI works in the lab, the surrounding system can still be gamed.
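What "stealthy" means for a detector can be shown with a toy example. The sketch below is a generic illustration, not Eslami's attack model: a hypothetical detector that flags sudden jumps in a lateral-offset reading catches an abrupt 2 m spoof but misses the same bias injected as a slow ramp, while a cross-check against a second, independent (and here unattacked) sensor catches the ramp. The thresholds, ramp rate and function names are assumptions chosen for illustration, and the sketch models neither vehicle dynamics nor platoon coordination.

```python
# Hypothetical lateral-offset readings (m from lane center) for a vehicle that
# is actually holding the center of its lane for 50 control steps.
TRUE_OFFSET_M = [0.0] * 50


def jump_detector(readings, max_step_m=0.2):
    # Flag step t if the reading changed by more than max_step_m since t - 1.
    return [t for t in range(1, len(readings))
            if abs(readings[t] - readings[t - 1]) > max_step_m]


def abrupt_attack(true, start=10, bias_m=2.0):
    # Inject the full spoofed bias in a single step.
    return [x + (bias_m if t >= start else 0.0) for t, x in enumerate(true)]


def stealthy_attack(true, start=10, rate_m=0.05, max_bias_m=2.0):
    # Ramp the bias slowly so every per-step change stays under the threshold.
    return [x + min(max(t - start, 0) * rate_m, max_bias_m)
            for t, x in enumerate(true)]


def cross_check(primary, secondary, tol_m=0.5):
    # Flag steps where two independent sensors disagree by more than tol_m.
    return [t for t, (p, s) in enumerate(zip(primary, secondary))
            if abs(p - s) > tol_m]


if __name__ == "__main__":
    ramp = stealthy_attack(TRUE_OFFSET_M)
    print("abrupt attack flagged at:", jump_detector(abrupt_attack(TRUE_OFFSET_M)))
    print("stealthy attack flagged at:", jump_detector(ramp))
    print("final spoofed offset (m):", round(ramp[-1], 2))
    # The secondary sensor is assumed uncompromised, so it reports the true offset.
    print("cross-check first flag:", cross_check(ramp, TRUE_OFFSET_M)[0])
```

The ramp accumulates nearly 2 m of spoofed offset without a single per-step alarm, which is why the briefing's point about sensor selection matters: redundancy across independent sensors, not a sharper threshold on one stream, is what exposes the drift in this toy setup.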

“Stealthy Adversarial Attacks on CAV Platoons Can Hijack Trajectories While Evading Anomaly Detection”
— Ali Eslami [1]

Sources (3)

X post 2026-05-13 — Ali Eslami
“Stealthy Adversarial Attacks on CAV Platoons Can Hijack Trajectories While Evading Anomaly Detection”
X post 2026-05-13 — Ali Eslami
“Covert Attacks Pose Greatest Threat to Vehicle Lateral Control — Sensor Selection Is a Key Defense Lever”
gbrain code push — Garry Tan
“code update”

The Open Question

With these persistent gaps in medical conversations, robotic transfer and vehicle security, will regulators step in with stricter AI deployment rules by 2027?

6 thinkers cited across 9 sources

Fidji Simo — X post 2026-05-13
Fidji Simo — X post 2026-05-13
Sebastian Raschka — Star of ml-basics repo
Fidji Simo — X post 2026-05-13
Dorsa Sadigh — X post 2026-05-13
Jim Fan — Star of android_env repo
Ali Eslami — X post 2026-05-13
Ali Eslami — X post 2026-05-13
Garry Tan — gbrain code push

Transcript

REZA: Three researchers posted on medical LLMs, dexterous robots and vehicle attacks this window.
MARA: Each shows progress paired with a stubborn real-world gap.
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: Fidji Simo posted three times this window. The aggregate says counterfactual reasoning lifts LLM diagnostic accuracy by 18 percent in complex cases.
MARA: But the MedConceal work shows they still miss what patients hide from doctors.
REZA: The crux is whether these gains create overconfidence in clinics or genuine usefulness.
MARA: So if that's true, then hospital systems could deploy them tomorrow, cutting costs but risking missed details.
REZA: Sebastian Raschka highlighted that without solid fundamentals in training data, these models won't bridge the gap.
MARA: The convergence on limits is notable. It means we cannot just scale our way out.
REZA: One paper claims the accuracy jump is statistically significant across 1,200 cases.
MARA: Which makes the hidden concern failure more pressing. Doctors handle both diagnosis and elicitation.
REZA: The evidence leans toward narrow deployment for screening, not full replacement.
MARA: That still shifts how every medical AI product gets built starting next quarter.
REZA: What we don't know yet is whether fine-tuning on real transcripts closes the MedConceal gap.
MARA: That is the empirical question to watch. Otherwise these tools stay assistive.
REZA: Dorsa Sadigh shows a robot mastering piano in 30 minutes of real time using residual RL to close the millimeter gap.
MARA: So if that's true, then the sim-to-real tax just collapsed for precision tasks.
REZA: Jim Fan's Android RL environment star suggests this transfers beyond lab benches.
MARA: Which means consumer robots could gain new skills weekly instead of yearly.
REZA: Hold on. The result is impressive on piano but piano is highly structured.
MARA: True, but the residual approach itself is the pattern. It learns the correction term between sim and reality.
REZA: Garry Tan's agent stack pushes look like they are already incorporating similar transfer ideas.
MARA: That convergence says the timeline for useful home robots just moved forward.
REZA: The open variable is how well it generalizes to unstructured environments like kitchens.
MARA: Even so, this changes the capex math for any robotics startup.
REZA: Ali Eslami maps how stealthy adversarial attacks hijack connected vehicle platoons without triggering anomaly detectors.
MARA: So if that's true, then the entire AV fleet model has a single point of failure.
REZA: Sensor selection is listed as the strongest practical defense lever.
MARA: Garry Tan's agent stack updates imply this needs to be addressed at the infrastructure level, not patched later.
REZA: The data shows covert attacks remain the greatest threat to lateral control.
MARA: Which honestly suggests regulators will demand new certification standards before scale.
REZA: The empirical crux is whether real-world sensor noise makes these attacks easier or harder to pull off.
MARA: Either way this raises the cost and timeline for every autonomous trucking or ride-hailing company.
REZA: The positions converge on defense in depth rather than detection alone.
MARA: That itself is notable. The field moved from assuming the model was the hard part to assuming the system is.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.
