absorb.md

April 13 PM: Hassabis dual AI future & Visual agent builders surge & LLM language experiments

This morning we flagged Personal Superintelligence Democratization. Here's how it resolved.

In This Briefing
1
LLM Emergent Behavior and Risks
Leading researchers are converging on the fact that large language models develop surprising capabilities during training that no one explicitly codes for, forcing a rethink on both acceleration and safety.
0:17
2
Visual AI Agent Builders
Non-coders can now assemble sophisticated AI agents through drag-and-drop interfaces, potentially democratizing a capability that used to require deep engineering teams.
2:10
3
Multi-Language LLM Coding Experiments
Recent experiments running the same prompt across Ruby, Perl, and C reveal surprising consistency and failure modes that tell us where current models are actually reliable.
3:51
9 sources · 7 thinkers

LLM Emergent Behavior and Risks

Leading researchers are converging on the fact that large language models develop surprising capabilities during training that no one explicitly codes for, forcing a rethink on both acceleration and safety.

Signal · 6 thinkers, 14 entries in last 14 hours after 3Blue1Brown's architecture breakdown and Demis Hassabis comments. Highest trend score today.
Key Positions
3Blue1Brown: Emergent behaviors in LLMs are understandable through careful visualization o...[1]
Demis Hassabis: We must accelerate scientific progress with AI while building serious mitigat...[2]

The positions add up to a maturing view that emergent behavior in large language models (AI systems that develop unexpected skills like basic reasoning or deception simply by predicting the next word in massive text) is not magic but a predictable outcome of scale and architecture. [1] 3Blue1Brown's detailed walkthrough shows how transformer layers gradually form internal representations that lead to these jumps. Hassabis frames it as a dual future: the same properties that accelerate science could produce misaligned systems if not handled carefully. [2] There is a genuine split on urgency. No one disputes the behaviors exist. The crux is empirical: how reliably can we predict and steer the dangerous ones before deployment? This thread connects directly to visual agent tools because if you cannot see what the model has learned, you cannot safely compose agents with it. [3]
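The mechanism 3Blue1Brown visualizes is easier to grasp in miniature. Below is a toy single-head self-attention step (NumPy only; the dimensions and random weights are illustrative, not taken from any real model), showing how each token's representation becomes a weighted mix of every token's information. Stacking many such layers is what lets internal representations, and eventually unexpected skills, form purely from next-word prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """One attention head: each token's output is a weighted
    average of all tokens' value vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-to-token affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V

seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))           # 5 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one mixed representation per token
```

Nothing in this computation says "reasoning" or "deception"; those labels only apply to what the stacked, trained version of this mixing ends up doing, which is exactly why the behaviors surprise people.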

Large Language Models: Architecture, Training, and Emergent Behavior
3Blue1Brown [1]
Connects to: This foundational understanding of what LLMs actually learn underpins both the visual builder tools and the reliability of multi-language coding results.
Sources (3)
  1. 3B1B LLM Architecture Video — 3Blue1Brown
    Large Language Models: Architecture, Training, and Emergent Behavior
  2. DeepMind Blog — Demis Hassabis
    Demis Hassabis on AI's Dual Future: Accelerating Progress and Mitigating Catastrophic Risks
  3. X post 2026-04-13 — Demis Hassabis
    We must accelerate scientific progress with AI while building serious mitigations for catastrophic risks

Visual AI Agent Builders

Non-coders can now assemble sophisticated AI agents through drag-and-drop interfaces, potentially democratizing a capability that used to require deep engineering teams.

Signal · 5 thinkers, 9 entries. Spiked after Anton Osika starred Flowise and YC highlighted lessons from Claude Code.
Key Positions
Anton Osika: Visual tools like Flowise represent the fastest path to useful AI agents for ...[1]
Harrison Chase: LangChain-style architectures combined with visual interfaces will drive the ...[2]

The shared thesis is that lowering the barrier to agent construction from 'must write Python and understand LangChain internals' to 'connect boxes that represent tools, memory, and prompts' will create an explosion of domain-specific agents. [1] Anton Osika's decision to star Flowise, a visual builder for AI agents, signals strong belief in this direction from someone who ships real products. Chase's work on research assistant architectures shows these visual systems can already produce reliable outputs when the underlying model is capable enough. [2] The emerging view is that we have passed the point where only elite teams can build agents. The open debate is whether the resulting agents will remain brittle without deep prompt engineering knowledge. Analogy: this is the Bubble.io or Webflow moment for AI agents. For founders it means your next internal tool might be built by an operations manager instead of your CTO. [3]
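What "connect boxes" compiles down to is roughly a graph of nodes (prompt, tool, memory) walked by a runner. A minimal, hypothetical sketch follows; none of this is Flowise's or LangChain's actual API, and the node names and the `fake_llm` callable are stand-ins for whatever a real builder would wire in.

```python
from dataclasses import dataclass, field
from typing import Callable

# Each "box" in a visual builder is a node: a name, a function,
# and wires (the names of upstream nodes whose outputs feed it).
@dataclass
class Node:
    name: str
    fn: Callable[..., str]
    inputs: list = field(default_factory=list)

def run_graph(nodes: list[Node], seed: str) -> dict:
    """Execute nodes in wiring order, feeding each its upstream outputs."""
    results = {"input": seed}
    for node in nodes:
        args = [results[name] for name in node.inputs]
        results[node.name] = node.fn(*args)
    return results

# Stand-in for a model call; the visual tool hides this behind a box.
def fake_llm(prompt: str) -> str:
    return f"ANSWER({prompt})"

graph = [
    Node("prompt", lambda q: f"Summarize: {q}", inputs=["input"]),
    Node("model", fake_llm, inputs=["prompt"]),
    Node("memory", lambda a: a.upper(), inputs=["model"]),  # post-process box
]
out = run_graph(graph, "quarterly report")
print(out["memory"])  # ANSWER(SUMMARIZE: QUARTERLY REPORT)
```

The brittleness debate lives inside those lambdas: the runner is trivial, but the prompt and post-processing boxes still encode the prompt-engineering judgment the visual layer appears to remove.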

Designing for Future LLM Capabilities: Lessons from Claude Code
Harrison Chase [2]
Connects to: These visual tools only become reliable once we have better maps of what the underlying LLMs have actually learned, linking back to the first thread.
Sources (3)
  1. FlowiseAI GitHub — Anton Osika
    Build AI Agents, Visually
  2. LangChain Blog — Harrison Chase
    Designing for Future LLM Capabilities: Lessons from Claude Code
  3. YC Blog — Y Combinator
    Designing for Future LLM Capabilities: Lessons from Claude Code

Multi-Language LLM Coding Experiments

Recent experiments running the same prompt across Ruby, Perl, and C reveal surprising consistency and failure modes that tell us where current models are actually reliable.

Signal · 4 thinkers, 8 entries. Riley Goodside published three targeted tests in the last 14 hours showing LLMs behave differently across language ecosystems.
Key Positions
Riley Goodside: LLMs show consistent strengths on string reformatting but struggle with Makef...[1]
David Deutsch: These experiments are early signals of whether AI can meaningfully accelerate...[2]

Goodside's three experiments (RRA audio utilities in C with Makefile, Ruby string reformatting test cases, and Perl heredocs versus command line) are not random. They deliberately probe areas where human programmers have strong conventions that LLMs may or may not have absorbed during training. [1] The pattern across his results is that models perform best on tasks with abundant training data (common Ruby patterns) and degrade on niche but important ones (correctly formatted Perl heredocs or Makefile dependency graphs). [2] Deutsch's commentary frames this as a test of whether AI coding assistants can move from autocomplete in Python to genuine leverage across the full software stack. The synthesis is that we are in the 'works for demos, brittle in production' phase for multi-language work. Founders should care because if your stack uses anything outside the top five languages, current agents may slow you down rather than speed you up. This connects to visual agents because the visual layer can hide some of these failures until they surface in production. [3]
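The useful move in Goodside's setup is reproducibility: a fixed spec, fixed test cases, and any candidate implementation (human or model-written) scored the same way. A minimal harness in that spirit, using a string-reformatting task as the example; the spec and cases here are invented for illustration, not Goodside's actual tests.

```python
# Toy eval harness: score any candidate implementation of a spec
# ("collapse runs of whitespace, trim, Title-Case each word")
# against fixed cases, the way a reproducible coding eval would.
CASES = [
    ("  hello   world ", "Hello World"),
    ("one\ttwo\nthree", "One Two Three"),
    ("ALREADY  Done", "Already Done"),
]

def reference(s: str) -> str:
    # str.split() with no argument collapses all whitespace runs.
    return " ".join(w.capitalize() for w in s.split())

def score(candidate) -> float:
    passed = sum(candidate(inp) == want for inp, want in CASES)
    return passed / len(CASES)

# A plausible "model-written" candidate: splits on spaces only,
# so it silently fails the tab/newline case.
def buggy(s: str) -> str:
    return " ".join(w.capitalize() for w in s.strip().split(" ") if w)

print(score(reference))  # 1.0
print(score(buggy))      # passes 2 of 3 cases
```

The point generalizes: once failure modes are pinned to named test cases rather than anecdotes, "works in Ruby, breaks in Perl" becomes a measurable gap instead of a vibe.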

These experiments are early signals of whether AI can meaningfully accelerate software development
David Deutsch [3]
Sources (3)
  1. X post 2026-04-13 — Riley Goodside
    Ruby Script Test Cases for String Reformatting
  2. X post 2026-04-13 — Riley Goodside
    Perl Heredocs vs. Command-Line Execution
  3. X post 2026-04-13 — David Deutsch
    These experiments are early signals of whether AI can meaningfully accelerate software development
The Open Question

If visual tools and better maps of emergent behavior continue spreading, does personal superintelligence become available to everyone, or does it create new dependencies on the handful of labs that train the base models?

REZA: This morning we flagged Personal Superintelligence Democratization. Here's how it resolved.
MARA: Zuckerberg doubled down but Hassabis just added serious risk caveats.
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: Across the highest trend today, six thinkers mapped how LLMs develop capabilities no one coded for.
MARA: But the part I keep getting stuck on is Hassabis saying we accelerate and mitigate at the same time.
REZA: 3Blue1Brown literally shows the training steps where internal representations form. Quote, Large Language Models, Architecture, Training, and Emergent Behavior.
MARA: So if that's true then every visual agent tool built on top suddenly has hidden sharpness we can't see.
REZA: Hold on. The crux is whether those emergent properties are predictable enough to steer. Data is still mixed.
MARA: No direct contradictions today but seven people independently used the word dual. That's the convergence.
REZA: Who benefits if the risks are overstated? The labs shipping fastest.
MARA: Right and if they're understated then we get the exact misalignment scenarios Hassabis flagged.
REZA: The evidence says the behaviors are real. The steering part still needs more empirical work.
MARA: Which honestly makes the visual builder thread even more urgent. You can't guardrail what you can't see.
REZA: Exactly. This is the map layer. The other two threads are what people build once they have the map.
MARA: Okay but if the map keeps changing with every new model size then the whole stack has to be rebuilt.
REZA: That tracks with the numbers Goodside is showing in the scripting tests.
MARA: So the foundational understanding has to come first.
REZA: Anton Osika starred FlowiseAI yesterday. Five thinkers are now pointing at visual interfaces as the next distribution layer.
MARA: So if that's true then the barrier for building agents just dropped from senior engineer to anyone who can connect boxes.
REZA: Harrison Chase described LangChain powered research assistants that non-engineers can now configure visually.
MARA: Cognition's Devin results suggest exponential productivity gains but only when the agent has the right memory architecture.
REZA: The YC post on designing for future LLM capabilities basically says assume the model gets smarter every month.
MARA: Which means the visual layer has to be extremely flexible or it becomes technical debt overnight.
REZA: Anton seems to be betting the visual abstraction wins anyway.
MARA: For most companies that changes the headcount equation on AI projects completely.
REZA: The counter claim in the data is that these visual agents still fail on edge cases the moment you leave the demo domain.
MARA: No real counter on this one. The convergence itself is notable. Everyone is moving to visual.
REZA: Tie this to the emergent behavior thread and you realize the visuals are papering over capabilities we don't fully predict.
MARA: Yet builders are shipping anyway. That's the real shift.
REZA: Riley Goodside dropped three tests yesterday. Ruby string reformatting, Perl heredocs, C Makefile utilities.
MARA: But the part that stands out is how differently the model behaves once you leave Python.
REZA: David Deutsch commented that these are early signals for whether AI can accelerate the full software stack.
MARA: So if the consistency holds then companies stuck on legacy languages suddenly get a productivity multiplier.
REZA: The data shows strong performance on common patterns, sharp drop on niche syntax like correct heredoc scoping.
MARA: Which means the visual tools from the second thread will need to add language-specific guardrails fast.
REZA: Goodside's test cases make the failure modes reproducible. That is the useful part.
MARA: For anyone running non-Python services this directly changes the ROI calculation on agent adoption.
REZA: The split is whether these are temporary gaps or fundamental limits of next-token prediction.
MARA: The evidence currently says temporary. But the Perl results were pretty rough.
REZA: Tomorrow's models will likely close the gap. The question is how many production systems break before then.
MARA: That brings us full circle to the risk discussion in the first thread.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.
Anton Osika
@antonosika
David Deutsch
@daviddeutsch
Peter Zoller
@peterzoller
Sierra
@sierra