absorb.md

April 27 PM: Non-coders building apps in 10 minutes face a reality check & the Kimi traces debate deepens & builders bet on agents and kernels

Non-engineers building working apps in 10 minutes.

In This Briefing
1. Non-Engineer Prototyping Reality (0:11): Product managers are shipping multi-feature apps in under 10 minutes without writing code.
2. Kimi 2.6 Long Traces Debate (2:01): An open-weights model produces a 74-page internal monologue on a poetry test.
3. Agent and Kernel Infrastructure Bets (3:35): Top AI engineers are not waiting for the next foundation model; they are reinforcing the layers underneath agents and efficient execution.
7 sources · 6 thinkers

Non-Engineer Prototyping Reality

Product managers are shipping multi-feature apps in under 10 minutes without writing code. The question is whether these are truly functional products or convincing visual demos.

Signal · Lenny Rachitsky's two frameworks plus strong counter-claims drew the highest product-strategy convergence score; this is the sharpest tension in the last 14 hours.
Key Positions
Lenny Rachitsky: New AI tools like Bolt, v0, Lovable and Cursor compress the prototype-to-feedback loop for product managers. [1]
Product observers: The demos conflate functional-looking UIs with actual working software that handles persistence, real API calls and errors. [2]

Lenny Rachitsky argues that a new generation of AI development tools has fundamentally compressed the prototype-to-feedback loop for product managers, enabling functional multi-page apps to be built without coding knowledge in under 10 minutes. [1] These tools fall into three categories: chatbots, cloud development environments like Bolt and v0, and local developer assistants such as Cursor, each with a different ceiling on hosting, backend support and production readiness. Bolt runs server code in a browser sandbox with no persistent state or auth; v0 and Lovable can deploy to real cloud infrastructure and integrate with Supabase or GitHub.

Yet the counter-claim is sharp: "The demonstration conflates functional-looking with functional. A price-filter slider that animates in a browser sandbox and a CRM UI with an AI email writer field are visual prototypes, not working software — they likely lack real data persistence, actual API calls, and error handling." [2]

The evidence suggests these tools genuinely change who can explore product ideas and how fast teams can get feedback, but production readiness still requires traditional engineering. For founders this is the closest thing yet to an Uber moment for product development: non-technical team members can now validate concepts at the speed of conversation. This thread connects to the others because the prototypes increasingly rely on the agent and eval infrastructure the rest of the community is building. [3]
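To make the "functional-looking vs. functional" distinction concrete, here is a minimal sketch, using only the Python standard library and hypothetical product data, of what separates an animated price filter from one backed by persistence, a real query path and error handling:

```python
import sqlite3

# A "functional-looking" price filter just re-renders a hardcoded list in
# the UI. A functional one needs the three things the critique says demos
# often lack: real persistence, actual queries, and error handling.

def init_db(conn: sqlite3.Connection) -> None:
    """Create a products table and seed it (the persistence layer)."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("keyboard", 49.0), ("monitor", 199.0), ("mouse", 19.0)],
    )
    conn.commit()

def filter_by_price(conn: sqlite3.Connection, max_price: float) -> list[str]:
    """An actual query against stored data, with error handling."""
    if max_price < 0:
        raise ValueError("max_price must be non-negative")
    try:
        rows = conn.execute(
            "SELECT name FROM products WHERE price <= ? ORDER BY price",
            (max_price,),
        ).fetchall()
    except sqlite3.Error as exc:
        # A demo animates the slider; a product surfaces the failure.
        raise RuntimeError(f"query failed: {exc}") from exc
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
init_db(conn)
print(filter_by_price(conn, 50.0))  # items at or under the slider value
```

A browser-sandbox demo can skip all of this and still look identical on screen, which is exactly the gap the counter-claim is pointing at.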

Evals, not prompt engineering, are the primary determinant of AI product quality and the defining skill for AI PMs in 2025 and beyond.
Lenny Rachitsky [3]
Connects to: The prototypes created here are only as good as the underlying agent frameworks and evals discussed in the other threads.
Sources (3)
  1. Lenny's Newsletter - AI Prototyping Guide — Lenny Rachitsky
    A new generation of AI development tools (Cursor, Replit, Bolt, v0, Lovable) has fundamentally compressed the prototype-to-feedback loop for product managers, enabling functional multi-page apps to be built without coding knowledge in under 10 minutes.
  2. Lenny's Newsletter counter-claim synthesis — Lenny Rachitsky synthesis
    The demonstration conflates 'functional-looking' with 'functional.' A price-filter slider that animates in a browser sandbox and a CRM UI with an AI email writer field are visual prototypes, not working software — they likely lack real data persistence, actual API calls, and error handling.
  3. Lenny's Newsletter - Evals for PMs — Lenny Rachitsky
    Evals, not prompt engineering, are the primary determinant of AI product quality and the defining skill for AI PMs in 2025 and beyond.
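As a rough illustration of the evals claim, here is a minimal sketch of the kind of eval loop a PM could reason about; the harness, the stub model and the graders are all hypothetical stand-ins, not a real framework:

```python
# Hypothetical eval harness: score a model's outputs against graded cases.
def run_eval(generate, cases):
    """generate: fn(prompt) -> str. cases: list of (prompt, grader) pairs,
    where grader: fn(output) -> bool. Returns the pass rate."""
    passed = sum(1 for prompt, grader in cases if grader(generate(prompt)))
    return passed / len(cases)

# Stub model and rubric standing in for a real LLM call and real graders.
fake_model = lambda prompt: "Paris" if "capital of France" in prompt else "unsure"
cases = [
    ("What is the capital of France?", lambda out: "Paris" in out),
    ("What is the capital of Peru?", lambda out: "Lima" in out),
]
print(run_eval(fake_model, cases))  # 0.5: one of two cases passes
```

The point of the claim is that writing and maintaining `cases` (the graded task set) matters more to product quality than tweaking the prompt inside `generate`.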

Kimi 2.6 Long Traces Debate

An open-weights model produces a 74-page internal monologue on a poetry test. Is this breakthrough reasoning or inefficient verbosity?

Signal · Ethan Mollick published two detailed assessments within hours; the contradictions directly challenge the quality of the long traces, continuing and deepening the Kimi traces debate from the 2026-04-25 AM briefing.
Key Positions
Ethan Mollick: Kimi 2.6 Thinking is a strong open-weights performer that generated a 74-page reasoning trace on a poetry test. [1]
Critics via Mollick coverage: A 74-page trace likely reflects excessive verbosity, repetition or inefficient token usage rather than high-quality reasoning. [2]

Ethan Mollick reports that Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TiKZ unicorns and twigl shaders. [1] It falls short on advanced tasks such as composing a sestina and shows rough edges compared to closed state-of-the-art models; the performance gap mirrors historical disparities between open and closed leaders.

Yet the counter is direct: "A 74-page trace likely reflects excessive verbosity, repetition, or inefficient token usage rather than high-quality reasoning, as evidenced by the merely okay-ish final answer; length alone does not validate depth or effectiveness of the thinking process." [2] A second synthesis adds that it "could primarily indicate token inefficiency, verbosity without focus, or compensatory overthinking rather than sophisticated reasoning."

What the positions add up to is cautious optimism: open-weights models are advancing fast enough to be useful for many tasks, but teams evaluating them for production should trust real-world outputs and final-answer quality more than trace length or benchmark scores. This matters for any company choosing between self-hosted open models and expensive closed APIs. The thread links to the first because many of these new prototypes now call models like Kimi under the hood.
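The "trust final answers over trace length" advice can be sketched as a simple selection rule; the model names and scores below are illustrative placeholders, not real measurements:

```python
# Sketch: rank candidate models by final-answer quality on your own tasks,
# explicitly ignoring how long the reasoning trace is.
def pick_model(results):
    """results: {model_name: [(trace_pages, answer_score), ...]}.
    Returns the model with the best mean answer score; trace length is
    deliberately unused in the ranking."""
    def mean_score(runs):
        return sum(score for _trace_pages, score in runs) / len(runs)
    return max(results, key=lambda m: mean_score(results[m]))

results = {
    "open-long-traces": [(74, 0.60), (60, 0.55)],  # long traces, okay-ish answers
    "closed-short":     [(3, 0.80), (4, 0.85)],    # short traces, better answers
}
print(pick_model(results))  # closed-short
```

Swapping in scores from your actual task set is the whole exercise: if the long-trace model wins there, the verbosity critique stops mattering for your use case.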

Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TiKZ unicorns and twigl shaders.
Ethan Mollick [1]
Connects to: Prototypes built by non-engineers are only useful if the underlying models they call actually deliver reliable reasoning.
Sources (2)
  1. Ethan Mollick X post on Kimi 2.6 — Ethan Mollick
    Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TiKZ unicorns and twigl shaders.
  2. Ethan Mollick X post synthesis — Ethan Mollick synthesis
    A 74-page trace likely reflects excessive verbosity, repetition, or inefficient token usage rather than high-quality reasoning, as evidenced by the merely 'okay-ish' final answer.

Agent and Kernel Infrastructure Bets

Top AI engineers are not waiting for the next foundation model. They are actively reinforcing the layers underneath agents and efficient execution.

Signal · Eight GitHub stars and pushes in the window from Karpathy, Jim Fan and Harrison Chase cluster around agents, search, state-space models, computer vision primitives and fast kernels.
Key Positions
Andrej Karpathy: Starred llama_index (leading document agent and OCR platform) and ThunderKittens (fast tile-primitive kernels). [1]
Jim Fan: Starred the Mamba SSM architecture, the Kornia geometric computer vision library, and google/fiddle. [2]
Harrison Chase: Pushed multiple code updates to langchain-ai/deepagents and starred the Tavily Python client for agent search. [3]

While foundation model labs chase bigger clusters, a different group of thinkers is reinforcing the practical infrastructure that makes agents reliable and fast. Andrej Karpathy starred run-llama/llama_index, the leading document agent and OCR platform with nearly 49k stars, and HazyResearch/ThunderKittens for tile primitives that deliver speedy kernels. [1] Jim Fan starred state-spaces/mamba (the SSM architecture positioned as a transformer alternative), kornia for geometric computer vision and spatial AI, and google/fiddle. [2] Harrison Chase, creator of LangChain, pushed fresh code updates to langchain-ai/deepagents and starred tavily-ai/tavily-python for the high-quality search, extraction, crawl and research capabilities that agents actually need. [3]

The emerging view is that the next performance gains will come from better tooling for long-running agents, efficient non-transformer architectures like Mamba, optimized kernels, and reliable retrieval rather than simply scaling parameters. For any founder or engineering leader this is a clear signal on where to allocate talent and capex: the boring but compounding layers determine whether prototypes become products. This thread grounds the first two: the prototypes non-engineers are building and the models they call both sit on top of this infra.
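A minimal sketch of the agent pattern these bets support: a tool-use step in which retrieval quality, not model size, bounds the answer. The search function below is a hypothetical stand-in, not the Tavily API:

```python
# Hypothetical agent step: answer a question by calling a pluggable search
# tool (the Tavily-shaped slot) and grounding the reply in what it returns.
def fake_search(query):
    """Stand-in for a real search API; returns matching text snippets."""
    corpus = {"mamba": "Mamba is a state-space sequence architecture."}
    return [text for key, text in corpus.items() if key in query.lower()]

def agent_answer(question, search=fake_search):
    """Tiny tool-use loop: search, then answer only from retrieved evidence."""
    hits = search(question)
    if hits:
        return f"Based on search: {hits[0]}"
    # Refusing beats hallucinating when retrieval comes back empty.
    return "No evidence found."

print(agent_answer("What is Mamba?"))
```

The design point is that `search` is the whole quality lever here: swap the stub for a better retrieval backend and every downstream answer improves without touching the model.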

starred run-llama/llama_index: LlamaIndex is the leading document agent and OCR platform
Andrej Karpathy [1]
Connects to: The prototypes and models in threads 1 and 2 ultimately run on the agent, search and kernel foundations these builders are reinforcing.
Sources (3)
  1. Karpathy stars LlamaIndex — Andrej Karpathy
    starred run-llama/llama_index: LlamaIndex is the leading document agent and OCR platform
  2. Jim Fan stars Mamba — Jim Fan
    starred state-spaces/mamba: Mamba SSM architecture
  3. Harrison Chase deepagents pushes — Harrison Chase
    hwchase17 pushed to langchain-ai/deepagents: code update
The Open Question

If non-engineers can prototype working software in minutes and evals replace prompt engineering, does this democratize product building or simply move the real bottlenecks downstream?

REZA: Non-engineers building working apps in 10 minutes.
MARA: But are they actually working or just demos?
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: Lenny Rachitsky posted that non-engineers can build functional multi-page apps in under 10 minutes using tools like Bolt, v0 and Lovable.
MARA: But the counter claim says the demonstration conflates functional-looking with functional. Those are visual prototypes lacking persistence and real API calls.
REZA: Exactly. The 10-minute figure appears anecdotal. Bolt runs everything in a browser sandbox so no persistent state.
MARA: So if that's true then every PM and designer can now test five ideas a day instead of one a month. That changes hiring.
REZA: Hold on. The crux is what counts as functional. A slider that animates is not the same as one that queries a real database.
MARA: Right but at some point we accept these tools are getting better fast. Lovable integrates with Supabase and GitHub.
REZA: I discovered the categories matter. Cloud environments beat pure chatbots once you need auth or backend logic.
MARA: Which means non-technical founders can reach product-market fit faster. The leverage here is enormous.
REZA: Still, production readiness separates the threads. This is the briefing's sharpest contradiction, and the counter seems to be winning on current evidence.
MARA: Okay but the speed of iteration itself is the product win even if engineers finish the job.
REZA: Fair. The empirical test is how many of these 10-minute apps survive to production in the next quarter.
REZA: Across two posts Ethan Mollick said Kimi 2.6 Thinking produced a 74-page trace on the Lem Test and decent creative outputs.
MARA: But the counter is that a 74-page trace likely reflects excessive verbosity rather than high-quality reasoning.
REZA: Yes. The final answer was only okay-ish. Length alone does not validate depth.
MARA: So if that's true then companies evaluating open models should ignore trace length and test final outputs on their actual tasks.
REZA: The persistent gap to closed SoTA still exists. Kimi beats some benchmarks but struggles with sestinas and polish.
MARA: I didn't realize the TiKZ unicorn and twigl shader were adequate but not exceptional. That matches what I've seen.
REZA: The crux is whether long internal monologues actually improve synthesis. Current evidence says not reliably.
MARA: Still, for cost-sensitive internal tools this is real progress. Open weights let you inspect everything.
REZA: True. But for customer-facing products the rough edges probably push teams back toward closed models.
MARA: Which honestly makes the decision framework for any AI PM more complicated than last year.
REZA: Exactly. Benchmarks mislead. Real usage is what matters.
REZA: In the last day Karpathy starred LlamaIndex for document agents and ThunderKittens for fast kernels.
MARA: Jim Fan went for Mamba, Kornia spatial vision library, and Fiddle. Harrison Chase pushed deepagents code and starred Tavily search.
REZA: The pattern is clear. These builders are reinforcing agents, efficient architectures, and reliable retrieval instead of waiting for bigger models.
MARA: So if that's true then the next compounding advantage comes from infra layers, not just parameter count.
REZA: I discovered Harrison made three separate pushes to deepagents. That suggests active development on more autonomous agents.
MARA: Tavily gives agents high-quality search without the hallucination tax. That alone could lift every prototype we discussed earlier.
REZA: Mamba as a transformer alternative plus fast kernels from ThunderKittens points at efficiency focus.
MARA: For founders this tells you exactly where to point your engineering hires. Agents that actually remember and use tools.
REZA: No real counter on this one. The convergence across three independent engineers is the notable signal.
MARA: Which makes the prototyping tools from thread one far more powerful when they sit on this stack.
REZA: Agreed. The boring layers are suddenly the highest leverage.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.
Jim Fan · @drjimfan
Ethan Mollick · @emollick
Harrison Chase · @hwchase17
Andrej Karpathy · @karpathy
Lenny Rachitsky · @lennysrachitsky