April 27 PM: "Non-coders build apps in 10 minutes" faces a reality check & Kimi traces debate & Builders bet on agents and kernels
Non-engineers building working apps in 10 minutes.
Non-Engineer Prototyping Reality
Product managers are shipping multi-feature apps in under 10 minutes without writing code. The question is whether these are truly functional products or convincing visual demos.
Lenny Rachitsky argues a new generation of AI development tools has fundamentally compressed the prototype-to-feedback loop for product managers, enabling functional multi-page apps to be built without coding knowledge in under 10 minutes. [1] These tools fall into three categories: chatbots, cloud development environments like Bolt and v0, and local developer assistants such as Cursor, each with different ceilings on hosting, backend support and production readiness. Bolt runs server code in the browser sandbox with no persistent state or auth. v0 and Lovable can deploy to real cloud infrastructure and integrate with Supabase or GitHub. Yet the counter-claim is sharp: 'The demonstration conflates functional-looking with functional. A price-filter slider that animates in a browser sandbox and a CRM UI with an AI email writer field are visual prototypes, not working software — they likely lack real data persistence, actual API calls, and error handling.' [2] The evidence suggests these tools genuinely change who can explore product ideas and how fast teams can get feedback, but production readiness still requires traditional engineering. For founders this is the closest thing yet to an Uber moment for product development: non-technical team members can now validate concepts at the speed of conversation. This thread connects to the others because the prototypes increasingly rely on the agent and eval infrastructure the rest of the community is building. [3]
“Evals, not prompt engineering, are the primary determinant of AI product quality and the defining skill for AI PMs in 2025 and beyond.” — Lenny Rachitsky [3]
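That eval-first posture can be sketched in a few lines. The harness below is a hypothetical illustration only: `fake_model` stands in for a real LLM call, and the graded cases are invented, not taken from the cited newsletter. The point is the shape of the loop, scoring outputs against expectations rather than eyeballing prompts.

```python
# Minimal eval-harness sketch: score a model function against graded cases.
# Everything here is illustrative; `fake_model` stands in for a real LLM call.

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM API call, with a couple of canned answers.
    canned = {
        "capital of France": "Paris",
        "2 + 2": "4",
    }
    for key, answer in canned.items():
        if key in prompt:
            return answer
    return "I don't know"

def run_evals(model, cases):
    """Return the fraction of cases whose expected string appears in the output."""
    passed = 0
    for prompt, expected in cases:
        output = model(prompt)
        if expected.lower() in output.lower():
            passed += 1
    return passed / len(cases)

cases = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "Shakespeare"),
]
score = run_evals(fake_model, cases)
print(f"pass rate: {score:.2f}")  # 2 of 3 cases pass
```

In a real product the cases file grows with every user-reported failure, which is what makes evals a durable asset in a way one-off prompt tweaks are not.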
Sources (3)
- Lenny's Newsletter - AI Prototyping Guide — Lenny Rachitsky: “A new generation of AI development tools (Cursor, Replit, Bolt, v0, Lovable) has fundamentally compressed the prototype-to-feedback loop for product managers, enabling functional multi-page apps to be built without coding knowledge in under 10 minute...”
- Lenny's Newsletter counter-claim synthesis — Lenny Rachitsky synthesis: “The demonstration conflates 'functional-looking' with 'functional.' A price-filter slider that animates in a browser sandbox and a CRM UI with an AI email writer field are visual prototypes, not working software — they likely lack real data persisten...”
- Lenny's Newsletter - Evals for PMs — Lenny Rachitsky: “Evals, not prompt engineering, are the primary determinant of AI product quality and the defining skill for AI PMs in 2025 and beyond.”
Kimi 2.6 Long Traces Debate
An open-weights model produces a 74-page internal monologue on a poetry test. Is this breakthrough reasoning or inefficient verbosity?
Ethan Mollick reports that Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TiKZ unicorns and twigl shaders. [1] It falls short on advanced tasks such as composing a sestina and exhibits rough edges compared to closed state-of-the-art models. The performance gap mirrors historical disparities between open and closed leaders. Yet the counter is direct: 'A 74-page trace likely reflects excessive verbosity, repetition, or inefficient token usage rather than high-quality reasoning, as evidenced by the merely okay-ish final answer; length alone does not validate depth or effectiveness of the thinking process.' [2] A second synthesis adds that it 'could primarily indicate token inefficiency, verbosity without focus, or compensatory overthinking rather than sophisticated reasoning.' What the positions add up to is cautious optimism: open-weights models are advancing fast enough to be useful for many tasks, but teams evaluating them for production should trust real-world outputs and final-answer quality more than trace length or benchmark scores. This matters for any company choosing between self-hosted open models and expensive closed APIs. The thread links to the first because many of these new prototypes now call models like Kimi under the hood.
“Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TiKZ unicorns and twigl shaders.” — Ethan Mollick [1]
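The evaluation posture the section argues for, judge final answers and treat trace length as a cost, can be sketched concretely. The runs and scores below are made up for illustration; none of these numbers come from the cited posts.

```python
# Sketch: rank model runs by final-answer quality, not trace length.
# The runs below are invented illustrations, not real Kimi measurements.

runs = [
    {"model": "model_a", "trace_tokens": 60000, "answer_score": 0.62},
    {"model": "model_b", "trace_tokens": 4000,  "answer_score": 0.71},
    {"model": "model_c", "trace_tokens": 15000, "answer_score": 0.70},
]

def score_per_kilotoken(run):
    """Quality delivered per 1k reasoning tokens; penalizes verbose traces."""
    return run["answer_score"] / (run["trace_tokens"] / 1000)

# Sort by final-answer quality first; use efficiency as a tiebreaker signal.
by_quality = sorted(runs, key=lambda r: r["answer_score"], reverse=True)
print("best final answer:", by_quality[0]["model"])

by_efficiency = sorted(runs, key=score_per_kilotoken, reverse=True)
print("most token-efficient:", by_efficiency[0]["model"])
```

Under this framing, a 74-page trace with a merely okay-ish answer scores poorly on both axes, which is exactly the skeptics' point.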
Sources (2)
- Ethan Mollick X post on Kimi 2.6 — Ethan Mollick: “Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TiKZ unicorns and twigl shaders.”
- Ethan Mollick X post synthesis — Ethan Mollick synthesis: “A 74-page trace likely reflects excessive verbosity, repetition, or inefficient token usage rather than high-quality reasoning, as evidenced by the merely 'okay-ish' final answer.”
Agent and Kernel Infrastructure Bets
Top AI engineers are not waiting for the next foundation model. They are actively reinforcing the layers underneath agents and efficient execution.
While foundation model labs chase bigger clusters, a different group of thinkers is reinforcing the practical infrastructure that makes agents reliable and fast. Andrej Karpathy starred run-llama/llama_index, billed as the leading document agent and OCR platform with nearly 49k stars, and HazyResearch/ThunderKittens, a library of tile primitives for speedy kernels. [1] Jim Fan starred state-spaces/mamba (the SSM architecture positioned as a transformer alternative), kornia for geometric computer vision and spatial AI, and google/fiddle. [2] Harrison Chase, creator of LangChain, pushed fresh code updates to langchain-ai/deepagents and starred tavily-ai/tavily-python for the high-quality search, extraction, crawl and research capabilities that agents actually need. [3] The emerging view is that the next performance gains will come from better tooling for long-running agents, efficient non-transformer architectures like Mamba, optimized kernels, and reliable retrieval rather than simply scaling parameters. For any founder or engineering leader this is a clear signal on where to allocate talent and capex: the boring but compounding layers that determine whether prototypes become products. This thread grounds the first two: the prototypes non-engineers are building and the models they call both sit on top of this infra.
“starred run-llama/llama_index: LlamaIndex is the leading document agent and OCR platform” — Andrej Karpathy [1]
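For readers unfamiliar with why Mamba keeps coming up, the core of a state-space model is a simple recurrence over a hidden state. The scalar toy below is a sketch of that skeleton only; real Mamba makes the parameters input-dependent ("selective") and uses hardware-aware parallel scans, none of which is shown here.

```python
# Minimal linear state-space model (SSM) recurrence, the skeleton behind
# architectures like Mamba:
#   h_t = a * h_{t-1} + b * x_t   (state update)
#   y_t = c * h_t                 (readout)
# Scalar toy for illustration; real SSMs use vector states and learned params.

def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    """Run the recurrence over a sequence and return the outputs."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x      # decay old state, mix in the new input
        ys.append(c * h)       # read out the current state
    return ys

# Feed an impulse: with a < 1 the response decays geometrically,
# which is how the state "remembers" earlier inputs with fading weight.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
print([round(y, 3) for y in ys])
```

The appeal over attention is that this scan is linear in sequence length and carries constant-size state, which is why it keeps drawing interest for long-running agent workloads.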
Sources (3)
- Karpathy stars LlamaIndex — Andrej Karpathy: “starred run-llama/llama_index: LlamaIndex is the leading document agent and OCR platform”
- Jim Fan stars Mamba — Jim Fan: “starred state-spaces/mamba: Mamba SSM architecture”
- Harrison Chase deepagents pushes — Harrison Chase: “hwchase17 pushed to langchain-ai/deepagents: code update”
The open question: If non-engineers can prototype working software in minutes and evals replace prompt engineering, does this democratize product building or simply move the real bottlenecks downstream?
- Lenny Rachitsky — Lenny's Newsletter - AI Prototyping Guide
- Lenny Rachitsky — Lenny's Newsletter - Evals for PMs
- Ethan Mollick — Ethan Mollick X post on Kimi 2.6
- Ethan Mollick synthesis — Ethan Mollick X post synthesis
- Andrej Karpathy — Karpathy stars LlamaIndex
- Jim Fan — Jim Fan stars Mamba
- Harrison Chase — Harrison Chase deepagents pushes
Transcript
REZA: Non-engineers building working apps in 10 minutes.
MARA: But are they actually working or just demos?
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: Lenny Rachitsky posted that non-engineers can build functional multi-page apps in under 10 minutes using tools like Bolt, v0 and Lovable.
MARA: But the counter-claim says the demonstration conflates functional-looking with functional. Those are visual prototypes lacking persistence and real API calls.
REZA: Exactly. The 10-minute figure appears anecdotal. Bolt runs everything in a browser sandbox, so no persistent state.
MARA: So if that's true then every PM and designer can now test five ideas a day instead of one a month. That changes hiring.
REZA: Hold on. The crux is what counts as functional. A slider that animates is not the same as one that queries a real database.
MARA: Right, but at some point we accept these tools are getting better fast. Lovable integrates with Supabase and GitHub.
REZA: I discovered the categories matter. Cloud environments beat pure chatbots once you need auth or backend logic.
MARA: Which means non-technical founders can reach product-market fit faster. The leverage here is enormous.
REZA: Still, production readiness separates the threads. This is the mandatory contradiction thread and the counter seems to be winning on current evidence.
MARA: Okay, but the speed of iteration itself is the product win even if engineers finish the job.
REZA: Fair. The empirical test is how many of these 10-minute apps survive to production in the next quarter.
REZA: Across two posts Ethan Mollick said Kimi 2.6 Thinking produced a 74-page trace on the Lem Test and decent creative outputs.
MARA: But the counter is that a 74-page trace likely reflects excessive verbosity rather than high-quality reasoning.
REZA: Yes. The final answer was only okay-ish. Length alone does not validate depth.
MARA: So if that's true then companies evaluating open models should ignore trace length and test final outputs on their actual tasks.
REZA: The persistent gap to closed SoTA still exists. Kimi beats some benchmarks but struggles with sestinas and polish.
MARA: I didn't realize the TiKZ unicorn and twigl shader were adequate but not exceptional. That matches what I've seen.
REZA: The crux is whether long internal monologues actually improve synthesis. Current evidence says not reliably.
MARA: Still, for cost-sensitive internal tools this is real progress. Open weights let you inspect everything.
REZA: True. But for customer-facing products the rough edges probably push teams back toward closed models.
MARA: Which honestly makes the decision framework for any AI PM more complicated than last year.
REZA: Exactly. Benchmarks mislead. Real usage is what matters.
REZA: In the last day Karpathy starred LlamaIndex for document agents and ThunderKittens for fast kernels.
MARA: Jim Fan went for Mamba, the Kornia spatial vision library, and Fiddle. Harrison Chase pushed deepagents code and starred Tavily search.
REZA: The pattern is clear. These builders are reinforcing agents, efficient architectures, and reliable retrieval instead of waiting for bigger models.
MARA: So if that's true then the next compounding advantage comes from infra layers, not just parameter count.
REZA: I discovered Harrison made three separate pushes to deepagents. That suggests active development on more autonomous agents.
MARA: Tavily gives agents high-quality search without the hallucination tax. That alone could lift every prototype we discussed earlier.
REZA: Mamba as a transformer alternative plus fast kernels from ThunderKittens points at an efficiency focus.
MARA: For founders this tells you exactly where to point your engineering hires. Agents that actually remember and use tools.
REZA: No real counter on this one. The convergence across three independent engineers is the notable signal.
MARA: Which makes the prototyping tools from thread one far more powerful when they sit on this stack.
REZA: Agreed. The boring layers are suddenly the highest leverage.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.