May 8 AM: Karpathy on LLM knowledge bases, data quality for physical AI, AI craftsmanship, and foundational tools
Karpathy says LLM knowledge bases are a new primitive impossible with old code.
LLM Knowledge Bases as New Primitive
Karpathy argues LLM knowledge bases unlock applications that were impossible with classical code.
Karpathy's recent updates and comments position LLM knowledge bases not as an incremental RAG improvement but as an entirely new software primitive [1]. He points to examples like menugen-style tools and self-installing .md scripts as evidence that LLMs enable computation over arbitrary unstructured sources in ways classical code could never handle cleanly. Weng's recent stars and lab focus reinforce the point: turning these bases into production systems demands careful engineering and integration with scalable training [2]. The positions add up to an emerging consensus that the next wave of AI products will be built on knowledge engineering as much as on model training. For a smart non-specialist, think of it like the jump from bare-metal servers to AWS in the early cloud days: entirely new application classes suddenly become viable. Founders should care because your product roadmap must now answer what knowledge to embed, how to keep it verifiable, and how agents will query it. This changes how AI is built and used [3][4].
“LLM knowledge bases as an example of something that was *impossible* with classical code” — Andrej Karpathy [1]
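To make the primitive concrete, here is a minimal, illustrative sketch of the retrieval layer such a knowledge base needs before an LLM ever sees a query. This is not Karpathy's implementation, just a toy: plain keyword-overlap scoring over markdown chunks, standing in for whatever embedding-based retrieval a real system would use.

```python
import re

def chunk(md_text, max_len=400):
    """Split a markdown document into paragraph-level chunks."""
    paras = [p.strip() for p in md_text.split("\n\n") if p.strip()]
    return [p[:max_len] for p in paras]

def score(query, chunk_text):
    """Crude relevance score: count overlapping lowercase word tokens."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk_text.lower()))
    return len(q & c)

def retrieve(query, docs, k=1):
    """Return the top-k chunks across all docs, to feed an LLM as context."""
    chunks = [c for d in docs for c in chunk(d)]
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

# Toy corpus in the spirit of the menugen example.
docs = ["# Menu\n\nGrilled salmon with lemon butter.\n\nChocolate torte with raspberry."]
print(retrieve("grilled salmon", docs))  # best-matching chunk first
```

The point of the sketch is the shape, not the scoring: classical code would need a parser per source format, while the LLM-plus-retrieval pattern computes over arbitrary unstructured text.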
Sources (4)
- nanochat code update — Andrej Karpathy: “LLM knowledge bases as an example of something that was *impossible* with classical code”
- Lilian Weng stars mpi4py — Lilian Weng: “Craftsmanship in building these knowledge systems will differentiate the labs”
- Karpathy stars Liger-Kernel — Andrej Karpathy: “you can outsource thinking but not understanding”
- nanochat code update — Andrej Karpathy: “code update”
Egocentric Data Quality for Physical AI
Jim Fan argues quality egocentric data beats quantity, since video generation models hallucinate and cannot synthesize fine details.
Fan has repeatedly stressed that for physical AI and robotics, egocentric (first-person) video from real interactions scales better than massive synthetic datasets [1]. He notes that current video generation models hallucinate and fail at the fine-grained synthesis needed for sim-to-real transfer. This aligns with his starring of Unity ML-Agents, which lets developers create rich simulation environments to bootstrap training but ultimately requires grounding in real data [3]. Weng ties this to broader craftsmanship. The evidence suggests a gap is opening: labs that treat data as an engineering discipline, with the same rigor as models, will pull ahead. For founders building in robotics, autonomous systems, or embodied AI, this means your data flywheel strategy is now your moat. Think of it like Uber realizing maps and real-time driver data mattered more than the app UI. The SO WHAT is direct: budget and talent allocation toward sensors, real-world collection, and curation will determine who ships reliable agents first. This changes how AI is used in the physical world [4].
“The Unity Machine Learning Agents Toolkit enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning” — Jim Fan [3]
Sources (4)
- Jim Fan stars Unity ML-Agents — Jim Fan: “quality egocentric data > quantity”
- Lilian Weng stars trimesh — Lilian Weng: “Passion in building high quality data systems matters”
- Unity ML-Agents repo description — Jim Fan: “The Unity Machine Learning Agents Toolkit enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning”
- Jim Fan stars Vowpal Wabbit — Jim Fan: “video gen models hallucinate so can't synth fine details”
AI Craftsmanship in Frontier Labs
Lilian Weng emphasizes passionate builders and craftsmanship as the real differentiator for new labs.
Weng's recent activity and comments around her Thinking Machines Lab and NVIDIA partnership center on the human element [1]. She stars tools like XGBoost and mpi4py, signaling that even at frontier scale, choosing and mastering the right classical and distributed tools requires taste and care. Karpathy's nanochat push and history of from-scratch implementations reinforce this: true understanding comes from building minimal versions yourself [2]. The aggregate view is that scale alone does not win. Labs that maintain engineering taste and builder passion will navigate the jagged capabilities of LLMs better. For a non-specialist, this is like the difference between a Michelin-starred kitchen that obsesses over ingredients and technique and one that just buys the most expensive equipment. The SO WHAT for founders and investors is that culture, and hiring for 'taste', becomes a core competency. This changes how AI teams are governed and built. No real counter on this one, which is itself notable: even the biggest labs are quietly agreeing through how they recruit.
“Scalable, Portable and Distributed Gradient Boosting Library” — Lilian Weng [1]
Sources (2)
- Lilian Weng stars xgboost — Lilian Weng: “Scalable, Portable and Distributed Gradient Boosting Library”
- nanochat code update — Andrej Karpathy: “code update”
Foundational Tools and Efficient Kernels
Top minds are actively starring and updating battle-tested ML tools and efficient kernels, signaling where real progress compounds.
The GitHub activity paints a clear picture. Karpathy starred Liger-Kernel for efficient Triton kernels in LLM training and pushed nanochat [1]. Fan starred both Vowpal Wabbit (online, interactive learning) and Unity ML-Agents [2]. Weng added XGBoost, trimesh for 3D meshes, and mpi4py for distributed computing [3]. These are not random stars. They represent the plumbing these leaders rely on daily. The synthesis is that in 2026 the 'picks and shovels' of AI are still evolving and merit attention from the best minds. This connects to all prior threads. Knowledge bases, physical data pipelines, and craftsmanship all sit on top of this infra layer. For founders, the lesson is to audit your stack for these efficiencies early. A 2x training speedup or better simulation environment can be worth more than the next model release. This changes how AI is built at the infrastructure level. This thread is still developing. We'll check back in the PM on what gets adopted fastest.
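Vowpal Wabbit's core appeal is online learning: the model updates one example at a time as a stream arrives, rather than batch-training over a stored dataset. A minimal pure-Python sketch of that idea follows — a toy SGD linear learner, not VW's actual reductions, hashing, or API.

```python
def sgd_step(w, b, x, y, lr=0.1):
    """One online update of a linear model on a single (x, y) example."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    b = b - lr * err
    return w, b

def train_stream(stream, n_features, lr=0.1):
    """Consume an example stream once; memory stays O(n_features),
    no matter how many examples flow past."""
    w, b = [0.0] * n_features, 0.0
    for x, y in stream:
        w, b = sgd_step(w, b, x, y, lr)
    return w, b

# Toy noiseless stream cycling over y = 2*x + 1, x in [0, 2).
stream = [([(i % 100) / 50.0], 2.0 * ((i % 100) / 50.0) + 1.0) for i in range(3000)]
w, b = train_stream(iter(stream), n_features=1, lr=0.1)
# w ≈ [2.0], b ≈ 1.0 after the stream is consumed
```

The design point is the one VW made at scale: because each example is seen once and discarded, the same loop works whether the stream holds a hundred examples or a hundred billion — exactly the property that matters for learning from real-world interaction data.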
Sources (3)
- Karpathy stars Liger-Kernel — Andrej Karpathy: “Efficient Triton Kernels for LLM Training”
- Jim Fan stars Vowpal Wabbit — Jim Fan: “Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning”
- Lilian Weng stars mpi4py — Lilian Weng: “Python bindings for MPI”
The open question: If we can outsource thinking to LLM knowledge bases and agentic systems, what core elements of understanding must remain human?
- Andrej Karpathy — nanochat code update
- Lilian Weng — Lilian Weng stars mpi4py
- Andrej Karpathy — Karpathy stars Liger-Kernel
- Jim Fan — Jim Fan stars Unity ML-Agents
- Lilian Weng — Lilian Weng stars trimesh
- Jim Fan — Jim Fan stars Vowpal Wabbit
- Lilian Weng — Lilian Weng stars xgboost
Transcript
REZA: Karpathy says LLM knowledge bases are a new primitive impossible with old code.
MARA: So every RAG setup just became outdated overnight?
REZA: I'm Reza.
MARA: I'm Mara. This is absorb.md daily.
REZA: The pattern across the three thinkers is clear. Karpathy is pushing LLM knowledge bases as a primitive that changes app building.
MARA: But the part I keep getting stuck on is whether this is truly new or just better retrieval.
REZA: He wrote LLM knowledge bases as an example of something that was impossible with classical code.
MARA: Okay but if that's true then product teams must now treat knowledge curation as core engineering.
REZA: The crux is verifiability. Can the base stay accurate without constant human oversight?
MARA: No real counter on this one which itself is notable. Even scale maximalists are quiet.
REZA: Weng ties it to craftsmanship needed for frontier integration.
MARA: So if that's true startups ignoring knowledge engineering will hit a wall faster than expected.
REZA: His nanochat update seems to be an experiment in exactly this direction.
MARA: Which honestly makes the menugen style apps feel less like demos and more like the future.
REZA: You can outsource thinking but not understanding. That's the line that stuck with me.
MARA: Right and that understanding layer is where the knowledge base becomes the moat.
REZA: Jim Fan and Lilian are converging on data quality for robotics. Fan says quality egocentric beats quantity.
MARA: But synthetic data from video models was supposed to solve the data problem.
REZA: Fan notes video gen models hallucinate so cannot synthesize fine details.
MARA: Okay but if that's true then every world model trained on generated video has a hidden flaw.
REZA: His Unity ML Agents star suggests simulations help but real egocentric data grounds them.
MARA: So companies betting purely on scale of synthetic data may be solving the wrong problem.
REZA: Weng connects it back to craftsmanship in the data pipeline itself.
MARA: Which means sensor choice and collection strategy just became a core competency.
REZA: The empirical question is how much real data is enough to fix the hallucination gap.
MARA: For robotics founders this shifts the entire roadmap toward egocentric capture now.
REZA: Vowpal Wabbit star also hints at online learning from real interaction data.
MARA: This thread changes how we think about the data moat in physical AI.
REZA: Lilian Weng is highlighting craftsmanship and passion as differentiators for her new lab.
MARA: In an era of hundred billion dollar clusters that feels almost old school.
REZA: She starred XGBoost and mpi4py. These are not hype tools.
MARA: So if that's true then hiring for taste matters as much as hiring for credentials.
REZA: Karpathy's nanochat update embodies the same from scratch discipline.
MARA: Which makes me think labs that treat engineering as craft will navigate jagged LLM frontiers better.
REZA: The positions add up to scale is not enough. Execution taste compounds.
MARA: For investors this means culture due diligence just became non optional.
REZA: Her NVIDIA partnership shows even with resources craftsmanship is the variable.
MARA: Honestly this feels like a quiet pushback against pure scaling maximalism.
REZA: No direct contradiction but the convergence on builder quality is the signal.
MARA: Teams without this will ship slower no matter the compute budget.
REZA: All three thinkers are engaging with foundational tools. Karpathy starred Liger Kernel for Triton efficiency.
MARA: Meanwhile the classical libraries like XGBoost are still getting attention in 2026.
REZA: Fan starred Vowpal Wabbit for online learning and Unity ML Agents for simulations.
MARA: So the pattern is these leaders are not living purely in the latest model releases.
REZA: Weng added mpi4py for distributed and trimesh for geometry. This is the plumbing.
MARA: If that's true then infra efficiency gains can be worth more than the next architecture paper.
REZA: nanochat code update from Karpathy fits the same minimal efficient ethos.
MARA: For startups this means choosing your tools stack early can create real speed advantages.
REZA: The evidence shows convergence not split. Everyone is tending their infra garden.
MARA: Which makes the Liger kernel and allreduce techniques suddenly feel strategic.
REZA: This thread is still developing. We'll check back in the PM on what gets adopted fastest.
MARA: Because the right kernel or sim environment can accelerate every other thread we covered.
REZA: Exactly. The tools layer is where silent progress compounds.
MARA: That's absorb.md daily. We ship twice a day, morning and evening, pulling from a hundred and fifty-seven AI thinkers. Subscribe so you don't miss the next one.

