absorb.md

About Andrej Karpathy

Former Tesla AI director, OpenAI co-founder. LLM Knowledge Bases pattern. Neural Networks: Zero to Hero.

Andrej Karpathy is a leading AI researcher, educator, and founder of Eureka Labs; he previously co-founded OpenAI and served as Director of AI at Tesla, where he pioneered vision-based self-driving systems. His thinking emphasizes building deep intuition for neural networks through minimal from-scratch implementations ('Neural Networks: Zero to Hero'); conceptualizing LLMs as lossy compressors of internet-scale human knowledge that function as dynamic, compiled knowledge bases; and anticipating an agentic future in which natural-language interfaces and AI agents replace traditional software, code-heavy workflows, and proprietary memory systems. He champions Software 2.0 (data-driven differentiable programs), explicit local file-based personal wikis for user sovereignty and personalization, and structured reasoning patterns such as argument/counter-argument and Chain-of-Thought, while maintaining a practical focus on efficient tooling, multimodality, and societal legibility.

Democratizing Deep Learning Education

Karpathy's core contribution is making neural network fundamentals accessible through hands-on, minimal implementations that build intuition before relying on high-level frameworks. His 'Neural Networks: Zero to Hero' series and associated repositories start with micrograd (~150-line scalar autograd engine and 50-line NN library demonstrating backpropagation from first principles) [19][32], progress through character-level models with makemore (bigram counting to MLP to WaveNet-like hierarchical convolutions) [33][36][37], reproduce GPT-2 (124M params from scratch with proper init, mixed precision, and torch.compile) [25][30], and culminate in full Transformer language models [17][18]. Lectures cover manual backprop, activations/gradients diagnostics, BatchNorm, initialization to prevent saturation/vanishing gradients, tokenization (minbpe reproducing GPT-4's BPE) [21][34][35], and even llama2.c for training tiny models in pure C [20]. The approach assumes only basic Python/calculus and prioritizes understanding over efficiency, viewing neural nets as sequences of matrix multiplies and nonlinearities that yield emergent intelligence when scaled [26][17].
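
The micrograd idea described above can be sketched in a few dozen lines. The following is a simplified illustration of a scalar autograd engine (add and multiply only, per-node backward closures plus a topological-sort backward pass), not Karpathy's exact code:

```python
# Minimal sketch of a micrograd-style scalar autograd engine.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order, then apply the chain rule from the output back.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a        # c = -6 + 2 = -4
c.backward()         # dc/da = b + 1 = -2, dc/db = a = 2
```

The gradient accumulation (`+=`) matters: a node used twice (like `a` here) collects contributions from both paths, which is exactly the behavior manual-backprop lectures walk through.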

LLMs as Lossy Compressors and Knowledge Bases

Karpathy describes LLMs as the ultimate compression of human knowledge: pretraining on filtered internet text (e.g., FineWeb) via next-token prediction distills ~10-15T tokens into billions of parameters (roughly 100x lossy compression), creating stochastic simulators of internet documents capable of regurgitation, hallucination, and in-context learning [23][28][1]. The 'Compilation Thesis' positions the model weights themselves as the primary interface, replacing search engines and wikis, with RAG patching gaps in the lossy representation [1]. Base models are turned into assistants via post-training on dialogue datasets (encoding special tokens, imitating labelers, and adding tools like search to refresh context) [23][28][24]. Chain-of-Thought acts as directed context compaction/reduction, inheriting wiki-like structural properties for guided summarization [10]. Recent extensions include Bibby AI for LaTeX and multimodal models like GPT-4o [14][15].
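
The next-token-prediction objective itself is simple; a counting-based toy version (the bigram stage of the makemore progression, here on a tiny made-up "corpus") can be sketched as:

```python
from collections import Counter
import math

# Toy illustration of next-token prediction, not the actual pretraining stack:
# a bigram model "trained" by counting over a tiny corpus.
corpus = "hello hello help"
pairs = Counter(zip(corpus, corpus[1:]))   # counts of (context, next) pairs
totals = Counter(corpus[:-1])              # counts of each context character

def p_next(ctx, tok):
    """P(tok | ctx) under the counted bigram model."""
    return pairs[(ctx, tok)] / totals[ctx]

# Average negative log-likelihood over the corpus: the pretraining loss in miniature.
nll = -sum(math.log(p_next(a, b)) for a, b in zip(corpus, corpus[1:])) / (len(corpus) - 1)
```

Swapping the count table for a neural network that predicts the same conditional distribution, and scaling the corpus to trillions of tokens, is the conceptual jump from this toy to pretraining as lossy compression.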

The Rise of AI Agents and New Software Paradigms

Karpathy predicts AI agents will supplant traditional CRUD software, dashboards, and human-written code with conversational UIs that understand intent and execute multi-step workflows [2]. 'Prompt requests' replace pull requests: users share high-level abstract ideas (not messy vibe-coded PRs from free-tier ChatGPT), and agents customize the implementations [11][12]. Agents excel at practical tasks such as converting diverse EPUBs to clean markdown (outperforming dedicated tools via reasoning) [13], generating git commit messages [16], and building custom knowledge bases. This shifts sharing from specific code to ideas, redirecting tokens toward knowledge manipulation. He notes platform implications (cheaper read vs. expensive write APIs on X/xAI to manage AI traffic and improve legibility) [4][5] and praises communities like GitHub Gists, whose format and norms yield more constructive, less AI-slop-prone comments than X [6].
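
The git-commit use case gives a feel for the pattern. Karpathy's actual `gcm` helper is a shell function; the sketch below is a hypothetical Python analogue with a stubbed `call_llm` (not a real API), showing only the shape: gather context, hand the model an intent-level prompt, return its output:

```python
import subprocess

def call_llm(prompt):
    # Stand-in for a real model call; any LLM API could be substituted here.
    return "placeholder commit message"

def suggest_commit_message(diff_text):
    """Assemble an intent-level prompt around the diff and ask the model."""
    prompt = (
        "Write a concise one-line git commit message for this diff:\n\n"
        + diff_text
    )
    return call_llm(prompt)

def gcm():
    """Read the staged diff and propose a commit message for it."""
    diff = subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True
    ).stdout
    return suggest_commit_message(diff)
```

The same structure (collect local context, delegate the judgment call to the model) generalizes to the EPUB-to-markdown and knowledge-base tasks mentioned above.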

Personal Knowledge Management, Explicit Wikis, and Data Sovereignty

A recurring theme is explicit, user-controlled personalization versus implicit, provider-locked memory. Farzapedia exemplifies maintaining a local, navigable wiki of LLM-generated knowledge (markdown files plus images, Obsidian-compatible, Unix-tool interoperable) that agents can read and write, allowing bring-your-own-AI (including fine-tuned open-source models) and full control and interoperability [7][12][1]. This contrasts with proprietary systems and positions file-based memory as future-proof. Agents simplify wiki management; the pattern redirects focus from writing code to ingesting and compiling knowledge. Karpathy extends this to the societal scale: AI reverses government 'legibility' by letting citizens parse bills, budgets, lobbying graphs, and voting records, enhancing democratic accountability and transparency (with acknowledged misuse risks) [9].
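
The file-based wiki pattern needs no special infrastructure, which is the point. A minimal sketch (paths and helper names are illustrative, not from any Karpathy repo): plain markdown files that humans, agents, and Unix tools can all read, write, and search:

```python
from pathlib import Path

# Illustrative file-based wiki: one markdown file per note, no database,
# no proprietary index; agents and Unix tools operate on the same files.
wiki = Path("wiki")
wiki.mkdir(exist_ok=True)

def write_note(title, body):
    """Store a note as a standalone markdown file (Obsidian-compatible)."""
    path = wiki / f"{title}.md"
    path.write_text(f"# {title}\n\n{body}\n")
    return path

def search(term):
    """Grep-style full-text search across the wiki, like `grep -l term wiki/*.md`."""
    return sorted(p.name for p in wiki.glob("*.md") if term in p.read_text())

write_note("transformers", "Attention is a data-dependent matmul.")
```

Because the "memory" is just files, switching the AI on top of it (the bring-your-own-AI property) costs nothing: any model that can read and write text can use the wiki.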

Software 2.0, Scaling, Emergence, and Real-World Applications

Neural networks function as general-purpose differentiable computers ('Software 2.0') where behavior emerges from optimizing simple mathematical expressions (matrix multiplies + nonlinearities) on massive curated datasets rather than hand-coded rules [26][29]. Tesla's vision-only Autopilot exemplified this: millions of fleet images, end-to-end nets, self-supervised pretraining, and custom hardware (Dojo) bounded only by data scale [29]. Early CV papers laid groundwork—DenseCap for dense image captioning with FCLN, multimodal CNN-RNN alignment, fragment-level embeddings for retrieval, PixelCNN++ improvements, LSTM interpretability for long-range dependencies, and ImageNet's role in scaling recognition [42][48][50][39][44][49]. Karpathy views biological evolution as a bootloader for inefficient human minds, with synthetic AIs potentially resolving Fermi-like questions via physics exploitation [26].
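
"Matrix multiplies plus nonlinearities" is meant literally. A two-layer network in NumPy makes the point: all behavior lives in the weight matrices found by optimization, not in hand-written rules (shapes and the tanh choice here are arbitrary illustration):

```python
import numpy as np

# "Software 2.0" in miniature: the program is the weights, the code is
# just matmuls and a nonlinearity.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # layer 1 parameters
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # layer 2 parameters

def mlp(x):
    h = np.tanh(x @ W1 + b1)   # matmul + nonlinearity
    return h @ W2 + b2         # matmul (output logits)

y = mlp(rng.normal(size=(3, 4)))   # batch of 3 inputs -> (3, 2) outputs
```

In the Software 2.0 framing, "programming" this function means curating the dataset and running the optimizer that sets `W1`, `b1`, `W2`, `b2`; the forward pass never changes.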

Reasoning Patterns, Critical Thinking, and Practical LLM Use

Beyond answers, the highest-value LLM outputs are structured argument/counter-argument pairs that steelman opposing views and expose blind spots, countering confirmation bias more effectively than direct answers [3]. CoT prompting serves as a reduction operation for directed context compaction, akin to wiki summarization, enhancing reasoning in expansive contexts [10]. Practical usage involves selecting models (reasoning 'thinking' variants via RL, multimodal with native audio/images), integrating tools (search, code interpreters, file uploads) to mitigate hallucinations by refreshing working memory, using notebooks like NotebookLM or Cursor, and cautious verification [24][28][23]. Multimodal end-to-end models like GPT-4o (human-like audio latency, strong non-English performance) exemplify progress [15].
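
The argument/counter-argument pattern is ultimately a prompting discipline. A hedged sketch as a prompt template (the wording is illustrative, not a quoted Karpathy prompt):

```python
# Illustrative prompt builder for the argument/counter-argument pattern:
# force the model to steelman both sides before the user forms a view.
def debate_prompt(claim):
    return (
        f"Claim: {claim}\n\n"
        "1. Give the strongest argument FOR this claim.\n"
        "2. Steelman the strongest argument AGAINST it.\n"
        "3. List the assumptions each side depends on.\n"
    )

p = debate_prompt("RAG is obsolete once context windows are large enough")
```

The structure, not the model, does the anti-confirmation-bias work: by construction the output must contain the opposing case, which a direct question would let the model (and the user) skip.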

Minimalist Implementations, Tooling, and Platform Reflections

Karpathy's gists and repos emphasize simplicity for education and edge deployment: optimized LSTMs in Torch/NumPy with batched forward/backward and gradient checks [43][46][47], NES for black-box optimization, policy gradients for Pong with debugging notes on common TF pitfalls, slerp for Stable Diffusion video, L2 normalization layers, and CSS hacks for presentations [38][40][27][45][41]. He critiques xAI API pricing/docs fragmentation while seeing promise in read access for agents, and reflects on platform design for AI legibility [4][5][6]. This minimalist ethos underpins everything from autograd to full training/inference stacks [20][21].
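
Of the gists above, slerp is compact enough to show whole. This is a generic NumPy rendering of spherical linear interpolation between two latent vectors (the technique behind the Stable Diffusion video gist), not the gist's exact code:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-7):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t."""
    u0 = v0 / np.linalg.norm(v0)
    u1 = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)
    omega = np.arccos(dot)                  # angle between the two latents
    if np.sin(omega) < eps:                 # nearly parallel: fall back to lerp
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mid = slerp(0.5, a, b)   # stays on the arc between a and b
```

The reason to prefer slerp over plain linear interpolation for diffusion latents is that it preserves vector norm along the path; linear interpolation passes through lower-norm intermediate points, which tend to decode to washed-out frames.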

Democratizing Deep Learning Education

Building intuition via minimal from-scratch implementations of autograd, MLPs, Transformers, tokenizers, and training pipelines rather than black-box usage.

  • micrograd, nanoGPT, nn-zero-to-hero, makemore series demonstrate backprop, init, BatchNorm, attention from basics [17][19][25][30][32-37]

  • assumes minimal prereqs; prioritizes understanding internals over production efficiency [17][26]

LLMs as Lossy Knowledge Compressors and Bases

LLMs distill internet text into parameters acting as compiled, lossy knowledge repositories; base models simulate documents while post-training creates assistants.

  • Compilation Thesis: weights as primary interface replacing search/wikis, RAG patches gaps [1]

  • pretraining on FineWeb, ~100x compression, stochastic simulators [23][28]

  • CoT as context compaction inheriting wiki properties [10]

Rise of AI Agents and Conversational Software

Agents will replace CRUD apps, dashboards, and human PRs with intent-understanding conversational workflows; shift from sharing code to sharing abstract ideas.

  • UI of future is conversation; agents execute multi-step tasks [2]

  • PRs become 'prompt requests'; agents customize implementations [11][12]

  • superior at robust tasks like diverse EPUB->markdown [13]

Explicit Personal Wikis and Data Sovereignty

Local, navigable markdown/image wikis (Farzapedia) generated by LLMs provide user-controlled, interoperable memory superior to implicit proprietary systems; enables BYOAI and agent integration.

  • explicit vs implicit memory; full control, Obsidian/Unix compatibility [7]

  • ingest documents into persistent KB; agents manage it; vague gists for customization [12][1]

Software 2.0, Scaling, and Emergence

Neural nets as differentiable computers optimized on data (not hand-coded rules); emergence from scale explains Tesla success and potential universe-solving AI.

  • Tesla vision-only via massive curated data, end-to-end nets [29]

  • evolution as bootloader; plausible abiogenesis and Fermi resolutions [26]

  • early CV scaling via ImageNet, DenseCap, multimodal models [49][42][48]

Reasoning Patterns and Critical Thinking

AI's greatest value is structured debate (steelmanning counter-arguments) and directed compaction (CoT as wiki-like reduction) to expose blind spots and improve reasoning.

  • argument/counter-argument discovery more valuable than confirmation [3]

  • CoT as reduction alongside attention for guided summarization [10]

Practical Tooling, Efficiency, and Platform Critique

Minimalist open implementations (tokenizers, LSTMs, git tools) combined with reflections on API costs, documentation, community quality, and societal legibility.

  • gcm for AI git commits, minbpe, llama2.c, policy gradient debugging [16][21][20][40]

  • xAI API pricing/docs critique, Gists vs X comments, gov data legibility via AI [4][5][6][9]


Every entry that fed the multi-agent compile above. Inline citation markers in the wiki text (like [1], [2]) are not yet individually linked to specific sources — this is the full set of sources the compile considered.

  1. Navigating the Open-Source Note-Taking Ecosystem for Privacy and Efficiency · youtube · 2026-04-09
  2. LLMs as a Tool for Knowledge Curation, Not Creation · tweet · 2026-04-06
  3. LLMs as Knowledge Bases: The Compilation Thesis · tweet · 2026-04-06
  4. AI Agents Will Replace Traditional Software · tweet · 2026-04-05
  5. The Argument/Counter-Argument Discovery Pattern · tweet · 2026-04-05
  6. Karpathy Advocates Cheaper AI Read Access and Costly Write Endpoints for X Platform · tweet · 2026-04-05
  7. xAI Read API Promising but Hindered by High Costs and Fragmented Docs · tweet · 2026-04-05
  8. GitHub Gists Outshine X in Comment Quality Due to Community and Format · tweet · 2026-04-05
  9. Farzapedia Exemplifies Explicit, User-Controlled Personalization via Local Wiki Files · tweet · 2026-04-04
  10. Karpathy Endorses Peter Xing's AI Research as 'Incredible' · tweet · 2026-04-04
  11. AI Empowers Citizens to Reverse Government Legibility for Enhanced Accountability · tweet · 2026-04-04
  12. Chain-of-Thought as Directed Context Compaction via Reduction, Echoing Wiki Structures · tweet · 2026-04-04
  13. Shift PRs to "Prompt Requests" for AI Agents, Bypassing Messy Human-Generated Code · tweet · 2026-04-04
  14. LLM Agents Shift Sharing from Code to Abstract Ideas for Custom Knowledge Base Builds · tweet · 2026-04-04
  15. LLM-Powered Persistent Knowledge Bases: An Alternative to RAG · github_gist · 2026-04-04
  16. AI Agents Excel at Converting Diverse EPUB Formats to Clean Markdown · tweet · 2026-04-04
  17. nanochat: Optimizing Micro-LLM Training Pipelines for Extreme Cost-Efficiency · github_readme · 2026-03-27
  18. Autonomous AI Agents for LLM Research and Optimization · github_readme · 2026-03-26
  19. The Future of Engineering in the Age of AI Agents · youtube · 2026-03-20
  20. Bibby AI Redefines LaTeX Editing with Native AI Integration, Outperforming Overleaf and OpenAI Prism · paper · 2026-02-18
  21. Deconstructing GPT Architecture: From Atomic Implementation to Metaweight Heuristics · github_gist · 2026-02-11
  22. LLM Council: A Multi-Model Consensus System · github_readme · 2025-11-22
  23. nanoGPT: A Minimalist Framework for GPT Model Training and Finetuning · github_readme · 2025-11-12
  24. Andrej Karpathy on the "Decade of Agents" and Future of AI · youtube · 2025-10-17
  25. Karpathy's llm.c: GPT-2/3 Pretraining in Pure C/CUDA, Outpacing PyTorch Nightly · github_readme · 2025-06-26
  26. Software Evolution: From Code to Programmable LLMs and Partial Autonomy · youtube · 2025-06-19
  27. GPT-4o: End-to-End Multimodal Model Achieving Human-Like Audio Latency and Superior Non-English Performance · paper · 2024-10-25
  28. Andrej Karpathy on the State of AI, Self-Driving, and Human-AI Education · youtube · 2024-09-05
  29. AI-Powered Git Commit Message Generator via Shell Function · github_gist · 2024-08-25
  30. Karpathy's Hands-On Neural Networks Course: From Backprop Basics to GPT Implementation · github_readme · 2024-08-18
  31. minGPT: Compact PyTorch GPT Reimplementation for Education and Experimentation · github_readme · 2024-08-15
  32. Micrograd: Tiny 150-Line Autograd Engine Enables Full Neural Net Training · github_readme · 2024-08-08
  33. llama2.c: Minimal C Implementation for Training and Inferencing Tiny Llama 2 Models on Narrow Domains · github_readme · 2024-08-06
  34. minbpe: Compact BPE Tokenizers Reproducing GPT-4 with Trainable Implementations · github_readme · 2024-07-01
  35. Navigating the AI Ecosystem: Insights from Andrej Karpathy · youtube · 2024-03-26
  36. PyTorch Linear Layer Uses Fused addmm Only for 2D Inputs with Bias, Potentially Explaining Batched Input Discrepancies · github_gist · 2023-06-15
  37. LLMs as Token Stream Collaborators: Practical Tools, Models, and Modalities for Everyday Use · youtube · 2023-01-01
  38. Reproducing GPT-2 124M: From Scratch Implementation, Weight Loading, and Optimized Training in PyTorch · youtube · 2023-01-01
  39. LLM Pipeline: From Internet Text to Token Prediction Base Models and Post-Training into Assistants · youtube · 2023-01-01
  40. Neural Nets as Software 2.0: Emergent Intelligence Bootloads Universe-Solving AI Amid Plausible Abiogenesis and Fermi Resolutions · youtube · 2022-10-29
  41. Slerp Interpolation of Stable Diffusion Latents Generates Hypnotic Text-to-Video Sequences · github_gist · 2022-08-16
  42. LLMs as Lossy Internet Compressors: From Two-File Inference to OS-Like Tool Orchestration Amid Security Risks · youtube · 2022-01-01
  43. Deep Learning Scales Self-Driving Through Massive Data Curation, Not Algorithm Invention · youtube · 2021-09-21
  44. Manual Backpropagation Demystifies PyTorch Autograd for Robust Neural Net Debugging · youtube · 2021-01-01
  45. Auto-Ingestion Captures Only Audio Placeholders from Karpathy Video · youtube · 2021-01-01
  46. From Bigram to Transformer: Building a Shakespeare-Generating NanoGPT from Scratch · youtube · 2021-01-01
  47. Micrograd: Scalar Autograd Engine Implements Backpropagation in 100 Lines, Core of Neural Net Training · youtube · 2021-01-01
  48. Proper Neural Net Initialization and Batch Normalization Stabilize Activations and Gradients for Reliable Training · youtube · 2021-01-01
  49. Multi-Layer Perceptron Scales Character-Level Language Modeling Beyond Bigram Limitations · youtube · 2021-01-01
  50. Building Character-Level Bigram Language Models with PyTorch: From Counting to Neural Nets · youtube · 2021-01-01
  51. Hierarchical MLP Evolves into WaveNet-like Architecture for Character-Level Language Modeling · youtube · 2021-01-01
  52. NES Demonstrates Efficient Black-Box Optimization via Gaussian Perturbations and Standardized Reward Gradients · github_gist · 2017-03-22
  53. PixelCNN++ Enhances Original PixelCNN via Discretized Logistic Mixtures and Architectural Simplifications for Superior Generative Performance · paper · 2017-01-19
  54. Common TF Policy Gradient Pitfalls: Action Sampling, Loss Weighting, and State Initialization Derail Pong Training · github_gist · 2016-05-30
  55. Karpathy's CSS Hack Enlarges Next Slide Preview in Google Slides Presenter View · github_gist · 2016-02-01
  56. DenseCap Introduces Fully Convolutional Networks for Joint Image Region Localization and Captioning · paper · 2015-11-24
  57. Minimal NumPy RNN for Character-Level Language Modeling with Adagrad Updates Modifying Globals via Mutable References · github_gist · 2015-07-26
  58. LSTM Cells Track Long-Range Dependencies in Character-Level Language Models · paper · 2015-06-05
  59. Custom Torch nn.L2Normalize Layer for Batched Unit Vector Normalization with Backprop · github_gist · 2015-05-05
  60. Optimized LSTM Cell in Torch via nngraph for GEMM Efficiency · github_gist · 2015-05-05
  61. Batched LSTM Forward/Backward with Verified Numerical Correctness · github_gist · 2015-04-11
  62. Multimodal CNN-RNN Alignment Achieves SOTA for Image-Region Captioning · paper · 2014-12-07
  63. ImageNet Challenge: Enabling Large-Scale Object Recognition Advances Through Massive Annotated Dataset · paper · 2014-09-01
  64. Fragment-Level Embeddings Boost Bidirectional Image-Sentence Retrieval · paper · 2014-06-22