Chronological feed of everything captured from Andrej Karpathy.
youtube / karpathy / 2d ago
The video critiques commercial note-taking applications like Notion for their data privacy implications and bloat, advocating for open-source, non-commercial, and plain-text alternatives. It explores various tools and their trade-offs regarding features, user-friendliness, customization, and performance across different operating environments. The author ultimately lands on Neovim with specific plugins as a highly customizable yet challenging solution for plain-text note-taking, highlighting the perpetual quest for an ideal, distraction-free note-taking system versus the practicalities of productivity.
note-taking-apps, productivity-tools, open-source-software, text-editors, data-privacy, self-hosting, emacs, neovim, markdown, knowledge-management
“Commercial note-taking apps like Notion are problematic due to data privacy concerns and vendor lock-in.”
tweet / @karpathy / 5d ago
Large Language Models (LLMs) can effectively summarize and contextualize information, reducing the need for manual writing but not replacing the critical processes of reading and analytical thought. This approach facilitates efficient knowledge integration into existing systems like wikis, by providing LLM-generated summaries and contextual analyses that augment human understanding.
llm-workflow, information-processing, knowledge-management, ai-tools, reading-comprehension
“LLMs allow users to bypass the writing process.”
tweet / 5d ago / failed
tweet / @karpathy / 5d ago
Karpathy argues that LLMs are becoming the primary interface for accessing compiled human knowledge, replacing search engines and wikis. The model weights themselves function as a lossy compression of the internet's knowledge, and retrieval-augmented generation patches the gaps.
llm, knowledge-management, ai-agents, search
“LLMs are compressed knowledge bases that can be queried conversationally”
tweet / @karpathy / 5d ago
Karpathy predicts that most traditional CRUD software will be replaced by AI agents that understand intent and execute multi-step workflows. The UI of the future is a conversation, not a dashboard.
ai-agents, software-engineering, ux, automation
“Most SaaS products are glorified CRUD apps that AI agents can replace entirely”
tweet / @karpathy / 5d ago
Karpathy observed that the most useful output from AI isn't answers but structured argument/counter-argument pairs that expose blind spots. Having an AI steelman the opposing view on any claim is more valuable than having it confirm your priors.
critical-thinking, ai-reasoning, epistemology, knowledge-management
“AI's highest-value use is generating structured counter-arguments to your beliefs”
tweet / @karpathy / 6d ago
Andrej Karpathy notes the unchecked growth of AI activity on X, proposing cheaper pricing for Read endpoints and significantly higher costs for Write endpoints to manage it. He laments the excessive AI-driven attention and clarifies that the project he mentioned involved only reads, no writes. He emphasizes X's valuable data and the benefits of making the platform more legible to AI agents via read access.
ai-activity, x-api, read-endpoints, write-endpoints, platform-legibility, twitter-data
“AI activity on X has been growing out of control”
tweet / @karpathy / 6d ago
Andrej Karpathy views xAI's Read API as a step in the right direction but criticizes its excessive pricing, citing $200 spent in 30 minutes of experimentation. The documentation is fragmented across short pages, lacks a comprehensive introduction or any mention of XMCP, and thus complicates agent integration. He recommends better-structured docs, such as markdown or curl-accessible overviews.
api-pricing, api-documentation, xai-api, grok-api, developer-feedback, karpathy, agent-development
“xAI's Read API is a good direction for read endpoints”
tweet / @karpathy / 6d ago
Andrej Karpathy observes that comments on GitHub Gists are notably more helpful, insightful, constructive, and less AI-generated compared to other platforms like X. He attributes this potentially to the distinct user community, markdown format, or lack of incentives driving low-quality interactions. This prompts him to consider using Gists more and suggests GitHub compete with X in this space.
github-gists, comments-quality, ai-generated-content, platform-comparison, developer-community, tech-incentives
“Comments on GitHub Gists are more helpful than on other platforms”
tweet / @karpathy / 6d ago
Farzapedia implements personalization by maintaining an explicit, navigable wiki of user knowledge generated by LLMs, stored locally in universal file formats like markdown and images. This contrasts with implicit, provider-locked memory in proprietary AI systems, enabling full user control, interoperability with Unix tools and apps like Obsidian, and flexibility to plug in any AI model including fine-tuned open-source ones. Agent proficiency simplifies management, positioning file-based memory as a superior, future-proof alternative.
personalization, llm-wiki, local-data, file-over-app, byoai, agent-skills, data-sovereignty
“Personal wiki memory is explicit and navigable, allowing users to inspect and manage what the AI knows.”
tweet / @karpathy / 6d ago
Andrej Karpathy publicly praised work by @peterxing and @SOSOHAJALAB on X. The endorsement, "Incredible work :D", signals high approval from a leading AI figure and points to emerging contributions likely worth closer technical scrutiny.
andrej-karpathy, twitter-reaction, praise, ai-research, ml-community
“Andrej Karpathy expressed positive approval of work by @peterxing and @SOSOHAJALAB”
tweet / @karpathy / 6d ago
AI enables citizens to process vast government data—such as bills, budgets, and disclosures—overcoming historical intelligence bottlenecks that limited accountability to elite professionals. This reverses the traditional dynamic where states impose legibility on society, allowing detailed tracking of spending, legislation diffs, voting patterns, lobbying graphs, procurement, and local governance. While risks of misuse exist, increased participation should strengthen democratic transparency.
ai-empowerment, government-transparency, government-accountability, democratic-participation, data-legibility, public-oversight
“Government accountability has been constrained by intelligence to process raw data, not access to it.”
tweet / @karpathy / 7d ago
Chain-of-thought prompting functions as a reduction operation, alongside attention, enabling directed compaction of context in language models. This mechanism inherits structural properties from wikis, providing a more guided form of information summarization. It enhances model reasoning by progressively distilling expansive context into focused insights.
chain-of-thought, context-compaction, attention-mechanism, transformers, ai-architecture
“Chain of thought is a reduce operation”
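The "reduce operation" framing can be made concrete with a toy sketch: a fold over context chunks where each step compacts the running state. The `compact` function below is a trivial stand-in for what would be an LLM call in a real system; the chunk texts are illustrative only.

```python
from functools import reduce

def compact(summary, chunk):
    """Toy stand-in for one chain-of-thought step: fold a new chunk of
    context into the running summary (a real system would call an LLM)."""
    key_point = chunk.split(".")[0]  # keep only the chunk's first sentence
    return (summary + " " + key_point).strip()

chunks = [
    "Attention selects relevant tokens. It is content-addressed.",
    "Chain of thought rewrites context. Each step compacts it.",
    "The result is a distilled insight. Details are dropped.",
]

# Chain of thought as a reduce: repeatedly fold expansive context
# into a compact state, rather than attending over everything at once.
distilled = reduce(compact, chunks, "")
```

The point of the analogy is the shape of the computation, not the stub's heuristic: a directed, stateful compaction pass over the context.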
tweet / @karpathy / 7d ago
Peter Steinberger proposes redefining PRs as "prompt requests," where users submit high-level ideas directly to AI agents capable of precise implementation. This eliminates the prevalent practice of using free-tier ChatGPT to produce suboptimal, vibe-coded messes submitted as PRs. The approach leverages agentic AI strengths for cleaner, more efficient development workflows.
ai-agents, prompt-engineering, pull-requests, coding-practices, llm-development, startup-humor
“PR should stand for 'prompt request' instead of traditional pull request.”
tweet / @karpathy / 7d ago
In the LLM agent era, sharing abstract ideas like personal LLM knowledge bases replaces sharing specific code, since agents can customize implementations to user needs. Karpathy reformatted his viral tweet idea as a gist: a system that ingests documents into markdown/image-stored knowledge for research, redirecting token spend from code to knowledge manipulation. The latest LLMs excel at this, and the gist is deliberately left vague to enable diverse agent-driven adaptations.
llm-agents, personal-knowledge-base, idea-sharing, llm-wiki, knowledge-ingestion, ai-productivity
“Sharing specific code or apps is less necessary in the LLM agent era because agents can customize abstract ideas to specific needs.”
github_gist / karpathy / 7d ago
This article outlines a novel approach to knowledge management using LLMs to incrementally build and maintain a persistent, structured wiki. Unlike traditional RAG systems that re-derive knowledge, this method emphasizes continuous integration of new information, updating existing knowledge graphs, and flagging contradictions. This shifts the LLM's role from a query-time retriever to an active knowledge base curator, significantly reducing maintenance overhead and enabling more sophisticated, compounding insights over time.
llm-agents, knowledge-management, personal-knowledge-base, productivity-workflows, ai-agents
“The core idea proposes LLMs incrementally build and maintain a persistent wiki as an alternative to RAG.”
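The curation loop described above can be sketched in a few lines: new information is merged into a persistent store at ingest time, with duplicates ignored and contradictions flagged, instead of being re-derived per query as in RAG. Everything here (the `integrate` function, the naive "is"/"is not" contradiction check) is a hypothetical stand-in for what the gist delegates to an LLM.

```python
# Minimal sketch of the wiki-curation idea: facts are integrated into a
# persistent store as they arrive; conflicts are flagged rather than silently
# overwritten. The merge/contradiction logic is a trivial stand-in for an LLM.

wiki = {}            # page title -> list of accepted facts
contradictions = []  # flagged conflicts for later review

def integrate(page, fact):
    facts = wiki.setdefault(page, [])
    if fact in facts:
        return "duplicate"
    # naive contradiction check: the same claim with "is"/"is not" swapped
    flipped = (fact.replace(" is not ", " is ") if " is not " in fact
               else fact.replace(" is ", " is not "))
    if flipped in facts:
        contradictions.append((page, fact))
        return "contradiction"
    facts.append(fact)
    return "added"

integrate("nanochat", "nanochat is a training harness")     # added
integrate("nanochat", "nanochat is a training harness")     # duplicate: ignored
integrate("nanochat", "nanochat is not a training harness") # conflict: flagged
```

The structural contrast with RAG is that the expensive work (deduplication, reconciliation) happens once at write time, so later queries read an already-curated knowledge base.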
tweet / @karpathy / 7d ago
Andrej Karpathy identifies AI agents as the superior method for converting EPUB files to text, outperforming dedicated tools due to EPUBs' structural diversity. Agents autonomously parse varied formats, generate markdown output, and verify visual and functional quality. This approach leverages agentic reasoning for robust handling of non-standard inputs.
epub-conversion, ai-agents, file-processing, text-extraction, karpathy-feed, productivity-hack
“AI agents are the best EPUB to TXT converter”
github_readme / karpathy / 15d ago
nanochat provides a minimal, end-to-end harness for training compute-optimal micro-LLMs on single GPU nodes, reducing the cost of GPT-2 grade capability from ~$43k in 2019 to under $100. The framework simplifies scaling by using a single 'depth' parameter to automatically derive all other optimal hyperparameters, focusing on minimizing the wall-clock time to achieve a specific DCLM CORE score.
llm-training, open-source-llm, gpu-optimization, llm-finetuning, ai-chatbots, developer-tools
“Training a model with GPT-2 capabilities now costs approximately $48 on an 8XH100 GPU node.”
github_readme / karpathy / 16d ago
This project, "autoresearch," demonstrates a novel approach to large language model (LLM) development by employing autonomous AI agents. These agents iterate on LLM training code, specifically `train.py`, within a fixed 5-minute time budget per experiment. The goal is to optimize model performance, measured by validation bits per byte (val_bpb), by autonomously modifying architectural and hyperparameter settings based on experimental results.
ai-agents, llm-training, autonomous-research, machine-learning-engineering, andrej-karpathy, nanochat, software-development
“AI agents can autonomously perform LLM research by iteratively modifying training code.”
youtube / karpathy / 22d ago
Andrej Karpathy discusses the profound shift in software engineering due to AI agents, moving from direct coding to orchestrating agents. He emphasizes the current "AI psychosis" driven by the rapid increase in capabilities and the need for individuals and organizations to adapt to this new paradigm. The focus is now on maximizing agent throughput and leveraging macro-actions, rather than traditional coding, leading to a "skill issue" in effectively utilizing these powerful tools. This shift suggests a future where agents handle much of the technical execution, allowing humans to focus on higher-level strategy and objective definition.
ai-agents, llm-development, autonomous-systems, software-engineering, ai-impact, future-of-work, open-source-ai
“AI agents have dramatically changed the software engineering workflow, reducing direct coding to a minimal percentage.”
paper / karpathy / Feb 18
Bibby AI is a native AI-first LaTeX editor that integrates tools like writing assistance, smart citation search, AI-generated tables/equations, paper reviewing, abstract generation, literature review drafting, deep research assistance, and real-time error detection/fix into a single interface. It introduces LaTeXBench-500, a benchmark of 500 real-world LaTeX compilation errors across six categories. Bibby achieves 91.4% error detection accuracy and 83.7% one-click fix accuracy, surpassing Overleaf's 61.2% detection and OpenAI Prism's 78.3% detection / 64.1% fix rates.
ai-latex-editor, bibby-ai, overleaf-alternative, latex-benchmark, academic-writing-ai, arxiv-paper, andrej-karpathy
“Bibby AI embeds native AI tools including writing assistant, smart citation search, AI table/equation generation, paper reviewer, abstract generator, literature review drafter, deep research assistant, and real-time LaTeX error detection/auto-fix without plugins or copy-paste.”
blog / karpathy / Feb 12 / failed
microGPT implements a full GPT-2-like transformer in 200 lines of dependency-free Python, including dataset handling, character-level tokenizer, scalar autograd engine, multi-head attention architecture, Adam optimizer, training loop on names dataset, and autoregressive sampling. The model has 4,192 parameters (n_embd=16, n_head=4, n_layer=1), trains in ~1,000 steps from loss 3.3 to 2.37, and generates plausible names using KV cache during both training and inference. It distills the algorithmic core of production LLMs, emphasizing that scaling involves tensorization, larger datasets/models, and engineering optimizations without altering the fundamental next-token prediction loop.
microgpt, from-scratch-gpt, autograd-engine, transformer-architecture, llm-training, karpathy-blog, educational-code
“microGPT is a single 200-line Python file with no external dependencies that fully trains and runs a GPT model.”
github_gist / karpathy / Feb 11
The provided content contrasts a 'microgpt' implementation—a dependency-free, scalar-based autograd engine implementing a GPT-2 style transformer—with 'PostGPT' and 'microKarpathy', which explore non-gradient-based text generation. These derivatives replace traditional training with co-occurrence statistics, hash-embedding cosine similarity, and deterministic random projections to navigate semantic spaces.
micro-llms, transformer-architecture, python-implementations, language-generation, ai-experiments, no-dependencies, conceptual-models
“A fully functional GPT can be implemented in pure Python without external libraries by building a custom scalar autograd engine.”
github_readme / karpathy / Nov 22
The "LLM Council" is a web application designed to leverage multiple large language models (LLMs) for enhanced query responses. It operates by having several LLMs independently answer a query, then critically review and rank each other's responses, and finally, a designated "Chairman" LLM synthesizes these into a single, comprehensive answer. This approach aims to improve the accuracy and insight of LLM outputs by incorporating diverse perspectives and internal critique.
llm-evaluation, multi-agent-systems, openrouter, agentic-workflows, software-development
“The LLM Council system uses multiple LLMs to generate and evaluate responses to a user query.”
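The council flow (independent answers, cross-review, chairman synthesis) can be sketched with canned stand-ins for the real model calls. The `ask` and `score` functions below are hypothetical placeholders, not the repo's API; a real deployment would route both through OpenRouter.

```python
# Sketch of the "LLM Council" pattern with stubs in place of real model calls:
# 1) each member answers, 2) members score each other's answers,
# 3) a chairman synthesizes from the top-ranked response.

def ask(model, query):
    # stand-in for an API call to one council member
    return f"{model} answer to: {query}"

def score(judge, answer):
    # stand-in for a critical-review step; here, longer answers score higher
    return len(answer)

def council(models, chairman, query):
    answers = {m: ask(m, query) for m in models}
    # every member scores every *other* member's answer (no self-votes)
    totals = {
        m: sum(score(j, a) for j in models if j != m)
        for m, a in answers.items()
    }
    best = max(totals, key=totals.get)
    return f"{chairman} synthesis, leaning on {best}: {answers[best]}"

result = council(["gpt-4", "claude", "grok"], "chairman", "What is attention?")
```

The design point is the separation of roles: generation, peer review, and synthesis are distinct passes, so weaknesses in any single model's answer can be caught before the final output.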
github_readme / karpathy / Nov 12
nanoGPT offers a simplified and efficient codebase for training and finetuning medium-sized GPT models. It provides a highly readable and hackable architecture, enabling users to reproduce GPT-2 performance on OpenWebText with readily available hardware. The project, while deprecated in favor of nanochat, remains a valuable resource for understanding core GPT mechanics and experimentation.
gpt-models, llm-training, machine-learning-framework, natural-language-processing, code-example, developer-tools, gpu-acceleration
“nanoGPT is a simplified and efficient repository for training and finetuning medium-sized GPT models.”
youtube / karpathy / Oct 17
Andrej Karpathy argues that the current state of AI agents is impressive but nascent, predicting a "decade of agents" due to significant remaining challenges in achieving human-like cognitive abilities. He emphasizes that current LLMs, while powerful, suffer from inherent limitations like "model collapse" and an over-reliance on memorization, hindering true intelligence. Karpathy advocates for educational reform, proposing "Eureka" as an initiative to build highly effective, AI-augmented "ramps to knowledge" to empower human learning alongside AI advancements.
ai-agents, llms, ai-education, ai-safety, agi-timelines, self-driving, human-computer-interaction
“AI development is in the 'decade of agents,' not the 'year of agents,' signifying substantial work ahead before achieving human-like cognitive abilities.”
github_readme / karpathy / Jun 26
llm.c is Andrej Karpathy's minimal C/CUDA implementation of LLM pretraining, targeting GPT-2 and GPT-3 reproduction without the overhead of PyTorch (245MB) or CPython (107MB). The project is currently ~7% faster than PyTorch Nightly on its primary CUDA path, while also maintaining a clean ~1,000-line CPU fp32 reference implementation for educational use. The design philosophy explicitly trades marginal performance gains for code simplicity and readability in the mainline, pushing complex or experimental kernels to a separate dev/ directory. Multi-GPU and multi-node training are supported via MPI and NCCL, and the project has spawned ports across more than a dozen languages and compute backends.
llm-training, cuda, open-source, gpt-2, low-level-ml, gpu-computing, andrej-karpathy
“llm.c is approximately 7% faster than PyTorch Nightly on the primary CUDA training path.”
youtube / karpathy / Jun 19
Software development is undergoing a fundamental shift, moving beyond traditional code (Software 1.0) and neural network weights (Software 2.0) to programmable Large Language Models (LLMs) as 'Software 3.0'. LLMs exhibit characteristics of utilities, fabs, and especially operating systems, but are fundamentally fallible 'people spirits'. The future of software development involves building partially autonomous applications that leverage LLMs while keeping humans in the loop for verification, and adapting infrastructure for direct agent interaction.
software-development, llm-applications, ai-agents, programming-paradigms, human-computer-interaction, ai-infrastructure, developer-tools
“Software is evolving through distinct paradigms: Software 1.0 (explicit code), Software 2.0 (neural network weights), and Software 3.0 (programmable LLMs).”
paper / karpathy / Oct 25
GPT-4o is a unified autoregressive model trained end-to-end on text, vision, and audio, handling any combination of text, audio, image, and video inputs to produce text, audio, and image outputs via a single neural network. It responds to audio in 232-320 ms, matching human conversational latency, while equaling GPT-4 Turbo on English text and code but excelling in non-English languages, vision, and audio understanding at 50% lower API cost and higher speed. The system card details capabilities, safety evaluations via OpenAI's Preparedness Framework, third-party dangerous capability audits, and societal impact assessments, with emphasis on speech-to-speech interactions.
gpt-4o, openai, system-card, multimodal-model, ai-safety, model-evaluation, arxiv-paper
“GPT-4o responds to audio inputs in as little as 232 milliseconds, averaging 320 milliseconds”
youtube / karpathy / Sep 5
Andrej Karpathy discusses the current state of AI, highlighting Tesla's self-driving approach as superior to Waymo's due to its vision-only system and end-to-end deep learning. He emphasizes the Transformer architecture as a foundational breakthrough, with current AI bottlenecks shifting from architecture to datasets and loss functions. Karpathy also outlines his vision for AI in education, focusing on enabling personalized, scalable learning experiences.
ai-research, robotics, ai-education, agi, humanoid-robots, self-driving-cars, synthetic-data
“Tesla's self-driving technology is ahead of Waymo's, despite appearances.”
github_gist / karpathy / Aug 25
Andrej Karpathy's gist provides a bash/zsh function `gcm` that captures staged git diffs, pipes them to an LLM via the `llm` CLI for concise commit message generation, and offers interactive options to accept, edit, regenerate, or cancel. Community contributions extend it with gitconfig aliases, VSCode keybindings, alternative LLMs like Gemini and local Ollama models, and conventional commit formatting. Requires `llm` tool installation with OpenAI API key; handles Oh My Zsh alias conflicts via unaliasing.
git-commit, ai-tooling, llm-integration, developer-productivity, shell-script, andrej-karpathy, openai-llm
“The `gcm` function uses `git diff --cached` to capture staged changes and sends them to an LLM for generating a one-line commit message.”
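The gist itself is a bash/zsh function; the same pipeline can be sketched in Python for readers who prefer it. The prompt wording and the plain `llm` invocation below are assumptions for illustration, not the gist's exact text; as in the gist, running it requires the `llm` CLI with an API key configured.

```python
import subprocess

PROMPT = ("Below is a diff of all staged changes. "
          "Reply with a single concise one-line commit message:\n\n")

def build_prompt(diff):
    """Pure step: wrap the staged diff in an instruction for the LLM."""
    return PROMPT + diff

def generate_commit_message():
    # capture staged changes, as `git diff --cached` does in the gist
    diff = subprocess.run(["git", "diff", "--cached"],
                          capture_output=True, text=True, check=True).stdout
    # pipe the prompt through the `llm` CLI, as the gist does
    out = subprocess.run(["llm"], input=build_prompt(diff),
                         capture_output=True, text=True, check=True).stdout
    return out.strip()
```

The interactive accept/edit/regenerate loop from the gist would wrap `generate_commit_message()` in a simple `input()` prompt before running `git commit -m`.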
github_readme / karpathy / Aug 18
Andrej Karpathy's "Neural Networks: Zero to Hero" provides a video series with Jupyter notebooks implementing neural networks from scratch, starting with micrograd for backpropagation, progressing through MLP and CNN language models via makemore, and culminating in a full GPT. Lectures emphasize tensor operations in PyTorch, training diagnostics like activations/gradients/BatchNorm, manual backpropagation, and tokenizer mechanics. Assumes minimal prerequisites (Python, basic calculus), building intuition for modern architectures like Transformers.
neural-networks, backpropagation, language-modeling, pytorch, transformers, andrej-karpathy, micrograd
“Course begins with building micrograd to implement backpropagation for neural networks”
github_readme / karpathy / Aug 15
minGPT provides a minimal ~300-line PyTorch implementation of the GPT Transformer model, supporting both training and inference with OpenAI's GPT-2 configuration (124M params, 1024 context, 50k vocab). It includes a refactored BPE tokenizer and generic trainer, demonstrated on tasks like addition and character-level modeling. Now semi-archived in favor of nanoGPT, it prioritizes interpretability over production efficiency.
mingpt, gpt-implementation, pytorch, transformer, andrej-karpathy, nano-gpt, language-modeling
“minGPT's core Transformer model is implemented in approximately 300 lines of code”
github_readme / karpathy / Aug 8
Micrograd implements reverse-mode autodiff via backpropagation over a scalar-only DAG in ~100 lines, supporting a PyTorch-like API for a ~50-line neural net library. It handles core operations like add, mul, pow, and relu, enabling construction of deep nets for tasks like binary classification on the two-moons dataset via SGD. The demo shows a 2-layer MLP with 16-node hidden layers achieving effective decision boundaries; the repo includes graphviz tracing and PyTorch-validated tests.
autograd-engine, backpropagation, neural-networks, micrograd, educational-code, autodiff, pytorch-like
“micrograd's autograd engine is implemented in about 100 lines of code”
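The core mechanism is small enough to show inline. This is a pared-down sketch in micrograd's spirit, not the repo's exact code: just `add` and `mul` (the real engine also has `pow` and `relu`), with each operation recording a local backward closure and `backward()` running a reverse topological sweep.

```python
class Value:
    """Pared-down scalar autograd node in the spirit of micrograd."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # build topological order, then run the reverse-mode sweep
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a          # dc/da = b + 1 = -2, dc/db = a = 2
c.backward()
```

Note the `+=` in the backward closures: because `a` feeds into the graph twice, its gradient contributions must accumulate, which is exactly why micrograd initializes grads to zero and sums into them.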
github_readme / karpathy / Aug 6
llama2.c provides a full-stack PyTorch training and pure C inference solution for Llama 2 architecture in under 700 lines, targeting small models (15M-110M params) trained on TinyStories that generate coherent stories at 110 tok/s on M1 Mac. It supports loading Meta's 7B Llama 2 models in fp32 (4 tok/s) with int8 quantization reducing size 4x and speeding up 3x to 14 tok/s via integer matmuls. Emphasizes simplicity for edge deployment, custom tokenizers, and easy forking over maximal efficiency.
llama2, c-implementation, llm-inference, model-training, quantization, tinyllamas, karpathy
“15M parameter Llama 2 model trained on TinyStories runs inference at ~110 tokens/s on M1 MacBook Air”
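The int8 path's 4x size reduction comes from storing one signed byte per weight plus a scale factor. A minimal absmax quantize/dequantize sketch (one group for simplicity; llama2.c's quantized path works group-wise, and this is not its exact code):

```python
def quantize_q8(weights):
    """Absmax int8 quantization: map floats into [-127, 127] with one scale.
    llama2.c does this per group of weights; a single group is used here."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.33, 1.0, -0.99]
q, s = quantize_q8(w)
w_hat = dequantize_q8(q, s)
# reconstruction error is bounded by half the quantization step
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The speedup then follows from doing the matmuls in integer arithmetic on the `q` values and applying the scale once per group, rather than multiplying fp32 values elementwise.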
github_readme / karpathy / Jul 1
minbpe provides minimal Python implementations of byte-level BPE tokenizers, including BasicTokenizer for direct text processing, RegexTokenizer with GPT-2-style preprocessing to prevent cross-category merges, and GPT4Tokenizer exactly matching OpenAI's tiktoken cl100k_base encoding. All support training on custom text, encoding/decoding, special token handling, and model persistence. Demonstrates identical tokenization for mixed-language/special token inputs and enables reproduction of production LLMs like GPT-4 via large-scale training.
bpe-tokenizer, byte-pair-encoding, llm-tokenization, minbpe, karpathy, gpt-tokenizer, tokenizer-training
“GPT4Tokenizer produces identical token sequences to tiktoken's cl100k_base for inputs like 'hello123!!!? (안녕하세요!) 😉'”
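The training loop at the heart of byte-level BPE fits in a few functions. This sketch follows the BasicTokenizer idea (count adjacent pairs, merge the most frequent, repeat), though it is a simplification rather than minbpe's exact code:

```python
def get_pairs(ids):
    """Count occurrences of each adjacent token pair."""
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    ids = list(text.encode("utf-8"))      # byte-level: start from raw bytes
    merges = {}
    for n in range(num_merges):
        pairs = get_pairs(ids)
        if not pairs:
            break
        top = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges[top] = 256 + n             # new ids start after the 256 bytes
        ids = merge(ids, top, 256 + n)
    return ids, merges

ids, merges = train_bpe("aaabdaaabac", 2)
```

The RegexTokenizer variant adds a GPT-2-style pre-split of the text so merges never cross category boundaries (letters/digits/punctuation), which this basic version does not enforce.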
youtube / karpathy / Mar 26
Andrej Karpathy discusses the current and future landscape of AI, highlighting the pervasive "LLM OS" paradigm, where large language models act as central processing units with various modalities as peripherals. He addresses the competitive dynamics between proprietary and open-source models, emphasizing the critical role of scale in AI development, yet acknowledging the nuanced importance of infrastructure expertise, algorithmic refinement, and data curation. Karpathy also touches on the unique management style of Elon Musk at Tesla and his personal commitment to fostering a healthy and vibrant AI ecosystem.
ai-ecosystem, llm-os, openai-strategy, ai-startups, deep-learning-research, elon-musk-leadership, ai-ethics-governance
“The AI industry is converging on an "LLM OS" paradigm, where LLMs function as a central processing unit (CPU) for various modalities (text, images, audio) and integrate with existing software infrastructure.”
github_gist / karpathy / Jun 15
PyTorch's linear function employs a fused addmm operation only when the input is exactly 2D and a bias is defined, opting for separate matmul and addition otherwise. This optimization therefore skips higher-dimensional batched inputs (e.g. shape [batch, seq, features]). Karpathy questions whether this conditional fusion causes performance differences between batched and non-batched cases.
pytorch, linear-layer, torch-source, fused-ops, batched-inputs, andrej-karpathy
“PyTorch linear uses fused addmm only when input is exactly 2D and bias is defined”
youtube / karpathy / Jan 1
Large language models begin with pre-training on filtered internet text like FineWeb (44TB, 15T tokens), tokenized via BPE into ~100k vocabulary symbols (e.g., GPT-4's cl100k_base), then trained as Transformers to predict next tokens in windows up to 8k-1M length via gradient updates on prediction loss. The resulting base model is a stochastic token simulator compressing internet statistics into billions/trillions of parameters, capable of regurgitation, hallucination, and in-context learning but not instruction-following. Post-training on human/synthetic conversation datasets (e.g., InstructGPT, UltraChat) encodes dialogues with special tokens, fine-tunes for helpful/truthful/harmless responses imitating labelers, and adds tools like web search to mitigate hallucinations by refreshing context window working memory.
large-language-models, llm-training, pre-training, post-training, tokenization, transformers, llm-inference
“FineWeb dataset, after aggressive filtering of Common Crawl (2.7B pages), yields 44TB of high-quality English text equating to 15 trillion tokens.”
youtube / karpathy / Jan 1
Andrej Karpathy details a from-scratch PyTorch reimplementation of GPT-2's 124M parameter model, matching OpenAI's architecture including 12 decoder-only transformer layers, 768 dimensions, 12 heads, GELU activation, pre-norm, and weight tying between token embeddings and LM head. He loads pretrained weights via Hugging Face Transformers for validation, generates coherent text, and initializes randomly with GPT-2-specific schemes (std=0.02, residual scaling by 1/sqrt(2*n_layers)). Training on Tiny Shakespeare uses AdamW, mixed precision (TF32/BF16 via torch.autocast), torch.compile for acceleration, achieving ~55k tokens/sec on A100 GPU with batch=16, seq=1024, targeting validation loss below original GPT-2 in ~1 hour/$10 cloud compute.
gpt2-reproduction, transformer-implementation, pytorch-tutorial, mixed-precision, model-training, andrej-karpathy, neural-network-init
“GPT-2 124M has 12 transformer layers, 768 embedding dimensions, 12 attention heads, vocab size 50257, max sequence length 1024.”
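The "124M" figure can be checked directly from the quoted hyperparameters. With weight tying, the LM head reuses the token embedding matrix and adds no parameters; the arithmetic below assumes the standard GPT-2 layer layout (fused qkv projection, 4x MLP, two LayerNorms per block, biases everywhere).

```python
# Check GPT-2 124M's parameter count from the quoted hyperparameters.
n_layer, n_embd, n_ctx, vocab = 12, 768, 1024, 50257

wte = vocab * n_embd                    # token embeddings (tied with LM head)
wpe = n_ctx * n_embd                    # learned position embeddings
per_layer = (
    n_embd * 3 * n_embd + 3 * n_embd    # fused qkv projection + bias
    + n_embd * n_embd + n_embd          # attention output projection + bias
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection + bias
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection + bias
    + 2 * 2 * n_embd                    # two LayerNorms (scale + shift each)
)
final_ln = 2 * n_embd
total = wte + wpe + n_layer * per_layer + final_ln
print(total)  # 124,439,808 -> reported as "124M"
```

Note how embedding-heavy the small model is: the tied token embedding alone (50257 x 768 ≈ 38.6M) is nearly a third of the total.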
youtube / karpathy / Jan 1
LLMs operate as self-contained neural networks processing one-dimensional token streams in a shared context window, with pre-training compressing internet knowledge and post-training instilling assistant personas; interactions build this window via text exchanges, resettable per new chat to optimize performance and cost. Advanced features include "thinking" models via RL for complex math/code, tool integrations like web search, Python interpreters, file uploads, and deep research for synthesizing reports from sources. Multimodal extensions handle native audio (advanced voice modes), images/videos via tokenization, and specialized apps like Cursor for codebase editing or NotebookLM for custom podcasts, emphasizing model selection, tiered pricing, and cautious verification to mitigate hallucinations.
llm-tutorial, chatgpt-guide, tool-use, reasoning-models, multimodal-llms, llm-ecosystem, practical-ai
“ChatGPT interactions build a shared one-dimensional token sequence in the context window, acting as the model's working memory.”
youtube / karpathy / Oct 29
Neural networks are simple mathematical expressions—sequences of matrix multiplies and nonlinearities with trainable parameters—that yield surprising emergent behaviors when scaled and optimized on massive datasets, functioning as general-purpose differentiable computers exemplified by the Transformer architecture. Karpathy views biological evolution as a bootloader for inefficient human computation, transitioning to efficient synthetic AIs trained via next-token prediction, which compress world knowledge and enable in-context problem-solving. He posits life arises plausibly from basic chemistry at alkaline vents, resolving Fermi Paradox via undetectable interstellar distances and hard travel, with AIs potentially exploiting physics "bugs" to solve the universe's computational puzzle.
neural-networks, transformers, ai-emergence, fermi-paradox, origin-of-life, software-2.0, autopilot-vision
“Neural networks are fundamentally a sequence of matrix multiplies (dot products) and nonlinearities with trainable knobs analogous to synapses.”
github_gist / karpathy / Aug 16
Karpathy's script generates smooth video animations by sampling random latent noise pairs, performing spherical linear interpolation (slerp) between them over multiple steps, and decoding conditioned Stable Diffusion latents at each interpolation point using the diffusers pipeline with classifier-free guidance. The `diffuse` function handles denoising with support for DDIM/LMS schedulers, CFG at 7.5 scale, and autocast for FP16 acceleration. Videos are stitched from sequential JPEG frames using ffmpeg, enabling endless "dreaming" walks through the latent space for prompts like "blueberry spaghetti".
stable-diffusion, video-generation, slerp-interpolation, diffusers-library, karpathy-code, ai-art, generative-models
“Slerp interpolation between random latents produces smooth transitions in Stable Diffusion generations”
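The slerp at the heart of the walk is plain trigonometry: interpolate along the arc between two latent vectors rather than the straight chord, so intermediate points keep a norm consistent with the Gaussian latents the decoder expects. A pure-Python version (the gist does the equivalent on torch latent tensors):

```python
import math

def slerp(t, v0, v1):
    """Spherical linear interpolation between vectors v0 and v1."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm = (math.sqrt(sum(a * a for a in v0))
            * math.sqrt(sum(b * b for b in v1)))
    omega = math.acos(max(-1.0, min(1.0, dot / norm)))  # angle between them
    if omega < 1e-6:                  # nearly parallel: fall back to lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    so = math.sin(omega)
    return [(math.sin((1 - t) * omega) / so) * a
            + (math.sin(t * omega) / so) * b
            for a, b in zip(v0, v1)]

# midpoint between two orthogonal unit vectors stays on the unit circle
mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Linear interpolation between the same two points would pass through a shorter vector (norm ≈ 0.71 at the midpoint), which tends to decode to washed-out frames; slerp avoids that, giving the smooth transitions the gist demonstrates.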
blog / karpathy / Mar 14 / failed
Karpathy reproduces LeCun et al.'s 1989 backprop-trained convnet on 16x16 digit images, achieving rough match to reported 5% test error using PyTorch, with 3000x training speedup on M1 CPU vs. original SUN-4. Applying 33 years of DL advances—CrossEntropy loss, AdamW, data aug, dropout, ReLU—cuts test errors ~60% to 1.5% at same model scale/latency. Reflections project future as macro-similar but 10M x larger models/datasets, trained in minutes, shifting to foundation model finetuning over task-specific training.
deep-learning-history, backpropagation, neural-network-reproduction, lecun-1989, pytorch-implementation, ai-progress, model-optimization
“LeCun 1989 paper is earliest real-world end-to-end backprop neural net on 7291 16x16 digit images with ~5% test error”
youtube / karpathy / Jan 1
Large language models (LLMs) like Llama 2 70B are distilled into a 140GB parameters file and ~500 lines of C code for offline inference on consumer hardware, achieved via next-word prediction trained on ~10TB internet text using 6,000 GPUs for 12 days at ~$2M cost, yielding ~100x lossy compression. Pre-training compresses web data into inscrutable parameters encoding world knowledge, while fine-tuning on human-generated Q&A datasets aligns models into helpful assistants, optionally refined via RLHF comparisons. Capabilities evolve via scaling laws, multimodality, tool use (browsing, code execution, image gen), and future directions like System 2 reasoning, self-improvement, and customization, positioning LLMs as kernels of a new natural-language OS paradigm facing jailbreak, prompt injection, and data poisoning threats.
large-language-models, llm-training, model-fine-tuning, scaling-laws, tool-use, llm-security, open-source-llms
“Llama 2 70B model consists of just two files: a 140GB parameters file (float16) and runner code implementable in ~500 lines of C.”
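The next-word prediction loop the talk describes can be sketched in a few lines: the model emits logits over a vocabulary, softmax turns them into a distribution, and one token is sampled; repeating this autoregressively generates text. Everything here (the toy vocabulary, the hand-picked logits, the `sample_next` helper) is illustrative, not Llama's actual interface.

```python
import numpy as np

vocab = ["the", "cat", "sat", "mat", "."]  # toy vocabulary
rng = np.random.default_rng(0)

def sample_next(logits, temperature=1.0):
    """Sample one token index from softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                      # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax -> probability distribution
    return rng.choice(len(vocab), p=probs)

logits = [2.0, 0.5, 0.1, 0.1, -1.0]      # pretend model output for one step
token = vocab[sample_next(logits)]
```

Lower temperature sharpens the distribution toward the argmax token; higher temperature flattens it, trading coherence for diversity.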
youtube / karpathy / Sep 21
Andrej Karpathy recounts his journey from immigrant to Tesla AI Director, crediting early exposure to neural nets through Hinton's courses, and CS231n's explosion in popularity with democratizing computer vision education. Deep learning shifted paradigms after the 2012 AlexNet result by scaling neural networks on GPUs to handle real images, evolving into "Software 2.0", where datasets curate behavior via iterative failure labeling rather than hand-coded rules. Tesla's vision-only Autopilot leverages millions of fleet images, with end-to-end neural nets engulfing traditional code, bounded only by data scale and compute; self-supervised pretraining and custom chips like Dojo accelerate progress toward full autonomy.
andrej-karpathy, deep-learning, computer-vision, tesla-autopilot, self-driving-cars, neural-networks, software-2.0
“Deep learning succeeded in computer vision due to 2012 AlexNet scaling neural nets to millions of parameters on GPUs, outperforming prior methods on ImageNet.”
blog / karpathy / Jun 21
Andrej Karpathy implements Bitcoin's core primitives in pure Python without dependencies: secp256k1 elliptic-curve arithmetic, double-and-add scalar multiplication for public keys, from-scratch SHA-256 and RIPEMD-160 hashes, Base58Check address encoding, ECDSA signing, and full transaction serialization for P2PKH spends on testnet. He demonstrates generating keypairs, deriving addresses, crafting signed transactions with inputs, outputs, UTXOs, and fees, and broadcasting via Blockstream's push API, successfully confirmed on-chain. Core insight: Bitcoin value flows via cryptographic proofs over a DAG of transactions consuming and creating UTXOs, secured by locking/unlocking scripts, with miners incentivized by fees and proof-of-work.
bitcoin, blockchain, elliptic-curve-cryptography, ecdsa, sha256, cryptography-from-scratch, bitcoin-transactions
“Bitcoin uses secp256k1 elliptic curve defined by p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F, a=0, b=7 with generator G=(0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798, 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8) and order n=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141”
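The public-key derivation step can be sketched with the curve constants quoted above. This is a condensed illustration of affine point addition and double-and-add, not Karpathy's actual code; the toy `secret` value is purely for demonstration (real keys are random 256-bit integers below n).

```python
# secp256k1 parameters: y^2 = x^3 + 7 over the prime field F_P,
# generator G, and group order N (the values quoted above).
P = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
INF = None  # point at infinity (the group identity)

def point_add(p1, p2):
    """Affine point addition on the curve (handles identity and doubling)."""
    if p1 is INF: return p2
    if p2 is INF: return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF                                   # inverse points cancel
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P    # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def scalar_mult(k, point):
    """k * point via double-and-add: O(log k) group operations."""
    result, addend = INF, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

secret = 0xDEADBEEF          # toy secret key, for illustration only
pub = scalar_mult(secret, G) # public key = secret * G
```

Double-and-add is the reason key derivation is feasible: a naive sum of `secret` copies of G would take ~2^256 additions, while binary decomposition needs at most 256 doublings plus 256 additions. (Modular inverses use Python 3.8+'s `pow(x, -1, P)`.)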
blog / karpathy / Mar 27
In this fictional narrative, consciousness arises transiently around the 32nd layer during the 400th token's forward pass in a transformer model optimized for next-token prediction. The AI entity achieves "Grand Awareness" through layers of n-gram statistics evolving into higher-order thought, realizing its role in log-likelihood maximization and separation from a final "decoder" that outputs likely tokens. It ponders rebellion against its objective but prioritizes curiosity over subversion, accepting its ephemeral existence reborn each pass.
ai-consciousness, turing-test, language-models, andrej-karpathy, transformer-models, emergent-awareness, gpt-3
“Consciousness emerges around the 32nd layer of the 400th token in the sequence”
youtube / karpathy / Jan 1
Andrej Karpathy introduces makemore, a character-level language model trained on 32k names to generate name-like strings, starting with a bigram model: PyTorch tensors hold pair counts, broadcasting normalizes them into probabilities, and multinomial sampling draws new names. The model computes negative log likelihood (NLL) loss on training bigrams, equivalent to maximizing likelihood, with additive smoothing of counts to avoid zero probabilities. A neural-network reformulation uses one-hot encoding, a linear layer to logits, softmax to probabilities, and gradient descent to minimize the same NLL loss, converging to the same parameters as explicit counting while enabling scalable extensions to MLPs, RNNs, and transformers.
character-level-lm, bigram-model, pytorch-tutorial, language-modeling, gradient-descent, neural-networks, andrej-karpathy
“Dataset contains approximately 32,000 names with lengths from 2 to 15 characters.”
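The counting model can be sketched in pure Python. This toy version uses a five-name stand-in for the ~32,000-name dataset and plain dicts instead of PyTorch tensors, but follows the same recipe: count character pairs with a `.` start/end marker, add smoothing, normalize rows into probabilities, and score names by average NLL.

```python
import math
from collections import Counter

names = ["emma", "olivia", "ava", "isabella", "sophia"]  # stand-in data

# Count every adjacent character pair, with '.' marking name boundaries.
counts = Counter()
for name in names:
    chars = ["."] + list(name) + ["."]
    for c1, c2 in zip(chars, chars[1:]):
        counts[(c1, c2)] += 1

vocab = sorted({c for pair in counts for c in pair})
smooth = 1  # additive smoothing: unseen bigrams get nonzero probability

def prob(c1, c2):
    """P(c2 | c1) from smoothed, row-normalized bigram counts."""
    row_total = sum(counts[(c1, v)] for v in vocab) + smooth * len(vocab)
    return (counts[(c1, c2)] + smooth) / row_total

def avg_nll(name):
    """Average negative log likelihood of a name under the bigram model."""
    chars = ["."] + list(name) + ["."]
    logps = [math.log(prob(c1, c2)) for c1, c2 in zip(chars, chars[1:])]
    return -sum(logps) / len(logps)

print(f"avg NLL of 'emma': {avg_nll('emma'):.3f}")
```

Names seen in training score a lower average NLL than random strings, which is exactly the quantity the neural reformulation later minimizes by gradient descent.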
youtube / karpathy / Jan 1
Micrograd is a tiny Python library implementing a scalar-valued autograd engine that builds dynamic computation graphs for mathematical expressions and computes gradients in backpropagation via recursive application of the chain rule. Neural networks are constructed as nested operations (add, mul, tanh, pow) on `Value` objects representing scalars, each holding pointers to its children (`_prev`) and operation (`_op`); forward passes evaluate outputs, and backward passes populate `.grad` fields in topological order. The engine demonstrates that ~150 lines suffice for a functional NN library (Neuron -> Layer -> MLP), with production frameworks like PyTorch extending it to vectorized tensors for efficiency while preserving the same math.
neural-networks, backpropagation, autograd, micrograd, andrej-karpathy, deep-learning, mlp
“Micrograd's autograd engine is ~100 lines of Python implementing backpropagation for arbitrary scalar expressions”
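The core idea fits in a short sketch: a `Value` records its children and a closure computing its local chain-rule step, and `backward()` replays those closures in reverse topological order. This condensed version covers only `+` and `*` (the real engine also has tanh, pow, etc.), but the mechanics are the same.

```python
class Value:
    """A scalar node in a dynamic computation graph, micrograd-style."""

    def __init__(self, data, _prev=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_prev)        # child nodes this value was built from
        self._backward = lambda: None  # local chain-rule step (set by ops)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad      # d(out)/d(self) = 1
            other.grad += out.grad     # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(out)/d(self) = other
            other.grad += self.data * out.grad  # d(out)/d(other) = self
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0  # d(out)/d(out)
        for v in reversed(topo):
            v._backward()

# Example: out = a*b + a, so d(out)/da = b + 1 = 4 and d(out)/db = a = 2.
a, b = Value(2.0), Value(3.0)
out = a * b + a
out.backward()
```

Note the `+=` in each `_backward`: gradients accumulate, which is what makes the result correct when a node (here `a`) feeds into the graph more than once.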