About Yann LeCun

Critique of Large Language Models

Yann LeCun consistently argues that current LLMs, while impressive at symbol manipulation, coding, and information retrieval, are fundamentally limited and not a viable path to human-level or superhuman intelligence. He emphasizes that language is neither necessary nor sufficient for advanced cognition; thinking primarily involves manipulating mental models in continuous, abstract representation spaces rather than discrete linguistic tokens. LLMs lack understanding of the physical world, intuitive physics, persistent memory, genuine reasoning, and the ability to plan, leading to what he calls 'AI stupidity.' In his account, they over-compress meaning, rely on System 1 intuitive associations rather than deliberate System 2 reasoning, face diminishing returns from scaling due to data walls, and cannot generalize to novel physical interactions or long-horizon tasks without architectural shifts. This view has sharpened during the LLM boom: he contrasts their success in narrow domains with their inability to match the sample efficiency of biological learning. [5][6][9][11][25][32][34][58][84][91][92][95][99][129][154][180][187]

JEPA and World Models as the Core Architecture

At the heart of LeCun's vision is the Joint Embedding Predictive Architecture (JEPA), a non-generative, self-supervised framework that learns by predicting representations in abstract latent spaces rather than pixels or tokens. This avoids the pitfalls of generative models (compounding errors, high computational cost spent on unpredictable details) while capturing semantic structure, intuitive physics, object permanence, and causal relationships. Variants such as I-JEPA, V-JEPA 2, Causal-JEPA, LeJEPA, LeWorldModel (LeWM), and VL-JEPA demonstrate state-of-the-art performance in dense visual understanding, video prediction, speech, robotics planning, and zero-shot transfer. They support stable training from pixels and video, density estimation, sparsity, and integration with multimodal data. World models built this way serve as the foundation for predicting outcomes, counterfactual reasoning, and efficient planning, forming the basis for Advanced Machine Intelligence (AMI). Recent work at AMI Labs and with collaborators emphasizes scalable, stable implementations for real-world deployment. [10][20][22][23][33][36][38][39][41][44][46][59][63][66][76][78][88][130][132][133][150][151][167][21][179]
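The core idea of predicting in representation space can be illustrated with a minimal sketch. It is not taken from any JEPA codebase: the tiny bias-free linear layers, the function names, and the exponential-moving-average (EMA) target encoder are all assumptions chosen for brevity, and the EMA update is only one of several ways such models are kept from collapsing.

```python
def linear(weights, x):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def latent_prediction_loss(context, target, enc_ctx, enc_tgt, predictor):
    """Mean squared error between predicted and actual target embeddings.

    The loss is computed between embeddings, never between raw inputs,
    which is the point of a joint-embedding predictive setup.
    """
    s_ctx = linear(enc_ctx, context)    # embed the visible part of the input
    s_tgt = linear(enc_tgt, target)     # embed the hidden part (no gradient in practice)
    s_pred = linear(predictor, s_ctx)   # predict the hidden part's embedding
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_tgt)) / len(s_tgt)


def ema_update(target_weights, online_weights, momentum=0.99):
    """Move target-encoder weights slowly toward the online encoder (assumed scheme)."""
    return [
        [momentum * t + (1.0 - momentum) * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_weights, online_weights)
    ]
```

In a real system the encoders and predictor would be deep networks trained by gradient descent; the sketch only shows where the loss is measured.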

Self-Supervised Learning as the 'Dark Matter' of Intelligence

LeCun views self-supervised learning (SSL) from high-bandwidth sensory data (especially video) as essential for acquiring common sense and background knowledge, analogous to how babies and animals learn efficiently through observation without massive labeled data or explicit rewards. SSL methods like VICReg, RankMe, VCReg, DINOv2 integrations, and JEPA variants extract supervisory signals from raw data to build invariant/equivariant representations, prevent collapse, maximize mutual information, and enable emergent capabilities like intuitive physics without hardwired priors. This paradigm outperforms reconstruction-based or purely contrastive approaches in data efficiency, generalization, and transfer to downstream tasks including robotics, medical imaging, and multimodal understanding. It is positioned as the key to overcoming the limitations of supervised and reinforcement learning for building robust world models. [3][25][31][93][94][101][111][115][116][121][130][132][133][142][147][148][150][157][160][161][162][167][180][184]
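As a rough illustration of how a method like VICReg prevents collapse, the sketch below combines an invariance term (the two views should agree), a variance term (each embedding dimension should keep a minimum spread across the batch), and a covariance term (dimensions should be decorrelated). The function names are hypothetical, and the weights (25, 25, 1), `gamma`, and `eps` are values recalled from the VICReg paper that should be treated as assumptions here.

```python
import math


def _variance_and_covariance_terms(z, gamma, eps):
    """Return the hinge-on-std term and the off-diagonal covariance term for one batch."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in z]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Penalize dimensions whose standard deviation falls below gamma.
    var_term = sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d
    # Penalize correlation between different dimensions.
    cov_term = sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d
    return var_term, cov_term


def vicreg_loss(za, zb, inv_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    """Weighted sum of invariance, variance, and covariance terms for two views."""
    n, d = len(za), len(za[0])
    invariance = sum(
        (a - b) ** 2 for ra, rb in zip(za, zb) for a, b in zip(ra, rb)
    ) / (n * d)
    var_a, cov_a = _variance_and_covariance_terms(za, gamma, eps)
    var_b, cov_b = _variance_and_covariance_terms(zb, gamma, eps)
    return inv_w * invariance + var_w * (var_a + var_b) + cov_w * (cov_a + cov_b)
```

A batch whose embeddings are all identical is penalized through the variance term even though the two views agree perfectly, while a spread-out, decorrelated batch incurs no regularization penalty.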

Hierarchical Planning, Embodiment, and Robotics

In LeCun's view, true intelligence requires hierarchical planning in latent spaces at multiple temporal scales to handle long-horizon tasks, reduce complexity, and enable zero-shot control. His frameworks integrate world models with model predictive control (MPC), gradient-based planning (e.g., GRASP), value-guided representations, temporal straightening, and action-conditioned predictors for robotics. This supports dexterous manipulation, whole-body control for humanoids, navigation in dynamic environments, imitation from videos, and emergent intuitive physics. Embodied systems tie together egocentric video, latent actions, and physical constraints, with applications in zero-shot transfer from internet video to robots. The cited papers report advantages over model-free RL in data-scarce, offline, or distribution-shifted settings. [15][19][24][40][43][46][57][60][62][78][87][90][100][108][110][112][122][138][179]
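The pairing of a world model with model predictive control can be sketched as a receding-horizon loop: imagine candidate action sequences with the predictor, score them against a goal in latent space, execute only the first action of the best sequence, then replan. Everything below is a toy under stated assumptions: `toy_predictor` stands in for a learned action-conditioned predictor, the discrete `ACTIONS` set is hypothetical, and exhaustive search replaces the gradient-based or sampling-based optimizers used in practice.

```python
from itertools import product

# Hypothetical discrete action set: a unit nudge along each latent dimension.
ACTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def toy_predictor(state, action):
    """Hypothetical stand-in for a learned predictor: the action shifts the latent state."""
    return [s + 0.5 * a for s, a in zip(state, action)]


def sq_dist(state, goal):
    """Squared Euclidean distance between a latent state and the goal state."""
    return sum((s - g) ** 2 for s, g in zip(state, goal))


def plan(state, goal, horizon=3):
    """Return the first action of the lowest-cost imagined action sequence."""
    best_cost, best_first = float("inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        imagined, cost = state, 0.0
        for action in seq:
            imagined = toy_predictor(imagined, action)  # roll the model forward
            cost += sq_dist(imagined, goal)             # accumulate running cost
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run_mpc(state, goal, steps=6):
    """Receding-horizon control: plan, apply the first action, then replan."""
    for _ in range(steps):
        state = toy_predictor(state, plan(state, goal))
    return state
```

With these toy dynamics, `run_mpc([0.0, 0.0], [2.0, -1.0])` returns `[2.0, -1.0]`. A hierarchical planner would stack such loops, with a slower level proposing subgoals for a faster one; that extension is not shown here.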

Cognitive and Philosophical Foundations of Intelligence

LeCun draws heavily from neuroscience, cognitive science, and biology: intelligence is multidimensional (a vector, not a scalar), specialized rather than 'general,' and emerges from observation, interaction, and internal meta-control signals across evolutionary/developmental timescales. He proposes architectures inspired by System A (observation), System B (active behavior), and System M (meta-control), energy-based models, and predictive coding. Mental models enable causal reasoning and counterfactuals; language communicates thoughts but is built upon non-linguistic foundations. He critiques nativist priors, showing intuitive physics can emerge purely from SSL video prediction, and advocates NeuroAI and embodied benchmarks over language-centric tests. Recent work rejects the AGI label as ill-defined, favoring Superhuman Adaptable Intelligence (SAI) or AMI/ASI. [19][32][52][54][91][101][174][79][86]

Open Source, Policy, Safety, and Governance

LeCun is a strong proponent of open-source foundational models (e.g., Llama) as a way to foster innovation and diversity, prevent monopolies, and accelerate progress through global collaboration. He contrasts them with closed models, which benefit from open advances without contributing back. He highlights the high ROI of federally funded research, warns that budget cuts threaten science, and opposes overly restrictive regulations that could lead to capture by incumbents. On safety, he rejects 'uncontrollable superintelligence' doomerism as hype, arguing that alignment is a solvable engineering problem via objective-driven architectures, guardrails, and hierarchical planning. Superintelligence will be decentralized, not controlled by one entity or individual; AI should amplify human intelligence as a 'staff' rather than replace it. He favors gradual progress and open ecosystems over proprietary secrecy. [2][3][7][14][16][32][45][77][80][103][123][139][79][86][187]

Historical Contributions and Evolution of Thought

LeCun's early work on CNNs (LeNet) in the 1980s-90s at Bell Labs laid the groundwork for modern computer vision and deep learning, helping neural network research survive its winters through practical applications like handwriting recognition. His views evolved from supervised learning successes and energy-based models to emphasizing self-supervised, non-generative predictive architectures as the path beyond scaling laws. Post-ChatGPT, his critiques of LLMs intensified, leading to concrete JEPA implementations and the founding of AMI Labs to pursue world models, hierarchical systems, and AMI independently of the industry's LLM focus. He has long promoted open science and biological inspiration, with recent emphasis on rejecting narrow AGI definitions in favor of ASI/SAI and practical robotics deployments. [3][31][67][77][109][182][26][27][79][86]

Open Tensions and Future Directions

LeCun envisions breakthroughs in world models rendering LLMs obsolete and enabling robust robotics, scientific discovery, and personalized AI assistants at scale within 3-5 years. AMI Labs represents a bet on this paradigm through massive investment in JEPA variants, multimodal integration, and embodied AI. However, challenges remain: scaling hierarchical architectures stably, integrating discrete language/symbolic reasoning with continuous world models without inheriting LLM flaws, achieving reliable long-horizon planning under uncertainty, and bridging simulation-to-real gaps in robotics. His work continues to push reproducible ecosystems (e.g., stable-worldmodel) and theoretical foundations for representation quality, density estimation, and planning efficiency.

What Yann talks about (last 116 posts)

Limits of LLMs; JEPA and World Models; Self-Supervised Learning; Hierarchical Planning and Embodiment; Cognitive and Biological Inspiration; Open Source, Policy, and Safety; Historical Contributions and Evolution