Yann LeCun
Amjad Masad

Former Chief AI Scientist at Meta, now leading AMI Labs. Turing Award winner. Founding father of convolutional neural networks. Vocal on world models, JEPA, and the limits of LLMs.
Yann LeCun is a foundational deep learning researcher, Meta's Chief AI Scientist until late 2025, who focuses on self-supervised learning and world models as alternatives to LLM-centric approaches. He shapes AI discourse by consistently articulating technical critiques of current paradigms while advancing concrete research directions like JEPA.
Yann LeCun is a Turing Award-winning AI pioneer (2018, with Hinton and Bengio), recognized as the founding father of convolutional neural networks and a transformative figure in deep learning. He was Meta's Chief AI Scientist until late 2025 and now leads AMI Labs (which raised billions to pursue his vision). He is a vocal critic of LLM-centric AI, arguing that language models are limited to sophisticated pattern matching and retrieval and lack grounded world understanding, persistent memory, reasoning, and hierarchical planning. His core belief is that true intelligence, which he calls Advanced Machine Intelligence (AMI) or superintelligence, emerges from self-supervised learning of abstract 'world models' via Joint Embedding Predictive Architectures (JEPA), inspired by how infants and animals acquire intuitive physics and common sense from observation. He also strongly advocates open-source development and decentralized governance, and rejects existential AI risk narratives.
Yann LeCun consistently argues that current LLMs, while impressive at symbol manipulation, coding, and information retrieval, are fundamentally limited and not a viable path to human-level or superhuman intelligence. He emphasizes that language is neither necessary nor sufficient for advanced cognition; thinking primarily involves manipulating mental models in continuous, abstract representation spaces rather than discrete linguistic tokens. LLMs lack understanding of the physical world, intuitive physics, persistent memory, genuine reasoning, and the ability to plan, leading to what he calls 'AI stupidity.' In his view they over-compress meaning, rely on System 1 intuitive associations rather than deliberate System 2 reasoning, face diminishing returns from scaling due to data walls, and cannot generalize to novel physical interactions or long-horizon tasks without architectural shifts. This view has sharpened with the LLM boom: he contrasts the success of LLMs in narrow domains with their inability to match the sample efficiency of biological learning. [5][6][9][11][25][32][34][58][84][91][92][95][99][129][154][180][187]
At the heart of LeCun's vision is the Joint Embedding Predictive Architecture (JEPA), a non-generative, self-supervised framework that learns by predicting representations in abstract latent spaces rather than pixels or tokens. This avoids the pitfalls of generative models (compounding errors, high computational cost for unpredictable details) while capturing semantic structure, intuitive physics, object permanence, and causal relationships. Variants like I-JEPA, V-JEPA 2, Causal-JEPA, LeJEPA, LeWorldModel (LeWM), VL-JEPA, and others demonstrate state-of-the-art performance in dense visual understanding, video prediction, speech, robotics planning, and zero-shot transfer. These models enable stable training from pixels/videos, density estimation, sparsity, and integration with multimodal data. World models built this way serve as the foundation for predicting outcomes, counterfactual reasoning, and efficient planning, forming the basis for Advanced Machine Intelligence (AMI). Recent work at AMI Labs and with collaborators emphasizes scalable, stable implementations for real-world deployment. [10][20][21][22][23][33][36][38][39][41][44][46][59][63][66][76][78][88][130][132][133][150][151][167][179]
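The core idea, predicting a target's embedding from a context's embedding and measuring the error in latent space rather than in pixels or tokens, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any published JEPA implementation: the random linear maps stand in for neural encoders, the names `jepa_loss`, `ema_update`, and `random_matrix` are hypothetical, and the exponential-moving-average target encoder is one commonly used way to keep targets stable that not every variant listed above necessarily adopts.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    """Small random linear map standing in for a neural encoder (assumption)."""
    scale = 1.0 / cols ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def jepa_loss(context, target, enc_context, enc_target, predictor):
    """Mean squared error between predicted and actual target embeddings."""
    s_context = matvec(enc_context, context)  # embed the visible part
    s_target = matvec(enc_target, target)     # embed the hidden part
    s_pred = matvec(predictor, s_context)     # predict in latent space
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_target)) / len(s_target)

def ema_update(enc_target, enc_context, momentum=0.99):
    """Move target-encoder weights slowly toward the context encoder."""
    return [[momentum * t + (1.0 - momentum) * c for t, c in zip(rt, rc)]
            for rt, rc in zip(enc_target, enc_context)]

rng = random.Random(0)
input_dim, latent_dim = 16, 4
enc_context = random_matrix(latent_dim, input_dim, rng)
enc_target = [row[:] for row in enc_context]  # starts as a copy
predictor = random_matrix(latent_dim, latent_dim, rng)

# Two "views" of a scene, e.g. a visible patch and a masked patch.
context = [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
target = [rng.gauss(0.0, 1.0) for _ in range(input_dim)]

loss = jepa_loss(context, target, enc_context, enc_target, predictor)
enc_target = ema_update(enc_target, enc_context)
print(f"latent prediction loss: {loss:.4f}")
```

Note that the loss is computed over `latent_dim` values, not `input_dim`: unpredictable input detail that the encoders discard never has to be reconstructed, which is the contrast with generative models drawn above.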
LeCun views self-supervised learning (SSL) from high-bandwidth sensory data (especially video) as essential for acquiring common sense and background knowledge, analogous to how babies and animals learn efficiently through observation without massive labeled data or explicit rewards. SSL methods like VICReg, RankMe, VCReg, DINOv2 integrations, and JEPA variants extract supervisory signals from raw data to build invariant/equivariant representations, prevent collapse, maximize mutual information, and enable emergent capabilities like intuitive physics without hardwired priors. This paradigm outperforms reconstruction-based or purely contrastive approaches in data efficiency, generalization, and transfer to downstream tasks including robotics, medical imaging, and multimodal understanding. It is positioned as the key to overcoming the limitations of supervised and reinforcement learning for building robust world models. [3][25][31][93][94][101][111][115][116][121][130][132][133][142][147][148][150][157][160][161][162][167][180][184]
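To make "prevent collapse" concrete, here is a hedged sketch of a VICReg-style objective with its three terms: invariance (two views of the same input should embed close together), variance (each embedding dimension should keep spread across the batch), and covariance (dimensions should be decorrelated). The function names, the tiny hand-written batches, and the default weights are illustrative assumptions, not a reference implementation.

```python
import math

def invariance_term(z_a, z_b):
    """Mean squared distance between paired embeddings of two views."""
    return sum(sum((a - b) ** 2 for a, b in zip(ra, rb))
               for ra, rb in zip(z_a, z_b)) / len(z_a)

def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge penalty when a dimension's std across the batch drops below gamma."""
    n, d = len(z), len(z[0])
    total = 0.0
    for j in range(d):
        column = [row[j] for row in z]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / (n - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d

def covariance_term(z):
    """Sum of squared off-diagonal covariances, scaled by dimension."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum((row[i] - means[i]) * (row[j] - means[j])
                          for row in z) / (n - 1)
                total += cov ** 2
    return total / d

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """Weighted sum of the three terms; the weights here are assumptions."""
    return (sim_w * invariance_term(z_a, z_b)
            + var_w * (variance_term(z_a) + variance_term(z_b))
            + cov_w * (covariance_term(z_a) + covariance_term(z_b)))

# A collapsed batch maps every input to the same point.
collapsed = [[1.0, 1.0] for _ in range(4)]
# A spread batch keeps variance in each dimension with no correlation.
spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]

print("collapsed:", vicreg_loss(collapsed, collapsed))
print("spread:   ", vicreg_loss(spread, spread))
```

The collapsed batch satisfies invariance perfectly yet is penalized by the variance term, which is how this family of regularizers rules out the trivial constant solution without needing negative pairs.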
True intelligence requires hierarchical planning in latent spaces at multiple temporal scales to handle long-horizon tasks, reduce complexity, and enable zero-shot control. LeCun's frameworks integrate world models with model predictive control (MPC), gradient-based planning (e.g., GRASP), value-guided representations, temporal straightening, and action-conditioned predictors for robotics. This supports dexterous manipulation, whole-body control for humanoids, navigation in dynamic environments, imitation from videos, and emergent intuitive physics. Embodied systems bridge egocentric video, latent actions, and physical constraints, with applications in zero-shot transfer from internet video to robots. The cited papers report superiority over model-free RL in data-scarce, offline, or distribution-shifted settings. [15][19][24][40][43][46][57][60][62][78][87][90][100][108][110][112][122][138][179]
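The planning loop described above, in which a world model is rolled forward in latent space, candidate action sequences are scored, the first action is executed, and the plan is recomputed, can be illustrated with a toy example. This sketch uses random-shooting MPC rather than the gradient-based planning named in the text, because it needs no automatic differentiation. The additive `world_model` is a stand-in assumption for a learned action-conditioned predictor, and every name and parameter here is hypothetical.

```python
import random

def world_model(z, action):
    """Toy latent dynamics (assumption): next state = state + action."""
    return [zi + ai for zi, ai in zip(z, action)]

def cost(z, goal):
    """Squared distance between a latent state and the goal embedding."""
    return sum((zi - gi) ** 2 for zi, gi in zip(z, goal))

def plan(z, goal, horizon, n_candidates, rng):
    """Random shooting: sample action sequences, keep the cheapest rollout."""
    best_seq, best_cost = None, float("inf")
    for _ in range(n_candidates):
        seq = [[rng.uniform(-1.0, 1.0) for _ in z] for _ in range(horizon)]
        state, total = z, 0.0
        for action in seq:
            state = world_model(state, action)  # imagined step, no real action
            total += cost(state, goal)          # accumulate cost along rollout
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

def run_mpc(z0, goal, steps=10, horizon=3, n_candidates=200, seed=0):
    """Receding-horizon control: execute the first planned action, then replan."""
    rng = random.Random(seed)
    z = list(z0)
    for _ in range(steps):
        seq = plan(z, goal, horizon, n_candidates, rng)
        z = world_model(z, seq[0])
    return z

start, goal = [0.0, 0.0], [3.0, -2.0]
final = run_mpc(start, goal)
print("start cost:", cost(start, goal))
print("final cost:", round(cost(final, goal), 3))
```

A hierarchical variant would run the same loop at two time scales, with a slow planner proposing latent subgoals that a fast planner like this one tracks; that extension is omitted here to keep the sketch short.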
LeCun draws heavily from neuroscience, cognitive science, and biology: intelligence is multidimensional (a vector, not a scalar), specialized rather than 'general,' and emerges from observation, interaction, and internal meta-control signals across evolutionary/developmental timescales. He proposes architectures inspired by System A (observation), System B (active behavior), and System M (meta-control), energy-based models, and predictive coding. Mental models enable causal reasoning and counterfactuals; language communicates thoughts but is built upon non-linguistic foundations. He critiques nativist priors, showing intuitive physics can emerge purely from SSL video prediction, and advocates NeuroAI and embodied benchmarks over language-centric tests. Recent work rejects the AGI label as ill-defined, favoring Superhuman Adaptable Intelligence (SAI) or AMI/ASI. [19][32][52][54][79][86][91][101][174]
LeCun is a strong proponent of open-source foundational models (e.g., Llama) to foster innovation, diversity, prevent monopolies, and accelerate progress through global collaboration, contrasting it with closed models that benefit from open advances without contribution. He highlights high ROI from federally funded research, warns against budget cuts threatening science, and opposes overly restrictive regulations that could lead to capture by incumbents. On safety, he rejects 'uncontrollable superintelligence' doomerism as hype, arguing alignment is a solvable engineering problem via objective-driven architectures, guardrails, and hierarchical planning. Superintelligence will be decentralized, not controlled by one entity or individual; AI should amplify human intelligence as a 'staff' rather than replace it. He favors gradual progress and open ecosystems over proprietary secrecy. [2][3][7][14][16][32][45][77][79][80][86][103][123][139][187]
LeCun's early work on CNNs (LeNet) in the 1980s-90s at Bell Labs laid the groundwork for modern computer vision and deep learning, carrying neural network research through its winters with practical applications like handwriting recognition. His views evolved from supervised learning successes and energy-based models to emphasizing self-supervised, non-generative predictive architectures as the path beyond scaling laws. Post-ChatGPT, his critiques of LLMs intensified, leading to concrete JEPA implementations and the founding of AMI Labs to pursue world models, hierarchical systems, and AMI independently of the industry's LLM focus. He has long promoted open science and biological inspiration, with recent emphasis on rejecting narrow AGI definitions in favor of ASI/SAI and on practical robotics deployments. [3][26][27][31][67][77][79][86][109][182]
LeCun envisions breakthroughs in world models rendering LLMs obsolete within a few years and enabling robust robotics, scientific discovery, and personalized AI assistants at scale within 3-5 years. AMI Labs represents a bet on this paradigm through massive investment in JEPA variants, multimodal integration, and embodied AI. However, challenges remain in scaling hierarchical architectures stably, integrating discrete language/symbolic reasoning with continuous world models without inheriting LLM flaws, achieving reliable long-horizon planning under uncertainty, and bridging simulation-to-real gaps in robotics. His work continues to push reproducible ecosystems (e.g., stable-worldmodel) and theoretical foundations for representation quality, density estimation, and planning efficiency.
LLMs excel at narrow symbol manipulation and retrieval but fundamentally lack physical grounding, true reasoning, planning, and world understanding; language is not the basis of thought.
AI models limited to explicitly taught questions [5]
Language neither necessary nor sufficient for cognition; analogy of roof without foundations [6]
Thinking manipulates mental models in continuous space, not language [9][11]
LLMs as advanced retrieval, not intelligent; need world models [25][34][58][92][95]
Over-compress representations vs human nuance [91]
Skeptical of LLM path to AGI/ASI [84]
JEPA is the central non-generative architecture for learning abstract predictive representations in latent space from video/sensory data, foundational for world models that enable planning and generalization.
JEPA as primary goal over token-generative baselines [10][72]
V-JEPA 2.1 for dense vision/world modeling [20]
Stabilizing JEPA with Gaussian regularization (LeWM) [22]
Causal-JEPA, LeJEPA, VL-JEPA, EB-JEPA variants for planning, speech, vision-language [36][38][59][63][76]
V-JEPA for unsupervised video reps and zero-shot robotics [88][132][133]
SSL from unlabeled rich data (video) is the 'dark matter' enabling efficient acquisition of common sense, intuitive physics, and background knowledge, far superior to supervised or RL paradigms for general intelligence.
Overcoming AI stupidity via SSL and world models [25]
Intuitive physics emerges from masked prediction in latent space [101]
SSL cookbook, regularization techniques (VICReg, VCReg, RankMe) [148][157][161][176]
Dark matter powering world models and animal-level intelligence [180][184]
Visual SSL matching language supervision at scale [94]
Intelligence requires hierarchical planning at multiple scales in latent world models for long-horizon control, robotics, and embodiment, outperforming model-free methods especially with suboptimal data.
Hierarchical planning in latent models reduces complexity for long-horizon tasks [15][24][40]
DexWM, PEVA, OSVI-WM for dexterous manipulation and imitation [57][87][90]
Gradient-based MPC and JEPA planning superior in sample efficiency [60][100][138]
Hierarchical world models for humanoid control [122]
AI architectures should draw from human/animal cognition: multidimensional intelligence, observation/active learning with meta-control, predictive coding, and emergence of physics understanding without innate priors.
Open-source accelerates innovation and prevents monopoly; superintelligence will be decentralized; AI safety is an engineering problem of objectives and guardrails, not existential doom; high ROI on public research.
From CNN pioneer overcoming neural net winters to sharpening critiques of scaling and founding AMI Labs to realize JEPA/world model vision at scale.
Other thinkers in the absorb network who most often quote, reply to, or cite Yann in their compiled entries (last 90 days weighted 2x). Honest signal — no follower-graph required.
Every entry that fed the multi-agent compile above. Inline citation markers in the wiki text (like [1], [2]) are not yet individually linked to specific sources — this is the full set of sources the compile considered.