
Yann LeCun

Chronological feed of everything captured from Yann LeCun.

Humorous AI Self-Reflection

This post features a prominent AI researcher playfully comparing himself to a large language model. The self-referential humor reflects how familiar LLMs have become, even among experts in the field, and hints at a future in which such humorous, everyday comparisons to AI models are entirely commonplace.

LeCun Expresses Concern

Yann LeCun, a prominent figure in AI, expressed apprehension about an unspecified topic. His brief statement suggests a sentiment of worry relevant to current discussions within the AI community, though the exact subject of his concern is not elaborated in the post.

Insufficient Content for Extraction

The provided content contains no technical information or substantive claims, consisting only of a short French phrase ('Moi ?' meaning 'Me?'). It is insufficient for technical synthesis.

Intelligence as a Multidimensional Vector, Not a Scalar

Intelligence should be conceptualized as a multidimensional vector rather than a scalar value. This perspective suggests that intelligence is not a singular, general ability but a complex interplay of various specialized capacities. All species, including humans, exhibit specialized intelligence rather than truly general intelligence, with varying degrees of adaptability across species.

Existence of Incomprehensible Beings

The content speculates on the existence of intelligent beings whose perception of reality, or "slice of the whole space," is fundamentally different from our own. These beings would manifest to us as indistinguishable from random thermal fluctuations, rendering them undetectable and incomprehensible through our current understanding of physics and observation.

Language Exposure and Non-Linguistic Percepts in AI Development

LeCun highlights extensive exposure to language and to non-linguistic percepts as a significant factor in his own developmental experience. This suggests a perspective in which diverse, prolonged environmental interaction, beyond linguistic data alone, is crucial for comprehensive understanding and for AI model development.

Humor Detection in Social Media

LeCun posted a single-emoji message. Such posts present a challenge for natural language processing models tasked with sentiment analysis or humor detection, as the meaning is highly contextual and subjective, requiring understanding that goes beyond lexical analysis.

SpidR-Adapt: Efficient Few-Shot Speech Representation Learning

SpidR-Adapt introduces a meta-learning approach for low-resource speech representation, enabling rapid adaptation to new languages with minimal unlabeled data. It utilizes a multi-task adaptive pre-training (MAdaPT) protocol and a first-order bi-level optimization (FOBLO) heuristic. This method aims to close the efficiency gap between human language acquisition and data-intensive self-supervised models.

DexWM: Overcoming Dexterity Challenges in World Models for Robot Manipulation

DexWM is a novel world model designed to handle dexterous hand-object interactions, addressing the limitations of existing models that use coarse action spaces. It overcomes data scarcity by using finger keypoints from egocentric videos, enabling training on extensive human and non-dexterous robot data. A key innovation is the incorporation of a hand consistency loss, crucial for accurate dexterity modeling, leading to superior future-state prediction and zero-shot transfer capabilities compared to previous methods.

The Architectural Limits of LLMs and the Path to World Models

Current Large Language Models (LLMs) achieve superhuman symbol manipulation but lack the grounded understanding of physical reality necessary for general intelligence. While LLMs rely on massive datasets (trillions of tokens) to predict discrete symbols, true intelligence requires architectures capable of learning abstract representations from high-dimensional, continuous sensory data—akin to how animals learn intuitive physics with extreme sample efficiency.

VL-JEPA: A Novel Vision-Language Model Outperforming Classical VLMs with Fewer Parameters

VL-JEPA is a new vision-language model leveraging a Joint Embedding Predictive Architecture. It predicts continuous embeddings of target texts, allowing it to focus on task-relevant semantics while abstracting away surface-level linguistic variability. This approach leads to stronger performance with 50% fewer trainable parameters compared to classical VLMs, while also supporting selective decoding and a versatile embedding space for various tasks.

Bridging the Train-Test Gap in World Models for Efficient Gradient-Based Planning

World models, when combined with model predictive control (MPC), enable generalization in robotic planning tasks. While gradient-based planning offers a computationally efficient alternative to traditional MPC, its performance has lagged. This work identifies a train-test gap where world models trained on next-state prediction are used for action sequence estimation at inference. The proposed data synthesis techniques significantly improve gradient-based planning for existing world models.
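The core of gradient-based planning can be sketched in a few lines: roll a differentiable world model forward over a candidate action sequence, then descend the gradient of a terminal cost with respect to the actions. The toy 1D dynamics, names, and hand-derived gradient below are illustrative only, not the paper's models or its data-synthesis techniques.

```python
# Gradient-based planning through a toy differentiable world model.
# Dynamics: s_{t+1} = s_t + a_t, so d(s_T)/d(a_t) = 1 for every t,
# and the gradient of the terminal cost (s_T - goal)^2 is 2*(s_T - goal).

def plan(s0, goal, horizon=5, lr=0.1, steps=200):
    """Optimize an action sequence by gradient descent on the final-state cost."""
    actions = [0.0] * horizon
    for _ in range(steps):
        # Roll the world model forward from the start state.
        s = s0
        for a in actions:
            s = s + a
        # Analytic gradient of the cost w.r.t. each action in this toy model.
        grad = 2.0 * (s - goal)
        actions = [a - lr * grad for a in actions]
    return actions

acts = plan(s0=0.0, goal=1.0)
final = sum(acts)  # final state reached from s0 = 0
```

In a learned model the analytic gradient is replaced by backpropagation through the network, which is exactly where the train-test gap the paper identifies arises: the model was trained to predict next states, not to provide useful gradients over action sequences.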

JEPA with Density Adaptive Attention for Robust Speech Tokenization

A novel two-stage self-supervised framework combines JEPA and Density Adaptive Attention Mechanism (DAAM) to learn robust speech representations. It decouples semantic audio feature learning from waveform reconstruction, employing masked prediction in latent space. The model achieves efficient and reversible speech tokenization with a low frame rate, outperforming existing neural audio codecs.
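The "masked prediction in latent space" idea can be illustrated compactly: the loss compares predicted and target embeddings, never waveforms. The helper names (`masked_latent_loss`, `mean_predict`) and the trivial mean predictor below are illustrative stand-ins, not the paper's architecture.

```python
# Masked prediction in latent space: mask some embedding positions and
# score a predictor on reconstructing them, with MSE computed between
# embeddings rather than raw audio.
import random

def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def masked_latent_loss(latents, mask_ratio=0.5, predict=None):
    """latents: list of embedding vectors; mask a subset and score predictions."""
    n = len(latents)
    masked = random.sample(range(n), max(1, int(n * mask_ratio)))
    losses = []
    for i in masked:
        context = [z for j, z in enumerate(latents) if j != i]
        pred = predict(context)               # predictor sees only unmasked latents
        losses.append(mse(pred, latents[i]))  # loss lives in latent space
    return sum(losses) / len(losses)

def mean_predict(context):
    """Trivial predictor: mean of the context embeddings (stand-in for a model)."""
    dim = len(context[0])
    return [sum(z[d] for z in context) / len(context) for d in range(dim)]

latents = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
loss = masked_latent_loss(latents, predict=mean_predict)
```

Because the targets are latents, a decoder for waveform reconstruction can be trained separately, which is the decoupling the framework describes.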

Leveraging Generated Human Videos for Zero-Shot Robot Control

Video generation models show promise for high-level robot planning, but direct imitation is hindered by noise and morphological distortions in generated videos. A two-stage pipeline is proposed to address this: first, video pixels are lifted into a 4D human representation and retargeted to a humanoid morphology. Second, a physics-aware reinforcement learning policy, GenMimic, is introduced to enable robots to mimic human actions from noisy, generated videos in a zero-shot manner.

LeJEPA: A Theoretically Grounded and Scalable Approach to Self-Supervised Learning

LeJEPA offers a novel, theoretically-grounded approach to Joint-Embedding Predictive Architectures (JEPAs) for self-supervised learning. By introducing Sketched Isotropic Gaussian Regularization (SIGReg), LeJEPA optimizes embedding distributions to minimize prediction risk. This framework eliminates traditional heuristics, boasting linear complexity, stability across diverse architectures and domains, and efficient distributed training, making it highly suitable for scalable AI research.
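The flavor of an isotropy-style anti-collapse regularizer can be sketched as follows. This is a loose illustration in the spirit of, but not identical to, LeJEPA's SIGReg: push each embedding dimension toward zero mean and unit variance so that all points cannot collapse onto one vector.

```python
# Isotropy-style regularizer sketch: penalize per-dimension deviation of the
# embedding distribution from N(0, I). Runs in O(n * d), i.e. linear in both
# the number of embeddings and their dimensionality.

def isotropy_penalty(embeddings):
    """embeddings: list of same-length vectors; returns a scalar penalty."""
    n, d = len(embeddings), len(embeddings[0])
    penalty = 0.0
    for k in range(d):
        col = [z[k] for z in embeddings]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        penalty += mean ** 2 + (var - 1.0) ** 2  # target zero mean, unit variance
    return penalty / d

collapsed = [[0.0, 0.0]] * 4                                   # all points identical
spread = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]  # well spread out
```

A collapsed embedding set incurs a large penalty while a spread-out one does not, which is the anti-collapse behavior such a term is meant to enforce.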

Beyond Scaling: Predictive World Modeling for Spatial Supersensing in Video

The Cambrian-S research argues that true multimodal intelligence requires 'spatial supersensing'—moving beyond task-driven perception toward predictive world modeling. While data scaling provides marginal gains, the authors demonstrate that self-supervised predictive sensing (using prediction error for event segmentation) is necessary to solve long-horizon visual spatial recall and counting tasks.

Unified Theory for Attention Sinks and Compression Valleys in Large Language Models

This paper establishes a novel connection between attention sinks and compression valleys in large language models (LLMs), attributing both phenomena to the formation of massive activations in the residual stream. A theoretical framework is provided, demonstrating how these massive activations lead to representational compression and entropy reduction. Experimental validation across various LLMs supports this unified view, leading to the "Mix-Compress-Refine" theory of information flow within Transformers, which posits distinct computational phases across layers.
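The "massive activations" at the center of this account are residual-stream entries that are orders of magnitude larger than typical ones. A minimal detector can make the notion concrete; the threshold ratio and names here are illustrative, not the paper's methodology.

```python
# Spot "massive activations": entries of a residual-stream vector whose
# magnitude dwarfs the median magnitude by a large factor.

def massive_activations(vector, ratio=100.0):
    """Return indices whose absolute value exceeds ratio * median magnitude."""
    mags = sorted(abs(x) for x in vector)
    median = mags[len(mags) // 2]
    return [i for i, x in enumerate(vector) if abs(x) > ratio * median]

stream = [0.1, -0.2, 0.15, 300.0, 0.05]  # one outlier dimension
hits = massive_activations(stream)
```

In the paper's account, such outlier dimensions both attract attention (sinks) and dominate the representation's variance, producing the compression valleys observed at intermediate layers.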

JEPAs Unveiled: Latent-Space Anti-Collapse Terms Estimate Data Density

Joint Embedding Predictive Architectures (JEPAs) utilize an anti-collapse term, typically considered for representation diversity. This research reveals that this term also implicitly estimates data density, a previously unrecognized capability. This density estimation is provable across datasets and architectures, enabling applications like data curation and outlier detection.

MNIST Dataset Chronology Correction

Yann LeCun clarifies the historical timeline of the MNIST dataset, stating it was introduced five years after a previously referenced event or dataset. This correction is a direct response to a social media query regarding the dataset's debut.

Misdated Video Content on Social Media

A video shared on a social media platform was incorrectly dated by the uploader. The content, specifically a video, was attributed to a different, much later year than its actual creation date. This highlights potential inaccuracies in content metadata or user-provided context on social media platforms.

Relativistic Time Dilation: A Cautionary Consideration for Time Management

Traveling at relativistic speeds slows the traveler's proper time relative to the outside world: less time elapses for the traveler, while the external world ages more by comparison. The effect is scientifically accurate but useless as a time-management strategy, since any subjective time gained locally is offset by the greater passage of time everywhere else.
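The quantitative relationship behind the joke is the standard Lorentz factor: for a traveler moving at speed $v$, proper time $\Delta\tau$ relates to externally elapsed time $\Delta t$ by

```latex
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}, \qquad \Delta t = \gamma \, \Delta\tau .
```

At $v = 0.8c$, $\gamma = 5/3$: three years of traveler time correspond to five years in the external frame, which is precisely the trade-off the post pokes fun at.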

Meta AI Leader Dismisses Unsubstantiated Claim Regarding AI Capabilities

Yann LeCun, a prominent AI researcher at Meta, succinctly rejected an unspecified claim. This interaction suggests that even high-profile figures in AI are directly engaging with and refuting misinformation or overblown assertions about the field.

Inaccessible Social Media Content Hinders Knowledge Extraction

Due to an inaccessible or deleted X post, no content could be retrieved for analysis. This entirely prevented the extraction of specific claims or insights from the intended source. The inability to access the original source material fundamentally limits knowledge compilation efforts.

JEPA Architecture as a Goal for AI Development

Yann LeCun asserts that the Joint Embedding Predictive Architecture (JEPA) remains the primary objective for AI development, contrasting it with the Code World Model (CWM), which he characterizes as a token-generative baseline. This suggests a strategic focus on architectures capable of learning representations by predicting missing information in a joint embedding space, rather than solely relying on sequential token generation.

Yann LeCun Differentiates Coding from AGI

Yann LeCun asserts that current AI coding capabilities, likely referring to large language models, do not equate to Artificial General Intelligence (AGI). The distinction separates pattern recognition and generation from genuine reasoning and understanding: advanced coding ability is not a sufficient indicator of human-level intelligence, a claim with significant implications for the ongoing debate on AI progress and terminology. In his view, AI advancements in coding are a narrow application rather than a step toward general intelligence.

Conceptual Framework for Code World Models

A Code World Model operates by simulating the execution outcomes of instructions to iteratively plan and generate code. This approach shifts code generation from token prediction to a goal-oriented process based on imagined execution effects.
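The "imagined execution" idea can be reduced to a toy sketch: candidate programs are scored by actually simulating their outcomes against a goal, rather than ranked by token likelihood. The candidates below are hand-written stand-ins for model-generated code, and the function names are illustrative.

```python
# Toy code-world-model loop: plan code by simulating the execution effect
# of each candidate program and keeping one that achieves the goal.

def simulate(program, test_input):
    """'World model' step: run the candidate in a scratch namespace."""
    scope = {}
    exec(program, scope)
    return scope["f"](test_input)

def plan_code(candidates, test_input, goal_output):
    """Return the first candidate whose imagined execution meets the goal."""
    for prog in candidates:
        try:
            if simulate(prog, test_input) == goal_output:
                return prog
        except Exception:
            continue  # a crashing candidate is simply discarded
    return None

candidates = [
    "def f(x): return x - 1",   # wrong execution effect
    "def f(x): return x * 2",   # matches the goal below
]
chosen = plan_code(candidates, test_input=3, goal_output=6)
```

A real system would interleave generation and simulation iteratively, refining candidates based on predicted execution effects instead of filtering a fixed pool.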

Empty Tweet Analysis: LeCun Engages in Social Media Banter

This tweet from Yann LeCun, a prominent AI researcher, consists solely of emojis in response to another user. It indicates social interaction on a non-technical topic, highlighting the personal use of social media among public figures in the AI community rather than disseminating technical insights or research. The lack of textual content makes it impossible to extract any technical claims or insights from this specific interaction.

LLM-JEPA: Bridging the Gap Between Language and Vision Training Architectures

Current Large Language Model (LLM) training relies on input-space reconstruction, a method found to be suboptimal in computer vision. This research introduces LLM-JEPA, a Joint Embedding Predictive Architecture for LLMs, aiming to adapt the more effective embedding-space training used in vision to language models. Initial findings indicate that LLM-JEPA significantly outperforms standard LLM training and demonstrates robustness against overfitting.

Yann LeCun on Deep Learning's Long Road to Mainstream: Open Source, AI Safety, and the Next Renaissance

Yann LeCun traces the arc of neural network research from its academic fringe origins in the 1980s to its current dominance, crediting deliberate rebranding as "deep learning" and a coordinated infiltration of industry speech recognition pipelines as pivotal turning points. He argues the real AI competition is not geopolitical (US vs. China vs. Europe) but structural — open-source vs. proprietary — and points to Meta's Llama as evidence that open research ecosystems outpace secretive ones. On AI safety, LeCun rejects the "uncontrollable superintelligence" narrative, framing alignment as a tractable engineering problem solvable through objective-driven architectures with enforced guardrails, not existential dread.

DINOv2 Latent Space Enables Generalizable Video World Models with Planning Capability

DINO-world proposes using DINOv2's frozen image encoder as the representational backbone for a video world model, training a future predictor in latent space on large-scale uncurated video data. This approach sidesteps the need for pixel-level prediction, instead operating in a semantically rich feature space that generalizes across diverse scene types — driving, indoor, and simulated environments. The model can be fine-tuned on action-observation trajectories to become action-conditioned, enabling latent-space trajectory simulation for planning. It outperforms prior models on video prediction benchmarks including segmentation and depth forecasting, and shows emergent understanding of intuitive physics.

Yann LeCun Draws a Line: ASI Over AGI Has Always Been the Goal

In a terse post, Yann LeCun asserts that Artificial Superintelligence (ASI) — not Artificial General Intelligence (AGI) — has always been his north star. The distinction signals a rejection of AGI as a meaningful or sufficient milestone, implying LeCun views human-level general intelligence as an intermediate or inadequate benchmark. The brevity and conviction of the statement ("Always have") suggests this is a long-held philosophical position rather than a reactive take.

The DeepSeek Moment Is Rewriting AI Talent Culture Around Openness

The AI talent competition is increasingly being shaped not just by compensation, but by a cultural divide between open and closed research philosophies. Meta has emerged as the standard-bearer for open science and open source AI in the U.S. — via publications and Llama — while OpenAI and Anthropic remain notably closed. Yann LeCun's endorsement of this framing signals that the migration of top talent to Meta may reflect a values-driven realignment post-DeepSeek, with implications for innovation velocity, scientific transparency, and AI safety.

LeCun Questions Compressor Interpretability of Real-World Data

Yann LeCun critically assesses the notion that 'compressibility' alone is a sufficient or unbiased metric for understanding real-world sequences. He suggests that the observed ease with which models compress select real-world data might be an artifact of how these sequences are chosen and represented, rather than an inherent, universal property. LeCun implies a potential bias in the selection of data that appears compressible, which could distort our understanding of true data complexity.

Yann LeCun comments on AI observation

Yann LeCun acknowledged a perceived level of observation or scrutiny of his activities, suggesting an awareness of being closely watched. The nature and extent of this observation remain unspecified.

The Comprehensibility of the World and Inductive Bias

Yann LeCun comments on the inherent comprehensibility of the world, echoing Einstein's sentiment. He suggests that this comprehensibility is facilitated by brains possessing an inductive bias, enabling them to understand their localized environment. This implies a fundamental relationship between cognitive architecture and the nature of reality's perceived order.

LeCun Skeptical of LLM Path to AGI

Yann LeCun, a prominent AI researcher, expresses strong skepticism regarding the potential of Large Language Models (LLMs) to achieve Artificial General Intelligence (AGI) or Artificial Super Intelligence (ASI). He sarcastically suggests directing inquiries about LLMs leading to advanced AI to the Chief AI Officer of his purported employer, highlighting his dismissive stance on the topic. The interaction indicates a divergence of opinions within the AI community concerning the architectural pathways to achieving AGI.

Yann LeCun's Role at Meta Remains Unchanged Since 2018

Yann LeCun, Chief AI Scientist at Meta, has maintained the same position since 2018. This indicates stability in his role within the company's AI leadership over a multi-year period. The duration suggests a consistent strategic direction or a sustained focus on his contributions in this capacity.

LeCun Rejects AGI, Champions ASI as Foundational AI Goal

Yann LeCun, a prominent AI researcher at FAIR, distinguishes between Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). He contends that AGI, defined as 'human-level AI,' is a poorly conceived notion due to the specialized nature of human intelligence. Conversely, LeCun asserts that ASI represents a valid and consistent long-term objective for AI development, a goal he and FAIR have consistently pursued.

Whole-Body Pose-Conditioned Egocentric Video Prediction as a Path to Embodied World Models

PEVA (Predict Ego-centric Video from Actions) trains an autoregressive conditional diffusion transformer to simulate first-person visual consequences of human body actions, conditioned on relative 3D kinematic pose trajectories structured by joint hierarchy. The model is trained on Nymeria, a large-scale real-world egocentric video + body pose dataset, grounding predictions in physically realistic human motion. A hierarchical evaluation protocol stress-tests the model across increasingly complex embodied prediction and control tasks. This represents a direct push toward world models that understand how physical human actions causally shape perceived environments — a key milestone for embodied AI.

V-JEPA 2: Self-Supervised Video Models for Embodied AI

V-JEPA 2 is a self-supervised learning framework that leverages massive internet-scale video data alongside limited interaction data to enable robust understanding, prediction, and planning capabilities in AI. The architecture demonstrates strong performance across diverse tasks, from motion understanding and human action anticipation to multimodal video question-answering. Crucially, its extension, V-JEPA 2-AC, facilitates zero-shot robotic planning, showcasing the potential for real-world physical world interaction without extensive task-specific training.

Scaling Self-Supervised Video Pre-training for Zero-Shot Robotic Planning

V-JEPA 2 is a self-supervised joint-embedding-predictive architecture pre-trained on 1M+ hours of internet video to master physical world understanding and prediction. By post-training as an action-conditioned world model (V-JEPA 2-AC) with minimal robot interaction data, it enables zero-shot robotic planning and manipulation. The model further demonstrates high-tier video-language alignment capabilities when integrated with an 8B parameter LLM.

World-Model-Guided Trajectory Generation Unlocks One-Shot Robot Imitation on Unseen Tasks

OSVI-WM addresses a critical gap in one-shot visual imitation learning: generalizing to unseen tasks that are visually similar to training tasks but require semantically distinct responses. The framework uses a learned world model to predict latent state-action trajectories from a single expert video and the agent's current observation, then decodes these into physical waypoints for execution — bypassing the need for fine-tuning. Evaluated across two simulated benchmarks and three real-world robotic platforms, it consistently outperforms prior methods, with gains exceeding 30% in some cases.

LLMs Over-Compress Meaning: How Artificial and Human Conceptual Representations Fundamentally Diverge

Using an Information Bottleneck framework across 40+ LLMs, this paper by Shani et al. (incl. LeCun & Jurafsky) finds that while LLMs broadly match human category boundaries, they aggressively over-compress representations — achieving near-optimal information-theoretic efficiency at the expense of the contextual nuance humans deliberately preserve. Encoder models outperform larger decoder models in alignment with human conceptual structure, implying that language understanding and generation recruit fundamentally different representational mechanisms. Training dynamics follow a two-phase pattern: rapid concept formation, then architectural reorganization where semantic processing migrates from deep to mid-network layers as encodings become sparser. The core implication is that human-like understanding may require models that intentionally retain representational "inefficiencies."
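The compression trade-off the paper measures is the classical Information Bottleneck objective, in which a representation $Z$ of input $X$ is compressed while staying predictive of the relevant variable $Y$:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Driving $I(X;Z)$ down is compression; "over-compression" in this sense means shedding contextual detail that humans deliberately retain even though keeping it costs information-theoretic efficiency.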

Yann LeCun: The Future of AI Beyond LLMs

Yann LeCun, Turing Award laureate and Meta's Chief AI Scientist, discusses the evolution of deep learning, highlighting the critical need for AI systems to move beyond current large language models (LLMs) to achieve true intelligence. He advocates for architectures capable of understanding the physical world, planning, and reasoning with abstract mental representations, moving towards what he terms "Advanced Machine Intelligence" (AMI) rather than Artificial General Intelligence (AGI). LeCun emphasizes that breakthroughs in understanding the physical world will render current LLMs obsolete within five years, and urges academia to focus on these next-generation systems, since industry is already investing massively in LLMs.

Overcoming AI’s Foundational Limitations: A Call for Open Research and Physical World Understanding

Yann LeCun, Meta's Chief AI Scientist, argues that current AI systems, particularly large language models (LLMs), lack genuine intelligence due to limitations in reasoning, planning, and understanding of the physical world. He advocates for open-source AI development to promote diversity and counter proprietary biases, and emphasizes self-supervised learning with novel architectures like the Joint-Embedding Predictive Architecture (JEPA) as critical for future breakthroughs in AI.

Visual Self-Supervised Learning Matches Language-Supervised Methods at Scale

Visual Self-Supervised Learning (SSL) can achieve performance comparable to Contrastive Language-Image Pretraining (CLIP) in multimodal tasks like VQA, provided both are trained on the same data and scaled appropriately. This challenges the assumption that language supervision is essential for strong multimodal representation learning. By controlling for data and scaling model capacity, visual SSL models demonstrate superior scalability and can reach CLIP-level performance.
