Yann LeCun

Chronological feed of everything captured from Yann LeCun.

Lie Symmetry SSL Yields Superior Representations for Heterogeneous PDE Data

Self-supervised learning with joint embedding and Lie symmetries learns general-purpose representations of PDEs from heterogeneous or incomplete real-world data, bypassing the need for tailored simulations. These representations outperform baselines on invariant tasks like PDE coefficient regression and improve the time-stepping efficiency of neural solvers. The approach draws on SSL successes in vision, aiming toward PDE foundation models.

Augmented Language Models: Overcoming Limitations with Reasoning and Tool Use

Augmented Language Models (ALMs) enhance traditional LMs by integrating reasoning skills (complex task decomposition) and external tool utilization (e.g., code interpreters). This approach allows ALMs to expand their context processing ability beyond the pure language modeling paradigm, enabling them to address interpretability, consistency, and scalability issues inherent in conventional LMs. ALMs achieve this while maintaining a standard missing token prediction objective, and have demonstrated superior performance on various benchmarks.

Bridging the Gap in Self-Supervised Learning: Split Invariant-Equivariant Representations

This research introduces SIE, a novel approach to self-supervised learning that combines invariant and equivariant representations. By addressing the limitations of existing methods, SIE offers richer representations suitable for diverse tasks. The accompanying 3DIEBench dataset provides a controlled environment for evaluating these advancements, bridging the gap between large-scale invariant methods and smaller-scale equivariant approaches.

Understanding Generalization in Self-Supervised Learning

Self-supervised learning (SSL) is a powerful framework, but practical applications face issues like optimizer instability and representation collapse. This research introduces a theoretical framework to analyze the interplay between data augmentation, network architecture, and training algorithms. The findings provide insights for SSL practitioners to improve generalization performance in both pretraining and downstream tasks.

VCReg: Adapting VICReg's Variance-Covariance Penalty to Supervised Pretraining Boosts Transfer Learning

VCReg adapts VICReg's self-supervised variance-covariance regularization to supervised pretraining, enforcing high-variance and low-covariance representations to counter loss-minimizing feature collapse. Applied to intermediate layers, it enhances feature diversity and transferability across images and videos. Empirical results show state-of-the-art transfer performance, plus gains in long-tail and hierarchical classification, linked to mitigating gradient starvation and neural collapse.

I-JEPA: A self-supervised approach for learning semantic image representations

I-JEPA is a novel non-generative self-supervised learning method for images. It predicts representations of target image blocks from a single context block within the same image. This approach focuses on learning highly semantic image representations without relying on hand-crafted data augmentations, demonstrating scalability and strong performance across various downstream tasks.

I-JEPA: A Step Towards More Human-like AI Learning

Meta AI introduces I-JEPA, a novel Image Joint Embedding Predictive Architecture, as a first step towards Yann LeCun's vision for more human-like AI. Unlike generative models that predict pixel-level details, I-JEPA learns by creating abstract representations of images and predicting missing information at a high semantic level. This approach offers computational efficiency and strong performance across various computer vision tasks, demonstrating potential for learning general world models from unlabeled data.

LeCun's H-JEPA: Hierarchical Latent Variable Energy-Based Architecture for Autonomous AI

Yann LeCun proposes Hierarchical Joint Embedding Predictive Architecture (H-JEPA) as the core building block for future autonomous machine intelligence, addressing limitations in current AI systems. H-JEPA integrates energy-based models with latent variable models to enable learning reliable world models, reasoning, and planning complex actions. This architecture targets applications that today's systems cannot deliver, such as Level 5 self-driving cars, domestic robots, and advanced virtual assistants.

LLM Explanations Boost GNNs via Interpreter for Top TAG Performance

The method prompts LLMs for zero-shot classification and explanations on the node texts of text-attributed graphs, then uses an LLM-to-LM interpreter to convert those explanations into features for GNNs. It achieves SOTA results on Cora, PubMed, ogbn-arxiv, and the new tape-arxiv23 dataset, and delivers a 2.88x training speedup over baselines on ogbn-arxiv, with potential for broader graph-text tasks.

The Future of AI: Beyond Large Language Models

Yann LeCun, a prominent AI researcher, argues that current large language models (LLMs) are fundamentally limited and will be superseded within five years. He advocates for a new approach centered on self-supervised learning, world models that predict outcomes, and hierarchical planning to achieve human-level AI. This paradigm shift will necessitate abandoning generative and probabilistic models in favor of joint embedding architectures and regularized methods, with a strong emphasis on open-source foundational AI.

SSL Regularization Drives Semantic Clustering in Representations

Empirical analysis across SSL models shows the regularization term of the SSL objective inherently clusters representations by semantic labels, enhancing downstream classification while compressing data information. Representations align more with semantic than random classes, with alignment strengthening during training and deeper in networks. This hierarchical semantic alignment provides mechanistic insights into SSL's effectiveness.

Yann LeCun: AI as Intelligence Amplifier Ushering Human Renaissance, Not Doom

Yann LeCun views current AI progress as a continuous evolution toward human-level intelligence, with autoregressive LLMs limited by text-only training that lacks real-world understanding, planning, and controllability. Future systems will integrate objectives for safe, steerable planning, enabling open-source infrastructure such as Wikipedia-vetted assistants that empower individuals with superintelligent aides. He argues that open source prevails over proprietary models by harnessing global talent, dismisses AGI extinction scenarios as fallacies that conflate intelligence with a desire for domination, and predicts job creation in creative and service sectors despite transitional disruptions.

Self-Supervised Learning Cookbook Lowers Research Entry Barriers

This paper presents a comprehensive "cookbook" for self-supervised learning (SSL), framing it as a delicate process akin to cooking with numerous interdependent choices in pretext tasks and hyperparameters. It aims to democratize SSL research by providing foundational recipes and practical guidance on method navigation and knob tuning. The resource equips researchers to effectively train and innovate in SSL without prior deep expertise.

Information Bottleneck Guides Supervised DNNs but Remains Unclear in Self-Supervised Learning

Deep neural networks leverage the information bottleneck principle to balance compression and relevant information preservation in supervised learning. Self-supervised learning circumvents labeled data needs but lacks clarity on adapting this principle. The review proposes a unified information-theoretic framework for self-supervised methods, analyzes estimation challenges, and identifies research gaps.

EMP-SSL Achieves One-Epoch Self-Supervised Learning via Extreme Multi-Patch Cropping

EMP-SSL enables self-supervised learning convergence in one training epoch by extracting a massive number of crops per image, bypassing heuristics like weight sharing, normalization, quantization, and stop gradients. It matches or exceeds prior SSL performance on CIFAR-10 (85.1%), CIFAR-100 (58.5%), Tiny ImageNet (38.1%), and ImageNet-100 (58.5%) in one epoch, reaching 91.5%, 70.1%, 51.5%, and 78.9% respectively with linear probing in under ten epochs. The method demonstrates superior transferability to out-of-domain datasets over baselines.

Positive Active Learning Surpasses SSL with Minimal Low-Cost Semantic Queries

Positive Active Learning (PAL) generalizes self-supervised learning by using an oracle to query semantic relationships between samples, forming similarity graphs for representation learning. This framework extends to supervised and semi-supervised settings, embeds prior knowledge like labels into SSL losses without pipeline changes, and enables efficient active learning via simple semantic queries. PAL bridges theory and practice in active learning by relying on non-expert feasible annotations.

VICReg's Self-Supervised Learning Explained via Information Theory and Mutual Information Optimization

VICReg optimizes variance, invariance, and covariance to prevent representational collapse in self-supervised learning. The paper derives information-theoretic quantities for deterministic networks, linking VICReg's objective to mutual information maximization without stochastic assumptions. It provides a generalization bound showing advantages for downstream tasks and introduces superior SSL methods from these principles.
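The three VICReg terms are simple enough to sketch directly. The numpy version below follows the paper's structure (invariance as MSE between views, a hinge on per-dimension standard deviation, and a penalty on off-diagonal covariance); the coefficient values and the function name are illustrative defaults, not the exact training configuration:

```python
import numpy as np

def vicreg_loss(z1, z2, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """Sketch of the VICReg objective on two (N, D) batches of embeddings."""
    n, d = z1.shape
    # invariance: the two views of each sample should match
    inv = np.mean((z1 - z2) ** 2)

    def var_term(z):
        # hinge keeps each embedding dimension's std above 1 (anti-collapse)
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def cov_term(z):
        # decorrelate dimensions: penalize off-diagonal covariance entries
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))
        return np.sum(off ** 2) / d

    return float(lam * inv + mu * (var_term(z1) + var_term(z2))
                 + nu * (cov_term(z1) + cov_term(z2)))
```

Fully collapsed embeddings (every sample identical) incur the maximal variance penalty, while spread, decorrelated embeddings score near zero, which is exactly the mechanism preventing representational collapse.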

Duality in Self-Supervised Learning: Bridging Contrastive and Non-Contrastive Methods

This paper explores the theoretical and practical similarities between contrastive and non-contrastive self-supervised learning methods for image representation. By demonstrating algebraic equivalence under certain assumptions, the research challenges common assumptions about design choices, such as the need for large output dimensions in non-contrastive methods. The findings suggest that performance gaps between these approaches can be closed through optimized network design and hyperparameter tuning.

Augmented Language Models Integrate Reasoning and Tools to Overcome Traditional LM Limitations

Augmented Language Models (ALMs) enhance standard LMs by adding reasoning—decomposing complex tasks into subtasks—and tool usage, such as calling code interpreters, while retaining the missing token prediction objective. ALMs employ heuristics or learn from demonstrations to combine these capabilities, enabling expanded context processing via external modules. This paradigm improves interpretability, consistency, and scalability over pure LMs and boosts performance on benchmarks.

Split Invariant-Equivariant Representations Enable Richer Self-Supervised Learning via Hypernetwork Predictors

SIE splits representations into invariant and equivariant components, using a hypernetwork-based predictor to prevent collapse to invariance and learn diverse features. Evaluated on the new 3DIEBench dataset with over 2.5 million images from 55 3D classes under controlled transformations, SIE outperforms prior methods on equivariance tasks both qualitatively and quantitatively. This bridges the gap between large-scale invariant SSL and smaller-scale equivariant approaches, enabling richer unsupervised representations for complex scenarios.

Theoretical Analysis Reveals Interplay of Augmentations, Inductive Bias, and Algorithms in SSL Generalization

This paper provides a theoretical framework analyzing the interplay between data augmentations, network architecture (inductive bias), and training algorithms in self-supervised learning (SSL). It examines generalization on both pretraining and downstream tasks in a controlled setup, addressing practical issues like optimizer instability and representation collapse. Key insights from the analysis offer actionable guidance for SSL practitioners.

Blockwise Self-Supervised Pretraining Nearly Matches End-to-End Backpropagation on ImageNet

Researchers propose blockwise learning as an alternative to full backpropagation, training the four main layer blocks of ResNet-50 independently using the Barlow Twins self-supervised loss. This approach yields 70.48% top-1 ImageNet accuracy with a linear probe, just 1.1 percentage points below the 71.57% of an end-to-end pretrained ResNet-50. Extensive experiments analyze components and adaptations, identifying paths to scale local learning rules for large networks with implications for hardware and neuroscience.

I-JEPA Enables Semantic Image Representations via Joint-Embedding Prediction Without Data Augmentations

I-JEPA is a non-generative self-supervised learning method that predicts representations of large-scale target blocks from a spatially distributed context block within the same image. It relies on a masking strategy emphasizing semantic-scale targets and informative contexts to produce highly semantic representations, avoiding hand-crafted augmentations. When paired with Vision Transformers, it scales efficiently, training a ViT-Huge/14 on ImageNet in under 72 hours on 16 A100 GPUs with strong downstream performance in classification, object counting, and depth prediction.

Graph ViT/MLP-Mixer Overcomes GNN Limitations with Linear Efficiency and Long-Range Modeling

Graph ViT/MLP-Mixer adapts ViT/MLP-Mixer architectures to graphs, replacing local message-passing with global token mixing to capture long-range dependencies and mitigate over-squashing. It achieves linear complexity in nodes and edges, outperforming Graph Transformers in speed and memory while distinguishing 3-WL non-isomorphic graphs. Empirical results on 4 simulated datasets and 7 real-world benchmarks confirm its competitiveness.

Spectral Properties Unify Self-Supervised Learning with Supervised Tasks, Favoring Non-Contrastive Joint Embeddings

Yann LeCun and Randall discuss papers linking self-supervised learning (SSL) to spectral embeddings, showing that SSL surrogate tasks aid supervised learning when their similarity-graph matrices share spectral properties with the supervised adjacency matrix. In the infinite limit, data augmentation has analytical effects equivalent to infinite samples, and non-contrastive joint embedding architectures outperform contrastive methods by avoiding dimensional collapse, with a mathematical duality between the two via Z Z^T versus Z^T Z. LeCun advocates multi-criteria SSL, such as prediction and slow feature analysis, over RL, using differentiable surrogate costs for efficient planning in hierarchical action spaces.
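The Z Z^T versus Z^T Z duality has a concrete linear-algebra core: for an N x D embedding matrix Z, the N x N Gram matrix of sample similarities (the quantity contrastive methods work with) and the D x D feature second-moment matrix (the quantity non-contrastive methods regularize) share the same nonzero eigenvalues. A toy check, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 3))   # N=8 embeddings of dimension D=3
gram = Z @ Z.T                # N x N: sample-sample similarities ("contrastive" view)
cov = Z.T @ Z                 # D x D: feature second moments ("non-contrastive" view)
top = np.sort(np.linalg.eigvalsh(gram))[-3:]   # the D nonzero eigenvalues of gram
assert np.allclose(top, np.sort(np.linalg.eigvalsh(cov)))
```

Both matrices therefore encode the same spectrum; the two families of methods differ in which of the two objects they shape, not in the information available to them.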

JEPA Excels with Dynamic Distractors but Fails on Static Noise Due to Slow Feature Bias

JEPA methods using VICReg and SimCLR objectives match or exceed pixel reconstruction baselines for learning dot location representations in offline settings with timestep-varying distractor noise. However, they fail when noise is fixed across frames. A theoretical analysis reveals this limitation stems from JEPA's focus on invariant features rather than slow-changing dynamics.

POLICE: Provable Affine Constraint Enforcement for DNNs with Minimal Forward-Pass Changes

POLICE introduces the first provably optimal method to enforce affine constraints on DNN outputs over a specified input region without altering optimization or requiring sampling. It integrates via minimal forward-pass modifications, enabling standard gradient descent on parameters while guaranteeing constraint satisfaction throughout training and testing. This addresses modularity limitations in incorporating a priori knowledge or physical properties into DNNs.

Unsupervised CTRL Yields Unified Representations Excelling in Both Discrimination and Generation

The paper extends the CTRL framework to unsupervised learning via a constrained maximin game on a rate reduction objective, expanding features across samples while compressing augmentations per sample. This process induces discriminative low-dimensional structures in the representations. These unified representations achieve near-SOTA unsupervised classification performance and superior conditional image generation quality under matched conditions.
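The coding-rate quantity underlying the rate reduction objective can be sketched in a few lines. This is only the rate term R(Z) (the full objective plays expansion against per-sample compression); the `eps` quantization parameter and the matrix sizes below are illustrative:

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate of (N, D) features: R(Z) = 1/2 logdet(I + d/(n eps^2) Z^T Z)."""
    n, d = Z.shape
    sign, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * Z.T @ Z)
    return 0.5 * logdet

spread = np.eye(8)                 # decorrelated, full-rank features
collapsed = np.ones((8, 8)) / 8    # every sample identical (rank 1)
assert coding_rate(spread) > coding_rate(collapsed)
```

Maximizing this rate across samples rewards spread, high-rank features, while minimizing it within augmentation groups compresses each sample's views, producing the discriminative low-dimensional structures the summary describes.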

Optimizing Self-Supervised Learning via Decoupled Contrastive Loss

Decoupled Contrastive Learning (DCL) optimizes self-supervised learning by eliminating the negative-positive-coupling (NPC) effect found in the InfoNCE loss denominator. By removing the positive term from that denominator, DCL reduces the dependency on massive batch sizes, momentum encoders, and extended training epochs. This loss modification enables higher accuracy on ImageNet-1K benchmarks and improves robustness against suboptimal hyperparameter selection.
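A simplified cross-view-only version of the two losses makes the change concrete (the full SimCLR/DCL formulation also includes within-view negatives, omitted here for brevity; the function name is illustrative):

```python
import numpy as np

def contrastive_loss(z1, z2, tau=0.1, decoupled=False):
    """InfoNCE over two views of a batch of embeddings; with decoupled=True
    the positive pair is removed from the denominator, as in DCL."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau             # (N, N); diagonal entries are positives
    pos = np.diag(logits)
    denom = np.exp(logits).sum(axis=1)   # InfoNCE: positive + negatives
    if decoupled:
        denom = denom - np.exp(pos)      # DCL: negatives only (NPC removed)
    return float(np.mean(np.log(denom) - pos))
```

With the positive removed from the denominator, the gradient on each positive pair no longer shrinks as its similarity grows, which is why DCL tolerates small batches and short schedules.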

NeuroAI and Embodied Turing Test as Catalysts for Next-Generation AI

Neuroscience has driven AI progress; accelerating it requires investment in NeuroAI research. The embodied Turing test benchmarks AI animal models against real animals in sensorimotor interactions. This shifts focus from human-unique skills like language to evolutionarily conserved animal capabilities, providing a roadmap for future AI.

VoLTA Enables Fine-Grained Vision-Language Tasks Using Only Image-Caption Data via Weakly-Supervised Patch Alignment

VoLTA introduces a vision-language transformer that achieves fine-grained region-level understanding, such as object detection and segmentation, using only image-caption pairs without expensive bounding box annotations. It employs graph optimal transport for weakly-supervised alignment between local image patches and text tokens, creating an explicit, self-normalized matching criterion. The model integrates multi-modal fusion deeply into uni-modal backbones during pre-training, eliminating dedicated fusion layers to reduce memory usage. Experiments show VoLTA matches or exceeds prior methods on fine- and coarse-grained tasks despite using fewer annotations.
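VoLTA's weakly-supervised alignment is built on optimal transport between patch and token embeddings. The graph-OT variant used in the paper is more involved, but a plain entropic (Sinkhorn) plan between uniform marginals, sketched below with illustrative names and sizes, conveys the self-normalized matching idea:

```python
import numpy as np

def sinkhorn_plan(C, reg=0.5, iters=300):
    """Entropic OT plan between uniform marginals for an (n, m) cost matrix C."""
    n, m = C.shape
    a, b = np.ones(n) / n, np.ones(m) / m    # uniform patch / token masses
    K = np.exp(-C / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                   # alternating marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
patches, tokens = rng.normal(size=(4, 8)), rng.normal(size=(3, 8))
# cosine-distance cost between image patches and text tokens
C = 1 - (patches @ tokens.T) / (
    np.linalg.norm(patches, axis=1)[:, None]
    * np.linalg.norm(tokens, axis=1)[None, :])
plan = sinkhorn_plan(C)
# the plan is self-normalized: rows sum to 1/4, columns to 1/3
assert np.allclose(plan.sum(axis=0), 1 / 3)
assert np.allclose(plan.sum(axis=1), 1 / 4, atol=1e-6)
```

The transport plan's fixed marginals are what make the matching criterion "explicit and self-normalized": every patch and every token must distribute a fixed amount of mass, so no extra normalization layer is needed.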

RankMe: Unsupervised Rank-Based Metric Predicts JE-SSL Representation Quality

RankMe assesses Joint-Embedding Self-Supervised Learning (JE-SSL) representations using their effective rank as a simple, label-free indicator of downstream performance. This metric enables hyperparameter selection and quality evaluation across datasets without training or tuning. Extensive experiments show RankMe matches label-based validation with minimal performance loss.
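RankMe itself is essentially one line of linear algebra: the exponential of the entropy of the L1-normalized singular values of the embedding matrix. A minimal numpy sketch:

```python
import numpy as np

def rankme(Z, eps=1e-7):
    """Effective rank of an (N, D) embedding matrix: exp of the entropy
    of the normalized singular-value distribution."""
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / s.sum() + eps              # normalized spectrum (eps for stability)
    return float(np.exp(-np.sum(p * np.log(p))))
```

An identity-like embedding matrix (all singular values equal) scores its full dimension, while a collapsed rank-one matrix scores about 1, which is why the metric tracks downstream usability without any labels.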

VICRegL Bridges Global-Local Feature Gap in Self-Supervised Learning for Versatile Vision Tasks

VICRegL applies the VICReg variance-invariance-covariance regularization simultaneously to both global feature vectors and local feature maps from two distorted image views in a dual-branch CNN. Local features are only regularized if their L2 distance is below a threshold or their positions align with the known geometric transformation between views. This joint learning yields strong performance on detection/segmentation while preserving classification efficacy, addressing the global-local trade-off in self-supervised representation learning.

VICRegL: A Novel Approach for Simultaneous Global-Local Feature Learning in Self-Supervised Image Representation

Most self-supervised methods for image representation learning focus on either global or local features, excelling in classification or detection/segmentation, respectively. VICRegL introduces a novel approach that simultaneously learns both global and local features. This method employs two identical convolutional network branches fed distorted versions of the same image, applying the VICReg criterion to both global and local feature vectors. This allows VICRegL to achieve strong performance across classification, detection, and segmentation tasks.

Yann LeCun's Vision for Human-Level AI: World Models and JEPA

Yann LeCun proposes that human-level AI requires systems to learn "world models" for understanding environmental dynamics, a capability distinct from current data-intensive approaches. His suggested architecture includes six differentiable modules, with the Joint Embedding Predictive Architecture (JEPA) central to learning abstract world representations and handling prediction uncertainty. This framework aims to enable self-supervised learning for robust, adaptive AI.

Self-Supervised Learning as the Dark Matter Powering Efficient World Models and Animal-Level Intelligence

Self-supervised learning mimics how babies and animals acquire background knowledge—intuitive physics, object dynamics, and common sense—through passive observation, enabling rapid task learning like driving after minimal practice, unlike data-hungry supervised or reinforcement paradigms. It involves predicting future video frames, filling perceptual gaps, or continuing text sequences to build predictive world models that handle uncertainty via compressed latent distributions. Success in NLP via masked language modeling contrasts with vision's progress through non-contrastive methods like Barlow Twins, but video prediction remains unsolved. Yann LeCun posits this approach, integrated with gradient-based reasoning and hierarchical action planning, as AI's best path to matching the intelligence of a cat's roughly 800M-neuron brain.

High-Dimensional ML is Always Extrapolation: Reinterpreting Neural Nets as Piecewise Linear Space Partitioners

In high-dimensional spaces, the probability of test points lying in the convex hull of training data approaches zero, rendering traditional low-dimensional interpolation intuitions irrelevant; all ML, including deep learning, operates in an extrapolative regime. Neural networks with ReLU activations partition input space into polyhedral ReLU cells via hyperplanes, performing input-specific affine transformations rather than smooth manifold learning. This piecewise linear view demystifies NNs, aligning them with classical methods like decision trees and SVMs, while emphasizing engineered inductive biases and feature transformations for effective generalization.
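The "input-specific affine transformation" claim is easy to verify on a toy network: fix the ReLU activation pattern at a point, and the network agrees with a single affine map throughout that cell. The two-layer network and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# a tiny random two-layer ReLU network: f(x) = W2 relu(W1 x + b1) + b2
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(2, 16)), rng.normal(size=2)

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.normal(size=4)
mask = (W1 @ x + b1 > 0).astype(float)   # activation pattern = cell identity
A = W2 @ (W1 * mask[:, None])            # the affine map active in this cell
c = W2 @ (b1 * mask) + b2
x_near = x + 1e-6 * rng.normal(size=4)   # nearby point in the same ReLU cell
assert np.allclose(f(x_near), A @ x_near + c)
```

Each distinct mask defines a polyhedral cell on which the network is exactly the affine map `A x + c`; generalization then hinges on how these cells and maps are arranged, not on smooth interpolation between training points.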

Yann LeCun on the Past, Present, and Future of AI

Yann LeCun discusses the historical development of deep learning frameworks, including his early work on HLM, SN, and Lush, which laid foundational concepts for modern systems like PyTorch. He emphasizes the critical need for advancements in self-supervised learning, reasoning, and action planning to achieve more intelligent AI. LeCun also advocates for an interdisciplinary approach to AI education and predicts another major AI revolution driven by self-supervised learning and leading to advanced virtual assistants and robotics.

Optimizing Latent Space for Image Generation Inspiration

This paper introduces a strategy to facilitate creative inspiration from deep generative models, specifically GANs, by optimizing latent parameters. It addresses the tediousness of extracting useful generations from these models by proposing an optimization method that finds latent parameters corresponding to the closest generation to a user-provided inspirational image. The research explores various optimization techniques, including gradient descent and gradient-free optimizers, to enhance usability and control over generated outputs for creators.
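The latent-search idea can be sketched end to end with a toy differentiable "generator" standing in for a GAN (a fixed random tanh layer here; all names and sizes are illustrative): start from a latent guess and descend on the distance to the inspirational target.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))       # toy stand-in for a trained generator

def G(z):
    return np.tanh(W @ z)          # latent z (4,) -> "image" (16,)

target = G(rng.normal(size=4))     # the user's inspirational image
z = np.zeros(4)                    # initial latent guess
loss0 = np.mean((G(z) - target) ** 2)
for _ in range(500):               # gradient descent on the latent only
    r = G(z) - target
    z -= 0.1 * W.T @ ((1 - G(z) ** 2) * r) / len(r)   # chain rule through tanh
loss1 = np.mean((G(z) - target) ** 2)
assert loss1 < loss0               # the generation moved toward the target
```

With a real GAN the generator's weights stay frozen in exactly the same way; only the latent parameters are optimized, either by gradient descent as above or by the gradient-free optimizers the paper also explores.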

Self-Supervised Learning: A Path to Common Sense AI

Self-supervised learning (SSL) is critical for advancing AI beyond the limitations of supervised learning, particularly in developing generalist models with common sense. Unlike supervised methods requiring extensive labeled data, SSL extracts supervisory signals directly from raw data, enabling models to learn more nuanced representations of reality. This approach, especially through energy-based models and joint embedding architectures, holds promise for bridging the gap to human-level intelligence by allowing AI to acquire generalized background knowledge.

Implicit Rank-Minimizing Autoencoder (IRMAE) for Compact Latent Spaces

The Implicit Rank-Minimizing Autoencoder (IRMAE) is a novel autoencoder architecture that achieves compact latent representations by implicitly minimizing the rank of the latent code's covariance matrix. This is accomplished by strategically inserting additional linear layers between the encoder and decoder, leveraging the property of gradient descent that leads to minimum-rank solutions in multi-layer linear networks. The model is characterized by its simplicity, determinism, and effectiveness in learning low-dimensional latent spaces for tasks such as image generation and representation learning.

Hierarchical Loss Functions in Neural Networks: A Critical Evaluation

This paper introduces a novel hierarchical loss metric designed to penalize classification errors proportionally to the semantic distance between classes, utilizing an ultrametric tree structure. The core finding reveals that while this hierarchical loss offers a more semantically meaningful evaluation metric, direct minimization via standard stochastic gradient descent with random initialization does not reliably outperform cross-entropy loss minimization in achieving hierarchical classification objectives. Therefore, its primary utility appears to be as a robust evaluation metric rather than an optimizable objective function.

Yann LeCun: Align AI Objectives Like Human Laws, Build World Models via Self-Supervised Learning for Reasoning and Autonomy

Yann LeCun equates AI value alignment to millennia-old human legal systems that shape objectives via constraints, treating HAL 9000-style misalignment as solvable through hardwired ethical rules akin to the Hippocratic Oath. Deep learning succeeds empirically despite textbook warnings because of its biological inspiration from brains, emphasizing gradient-based learning over discrete logic for reasoning, which requires working memory, recurrence, and energy-based planning. Self-supervised learning via predictive world models enables rapid, sample-efficient intelligence like that of human babies, forming the foundation for model-based RL, causal inference, and autonomous systems grounded in reality rather than pure language.

Latent Relational Graphs for Transfer Learning

Traditional deep transfer learning primarily focuses on transferring unary feature vectors. This research explores learning and transferring latent relational graphs that capture dependencies between data units (e.g., words, pixels) from unlabeled data. This approach demonstrates improved performance across various downstream tasks, including natural language processing and image classification, and is transferable to different embedding types and even embedding-free units.

Forecasting Convolutional Features Outperforms Baselines for Future Instance Segmentation

Predicting future instance segmentation is achieved by forecasting fixed-size convolutional features from Mask R-CNN rather than RGB frames or semantic maps. The detection head of Mask R-CNN is applied to these predicted features to generate instance masks for future frames, handling variable object counts efficiently. This method significantly surpasses baselines using optical flow and adapted segmentation architectures.