absorb.md — A knowledge graph of what AI thinkers are actually saying

youtube / ylecun / Dec 12

The Architectural Limits of LLMs and the Path to World Models

Current Large Language Models (LLMs) achieve superhuman symbol manipulation but lack the grounded understanding of physical reality necessary for general intelligence. While LLMs rely on massive datasets (trillions of tokens) to predict discrete symbols, true intelligence requires architectures capable of learning abstract representations from high-dimensional, continuous sensory data—akin to how animals learn intuitive physics with extreme sample efficiency.

ai-systems large-language-models deep-learning ai-ethics ai-safety future-of-ai ml-research

“LLMs are fundamentally incapable of achieving human-level general intelligence (AGI) through current token-prediction paradigms.”

paper / ylecun / Dec 11

VL-JEPA: A Novel Vision-Language Model Outperforming Classical VLMs with Fewer Parameters

VL-JEPA is a new vision-language model leveraging a Joint Embedding Predictive Architecture. It predicts continuous embeddings of target texts, allowing it to focus on task-relevant semantics while abstracting away surface-level linguistic variability. This approach leads to stronger performance with 50% fewer trainable parameters compared to classical VLMs, while also supporting selective decoding and a versatile embedding space for various tasks.

jepa vision-language-models joint-embedding-predictive-architecture computer-vision deep-learning multimodal-ai

“VL-JEPA predicts continuous embeddings of target texts instead of autoregressively generating tokens.”
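
The difference between the two training targets can be sketched with toy numbers. This is a minimal illustration, not the VL-JEPA implementation: the three-dimensional embeddings, the example sentences, the logits, and the choice of cosine distance as the embedding loss are all assumptions made for the example.

```python
import math

def cross_entropy(logits, target_index):
    """Token-level loss: negative log-probability of one specific token."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]

def cosine_distance(pred, target):
    """Embedding-level loss: 1 - cosine similarity between two vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

# Hypothetical embeddings of two paraphrases of the same answer.
embed_a = [0.9, 0.1, 0.0]   # e.g. "the lamp is off"
embed_b = [0.8, 0.2, 0.0]   # e.g. "the light is turned off"
predicted = [0.85, 0.15, 0.0]

# In embedding space, either paraphrase is an almost equally good target.
loss_a = cosine_distance(predicted, embed_a)
loss_b = cosine_distance(predicted, embed_b)

# In token space, the same logits are scored very differently depending on
# which surface form happens to be the reference (token 0 vs. token 1).
logits = [2.0, 0.5, 0.1]
token_loss_reference = cross_entropy(logits, 0)
token_loss_paraphrase = cross_entropy(logits, 1)
```

Under these assumptions both embedding losses stay close to zero, while the token loss grows by the logit gap when the reference wording changes, which is the surface-level variability the summary says the embedding target abstracts away.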

paper / ylecun / Dec 10

Bridging the Train-Test Gap in World Models for Efficient Gradient-Based Planning

World models, when combined with model predictive control (MPC), enable generalization in robotic planning tasks. While gradient-based planning offers a computationally efficient alternative to traditional MPC, its performance has lagged. This work identifies a train-test gap where world models trained on next-state prediction are used for action sequence estimation at inference. The proposed data synthesis techniques significantly improve gradient-based planning for existing world models.

machine-learning robotics-world-models gradient-based-planning model-predictive-control train-test-gap object-manipulation navigation-tasks

“World models with model predictive control (MPC) facilitate generalization across diverse planning tasks by training on expert trajectories.”
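
Gradient-based planning, as described in the summary, optimizes an action sequence by differentiating a cost through the world model. The sketch below is a toy under stated assumptions: the "world model" is a hypothetical one-dimensional rule (`next_state = state + action`) with a hand-written gradient, whereas the paper works with learned models; the data synthesis techniques it proposes are not shown.

```python
def rollout(state, actions):
    """Hypothetical world model: each action is added to the state."""
    for a in actions:
        state = state + a
    return state

def plan(start, goal, horizon=5, steps=200, lr=0.05, action_cost=0.01):
    """Gradient descent on the action sequence, not on model weights."""
    actions = [0.0] * horizon
    for _ in range(steps):
        err = rollout(start, actions) - goal
        # Analytic gradient of (final - goal)^2 + action_cost * sum(a^2)
        # with respect to each action; valid only for this additive model.
        grads = [2.0 * err + 2.0 * action_cost * a for a in actions]
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions

actions = plan(start=0.0, goal=1.0)
final_state = rollout(0.0, actions)
```

The model here is only ever queried for next states, yet at planning time it is used to estimate a whole action sequence, which is the mismatch the summary calls the train-test gap.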

paper / ylecun / Dec 8

JEPA with Density Adaptive Attention for Robust Speech Tokenization

A novel two-stage self-supervised framework combines JEPA and Density Adaptive Attention Mechanism (DAAM) to learn robust speech representations. It decouples semantic audio feature learning from waveform reconstruction, employing masked prediction in latent space. The model achieves efficient and reversible speech tokenization with a low frame rate, outperforming existing neural audio codecs.

jepa speech-representation neural-tokenizer audio-processing self-supervised-learning density-adaptive-attention

“The proposed framework utilizes a two-stage self-supervised learning approach.”

paper / ylecun / Dec 4

Leveraging Generated Human Videos for Zero-Shot Robot Control

Video generation models show promise for high-level robot planning, but direct imitation is hindered by noise and morphological distortions in generated videos. A two-stage pipeline is proposed to address this: first, video pixels are lifted into a 4D human representation and retargeted to a humanoid morphology. Second, a physics-aware reinforcement learning policy, GenMimic, is introduced to enable robots to mimic human actions from noisy, generated videos in a zero-shot manner.

robotics video-generation human-robot-interaction reinforcement-learning computer-vision motion-tracking

“Generated videos can serve as high-level planners for contextual robot control.”

paper / ylecun / Nov 11

LeJEPA: A Theoretically Grounded and Scalable Approach to Self-Supervised Learning

LeJEPA offers a novel, theoretically-grounded approach to Joint-Embedding Predictive Architectures (JEPAs) for self-supervised learning. By introducing Sketched Isotropic Gaussian Regularization (SIGReg), LeJEPA optimizes embedding distributions to minimize prediction risk. This framework eliminates traditional heuristics, boasting linear complexity, stability across diverse architectures and domains, and efficient distributed training, making it highly suitable for scalable AI research.

self-supervised-learning joint-embedding-predictive-architectures representation-learning deep-learning-theory computer-vision machine-learning

“LeJEPA optimizes for an isotropic Gaussian distribution of embeddings to minimize downstream prediction risk.”
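
The idea of regularizing embeddings toward an isotropic Gaussian through random one-dimensional projections ("sketches") can be illustrated as follows. This is a simplified stand-in, not SIGReg itself: the penalty below only matches the mean and variance of each projection to those of a standard normal, and the function names, direction count, and test data are assumptions for the example.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sketched_gaussian_penalty(embeddings, num_directions=32, seed=0):
    """Average over random directions of (mean)^2 + (variance - 1)^2.

    An isotropic standard Gaussian has mean 0 and variance 1 along every
    direction, so the penalty is near zero only when that holds.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    n = len(embeddings)
    total = 0.0
    for _ in range(num_directions):
        u = random_unit_vector(dim, rng)
        proj = [sum(e_i * u_i for e_i, u_i in zip(e, u)) for e in embeddings]
        mean = sum(proj) / n
        var = sum((p - mean) ** 2 for p in proj) / n
        total += mean ** 2 + (var - 1.0) ** 2
    return total / num_directions

data_rng = random.Random(1)
isotropic = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(2000)]
collapsed = [[0.01 * x for x in row] for row in isotropic]  # near-constant embeddings

penalty_isotropic = sketched_gaussian_penalty(isotropic)
penalty_collapsed = sketched_gaussian_penalty(collapsed)
```

The cost of this sketch grows linearly with the number of samples and the embedding dimension for a fixed number of directions, and the collapsed embeddings receive a much larger penalty than the isotropic ones.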

paper / ylecun / Nov 6

Beyond Scaling: Predictive World Modeling for Spatial Supersensing in Video

The Cambrian-S research argues that true multimodal intelligence requires 'spatial supersensing'—moving beyond task-driven perception toward predictive world modeling. While data scaling provides marginal gains, the authors demonstrate that self-supervised predictive sensing (using prediction error for event segmentation) is necessary to solve long-horizon visual spatial recall and counting tasks.

computer-vision multimodal-ai spatial-cognition predictive-world-modeling ai-benchmarks video-analysis

“Data scaling alone is insufficient to achieve spatial supersensing capabilities.”
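
Using prediction error for event segmentation can be sketched in a few lines. The example assumes a deliberately naive predictor (the next frame is expected to equal the current one), toy two-dimensional "frames", and a hand-picked threshold; the Cambrian-S system uses a learned predictor over video features.

```python
def segment_by_surprise(frames, threshold):
    """Return the indices at which a new event is declared.

    A boundary is placed wherever the squared error between the predicted
    frame and the observed frame exceeds the threshold.
    """
    boundaries = []
    for t in range(1, len(frames)):
        predicted = frames[t - 1]  # assumption: "nothing changes" predictor
        error = sum((p - f) ** 2 for p, f in zip(predicted, frames[t]))
        if error > threshold:
            boundaries.append(t)
    return boundaries

# Toy feature vectors: small drift within an event, large jumps between events.
frames = [[0.0, 0.0], [0.1, 0.0], [0.1, 0.1],
          [5.0, 5.0], [5.1, 5.0],
          [0.0, 0.1]]
boundaries = segment_by_surprise(frames, threshold=1.0)
```

With these inputs the large jumps at indices 3 and 5 are flagged as event boundaries, while the small within-event drift is ignored.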

paper / ylecun / Oct 7

Unified Theory for Attention Sinks and Compression Valleys in Large Language Models

This paper establishes a novel connection between attention sinks and compression valleys in large language models (LLMs), attributing both phenomena to the formation of massive activations in the residual stream. A theoretical framework is provided, demonstrating how these massive activations lead to representational compression and entropy reduction. Experimental validation across various LLMs supports this unified view, leading to the "Mix-Compress-Refine" theory of information flow within Transformers, which posits distinct computational phases across layers.

attention-sinks compression-valleys llm-internals transformer-architectures representational-compression activation-analysis

“Attention sinks and compression valleys in LLMs are directly linked to the formation of massive activations in the residual stream.”

paper / ylecun / Oct 7

JEPAs Unveiled: Latent-Space Anti-Collapse Terms Estimate Data Density

Joint Embedding Predictive Architectures (JEPAs) utilize an anti-collapse term, typically considered for representation diversity. This research reveals that this term also implicitly estimates data density, a previously unrecognized capability. This density estimation is provable across datasets and architectures, enabling applications like data curation and outlier detection.

jepas gaussian-embeddings data-density self-supervised-learning representation-learning outlier-detection ai-research

“JEPAs' anti-collapse term provably estimates data density.”

tweet / @ylecun / Oct 4

MNIST Dataset Chronology Correction

Yann LeCun clarifies the historical timeline of the MNIST dataset, stating it was introduced five years after a previously referenced event or dataset. This correction is a direct response to a social media query regarding the dataset's debut.

mnist yann-lecun deep-learning-history neural-networks

“The MNIST dataset was released five years after a point of reference mentioned in the conversation.”

tweet / @ylecun / Oct 3

Misdated Video Content on Social Media

A video shared on social media was misdated by its uploader, who attributed it to a much later year than the one in which it was actually made. The case illustrates how user-supplied context and metadata on social platforms can be inaccurate.

social-media historical-reference online-discussion video-content yann-lecun

“The video in question was created in 1989.”

tweet / @ylecun / Sep 24

Relativistic Time Dilation: A Cautionary Consideration for Time Management

Accelerating to relativistic speeds slows a traveler's own clock relative to the outside world, which means the rest of the world ages faster by comparison. The physics is accurate, but it makes for a poor time-management strategy: any time gained locally is lost against everyone else's calendar.

humor relativity time-dilation physics sarcasm social-media

“Accelerating to relativistic speeds causes an individual's perception of time to slow down.”
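
The size of the effect follows from the Lorentz factor of special relativity. The sketch below assumes constant-velocity travel and ignores the acceleration phases; the speed and the eight-hour interval are arbitrary illustration values.

```python
import math

def lorentz_factor(speed_fraction_of_c):
    """Return gamma = 1 / sqrt(1 - (v/c)^2) for a speed given as v/c."""
    if not 0.0 <= speed_fraction_of_c < 1.0:
        raise ValueError("speed must satisfy 0 <= v/c < 1")
    return 1.0 / math.sqrt(1.0 - speed_fraction_of_c ** 2)

gamma = lorentz_factor(0.6)             # traveling at 60% of light speed
traveler_hours = 8.0                    # time elapsed on the traveler's clock
outside_hours = gamma * traveler_hours  # time elapsed for everyone else
```

At 60% of light speed gamma is 1.25, so eight hours for the traveler correspond to ten hours outside: the deadline arrives sooner, not later.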

tweet / @ylecun / Sep 24

Meta AI leader dismisses unsubstantiated claim regarding AI capabilities

Yann LeCun, a prominent AI researcher at Meta, succinctly rejected an unspecified claim. This interaction suggests that even high-profile figures in AI are directly engaging with and refuting misinformation or overblown assertions about the field.

social-media opinion yann-lecun

“Yann LeCun rejected a claim made by user @DreamStarter_1.”

tweet / @ylecun / Sep 24

Inaccessible Social Media Content Hinders Knowledge Extraction

Due to an inaccessible or deleted X post, no content could be retrieved for analysis. This entirely prevented the extraction of specific claims or insights from the intended source. The inability to access the original source material fundamentally limits knowledge compilation efforts.

x-feed content-unavailable error-handling social-media-analysis data-ingestion

“The specific X post at the provided URL is inaccessible.”

tweet / @ylecun / Sep 24

JEPA Architecture as a Goal for AI Development

Yann LeCun asserts that the Joint Embedding Predictive Architecture (JEPA) remains the primary objective for AI development, contrasting it with the Code World Model (CWM), which he characterizes as a token-generative baseline. This suggests a strategic focus on architectures capable of learning representations by predicting missing information in a joint embedding space, rather than solely relying on sequential token generation.

jepa cwm yann-lecun ai-models deep-learning

“JEPA is the architectural goal for AI development.”

tweet / @ylecun / Sep 24

Yann LeCun Differentiates Coding from AGI

Yann LeCun asserts that current "coding" activities, likely a reference to large language model capabilities, do not amount to Artificial General Intelligence (AGI). The distinction separates pattern recognition and generation from genuine reasoning or understanding. In his view, advanced coding ability is a narrow application of AI rather than evidence of, or a step toward, human-level intelligence.

artificial-intelligence agi llm-mechanisms cognitive-science

“Current AI coding capabilities are not indicative of Artificial General Intelligence.”

tweet / @ylecun / Sep 24 / failed

Yes

tweet / @ylecun / Sep 24

Conceptual Framework for Code World Models

A Code World Model operates by simulating the execution outcomes of instructions to iteratively plan and generate code. This approach shifts code generation from token prediction to a goal-oriented process based on imagined execution effects.

code-world-model ai-research deep-learning yann-lecun ai-programming

“Code World Models generate code by simulating the effects of executing instructions.”
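
Planning code by imagining execution effects can be illustrated with a toy. Everything here is an assumption made for the example: the two-instruction language, the exact interpreter standing in for a learned world model, and the breadth-first search used as the planner.

```python
from collections import deque

# Hypothetical instruction set; each entry maps a value to its next value.
INSTRUCTIONS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
}

def simulate(program, x):
    """Run a program (list of instruction names) on an initial value."""
    for name in program:
        x = INSTRUCTIONS[name](x)
    return x

def plan_program(start, target, max_len=8):
    """Search for a shortest program by simulating one instruction at a time."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        value, program = queue.popleft()
        if value == target:
            return program
        if len(program) == max_len:
            continue
        for name, effect in INSTRUCTIONS.items():
            nxt = effect(value)  # imagined result of appending this instruction
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, program + [name]))
    return None  # no program within max_len reaches the target

program = plan_program(start=1, target=10)
result = simulate(program, 1)
```

The planner never scores candidate tokens; it chooses each instruction by the state it is predicted to produce, which is the goal-oriented behavior the summary describes.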

tweet / @ylecun / Sep 19

Empty Tweet Analysis: LeCun Engages in Social Media Banter

This tweet from Yann LeCun, a prominent AI researcher, consists solely of emojis in response to another user. It indicates social interaction on a non-technical topic, highlighting the personal use of social media among public figures in the AI community rather than disseminating technical insights or research. The lack of textual content makes it impossible to extract any technical claims or insights from this specific interaction.

social-media leverage-social-x engagement

“Yann LeCun's tweet primarily consists of emojis, indicating a non-substantive social interaction.”

paper / ylecun / Sep 11

LLM-JEPA: Bridging the Gap Between Language and Vision Training Architectures

Current Large Language Model (LLM) training relies on input-space reconstruction, a method found to be suboptimal in computer vision. This research introduces LLM-JEPA, a Joint Embedding Predictive Architecture for LLMs, aiming to adapt the more effective embedding-space training used in vision to language models. Initial findings indicate that LLM-JEPA significantly outperforms standard LLM training and demonstrates robustness against overfitting.

llm-pretraining jepa embedding-space language-models ai-frameworks vision-ai

“LLM pretraining, finetuning, and evaluation are primarily based on input-space reconstruction and generative capabilities.”

youtube / ylecun / Aug 14

Yann LeCun on Deep Learning's Long Road to Mainstream: Open Source, AI Safety, and the Next Renaissance

Yann LeCun traces the arc of neural network research from its academic fringe origins in the 1980s to its current dominance, crediting deliberate rebranding as "deep learning" and a coordinated infiltration of industry speech recognition pipelines as pivotal turning points. He argues the real AI competition is not geopolitical (US vs. China vs. Europe) but structural — open-source vs. proprietary — and points to Meta's Llama as evidence that open research ecosystems outpace secretive ones. On AI safety, LeCun rejects the "uncontrollable superintelligence" narrative, framing alignment as a tractable engineering problem solvable through objective-driven architectures with enforced guardrails, not existential dread.

deep-learning ai-history open-source-ai neural-networks ai-safety ai-research llm-infrastructure

“Deep learning achieved mainstream adoption in speech recognition within 18 months after Geoff Hinton placed three students as interns at Microsoft, Google, and IBM to replace acoustic modeling components with deep learning systems; all three got better results.”

paper / ylecun / Jul 25

DINOv2 Latent Space Enables Generalizable Video World Models with Planning Capability

DINO-world proposes using DINOv2's frozen image encoder as the representational backbone for a video world model, training a future predictor in latent space on large-scale uncurated video data. This approach sidesteps the need for pixel-level prediction, instead operating in a semantically rich feature space that generalizes across diverse scene types — driving, indoor, and simulated environments. The model can be fine-tuned on action-observation trajectories to become action-conditioned, enabling latent-space trajectory simulation for planning. It outperforms prior models on video prediction benchmarks including segmentation and depth forecasting, and shows emergent understanding of intuitive physics.

world-models video-prediction self-supervised-learning computer-vision dino latent-space embodied-ai

“DINO-world outperforms previous models on video prediction benchmarks including segmentation and depth forecasting tasks.”

tweet / @ylecun / Jul 3

Yann LeCun Draws a Line: ASI Over AGI Has Always Been the Goal

In a terse post, Yann LeCun asserts that Artificial Superintelligence (ASI), not Artificial General Intelligence (AGI), has always been his north star. The distinction signals a rejection of AGI as a meaningful or sufficient milestone, implying LeCun views human-level general intelligence as an intermediate or inadequate benchmark. The brevity and conviction of the statement ("Always have") suggest this is a long-held philosophical position rather than a reactive take.

agi asi ai-safety yann-lecun ai-discourse

“Yann LeCun's stated target for AI research has always been ASI (Artificial Superintelligence), not AGI (Artificial General Intelligence).”

tweet / @ylecun / Jul 2

The DeepSeek Moment Is Rewriting AI Talent Culture Around Openness

The AI talent competition is increasingly being shaped not just by compensation, but by a cultural divide between open and closed research philosophies. Meta has emerged as the standard-bearer for open science and open source AI in the U.S. — via publications and Llama — while OpenAI and Anthropic remain notably closed. Yann LeCun's endorsement of this framing signals that the migration of top talent to Meta may reflect a values-driven realignment post-DeepSeek, with implications for innovation velocity, scientific transparency, and AI safety.

open-source-ai ai-talent meta-ai llama openai ai-industry open-science

“Meta is the leading American AI lab in open science (publications) and open source (via Llama), while OpenAI and Anthropic are not open.”

tweet / @ylecun / Jul 2 / failed

Exactly.

tweet / @ylecun / Jul 1

LeCun Questions Compressor Interpretability of Real-World Data

Yann LeCun critically assesses the notion that 'compressibility' alone is a sufficient or unbiased metric for understanding real-world sequences. He suggests that the observed ease with which models compress select real-world data might be an artifact of how these sequences are chosen and represented, rather than an inherent, universal property. LeCun implies a potential bias in the selection of data that appears compressible, which could distort our understanding of true data complexity.

machine-learning-theory compression-theory ai-philosophy model-complexity deep-learning

“The concept of 'compressibility' does not provide insight into the small fraction of sequences that are genuinely compressible.”

tweet / @ylecun / Jul 1

Yann LeCun comments on AI observation

Yann LeCun, a prominent AI researcher, has acknowledged a perceived level of observation or scrutiny related to his activities. This suggests a growing awareness or concern within the AI community regarding privacy or monitoring. The nature and extent of this observation remain unspecified.

social-media-trends online-safety privacy-concerns influencer-culture community-moderation

“Yann LeCun is experiencing observation.”

tweet / @ylecun / Jul 1

The Comprehensibility of the World and Inductive Bias

Yann LeCun comments on the inherent comprehensibility of the world, echoing Einstein's sentiment. He suggests that this comprehensibility is facilitated by brains possessing an inductive bias, enabling them to understand their localized environment. This implies a fundamental relationship between cognitive architecture and the nature of reality's perceived order.

philosophy cognition ai-reasoning interpretability human-intelligence

“The comprehensibility of the world is a notable characteristic.”

tweet / @ylecun / Jul 1

LeCun Skeptical of LLM Path to AGI

Yann LeCun, a prominent AI researcher, expresses strong skepticism regarding the potential of Large Language Models (LLMs) to achieve Artificial General Intelligence (AGI) or Artificial Super Intelligence (ASI). He sarcastically suggests directing inquiries about LLMs leading to advanced AI to the Chief AI Officer of his purported employer, highlighting his dismissive stance on the topic. The interaction indicates a divergence of opinions within the AI community concerning the architectural pathways to achieving AGI.

llm-limitations llm-safety ai-policy agi-capabilities

“Yann LeCun is skeptical that Large Language Models (LLMs) can lead to Artificial General Intelligence (AGI) or Artificial Super Intelligence (ASI).”

tweet / @ylecun / Jul 1

Yann LeCun's Role at Meta Remains Unchanged Since 2018

Yann LeCun, Chief AI Scientist at Meta, has maintained the same position since 2018. This indicates stability in his role within the company's AI leadership over a multi-year period. The duration suggests a consistent strategic direction or a sustained focus on his contributions in this capacity.

ai-leadership meta-ai industry-news yann-lecun

“Yann LeCun is the Chief AI Scientist at Meta.”

tweet / @ylecun / Jul 1

LeCun Rejects AGI, Champions ASI as Foundational AI Goal

Yann LeCun, a prominent AI researcher at FAIR, distinguishes between Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). He contends that AGI, defined as 'human-level AI,' is a poorly conceived notion due to the specialized nature of human intelligence. Conversely, LeCun asserts that ASI represents a valid and consistent long-term objective for AI development, a goal he and FAIR have consistently pursued.

artificial-intelligence ai-research fair-ai yann-lecun ai-aspirations agi-vs-asi

“Artificial Superintelligence (ASI) is a meaningful and consistent long-term objective for AI development.”

paper / ylecun / Jun 26

Whole-Body Pose-Conditioned Egocentric Video Prediction as a Path to Embodied World Models

PEVA (Predict Ego-centric Video from Actions) trains an autoregressive conditional diffusion transformer to simulate first-person visual consequences of human body actions, conditioned on relative 3D kinematic pose trajectories structured by joint hierarchy. The model is trained on Nymeria, a large-scale real-world egocentric video + body pose dataset, grounding predictions in physically realistic human motion. A hierarchical evaluation protocol stress-tests the model across increasingly complex embodied prediction and control tasks. This represents a direct push toward world models that understand how physical human actions causally shape perceived environments — a key milestone for embodied AI.

video-prediction egocentric-vision embodied-ai diffusion-models computer-vision body-pose-estimation world-models

“Conditioning video prediction on full-body 3D kinematic pose trajectories — structured by joint hierarchy — enables a model to learn how physical actions shape egocentric visual experience.”

paper / ylecun / Jun 11

V-JEPA 2: Self-Supervised Video Models for Embodied AI

V-JEPA 2 is a self-supervised learning framework that leverages massive internet-scale video data alongside limited interaction data to enable robust understanding, prediction, and planning capabilities in AI. The architecture demonstrates strong performance across diverse tasks, from motion understanding and human action anticipation to multimodal video question-answering. Crucially, its extension, V-JEPA 2-AC, facilitates zero-shot robotic planning, showcasing the potential for real-world physical interaction without extensive task-specific training.

self-supervised-learning video-models robotics world-models action-prediction large-language-models computer-vision

“V-JEPA 2, a self-supervised video model, achieves high performance in motion understanding.”

blog / ylecun / Jun 11

Scaling Self-Supervised Video Pre-training for Zero-Shot Robotic Planning

V-JEPA 2 is a self-supervised joint-embedding-predictive architecture pre-trained on 1M+ hours of internet video to master physical world understanding and prediction. By post-training as an action-conditioned world model (V-JEPA 2-AC) with minimal robot interaction data, it enables zero-shot robotic planning and manipulation. The model further demonstrates high-tier video-language alignment capabilities when integrated with an 8B parameter LLM.

robotics self-supervised-learning video-models world-models robot-planning computer-vision ai-research

“V-JEPA 2 achieves 77.3 top-1 accuracy on Something-Something v2 and 39.7 recall-at-5 on Epic-Kitchens-100.”
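
For readers unfamiliar with the two metrics quoted above, the sketch below shows how top-1 accuracy and recall-at-k are typically computed from per-class scores. It assumes the class-averaged form of recall-at-k commonly used for action anticipation; the benchmarks' official protocols may differ in detail, and the scores and labels are made up.

```python
from collections import defaultdict

def top_k(scores, k):
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def top1_accuracy(all_scores, labels):
    """Percentage of samples whose highest-scoring class is the true label."""
    hits = sum(top_k(s, 1)[0] == y for s, y in zip(all_scores, labels))
    return 100.0 * hits / len(labels)

def class_mean_recall_at_k(all_scores, labels, k):
    """Recall-at-k computed per class, then averaged over classes (in %)."""
    hits, counts = defaultdict(int), defaultdict(int)
    for s, y in zip(all_scores, labels):
        counts[y] += 1
        hits[y] += int(y in top_k(s, k))
    return 100.0 * sum(hits[c] / counts[c] for c in counts) / len(counts)

scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.1, 0.3, 0.6],  # true class 1: ranked second
    [0.5, 0.4, 0.1],  # true class 2: ranked third
    [0.2, 0.1, 0.7],  # true class 2: ranked first
]
labels = [0, 1, 2, 2]

accuracy = top1_accuracy(scores, labels)
recall_at_2 = class_mean_recall_at_k(scores, labels, k=2)
```

Averaging per class keeps frequent classes from dominating the score, which matters on long-tailed action datasets.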

youtube / ylecun / May 30 / failed

Yann LeCun: We Won't Reach AGI By Scaling Up LLMs (Big Technology Podcast)

paper / ylecun / May 26

World-Model-Guided Trajectory Generation Unlocks One-Shot Robot Imitation on Unseen Tasks

OSVI-WM addresses a critical gap in one-shot visual imitation learning: generalizing to unseen tasks that are visually similar to training tasks but require semantically distinct responses. The framework uses a learned world model to predict latent state-action trajectories from a single expert video and the agent's current observation, then decodes these into physical waypoints for execution — bypassing the need for fine-tuning. Evaluated across two simulated benchmarks and three real-world robotic platforms, it consistently outperforms prior methods, with gains exceeding 30% in some cases.

imitation-learning world-models robotics one-shot-learning visual-learning trajectory-generation generalization

“OSVI-WM achieves over 30% improvement in success rate compared to prior one-shot visual imitation methods on certain benchmark tasks.”

paper / ylecun / May 21

LLMs Over-Compress Meaning: How Artificial and Human Conceptual Representations Fundamentally Diverge

Using an Information Bottleneck framework across 40+ LLMs, this paper by Shani et al. (incl. LeCun & Jurafsky) finds that while LLMs broadly match human category boundaries, they aggressively over-compress representations — achieving near-optimal information-theoretic efficiency at the expense of the contextual nuance humans deliberately preserve. Encoder models outperform larger decoder models in alignment with human conceptual structure, implying that language understanding and generation recruit fundamentally different representational mechanisms. Training dynamics follow a two-phase pattern: rapid concept formation, then architectural reorganization where semantic processing migrates from deep to mid-network layers as encodings become sparser. The core implication is that human-like understanding may require models that intentionally retain representational "inefficiencies."

llm-representations information-bottleneck cognitive-science semantic-compression human-ai-comparison language-model-embeddings conceptual-categories

“LLMs achieve more optimal information-theoretic compression than humans, but at the cost of semantic richness and fine-grained distinctions.”

youtube / ylecun / Apr 29

Yann LeCun: The Future of AI Beyond LLMs

Yann LeCun, Turing Award laureate and Meta's Chief AI Scientist, discusses the evolution of deep learning, highlighting the critical need for AI systems to move beyond current large language models (LLMs) to achieve true intelligence. He advocates for architectures capable of understanding the physical world, planning, and reasoning with abstract mental representations, moving towards what he terms "Advanced Machine Intelligence" (AMI) rather than Artificial General Intelligence (AGI). LeCun emphasizes that breakthroughs in understanding the physical world will render current LLMs obsolete within five years, urging academia to focus on these next-generation AI systems due to the massive industry investment in LLMs.

ai-pioneers deep-learning-history neurological-networks llm-critique agi-discussion ai-future academic-industry-gap

“Current AI systems, particularly LLMs, lack common sense and understanding of the physical world, making them less intelligent than animals.”

youtube / ylecun / Apr 23

Overcoming AI’s Foundational Limitations: A Call for Open Research and Physical World Understanding

Yann LeCun, Meta's Chief AI Scientist, argues that current AI, particularly large language models (LLMs), lack genuine intelligence due to limitations in reasoning, planning, and understanding the physical world. He advocates for open-source AI development to promote diversity and counter proprietary biases, and emphasizes self-supervised learning with novel architectures like Joint-Embedding Predictive Architecture (JEPA) as critical for future breakthroughs in AI.

ai-policy deep-learning llms neural-networks computer-vision reinforcement-learning ai-ethics

“Current LLMs are fundamentally limited in their reasoning and planning capabilities, and do not possess human-like intelligence.”

youtube / ylecun / Aug 31

Yann LeCun: Align AI Objectives Like Human Laws, Build World Models via Self-Supervised Learning for Reasoning and Autonomy

Yann LeCun equates AI value alignment to millennia-old human legal systems that shape objectives via constraints, dismissing HAL 9000-style misalignment as solvable through hardwired ethical rules akin to the Hippocratic Oath. Deep learning succeeds empirically despite textbook warnings due to biological inspiration from brains, emphasizing gradient-based learning over discrete logic for reasoning, which requires working memory, recurrence, and energy-based planning. Self-supervised learning via predictive world models enables rapid, sample-efficient intelligence like human babies, forming the foundation for model-based RL, causal inference, and autonomous systems grounded in reality rather than pure language.

yann-lecun deep-learning ai-safety neural-networks self-supervised-learning agi-debate ai-alignment

“AI value misalignment is not a novel problem but a continuation of designing human objective functions through laws and education.”

blog / ylecun / May 15

Latent Relational Graphs for Transfer Learning

Traditional deep transfer learning primarily focuses on transferring unary feature vectors. This research explores learning and transferring latent relational graphs that capture dependencies between data units (e.g., words, pixels) from unlabeled data. This approach demonstrates improved performance across various downstream tasks, including natural language processing and image classification, and is transferable to different embedding types and even embedding-free units.

deep-learning transfer-learning graph-neural-networks nlp computer-vision representation-learning

“Modern deep transfer learning approaches mainly transfer unary feature vectors.”

blog / ylecun / Sep 10 / failed

Forecasting Convolutional Features Outperforms Baselines for Future Instance Segmentation

Predicting future instance segmentation is achieved by forecasting fixed-sized convolutional features from Mask R-CNN rather than RGB frames or semantic maps. The detection head of Mask R-CNN is applied to these predicted features to generate instance masks for future frames, handling variable object counts efficiently. This method significantly surpasses baselines using optical flow and adapted segmentation architectures.

video-forecasting instance-segmentation mask-r-cnn future-prediction computer-vision yann-lecun eccv-paper

“Forecasting at the semantic level is more effective than forecasting RGB frames followed by segmentation for semantic segmentation of future frames.”