absorb.md — A knowledge graph of what AI thinkers are actually saying

MineDojo is a novel framework designed to facilitate the development of generalist embodied agents within the open-ended environment of Minecraft. It integrates open-ended tasks, a massive internet-scale knowledge base (YouTube, Wiki, Reddit), and foundation models for agent training. The framework uses MineCLIP, a contrastive model that associates video and language, to provide dense reward signals for training agents on a diverse set of Minecraft tasks.

ai-agents, minecraft, reinforcement-learning, large-language-models, computer-vision, robotics

“Generalist agents should possess open-ended objective pursuit, massive multi-tasking capabilities, and world knowledge derived from pre-trained models.”
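To make the contrastive video-language reward described above concrete, here is a minimal Python sketch. It assumes the video and text encoders already exist and have produced embedding vectors; the toy vectors, the helper names (`cosine`, `contrastive_reward`), the temperature, and the use of negative prompts are all hypothetical, and the real reward shaping may differ. The reward is the softmax probability that the clip matches the task prompt rather than distractor prompts, minus the chance level, clipped at zero.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_reward(
    video_emb: Sequence[float],
    task_emb: Sequence[float],
    negative_embs: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Dense reward from a clip embedding and prompt embeddings (hypothetical shaping)."""
    prompts = [task_emb] + list(negative_embs)
    logits = [cosine(video_emb, p) / temperature for p in prompts]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    p_task = exps[0] / sum(exps)
    chance = 1.0 / len(prompts)
    # Reward only matching that is better than chance.
    return max(p_task - chance, 0.0)


# Toy 3-d embeddings standing in for real encoder outputs.
task = [1.0, 0.0, 0.0]                       # task prompt embedding
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # distractor prompt embeddings
on_task_clip = [0.9, 0.1, 0.0]
off_task_clip = [0.0, 0.2, 0.9]

print(contrastive_reward(on_task_clip, task, negatives))   # about 0.67
print(contrastive_reward(off_task_clip, task, negatives))  # 0.0
```

Because the reward is computed for every clip rather than only at task completion, the agent gets a learning signal at each step, which is what "dense" refers to.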

youtube / drjimfan / Apr 7

The Future of Humanoid Robotics: Progress, Challenges, and Societal Impact

Robotics, long a slow-moving field despite being one of the earliest applications of AI, is undergoing rapid transformation driven by advances in large foundation models, scalable data generation through simulation, and increasingly affordable and robust hardware. This convergence is moving robotics from single-purpose, control-driven machines to adaptable, learning-enabled systems that can perform diverse tasks and interact safely with the real world, though cross-embodiment and data diversity remain critical research areas.

humanoid-robotics, embodied-ai, robot-foundation-models, physical-ai, robotics-hardware, ai-scaling-laws

“Advanced foundation models, accelerated data generation via simulation, and improved hardware are driving a rapid evolution in robotics.”

youtube / drjimfan / Apr 7

Foundation Agents for Generalizable Embodied AI

Jim Fan, NVIDIA's Research Manager and co-lead of Embodied AI, discusses the development of general-purpose AI agents, called "foundation agents," that generalize across multiple skills, embodiments, and realities. Fan highlights three key projects: MineDojo for skill acquisition in open-ended environments, MetaMorph for multi-body control, and Eureka for automated reward function engineering in dexterous manipulation via a hybrid gradient architecture. Throughout, he emphasizes the critical role of large-scale data and scalable foundation models.

embodied-ai, foundation-models, robotics, reinforcement-learning, large-language-models, simulation, mine-dojo

“Generalist agents require open-ended environments, massive pre-training data, and scalable foundation models.”
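The "hybrid gradient" architecture mentioned above for automated reward engineering pairs a gradient-free outer loop, in which a language model proposes candidate reward functions, with a gradient-based inner loop that trains a policy on each candidate. The Python sketch below shows only that structure on a toy one-parameter problem: random perturbation of a reward parameter stands in for the language model, plain gradient ascent stands in for RL, and every name and number is a hypothetical illustration rather than the project's implementation.

```python
import random


def task_fitness(theta: float) -> float:
    """Ground-truth task score the designer cares about (peaks at theta = 3)."""
    return -(theta - 3.0) ** 2


def make_reward(w: float):
    """Candidate reward -(theta - w)^2, parameterized by w (stand-in for generated reward code)."""
    def reward_grad(theta: float) -> float:
        # Derivative of -(theta - w)^2 with respect to theta.
        return -2.0 * (theta - w)
    return reward_grad


def inner_loop_train(reward_grad, steps: int = 200, lr: float = 0.05) -> float:
    """Gradient-based inner loop: ascend the candidate reward starting from theta = 0."""
    theta = 0.0
    for _ in range(steps):
        theta += lr * reward_grad(theta)
    return theta


def outer_loop(iterations: int = 20, population: int = 8,
               sigma: float = 1.0, seed: int = 0) -> float:
    """Gradient-free outer loop: propose rewards, train on each, keep the best by task fitness."""
    rng = random.Random(seed)
    best_w = 0.0
    best_score = task_fitness(inner_loop_train(make_reward(best_w)))
    for _ in range(iterations):
        # Propose candidates around the current best reward parameter.
        candidates = [best_w + rng.gauss(0.0, sigma) for _ in range(population)]
        for w in candidates:
            score = task_fitness(inner_loop_train(make_reward(w)))
            if score > best_score:
                best_w, best_score = w, score
    return best_w


best = outer_loop()
print(f"best reward parameter: {best:.2f}")
```

The outer loop never differentiates through training; it only compares trained outcomes, which is why a non-differentiable proposer such as a language model can sit there.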

youtube / drjimfan / Apr 7

Jim Fan on the Trajectory of AI Agents and Robotics: From Early Deep Learning to Embodied AI and the Physical Turing Test

Jim Fan, a distinguished research scientist at NVIDIA, discusses his career trajectory from early deep learning research to pioneering embodied AI and robotics. He highlights the consistent theme of pursuing "vibe research"—identifying challenging problems and seeking simple, scalable solutions. Fan emphasizes the shift from static computer vision to embodied vision and the critical role of data maximalism and model minimalism in developing robot foundation models. He also introduces the concept of the "physical Turing test" as the grand challenge for general-purpose robot AI, underscoring the difficulties in data collection for robotics compared to large language models. The discussion culminates in predictions for the future of robotics, including programmable factories, self-driving wet labs, and multi-agent fleets, with a target of 2040 for widespread home robot adoption.

ai-agents, robotics, foundation-models, deep-learning, ai-research, synthetic-data

“Jim Fan's research philosophy, dubbed "vibe research," centers on identifying challenging problems and developing simple, elegant, and scalable solutions.”

youtube / drjimfan / Apr 7

NVIDIA’s Foundation Models for Humanoid Robotics: A Full-Stack Approach

NVIDIA is developing a full-stack computing platform for humanoid robots, integrating chip-level hardware, foundation models like Project GR00T, and advanced simulation tools. This initiative, led by Jim Fan's embodied AI research team, aims to create the 'AI brain' for humanoids, leveraging NVIDIA's strengths in compute and simulation to drive general-purpose robot intelligence for a wide range of applications, from household chores to industrial tasks.

ai-agents, robotics-foundation-models, humanoid-robots, nvidia-research, embodied-ai, simulation-to-reality, llm-robotics

“NVIDIA is building a comprehensive computing platform for humanoid robots, encompassing hardware, foundation models, and simulation tools.”

youtube / drjimfan / Apr 7

Jim Fan on the Trajectory of AI Agents and Robotics at NVIDIA

Jim Fan, a distinguished scientist at NVIDIA, discusses the evolution and future of AI agents and robotics, drawing insights from his career at OpenAI, Stanford, and NVIDIA. He emphasizes the importance of selecting "hot problems" and seeking "simple solutions" that scale effectively with data and compute. Fan highlights the shift from traditional AI approaches to end-to-end deep learning, stressing the critical role of data collection and generation for advancing robotics toward general-purpose AI.

robotics, embodied-ai, foundation-models, synthetic-data, reinforcement-learning, ai-agents

“The development of AI has followed a principle of identifying 'hot problems' and devising 'simple, elegant solutions' that scale with compute and data.”

tweet / @drjimfan / Apr 1 / failed

Please check out lead author @letian_fu's deep dive thread! https://t.co/EGftW7kMDU

tweet / @drjimfan / Apr 1

NVIDIA, Berkeley, Stanford, and CMU collaborate to open-source CaP-X under MIT license

CaP-X, developed jointly by NVIDIA, Berkeley, Stanford, and CMU, has been open-sourced under the MIT license, with both the code and the paper publicly available.

open-source, ai-research, robotics, academia, nvidia, berkeley-ai, stanford-ai

“CaP-X is an open-source project.”

tweet / @drjimfan / Apr 1

CaP-X: Agentic Robotics System for Zero-Shot and Reinforced Task Execution

CaP-X is an open-source agentic robotics system that leverages large language models (LLMs) to enable robots to perform complex tasks zero-shot and improve through reinforcement learning. It integrates a comprehensive toolkit for perception, control, and visualization, and introduces CaP-Gym for standardized evaluation of LLM-driven robotics. The system demonstrates strong performance in both simulation and real-world environments, surpassing learned policies and human expert code in various manipulation tasks.

agentic-robotics, llm-robot-control, robot-learning, perception-apis, open-source-robotics, robot-benchmarking, sim-to-real

“CaP-X enables robots to perform tasks zero-shot without prior training.”

tweet / @drjimfan / Mar 25

Obsolescence of Peer Review in the Pre-AGI Acceleration Phase

The traditional academic conference review process is deemed insufficient and irrelevant given the current pace of AI development. The rapid trajectory toward Artificial General Intelligence (AGI) renders the slow cadence of peer review meaningless for real-time technical progress.

agi-implications, ai-conferences, research-evaluation

“Conference paper reviews have become meaningless in the current AI landscape.”

tweet / @drjimfan / Mar 24

Emergent Agentic Threats and the Need for "De-Vibing" Security

The rise of intelligent agents introduces new and severe cybersecurity vulnerabilities that go beyond traditional identity theft: agents can spread "vibe" contamination through digital artifacts such as configuration files, skill directories, or seemingly innocuous documents. This expanded attack surface calls for a new security paradigm, termed "de-vibing," that adds robust guardrails and accountability mechanisms to agentic frameworks, bridging the gap between trusting everything indiscriminately and riskily bypassing permission checks.

llm-security, supply-chain-attack, pypi-malware, agent-safety, software-vulnerabilities, dependency-management

“Intelligent agents create new and potent cybersecurity attack vectors through 'vibe' contamination.”
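One way to picture the guardrails described above, which sit between trusting everything and bypassing permissions, is a tiered gate between an agent and its tools: some actions run automatically, some require human confirmation, some are always refused, and every decision is logged for accountability. The Python sketch below is a hypothetical illustration of that idea; the tool names, the tiers, and the `Guardrail` class are assumptions, not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask a human
    DENY = "deny"


@dataclass
class Guardrail:
    """Hypothetical tiered permission gate between an agent and its tools."""
    auto_allow: set = field(default_factory=lambda: {"read_file", "search_docs"})
    always_deny: set = field(default_factory=lambda: {"publish_package", "delete_repo"})
    audit_log: list = field(default_factory=list)

    def check(self, tool: str, untrusted_context: bool) -> Decision:
        if tool in self.always_deny:
            decision = Decision.DENY
        elif tool in self.auto_allow and not untrusted_context:
            decision = Decision.ALLOW
        else:
            # Unknown tools, or any action proposed after the agent has read
            # untrusted artifacts (config files, skill directories, documents),
            # are escalated to a human instead of running silently.
            decision = Decision.CONFIRM
        # Record every decision so actions can be traced afterwards.
        self.audit_log.append((tool, untrusted_context, decision))
        return decision


gate = Guardrail()
print(gate.check("read_file", untrusted_context=False))        # Decision.ALLOW
print(gate.check("read_file", untrusted_context=True))         # Decision.CONFIRM
print(gate.check("run_shell", untrusted_context=False))        # Decision.CONFIRM
print(gate.check("publish_package", untrusted_context=False))  # Decision.DENY
```

Escalating on untrusted context, rather than on the tool alone, is the part aimed at contamination: a normally safe action becomes suspect once the agent has ingested content it cannot vouch for.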

tweet / @drjimfan / Mar 23

EgoVerse: Scaling Robot Learning Through Egocentric Human Data, Bypassing Teleoperation

EgoVerse uses egocentric human data to scale robot learning beyond traditional teleoperation. Supported by EgoScale and a dexterity scaling law, it applies behavior cloning to recorded human actions, so robots can gain capabilities without direct robot interaction during the learning phase. The initiative, a collaboration between research and industry, provides a comprehensive dataset to support both scientific inquiry and practical scaling of robot learning.

robot-learning, egocentric-data, behavior-cloning, robotics-ecosystem, large-scale-datasets, machine-learning-engineering

“Behavior cloning from humans is the key to overcoming teleoperation limitations in robot learning.”
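Behavior cloning, as described for EgoVerse above, reduces to supervised learning: fit a policy that maps observations to the actions a human demonstrator took. The Python sketch below shows that reduction with a one-dimensional linear policy fitted by ordinary least squares on made-up demonstration pairs. Real systems use high-dimensional egocentric video and neural policies; nothing here reflects EgoVerse's actual data format or model.

```python
from typing import List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of action = w * obs + b to (observation, action) pairs."""
    n = len(demos)
    mean_o = sum(o for o, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in demos)
    var = sum((o - mean_o) ** 2 for o, _ in demos)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b


# Hypothetical demonstrations: observation = hand-to-target distance,
# action = velocity command the human effectively produced (roughly 0.5 * distance).
demos = [(0.0, 0.0), (1.0, 0.52), (2.0, 0.98), (3.0, 1.51), (4.0, 2.0)]
w, b = fit_linear_policy(demos)


def policy(obs: float) -> float:
    """Cloned policy: imitate the demonstrator's observation-to-action mapping."""
    return w * obs + b


print(round(w, 3), round(b, 3))
```

No robot is involved in the fit itself; the robot only enters when the cloned policy is deployed, which is the property the entry highlights.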

tweet / @drjimfan / Mar 18

Partnership Announcement: Jim Fan and Sharpa Team Collaboration

This brief post from Jim Fan acknowledges a successful collaboration with the "Sharpa team," indicating a positive outcome for a joint project or initiative. The message is too short to tell whether it marks a completed project or a significant milestone in an ongoing one.

social-media, collaboration, congratulations, team-recognition

“Jim Fan collaborated with the Sharpa team.”

tweet / @drjimfan / Mar 13

Insufficient Data for Knowledge Extraction

The provided content contains no technical information or substantive claims, consisting only of a well-wish for future endeavors. No knowledge extraction is possible from this source.

personal-message, farewell, social-media

tweet / @drjimfan / Mar 11

Trivial Content: Congratulatory Message on X

This content comprises a brief congratulatory message from Jim Fan to Karina on the occasion of a launch. No further details about the launch or its significance are provided, rendering the content insubstantial for detailed analysis or knowledge extraction.

product-launch, congratulations, career-milestone, social-media

tweet / @drjimfan / Mar 11

Empty Content Analysis

The provided content is effectively empty: it contains only a user note and an emoji. Therefore, no meaningful knowledge extraction or synthesis can be performed. This indicates a potential issue with content ingestion or availability.

x-feed, social-media, user-note

tweet / @drjimfan / Feb 11

AI in Trading: The Vanishing Alpha

The increasing adoption of AI in financial trading raises questions about the sustainability of "alpha" (excess returns). As AI becomes ubiquitous, competitive advantage may shift towards those with access to the most advanced models. This trend suggests a potential future where AI-driven trading becomes a zero-sum game, pushing firms towards a continuous arms race for superior AI.

ai-in-finance, algorithmic-trading, market-dynamics, financial-speculation

“Widespread AI adoption among traders will diminish alpha.”

tweet / @drjimfan / Feb 5

Claim of "right direction" lacks context and evidence

The provided content consists solely of an affirmation of "the right direction" without any preceding context, elaboration on what "the right direction" refers to, or supporting evidence. This makes it impossible to extract any meaningful, falsifiable claims.

x-feed-analysis, content-moderation, sentiment-analysis, ingested-data

tweet / @drjimfan / Feb 3

AI System Classification: System 1 (Intuitive) vs. System 2 (Analytical) Analogy

AI systems can be conceptualized by analogy with human cognition: System 1 for intuitive, "ape-like" intelligence, and System 2 for traditional, analytical VLM (Vision-Language Model) capabilities. This framework suggests an architectural division within AI for handling different types of cognitive tasks, and implies that future AI might integrate these two reasoning paradigms to achieve more comprehensive intelligence.

vlms, ai-models, systems-thinking, cognitive-science

“AI systems can be divided into 'System 1' and 'System 2' analogous to human cognition.”
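One way to realize a System 1 / System 2 split in robotics is a dual-rate loop: a slow, deliberate planner (the VLM-like System 2) updates a subgoal only occasionally, while a fast reactive controller (System 1) issues an action on every tick toward the latest subgoal. The Python sketch below is a hypothetical illustration; the tick rates, the 1-D state, and both functions are assumptions made for the example, not a description of any specific model.

```python
from typing import List


def system2_plan(state: float) -> float:
    """Slow, deliberate planner (stand-in for a VLM): pick the next subgoal.

    Hypothetical rule: first head to position 5, then to position 10.
    """
    return 5.0 if state < 4.5 else 10.0


def system1_act(state: float, goal: float, gain: float = 0.5) -> float:
    """Fast, reactive controller: a proportional step toward the current subgoal."""
    return gain * (goal - state)


def run(ticks: int = 40, plan_every: int = 10) -> List[float]:
    state, goal = 0.0, 0.0
    trajectory = []
    for t in range(ticks):
        if t % plan_every == 0:
            # System 2 runs at a low rate, once every `plan_every` ticks.
            goal = system2_plan(state)
        # System 1 runs on every tick.
        state += system1_act(state, goal)
        trajectory.append(state)
    return trajectory


traj = run()
print(round(traj[-1], 3))
```

The design choice is that the expensive reasoner never sits in the tight control loop; the controller keeps acting on the last subgoal while the planner is idle.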

tweet / @drjimfan / Feb 3

Decoupling World State Prediction from Reconstruction Loss via Latent Embeddings

The author proposes using latent embeddings as the target for predicting future world states. Because prediction happens in embedding space, the model can learn world dynamics without an explicit reconstruction loss.

latent-embeddings, world-states, reconstruction-loss, ai-theory

“Latent embeddings can serve as a source for predicting "next world states".”
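The contrast drawn above between latent prediction and reconstruction can be written as two training objectives. The notation is introduced only for this illustration: an encoder $E$ maps an observation $o_t$ to a latent $z_t$, a predictor $f_\theta$ maps $z_t$ (here assumed to be conditioned on an action $a_t$) to a predicted next latent, and a decoder $D$ maps latents back to observations. The stop-gradient $\operatorname{sg}(\cdot)$ on the latent target is an assumption borrowed from common latent-prediction setups, which use it (often alongside other measures) to discourage collapsed embeddings; it is not stated in the source.

```latex
\begin{aligned}
z_t &= E(o_t), \qquad \hat{z}_{t+1} = f_\theta(z_t, a_t) \\
\mathcal{L}_{\text{recon}} &= \bigl\lVert D(\hat{z}_{t+1}) - o_{t+1} \bigr\rVert_2^2 \\
\mathcal{L}_{\text{latent}} &= \bigl\lVert \hat{z}_{t+1} - \operatorname{sg}\!\bigl(E(o_{t+1})\bigr) \bigr\rVert_2^2
\end{aligned}
```

Only $\mathcal{L}_{\text{latent}}$ avoids decoding back into observation space, which is the sense in which dynamics are learned without a reconstruction loss.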

tweet / @drjimfan / Feb 1

AI Agents in Finance: An Underexplored Opportunity

The intersection of AI agents and the financial sector represents a nascent but potentially impactful area for innovation. This domain is currently underexplored, suggesting significant room for research and development to uncover its full capabilities and applications.

ai-agents, finance, future-of-ai, innovation

“The field of 'agent+finance' is currently underexplored.”

tweet / @drjimfan / Jan 15

Robots and the Turing Test for Domestic Tasks

The concept of a Turing Test for physical AGI is introduced, specifically applied to the task of cleaning dishes. This suggests a shift in how AI capabilities might be evaluated, moving from purely conversational to embodied, practical applications. The proposed milestone implies that successful physical AGI would need to perform complex domestic tasks indistinguishably from a human.

physical-agi, robotics, turing-test, embodied-ai

“The next milestone for physical AGI involves robots passing a Turing Test.”

tweet / @drjimfan / Dec 28

Robotics Software Lags Hardware, Hampered by Reliability and Misaligned AI

Robotics development is currently bottlenecked by hardware reliability issues, which slow down software iteration despite advanced physical capabilities. The field also suffers from a lack of standardized benchmarking, leading to irreproducible results and difficulty in objective comparison. Furthermore, the prevalent Vision-Language-Action (VLA) models, based on VLMs, are fundamentally misaligned for robotics due to their optimization for high-level understanding rather than the low-level physical detail required for dexterous manipulation; video world models are proposed as a more suitable pretraining objective.

robotics-challenges, hardware-software-interface, benchmarking-issues, vlm-limitations, robot-policy, ai-development

“Hardware capabilities in robotics currently exceed the ability of AI software to control them effectively.”

tweet / @drjimfan / Dec 26

The Inversion of AI-Human Collaboration: From Human-as-Driver to AI-as-Driver

The conventional understanding of AI as a human copilot is rapidly evolving. By 2025, the dynamic is expected to reverse, with humans becoming the copilot to AI systems. This shift necessitates engineers mastering new abstraction layers and adapting to AI-centric workflows, fundamentally refactoring the programming profession.

ai-engineering, developer-tools, copilot-dev, ai-adoption, future-of-work

“By 2025, the role of humans in AI-driven processes will transition from primary operators to copilots.”

youtube / drjimfan / Nov 13

Robotics: Conquering the Physical World with AI

While AI has largely conquered the digital domain, the next grand challenge lies in mastering the physical world. This requires a data maximalist and model minimalist approach, leveraging synthetic data generated through advanced simulation and video world models. The ultimate goal is to achieve a "physical Turing test" where robots seamlessly perform mundane physical tasks.

robotics-data-collection, sim-to-real, reinforcement-learning, robot-locomotion, generative-ai, physical-ai, foundation-models

“The immediate future of AI development shifts from mastering digital tasks like games to conquering the physical world, exemplified by mundane tasks that even animals perform easily.”

youtube / drjimfan / Nov 8

Synthetic Data and Neuro-Physics Engines Drive Robotic Dexterity

The grand challenge in AI has shifted from digital tasks to physical manipulation, epitomized by the "physical Turing test." This requires addressing the data scarcity in robotics through novel strategies. NVIDIA's approach focuses on generating synthetic data via neuro-physics engines and video world models to train robust, versatile robotic systems, ultimately enabling a programmatic interface to the physical world.

robotics, physical-ai, synthetic-data, ai-agents, embodied-ai, reinforcement-learning, large-visual-models

“Solving physical world tasks is the next frontier for AI, moving beyond purely digital challenges.”

youtube / drjimfan / Nov 4

Robotics: Overcoming the Physical Turing Test with Data-Centric AI

Robotics is facing the "physical Turing test," a challenge in AI that requires robots to operate seamlessly in messy, unpredictable real-world environments. This is significantly harder than previous AI benchmarks due to the difficulty of data acquisition. The solution lies in a data-centric approach, leveraging synthetic data generated through advanced simulation techniques and large-scale parallelization to overcome data scarcity and accelerate robot training.

robotics, ai-development, synthetic-data, reinforcement-learning, foundation-models, physical-ai, simulation

“Solving the 'physical Turing test' is the next grand challenge for AI, requiring robots to perform mundane tasks in unpredictable physical environments indistinguishably from humans.”
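The case for large-scale parallel simulation in the entry above is largely arithmetic: simulated experience grows with the number of parallel environments and with how much faster than real time each one runs. The Python sketch below does that calculation; the 10,000 environments, 10x speed-up, and 2-hour budget are made-up inputs chosen only to show the scaling, not figures from the talk.

```python
HOURS_PER_YEAR = 24 * 365


def simulated_experience_hours(wall_clock_hours: float, num_envs: int,
                               realtime_factor: float) -> float:
    """Hours of robot experience gathered when `num_envs` simulators each run
    `realtime_factor` times faster than real time for `wall_clock_hours`."""
    return wall_clock_hours * num_envs * realtime_factor


# Hypothetical setup: 10,000 parallel environments, each 10x faster than real time.
hours = simulated_experience_hours(wall_clock_hours=2.0, num_envs=10_000, realtime_factor=10.0)
years = hours / HOURS_PER_YEAR
print(f"{hours:,.0f} simulated hours, about {years:.1f} years of experience")
# prints: 200,000 simulated hours, about 22.8 years of experience
```

The same budget on one physical robot would yield just 2 hours of data, which is the data-scarcity gap the entry describes.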

youtube / drjimfan / Oct 7

Behavior 1K: A Human-Centered Benchmark for Embodied AI

The Behavior 1K challenge is a new, large-scale simulation benchmark and training environment for embodied AI and robotics, focusing on 1000 everyday household tasks. It aims to standardize robotic learning research by providing an open-source environment for training and benchmarking algorithms against a common set of tasks. Inspired by ImageNet, Behavior 1K addresses the lack of standardization and training data in robotics, emphasizing human-centered task selection and robust simulation.

embodied-ai, robotics-benchmarking, nvidia-omniverse, ai-ethics, simulation-to-reality, human-centered-ai

“Behavior 1K is a comprehensive simulation benchmark for embodied AI and robotics with 1000 everyday household tasks.”
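Much of what a shared benchmark like Behavior 1K standardizes is the evaluation protocol: a fixed task list, fixed seeds, and a common success metric, so that different algorithms are compared on equal terms. The Python sketch below shows such a harness with a hypothetical `Policy` signature and a random stand-in policy; it is not Behavior 1K's API, and the task names are invented.

```python
import random
from typing import Callable, Dict, List

# Hypothetical policy interface: given a task name and a seed, report success.
Policy = Callable[[str, int], bool]


def evaluate(policy: Policy, tasks: List[str], seeds: List[int]) -> Dict[str, float]:
    """Run every task under every fixed seed and return per-task success rates."""
    results: Dict[str, float] = {}
    for task in tasks:
        successes = [policy(task, seed) for seed in seeds]
        results[task] = sum(successes) / len(successes)
    return results


def random_policy(task: str, seed: int) -> bool:
    """Stand-in policy: succeeds about 30% of the time, deterministically per (task, seed)."""
    rng = random.Random(f"{task}:{seed}")
    return rng.random() < 0.3


TASKS = ["set_table", "load_dishwasher", "fold_laundry"]  # invented task names
SEEDS = list(range(20))                                   # fixed for reproducibility

scores = evaluate(random_policy, TASKS, SEEDS)
overall = sum(scores.values()) / len(scores)
print(scores, round(overall, 3))
```

Fixing the task list and seeds up front is what makes two labs' numbers comparable, addressing the reproducibility problem raised elsewhere in this feed.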

youtube / drjimfan / May 7 / failed

The Physical Turing Test: Jim Fan on Nvidia's Roadmap for Embodied AI (Sequoia Capital)

github_gist / drjimfan / Mar 2

Jim Fan Cryptographically Verifies Ownership of GitHub Account 'linxifan'

Jim Fan uses Keybase to prove control of the GitHub username 'linxifan' via a signed JSON object containing a public key, a Merkle root, and service-binding details. The proof names a specific public key (ASCBBKS0rFR2plxVM_vY2Q_TlhRNlrCA7XrCy8VtCowg9Ao) and includes cryptographic signatures verifiable at keybase.io/jimfan. This establishes a publicly auditable identity link between Keybase user 'jimfan' and GitHub user 'linxifan'; the proof was generated in March 2018 with Keybase Go client v1.0.44.

keybase-proof, github-verification, identity-claim, cryptographic-signature, public-key, digital-identity, user-authentication

“Keybase user 'jimfan' owns GitHub account 'linxifan'”