Chronological feed of everything captured from AI at Meta.
Solo is a physical AI inference platform that deploys ensembles of locally fine-tuned Llama models on edge devices — including Raspberry Pis and smartphones — to serve populations without reliable internet connectivity. Rather than relying on a single large model, Solo orchestrates multiple specialized models on-device, targeting agriculture, healthcare, and education for maximum social impact. In healthcare, the platform enables differential diagnosis support and automated patient reporting in rural areas, effectively functioning as a "small hospital in a box." The platform's viability is entirely dependent on Llama's open-source licensing, which allows on-device ownership, experimentation, and offline deployment.
Meta's Fundamental AI Research (FAIR) team has announced three concurrent releases targeting distinct scientific frontiers: atomic-scale molecular modeling, scalable generative model training via scalar rewards, and neuroscientific mapping of language development. The Open Molecules 2025 dataset, paired with the Universal Model for Atoms, aims to accelerate materials and drug discovery, while the "Adjoint Sampling" algorithm enables generative model training without reference data. A large-scale brain study conducted with Rothschild Foundation Hospital draws structural parallels between language emergence in developing brains and large language models, potentially informing both AI architecture and neuroscience.
Children's brains (ages 2-5) exhibit decodable language representations during natural speech, detected via intracranial electrodes in epilepsy patients, and these representations grow more complex with age. Training Llama 3 induces representational geometries that progressively align with both early-childhood and adult brain patterns. This convergence suggests that LLMs capture the developmental trajectory of human speech comprehension rather than merely mimicking surface behavior.
Meta's Open Molecules 2025 provides over 100 million DFT calculations, forming the largest and most diverse dataset of its kind, covering biomolecules, metal complexes, electrolytes, and small molecules. The accompanying Universal Model for Atoms (UMA), trained on over 30 billion atoms, sets a new standard for ML-based modeling of atomic interactions in molecules and materials. These tools are intended to support advances in energy storage, disease treatment, and climate mitigation through better molecular property prediction.
DINOv3 advances self-supervised learning on images at unprecedented scale, yielding universal vision backbones with rich, dense features that exhibit high self-similarity and consistency across time, objects, and style changes. These features enable zero-shot tasks like segmentation and tracking with minimal annotations, powering top performance across diverse vision applications. The release includes open-source training code, model weights, efficient variants, alternative architectures, and tutorials, plus a specialized backbone for satellite imagery.
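The link between self-similar dense features and annotation-light segmentation can be illustrated with a minimal sketch. It does not use the DINOv3 code or weights: the patch features below are toy 2D vectors, and `cosine`, `similarity_mask`, and the 0.8 threshold are hypothetical names and values chosen for illustration. The idea is that one selected patch acts as the query and every patch with a similar feature vector joins the mask.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_mask(patch_features, query_index, threshold=0.8):
    """Mark every patch whose feature is close to the query patch's feature."""
    query = patch_features[query_index]
    return [cosine(feature, query) >= threshold for feature in patch_features]

# Toy stand-ins for per-patch features (assumed, not real model output):
# the first two patches belong to one object, the last two to another.
toy_features = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]]
mask = similarity_mask(toy_features, query_index=0)
print(mask)  # [True, True, False, False]
```

With real backbone features the same comparison would run over a grid of patch embeddings, and the threshold would need tuning per task.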
KunLunBaizeRAG is a novel reinforcement learning-driven framework improving Large Language Model (LLM) reasoning in complex multi-hop question-answering. It tackles limitations of traditional RAG like retrieval drift and information redundancy by integrating mechanisms such as RAG-driven Reasoning Alignment (RDRA) and Search-Think Iterative Enhancement (STIE). Experimental validations confirm significant performance gains in exact match and LLM-judged scores across multiple benchmarks, demonstrating its robustness and effectiveness.
This study models the locomotion of isolated Anoplolepis gracilipes ants using a hybrid stochastic approach that combines active Brownian motion with run-and-tumble dynamics. The model reproduces observed trajectory statistics by identifying reproducible probability distributions for turn angles, run times, and waiting times. This provides a framework for predicting ant movement ecology and for gaining insight into the underlying generative mechanisms and sensory systems.
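A hybrid of this kind can be sketched as a walker that alternates runs, pauses, and tumbles. The sketch below is not the study's fitted model: the exponential run and waiting times, the Gaussian turn angles, and every parameter value are assumptions made for illustration, and `simulate_walker` is a hypothetical name.

```python
import math
import random

def simulate_walker(n_events, speed=1.0, rot_diffusion=0.1, mean_run=2.0,
                    mean_wait=0.5, turn_sigma=1.0, dt=0.05, seed=0):
    """Simulate a 2D walker; returns a list of (t, x, y) samples."""
    rng = random.Random(seed)
    t, x, y = 0.0, 0.0, 0.0
    theta = rng.uniform(-math.pi, math.pi)
    path = [(t, x, y)]
    for _ in range(n_events):
        # Run phase: duration drawn from an exponential distribution (assumed).
        run_time = rng.expovariate(1.0 / mean_run)
        steps = max(1, round(run_time / dt))
        for _ in range(steps):
            # Active Brownian motion: the heading diffuses, the speed is fixed.
            theta += math.sqrt(2.0 * rot_diffusion * dt) * rng.gauss(0.0, 1.0)
            x += speed * math.cos(theta) * dt
            y += speed * math.sin(theta) * dt
            t += dt
            path.append((t, x, y))
        # Waiting phase: the walker stays in place for a random time (assumed exponential).
        t += rng.expovariate(1.0 / mean_wait)
        path.append((t, x, y))
        # Tumble: abrupt reorientation by a Gaussian turn angle (assumed).
        theta += rng.gauss(0.0, turn_sigma)
    return path

path = simulate_walker(n_events=20, seed=1)
print(len(path), path[-1])
```

Fitting such a model to data would mean replacing the assumed distributions with the empirical turn-angle, run-time, and waiting-time distributions the study reports.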
Meta's VP of Infrastructure Dan Rabinovitsj outlines a fundamental shift in data center design driven by AI workloads: rack thermal density is scaling from ~30 kW to 500–700 kW, forcing a transition from air to full-facility liquid cooling. Meta's in-house AI accelerator program (MTIA) is not primarily cost-driven but aimed at co-designing hardware and software for high-value internal workloads like ads ranking and recommendation, where workload-specific optimization yields superior performance-per-TCO. At the semiconductor level, Dennard scaling is effectively dead, shifting the competitive frontier to advanced packaging (chiplets, CoWoS, silicon-on-wafer), which introduces new yield, toolchain, and manufacturing cycle-time challenges at scale.
Meta has developed MSVP (Meta Scalable Video Processor), a custom hardware accelerator purpose-built to handle the full video transcoding pipeline — decode, resize, and multi-format encode — at the scale demanded by Facebook, Instagram, and Messenger. MSVP outperforms traditional software encoders in throughput and quality, and is the first in the industry to embed objective quality metric computation directly in hardware, scoring every encode at scale. As generative AI, AR, and VR content creation accelerates, MSVP is positioned as a foundational infrastructure block for delivering that content to end users.
Meta's Research SuperCluster (RSC) combines latest-generation compute, high-speed interconnects, and fast storage to dramatically compress AI training timelines. The system enables researchers to elastically scale workloads from 8 to 8,000 GPUs, turning multi-month training runs into days. RSC's practical impact is demonstrated by the No Language Left Behind (NLLB-200) project, where a 200-language translation model was trained in ~10 days rather than months. The infrastructure is positioned as a strategic lever for Meta to iterate faster and compete at the frontier of large-scale model development.
Meta is executing a full-stack AI infrastructure overhaul, from custom silicon to data center architecture, driven by AI workloads growing at 1000x every two years. The company has developed two in-house chips (MTIA for ML inference/recommendation and MSVP for video encoding) to maximize performance-per-watt, trading GPU generality for domain-specific efficiency. Its Research SuperCluster (RSC), with 16,000 GPUs and ~5 exaflops of compute, represents one of the largest AI supercomputers operational today. The core thesis: at Meta's scale (serving ~half of humanity), off-the-shelf hardware is structurally insufficient, and vertical integration of silicon, software, and data center design is the only viable path.
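The headline figures can be unpacked with a little arithmetic. The sketch below assumes smooth exponential growth and takes the quoted numbers at face value; the function names are illustrative only.

```python
import math

def annual_growth(factor, years):
    """Per-year multiplier implied by growing `factor`x over `years` years."""
    return factor ** (1.0 / years)

def doubling_time_months(factor, years):
    """Months needed to double, assuming smooth exponential growth."""
    return 12.0 * years * math.log(2) / math.log(factor)

# 1000x every two years, as quoted above.
yearly = annual_growth(1000, 2)            # about 31.6x per year
doubling = doubling_time_months(1000, 2)   # about 2.4 months

# ~5 exaflops spread over 16,000 GPUs, expressed in TFLOPS per GPU.
per_gpu_tflops = 5e18 / 16000 / 1e12       # 312.5

print(round(yearly, 1), round(doubling, 1), per_gpu_tflops)
```

The per-GPU result of roughly 312 TFLOPS is close to the published peak dense BF16 tensor throughput of the NVIDIA A100, which suggests the ~5 exaflops figure refers to peak mixed-precision compute rather than sustained throughput.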
Meta has released SAM 3 (Segment Anything Model 3), a unified model that extends the original SAM's click-based prompting with text and visual prompting capabilities, enabling detection, segmentation, and tracking across both images and videos. The addition of text prompts allows batch segmentation of object categories simultaneously, reducing manual effort. Visual prompting lets users select an object to surface similar ones in the same image, with iterative follow-up prompts for refinement. SAM 3 is already integrated into production Meta products, specifically powering new effects in Instagram's Edits app.
Meta has introduced SAM 3D, a pair of models extending the Segment Anything Model into the 3D domain, enabling geometry and texture reconstruction for any object in a single image — including occluded or non-visible surfaces. A specialized variant focuses on human body reconstruction, generating accurate meshes of body shape and pose even for partially hidden individuals or those in uncommon poses. The system targets practical deployment across robotics, scientific research, and consumer platforms like Facebook Marketplace, and is accessible via the Segment Anything Playground.
SAM Audio is a state-of-the-art model designed for the isolation of specific sounds within complex audio mixes. It leverages text, visual, and span-based prompts to extract distinct elements of speech, music, and general environmental noise.