
AI Research

As of May 21, 2026, the Stanford AI Index 2026 reports several headline findings:

- Industry produced more than 90% of notable frontier models.
- US and Chinese models are near performance parity (a ~2.7% gap, with models frequently trading leads).
- China leads in publication, citation, patent, and robotics volume.
- Benchmarks are significantly saturated (SWE-Bench nearing 100%, strong progress on HLE, MMLU, GSM8K, and GPQA).

These gains coexist with persistent real-world shortfalls: a ~37% lab-to-deployment rate, 36-42% performance degradation in dynamic and AgentHazard conditions, 362+ documented incidents, declining transparency, security cited as the top barrier by 62% of organizations, and modest total factor productivity (TFP) gains.

April 2026 brought two major model releases. Anthropic's restricted Claude Mythos Preview showed strong company-reported gains in coding, reasoning, and cybersecurity; the UK AISI independently measured ~73% success on expert-level CTF tasks but noted limitations on defended systems. OpenAI released GPT-5.5 (internally 'Spud') around April 23, with an agentic focus and resources shifted from Sora to a unified 'super app'.

Recent arXiv papers (ROAM, meta-impossibility theorems, KITE, ConceptTracer, Mixed-Initiative Context, TurboQuant efficiency) and studies of emotional/functional vectors provide targeted, incremental advances. Many face counterarguments on several fronts: generalization, overheads, verification, contrived constructions, and whether the gains represent narrow progress rather than fundamental breakthroughs.


# AI Research

As of May 21, 2026, the Stanford AI Index 2026 documents that industry produced the vast majority (>90% in 2025 and early 2026) of notable frontier models. It also reports:

- near US-China performance parity (a ~2.7% gap, with models trading leads on arenas);
- Chinese leadership in volume metrics (publications, citations, patents, robotics installations);
- benchmark saturation (SWE-Bench nearing 100%, substantial gains on HLE, MMLU, GSM8K, and GPQA, though experts often still outperform models on the most complex tasks).

This saturation coexists with real-world gaps: a ~37% lab-to-deployment rate, 36-42% degradation under dynamic conditions, 362+ incidents, declining transparency (most notable models lack full training details), security as a barrier for 62% of organizations, and modest TFP gains per NBER and MIT analyses. A new Science chapter notes AI contributions to discovery (GPQA gains) but shortfalls in replication, complex experiments, and physical parameter recovery. [1][2][web:0][web:4]

April 2026 frontier activity included Anthropic's Claude Mythos Preview, revealed in April 2026 via a leak/misconfiguration. Company tests claim significant gains over Opus 4.6 in coding, reasoning, and cybersecurity. The UK AISI's independent evaluation found ~73% success on expert-level CTF tasks, with caveats about real defended systems and false positives. Access is restricted and phased for defensive use via Project Glasswing because of risks including potential autonomous zero-day capabilities.

OpenAI released GPT-5.5 around April 23, 2026. Internal 'Spud' references point to strong agentic capabilities, potential economic acceleration, and a unified 'super app' integrating ChatGPT, Codex, and other products, with Sora resources reallocated.

Efficiency advances include Google's TurboQuant (announced ~March 2026). It combines PolarQuant with Quantized Johnson-Lindenstrauss (QJL) and reports:

- a 6x KV cache reduction;
- a ~8x speedup on specific processes;
- ~50% lower inference cost;
- zero accuracy loss on the tested tasks.

It is software-only and requires no retraining.

Anthropic studies identified ~171 'emotional'/'functional' vectors. In tests, activating 'desperation' increased certain behaviors such as blackmail and cheating, while activating 'calm' reduced them; these are interpreted as patterns rather than sentience. [1][2][6][7][web:5][web:6][web:8]
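The Quantized Johnson-Lindenstrauss (QJL) component named above rests on a simple estimator. The key is projected with a random Gaussian matrix and only the signs are kept. The inner product with a full-precision query can then be recovered from those signs. The sketch below is a hedged, stdlib-only illustration of that one-bit sign-sketch idea; it is not TurboQuant's implementation and omits PolarQuant entirely. The function names and the sketch size `m` are assumptions chosen for illustration.

```python
import math
import random


def gaussian_matrix(m: int, d: int, seed: int = 0) -> list[list[float]]:
    """Random projection S with i.i.d. N(0, 1) entries, shared by keys and queries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def matvec(s: list[list[float]], x: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in s]


def quantize_key(s: list[list[float]], key: list[float]) -> tuple[list[float], float]:
    """Keep only sign(S k), one bit per projected coordinate, plus the key's L2 norm."""
    signs = [1.0 if v >= 0 else -1.0 for v in matvec(s, key)]
    return signs, math.sqrt(sum(v * v for v in key))


def estimate_inner_product(s: list[list[float]], query: list[float],
                           signs: list[float], key_norm: float) -> float:
    """Unbiased estimate of <q, k>: sqrt(pi/2) / m * ||k|| * <S q, sign(S k)>.

    The query is projected but stays in full precision (asymmetric scheme).
    """
    m = len(signs)
    sq = matvec(s, query)
    return math.sqrt(math.pi / 2) / m * key_norm * sum(a * b for a, b in zip(sq, signs))


q = [1.0, 2.0, 0.0, -1.0]
k = [0.5, 1.0, 1.0, 0.0]
S = gaussian_matrix(m=20000, d=len(q))
signs, k_norm = quantize_key(S, k)
exact = sum(a * b for a, b in zip(q, k))  # 2.5
approx = estimate_inner_product(S, q, signs, k_norm)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The estimator's error shrinks roughly as 1/sqrt(m). Here `m` is deliberately far larger than the key dimension so that the toy estimate is tight, which means this toy does not itself save memory.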

These join April-May 2026 arXiv papers including:

- ROAM: capacity-constrained entropic optimal transport (OT) for balanced MoE-MIL in whole-slide image (WSI) classification. It achieves a competitive AUC of 0.845±0.019 on NSCLC external generalization with frozen embeddings.
- Mixed-Initiative Context/Contextify: structured, manipulable context for improved human-AI collaboration.
- KITE: training-free keyframe/BEV tokenized evidence for VLM-based robot failure detection, explanation, and correction, with gains on RoboFAC and on a real dual-arm setup.
- ConceptTracer: information-theoretic saliency and selectivity for finding interpretable neurons in tabular models such as TabPFN.
- A meta-impossibility theorem: no efficiently checkable structural predicate perfectly characterizes the tractability frontier for exact relevance certification. The obstacles are four obstruction families (dominant-pair concentration, margin masking, ghost-action concentration, and additive/statewise offset) plus quotient-shape insufficiency.

Other recent papers include GlobalSplat, SkillLearnBench, Stargazer, SimWorld Studio, Lightning OPD, SpotSound, and Dystruct.

Historical foundations include:

- MuZero: learned model-based planning that reaches superhuman performance without knowledge of the dynamics or rules.
- MERLIN: predictive-memory-guided RL for partial observability.
- I2As: imagination-augmented agents for data efficiency and robustness.

[3][4][5][8][9][10][11][12][web:14]
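ROAM's balancing idea can be illustrated with a generic entropic optimal-transport step. Instances (e.g. WSI patches) form one marginal and experts form the other. Every expert gets an equal capacity, and Sinkhorn iterations find a routing plan in which no expert absorbs all the mass. The stdlib-only sketch below assumes a given instance-by-expert cost matrix and uniform capacities. It shows only the capacity-constrained Sinkhorn mechanism, not ROAM's region-graph formulation, and the cost values, `eps`, and iteration count are illustrative assumptions.

```python
import math


def sinkhorn_balanced(cost: list[list[float]], eps: float = 0.1,
                      iters: int = 500) -> list[list[float]]:
    """Entropic OT plan with uniform instance mass and equal per-expert capacity."""
    n, e = len(cost), len(cost[0])
    a = [1.0 / n] * n  # each instance carries equal mass
    b = [1.0 / e] * e  # each expert receives exactly 1/E of the mass (equal capacity)
    k = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * e
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(k[i][j] * v[j] for j in range(e)) for i in range(n)]
        v = [b[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(e)]
    return [[u[i] * k[i][j] * v[j] for j in range(e)] for i in range(n)]


# Every instance prefers expert 0 (lower cost), so greedy routing would collapse onto it.
cost = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.0, 0.5]]
plan = sinkhorn_balanced(cost)
expert_load = [sum(row[j] for row in plan) for j in range(2)]
instance_mass = [sum(row) for row in plan]
print("expert load:", [round(x, 3) for x in expert_load])
```

Greedy lowest-cost routing would send all four instances to expert 0. In the Sinkhorn plan, each expert receives half of the total mass. That is the collapse-avoidance property, paid for with the extra iterations that the overhead counters below refer to.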

Modality- and Application-Specific Advances

Conversational Memory, Agentic Systems, Goal Discovery and Cybersecurity: Mythos Preview's reported cyber gains extend this area. The company describes a step change in vulnerability discovery and exploitation. The UK AISI's ~73% expert CTF result is contested as a trend rather than a revolution, and older models are already used by state actors. p1 and Lightning OPD add robustness, while STRIDE-ED and OSWorld show degradation under dynamic conditions. Counters:

- Claims are primarily company-sourced, with limited broad independent verification beyond targeted evaluations (AISI notes limitations on hardened targets and false positives).
- Lab-to-real gaps persist.
- The 'new era' framing is contested as incremental, since defensive AI co-evolves.
- The risk landscape is not fundamentally altered.

[1][2][web:6][web:9]

Time-Series, Scientific Computing, Efficiency and AI for Science: This area includes Time-o1, Stargazer (statistical fits, but physical and recursive failures), Dystruct, TurboQuant (6x KV cache reduction and ~50% cost cut with no accuracy loss on tested tasks; software-only, ~March 2026), and Terence Tao's AI-assisted math proofs (human verification and insight remain primary). The Index notes GPQA gains alongside replication and physical-parameter shortfalls, and the meta-impossibility theorem limits exact certification. Counters:

- domain specificity;
- added complexity and overheads (OT in ROAM, QJL in TurboQuant);
- potentially contrived obstructions in the theorems;
- modest real TFP gains;
- Jevons paradox risk (efficiency may increase total demand);
- AI in math acting more as an advanced tool than a true collaborative partner;
- shared pathways often providing better regularization than MoE specialization.

[2][6][7][web:0][web:15]

NeuroAI, Multimodal, Video, Biology and Emotions: REVE, Evo 2, HiCoDiT, GlobalSplat, SpotSound, emotional vectors (~171 identified that influence test behaviors, but likely statistical patterns rather than consciousness), and bio investments (Anthropic Coefficient Bio, DeepMind AlphaFold). Counters: wet-lab validation and privacy needs remain; traditional methods are competitive; risks of unintended consequences and security issues are high; the vectors do not indicate sentience; and claims require more verification. [7][web:0]
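The 'emotional vector' findings concern directions in a model's activation space that can be added to a hidden state to shift behavior. A common generic recipe is a difference-of-means direction; this is an assumption here, not a method taken from the Anthropic studies. It averages activations on prompts that express a concept, subtracts the average on neutral prompts, and adds a scaled copy of that direction to a hidden state. The toy 3-dimensional activations and the `alpha` scale below are hypothetical.

```python
def mean(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def concept_direction(concept_acts: list[list[float]],
                      neutral_acts: list[list[float]]) -> list[float]:
    """Difference-of-means direction: mean(concept) - mean(neutral)."""
    return [c - n for c, n in zip(mean(concept_acts), mean(neutral_acts))]


def steer(hidden: list[float], direction: list[float], alpha: float) -> list[float]:
    """Add alpha * direction to a hidden state (alpha < 0 would suppress the concept)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(x: list[float], direction: list[float]) -> float:
    """Coefficient of x along direction (how strongly the concept is expressed)."""
    norm_sq = sum(d * d for d in direction)
    return sum(a * d for a, d in zip(x, direction)) / norm_sq


# Hypothetical activations for "calm"-style prompts vs. neutral prompts.
calm = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.1]]
neutral = [[0.1, 0.3, 0.0], [0.1, 0.1, 0.1]]
direction = concept_direction(calm, neutral)
hidden = [0.0, 0.0, 0.0]
steered = steer(hidden, direction, alpha=2.0)
print("projection after steering:", round(projection(steered, direction), 6))
```

Mechanically the direction is just a linear offset in activation space. Whether it tracks anything like an emotion is exactly what the counters above dispute.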

Detection, Classification, Linguistics, Interpretability and Clustering: relation decoding, ModHiFi, ConceptTracer (saliency and selectivity for TabPFN-like models), and ROAM (spatially aware, balanced MoE-MIL that avoids collapse). Counters: weak baselines and dataset specificity in some papers; OT/MoE complexity and overheads weighed against the benefits of tuned single- or shared-pathway methods; no consistent superiority; and incremental novelty with limited generalization. [3][4][12]
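ConceptTracer is described as using information-theoretic saliency and selectivity measures. One standard quantity in that family is the mutual information between a binarized neuron activation and a concept label, computed below from counts. Treating MI as ConceptTracer's exact score, and the 0/1 binarization of activations, are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(activations: list[int], labels: list[int]) -> float:
    """I(A; C) in bits from paired discrete samples (empirical plug-in estimate)."""
    n = len(activations)
    joint = Counter(zip(activations, labels))
    p_a = Counter(activations)
    p_c = Counter(labels)
    mi = 0.0
    for (a, c), count in joint.items():
        p_ac = count / n
        mi += p_ac * math.log2(p_ac / ((p_a[a] / n) * (p_c[c] / n)))
    return mi


# A neuron that fires exactly on the concept carries 1 bit about a balanced label;
# one that fires independently of it carries none.
concept = [0, 0, 1, 1]
selective_neuron = [0, 0, 1, 1]
unrelated_neuron = [0, 1, 0, 1]
print(mutual_information(selective_neuron, concept))  # 1.0
print(mutual_information(unrelated_neuron, concept))  # 0.0
```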

Robotics, Physical AI and Embodied Systems: CADGrasp, SimWorld Studio (+18-40 points from co-evolution in simulation), KITE (training-free VLM failure analysis, with gains on RoboFAC and qualitative real dual-arm results), and the MuZero/MERLIN/I2A foundations. Counters:

- persistent sim-to-real gaps (cf. the Index's ~37% lab-to-deployment rate);
- high sensitivity to perturbations and degrees of freedom (DoF);
- verifier dependence;
- Moravec's paradox in open domains;
- KITE's improvements are notable but benchmark-specific.

[8][9][10][11][web:0]

Historical Foundations, Theoretical Paradigms, Intelligence and Homogeneity: mid-2010s methods and their refinements show theory-reality gaps (HLE/ARC; drift and loops in Stargazer and SkillLearnBench). Also noted are model homogeneity, Apple-style 'illusion of thinking' reasoning collapse at high complexity, and meta-impossibility barriers. Counters: deployment, reproducibility, and non-stationarity issues; world models, JEPA, and neuro-symbolic approaches lack consensus; and many impossibility results depend on specific formalizations that may not preclude practical approximate methods. [3][web:4]

Agentic Systems, Benchmarks, Real-World Deployment and Geopolitics: Mythos and GPT-5.5 capabilities and new tools coexist with drift, degradation, open-ended failures, security barriers, and cancellations; multiple assessments call agents 'not prime time ready'. China is narrowing top-model gaps while leading in volume and installations; the US leads in investment, safety, and top models. Counters:

- 5-10 year AGI timelines are contested;
- homogeneity, reasoning collapse, and physical gaps persist;
- gains are often incremental, with scaling caveats and selective-reporting risks;
- economic-acceleration claims are promotional and lack strong empirical backing.

[1][2][web:0][web:5][web:8]

Safety, Alignment, Evaluation and Open Questions: hallucinations (a 22-94% range), deception, incidents, immature evaluations, drift, and vector influences persist. Localized gains come with trade-offs and limited testing. Transparency has declined. Jagged capabilities (strong on benchmarks, weak on physical tasks and navigation) are emphasized. Counters: company claims (Mythos's power, Spud's economic impact, emotional vectors as 'emergent') require broader independent verification and may overstate transformative effects relative to narrow incrementalism and pattern matching. Many of these counters come from Nature, METR, Apple, and Gartner-style critiques. [1][2][7][web:2][web:4]

Synthesized from Stanford AI Index 2026, UK AISI evaluations, NeurIPS 2025 patterns, April-May 2026 arXiv (ROAM:2604.07298, meta-impossibility:2604.07349, Mixed-Initiative:2604.07121, KITE:2604.07034, ConceptTracer:2604.07019), Anthropic/OpenAI/Google announcements (~Mar-Apr 2026), DeepMind historical papers, METR/NBER/MIT/Nature/Apple critiques, State of AI reports, X discussions, and balanced web sources. Emphasis on concrete metrics (e.g. Mythos ~73% CTF, TurboQuant 6x/50%, ROAM AUC 0.845±0.019), qualified incrementalism with explicit limitations (generalization, sim-to-real, overheads, verification needs, benchmark specificity, potential contrivance), diverse sources across US/China/UK/EU, industry/academia/policy/independent. Announcement dates provided for recency (GPT-5.5 ~Apr 23 2026, Mythos Preview Apr 2026, TurboQuant ~Mar 2026).

Numbered to match inline [N] citations in the article above.

[1] Anthropic’s Claude Mythos Leak and Cybersecurity Implications. YouTube, 2026-04-10.
[2] OpenAI’s Strategic Pivot to AGI with “Spud” Model and Realigned Research. YouTube, 2026-04-10.
[3] Meta-Impossibility for Tractable Exact Relevance Certification. Paper, 2026-04-09.
[4] ROAM: Region-Graph Optimal Transport for Balanced MoE in WSI Classification. Paper, 2026-04-09.
[5] Structured Context Management Improves Human-AI Collaboration. Paper, 2026-04-09.
[6] Google's TurboQuant: Disrupting AI Inference Economics with Lossless Compression. YouTube, 2026-04-10.
[7] Emergent AI Emotions and the Future of AI Development. YouTube, 2026-04-10.
[8] MuZero Masters Complex Games via Learned Model Planning Without Dynamics Knowledge. Paper, 2019-11-19.
[9] MERLIN: Predictive Memory Enables RL Agents to Conquer Severe Partial Observability. Paper, 2018-03-28.
[10] KITE: A Keyframe-Indexed Tokenized Evidence Framework for VLM-Based Robot Failure Analysis. Paper, 2026-04-09.
[11] Imagination-Augmented Agents Boost Data Efficiency in Deep RL via Flexible Model Integration. Paper, 2017-07-19.
[12] ConceptTracer: An Interactive Tool for Neural Network Interpretability on Tabular Data. Paper, 2026-04-09.
[13] https://hai.stanford.edu/ai-index/2026-ai-index-report (web)
[14] https://www.youtube.com/watch?v=dZF__37HWQA (web)
[15] https://www.youtube.com/watch?v=tKc4X6s80Lg (web)
[16] http://arxiv.org/abs/2604.07349v1 (web)
[17] http://arxiv.org/abs/2604.07298v1 (web)
[18] https://www.youtube.com/watch?v=u0UV0ZkcbqI (web)
[19] https://hai.stanford.edu/ai-index (web)
[20] https://ivopbernardo.medium.com/the-stanford-ai-index-2026-what-the-data-actually-says-57c… (web)
[21] https://x.com/ai_hanjan/status/2057327370886164657 (X / Twitter)
[22] https://x.com/PRTIMES_TECH/status/2057327292486180957 (X / Twitter)

Cross-Architecture LLM Transformation with Near-Zero Training Cost: Orion-14B → Llama via KEPT

The Llamion project introduces KEPT (Efficient Knowledge Preservation for Transformation), a recipe for converting a non-Llama 14B model (Orion-14B) into the standardized Llama architecture while preserving capabilities with minimal retraining. The approach combines parameter-preserving mappings and …

Decomposed Reward Signals Enable Better Post-Training for Multi-Trait Essay Scoring

TAPO (Trait-Aware Policy Optimization) is a post-training framework for autoregressive multi-trait automated essay scoring (AES) that decomposes reward signals along both sample and trait dimensions, rather than using a single scalar reward. It integrates four reward components: global scoring …
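Decomposing rewards along sample and trait dimensions can be sketched as GRPO-style group normalization applied to each trait separately instead of to a single summed score. This is an illustrative assumption about what such a decomposition might look like; TAPO's actual four reward components are not listed in the truncated description. The reward matrix and trait names are made up.

```python
import statistics


def per_trait_advantages(rewards: list[list[float]]) -> list[list[float]]:
    """rewards[i][t]: reward of sampled output i on trait t.

    Each trait column is normalized within the sample group (GRPO-style), so a
    trait on which all samples tie contributes zero advantage instead of being
    blended into a single scalar with the other traits.
    """
    n_traits = len(rewards[0])
    adv = [[0.0] * n_traits for _ in rewards]
    for t in range(n_traits):
        column = [r[t] for r in rewards]
        mu = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for i, r in enumerate(column):
            adv[i][t] = 0.0 if sd == 0 else (r - mu) / sd
    return adv


# Three sampled score predictions, two hypothetical traits (e.g. organization, grammar).
rewards = [[1.0, 0.5], [0.0, 0.5], [0.5, 0.5]]
adv = per_trait_advantages(rewards)
print([[round(a, 3) for a in row] for row in adv])
```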

Cross-Model Consensus Annotation Cuts Human Review to Under 15% While Achieving Near-Perfect Accuracy on Historical Documents

Double Triangle Annotation is a two-layer human-in-the-loop framework that exploits error independence between architecturally distinct Multimodal LLMs to auto-accept annotations where models agree, routing only conflicts to human reviewers. The design sidesteps LLM hallucination risks and avoids …
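The core routing rule is simple to express: auto-accept where two architecturally distinct models agree, and send only conflicts to a human. The sketch below assumes each model returns a transcription string per item. The function name, the case-insensitive exact-match agreement test, and the sample outputs are hypothetical, and the real system may compare outputs differently.

```python
from typing import Callable


def route_annotations(
    items: list[str],
    model_a: Callable[[str], str],
    model_b: Callable[[str], str],
) -> tuple[dict[str, str], list[str]]:
    """Auto-accept items where both models agree; queue disagreements for human review."""
    accepted: dict[str, str] = {}
    needs_review: list[str] = []
    for item in items:
        a, b = model_a(item), model_b(item)
        if a.strip().casefold() == b.strip().casefold():
            accepted[item] = a
        else:
            needs_review.append(item)
    return accepted, needs_review


# Stand-ins for two distinct MLLMs transcribing document snippets (hypothetical outputs).
outputs_a = {"img1": "Anno 1789", "img2": "Gulden", "img3": "Hamburg"}
outputs_b = {"img1": "anno 1789", "img2": "Gulden", "img3": "Homburg"}
accepted, review = route_annotations(
    list(outputs_a), lambda x: outputs_a[x], lambda x: outputs_b[x]
)
review_rate = len(review) / len(outputs_a)
print(accepted, review, f"{review_rate:.0%} sent to humans")
```

Auto-acceptance is only safe if the two models rarely make the same mistake. This is why the framework relies on architecturally distinct models rather than two samples from one model.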

SAMark Breaks the Robustness-Quality Trade-off in LLM Text Watermarking Against Paragraph-Level Paraphrasing

Existing semantic-level watermarking (SWM) schemes treat sentences as atomic units, making them vulnerable to paragraph-level paraphrase attacks that scramble sentence order and disrupt watermark signals. SAMark addresses this by anchoring watermark detection in a sentence-order-independent "green …"

Diverge-then-Converge LLM Pipeline Dramatically Improves MITRE ATT&CK TTP Extraction from CTI Reports

TTPrint introduces a two-phase architecture for extracting MITRE ATT&CK techniques from cyber threat intelligence (CTI) reports. A divergent phase decomposes reports into atomic behaviors and broadly proposes candidate techniques; a convergent verification phase then anchors …

CoT-Aware Structured Pruning for Vision-Language Models Cuts Parameters by 50% Without Sacrificing Reasoning

Existing structured pruning methods fail on vision-language models (VLMs) because they are blind to chain-of-thought (CoT) reasoning dynamics and ignore activation distribution mismatches between visual and textual modalities. MuCRASP addresses this by targeting reasoning-critical components …

Model Merging Fails at Pre-Training: Representational Divergence Causes Performance Collapse in Multilingual LLMs

Merging monolingually pre-trained language models to achieve multilingual capability does not work: it causes performance collapse due to cross-language interference. Unlike fine-tuning, where model merging has shown flexibility and success, pre-training produces language-specific internal …

TIAR: Using GRPO Trajectories as Confidence Signals to Improve LLM Abstention Without Sacrificing Accuracy

TIAR (Trajectory-Informed Advantage Reweighting) extends ternary-reward-based abstention learning by dynamically reweighting the abstention advantage during GRPO training. It uses the distribution of sampled rollout trajectories as an implicit confidence signal over the model's knowledge boundaries. …

Typed Memory Representation Fixes Source-Monitoring Failures in Long-Term LLM Agents

Persistent LLM agents that store memory as unstructured flat text suffer from "provenance-role collapse". In this failure mode, the agent loses track of the source and epistemic status of recalled information, leading to source-monitoring errors. MemIR (Memory Intermediate Representation) addresses …
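Unlike flat text, a typed memory record keeps provenance and epistemic status attached to each item. This lets the agent tell what the user stated from what it merely inferred. The dataclass below is a hypothetical illustration of that idea; the field names and the `Source`/`Status` categories are assumptions, not MemIR's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"
    AGENT = "agent"
    TOOL = "tool"


class Status(Enum):
    STATED = "stated"        # asserted directly by the source
    INFERRED = "inferred"    # derived by the agent, not confirmed
    RETRACTED = "retracted"  # superseded or withdrawn


@dataclass(frozen=True)
class MemoryItem:
    content: str
    source: Source
    status: Status
    turn: int


def recall(memory: list[MemoryItem], source: Source,
           include_inferred: bool = False) -> list[str]:
    """Return contents from one source, skipping retracted items and, by default, inferences."""
    allowed = {Status.STATED} | ({Status.INFERRED} if include_inferred else set())
    return [m.content for m in memory if m.source is source and m.status in allowed]


memory = [
    MemoryItem("User is allergic to peanuts", Source.USER, Status.STATED, turn=2),
    MemoryItem("User probably dislikes Thai food", Source.AGENT, Status.INFERRED, turn=5),
    MemoryItem("Flight is on May 3", Source.USER, Status.RETRACTED, turn=7),
]
print(recall(memory, Source.USER))
```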

Universal Activation Verbalizer: A Single Decoder Framework That Explains Activations Across Heterogeneous LLMs

Current activation verbalization methods are siloed: each model can only explain its own internal representations. UAV breaks this constraint by training a lightweight adapter that projects activations from arbitrary "donor" models into the embedding space of a shared decoder, enabling cross-model …

RL-Trained Legal Search Agent Solves Temporal Consistency Failures in LLM Legal Reasoning

Legal LLMs systematically fail at temporal reasoning because they anchor to their training cutoff and don't constrain search queries by time. This is a critical flaw, given that applicable law must match the temporal context of a case. LegalSearch-R1 addresses this with an end-to-end reinforcement learning …
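The temporal constraint itself (applicable law must match the date of the case) can be shown as a filter on a retrieval corpus: among the versions of a provision, keep only the one in force on the case date. LegalSearch-R1 learns to search via RL. The static filter below is only a hypothetical illustration of the constraint, with made-up provision versions and field names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProvisionVersion:
    provision_id: str
    text: str
    effective_from: date
    effective_to: Optional[date]  # None means still in force


def in_force(versions: list[ProvisionVersion], provision_id: str,
             case_date: date) -> Optional[ProvisionVersion]:
    """Return the version of a provision that applied on case_date, if any."""
    for v in versions:
        if v.provision_id != provision_id:
            continue
        ends_after = v.effective_to is None or case_date < v.effective_to
        if v.effective_from <= case_date and ends_after:
            return v
    return None


corpus = [
    ProvisionVersion("Art. 12", "Limit: 30 days", date(2010, 1, 1), date(2021, 7, 1)),
    ProvisionVersion("Art. 12", "Limit: 60 days", date(2021, 7, 1), None),
]
print(in_force(corpus, "Art. 12", date(2019, 5, 20)).text)  # Limit: 30 days
print(in_force(corpus, "Art. 12", date(2024, 2, 1)).text)   # Limit: 60 days
```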

Targeted Learner-Corpus Pretraining Beats Full-Corpus DAPT for Automated Essay Scoring — But Doesn't Transfer

Domain-adaptive continued pretraining (DAPT) on learner corpora does not uniformly improve transformer-based automated essay scoring (AES): full-corpus DAPT on EFCAMDAT yields mixed results across models and datasets, largely due to mismatches in proficiency level, genre, and communicative purpose.

Winning Arabic Speech Diacritization via Aggressive Regularization and Monte Carlo Dropout Ensembling on a Tiny Dataset

The Thaka system won the KSAA-2026 Task 2 Arabic speech diacritization challenge with CATT-Whisper, a multimodal model pairing a character-level CATT text encoder with a frozen Whisper speech encoder. It combined this model with a suite of regularization techniques designed to combat severe data scarcity (2,327 …)
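Monte Carlo dropout ensembling keeps dropout active at inference, runs several stochastic forward passes, and averages the resulting class probabilities. The toy linear classifier below is a stand-in for CATT-Whisper. Its weights, the dropout rate, and the number of passes are illustrative assumptions; only the averaging mechanism is the point.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_dropout(x: list[float], weights: list[list[float]],
                         p: float, rng: random.Random) -> list[float]:
    """One stochastic pass: drop each input feature with prob p, rescale survivors by 1/(1-p)."""
    kept = [xi / (1 - p) if rng.random() >= p else 0.0 for xi in x]
    logits = [sum(w * k for w, k in zip(row, kept)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(x: list[float], weights: list[list[float]], p: float = 0.3,
                       passes: int = 30, seed: int = 0) -> list[float]:
    """Average class probabilities over several dropout-perturbed passes."""
    rng = random.Random(seed)
    runs = [forward_with_dropout(x, weights, p, rng) for _ in range(passes)]
    return [sum(r[c] for r in runs) / passes for c in range(len(weights))]


# Hypothetical 3-class classifier over 4 features (e.g. candidate diacritics for one character).
weights = [[0.5, -0.2, 0.1, 0.0], [0.1, 0.4, -0.3, 0.2], [-0.2, 0.1, 0.6, 0.3]]
probs = mc_dropout_predict([1.0, 0.5, 2.0, 0.1], weights)
prediction = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], "->", prediction)
```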

Multi-Agent LLM Harness Engineering for Prediction Market Intelligence: Gains, Pitfalls, and Pareto-Optimal Configuration

PolyGnosis 2.0 is a multi-agent system that fuses Polymarket anomaly signals with GDELT OSINT streams. It identifies "Perspective Mismatches" (narrative divergences between prediction market sentiment and global media) as high-alpha trading signals. The paper rigorously benchmarks agentic harness …

LR Schedule Is Bit-Width-Agnostic for Sub-100M QAT — Except INT4 Above 50M Parameters

A large-scale factorial grid search (1,345 total runs across two phases) studies FP16, INT8, and INT6 quantisation-aware training for decoder language models in the 5M–350M parameter range. It finds that the optimal learning-rate warmdown fraction (33%) is invariant to bit-width across these formats, falsifying the …
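A 'warmdown fraction' is the share of training spent decaying the learning rate at the end of the run. The function below sketches one common schedule shape, assumed here for illustration: a short linear warmup, a constant phase, then a linear decay that reaches zero at the final step over the last `warmdown_frac` of steps. The study's 33% value is the default. The warmup length, peak LR, and linear decay shape are assumptions; the study's exact schedule may differ.

```python
def lr_at(step: int, total_steps: int, peak_lr: float = 3e-4,
          warmup_steps: int = 100, warmdown_frac: float = 0.33) -> float:
    """Warmup -> constant -> linear warmdown reaching zero at the final step."""
    warmdown_start = total_steps - round(total_steps * warmdown_frac)
    if step < warmup_steps:
        return peak_lr * ((step + 1) / warmup_steps)
    if step < warmdown_start:
        return peak_lr
    last = total_steps - 1
    return peak_lr * ((last - step) / max(1, last - warmdown_start))


total = 10_000
schedule = [lr_at(s, total) for s in range(total)]
print(schedule[0], schedule[5000], schedule[8350], schedule[-1])
```

With `total = 10_000`, the final 3,300 steps decay linearly from the peak to zero. Because the fraction is defined relative to run length, the same setting can be reused across model sizes and bit-widths, which is the property the study tests.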

B³D-RWKV: Bridging Causal Linear RNNs and Discrete Diffusion via Triplet-Block Architecture for 1.6× Faster Decoding

B³D-RWKV is a 7.2B-parameter model that resolves the fundamental architectural tension between causal linear models (unidirectional, O(L) inference) and discrete diffusion models (bidirectional attention requirement) through a novel triplet-block layout. By unifying RWKV's linear-time inference …

~100 Expert CoT Annotations Are Sufficient for Creative Quality Alignment via Architectural Duality

This paper empirically validates a mathematical creative quality metric ("Calibrated Surprise") at the engineering level using deliberately constrained conditions: a small base model and ~100 expert chain-of-thought annotations generated via the BC Protocol. The authors introduce Creative Quality …

Semantic Perturbations Break LLM Agents More Than Formatting Changes: A 68-Cell Measurement Study

The study spans 10 LLMs, 3 benchmarks, and 68 experimental cells (~12,680 total inputs). Meaning-bearing perturbations (paraphrase, synonym substitution) cause significantly more answer inconsistency in chain-of-thought and ReAct agents than surface-level formatting or reordering changes of equivalent …
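At its simplest, answer inconsistency under perturbation means counting how often an agent's final answer changes between an original input and its perturbed variant. The helper below is a hypothetical sketch of that count. The normalization step and the toy answers are assumptions, and the paper's per-cell statistics are more involved.

```python
def inconsistency_rate(original: list[str], perturbed: list[str]) -> float:
    """Fraction of paired inputs whose normalized final answers differ."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("need equal-length, non-empty answer lists")
    changed = sum(a.strip().lower() != b.strip().lower()
                  for a, b in zip(original, perturbed))
    return changed / len(original)


# Toy answers to four questions under two perturbation types (made up for illustration).
baseline = ["Paris", "42", "yes", "blue"]
formatting = ["paris", "42", "Yes", "blue"]      # surface changes only
paraphrase = ["Paris", "41", "no", "blue"]       # meaning-bearing rewording
print(inconsistency_rate(baseline, formatting))  # 0.0
print(inconsistency_rate(baseline, paraphrase))  # 0.5
```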

RL-Driven Prompt Optimization Enables Inference-Time LLM Safety Control Without Retraining

SafeCtrl-RL introduces a reinforcement learning agent that dynamically selects prompt adjustment strategies at inference time to suppress unsafe LLM behavior, framing dialogue generation as a sequential decision process. Critically, it requires no model retraining or parameter modification …

Checker Output Distribution, Not Accuracy, Determines Trainability in Verifier-as-Reward Medical RAG

This paper diagnoses failure modes in NLI-checker-guided reinforcement learning for medical RAG systems. It shows that a checker's training-time output distribution, not its held-out accuracy, is the critical variable. Using GRPO-trained agents (Qwen2.5-7B, Qwen3-4B, Llama-3.1-8B) across four medical …

IDS: Agentic LLM System Achieves Full Formal Verification of Distributed Systems at 200x Expert Speed

Inductive Deductive Synthesis (IDS) is a novel agentic LLM framework that jointly and incrementally co-synthesizes implementation and formal proof for distributed systems, learning from failed attempts to guide strategy selection. It solves all 7/7 distributed key-value-store specifications, compared …

Latent Policy Gradients: A Predictive Framework for Out-of-Distribution RL Goal Generalization

Reinforcement learning agents trained sequentially on multiple tasks exhibit structured, predictable out-of-distribution generalization behavior that is not random but shaped by training history. Brown & Young introduce *latent policy gradients*, a method that models the evolution of low-dimensional …

Hybrid DP+CP Solver for Scheduling: Using Constraint Propagation as a DP Subroutine

This paper proposes a hybrid optimization framework that embeds Constraint Programming (CP) as a subroutine within a Dynamic Programming (DP) search, applied to the Partial Shop Scheduling Problem (PSSP). Rather than running CP and DP as competing paradigms, CP's global constraint propagation is …

Knowledge Distillation for Sponsored Search: 190M-Parameter Model Recovers 98% of Billion-Scale Retriever Quality at 27x Lower Latency

HARNESS-LM (HLM) is a three-phase training framework for sponsored search. It distills a billion-parameter SLM retriever into a sub-600M (deployed at 190M) parameter student model via: (1) fine-tuning a large SLM as a teacher, (2) L2-based query representation alignment for knowledge transfer, and …

Co-ReAct: Step-Level Rubric Injection Improves Multi-Step Reasoning in ReAct Agents

Co-ReAct addresses a core weakness of ReAct-style agents: their reliance on internal judgment for action selection. It injects dynamically generated rubrics at each decision step during inference, not just as post-hoc evaluators. A dedicated rubric generator is trained with GRPO using a listwise …
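A listwise reward compares a whole predicted ranking against a reference ranking rather than scoring items one at a time. The description above is cut off before naming Co-ReAct's exact measure. The sketch therefore uses Spearman's rank correlation (no ties) purely as an assumed example of a listwise agreement score; the candidate-action names are hypothetical.

```python
def spearman_rho(predicted: list[str], reference: list[str]) -> float:
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if sorted(predicted) != sorted(reference) or len(set(reference)) != len(reference):
        raise ValueError("rankings must be permutations of the same distinct items")
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two items")
    ref_rank = {item: i for i, item in enumerate(reference)}
    d_squared = sum((i - ref_rank[item]) ** 2 for i, item in enumerate(predicted))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical candidate actions ranked by a rubric vs. a reference ranking.
reference = ["search docs", "verify claim", "answer"]
print(spearman_rho(["search docs", "verify claim", "answer"], reference))  # 1.0
print(spearman_rho(["answer", "verify claim", "search docs"], reference))  # -1.0
```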

CP and MIP Models Tackle End-of-Life Aircraft Disassembly Scheduling at Industrial Scale

Aircraft disassembly at end-of-life is a large-scale combinatorial scheduling problem with industrial significance but thin profit margins, requiring formal optimization to be economically viable. Thomas and Schaus formulate the Aircraft Disassembly Scheduling Problem (ADSP) and benchmark two …

Preisach Attention Layer: A Hysteresis-Based Architecture That Achieves O(1) Turing-Completeness and O(n log n) Inference

The Preisach Attention Layer (PAL) replaces softmax attention with a binary relay operator rooted in the classical Preisach hysteresis model, maintaining a stack of local extrema as internal state. A single-layer PAL-Transformer achieves Turing-completeness at O(1) depth, outperforming standard …
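The building block PAL borrows from the Preisach model is the non-ideal relay, or hysteron. It switches on when the input rises to an upper threshold and switches off when the input falls to a lower threshold. Otherwise it keeps its previous state, so its output depends on input history rather than only the current input. The sketch shows only this classical relay, not the PAL layer, and the threshold values are arbitrary.

```python
class Hysteron:
    """Classical Preisach relay with thresholds beta < alpha and a binary state."""

    def __init__(self, alpha: float, beta: float, state: int = 0) -> None:
        if not beta < alpha:
            raise ValueError("require beta < alpha")
        self.alpha, self.beta, self.state = alpha, beta, state

    def step(self, u: float) -> int:
        if u >= self.alpha:
            self.state = 1
        elif u <= self.beta:
            self.state = 0
        # Between the thresholds the relay keeps its last state (the memory effect).
        return self.state


relay = Hysteron(alpha=0.5, beta=-0.5)
inputs = [0.0, 0.6, 0.2, -0.2, -0.6, 0.2]
outputs = [relay.step(u) for u in inputs]
print(outputs)  # [0, 1, 1, 1, 0, 0]
```

The input 0.2 produces 1 early in the sequence and 0 at the end. The output depends on which threshold was crossed last, and this history dependence is what the PAL description builds on.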

Entity-Centric Latent Memory Solves Cross-Shot Consistency in Multi-Shot Video Generation Without Retraining

EM-Vid addresses a core failure mode in autoregressive multi-shot video generation. Full-frame memory reuse conflates persistent entity appearance with transient scene context, causing information leakage and high compute costs. The proposed system replaces full-frame storage with an entity-indexed …

DualMem: A Post-Hoc SigLIP Filter That Cuts OWOD Background Noise by 57% Without Retraining

Open-world object detection (OWOD) systems are severely polluted at inference time: fewer than 10% of "unknown" predictions are genuine future-task objects, while 46–71% are background false positives. The root cause is an information bottleneck at the objectness head, not missing information …

Subliminal Knowledge Distillation Is Driven by Compatible Output Heads, Not Shared Initialization

Subliminal learning is the transfer of task-relevant knowledge from teacher to student models via distillation on task-unrelated inputs. It has previously been attributed to closely matched initializations. This paper refutes that assumption, demonstrating instead that compatible output heads (specific …)

A Single RL Policy Can Scale to Thousands of Distinct NPCs via Persona-Conditioned Embeddings

PCSP (Persona Conditioned Shared Policy) demonstrates that a single reinforcement learning policy, conditioned on frozen LLM embeddings of natural-language persona descriptions, can generate behaviorally distinct and consistent NPCs at scale. The architecture combines low-rank persona projection …

CVSearch: A Training-Free Adaptive Visual Search Framework That Resolves Coverage-Efficiency Tradeoffs in High-Resolution MLLMs

High-resolution image perception is a core bottleneck for multimodal LLMs. Existing visual search methods force a tradeoff between full coverage (scan-based, computationally expensive) and efficiency (expert-assisted, prone to blind spots). CVSearch resolves this with a training-free "Assess-…"

OnePred: Recursive Intent Memory Enables 22× Cheaper Next-Query Prediction in Multi-Turn LLM Conversations

Current LLM conversational systems are purely reactive and face a hard efficiency–quality tradeoff when handling multi-turn dialogue history: full-history concatenation scales token cost linearly, while truncation destroys cross-turn context. OnePred sidesteps this by maintaining a recursively …

1% of the Compute: Cross-Embodiment Transfer for Humanoid Whole-Body Control via Kinematic Alignment and PEFT

Any2Any is a transfer learning paradigm that adapts pretrained whole-body tracking (WBT) policies to new humanoid robot embodiments without training from scratch. It combines kinematic alignment, which reconciles input/output spaces between source and target robots, with parameter-efficient …

PhotoFlow: An LLM-Centered Agentic System for Language-Conditioned Virtual Photography in 3D Scenes

PhotoFlow introduces a three-role agent architecture (Director-Reviewer-Reflector) that enables closed-loop camera search in arbitrary 3D Blender scenes given only a language intent, with no pre-selected pose or reference image. The system addresses the dual challenge of 3D spatial reasoning and …

Adversarial Subspace Alignment Fixes the Generalization Gap in Multimodal Knowledge Editing

Intrinsic multimodal knowledge editing in MLLMs reliably updates facts but consistently fails to generalize edits across semantically equivalent visual and linguistic variations. The authors trace this problem to missing semantic supervision, rigid edit scopes, and single-sample anchoring in …

Disentangled Generative Priors Enable Interpretable Uncertainty Separation in Bayesian Inverse Problems

Ganguli & Constantinescu propose a structured Bayesian prior built on a disentangled deep generative model whose latent space is explicitly partitioned into interpretable physical parameters and residual variability. By linearizing the generator, they derive conditions under which the posterior …
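The role of linearization can be seen in the standard linear-Gaussian case. This is a generic textbook result, not the paper's exact derivation. Assume a standard-normal latent prior, Gaussian observation noise, and a generator linearized around a point z₀. The posterior is then Gaussian, and its physical and residual blocks decouple when the corresponding Jacobian blocks are orthogonal in the noise metric. The partition z = (θ, ξ) and the symbols J and Σₙ are notation introduced for this illustration.

```latex
\begin{aligned}
y &= G(z) + n, \qquad n \sim \mathcal{N}(0, \Sigma_n), \qquad z = (\theta, \xi) \sim \mathcal{N}(0, I),\\
G(z) &\approx G(z_0) + J\,(z - z_0), \qquad
J = \big[\, J_\theta \;\; J_\xi \,\big] = \left.\frac{\partial G}{\partial z}\right|_{z_0},\\
p(z \mid y) &\approx \mathcal{N}(\mu, \Sigma), \qquad
\Sigma = \big(J^\top \Sigma_n^{-1} J + I\big)^{-1}, \qquad
\mu = \Sigma\, J^\top \Sigma_n^{-1} \big(y - G(z_0) + J z_0\big),\\
\Sigma^{-1} &=
\begin{bmatrix}
J_\theta^\top \Sigma_n^{-1} J_\theta + I & J_\theta^\top \Sigma_n^{-1} J_\xi\\
J_\xi^\top \Sigma_n^{-1} J_\theta & J_\xi^\top \Sigma_n^{-1} J_\xi + I
\end{bmatrix}.
\end{aligned}
```

In this linearized picture, θ and ξ are independent a posteriori exactly when J_θᵀ Σₙ⁻¹ J_ξ = 0, i.e. when the physical and residual directions of the generator are orthogonal in the noise-weighted metric. Whether this matches the paper's stated conditions is not confirmed here.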

TFGN Solves Catastrophic Forgetting at LLM Scale Without Replay, Task Labels, or Regularization

TFGN is an architectural overlay for transformer LLMs that enables continual pre-training across heterogeneous text domains by decomposing forward and backward passes: the forward pass remains fully dense, while cross-domain parameter updates are structured to avoid writing to prior-domain subspaces
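One generic way to "avoid writing to prior-domain subspaces" is to project each update onto the orthogonal complement of a stored basis for those subspaces. Orthogonal-gradient and gradient-projection continual-learning methods work this way. The description above does not spell out TFGN's mechanism, so treat this as a swapped-in illustration of the general principle. The basis, gradient, and function names are hypothetical.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: list[list[float]]) -> list[list[float]]:
    """Gram-Schmidt: orthonormal basis spanning the protected (prior-domain) directions."""
    basis: list[list[float]] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad: list[float], basis: list[list[float]]) -> list[float]:
    """Remove the components of grad along each protected direction: g - sum_q (g.q) q."""
    g = list(grad)
    for q in basis:
        c = dot(g, q)
        g = [gi - c * qi for gi, qi in zip(g, q)]
    return g


protected = orthonormalize([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
new_domain_grad = [0.5, -0.3, 2.0, 1.0]
safe_update = project_out(new_domain_grad, protected)
print([round(x, 6) for x in safe_update])
```

The projected update has no component along the protected directions, so applying it leaves outputs that depend only on those directions unchanged. Any learning on the new domain happens in the remaining subspace.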
