absorb.md

Charlene Li

Chronological feed of everything captured from Charlene Li.

TideGS Breaks the GPU Memory Barrier for 3D Gaussian Splatting at Billion-Primitive Scale

3D Gaussian Splatting (3DGS) training has been constrained to tens of millions of primitives on single-GPU hardware due to the memory footprint of per-Gaussian attribute vectors. TideGS exploits the inherent sparsity of 3DGS training — only camera-visible Gaussians are active per iteration — to treat GPU memory as a working-set cache backed by an SSD-CPU-GPU hierarchy. Three co-designed techniques (block-virtualized geometry, hierarchical async I/O pipelining, and trajectory-adaptive differential streaming) enable training over one billion Gaussians on a single 24 GB GPU, surpassing prior out-of-core baselines (~100M) and standard in-memory approaches (~11M) while achieving superior reconstruction quality on large-scale scenes.
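The working-set idea in the summary above can be illustrated with a toy cache. This is a minimal sketch under stated assumptions, not TideGS's implementation: the `WorkingSetCache` class, the dictionary standing in for SSD/CPU storage, and the LRU eviction policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class WorkingSetCache:
    """Toy LRU cache standing in for GPU memory (hypothetical, for illustration).

    `backing` stands in for the slower SSD/CPU tiers and maps block_id -> data.
    """

    def __init__(self, backing, capacity):
        self.backing = backing          # block_id -> block data
        self.capacity = capacity        # max number of resident blocks
        self.resident = OrderedDict()   # least recently used block first
        self.loads = 0                  # transfers from the backing store
        self.evictions = 0              # blocks written back and dropped

    def fetch(self, visible_block_ids):
        """Make the blocks visible to the current camera resident and return them."""
        out = {}
        for block_id in visible_block_ids:
            if block_id in self.resident:
                # Cache hit: refresh recency only, no transfer needed.
                self.resident.move_to_end(block_id)
            else:
                if len(self.resident) >= self.capacity:
                    # Evict the least recently used block and write it back.
                    old_id, old_data = self.resident.popitem(last=False)
                    self.backing[old_id] = old_data
                    self.evictions += 1
                self.resident[block_id] = self.backing[block_id]
                self.loads += 1
            out[block_id] = self.resident[block_id]
        return out


# Ten blocks in the backing store, room for four in the cache.
backing = {i: [float(i)] for i in range(10)}
cache = WorkingSetCache(backing, capacity=4)

# Consecutive camera views overlap, so each step loads only the new blocks.
for visible in ([0, 1, 2], [1, 2, 3], [2, 3, 4]):
    cache.fetch(visible)
```

Because adjacent views share most of their visible blocks, the three fetches above trigger five loads rather than nine, which is the property a working-set cache relies on.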

MSAVBench: The First Benchmark Exposing Critical Gaps in Multi-Shot Audio-Video Generation

Multi-shot audio-video (MSAV) generation represents the frontier of video synthesis, but existing benchmarks lack the scope and rigor to evaluate it reliably. MSAVBench introduces a four-dimensional evaluation framework (video, audio, shot, reference) covering up to 15 shots and non-realistic scenarios, paired with an adaptive hybrid evaluation pipeline that achieves 91.5% Spearman rank correlation with human judgments. Systematic evaluation of 19 state-of-the-art models reveals that fine-grained audio-visual synchronization and director-level control remain unsolved, with modular/agentic pipelines showing the most promise for closing the open- vs. closed-source performance gap.
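For readers unfamiliar with the agreement metric cited above, Spearman rank correlation is the Pearson correlation computed on ranks. The sketch below is a generic pure-Python version with average ranks for ties; the function names and the sample scores are illustrative assumptions and are not taken from MSAVBench.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical automatic scores vs. human scores for five generated clips.
metric_scores = [1, 2, 3, 4, 5]
human_scores = [2, 1, 4, 3, 5]
rho = spearman(metric_scores, human_scores)
```

A value near 1 means the automatic metric orders the clips almost exactly as human raters do, regardless of the absolute scale of either score.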

Adaptive Re-splitting with Shared Neural Policy Accelerates Parallel Trajectory Optimization

ATRS integrates a shared Deep Reinforcement Learning policy into parallel ADMM-based trajectory optimization to dynamically re-split stagnating segments. Formulated as a Multi-Agent Shared-Policy MDP, it achieves size invariance and zero-shot generalization by relying on solver internal states, not environment geometry. A confidence-based mechanism ensures stability by re-splitting only the most problematic segment, yielding up to 26% fewer iterations and 19.1% less computation time in simulations, with real-world validation under 35 ms cycles.
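The "re-split only the most problematic segment" rule can be sketched as a simple selection step. Everything here is an assumption made for illustration: the per-segment scores stand in for the shared policy's output, the 0.5 threshold is arbitrary, and midpoint bisection is a placeholder for whatever re-splitting operation ATRS actually applies.

```python
def resplit_most_problematic(boundaries, scores, threshold=0.5):
    """Bisect the single highest-scoring segment, if its score clears the threshold.

    boundaries: sorted times [t0, ..., tN] delimiting N trajectory segments.
    scores: N hypothetical per-segment confidences that re-splitting would help.
    Returns a new boundary list; the input is left unchanged.
    """
    if len(scores) != len(boundaries) - 1:
        raise ValueError("expected one score per segment")
    worst = max(range(len(scores)), key=lambda i: scores[i])
    if scores[worst] < threshold:
        # No segment is confidently stagnating: keep the current split.
        return list(boundaries)
    midpoint = 0.5 * (boundaries[worst] + boundaries[worst + 1])
    return boundaries[: worst + 1] + [midpoint] + boundaries[worst + 1 :]


# Three segments; the middle one has the highest (assumed) stagnation score.
new_boundaries = resplit_most_problematic([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 0.3])
```

Changing at most one segment per step keeps the rest of the parallel solve untouched, which is one plausible reading of why the summary ties this rule to stability.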

V1 Functions as Saccadic Motor Cortex, Information Bottleneck, and Feedback Supplier for Recognition

V1 constructs a bottom-up saliency map to guide exogenous saccades, functioning as a motor cortex for eye movements. It imposes a processing bottleneck by massively reducing visual information at its output to downstream areas. V1 supports recognition in these areas via top-down feedback queries, primarily targeting central visual field representations, framing vision as selective looking through saccades and seeing via the bottleneck.

Inter-Stance: Pioneering Multimodal Dyadic Corpus Enables Interpersonal Stance and Affect Modeling

Inter-Stance introduces the first publicly available multimodal dataset of 45 dyads (90 participants) capturing synchronized 2D/3D face videos, thermal dynamics, voice/speech, physiology (PPG, EDA, heart rate, blood pressure, respiration), and self-reported affect during communicative interactions. It includes both dyads with shared history and strangers, uses potent emotion induction, and is annotated for social signals and for agreement, disagreement, and neutral stance. The 20TB corpus supports novel modeling of dyadic multimodal behaviors, demonstrated via experiments on how interpersonal history influences communication patterns and affect.

UniGenDet Unifies Generative and Discriminative Paradigms for Co-Evolving Image Synthesis and Forgery Detection

UniGenDet introduces a unified framework that jointly optimizes image generation and generated image detection through symbiotic multimodal self-attention and detector-informed generative alignment. This co-evolutionary approach leverages adversarial synergy to bridge architectural gaps, enhancing generation fidelity via authenticity feedback and improving detection interpretability. Experiments across datasets confirm state-of-the-art results in both tasks.

VistaBot Enables Calibration-Free View-Robust Robot Manipulation via Geometry-Aware Video Synthesis

VistaBot integrates feed-forward 4D geometry estimation, view synthesis latent extraction, and latent action learning to produce novel viewpoints from fixed-camera training data, enabling robust closed-loop manipulation under test-time viewpoint changes without calibration. It enhances action-chunking (ACT) and diffusion-based (π₀) policies, achieving 2.79× and 2.63× improvements in View Generalization Score (VGS) across simulation and real-world tasks. The framework also delivers high-quality novel view synthesis, with code and models to be released publicly.

Omni Model Enables Context Unrolling for Multimodal Reasoning Across Text, Image, Video, and 3D

Omni is a unified model trained natively on text, images, videos, 3D geometry, and hidden representations, inducing Context Unrolling where it reasons across multiple modal representations prior to prediction. This aggregates complementary information from heterogeneous modalities, approximating the shared multimodal knowledge manifold more faithfully. Consequently, Omni excels in multimodal generation, understanding, and advanced reasoning tasks like in-context generation of text, images, videos, and 3D geometry.

Charlene Li Examines Agentic AI's Current Reality and Future Trajectory

Charlene Li's analysis demystifies agentic AI, detailing its actual current capabilities and limitations. It provides a realistic assessment of ongoing developments. The piece outlines probable next steps in agentic AI evolution for technical practitioners.

Preparing Teams for AI-Driven Workflows

Charlene Li hosts a live broadcast on strategies to ready organizational teams for AI integration. The session targets practical preparation amid accelerating AI adoption. Technical leaders can access it via X Spaces for actionable insights on team readiness.

AI Culture Transformation Enters Disruptive Messy Middle Phase

Organizations adopting AI are navigating the "messy middle" stage of culture change, characterized by disruption and resistance following initial enthusiasm. This phase demands structured strategies to manage uncertainty and embed AI practices. Technical leaders must prioritize change management to transition beyond early hype toward sustainable integration.

AI Resistance Stems from Psychological Barriers, Addressed via Constraint-Led Leadership

Resistance to AI adoption arises from deep-seated psychological factors. Effective leadership counters this by imposing constraints that channel innovation. This approach transforms limitations into strategic advantages for AI integration.

AI Disruption Demands Six-Quarter Planning Over Annual Cycles

AI's rapid evolution requires organizations to replace static annual planning with a dynamic six-quarter walk for adaptive strategy. This approach enables quarterly pivots based on emerging AI capabilities and market shifts. Technical teams can use it to align roadmaps with accelerating innovation timelines.

Charlene Li Launches Hourly Poll on AI Strategy Communication Tactics

Charlene Li is running an hourly poll via her X feed to gauge how organizations communicate their AI strategies. The poll links to a live broadcast session. This reflects ongoing interest in practical AI adoption messaging among tech leaders.

Charlene Li Shares Two-Year Reflections on Global Book Success

Charlene Li is hosting a live broadcast to reflect on her experiences after her book has been available worldwide for over two years. The session captures author insights from sustained international publication. Technical audiences may find value in her discussion of long-term book distribution and reception strategies.

NUMINA: Improving Numerical Accuracy in Text-to-Video Diffusion Models

Text-to-video diffusion models frequently fail to generate the correct quantity of objects specified in prompts. NUMINA, a training-free framework, addresses this by identifying prompt-layout inconsistencies using attention heads to create a countable latent layout. It then refines this layout and modulates cross-attention to improve numerical alignment. This method significantly enhances counting accuracy and CLIP alignment while maintaining temporal consistency.

ETCH-X: A Robust and Expressive Human Body Fitting Method for Clothed 3D Scans

ETCH-X is a novel human body fitting method designed to improve both the expressiveness and robustness of fitting parametric body models like SMPL-X to 3D point clouds of clothed humans. It achieves this through a tightness-aware fitting paradigm that filters out clothing dynamics, utilizes implicit dense correspondences for fine-grained fitting, and leverages disentangled, scalable training on diverse composable datasets. This approach significantly enhances performance on both seen and unseen data, addressing limitations of prior methods that excelled in only one aspect.

From Productivity Tool to Strategic Force: The Framework for Corporate AI Integration

Effective corporate AI integration requires shifting from a 'tool-first' productivity mindset to a 'strategy-first' approach where AI supports existing business objectives. The goal for leaders is to achieve 'AI fluency'—integrating the technology into daily workflows to augment human judgment, empathy, and wisdom (the '20%') rather than simply automating bad processes. Success is measured not by the number of use cases, but by the ability to use AI to drive customer engagement and business reinvention while maintaining ethical guardrails via a trust pyramid.

Charlene Li Reflects on Two Years Post-Book Launch with an X Feed Poll

Charlene Li engaged her audience through an hourly poll on her X feed, two years after her book's release. This initiative serves as a direct author reflection on the reception and longevity of her work, leveraging social media for real-time audience interaction and feedback.

AI Strategy Communication Is a Live Conversation, Not a Memo

Charlene Li, a prominent digital transformation analyst, posed a public poll asking how organizations are communicating their AI strategy, signaling that internal and external AI communication is an emerging leadership challenge. The post links to a live broadcast, suggesting the topic warrants real-time, interactive discussion rather than static guidance. The framing implies a gap between organizations having an AI strategy and effectively communicating it to stakeholders.

Businesses Need to Adapt Planning to AI Pace

Traditional annual planning cycles are too slow for the rapid advancements in AI. The "Six-Quarter Walk" is introduced as a more agile planning methodology, enabling businesses to continuously adapt to technological shifts. This approach emphasizes shorter planning horizons and frequent reassessments to integrate AI

Leading Through AI Resistance: The Psychology Behind Organizational Pushback

Charlene Li, a recognized leadership and digital disruption analyst, hosted a broadcast exploring the psychological dimensions of resistance to AI adoption and how leaders can navigate organizational constraints. The content suggests a framework for understanding why individuals and teams resist AI, and how effective leadership can reframe constraints as catalysts. This is consistent with Li's broader body of work on disruptive leadership and change management in the context of emerging technology.

AI Culture Change Presents "Messy Middle" Challenges

Charlene Li identifies a "messy middle" phase in AI culture change, implying a period of significant uncertainty and difficulty between initial adoption and full integration. Organizations navigating this phase likely face challenges in adapting processes, skills, and mindsets to effectively leverage AI. Overcoming this "messy middle" is crucial for successful AI transformation.

Preparing Teams for an AI Future

This entry is a link to a broadcast by Charlene Li on preparing teams for an AI future. The broadcast content itself was not available, so no specific claims or evidence could be extracted; further analysis requires the actual broadcast material.