Chronological feed of everything captured from Charlene Li.
paper / charleneli / 5d ago
3D Gaussian Splatting (3DGS) training has been constrained to tens of millions of primitives on single-GPU hardware due to the memory footprint of per-Gaussian attribute vectors. TideGS exploits the inherent sparsity of 3DGS training — only camera-visible Gaussians are active per iteration — to treat GPU memory as a working-set cache backed by an SSD-CPU-GPU hierarchy. Three co-designed techniques (block-virtualized geometry, hierarchical async I/O pipelining, and trajectory-adaptive differential streaming) enable training over one billion Gaussians on a single 24 GB GPU, surpassing prior out-of-core baselines (~100M) and standard in-memory approaches (~11M) while achieving superior reconstruction quality on large-scale scenes.
3d-gaussian-splatting out-of-core-optimization large-scale-training computer-vision memory-management neural-rendering gpu-computing
“TideGS enables training of over one billion 3D Gaussian primitives on a single 24 GB GPU.”
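The working-set idea in the summary above can be illustrated with a small cache sketch. This is not TideGS's implementation: the class name `WorkingSetCache`, the `load_block` callback, and the least-recently-used eviction policy are all assumptions made for illustration, and the paper's block-virtualized geometry and asynchronous I/O pipelining are not modelled here.

```python
from collections import OrderedDict


class WorkingSetCache:
    """Toy fast-memory cache holding at most `capacity` blocks (hypothetical)."""

    def __init__(self, capacity, load_block):
        self.capacity = capacity
        self.load_block = load_block   # assumed callback: fetch a block from slow storage
        self.resident = OrderedDict()  # block id -> data, least recently used first
        self.misses = 0

    def fetch(self, block_ids):
        """Make the blocks needed for one iteration resident and return them."""
        wanted = list(dict.fromkeys(block_ids))  # de-duplicate, keep order
        if len(wanted) > self.capacity:
            raise ValueError("working set exceeds cache capacity")
        # Pass 1: mark blocks that are already resident as recently used,
        # so that pass 2 never evicts something this iteration still needs.
        for b in wanted:
            if b in self.resident:
                self.resident.move_to_end(b)
        # Pass 2: load the missing blocks, evicting the least recently used first.
        for b in wanted:
            if b not in self.resident:
                if len(self.resident) >= self.capacity:
                    self.resident.popitem(last=False)
                self.resident[b] = self.load_block(b)
                self.misses += 1
        return {b: self.resident[b] for b in wanted}


# Two consecutive "camera views" whose visible blocks overlap.
cache = WorkingSetCache(capacity=3, load_block=lambda b: f"attrs-{b}")
cache.fetch([0, 1, 2])  # cold start: every block is a miss
cache.fetch([1, 2, 3])  # only block 3 is loaded; block 0 is evicted
print(cache.misses, list(cache.resident))
```

When consecutive views share most of their visible blocks, only the difference has to be transferred, which is the intuition behind streaming differences rather than reloading everything each iteration.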
paper / charleneli / 5d ago
Multi-shot audio-video (MSAV) generation represents the frontier of video synthesis, but existing benchmarks lack the scope and rigor to evaluate it reliably. MSAVBench introduces a four-dimensional evaluation framework (video, audio, shot, reference) covering up to 15 shots and non-realistic scenarios, paired with an adaptive hybrid evaluation pipeline that achieves 91.5% Spearman rank correlation with human judgments. Systematic evaluation of 19 state-of-the-art models reveals that fine-grained audio-visual synchronization and director-level control remain unsolved, with modular/agentic pipelines showing the most promise for closing the open- vs. closed-source performance gap.
video-generation audio-video-synthesis ai-evals benchmarking multimodal-ai computer-vision generative-models
“MSAVBench is the first comprehensive benchmark specifically designed for multi-shot audio-video generation evaluation.”
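For readers unfamiliar with the agreement measure cited in the summary above, here is a minimal pure-Python sketch of Spearman rank correlation between an automatic metric and human ratings. The scores are made up for illustration and are not MSAVBench data; the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-model scores: automatic metric vs. mean human rating.
metric_scores = [0.62, 0.71, 0.55, 0.90, 0.48]
human_scores = [3.1, 3.4, 3.8, 4.6, 2.2]
rho = spearman(metric_scores, human_scores)
print(round(rho, 3))  # 0.7 for these made-up numbers
```

A value near 1 means the metric orders the systems almost exactly as the human raters do, regardless of the scale either score uses.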
paper / charleneli / 10d ago / failed
paper / charleneli / 10d ago / failed
paper / charleneli / 21d ago / failed
paper / charleneli / 21d ago / failed
paper / charleneli / 21d ago / failed
paper / charleneli / 22d ago / failed
youtube / charleneli / 25d ago / failed
paper / charleneli / 29d ago
ATRS integrates a shared Deep Reinforcement Learning policy into parallel ADMM-based trajectory optimization to dynamically re-split stagnating segments. Formulated as a Multi-Agent Shared-Policy MDP, it achieves size invariance and zero-shot generalization by relying on solver internal states, not environment geometry. A confidence-based mechanism ensures stability by re-splitting only the most problematic segment, yielding up to 26% fewer iterations and 19.1% less computation time in simulations, with real-world validation under 35 ms cycles.
trajectory-optimization admm deep-reinforcement-learning motion-planning robotics parallel-optimization multi-agent-rl
“Existing fixed-structure parallel ADMM decompositions cause optimization stagnation in constrained regions due to lagging subproblems.”
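The "re-split only the most problematic segment" rule described above can be sketched as follows. This is an illustrative assumption, not the ATRS algorithm: the per-segment scores stand in for whatever confidence the learned policy outputs, and the midpoint split, the `threshold` parameter, and the function name are hypothetical.

```python
def resplit_worst_segment(segments, scores, threshold=0.5):
    """Split the single highest-scoring segment in half, if it is confident enough.

    segments: list of (start, end) index pairs, end exclusive (assumed layout).
    scores:   one stagnation score per segment (stand-in for policy confidence).
    """
    if len(segments) != len(scores):
        raise ValueError("need exactly one score per segment")
    worst = max(range(len(scores)), key=scores.__getitem__)
    start, end = segments[worst]
    # Leave the decomposition unchanged when confidence is low
    # or the segment is too short to split.
    if scores[worst] < threshold or end - start < 2:
        return list(segments)
    mid = (start + end) // 2
    return segments[:worst] + [(start, mid), (mid, end)] + segments[worst + 1:]


segments = [(0, 10), (10, 20), (20, 30)]
new_segments = resplit_worst_segment(segments, scores=[0.1, 0.9, 0.3])
print(new_segments)  # [(0, 10), (10, 15), (15, 20), (20, 30)]
```

Changing at most one segment per step keeps the rest of the parallel decomposition intact, which matches the stability motivation given in the summary.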
paper / charleneli / 29d ago
V1 constructs a bottom-up saliency map to guide exogenous saccades, functioning as a motor cortex for eye movements. It imposes a processing bottleneck by massively reducing visual information at its output to downstream areas. V1 supports recognition in these areas via top-down feedback queries, primarily targeting central visual field representations, framing vision as selective looking through saccades and seeing via the bottleneck.
visual-cortex v1-functions saliency-map saccades visual-processing neural-bottleneck top-down-feedback
“V1 acts as a motor cortex for exogenously guiding saccades by constructing a bottom-up saliency map”
paper / charleneli / 29d ago
Inter-Stance introduces the first publicly available multimodal dataset of 45 dyads (90 participants) capturing synchronized 2D/3D face videos, thermal dynamics, voice/speech, physiology (PPG, EDA, heart rate, blood pressure, respiration), and self-reported affect during communicative interactions. It includes dyads with shared history and strangers, annotated for social signals, agreement, disagreement, and neutral stance, with potent emotion induction. The 20TB corpus supports novel modeling of dyadic multimodal behaviors, demonstrated via experiments on communication patterns and affect influenced by interpersonal history.
multimodal-corpus dyadic-interaction stance-analysis social-signals computer-vision affect-recognition
“No prior publicly-available dataset includes multimodal recordings and self-report measures of multiple persons in social interaction with dyadic recordings and annotations.”
paper / charleneli / Apr 26
UniGenDet introduces a unified framework that jointly optimizes image generation and generated image detection through symbiotic multimodal self-attention and detector-informed generative alignment. This co-evolutionary approach leverages adversarial synergy to bridge architectural gaps, enhancing generation fidelity via authenticity feedback and improving detection interpretability. Experiments across datasets confirm state-of-the-art results in both tasks.
image-generation generated-image-detection unified-framework co-evolutionary-learning computer-vision adversarial-training self-attention
“UniGenDet is a unified generative-discriminative framework for co-evolutionary image generation and generated image detection”
paper / charleneli / Apr 26
VistaBot integrates feed-forward 4D geometry estimation, view synthesis latent extraction, and latent action learning to produce novel viewpoints from fixed-camera training data, enabling robust closed-loop manipulation under test-time viewpoint changes without calibration. It enhances action-chunking (ACT) and diffusion-based (π₀) policies, achieving 2.79× and 2.63× improvements in View Generalization Score (VGS) across simulation and real-world tasks. The framework also delivers high-quality novel view synthesis, with code and models to be released publicly.
robot-manipulation view-synthesis view-robustness diffusion-models 4d-geometry robotics-policies
“VistaBot achieves view-robust closed-loop manipulation without requiring camera calibration at test time”
paper / charleneli / Apr 26
Omni is a unified model trained natively on text, images, videos, 3D geometry, and hidden representations, inducing Context Unrolling where it reasons across multiple modal representations prior to prediction. This aggregates complementary information from heterogeneous modalities, approximating the shared multimodal knowledge manifold more faithfully. Consequently, Omni excels in multimodal generation, understanding, and advanced reasoning tasks like in-context generation of text, images, videos, and 3D geometry.
multimodal-models context-unrolling omni-model computer-vision multimodal-reasoning ai-research-paper
“Omni is natively trained on diverse modalities including text, images, videos, 3D geometry, and hidden representations”
tweet / @charleneli / Apr 20
Charlene Li's analysis demystifies agentic AI, detailing its actual current capabilities and limitations. It provides a realistic assessment of ongoing developments. The piece outlines probable next steps in agentic AI evolution for technical practitioners.
agentic-ai charlene-li ai-trends ai-future x-feed
“Agentic AI is currently experiencing specific developments beyond hype”
tweet / @charleneli / Apr 20 / failed
Data, Data Everywhere, but Not an Insight to Drink (Myths vs. Reality) https://twitter.com/i/broadcasts/1ynJOlemqEVxR
tweet / @charleneli / Apr 20 / failed
Are enterprise platforms about to face a mass exodus? https://twitter.com/i/broadcasts/1ynJOlXWVXyxR
tweet / @charleneli / Apr 20 / failed
The top questions boards should be asking about AI https://twitter.com/i/broadcasts/1OwxWXbeBDWKQ
tweet / @charleneli / Apr 18
Charlene Li hosts a live broadcast on strategies to ready organizational teams for AI integration. The session targets practical preparation amid accelerating AI adoption. Technical leaders can access it via X Spaces for actionable insights on team readiness.
ai-future team-preparation charlene-li workforce-training ai-adoption leadership-strategy
“Charlene Li is conducting a live session on preparing teams for an AI future”
tweet / @charleneli / Apr 18
Organizations adopting AI are navigating the "messy middle" stage of culture change, characterized by disruption and resistance following initial enthusiasm. This phase demands structured strategies to manage uncertainty and embed AI practices. Technical leaders must prioritize change management to transition beyond early hype toward sustainable integration.
ai-culture organizational-change charlene-li twitter-spaces ai-adoption
“AI adoption follows a 'messy middle' phase in organizational culture change”
tweet / @charleneli / Apr 18
Resistance to AI adoption arises from deep-seated psychological factors. Effective leadership counters this by imposing constraints that channel innovation. This approach transforms limitations into strategic advantages for AI integration.
ai-resistance psychology leadership constraints charlene-li twitter-spaces
“AI resistance is primarily driven by psychological factors”
tweet / @charleneli / Apr 18
AI's rapid evolution requires organizations to replace static annual planning with a dynamic six-quarter walk for adaptive strategy. This approach enables quarterly pivots based on emerging AI capabilities and market shifts. Technical teams can use it to align roadmaps with accelerating innovation timelines.
ai-adaptation business-strategy annual-planning charlene-li six-quarter-walk leadership
“Traditional annual planning is insufficient for adapting to AI changes”
tweet / @charleneli / Apr 18
Charlene Li is running an hourly poll via her X feed to gauge how organizations communicate their AI strategies. The poll links to a live broadcast session. This reflects ongoing interest in practical AI adoption messaging among tech leaders.
ai-strategy communication twitter-spaces charlene-li hourly-poll
“Charlene Li is conducting an hourly poll on her X feed”
tweet / @charleneli / Apr 18
Charlene Li is hosting a live broadcast to reflect on her experiences after her book has been available worldwide for over two years. The session captures author insights from sustained international publication. Technical audiences may find value in her discussion of long-term book distribution and reception strategies.
charlene-li twitter-feed author-reflections book-anniversary hourly-poll
“Charlene Li has a book that has been in the world for more than 2 years”
youtube / charleneli / Apr 14 / failed
youtube / charleneli / Apr 12 / failed
paper / charleneli / Apr 12
Text-to-video diffusion models frequently fail to generate the correct quantity of objects specified in prompts. NUMINA, a training-free framework, addresses this by identifying prompt-layout inconsistencies using attention heads to create a countable latent layout. It then refines this layout and modulates cross-attention to improve numerical alignment. This method significantly enhances counting accuracy and CLIP alignment while maintaining temporal consistency.
text-to-video diffusion-models numerical-alignment computer-vision ai-research
“Text-to-video diffusion models struggle with generating the accurate number of objects from text prompts.”
paper / charleneli / Apr 12
ETCH-X is a novel human body fitting method designed to improve both the expressiveness and robustness of fitting parametric body models like SMPL-X to 3D point clouds of clothed humans. It achieves this through a tightness-aware fitting paradigm that filters out clothing dynamics, utilizes implicit dense correspondences for fine-grained fitting, and leverages disentangled, scalable training on diverse composable datasets. This approach significantly enhances performance on both seen and unseen data, addressing limitations of prior methods that excelled in only one aspect.
3d-modeling computer-vision body-fitting clothed-humans smpl-x deep-learning geometric-deep-learning
“ETCH-X utilizes a 'tightness-aware fitting paradigm' to mitigate the challenges posed by clothing dynamics in human body fitting.”
youtube / charleneli / Apr 2 / failed
youtube / charleneli / Apr 2
Effective corporate AI integration requires shifting from a 'tool-first' productivity mindset to a 'strategy-first' approach where AI supports existing business objectives. The goal for leaders is to achieve 'AI fluency'—integrating the technology into daily workflows to augment human judgment, empathy, and wisdom (the '20%') rather than simply automating bad processes. Success is measured not by the number of use cases, but by the ability to use AI to drive customer engagement and business reinvention while maintaining ethical guardrails via a trust pyramid.
ai-adoption corporate-strategy organizational-transformation ai-leadership storytelling-in-business future-of-work ai-implementation
“AI typically handles roughly 80% of standard output, while the remaining 20%—composed of authenticity, unique voice, and deep insight—represents the primary competitive advantage for humans.”
tweet / @charleneli / Jan 27
Charlene Li engaged her audience through an hourly poll on her X feed, two years after her book's release. This initiative serves as a direct author reflection on the reception and longevity of her work, leveraging social media for real-time audience interaction and feedback.
author-reflections social-media-insights content-strategy personal-branding
“Charlene Li conducted an hourly poll on her X feed.”
tweet / @charleneli / Jun 3
Charlene Li, a prominent digital transformation analyst, posed a public poll asking how organizations are communicating their AI strategy, signaling that internal and external AI communication is an emerging leadership challenge. The post links to a live broadcast, suggesting the topic warrants real-time, interactive discussion rather than static guidance. The framing implies a gap between organizations having an AI strategy and effectively communicating it to stakeholders.
ai-strategy leadership communication executive-insights social-media
“Communicating an AI strategy is a distinct, non-trivial challenge separate from formulating one.”
tweet / @charleneli / May 27
Traditional annual planning cycles are too slow for the rapid advancements in AI. The "Six-Quarter Walk" is introduced as a more agile planning methodology, enabling businesses to continuously adapt to technological shifts. This approach emphasizes shorter planning horizons and frequent reassessments to integrate AI
annual-planning ai-adaptation six-quarter-walk strategic-planning business-strategy
“Traditional annual planning is insufficient for the pace of AI innovation.”
tweet / @charleneli / May 13
Charlene Li, a recognized leadership and digital disruption analyst, hosted a broadcast exploring the psychological dimensions of resistance to AI adoption and how leaders can navigate organizational constraints. The content suggests a framework for understanding why individuals and teams resist AI, and how effective leadership can reframe constraints as catalysts. This is consistent with Li's broader body of work on disruptive leadership and change management in the context of emerging technology.
ai-adoption leadership change-management ai-resistance organizational-psychology
“AI adoption faces meaningful psychological resistance, not just technical or structural barriers.”
tweet / @charleneli / Apr 29
Charlene Li identifies a "messy middle" phase in AI culture change, implying a period of significant uncertainty and difficulty between initial adoption and full integration. Organizations navigating this phase likely face challenges in adapting processes, skills, and mindsets to effectively leverage AI. Overcoming this "messy middle" is crucial for successful AI transformation.
ai-culture organizational-change ai-adoption leadership digital-transformation
“AI culture change involves a 'messy middle' phase.”
tweet / @charleneli / Apr 22
The provided content is a link to a broadcast by Charlene Li on preparing teams for an AI future. Without access to the broadcast content itself, it is impossible to extract specific claims, evidence, or a detailed synthesis. Further analysis requires the actual broadcast material.
ai-future team-development workforce-planning leadership-development technological-impact organizational-change