Arvind on AI

Chronological feed of everything captured from Arvind on AI.

paper / arvind-ai / 16h ago / failed

An Interactive Paradigm for Deep Research

paper / arvind-ai / 16h ago / failed

Ultra-Low-Bitrate Mel-Spectrogram-based Neural Speech Coding with Flow-Matching-based Refinement and Vocoding-driven Reconstruction

paper / arvind-ai / 1d ago / failed

ComHymba: Low-Complexity Domain-Informed Foundation Model for Wireless Communications

paper / arvind-ai / 1d ago / failed

HyperParallel-MoE: Multi-Core Interleaved Scheduling for Fast MoE Training on Ascend NPUs

paper / arvind-ai / 4d ago / failed

Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

paper / arvind-ai / 4d ago / failed

Echo: Learning from Experience Data via User-Driven Refinement

paper / arvind-ai / 4d ago / failed

Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation

paper / arvind-ai / 4d ago / failed

Gyromagnetic Quantum Friction in Rayleigh Vorticity Baths

paper / arvind-ai / 4d ago / failed

Propagation-Consistent Wireless Environment Digital Twin Construction Under Sparse Measurements

paper / arvind-ai / 5d ago / failed

A putative model of the gut-muscle axis in aged livestock

paper / arvind-ai / 5d ago / failed

Carrier-doping effect and anomalous transport properties in Ni-doped CeCoIn5 investigated by Hall resistivity measurements

paper / arvind-ai / 13d ago / failed

Phoenix-VL 1.5 Medium Technical Report

paper / arvind-ai / 13d ago / failed

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

paper / arvind-ai / 13d ago / failed

Reviving primordial black hole formation in slow first-order phase transitions

paper / arvind-ai / 13d ago / failed

Study of $φ\to K\bar{K}$ in the amplitude analysis of $D^{+}\to K_{S}^{0}K_{L}^{0}π^{+}$

paper / arvind-ai / 13d ago / failed

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

youtube / arvind-ai / 19d ago / failed

What's new in OneDrive photos | Sync Up

youtube / arvind-ai / 19d ago / failed

A Collaborative Effort to Push the Next Evolution in Streaming

youtube / arvind-ai / 19d ago / failed

AI in Finance | CA Ishika K. in conversation with Arvind S. (Eisner Amper) | Finance Speaks Ep. 9

youtube / arvind-ai / 19d ago / failed

AI for Investment Management | Amazon Web Services

youtube / arvind-ai / 19d ago / failed

Arvind Krishna - CEO of IBM | Podcast | In Good Company

paper / arvind-ai / 21d ago / failed

Spectral- and Energy-efficient Multi-BS Multi-RIS Pinching-antenna Systems: A GNN-based Approach

paper / arvind-ai / 21d ago / failed

Variants of Wythoff's Games with Different Terminal Sets

paper / arvind-ai / 21d ago / failed

Generalized continuum theory of phonon angular momentum in crystals

paper / arvind-ai / 21d ago / failed

Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization

paper / arvind-ai / 22d ago / failed

Heterogeneous Scientific Foundation Model Collaboration

paper / arvind-ai / 22d ago / failed

Tail-aware N-version Machine Learning Models for Reliable API Recommendation

paper / arvind-ai / 22d ago / failed

Economic Valuation and Optimal Deployment of Static Synchronous Series Compensators for U.S. Power System Expansion

youtube / arvind-ai / 24d ago / failed

Double Your Income With AI in 3 Months (Here's the Stack)

youtube / arvind-ai / 24d ago / failed

Unexplored Territory 117 - An update on all things Memory Tiering featuring Arvind Jagannath!

youtube / arvind-ai / 24d ago / failed

IBM CEO Arvind Krishna REVEALS AI Truth 😳 Future of Jobs & Power Explained | Rational Thinker

youtube / arvind-ai / 26d ago / failed

Moravec's Paradox: AI's Strange Limitation

paper / arvind-ai / 29d ago

Infrastructure-Centric World Models Exploit Spatio-Temporal Complementarity for Superior Roadside Traffic Anticipation

Infrastructure-centric world models (I-WM) leverage fixed roadside sensors' temporal depth for accumulating long-term behavioral distributions, including rare events, complementing ego-vehicle sensors' spatial breadth. The proposed three-phase approach includes generative scene understanding with uncertainty propagation, physics-informed multi-agent predictive dynamics, and collaborative V2X models via latent alignment. A dual-layer architecture uses annotation-free multi-modal perception (LiDAR to event cameras) to drive end-to-end generative models, introducing Infrastructure VLA (I-VLA) unifying perception, language, and traffic actions.

world-models infrastructure-centric autonomous-driving roadside-perception spatio-temporal computer-vision robotics

“Existing world models for autonomous driving adopt only ego-vehicle perspectives, leaving infrastructure viewpoints unexplored.”

paper / arvind-ai / 29d ago

TeamFusion Enables Consensus in Open-Ended Multi-Agent Teamwork via Iterative Proxy Discussions

TeamFusion is a multi-agent system for open-ended domains that instantiates proxy agents from team members' preferences, facilitates structured discussions to identify agreements and disagreements, and synthesizes consensus-oriented outputs for iterative refinement. Unlike closed-domain aggregation methods, it preserves minority perspectives by resolving disagreements rather than suppressing them. Evaluations on two teamwork tasks show superior performance over direct aggregation baselines in representing individual views and achieving consensual strength across metrics, tasks, and team configurations.

multi-agent-systems teamwork-ai consensus-building open-ended-domains agent-collaboration arxiv-paper

“Answer aggregation approaches from closed domains suppress minority perspectives in open-ended teamwork settings”

paper / arvind-ai / 29d ago

Wan-Image Unifies LLMs and Diffusion Transformers for Professional-Grade Controllable Image Synthesis

Wan-Image integrates large language models with diffusion transformers in a unified multi-modal architecture to enable precise control over image generation, addressing limitations in controllability, typography, and identity preservation. It leverages large-scale multi-modal data, fine-grained annotations, and reinforcement learning to support advanced features like ultra-long text rendering, multi-subject identity preservation, 4K synthesis, and interactive editing. Human evaluations show it outperforms Seedream 5.0 Lite and GPT Image 1.5 overall, matching Nano Banana Pro on challenging tasks.

image-generation diffusion-models multi-modal-ai visual-synthesis controllable-generation generative-ai

“Wan-Image features a natively unified multi-modal architecture synergizing LLMs and diffusion transformers for translating nuanced user intents into precise visuals”

paper / arvind-ai / 29d ago

LLaDA2.0-Uni Unifies Multimodal Understanding and Generation via Discrete Diffusion LLM

LLaDA2.0-Uni integrates multimodal understanding and generation using a discrete diffusion large language model with SigLIP-VQ tokenization, MoE-based dLLM backbone, and diffusion decoder. It applies block-level masked diffusion to both text and discretized visual inputs, enabling interleaved reasoning and generation. Prefix-aware backbone optimizations and few-step decoder distillation boost inference efficiency, matching specialized VLMs in understanding while excelling in high-fidelity image generation and editing.

multimodal-ai diffusion-model large-language-model image-generation discrete-diffusion computer-vision

“LLaDA2.0-Uni supports both multimodal understanding and generation in a single integrated framework”

paper / arvind-ai / 29d ago

Multimodal LLMs Enable Efficient Nationwide Building Condition Assessment from Street View Imagery

Fine-tuning Gemma 3 27B on modest human-labeled Google Street View data aligns model predictions with human mean opinion scores (MOS), surpassing individual raters on SRCC and PLCC metrics. Knowledge distillation transfers performance to Gemma 3 4B (3x speedup) and vision models like EfficientNetV2-M and SwinV2-B (30x speedup) with comparable accuracy. The framework supports scalable assessment of built environment attributes via a visualization dashboard, minimizing labeling needs.

multimodal-llms street-view-imagery knowledge-distillation building-assessment computer-vision housing-attributes fine-tuning

“Fine-tuned Gemma 3 27B outperforms individual human raters on SRCC and PLCC relative to human MOS for building condition evaluation”

youtube / arvind-ai / Apr 20 / failed

Economist David Autor explains the evolution of wage inequality in the US

youtube / arvind-ai / Apr 20 / failed

Professor David Autor on how AI could help rebuild the middle class

youtube / arvind-ai / Apr 20 / failed

The AI Crash That Will Make 2008 Look Small

youtube / arvind-ai / Apr 18 / failed

Balancing Innovation Speed with Responsible AI Deployment | Fireside Chat | World AI Summit 2025

youtube / arvind-ai / Apr 18 / failed

What's Happening with AI Policy and Regulation?

youtube / arvind-ai / Apr 18 / failed

Temporary fix for Section 702.

youtube / arvind-ai / Apr 18 / failed

2026 RSAC Conference | Securing autonomous AI at scale with Arvind (Nitro) Nithrakashyap from Rubrik

youtube / arvind-ai / Apr 18 / failed

Arvind Krishna, IBM CEO, on Big Bets, AI, and Quantum | The CEO Signal

youtube / arvind-ai / Apr 15 / failed

Infinity paradox - Riemann series theorem

paper / arvind-ai / Apr 10

Enhanced Precision in CKM Angle Gamma Measurement Through a Joint LHCb-BESIII Analysis

This research presents a novel, unbinned, model-independent approach to precisely measure the CKM angle gamma. By jointly analyzing data from LHCb and BESIII experiments, the study combines charge-parity violating observables from B-meson decays with strong-phase parameters from D-meson decays. This methodology significantly improves the precision of the gamma angle determination, offering critical insights into CP violation within the Standard Model.

ckm-angle gamma-measurement cp-violation besiii-experiment lhcb-experiment particle-physics high-energy-physics

“The CKM angle gamma was measured using a novel, unbinned, model-independent approach.”

paper / arvind-ai / Apr 10

Coupled-Cluster Imaginary-Time Evolution for Irreasonable Solutions

This paper introduces a coupled-cluster formalism utilizing imaginary-time evolution from an arbitrary reference. This method converges to standard coupled-cluster amplitude equations when finite solutions exist. Crucially, it provides additional information even when standard solutions are not available. The formalism also incorporates a coupled-cluster energy variance minimum to identify physically regularized coupled-cluster amplitudes.

quantum-chemistry coupled-cluster computational-physics imaginary-time-evolution chemical-physics

“A coupled-cluster formalism can perform imaginary-time evolution from an arbitrary reference.”

paper / arvind-ai / Apr 10

SHAPE: Enhancing LLM Reasoning Efficiency

SHAPE is a novel framework that improves LLM reasoning by formalizing it as a state-space trajectory. It introduces a hierarchical credit assignment mechanism. This approach aims to distinguish meaningful progress from mere verbosity in process supervision, addressing limitations of existing methods in reasoning capability and token efficiency. SHAPE achieves better accuracy while reducing token consumption.

llm-reasoning reinforcement-learning process-supervision natural-language-processing llm-efficiency ai-research

“Existing process supervision methods for LLMs fail to distinguish meaningful progress from verbosity, leading to limited reasoning and token inefficiency.”

paper / arvind-ai / Apr 10

Human Trial-and-Error Dataset Outperforms LLMs in Problem Solving

The Trial-and-Error Collection (TEC) dataset and platform capture detailed human problem-solving trajectories and reflections. This novel dataset reveals human superiority over LLMs in trial-and-error tasks, highlighting the need for more sophisticated AI techniques beyond simple heuristics. TEC provides a valuable resource for developing more capable AI systems by offering a foundation for understanding human trial-and-error behavior.

human-ai-interaction trial-and-error problem-solving ai-datasets llm-limitations human-cognition

“Existing AI techniques for trial-and-error are limited by reliance on simple heuristics and lack of appropriate data.”
