absorb.md

Alán Aspuru-Guzik

Chronological feed of everything captured from Alán Aspuru-Guzik.

Quantum Workflow Enables Accurate Auger Spectra Computation with Reduced Circuit Overhead

A hybrid quantum-classical workflow computes Auger electron spectra by integrating a generative quantum eigensolver (GQE) for ground-state preparation, the quantum self-consistent equation-of-motion method for excitations, and the one-centre approximation for transition rates. GQE employs a GPT-2 model to generate optimized quantum circuits, enabling HPC parallelization and GPU acceleration for scalable performance. Demonstrated on the water molecule with the STO-3G basis, the workflow agrees with classical full CI results as well as with computational and experimental spectra, while using half the gate count of VQE.

MōLe: Equivariant ML Predicts Coupled-Cluster Amplitudes from Hartree-Fock Orbitals for High-Accuracy Quantum Chemistry

MōLe is an equivariant neural network that predicts coupled-cluster excitation amplitudes directly from Hartree-Fock molecular orbitals, bypassing DFT's accuracy limitations. Trained solely on small equilibrium geometries, it exhibits strong data efficiency and generalizes to larger molecules and off-equilibrium structures. It also accelerates CC convergence by reducing required iterations, enabling scalable high-accuracy wavefunction modeling.

El Agente Gráfico: Type-Safe Graphs Enable Single-Agent Scientific Automation

El Agente Gráfico integrates LLM decision-making into a type-safe execution environment using dynamic knowledge graphs and object-graph mappers for structured state management via typed Python objects. This replaces unstructured text with symbolic identifiers, enhancing context consistency, provenance tracking, and tool orchestration. A single agent outperforms multi-agent systems on quantum chemistry benchmarks and extends to conformer generation and MOF design, proving scalable automation beyond prompt-centric approaches.

El Agente Sólido Automates Solid-State Quantum Chemistry via Hierarchical LLM Agents

El Agente Sólido is a hierarchical multi-agent framework that uses LLMs to translate natural language objectives into end-to-end Quantum ESPRESSO workflows for solid-state simulations. It handles structure generation, input construction, execution, and analysis, integrating DFT, phonon calculations, and ML interatomic potentials. Benchmarking confirms reliable execution across diverse cases, boosting reproducibility in materials discovery.

El Agente Quntur: Hierarchical AI as Research Collaborator Democratizing Quantum Chemistry

El Agente Quntur is a hierarchical multi-agent AI system that acts as a research collaborator for quantum chemistry, automating ORCA 6.0 workflows via reasoning-driven decisions, composable actions, and guided deep research over documentation and literature. It eliminates hard-coded policies to enable adaptive planning, execution, and analysis of in silico experiments following best practices. The design principles generalize beyond ORCA to other quantum chemistry packages, addressing accessibility barriers for non-experts.

El Agente Estructural: AI-Powered 3D Molecular Geometry Editor Mimicking Human Expertise

El Agente Estructural is a multimodal, natural-language-driven agent that manipulates molecular geometries in 3D using domain-informed tools and vision-language models, emulating human expert editing. It enables precise control over atomic replacements, connectivity, and stereochemistry without rebuilding core frameworks. Case studies demonstrate site-selective functionalization, ligand exchange, and image-guided structure generation from reaction schematics. The agent integrates into El Agente Quntur for enhanced autonomous quantum chemistry workflows.

ELECTRAFI Achieves State-of-the-Art Periodic Charge Density Prediction with 633x Speedup via Analytic Gaussian Transforms

ELECTRAFI models periodic charge densities in crystals using anisotropic Gaussians in real space, leveraging closed-form Fourier transforms and Poisson summation to obtain analytic plane-wave coefficients. This enables full density reconstruction via a single inverse FFT, bypassing grid probing, periodic summation, and spherical harmonics. It matches or exceeds SOTA accuracy on benchmarks while being up to 633x faster, and its low inference time cuts DFT initialization costs by ~20%.
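
The Poisson-summation step can be illustrated with a minimal 1D sketch. This is not the ELECTRAFI implementation: it assumes isotropic Gaussians on a line, and the cell length, grid size, centres, widths, and weights are made-up values. It builds the plane-wave coefficients of the periodised Gaussians analytically, recovers the density with one inverse FFT, and compares against an explicit sum over periodic images.

```python
import numpy as np

L = 10.0                          # cell length (illustrative value)
N = 128                           # number of real-space grid points
centers = np.array([2.0, 6.5])    # Gaussian centres (illustrative)
sigmas = np.array([0.5, 0.8])     # Gaussian widths (illustrative)
weights = np.array([1.0, 2.0])    # integrated charge of each Gaussian

# Integer plane-wave indices in FFT order and wavevectors G = 2*pi*m/L.
m = np.fft.fftfreq(N, d=1.0 / N)
G = 2.0 * np.pi * m / L

# Poisson summation: the periodised Gaussian has Fourier coefficients
# (w / L) * exp(-sigma^2 G^2 / 2) * exp(-i G mu), known in closed form.
coeffs = np.zeros(N, dtype=complex)
for mu, s, w in zip(centers, sigmas, weights):
    coeffs += (w / L) * np.exp(-0.5 * (s * G) ** 2) * np.exp(-1j * G * mu)

# One inverse FFT gives the density on the grid
# (numpy's ifft divides by N, so multiply it back).
rho_fft = (N * np.fft.ifft(coeffs)).real

# Reference: explicit real-space sum over periodic images.
x = np.arange(N) * L / N
rho_direct = np.zeros(N)
for mu, s, w in zip(centers, sigmas, weights):
    for n in range(-6, 7):
        rho_direct += (w * np.exp(-0.5 * ((x - mu - n * L) / s) ** 2)
                       / (np.sqrt(2.0 * np.pi) * s))

max_abs_diff = np.max(np.abs(rho_fft - rho_direct))
total_charge = rho_fft.sum() * L / N
```

The two densities agree to near machine precision here because the Gaussian coefficients decay well before the grid's largest wavevector; narrower Gaussians would need a finer grid.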

Materealize: Multi-Agent System Bridges Computational Materials Design to Experimental Synthesis

Materealize integrates structure generation, property prediction, synthesizability assessment, and synthesis planning into a unified multi-agent framework for end-to-end inorganic materials design. Its instant mode resolves tasks such as property-conditioned candidate design with recipes and data augmentation within minutes through a natural-language interface. Its thinking mode employs multi-agent debate to produce refined outputs, including reasoning-driven synthesis routes and mechanistic hypotheses validated against literature and simulations.

MATTERIX: GPU-Accelerated Multiscale Digital Twin for Simulating Robotics-Assisted Chemistry Labs

MATTERIX is a GPU-accelerated simulation framework that creates high-fidelity digital twins of chemistry laboratories, modeling robotic manipulation, powder/liquid dynamics, device functions, heat transfer, and basic reaction kinetics. It integrates realistic physics simulation, photorealistic rendering, and a modular semantics engine for logical states and continuous behaviors across abstraction levels. The framework supports open-source assets, hierarchical planning, and learning-based skills, enabling sim-to-real transfer and in silico testing of automated workflows to minimize physical experiments.

TreeWriter Enables Hierarchical AI-Assisted Writing for Complex Long-Form Documents

TreeWriter models documents as hierarchical trees, enabling multi-level outlining, iterative editing, and context-aware AI suggestions that load relevant content dynamically. A within-subject study (N=12) comparing it to Google Docs + Gemini demonstrated superior performance in idea exploration/development, AI helpfulness, and authorial control for long-document tasks. A two-month field deployment (N=8) confirmed its efficacy for collaborative writing via structured organization.

Discrete Feynman-Kac Correctors Enable Training-Free Inference-Time Control of Discrete Diffusion Models

Discrete Feynman-Kac Correctors provide a framework for controlling the sampling distribution of trained discrete masked diffusion models at inference using Sequential Monte Carlo algorithms. These algorithms enable temperature annealing, sampling from products of marginals across multiple diffusion processes, and reward-tilted generation without retraining. Applications demonstrate improved efficiency in Ising model sampling, code generation with language models, amortized learning, and high-reward protein sequence design.
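
The reweight-and-resample step that Sequential Monte Carlo repeats can be sketched on a toy categorical distribution. This is only the generic building block (self-normalised importance resampling toward an annealed target), not the paper's corrector for masked diffusion; the four-state distribution `p`, the inverse temperature `beta`, and the particle count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.1, 0.2, 0.3, 0.4])   # toy "model" distribution over 4 tokens
beta = 2.0                            # inverse temperature; beta > 1 sharpens p

# Annealed target distribution p**beta / Z.
target = p ** beta / np.sum(p ** beta)

# 1. Propose: draw particles from the unmodified model.
n_particles = 200_000
particles = rng.choice(len(p), size=n_particles, p=p)

# 2. Reweight: importance weight is target / proposal, i.e. p**(beta - 1)
#    up to a constant that cancels after normalisation.
w = p[particles] ** (beta - 1.0)
w /= w.sum()

# 3. Resample: duplicate high-weight particles, drop low-weight ones.
resampled = rng.choice(particles, size=n_particles, p=w)

empirical = np.bincount(resampled, minlength=len(p)) / n_particles
```

After resampling, `empirical` approximates `target` without any change to the sampler that produced the particles, which is the sense in which such control is training-free.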

El Agente Cuántico: Multi-Agent AI Automates Quantum Simulations via Natural Language

El Agente Cuántico is a multi-agent AI system that translates natural-language descriptions of scientific intent into executed, validated quantum simulations. It reasons over library documentation and APIs to dynamically assemble workflows across heterogeneous frameworks, covering state preparation, closed/open-system dynamics, tensor networks, quantum control, error correction, and resource estimation. This unifies disparate simulation paradigms under a single interface, reducing barriers and enabling scalable, autonomous exploration of quantum models.

Foundation Models Enhance Likelihood-Free Bayesian Optimization for Scalable Molecular Discovery

The method introduces a likelihood-free Bayesian optimization (BO) approach that skips explicit surrogate modeling and directly uses priors from general LLMs and chemistry foundation models to inform acquisition functions. It employs a tree-structured partition of the molecular search space with local acquisition functions, enabling efficient candidate selection through Monte Carlo Tree Search. Coarse-grained LLM-based clustering further boosts scalability by limiting evaluations to high-value clusters, yielding superior sample efficiency, robustness, and performance in low-data regimes.

EGMOF: Modular Diffusion-Transformer Enables Data-Efficient MOF Inverse Design

EGMOF employs a hybrid diffusion-transformer architecture that decouples inverse design into property-to-descriptor mapping via a 1D diffusion model (Prop2Desc) and descriptor-to-MOF structure generation via a transformer (Desc2MOF). This modularity minimizes retraining needs and sustains high performance with limited data, such as 1,000 samples. It achieves superior validity (95%) and hit rates (84%) on hydrogen uptake tasks, outperforming baselines by up to 57% and 14%, and generalizes across 29 diverse property datasets.

Schema-Activated ICL Extracts Abstracted Reasoning Templates to Boost LLM Performance

SA-ICL introduces a framework that distills prior demonstration examples into lightweight, structured schemas (templates of key inferential steps and relationships) for explicit knowledge transfer in transformer-based LLMs. Drawing from cognitive schema theory, it augments reasoning on novel tasks and reduces dependence on demonstration volume. Experiments on GPQA chemistry and physics questions show gains of up to 36.19% with single high-quality examples across LLMs, confirming that these models lack implicit schema formation but thrive with explicit scaffolding. The framework unifies ICL variants such as pattern priming and CoT while enhancing interpretability.

Fast-Forwardable Lindbladians Enable Heisenberg-Limit QPE via Quadratic Speedup

Certain Lindbladian processes perform QPE-type tasks at standard quantum limit scaling, unlike Heisenberg-limit QPE. These dynamics permit quadratic fast-forwarding, simulated in O(√(t log(1/ε))) cost for time t and error ε, via a mechanism distinct from Hamiltonian fast-forwarding. This simulation doubles as a new Heisenberg-limit QPE algorithm and extends to efficient Gibbs state preparation with accelerated decoherence under Pauli noise.
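
A heuristic way to see why quadratic fast-forwarding lifts standard-quantum-limit scaling to Heisenberg-limit scaling, assuming the simulation error is taken comparable to the target precision ε and constants are ignored:

```latex
\begin{align*}
\text{standard quantum limit:}\quad
  & \varepsilon \sim t^{-1/2} \;\Longrightarrow\; t \sim \varepsilon^{-2}, \\
\text{fast-forwarded simulation cost:}\quad
  & C(t,\varepsilon) = O\!\left(\sqrt{t \,\log(1/\varepsilon)}\right), \\
\text{combined:}\quad
  & C = O\!\left(\varepsilon^{-1}\sqrt{\log(1/\varepsilon)}\right).
\end{align*}
```

The combined cost scales as 1/ε up to a logarithmic factor, which is the Heisenberg-limit scaling.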

RAISE: Closed-Loop Self-Driving Lab Accelerates Interfacial Formulation Discovery via High-Throughput Contact Angle Optimization

RAISE is a robotic self-driving laboratory that automates liquid formulation mixing, droplet deposition, contact angle imaging, and measurement at a throughput of one measurement per minute. It integrates Bayesian optimization for iterative formulation exploration targeting user-defined wettability objectives. Multi-objective BO uses desirability scores to balance contact angle precision, surfactant minimization, and cost reduction, and the system demonstrates robustness to surfactant purity variations.
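
One common way to fold several objectives into a single desirability score is a geometric mean of per-objective scores in [0, 1]. The sketch below assumes that form with linear ramps; the summary does not say which desirability functions RAISE uses, and all function names, targets, and ranges here are hypothetical.

```python
import math


def target_desirability(value, target, tolerance):
    """1 at the target, falling linearly to 0 once |value - target| >= tolerance."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)


def smaller_is_better(value, best, worst):
    """1 when value <= best, 0 when value >= worst, linear in between."""
    if value <= best:
        return 1.0
    if value >= worst:
        return 0.0
    return (worst - value) / (worst - best)


def overall_desirability(scores):
    """Geometric mean; any fully unacceptable objective zeroes the result."""
    if any(s == 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical formulation: contact angle 62 deg against a 60 deg target,
# surfactant concentration 0.5 (arbitrary units), cost 3.0 (arbitrary units).
d_angle = target_desirability(value=62.0, target=60.0, tolerance=10.0)
d_surfactant = smaller_is_better(value=0.5, best=0.0, worst=2.0)
d_cost = smaller_is_better(value=3.0, best=1.0, worst=5.0)
score = overall_desirability([d_angle, d_surfactant, d_cost])
```

A scalar score of this kind lets a single-objective optimizer trade the objectives off, and the geometric mean prevents a strong result on one objective from masking an unacceptable result on another.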

Direct SE(3)-Equivariant Hessian Prediction from Graph Neural Network Irrep Features

HIP predicts molecular Hessians directly from SE(3)-equivariant irreducible representations up to degree l=2 during graph neural network message passing, bypassing automatic differentiation or finite differences. This yields 1-2 orders of magnitude speedup, higher accuracy, lower memory use, easier training, and better scaling with system size compared to traditional methods. Validation across transition state search, geometry optimization, zero-point energy corrections, and vibrational analysis shows superior performance; code and models are open-sourced.
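
For context, the finite-difference baseline that direct prediction avoids can be sketched as follows. The harmonic toy energy stands in for a real force field, and the function names are illustrative; the point is that a central-difference Hessian needs two gradient evaluations per coordinate, so 6N for N atoms.

```python
import numpy as np


def energy_gradient(x, K):
    """Gradient of the toy harmonic energy E = 0.5 * x^T K x."""
    return K @ x


def finite_difference_hessian(grad_fn, x, step=1e-4):
    """Central-difference Hessian built column by column from gradients."""
    n = x.size
    hessian = np.zeros((n, n))
    n_grad_calls = 0
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = step
        hessian[:, i] = (grad_fn(x + dx) - grad_fn(x - dx)) / (2.0 * step)
        n_grad_calls += 2
    # Symmetrise to remove small numerical asymmetry.
    return 0.5 * (hessian + hessian.T), n_grad_calls


rng = np.random.default_rng(0)
n_atoms = 3
A = rng.normal(size=(3 * n_atoms, 3 * n_atoms))
K = A @ A.T                                   # symmetric toy force-constant matrix
x0 = rng.normal(size=3 * n_atoms)             # flattened Cartesian coordinates

H_fd, n_calls = finite_difference_hessian(lambda x: energy_gradient(x, K), x0)
```

For this toy energy the finite-difference result reproduces `K`; with a neural force field each of the `n_calls` gradient evaluations is a full backward pass, which is the cost that grows with system size.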

Deep Equilibrium Models Boost ML Force Field Efficiency by Recycling Temporal Features

Recasting equivariant ML force fields as deep equilibrium models (DEQs) exploits temporal continuity in molecular dynamics, recycling intermediate features from prior timesteps. This yields 10-20% gains in accuracy and speed over baseline models on MD17, MD22, and OC20 200k datasets. DEQ training is more memory-efficient, enabling expressive models on larger systems.
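
The benefit of recycling features across timesteps can be seen in a toy fixed-point solver. This is a sketch of the warm-start idea only, not an equivariant force field: the 2x2 weight matrix, the `tanh` layer, and the slowly varying input trajectory are assumptions made for illustration.

```python
import numpy as np

# Toy weight matrix with spectral norm below 1, so z -> tanh(W z + x)
# is a contraction and has a unique fixed point.
W = np.array([[0.3, -0.2],
              [0.1,  0.4]])


def solve_fixed_point(x, z_init, tol=1e-10, max_iter=1000):
    """Iterate z <- tanh(W z + x) until the update is below tol."""
    z = z_init
    for k in range(1, max_iter + 1):
        z_next = np.tanh(W @ z + x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_iter


# Slowly varying inputs, standing in for consecutive MD frames.
times = np.arange(50) * 0.01
inputs = [np.array([np.sin(t), np.cos(t)]) for t in times]

cold_iters = 0
warm_iters = 0
z_prev = np.zeros(2)
for x in inputs:
    # Cold start: solve from zeros at every timestep.
    _, k_cold = solve_fixed_point(x, np.zeros(2))
    # Warm start: reuse the previous timestep's fixed point.
    z_prev, k_warm = solve_fixed_point(x, z_prev)
    cold_iters += k_cold
    warm_iters += k_warm
```

Because consecutive inputs differ only slightly, the previous fixed point is already close to the new one, and the warm-started solver needs fewer iterations in total.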

Springs-and-Sticks Dynamical System Physically Approximates Continuous Functions with MLP-Comparable Performance

A dynamical system of sticks and springs approximates continuous functions via piecewise-linear stick configurations, with spring potential energy encoding MSE loss minimized through dissipation. Applied to regression, it matches multi-layer perceptron performance. Free energy changes relate to learning data distributions, but environmental fluctuations impose a thermodynamic learning barrier limiting free energy reduction and thus learning capability.
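
A minimal numerical analogue, assuming fixed knot positions, unit spring constants, and overdamped (gradient-flow) relaxation as a stand-in for dissipation; it is not the paper's physical model. The knot heights define a piecewise-linear curve, the spring energy is half the mean squared error, and relaxation lowers that energy.

```python
import numpy as np

# Data to fit.
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)

# Stick endpoints: fixed horizontal positions, free vertical heights.
knots = np.linspace(0.0, 2.0 * np.pi, 9)
heights = np.zeros_like(knots)

# Hat-function basis: column k is the piecewise-linear curve that equals 1
# at knot k and 0 at every other knot, evaluated at the data points.
identity = np.eye(len(knots))
basis = np.stack([np.interp(x, knots, identity[k]) for k in range(len(knots))],
                 axis=1)


def spring_energy(h):
    """Half the MSE: one unit spring between each data point and the sticks."""
    residual = basis @ h - y
    return 0.5 * np.mean(residual ** 2)


initial_energy = spring_energy(heights)

# Overdamped relaxation: heights move down the energy gradient.
step = 5.0
for _ in range(2000):
    grad = basis.T @ (basis @ heights - y) / len(x)
    heights -= step * grad

final_energy = spring_energy(heights)
```

The relaxed heights give the least-squares piecewise-linear fit for these knots, which is the same function class a one-hidden-layer ReLU network represents in one dimension.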

Generative AI Shifts MOF Design from Enumeration to Autonomous Laboratory Synthesis

Generative AI models including VAEs, diffusion models, and LLM-based agents leverage growing MOF datasets to propose novel porous reticular structures. These tools integrate with high-throughput screening and automated experiments in closed-loop pipelines, accelerating discovery for clean air and energy applications. Challenges persist in synthetic feasibility, dataset diversity, and domain knowledge integration.

TreeReader Enables Efficient Hierarchical Navigation of Academic Papers via LLM-Generated Interactive Summaries

TreeReader decomposes academic papers into an interactive tree structure, with LLM-generated concise summaries for each section and on-demand access to underlying details. This addresses cognitive overload from linear formats like PDF/HTML and limitations of LLM chatbots, such as poor sectional nuance and lack of navigation. A user study confirms improved reading efficiency and comprehension through focused exploration and source verification.
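
The tree-with-summaries idea can be sketched as a small data structure. The class and function names are hypothetical, and the summaries are hard-coded strings where a real system would call an LLM.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SectionNode:
    title: str
    summary: str                     # would be LLM-generated in practice
    content: str = ""                # full source text, shown only on demand
    children: list[SectionNode] = field(default_factory=list)


def outline(node, max_depth, depth=0):
    """Return indented 'title: summary' lines down to max_depth."""
    lines = [f"{'  ' * depth}{node.title}: {node.summary}"]
    if depth < max_depth:
        for child in node.children:
            lines.extend(outline(child, max_depth, depth + 1))
    return lines


def expand(node, path):
    """Follow a list of child titles and return that section's full text."""
    for title in path:
        node = next(c for c in node.children if c.title == title)
    return node.content


# Hypothetical paper with placeholder text.
paper = SectionNode(
    title="Paper",
    summary="One-line summary of the whole paper.",
    children=[
        SectionNode("Methods", "How the study was run.", "Full methods text."),
        SectionNode("Results", "What was found.", "Full results text."),
    ],
)

top_level = outline(paper, max_depth=0)
one_level = outline(paper, max_depth=1)
methods_text = expand(paper, ["Methods"])
```

Keeping the full text on each node lets a reader verify any summary against its source without leaving the tree.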

SynTwins Bridges AI Molecule Design and Synthetic Accessibility via Retrosynthesis-Guided Analog Search

SynTwins introduces a deterministic, retrosynthesis-guided framework that generates synthetically accessible molecular analogs by performing retrosynthesis on a target, searching for similar building blocks, and executing virtual forward synthesis. Unlike stochastic ML generators, it uses search algorithms to outperform SOTA models in producing feasible analogs with high structural similarity to targets. Integration into property-optimization pipelines yields feasible molecules with minimal property degradation, validated across diverse datasets.

Penalty Projections Enable Efficient Quantum Solving of Differential Equations with Arbitrary Boundary Conditions

The method enforces arbitrary boundary conditions in quantum algorithms for differential equations by augmenting the governing equations with penalty projections. Assuming a fast-forwardable projection representation, the gate complexity overhead scales as O(log λ) with penalty strength λ, or worst-case O((‖v(0)‖² ‖A₀‖ + ‖b‖²_{L¹[0,t]}) t² / ε) for precision ε in systems dv/dt = A₀(t) v(t) + b(t) with negative semidefinite A₀. For the heat equation, this yields Õ(d log n + log t) gate complexity. Constraint error bounds are proven, with numerical validation and circuit estimates via linear combination of Hamiltonian simulation.
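
A classical toy version shows how a penalty term drives the solution toward the constrained subspace. The sketch assumes the penalised system takes the form dv/dt = (A₀ − λP) v with b = 0, a constant A₀ given by a discrete Laplacian, and P projecting onto the two boundary nodes; these choices are illustrative and are not taken from the paper.

```python
import numpy as np

n = 16

# A0: discrete Laplacian with free ends (symmetric, negative semidefinite).
A0 = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
A0[0, 0] = A0[-1, -1] = -1.0

# P projects onto the boundary nodes, where the constraint is v = 0.
P = np.zeros((n, n))
P[0, 0] = P[-1, -1] = 1.0

v0 = np.ones(n)   # initial state that violates the boundary condition
t = 1.0


def evolve(lam):
    """Exact v(t) for dv/dt = (A0 - lam * P) v via eigendecomposition."""
    evals, evecs = np.linalg.eigh(A0 - lam * P)
    return evecs @ (np.exp(evals * t) * (evecs.T @ v0))


penalties = [1.0, 10.0, 100.0, 1000.0]
constraint_errors = [np.linalg.norm(P @ evolve(lam)) for lam in penalties]
```

The boundary violation ‖P v(t)‖ shrinks as λ grows, so the unconstrained evolution approximates the zero-boundary problem increasingly well at the price of a larger penalty strength.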

Reinforcement Learning Transformer Outperforms Baselines in Quantum Circuit Compilation for Neutral Atom Arrays

QC-Daemon, a transformer-based reinforcement learning agent, compiles quantum circuits by solving the Atom Game for reconfigurable neutral atom arrays, optimizing atom layouts for parallel circuit execution. Trained on diverse circuits with physically motivated architectures, it reduces logarithmic infidelity on benchmarks up to 100 qubits. The approach demonstrates transferability, generalizing to unseen circuits without retraining.

RoboCulture Enables Cost-Effective Robotic Automation of Long-Duration Biological Experiments

RoboCulture is a flexible, low-cost platform using a general-purpose robotic manipulator to automate biological workflows, addressing limitations of current liquid handlers that require human intervention for plate loading, tip replacement, and calibration. It integrates liquid handling, lab equipment interaction, and computer vision for real-time optical density-based growth monitoring with force feedback. The system employs a modular behavior tree framework to robustly execute a fully autonomous 15-hour yeast culture experiment.
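
A behavior tree composes small actions with control nodes that decide what to try next. The sketch below shows the two standard control nodes, Sequence and Fallback, in a simplified form without a RUNNING status; the step names and their outcomes are hypothetical and do not come from RoboCulture.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Leaf node: runs a callable that returns True on success."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def tick(self, log):
        log.append(self.name)
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence:
    """Runs children in order; fails as soon as one child fails."""

    def __init__(self, children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Tries children in order; succeeds as soon as one child succeeds."""

    def __init__(self, children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


# Hypothetical pipetting routine: the first tip attachment fails,
# the recovery branch succeeds, and the sequence continues.
tree = Sequence([
    Fallback([
        Action("attach_tip", lambda: False),
        Action("recalibrate_then_attach_tip", lambda: True),
    ]),
    Action("aspirate_medium", lambda: True),
    Action("dispense_into_well", lambda: True),
])

log = []
result = tree.tick(log)
```

Placing recovery behaviours in Fallback nodes keeps error handling local to the step that failed, which is one reason behavior trees suit long unattended runs.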

Quetzal: Scalable Autoregressive Model Outperforms Diffusion in 3D Molecule Generation

Quetzal is a scalable autoregressive model that generates 3D molecules atom-by-atom using a causal transformer for discrete atom types and a diffusion MLP for continuous positions. It surpasses existing autoregressive baselines in generation quality, matches state-of-the-art diffusion models, and enables faster generation via fewer transformer passes and exact likelihood computation. The architecture natively supports variable-size tasks like hydrogen decoration without modifications.

El Agente Q: LLM-Powered Autonomous Agent Democratizes Quantum Chemistry Workflows

El Agente Q is an LLM-based multi-agent system that interprets natural language prompts to autonomously generate, execute, and debug quantum chemistry workflows using a hierarchical memory architecture for task decomposition, tool selection, and file management. Benchmarked on six university-level exercises and two case studies, it achieves over 87% average task success with adaptive in situ error handling and support for multi-step executions. This enables accessible quantum chemistry for non-specialists while providing transparent action logs for experts.