absorb.md

Alán Aspuru-Guzik

Chronological feed of everything captured from Alán Aspuru-Guzik.

Infusing Real-World Patterns into Synthetic Data Enables Zero-Shot Material State Segmentation

Researchers bridge the gap between synthetic data's precision and real-world diversity by automatically extracting patterns from natural images and implanting them into synthetic scenes for material state segmentation. This unsupervised method generates scalable datasets capturing complex material variations like wet, dry, infected, or burned states across domains including food, soils, plants, and liquids. They introduce the first comprehensive zero-shot benchmark, where top foundation models underperform but models trained on infused data excel; resources like 300k textures and models are released.

Diffusion Model Enables Discovery of Large-Scale Surface Structures with Periodic Constraints

Researchers introduce a generative diffusion model tailored for surface structure discovery, incorporating substrate registry, periodicity via masked atoms, and z-directional confinement using a rotationally equivariant neural network. The model trains a denoiser alongside a force field for guided low-energy sampling, augmented by a data scheme that enables generation beyond the structure sizes seen in training. Demonstrated on multiple systems, it proposes a novel atomistic model for a large, previously unknown silver-oxide domain boundary.

Quantum Transformer Enables Potential Speedup for LLM Inference via Fault-Tolerant Quantum Computing

Researchers develop quantum subroutines for transformer components including self-attention, residual connections with layer normalization, and feed-forward networks, using efficient quantum implementations of Hadamard products and element-wise matrix functions. The algorithm outputs an amplitude-encoded transformer layer result for measurement or further processing, with quantum complexity dominated by the input sequence's matrix norm. Numerical experiments on open-source LLMs, including bioinformatics applications, demonstrate feasible quantum speedups in practical regimes.

Quantum Generative Model on 16-Qubit IBM Computer Yields Novel KRAS Inhibitors with Experimental Validation

A hybrid quantum-classical generative model, trained on a 16-qubit IBM quantum computer, generated novel small molecules for KRAS inhibition in cancer therapy. Of 15 synthesized candidates, two—ISM061-018-2 and ISM061-22—demonstrated target engagement: ISM061-018-2 as a broad-spectrum inhibitor with 1.4 μM affinity to KRAS-G12D, and ISM061-22 showing selectivity for G12R and Q61H mutants. This marks the first experimental confirmation of hits from a quantum-generative model, with efficacy scaling with qubit count.

LLMs Enhance Bayesian Optimization for Molecules Only with Domain-Specific Pretraining or Finetuning

LLMs serve as fixed feature extractors for principled BO surrogate models, including Bayesian neural networks via parameter-efficient finetuning, to provide uncertainty estimates in molecular optimization. Extensive real-world chemistry experiments demonstrate LLMs accelerate BO effectively, but solely when pretrained or finetuned on domain-specific data. Prior approaches using non-Bayesian LLMs for heuristics fall short of this rigorous integration.

Quantum Computers Efficiently Simulate Chemically Relevant Dynamics via Scattering Trees

Quantum computers can simulate chemical systems with polynomially scaling circuit sizes by using heuristically guided routines for initial-state preparation and dynamics. The approach assembles good initial states in a scattering tree under stated assumptions, enabling mergo-association simulations. Post-simulation measurements yield quantities like reaction outcomes, bypassing ground-state challenges and exponential classical bottlenecks.

Generative Quantum Eigensolver Surpasses CCSD in Nitrogen Bond Dissociation via Transformer-Based Circuit Generation

The Generative Quantum Eigensolver (GQE) optimizes classical generative models, implemented as a transformer-based GPT-QE, to output quantum circuits for ground state energy estimation, bypassing variational quantum algorithms. GPT-QE is pretrained and fine-tuned on electronic structure Hamiltonians, achieving energies that exceed CCSD accuracy for N2 strong bond dissociation and approach chemical accuracy. The method is validated on real quantum hardware, demonstrating practical feasibility.

ORGANA: LLM-Driven Robotic System Automates Diverse Chemistry Experiments with Human-in-the-Loop Control

ORGANA is a robotic assistant that leverages LLMs for decision-making, perception, and human interaction to automate labor-intensive chemistry tasks like electrode polishing in electrochemistry. It supports planning, execution with visual feedback, scheduling, and parallel operations across experiments including solubility, pH measurement, recrystallization, and a 19-step electrochemistry protocol for quinone derivatives. User studies confirm it cuts frustration and physical demand by over 50% while saving 80.3% of chemists' time on average.

RePLan Enables Adaptive Robotic Replanning via VLMs for Long-Horizon Tasks

RePLan integrates Vision Language Models (VLMs) for online replanning in robotics, using physical grounding of world states to adapt actions when initial LLM-generated plans fail due to imperfect planning or environmental issues. It bridges high-level reasoning from LLMs with low-level control via generated reward functions. Evaluated on a new Reasoning and Control (RC) benchmark with eight long-horizon tasks, RePLan outperforms baselines by adapting to unforeseen obstacles and applies to real robots.

nach0: Multimodal Foundation Model Excels in Chemical and Biological Tasks

nach0 is an encoder-decoder LLM pretrained on unlabeled scientific literature, patents, and molecule strings to integrate chemical and linguistic knowledge. It undergoes instruction tuning for tasks including biomedical QA, NER, molecular generation, synthesis, and property prediction. Trained with the NeMo framework, nach0 outperforms SOTA baselines on single- and cross-domain benchmarks, generating high-quality molecular and textual outputs.

GFlowNets Enable Boltzmann-Distributed Sampling of Equilibrium Molecular Conformations

GFlowNets sample diverse, low-energy conformations of small molecules directly from the Boltzmann distribution defined by molecular energy. The method integrates with energy estimation models of varying fidelity and excels at identifying thermodynamically feasible structures for highly flexible, drug-like molecules. Empirical results confirm accurate reproduction of molecular potential energy surfaces through proportional sampling.
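
The target distribution mentioned above can be made concrete with a small sketch. This is not a GFlowNet; it only shows the Boltzmann weighting that such a sampler is meant to reproduce, applied to a hypothetical list of conformer energies. The function names, the energy values, and the choice of kcal/mol units are assumptions made for illustration.

```python
import math
import random

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann probabilities for energies given in kcal/mol."""
    beta = 1.0 / (K_B_KCAL * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(unnormalized)  # partition function over the listed conformers
    return [w / z for w in unnormalized]


def sample_conformer(energies, rng, temperature=298.15):
    """Draw one conformer index with probability proportional to exp(-E / kT)."""
    weights = boltzmann_weights(energies, temperature)
    return rng.choices(range(len(energies)), weights=weights, k=1)[0]


# Hypothetical relative conformer energies (kcal/mol), not taken from the paper.
energies = [0.0, 0.5, 2.0]
probs = boltzmann_weights(energies)
index = sample_conformer(energies, random.Random(0))
```

Lower-energy conformers receive larger weights, and the ratio of any two probabilities depends only on their energy difference.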

KREED: Reflection-Equivariant Diffusion Resolves 3D Molecular Structures from Unsigned Isotopologue Spectra

KREED is a generative diffusion model that predicts complete 3D molecular structures from molecular formula, moments of inertia, and unsigned Kraitchman substitution coordinates of heavy atoms derived from natural-abundance rotational spectroscopy. It achieves >98% top-1 accuracy on QM9 and GEOM datasets using all heavy atom coordinates, retaining 91% on QM9 and 32% on GEOM with carbon subsets. Experimentally, it correctly identifies structures in 25/33 literature cases, enabling context-free structure determination.

Universal Quantum Wavelet Transform via LCU and Amplitude Amplification

The algorithm decomposes the wavelet transform kernel into a linear combination of unitaries (LCU) using modular quantum arithmetic, enabling probabilistic implementation with known success probability, then applies amplitude amplification for deterministic execution. It extends to multilevel and packet wavelet transforms with complexity logarithmic in matrix dimension N, linear in levels d, and superlinear in wavelet order M, but independent of M for practical cases. This generalizes prior QWTs limited to low-order Daubechies wavelets, positioning QWTs as versatile analogs to the quantum Fourier transform.
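
The "known success probability" follows from the standard linear-combination-of-unitaries construction, sketched below. The symbols $\alpha_j$, $U_j$, and $m$ are generic notation assumed here, not the paper's specific decomposition of the wavelet kernel.

```latex
W = \sum_{j=0}^{m-1} \alpha_j U_j, \qquad \alpha_j > 0, \qquad
\lVert \alpha \rVert_1 = \sum_{j=0}^{m-1} \alpha_j ,
\qquad
p_{\mathrm{succ}} = \frac{\lVert W \lvert \psi \rangle \rVert^{2}}{\lVert \alpha \rVert_1^{2}}
= \frac{1}{\lVert \alpha \rVert_1^{2}} \quad \text{when } W \text{ is unitary.}
```

Because $p_{\mathrm{succ}}$ does not depend on the input state when $W$ is unitary, amplitude amplification can be tuned to it in advance, using on the order of $\lVert \alpha \rVert_1$ rounds.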

Chemical Language Models Enable Atom-Level Protein Generation Beyond Standard Amino Acids

Chemical language models trained on atom-level representations of small molecules extend to proteins, generating complete protein structures atom by atom from primary sequences while capturing hierarchical secondary and tertiary structures. Unlike protein language models limited to standard amino acid vocabularies, these models produce proteins with modified sidechains forming unnatural amino acids, unconstrained by the genetic code. They further generate hybrid protein-drug conjugates by simultaneously exploring protein and chemical spaces, advancing atom-level biomolecular design.

MAPs Enable Accelerated Discovery for CO2 Photo(thermal)catalysis in Solar Fuels Production

Materials acceleration platforms (MAPs) integrate automation and AI to expedite materials discovery for heterogeneous CO2 photo(thermal)catalysis, targeting solar chemicals and fuels. The perspective highlights design and performance descriptors, levels of automation in experiments, and precedents for AI-based data analysis. It proposes a MAP framework for autonomous scale-up from discovery to deployment in this emerging field.

Unified Technical Framework for AI-Driven Modeling Across Quantum, Atomistic, and Continuum Scales

This review provides a comprehensive technical treatment of AI4Science focused on quantum (wavefunctions, electron density), atomistic (molecules, proteins, materials), and continuum (fluids, climate, subsurface) systems, highlighting their shared challenges. Core techniques emphasize equivariant deep learning to encode physical symmetries and first principles. Additional challenges addressed include explainability, OOD generalization, foundation model transfer, and uncertainty quantification, with curated resources for education.

Wavelet Preconditioning Yields Kappa-Independent Quantum Solver for PDEs

A novel quantum algorithm solves discretized PDEs with complexity polylogarithmic in the matrix size N and independent of the condition number κ. It uses a wavelet basis as auxiliary coordinates, which enables a simple diagonal preconditioner that makes the condition number of the preconditioned matrix independent of N. The algorithm generates a quantum state from which features of the solution can be extracted; numerical simulations validate it for various PDEs, and it could enhance the performance of quantum simulation.

Language Models Generate 3D Molecules, Crystals, and Protein Sites Directly from File Formats

Unmodified language models trained via next-token prediction on sequences from XYZ, CIF, and PDB files directly output valid 3D structures of molecules, crystals, and protein binding sites. This approach handles diverse chemical distributions beyond graph-representable organic molecules, eliminating the need for simplified string or graph encodings. Performance matches state-of-the-art graph-based and domain-specific 3D generative models.
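
To illustrate what training "directly on file formats" involves, here is a minimal sketch that flattens an XYZ file into a token sequence and parses a sequence back into atoms. The whitespace-level tokenization and both function names are assumptions made for this example; the paper's actual tokenizer may differ.

```python
def xyz_to_tokens(xyz_text):
    """Flatten an XYZ file into tokens, keeping each line break as a token."""
    tokens = []
    for line in xyz_text.strip().splitlines():
        tokens.extend(line.split())
        tokens.append("\n")
    return tokens


def tokens_to_atoms(tokens):
    """Parse tokens back into (element, x, y, z) tuples; raise ValueError if malformed."""
    lines, current = [], []
    for tok in tokens:
        if tok == "\n":
            lines.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        lines.append(current)
    if not lines or len(lines[0]) != 1 or not lines[0][0].isdigit():
        raise ValueError("first line must be the atom count")
    n_atoms = int(lines[0][0])
    atom_lines = lines[2:2 + n_atoms]  # line 2 of an XYZ file is a comment
    if len(atom_lines) != n_atoms:
        raise ValueError("fewer atom lines than declared")
    atoms = []
    for fields in atom_lines:
        if len(fields) != 4:
            raise ValueError("atom line must be: element x y z")
        element, x, y, z = fields
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


# Illustrative water geometry (coordinates in angstroms).
WATER_XYZ = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
tokens = xyz_to_tokens(WATER_XYZ)
atoms = tokens_to_atoms(tokens)
```

A generated sequence counts as valid in this sketch only if it parses back with the declared number of atoms, which is the kind of check a file-format-based generator needs at sampling time.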

Composite Measurement Scheme Boosts Efficient Quantum Observable Estimation on 30-Qubit Systems

A composite measurement scheme distributes measurement shots across multiple schemes using a trainable ratio to optimize expectation value estimation of quantum observables. Composite-LBCS, composing locally-biased classical shadows with Pauli measurements, outperforms prior state-of-the-art methods on molecular systems up to CO2 (30 qubits). The approach supports efficient stochastic gradient descent optimization, even for observables with many terms.

Verifier-Guided Iterative Prompting with Error Feedback Generates Accurate Domain-Specific Robot Task Plans

CLAIRIFY uses iterative prompting on large language models combined with program verification to produce syntactically valid task plans in data-scarce domain-specific languages from high-level natural language instructions. Errors from prior generations serve as feedback to guide the model, while a verifier enforces syntactic correctness and environment constraints. The method achieves state-of-the-art performance in chemistry experiment planning and supports real-robot execution via integration with task and motion planners.
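
The generate-verify-feedback loop described above can be sketched as follows. The toy plan language, the `verify` rules, and `toy_generator` (a stand-in for an LLM call) are all hypothetical and are not CLAIRIFY's actual DSL or interface.

```python
def verify(plan_lines, allowed_actions, available_objects):
    """Return a list of error messages; an empty list means the plan is valid."""
    errors = []
    for i, line in enumerate(plan_lines, start=1):
        parts = line.split()
        if not parts:
            continue
        action, args = parts[0], parts[1:]
        if action not in allowed_actions:
            errors.append(f"line {i}: unknown action '{action}'")
        for arg in args:
            if arg not in available_objects:
                errors.append(f"line {i}: unknown object '{arg}'")
    return errors


def generate_with_feedback(instruction, generate, allowed_actions,
                           available_objects, max_iters=5):
    """Call the generator until the verifier accepts, feeding errors back each round."""
    feedback = []
    for attempt in range(1, max_iters + 1):
        plan = generate(instruction, feedback)
        errors = verify(plan, allowed_actions, available_objects)
        if not errors:
            return plan, attempt
        feedback = errors  # errors from this round guide the next generation
    raise RuntimeError("no valid plan within the iteration budget")


def toy_generator(instruction, feedback):
    """Hypothetical stand-in for an LLM: corrects itself once it sees feedback."""
    if feedback:
        return ["add water beaker", "stir beaker"]
    return ["add water flask", "shake beaker"]


plan, attempts = generate_with_feedback(
    "dissolve the sample in water",
    toy_generator,
    allowed_actions={"add", "stir"},
    available_objects={"water", "beaker"},
)
```

In this toy run the first plan fails on an unknown object and an unknown action, and the second plan passes, so the loop stops after two attempts.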

Clifford Circuits Enable 50% Qubit Reduction in Partitioned Quantum Chemistry Simulations

The method partitions quantum chemistry simulations using classically efficient product ansatze like separable pair forms, combined with post-treatment via Clifford or near-Clifford circuits to handle subsystem interactions without exponential Hamiltonian growth. These entangling circuits, optimized via simulated annealing and genetic algorithms, are folded into the Hamiltonian for variational quantum eigensolver use. Numerical simulations on molecules demonstrate up to 50% qubit reduction at comparable accuracy to the baseline separable-pair ansatz.

qSWIFT: High-Order Randomized Hamiltonian Simulation with Term-Independent Gates and Exponential Error Suppression

qSWIFT is a high-order randomized algorithm for Hamiltonian simulation whose gate count is independent of the number of terms in the Hamiltonian and whose systematic error decays exponentially with the order parameter. It is a higher-order counterpart of qDRIFT, whose gate count scales linearly with the inverse of the required precision, and it comes with rigorous diamond-norm error bounds. Numerical results show that third-order qSWIFT requires 1000x fewer gates than qDRIFT to reach a relative error of 10^{-6}, using one ancilla qubit.
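
For context, the first-order qDRIFT baseline can be sketched in a few lines. This is not qSWIFT itself: it shows the standard qDRIFT recipe, in which terms are sampled with probability proportional to |h_j| and the gate count N = ceil(2 λ² t² / ε) depends on λ = Σ|h_j| rather than on the number of terms. Function names and the example coefficients are assumptions.

```python
import math
import random


def qdrift_gate_count(coeffs, t, eps):
    """Standard qDRIFT gate count N = ceil(2 * lambda^2 * t^2 / eps)."""
    lam = sum(abs(h) for h in coeffs)
    return math.ceil(2 * lam ** 2 * t ** 2 / eps)


def qdrift_sequence(coeffs, t, eps, rng):
    """Sample a qDRIFT sequence as (term index, signed rotation time) pairs."""
    lam = sum(abs(h) for h in coeffs)
    n = qdrift_gate_count(coeffs, t, eps)
    tau = lam * t / n  # every sampled gate evolves for the same duration
    weights = [abs(h) for h in coeffs]
    indices = rng.choices(range(len(coeffs)), weights=weights, k=n)
    # The sign of each coefficient is carried by the rotation direction.
    return [(j, math.copysign(tau, coeffs[j])) for j in indices]


# Hypothetical Hamiltonian coefficients h_j for H = sum_j h_j * P_j.
coeffs = [0.5, -0.25, 0.25]
sequence = qdrift_sequence(coeffs, t=1.0, eps=0.01, rng=random.Random(0))
```

Tightening the precision by a factor of ten multiplies this gate count by roughly ten, which is the linear scaling that a higher-order method aims to improve on.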

MVTrans Enables Robust Multi-View Transparent Object Perception via End-to-End Learning

MVTrans is an end-to-end multi-view RGB architecture that performs depth estimation, segmentation, and pose estimation for transparent objects, bypassing unreliable RGB-D depth maps. It extends stereo methods to handle multiple perception tasks simultaneously. The approach is supported by Syn-TODD, a large-scale synthetic dataset generated via a procedural photo-realistic pipeline compatible with RGB-D, stereo, and multi-view RGB training.

SELFIES Library Evolves to Robust, Efficient Molecular String Representation in Version 2.1.1

SELFIES provides a 100% robust string-based molecular representation immune to the syntactic and semantic errors that plague SMILES in generative ML models. The library has been generalized to broader molecule types and semantic constraints with a streamlined grammar. Version 2.1.1 of the selfies library delivers major improvements in design, efficiency, and features for cheminformatics pipelines.

Quantum Computing's High-Accuracy Promise for Transforming Drug Design

Quantum computers offer superior accuracy in quantum chemical calculations essential for industrial applications like drug design. This perspective analyzes the challenges and opportunities in deploying quantum hardware for pharmaceutical research. It identifies transformative potential in industrial workflows and outlines prerequisites for practical adoption.

Autonomous Robot Framework Automates Chemistry Experiments Using Constrained PDDLStream Planning

The framework ingests high-level experiment descriptions, perceives the lab workspace, and employs PDDLStream-based constrained task and motion planning to generate collision- and spillage-free multi-step actions. It enables robots to manipulate diverse lab equipment for executing experiments like pouring, solubility tests, and recrystallization. Demonstrated on fundamental materials synthesis tasks, it accelerates chemist workflows by automating laborious procedures.

GAUCHE Library Enables Gaussian Processes for Uncertainty Quantification and Bayesian Optimization on Chemical Structures

GAUCHE is a specialized library that implements Gaussian process kernels for chemical representations including graphs, strings, and bit vectors. It facilitates uncertainty quantification and Bayesian optimization in chemistry by extending GPs to structured molecular data. Demonstrated applications target molecular discovery and chemical reaction optimization, with open-source code available on GitHub.
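
As an illustration of a kernel on bit-vector representations, here is a plain-Python sketch of the Tanimoto (Jaccard) kernel, a common choice for molecular fingerprints. It does not use GAUCHE's API; the function names and the convention for two all-zero vectors are assumptions.

```python
def tanimoto(x, y):
    """Tanimoto similarity <x,y> / (|x|^2 + |y|^2 - <x,y>) for fingerprint vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumed convention: two all-zero fingerprints are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def gram_matrix(fingerprints):
    """Build the kernel (Gram) matrix that a GP would use as its covariance."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical 4-bit fingerprints.
fps = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
K = gram_matrix(fps)
```

The first two fingerprints share one of three set bits, so their similarity is 1/3, while the first and third share none and get 0.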

DIONYSUS Benchmark Reveals Probabilistic Models' Calibration and Generalization Limits on Small Chemical Datasets

Deep learning excels on large molecular datasets but its efficacy on small ones (<2000 molecules) remains unclear. This study benchmarks probabilistic ML models across representations and tasks (binary classification, regression) for prediction quality, calibration, and uncertainty on low-data chemical datasets. It introduces simulated tests for Bayesian optimization in molecular design and out-of-distribution inference via ablated cluster splits, providing guidance on optimal model and feature choices. The open-source DIONYSUS repository enables reproducibility and extension.
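
Calibration for binary classifiers is often summarized with the expected calibration error (ECE). The sketch below shows one common binned definition; it is offered as an illustration and is not necessarily the exact metric used in DIONYSUS. The function name and the default of ten bins are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary classification; probs are predicted P(y = 1)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # Weight each bin's gap by the share of samples it contains.
        ece += (len(members) / n) * abs(mean_pred - frac_pos)
    return ece


perfect = expected_calibration_error([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A model that predicts 0.9 for four molecules of which three are positive is off by 0.15 in that bin, whereas perfectly confident and correct predictions give an ECE of zero.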

Physics-Based Rendering and Contrastive Learning Enable One-Shot Material Recognition Across Arbitrary Conditions

MatSim dataset combines synthetic images from physics-based rendering of vast texture collections, objects, and environments with natural images to benchmark few-shot recognition of material similarities, transitions, and states. A siamese network trained via contrastive learning on MatSim generates material descriptors that identify states and subclasses from single images, handling mixtures, containers, and diverse environments. This approach outperforms CLIP on a new few-shot benchmark spanning food, beverages, chemistry, and terrain, with strong generalization to unsupervised tasks.

Waveflow Enables Expressive Antisymmetric Wavefunctions via Boundary-Conditioned Normalizing Flows

Waveflow constructs antisymmetric fermionic wavefunctions using boundary-conditioned normalizing flows on the fundamental domain, bypassing Slater determinants for greater expressiveness in complex many-body systems. It resolves topological mismatches between prior and target distributions with O-spline priors and I-spline bijections, preserving square-normalization. Applied to 1D many-electron systems via VQMC, it accurately learns ground-state wavefunctions.

Group SELFIES Enhances Molecular Generation via Robust Group Tokens

Group SELFIES extends SELFIES by incorporating group tokens for functional groups and substructures, preserving chemical validity guarantees while adding flexibility through molecular fragment inductive biases. It outperforms standard SELFIES in distribution learning on common molecular datasets and yields higher-quality molecules from random sampling. Open-source implementation supports further research in generative molecular design.

Quantum GANs with Variational Circuits Outperform Classical Counterparts in Small Molecule Generation

Hybrid quantum-classical GANs replace GAN components with variational quantum circuits (VQCs), demonstrating quantum advantages in de novo small molecule discovery for drug design. VQCs in the noise generator produce molecules with superior physicochemical properties and goal-directed benchmark performance compared to classical GANs. Quantum discriminators and generators with only tens of learnable parameters achieve better molecule validity, properties, and KL divergence than MLP-based models, reducing parameter counts significantly.

Machine Learning Accelerates Renewable Energy Advances Across Materials, Devices, and Systems

Machine learning leverages data trends to predict material properties, generate structures, and optimize processes, integrating into energy discovery pipelines for faster progress. The review covers ML applications in photovoltaics, batteries, electrocatalysis, and smart grids, with key performance indicators to evaluate workflow benefits. Future challenges include advancing ML techniques to maximize impact on sustainable energy transitions.

Dynamical Lie Rank Guides Efficiency in Parameterized Variational Quantum Circuits

Researchers propose using the rank of the dynamical Lie algebra from layer generators to characterize variational quantum circuits for ground-state energy calculations. Higher Lie rank correlates with improved energy accuracy and reduced circuit depth needed, even with parameter counts below generator term numbers. Exponential computation cost is mitigated by using initial iteration growth rate as a lower-bound proxy, positioning Lie rank as a key circuit design metric.
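
To make the dynamical Lie algebra concrete, the sketch below computes the dimension of the Lie closure in a restricted setting where every generator is a single Pauli string. In that case commutators of generators are again Pauli strings up to a phase, so the closure can be found symbolically. The restriction and all function names are assumptions; generators that are sums of Pauli terms need a linear-algebra treatment instead.

```python
# Single-qubit Pauli products, ignoring phases.
_MUL = {
    ("X", "Y"): "Z", ("Y", "X"): "Z",
    ("Y", "Z"): "X", ("Z", "Y"): "X",
    ("Z", "X"): "Y", ("X", "Z"): "Y",
}


def pauli_product(p, q):
    """Multiply two Pauli strings position by position, dropping the phase."""
    out = []
    for a, b in zip(p, q):
        if a == "I":
            out.append(b)
        elif b == "I":
            out.append(a)
        elif a == b:
            out.append("I")
        else:
            out.append(_MUL[(a, b)])
    return "".join(out)


def anticommute(p, q):
    """Two Pauli strings anticommute iff they clash on an odd number of sites."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def dla_dimension(generators):
    """Dimension of the Lie closure of a set of Pauli-string generators."""
    algebra = set(generators)
    frontier = list(algebra)
    while frontier:
        new = []
        for p in frontier:
            for q in list(algebra):
                # A nonzero commutator of Pauli strings is proportional to their product.
                if anticommute(p, q):
                    r = pauli_product(p, q)
                    if r not in algebra:
                        algebra.add(r)
                        new.append(r)
        frontier = new
    return len(algebra)


single_qubit = dla_dimension(["X", "Z"])           # closure is {X, Y, Z}
ising_two_qubit = dla_dimension(["XI", "IX", "ZZ"])
```

The closure can grow exponentially with qubit count in general, which is why a cheaper proxy such as the early growth rate is attractive.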

Tartarus: Realistic Benchmark Suite for Inverse Molecular Design via Physical Simulations

Tartarus introduces practical benchmark tasks for inverse molecular design using physical simulations that mimic real-world problems in materials, drugs, and chemical reactions. It addresses the lack of realistic benchmarks despite advances in AI-driven algorithms for chemical space exploration. Performance of established algorithm families varies significantly across benchmark domains, highlighting the need for domain-specific evaluation.

Quantum Iterative Power Algorithms Outperform Existing Variational Quantum Optimizers

QIPA introduces a family of hybrid variational quantum algorithms that surpass current near-term quantum optimization methods. It is demonstrated on three problems: ground-state calculations for H2 molecular dissociation, a search for the transmon qubit ground state, and biprime factorization. The algorithms use shallow circuits compatible with error mitigation and adaptive ansatzes, targeting scalable NISQ implementation.

Information Flow Graphs Optimize Parameterized Quantum Circuits via Mutual Information Paths

The method models parameterized quantum circuits as graphs where mutual information between gate nodes defines a distance metric for path-based optimization in variational algorithms. Applied to VQE, it computes Heisenberg model ground states; for VQC, it solves binary classification. Numerical simulations confirm improved convergence for near-term quantum algorithms, enhancing stochastic gradient methods.

Evolutionary Algorithms Enable Classically Simulable Quantum Autoencoders for Efficient State Compression

Researchers introduce evolutionary algorithms to design quantum autoencoders that compress quantum states into lower-dimensional representations, reducing resource needs on noisy quantum devices. The method successfully compresses families of quantum states using circuits with restricted gate sets for efficient classical simulation. This hybrid approach leverages classical computation to optimize quantum data representations with minimal resources.

Quantum-Inspired Cluster Expansion Accelerates Materials Discovery 10-50x Over Classical Optimizers

A quantum-inspired superposition technique combined with cluster expansion enables mapping chemical space exploration to quantum annealers, overcoming prior compatibility issues. This method searches for optimal materials 10-50 times faster than genetic algorithms and Bayesian optimization, with superior ground state prediction accuracy. Applied to acidic OER catalysts, it identifies a novel Ru-Cr-Mn-Sb-O2 family where the top performer exhibits 8x higher mass activity than RuO2 and stability over 180 hours at 10 mA/cm² in 0.5 M H2SO4.

Tunable Three-Body Interactions Demonstrated in Superconducting Flux Qubits via Coupling Module

Researchers demonstrate a superconducting circuit architecture where a coupling module mediates both 2-local and 3-local interactions between three flux qubits. The system Hamiltonian is characterized using multi-qubit Ramsey-type interferometry across excitation manifolds. The 3-local interaction is coherently tunable over several MHz via coupler flux biases and can be fully turned off, enabling applications in quantum annealing, analog simulation, and gate-based computation.

AI's Path to Scientific Understanding: From Computational Microscopes to Autonomous Agents

Advanced AI systems can contribute to scientific understanding through three dimensions: acting as computational microscopes to reveal hidden mechanisms, serving as sources of inspiration for new concepts, and potentially evolving into autonomous agents of understanding. The paper draws from philosophy of science and anecdotes from scientists to define these roles, highlighting current limitations and future research directions. Achieving true AI-driven scientific comprehension requires moving beyond prediction to mechanistic explanation, positioning AI as a pathway to artificial scientists.

SELFIES Surpasses SMILES as Robust Molecular Language for AI-Driven Chemistry

SELFIES, introduced in 2020, ensures 100% valid molecular representations, overcoming SMILES' key limitation where most symbol combinations yield invalid chemistries. This robustness has enabled new AI/ML applications in property prediction, reaction discovery, and molecule design. The paper outlines 16 future projects to extend SELFIES to new domains and enhance AI interpretability.

Extending Phoenics and Gryffin for Constrained Bayesian Optimization in Chemistry

Phoenics and Gryffin algorithms are extended to handle arbitrary known experimental and design constraints via an intuitive interface, addressing non-linear, interdependent constraints in chemical optimization domains. Benchmarks on continuous and discrete test functions demonstrate flexibility and robustness. Applications include optimizing o-xylenyl Buckminsterfullerene adduct synthesis under flow constraints and designing redox-active molecules for flow batteries under synthetic accessibility limits, enabling model-based optimization in autonomous scientific platforms.
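
A known constraint can be thought of as a user-supplied feasibility predicate over the parameter space. The sketch below uses simple rejection sampling to propose only feasible candidates; it is not how Phoenics or Gryffin handle constraints internally, and the function names, bounds, and example constraint are hypothetical.

```python
import random


def sample_feasible(bounds, is_feasible, n, rng, max_tries=10000):
    """Draw n points uniformly within bounds, keeping only those that satisfy is_feasible."""
    points = []
    tries = 0
    while len(points) < n:
        if tries >= max_tries:
            raise RuntimeError("feasible region too small for rejection sampling")
        tries += 1
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if is_feasible(x):
            points.append(x)
    return points


def flow_constraint(x):
    """Hypothetical interdependent constraint: two flow fractions must sum to at most 1."""
    return x[0] + x[1] <= 1.0


candidates = sample_feasible(
    bounds=[(0.0, 1.0), (0.0, 1.0)],
    is_feasible=flow_constraint,
    n=20,
    rng=random.Random(0),
)
```

An optimizer would then evaluate its acquisition function only over such feasible candidates, so no infeasible experiment is ever proposed.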

Molecular Electronics Enables Scalable Quantum Gates via Electron Scattering

This work proposes quantum computing using molecular electronics, implementing one-qubit gates through one-electron scattering in molecules and two-qubit controlled-phase gates via electron-electron scattering along metallic leads. It introduces a class of circuit implementations and demonstrates one-qubit gates with molecular hydrogen's electronic structure as a baseline. The framework bridges molecular physics and quantum computing for potential scalable hardware.

Hierarchical RL Enables Scalable Fragment-Based 3D Molecular Generation

This work introduces a reinforcement learning framework that generates molecules as 3D structures by sequentially placing molecular fragments rather than individual atoms, leveraging chemist expertise for efficiency. Guided solely by energy-based rewards, the hierarchical agent produces complex molecules exceeding 100 atoms, including drug-like, OLED, and biomolecular distributions. This addresses limitations of prior string/graph-based generative models that neglect 3D geometry critical for applications like drug discovery.

AlphaFold Enables 30-Day Discovery of First-in-Class CDK20 Inhibitor with Submicromolar Potency

AlphaFold-predicted structures powered an end-to-end AI drug discovery pipeline using PandaOmics for target selection and Chemistry42 for generative molecule design, yielding a CDK20 hit (Kd 8.9 μM) after synthesizing 7 compounds in 30 days. A second AI iteration produced a more potent analog (ISM042-2-048, Kd 210 nM) after 6 syntheses in another 30 days. This marks the first reported small-molecule CDK20 inhibitor and the inaugural use of AlphaFold in early-stage hit identification for a novel target lacking experimental structure.

Simple Language Models Outperform Graph Models in Learning Complex Molecular Distributions

Simple recurrent neural network language models, using string representations of molecules, effectively learn complex molecular distributions that challenge graph generative models. They excel on tasks like generating the highest-scoring penalized LogP molecules from ZINC15, multi-modal distributions, and the largest molecules in PubChem. Results show language models achieve superior performance compared to widely used graph models, particularly highlighting their strength in low-data regimes.

QNODE: Latent Neural ODEs Discover Quantum Dynamics and Laws from Expectation Value Data Alone

QNODE employs a latent neural ODE to model quantum dynamics from expectation values of closed and open systems, learning trajectories that satisfy the von Neumann and Lindblad master equations without supervision. It extrapolates beyond the training data, rediscovers Heisenberg's uncertainty principle in a purely data-driven way without imposed constraints, and generates physically consistent trajectories in which proximity in latent space implies dynamical similarity. This helps overcome the curse of dimensionality in quantum systems for machine-assisted physics discovery.

Robust Interval Guarantees for Quantum Measurement Outputs on NISQ Approximate States

The work derives certified robustness intervals that contain the ideal measurement outcome despite state preparation and evolution errors on NISQ-era devices. The bounds come from semidefinite programs that use first moments and fidelity, and from higher moments via the non-negativity of Gram matrices, generalized to mixed states. The approach is demonstrated on VQE simulations, supporting more reliable near-term quantum applications.

R12-Correction Boosts VQE Accuracy for Molecules Without Extra Quantum Resources

The variational quantum eigensolver (VQE) is enhanced by a perturbative explicitly correlated [2]_R12-correction that significantly improves accuracy using only the converged one- and two-particle RDMs from the reference wavefunction, combined with molecular integrals. This classical post-processing step scales as the sixth power of the number of electrons and requires no additional quantum measurements. MRA-PNOs as complementary basis functions enable highly accurate molecular simulations at minimal basis set quantum cost while reducing the cubic complexity of the correction computation.