Chronological feed of everything captured from Alán Aspuru-Guzik.
paper / aspuruguzik / Mar 5
Researchers bridge the gap between synthetic data's precision and real-world diversity by automatically extracting patterns from natural images and implanting them into synthetic scenes for material state segmentation. This unsupervised method generates scalable datasets capturing complex material variations like wet, dry, infected, or burned states across domains including food, soils, plants, and liquids. They introduce the first comprehensive zero-shot benchmark, where top foundation models underperform but models trained on infused data excel; resources like 300k textures and models are released.
zero-shot-segmentation material-states synthetic-data computer-vision texture-generation arxivr-paper
“Synthetic data lacks real-world diversity, limiting its effectiveness for material state recognition tasks.”
paper / aspuruguzik / Feb 27
Researchers introduce a generative diffusion model tailored for surface structure discovery, incorporating substrate registry, periodicity via masked atoms, and z-directional confinement using a rotational equivariant neural network. The model trains a denoiser alongside a force-field for guided low-energy sampling, augmented by a data scheme enabling generation beyond training structure sizes. Demonstrated on multiple systems, it proposes a novel atomistic model for a large, previously unknown silver-oxide domain-boundary.
diffusion-model generative-ai surface-structures computational-physics materials-science neural-network structure-discovery
“The generative model accounts for substrate registry and periodicity using masked atoms and z-directional confinement.”
paper / aspuruguzik / Feb 26
Researchers develop quantum subroutines for transformer components including self-attention, residual connections with layer normalization, and feed-forward networks, using efficient quantum implementations of Hadamard products and element-wise matrix functions. The algorithm outputs an amplitude-encoded transformer layer result for measurement or further processing, with quantum complexity dominated by the input sequence's matrix norm. Numerical experiments on open-source LLMs, including bioinformatics applications, demonstrate feasible quantum speedups in practical regimes.
quantum-transformer quantum-computing transformer-inference quantum-linear-algebra quantum-machine-learning llm-acceleration quantum-ai
“Quantum subroutines can implement transformer building blocks: self-attention, residual connection with layer normalization, and feed-forward network.”
paper / aspuruguzik / Feb 13 / failed
A hybrid quantum-classical generative model, trained on a 16-qubit IBM quantum computer, generated novel small molecules for KRAS inhibition in cancer therapy. Of 15 synthesized candidates, two—ISM061-018-2 and ISM061-22—demonstrated target engagement: ISM061-018-2 as a broad-spectrum inhibitor with 1.4 μM affinity to KRAS-G12D, and ISM061-22 showing selectivity for G12R and Q61H mutants. This marks the first experimental confirmation of hits from a quantum-generative model, with efficacy scaling with qubit count.
quantum-computing drug-discovery kras-inhibitors generative-models quantum-algorithms hybrid-quantum-classical
“The hybrid quantum-classical generative model was trained on a 16-qubit IBM quantum computer.”
paper / aspuruguzik / Feb 7
LLMs serve as fixed feature extractors for principled BO surrogate models, including Bayesian neural networks via parameter-efficient finetuning, to provide uncertainty estimates in molecular optimization. Extensive real-world chemistry experiments demonstrate that LLMs accelerate BO effectively, but only when pretrained or finetuned on domain-specific data. Prior approaches that use non-Bayesian LLMs as heuristics fall short of this rigorous integration.
llms bayesian-optimization material-discovery molecules feature-extractors bayesian-neural-networks
“LLMs are useful for accelerating principled Bayesian optimization over molecular spaces”
paper / aspuruguzik / Jan 17
Quantum computers can simulate chemical systems with polynomially scaling circuit sizes by using heuristically guided routines for initial-state preparation and dynamics. The approach assembles good initial states in a scattering tree under stated assumptions, enabling mergo-association simulations. Post-simulation measurements yield quantities like reaction outcomes, bypassing ground-state challenges and exponential classical bottlenecks.
quantum-computing chemical-simulation quantum-algorithms state-preparation scattering-tree quantum-chemistry
“Quantum circuits for chemically motivated simulation problems scale polynomially in relevant system parameters.”
paper / aspuruguzik / Jan 17
The Generative Quantum Eigensolver (GQE) optimizes classical generative models, implemented as a transformer-based GPT-QE, to output quantum circuits for ground state energy estimation, bypassing variational quantum algorithms. GPT-QE is pretrained and fine-tuned on electronic structure Hamiltonians, achieving energies that exceed CCSD accuracy for N2 strong bond dissociation and approach chemical accuracy. The method is validated on real quantum hardware, demonstrating practical feasibility.
quantum-computing quantum-eigensolver generative-quantum electronic-structure quantum-chemistry variational-algorithms arxiv-paper
“GQE operates outside the variational quantum algorithm paradigm by using classical generative models to produce quantum circuits.”
paper / aspuruguzik / Jan 13
ORGANA is a robotic assistant that leverages LLMs for decision-making, perception, and human interaction to automate labor-intensive chemistry tasks like electrode polishing in electrochemistry. It supports planning, execution with visual feedback, scheduling, and parallel operations across experiments including solubility, pH measurement, recrystallization, and a 19-step electrochemistry protocol for quinone derivatives. User studies confirm it cuts frustration and physical demand by over 50% while saving 80.3% of chemists' time on average.
robotic-assistant chemistry-automation lab-robotics llm-integration ai-robotics automated-experimentation electrochemistry
“ORGANA automates polishing electrodes in electrochemistry experiments”
paper / aspuruguzik / Jan 8
RePLan integrates Vision Language Models (VLMs) for online replanning in robotics, using physical grounding of world states to adapt actions when initial LLM-generated plans fail due to imperfect planning or environmental issues. It bridges high-level reasoning from LLMs with low-level control via generated reward functions. Evaluated on a new Reasoning and Control (RC) benchmark with eight long-horizon tasks, RePLan outperforms baselines by adapting to unforeseen obstacles and applies to real robots.
robotics replanning vision-language-models large-language-models robot-control long-horizon-tasks
“VLMs provide physical grounding to enable online replanning for long-horizon robotic tasks when initial plans fail”
paper / aspuruguzik / Nov 21
nach0 is an encoder-decoder LLM pretrained on unlabeled scientific literature, patents, and molecule strings to integrate chemical and linguistic knowledge. It undergoes instruction tuning for tasks including biomedical QA, NER, molecular generation, synthesis, and property prediction. Trained via NeMo framework, nach0 outperforms SOTA baselines on single- and cross-domain benchmarks, generating high-quality molecular and textual outputs.
foundation-model multimodal-llm chemical-ai molecular-generation biomedical-nlp instruction-tuning
“nach0 solves biomedical question answering, named entity recognition, molecular generation, molecular synthesis, and attributes prediction”
paper / aspuruguzik / Oct 20
GFlowNets sample diverse, low-energy conformations of small molecules directly from the Boltzmann distribution defined by molecular energy. The method integrates with energy estimation models of varying fidelity and excels at identifying thermodynamically feasible structures for highly flexible, drug-like molecules. Empirical results confirm accurate reproduction of molecular potential energy surfaces through proportional sampling.
gflownets molecular-conformations boltzmann-distribution molecular-dynamics machine-learning computational-chemistry arxiv-paper
“GFlowNets sample molecular conformations proportionally to the Boltzmann distribution.”
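To make "proportional to the Boltzmann distribution" concrete, here is a minimal sketch of the target probabilities for a small, finite set of conformers. It is not the GFlowNet method itself. The function name, the example energies, and the choice of kcal/mol and room temperature are assumptions made for illustration.

```python
import math

# Boltzmann constant (gas constant per mole) in kcal/(mol*K).
K_B_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_probabilities(energies_kcal_mol, temperature_k=298.15):
    """Return p_i proportional to exp(-E_i / kT) for a finite list of conformer energies."""
    beta = 1.0 / (K_B_KCAL_PER_MOL_K * temperature_k)
    # Shift by the minimum energy so the exponentials stay numerically well behaved.
    e_min = min(energies_kcal_mol)
    weights = [math.exp(-beta * (e - e_min)) for e in energies_kcal_mol]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical relative energies of three conformers, in kcal/mol.
energies = [0.0, 0.5, 2.0]
probs = boltzmann_probabilities(energies)
for e, p in zip(energies, probs):
    print(f"E = {e:4.1f} kcal/mol -> p = {p:.3f}")
```

A sampler that matches this distribution visits low-energy conformers most often while still producing higher-energy ones at a rate set by their energy gap.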
paper / aspuruguzik / Oct 17
KREED is a generative diffusion model that predicts complete 3D molecular structures from molecular formula, moments of inertia, and unsigned Kraitchman substitution coordinates of heavy atoms derived from natural-abundance rotational spectroscopy. It achieves >98% top-1 accuracy on QM9 and GEOM datasets using all heavy atom coordinates, retaining 91% on QM9 and 32% on GEOM with carbon subsets. Experimentally, it correctly identifies structures in 25/33 literature cases, enabling context-free structure determination.
diffusion-models 3d-structure-prediction rotational-spectroscopy kraitchman-analysis molecular-structure generative-ai chemical-physics
“KREED achieves >98% top-1 accuracy for correct 3D structure prediction on QM9 and GEOM when using substitution coordinates of all heavy atoms”
paper / aspuruguzik / Sep 17
The algorithm decomposes the wavelet transform kernel into a linear combination of unitaries (LCU) using modular quantum arithmetic, enabling probabilistic implementation with known success probability, then applies amplitude amplification for deterministic execution. It extends to multilevel and packet wavelet transforms with complexity logarithmic in matrix dimension N, linear in levels d, and superlinear in wavelet order M, but independent of M for practical cases. This generalizes prior QWTs limited to low-order Daubechies wavelets, positioning QWTs as versatile analogs to the quantum Fourier transform.
quantum-algorithms quantum-wavelet-transform quantum-fourier-transform wavelet-decomposition lcu-technique amplitude-amplification quantum-computing
“The algorithm implements any quantum wavelet transform (QWT) using LCU decomposition of the kernel matrix into unitaries from modular quantum arithmetic.”
paper / aspuruguzik / Aug 16
Chemical language models trained on atom-level representations of small molecules extend to proteins, generating complete protein structures atom by atom from primary sequences while capturing hierarchical secondary and tertiary structures. Unlike protein language models limited to standard amino acid vocabularies, these models produce proteins with modified sidechains forming unnatural amino acids, unconstrained by the genetic code. They further generate hybrid protein-drug conjugates by simultaneously exploring protein and chemical spaces, advancing atom-level biomolecular design.
protein-generation language-models chemical-language-models biomolecules atom-level-design unnatural-amino-acids protein-drug-conjugates
“Chemical language models learn atom-level representations of proteins from amino acid sequences.”
paper / aspuruguzik / Aug 7
Materials acceleration platforms (MAPs) integrate automation and AI to expedite materials discovery for heterogeneous CO2 photo(thermal)catalysis, targeting solar chemicals and fuels. The abstract highlights design/performance descriptors, automation levels in experiments, and AI data analysis precedents. It proposes a MAP framework for autonomous scale-up from discovery to deployment in this emerging field.
materials-acceleration-platforms co2-photocatalysis ai-materials-discovery heterogeneous-catalysis climate-change-mitigation automation-in-science
“MAPs combine automation and AI to accelerate molecule and materials discovery”
paper / aspuruguzik / Jul 17
This review provides a comprehensive technical treatment of AI4Science focused on quantum (wavefunctions, electron density), atomistic (molecules, proteins, materials), and continuum (fluids, climate, subsurface) systems, highlighting their shared challenges. Core techniques emphasize equivariant deep learning to encode physical symmetries and first principles. Additional challenges addressed include explainability, OOD generalization, foundation model transfer, and uncertainty quantification, with curated resources for education.
ai4science quantum-systems atomistic-modeling continuum-systems equivariant-ml scientific-ai arxiv-paper
“AI4Science targets natural phenomena across subatomic, atomic, and macro scales, forming a key interdisciplinary subarea.”
paper / aspuruguzik / Jun 20
A novel quantum algorithm solves discretized PDEs with complexity that is polylogarithmic in the matrix size N and independent of the condition number κ. It does so by using a wavelet basis as auxiliary coordinates, which enables a simple diagonal preconditioner that makes the matrix condition number independent of N. The algorithm prepares a quantum state from which features of the solution can be extracted. Numerical simulations validate it for several PDEs, and the approach could improve the performance of quantum simulation algorithms.
quantum-algorithms pde-solving wavelet-preconditioning quantum-computing differential-equations computational-complexity
“Prior quantum algorithms for PDEs discretized into linear systems scale at least linearly with condition number κ.”
paper / aspuruguzik / May 9
Unmodified language models trained via next-token prediction on sequences from XYZ, CIF, and PDB files directly output valid 3D structures of molecules, crystals, and protein binding sites. This approach handles diverse chemical distributions beyond graph-representable organic molecules, eliminating the need for simplified string or graph encodings. Performance matches state-of-the-art graph-based and domain-specific 3D generative models.
language-models molecular-generation 3d-structures chemical-language-models protein-binding-sites materials-design arxiv-paper
“Language models trained on XYZ, CIF, or PDB file sequences generate novel and valid 3D chemical structures.”
paper / aspuruguzik / May 3
A composite measurement scheme distributes measurement shots across multiple schemes using a trainable ratio to optimize expectation value estimation of quantum observables. Composite-LBCS, composing locally-biased classical shadows with Pauli measurements, outperforms prior state-of-the-art methods on molecular systems up to CO2 (30 qubits). The approach supports efficient stochastic gradient descent optimization, even for observables with many terms.
quantum-computing observable-estimation measurement-schemes pauli-measurements classical-shadows quantum-algorithms
“Composite measurement scheme composes multiple schemes by distributing shots with a trainable ratio.”
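The shot-distribution step can be sketched as follows. This is only an illustration under stated assumptions: the ratio is fixed here rather than trained, the function name `allocate_shots` is hypothetical, and largest-remainder rounding is one reasonable way to turn fractional allocations into whole shots, not necessarily the one used in the paper.

```python
def allocate_shots(total_shots, ratios):
    """Split total_shots across measurement schemes in proportion to ratios.

    Uses largest-remainder rounding so the integer allocations sum exactly
    to total_shots.
    """
    if total_shots < 0 or not ratios or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("need non-negative shots and ratios with a positive sum")
    norm = sum(ratios)
    exact = [total_shots * r / norm for r in ratios]
    shots = [int(x) for x in exact]  # floor of each non-negative share
    leftover = total_shots - sum(shots)
    # Hand the remaining shots to the schemes with the largest fractional parts.
    by_remainder = sorted(range(len(ratios)), key=lambda i: exact[i] - shots[i], reverse=True)
    for i in by_remainder[:leftover]:
        shots[i] += 1
    return shots


# Hypothetical 7:3 split between two schemes.
two_way = allocate_shots(1000, [7, 3])
# Equal split that does not divide evenly.
three_way = allocate_shots(10, [1, 1, 1])
print(two_way, three_way)
```

In a trainable setting the ratio would be updated by an optimizer between rounds, with this allocation step applied after each update.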
paper / aspuruguzik / Mar 24
CLAIRIFY uses iterative prompting on large language models combined with program verification to produce syntactically valid task plans in data-scarce domain-specific languages from high-level natural language instructions. Errors from prior generations serve as feedback to guide the model, while a verifier enforces syntactic correctness and environment constraints. The method achieves state-of-the-art performance in chemistry experiment planning and supports real-robot execution via integration with task and motion planners.
iterative-prompting program-verification robotics-planning llm-prompting chemistry-experiments task-planning
“CLAIRIFY combines automatic iterative prompting with program verification to generate syntactically valid programs in data-scarce domain-specific languages.”
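The generate-verify-feedback loop described above can be sketched in a few lines. Everything here is a toy stand-in: `toy_generator` replaces the language model call, the verifier only checks operation names against a hypothetical allow-list, and the two-operation mini-language is invented for illustration. It is not CLAIRIFY's actual DSL or verifier.

```python
def verify(program, allowed_ops):
    """Toy verifier: return a list of error messages (empty if the program is valid)."""
    errors = []
    for n, line in enumerate(program.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue
        if parts[0] not in allowed_ops:
            errors.append(f"line {n}: unknown operation '{parts[0]}'")
    return errors


def generate_with_feedback(instruction, generate, allowed_ops, max_rounds=5):
    """Call the generator repeatedly, feeding verifier errors back until the program passes."""
    feedback = []
    for _ in range(max_rounds):
        program = generate(instruction, feedback)
        feedback = verify(program, allowed_ops)
        if not feedback:
            return program
    raise RuntimeError("no valid program within max_rounds")


def toy_generator(instruction, feedback):
    """Stand-in for an LLM call: emits an invalid op first, then corrects it once told."""
    if any("'stir_fast'" in message for message in feedback):
        return "add water 10\nstir 30"
    return "add water 10\nstir_fast 30"


plan = generate_with_feedback("dissolve the salt", toy_generator, {"add", "stir"})
print(plan)
```

The key design point is that the verifier's error messages, not just a pass/fail bit, are passed back to the generator on the next round.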
paper / aspuruguzik / Mar 2
The method partitions quantum chemistry simulations using classically efficient product ansatze like separable pair forms, combined with post-treatment via Clifford or near-Clifford circuits to handle subsystem interactions without exponential Hamiltonian growth. These entangling circuits, optimized via simulated annealing and genetic algorithms, are folded into the Hamiltonian for variational quantum eigensolver use. Numerical simulations on molecules demonstrate up to 50% qubit reduction at comparable accuracy to the baseline separable-pair ansatz.
quantum-chemistry quantum-computing clifford-circuits variational-quantum-eigensolver qubit-reduction near-term-quantum
“Clifford or near-Clifford circuits prevent exponential increase in Hamiltonian terms when accounting for partitioned subsystem interactions”
paper / aspuruguzik / Feb 28
qSWIFT is a high-order randomized algorithm for Hamiltonian simulation whose gate count is independent of the number of Hamiltonian terms and whose systematic error decays exponentially with the order. It extends qDRIFT by improving how the gate count scales with the target precision, and it comes with rigorous diamond-norm error bounds. Numerical results show that third-order qSWIFT requires 1000x fewer gates than qDRIFT for a relative error of 10^{-6}, using one ancilla qubit.
hamiltonian-simulation qswift qdrift quantum-compiler quantum-algorithms high-order-methods
“qSWIFT gate count for given precision is independent of Hamiltonian term count”
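To see why a gate count can be independent of the number of Hamiltonian terms, it helps to look at the baseline qDRIFT bound, which depends only on the sum of coefficient magnitudes λ, the evolution time t, and the target error ε: N ≈ 2λ²t²/ε. The sketch below evaluates that qDRIFT bound, not qSWIFT's own (tighter) count, and the function name and example coefficients are assumptions for illustration.

```python
import math


def qdrift_gate_count(coefficients, time, epsilon):
    """Leading-order qDRIFT gate count N = ceil(2 * lambda^2 * t^2 / epsilon).

    lambda is the sum of absolute values of the Hamiltonian coefficients;
    the number of terms does not appear anywhere in the formula.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    lam = sum(abs(h) for h in coefficients)
    return math.ceil(2 * lam**2 * time**2 / epsilon)


# Two hypothetical Hamiltonians with the same lambda = 2 but different term counts.
coarse = [1.0, 0.5, 0.5]  # 3 terms
fine = [0.25] * 8         # 8 terms

n_coarse = qdrift_gate_count(coarse, time=1.0, epsilon=0.5)
n_fine = qdrift_gate_count(fine, time=1.0, epsilon=0.5)
print(n_coarse, n_fine)
```

Both calls give the same count because only λ enters the bound. The 1/ε dependence visible here is the scaling that the summary says qSWIFT improves on.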
paper / aspuruguzik / Feb 22
MVTrans is an end-to-end multi-view RGB architecture that performs depth estimation, segmentation, and pose estimation for transparent objects, bypassing unreliable RGB-D depth maps. It extends stereo methods to handle multiple perception tasks simultaneously. The approach is supported by Syn-TODD, a large-scale synthetic dataset generated via a procedural photo-realistic pipeline compatible with RGB-D, stereo, and multi-view RGB training.
transparent-objects multi-view-perception depth-estimation pose-estimation robotics computer-vision arxiv-paper
“Transparent object perception remains an unsolved problem despite existing RGB-D and stereo methods for depth and pose estimation”
paper / aspuruguzik / Feb 7
SELFIES provides a 100% robust string-based molecular representation that is immune to the syntactic and semantic errors that plague SMILES in generative ML models. The library has been generalized to broader molecule types and semantic constraints with a streamlined grammar. Version 2.1.1 of the selfies library delivers major improvements in design, efficiency, and features for cheminformatics pipelines.
selfies molecular-representations cheminformatics machine-learning chemical-physics open-source-library
“SELFIES is inherently 100% robust against syntactic and semantic errors”
paper / aspuruguzik / Jan 10 / failed
Quantum computers offer superior accuracy in quantum chemical calculations essential for industrial applications like drug design. This perspective analyzes the challenges and opportunities in deploying quantum hardware for pharmaceutical research. It identifies transformative potential in industrial workflows and outlines prerequisites for practical adoption.
quantum-computing drug-design quantum-chemistry quantum-physics arxiv-paper industrial-applications
“Quantum computers provide high accuracy in quantum chemical calculations”
paper / aspuruguzik / Dec 19
The framework ingests high-level experiment descriptions, perceives the lab workspace, and employs PDDLStream-based constrained task and motion planning to generate collision- and spillage-free multi-step actions. It enables robots to manipulate diverse lab equipment for executing experiments like pouring, solubility tests, and recrystallization. Demonstrated on fundamental materials synthesis tasks, it accelerates chemist workflows by automating laborious procedures.
robotics lab-automation task-motion-planning chemistry-experiments pddlstream materials-synthesis
“The robot framework autonomously performs chemistry experiments from high-level abstract descriptions.”
paper / aspuruguzik / Dec 6
GAUCHE is a specialized library that implements Gaussian process kernels for chemical representations including graphs, strings, and bit vectors. It facilitates uncertainty quantification and Bayesian optimization in chemistry by extending GPs to structured molecular data. Demonstrated applications target molecular discovery and chemical reaction optimization, with open-source code available on GitHub.
gaussian-processes chemistry-ml bayesian-optimization uncertainty-quantification molecular-discovery chemical-reactions open-source-library
“GAUCHE defines Gaussian process kernels over graphs, strings, and bit vectors for chemical inputs”
paper / aspuruguzik / Dec 3
Deep learning excels on large molecular datasets but its efficacy on small ones (<2000 molecules) remains unclear. This study benchmarks probabilistic ML models across representations and tasks (binary classification, regression) for prediction quality, calibration, and uncertainty on low-data chemical datasets. It introduces simulated tests for Bayesian optimization in molecular design and out-of-distribution inference via ablated cluster splits, providing guidance on optimal model and feature choices. The open-source DIONYSUS repository enables reproducibility and extension.
probabilistic-models chemical-datasets model-calibration bayesian-optimization molecular-design machine-learning dionysus-repo
“Deep learning models are state-of-the-art for modeling molecular properties when leveraging large datasets.”
paper / aspuruguzik / Dec 1
MatSim dataset combines synthetic images from physics-based rendering of vast texture collections, objects, and environments with natural images to benchmark few-shot recognition of material similarities, transitions, and states. A siamese network trained via contrastive learning on MatSim generates material descriptors that identify states and subclasses from single images, handling mixtures, containers, and diverse environments. This approach outperforms CLIP on a new few-shot benchmark spanning food, beverages, chemistry, and terrain, with strong generalization to unsupervised tasks.
computer-vision contrastive-learning material-recognition few-shot-learning synthetic-dataset physics-based-rendering arxiv-paper
“MatSim is the first dataset and benchmark for computer vision-based recognition of similarities and transitions between materials and textures.”
paper / aspuruguzik / Nov 27
Waveflow constructs antisymmetric fermionic wavefunctions using boundary-conditioned normalizing flows on the fundamental domain, bypassing Slater determinants for greater expressiveness in complex many-body systems. It resolves topological mismatches between prior and target distributions with O-spline priors and I-spline bijections, preserving square-normalization. Applied to 1D many-electron systems via VQMC, it accurately learns ground-state wavefunctions.
normalizing-flows fermionic-wavefunctions quantum-chemistry machine-learning computational-physics variational-monte-carlo
“Waveflow imposes antisymmetry by defining the fundamental domain and applying boundary conditions, avoiding Slater determinants.”
paper / aspuruguzik / Nov 23
Group SELFIES extends SELFIES by incorporating group tokens for functional groups and substructures, preserving chemical validity guarantees while adding flexibility through molecular fragment inductive biases. It outperforms standard SELFIES in distribution learning on common molecular datasets and yields higher-quality molecules from random sampling. Open-source implementation supports further research in generative molecular design.
group-selfies molecular-representation selfies chemical-language-models molecular-generation chemical-informatics machine-learning
“Group SELFIES maintains the chemical robustness guarantees of SELFIES while enabling group tokens for functional groups or substructures”
paper / aspuruguzik / Oct 30
Hybrid quantum-classical GANs replace GAN components with variational quantum circuits (VQCs), demonstrating quantum advantages in de novo small molecule discovery for drug design. VQCs in the noise generator produce molecules with superior physicochemical properties and goal-directed benchmark performance compared to classical GANs. Quantum discriminators and generators with only tens of learnable parameters achieve better molecule validity, properties, and KL divergence than MLP-based models, reducing parameter counts significantly.
quantum-gan generative-chemistry drug-discovery quantum-machine-learning variational-quantum-circuits quantum-advantage small-molecule-generation
“VQC in the GAN noise generator generates small molecules with better physicochemical properties and goal-directed benchmark performance than classical counterparts”
paper / aspuruguzik / Oct 19
Machine learning leverages data trends to predict material properties, generate structures, and optimize processes, integrating into energy discovery pipelines for faster progress. The review covers ML applications in photovoltaics, batteries, electrocatalysis, and smart grids, with key performance indicators to evaluate workflow benefits. Future challenges include advancing ML techniques to maximize impact on sustainable energy transitions.
machine-learning sustainable-energy renewable-energy materials-science energy-storage photovoltaics electrocatalysis
“ML techniques predict material properties, generate candidate structures, and optimize processes in energy research.”
paper / aspuruguzik / Sep 28
Researchers propose using the rank of the dynamical Lie algebra from layer generators to characterize variational quantum circuits for ground-state energy calculations. Higher Lie rank correlates with improved energy accuracy and reduced circuit depth needed, even with parameter counts below generator term numbers. Exponential computation cost is mitigated by using initial iteration growth rate as a lower-bound proxy, positioning Lie rank as a key circuit design metric.
variational-quantum-algorithms dynamical-lie-algebra quantum-circuits quantum-control lie-rank ground-state-energies
“Lie rank of dynamical Lie algebra from layer generators correlates with accuracy of ground-state energies in variational quantum algorithms”
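For generators that are single Pauli strings, the dimension of the dynamical Lie algebra can be computed by repeatedly taking commutators until nothing new appears. The sketch below makes that assumption explicitly: it handles only Pauli-string generators and ignores phases and coefficients, so it is a simplified illustration rather than the paper's procedure. The function names and example generators are hypothetical.

```python
from itertools import combinations


def anticommute(a, b):
    """Pauli strings anticommute iff they clash (differ, both non-identity) on an odd number of sites."""
    clashes = sum(1 for p, q in zip(a, b) if p != "I" and q != "I" and p != q)
    return clashes % 2 == 1


def multiply(a, b):
    """Site-wise product of two Pauli strings, dropping the overall phase."""
    out = []
    for p, q in zip(a, b):
        if p == "I":
            out.append(q)
        elif q == "I":
            out.append(p)
        elif p == q:
            out.append("I")
        else:
            # The product of two distinct Paulis is the third one (up to phase).
            out.append(({"X", "Y", "Z"} - {p, q}).pop())
    return "".join(out)


def dla_dimension(generators):
    """Close a set of Pauli strings under commutation.

    Returns (dimension, growth), where growth lists the basis size after each
    iteration. A nonzero commutator of two Pauli strings is proportional to
    their product, and distinct Pauli strings are linearly independent, so
    counting strings gives the dimension.
    """
    basis = set(generators)
    growth = [len(basis)]
    while True:
        new = {
            multiply(a, b)
            for a, b in combinations(sorted(basis), 2)
            if anticommute(a, b)
        } - basis
        if not new:
            return len(basis), growth
        basis |= new
        growth.append(len(basis))


single_qubit = dla_dimension(["X", "Z"])
two_qubit = dla_dimension(["XX", "ZI", "IZ"])
print(single_qubit, two_qubit)
```

The `growth` list records how quickly the algebra expands in the first few iterations, which is the kind of early signal the summary describes as a cheaper proxy when full closure is too expensive.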
paper / aspuruguzik / Sep 26
Tartarus introduces practical benchmark tasks for inverse molecular design using physical simulations that mimic real-world problems in materials, drugs, and chemical reactions. It addresses the lack of realistic benchmarks despite advances in AI-driven algorithms for chemical space exploration. Performance of established algorithm families varies significantly across benchmark domains, highlighting the need for domain-specific evaluation.
inverse-molecular-design benchmarking-platform chemical-space-exploration molecular-simulation drug-discovery materials-design ai-chemistry
“Many algorithms have been developed for inverse molecular design due to increased computational power and AI progress.”
paper / aspuruguzik / Aug 22
QIPA is a family of hybrid variational quantum algorithms that surpass current near-term quantum optimization methods. The algorithms are demonstrated on ground-state dissociation of the H2 molecule, a ground-state search for a transmon qubit, and biprime factorization. They use shallow circuits compatible with error mitigation, along with adaptive ansatzes for scalable NISQ implementation.
quantum-algorithms variational-quantum global-optimization quantum-power-algorithms h2-molecule transmon-qubit biprime-factorization
“QIPA family of variational quantum algorithms outperforms existing hybrid near-term quantum algorithms for global optimization”
paper / aspuruguzik / Jul 11
The method models parameterized quantum circuits as graphs where mutual information between gate nodes defines a distance metric for path-based optimization in variational algorithms. Applied to VQE, it computes Heisenberg model ground states; for VQC, it solves binary classification. Numerical simulations confirm improved convergence for near-term quantum algorithms, enhancing stochastic gradient methods.
quantum-circuits information-flow variational-algorithms quantum-eigensolver quantum-classification near-term-quantum
“Mutual information between gate nodes in a quantum circuit graph provides a distance metric for optimization paths.”
paper / aspuruguzik / Jul 6
Researchers introduce evolutionary algorithms to design quantum autoencoders that compress quantum states into lower-dimensional representations, reducing resource needs on noisy quantum devices. The method successfully compresses families of quantum states using circuits with restricted gate sets for efficient classical simulation. This hybrid approach leverages classical computation to optimize quantum data representations with minimal resources.
quantum-compression quantum-autoencoders evolutionary-algorithms classically-simulatable-circuits quantum-circuits machine-learning-quantum
“Evolutionary algorithms can design quantum autoencoders for compressing quantum information into lower-dimensional representations.”
paper / aspuruguzik / May 18
A quantum-inspired superposition technique combined with cluster expansion enables mapping chemical space exploration to quantum annealers, overcoming prior compatibility issues. This method searches for optimal materials 10-50 times faster than genetic algorithms and Bayesian optimization, with superior ground state prediction accuracy. Applied to acidic OER catalysts, it identifies a novel Ru-Cr-Mn-Sb-O2 family where the top performer exhibits 8x higher mass activity than RuO2 and stability over 180 hours at 10 mA/cm² in 0.5 M H2SO4.
quantum-annealing cluster-expansion materials-discovery oer-catalysts chemical-space-search quantum-inspired-optimization
“Quantum-inspired cluster expansion accelerates chemical space search 10-50 times faster than genetic algorithms and Bayesian optimization”
paper / aspuruguzik / May 9
Researchers demonstrate a superconducting circuit architecture where a coupling module mediates both 2-local and 3-local interactions between three flux qubits. The system Hamiltonian is characterized using multi-qubit Ramsey-type interferometry across excitation manifolds. The 3-local interaction is coherently tunable over several MHz via coupler flux biases and can be fully turned off, enabling applications in quantum annealing, analog simulation, and gate-based computation.
quantum-physics superconducting-qubits three-body-interactions quantum-computing flux-qubits arxiv-paper
“A coupling module mediates 2-local and 3-local interactions between three flux qubits by design.”
paper / aspuruguzik / Apr 4
Advanced AI systems can contribute to scientific understanding through three dimensions: acting as computational microscopes to reveal hidden mechanisms, serving as sources of inspiration for new concepts, and potentially evolving into autonomous agents of understanding. The paper draws from philosophy of science and anecdotes from scientists to define these roles, highlighting current limitations and future research directions. Achieving true AI-driven scientific comprehension requires moving beyond prediction to mechanistic explanation, positioning AI as a pathway to artificial scientists.
scientific-understanding ai-in-science philosophy-of-science machine-learning computational-microscope artificial-scientists chemical-physics
“Scientific understanding requires more than accurate predictions; it demands comprehension of how predictions are made.”
paper / aspuruguzik / Mar 31
SELFIES, introduced in 2020, ensures 100% valid molecular representations, overcoming SMILES' key limitation where most symbol combinations yield invalid chemistries. This robustness has enabled new AI/ML applications in property prediction, reaction discovery, and molecule design. The paper outlines 16 future projects to extend SELFIES to new domains and enhance AI interpretability.
selfies molecular-representations smiles ai-chemistry machine-learning chemical-physics robust-languages
“SMILES, the dominant molecular string representation since the 1980s, produces invalid chemical structures for most symbol combinations.”
paper / aspuruguzik / Mar 29
Phoenics and Gryffin algorithms are extended to handle arbitrary known experimental and design constraints via an intuitive interface, addressing non-linear, interdependent constraints in chemical optimization domains. Benchmarks on continuous and discrete test functions demonstrate flexibility and robustness. Applications include optimizing o-xylenyl Buckminsterfullerene adduct synthesis under flow constraints and designing redox-active molecules for flow batteries under synthetic accessibility limits, enabling model-based optimization in autonomous scientific platforms.
bayesian-optimizationchemical-optimizationconstrained-optimizationmachine-learningautonomous-experimentationmaterials-design
“Phoenics and Gryffin now support arbitrary known constraints through an intuitive and flexible interface.”
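One common way to express a known constraint is as a predicate over candidate parameters that is evaluated before any experiment is run. The sketch below is a generic illustration using rejection sampling. It is not the Phoenics or Gryffin interface, and the parameter names, bounds, and constraint (`flow_a`, `flow_b`, `temperature`, `is_feasible`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

# Hypothetical search space: two flow rates (mL/min) and a temperature (deg C).
BOUNDS: Dict[str, Tuple[float, float]] = {
    "flow_a": (0.0, 1.0),
    "flow_b": (0.0, 1.0),
    "temperature": (20.0, 120.0),
}


def is_feasible(p: Params) -> bool:
    """Hypothetical known constraints: one interdependent, one non-linear."""
    total_flow_ok = p["flow_a"] + p["flow_b"] <= 1.0   # shared pump limit
    heat_load_ok = p["flow_a"] * p["temperature"] <= 60.0  # non-linear coupling
    return total_flow_ok and heat_load_ok


def sample_feasible(
    bounds: Dict[str, Tuple[float, float]],
    constraint: Callable[[Params], bool],
    n: int,
    seed: int = 0,
    max_tries: int = 100_000,
) -> List[Params]:
    """Draw uniform candidates and keep only those satisfying the constraint."""
    rng = random.Random(seed)
    accepted: List[Params] = []
    tries = 0
    while len(accepted) < n and tries < max_tries:
        tries += 1
        candidate = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        if constraint(candidate):
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    for params in sample_feasible(BOUNDS, is_feasible, n=3):
        print(params)
```

Rejection sampling is the simplest option and becomes wasteful when the feasible region is small. A model-based optimizer would instead restrict its acquisition step to the feasible region.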
paper / aspuruguzik / Feb 7
This work proposes quantum computing using molecular electronics, implementing one-qubit gates through one-electron scattering in molecules and two-qubit controlled-phase gates via electron-electron scattering along metallic leads. It introduces a class of circuit implementations and demonstrates one-qubit gates with molecular hydrogen's electronic structure as a baseline. The framework bridges molecular physics and quantum computing for potential scalable hardware.
quantum-computingmolecular-electronicsone-qubit-gatestwo-qubit-gatesquantum-physicschemical-physicsarxiv-paper
“One-qubit gates can be constructed using one-electron scattering in molecules.”
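For reference, the two-qubit controlled-phase gate mentioned above has the following standard textbook matrix form in the computational basis. The phase symbol \varphi is notation introduced here; the summary does not say how the scattering process sets its value.

```latex
\[
  \mathrm{CPHASE}(\varphi) =
  \begin{pmatrix}
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0 \\
    0 & 0 & 0 & e^{i\varphi}
  \end{pmatrix},
  \qquad
  \mathrm{CPHASE}(\pi) = \mathrm{CZ}.
\]
```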
paper / aspuruguzik / Feb 1
This work introduces a reinforcement learning framework that generates molecules as 3D structures by sequentially placing molecular fragments rather than individual atoms, leveraging chemist expertise for efficiency. Guided solely by energy-based rewards, the hierarchical agent produces complex molecules exceeding 100 atoms, including drug-like, OLED, and biomolecular distributions. This addresses limitations of prior string/graph-based generative models that neglect 3D geometry critical for applications like drug discovery.
reinforcement-learning3d-molecular-designfragment-based-designdrug-discoverymolecular-generationmachine-learningarxiv-paper
“Existing ML generative models using string and graph representations ignore 3D molecular structure.”
paper / aspuruguzik / Jan 21
AlphaFold-predicted structures powered an end-to-end AI drug discovery pipeline that used PandaOmics for target selection and Chemistry42 for generative molecule design, yielding a CDK20 hit (Kd 8.9 μM) after synthesizing 7 compounds in 30 days. A second AI iteration produced a more potent analog (ISM042-2-048, Kd 210 nM) after 6 syntheses in another 30 days. This marks the first reported small-molecule CDK20 inhibitor and the first use of AlphaFold in early-stage hit identification for a novel target lacking an experimental structure.
alphafoldai-drug-discoverycdk20-inhibitorstructure-based-designgenerative-chemistryprotein-structure-prediction
“First hit for CDK20 achieved with Kd = 8.9 ± 1.6 μM after synthesizing 7 compounds in 30 days from target selection”
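The two reported dissociation constants can be compared with a short calculation. The sketch below computes the fold improvement in Kd and the corresponding change in binding free energy, using the standard relation ΔΔG = RT ln(Kd2/Kd1). The temperature of 298.15 K is an assumption, since the summary does not state assay conditions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

KD_HIT = 8.9e-6      # first hit, 8.9 uM (from the summary above)
KD_ANALOG = 210e-9   # ISM042-2-048, 210 nM (from the summary above)


def fold_improvement(kd_first: float, kd_second: float) -> float:
    """How many times tighter the second compound binds than the first."""
    return kd_first / kd_second


def delta_delta_g(kd_first: float, kd_second: float,
                  temperature_k: float = 298.15) -> float:
    """Change in binding free energy in kcal/mol; negative means tighter binding.

    The default temperature is an assumption, not a value from the source.
    """
    return R_KCAL * temperature_k * math.log(kd_second / kd_first)


if __name__ == "__main__":
    print(f"fold improvement: {fold_improvement(KD_HIT, KD_ANALOG):.1f}x")
    print(f"ddG: {delta_delta_g(KD_HIT, KD_ANALOG):.2f} kcal/mol")
```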
paper / aspuruguzik / Dec 6
Simple recurrent neural network language models, using string representations of molecules, effectively learn complex molecular distributions that challenge graph generative models. They excel on tasks like generating the highest-scoring penalized LogP molecules from ZINC15, multi-modal distributions, and the largest molecules in PubChem. Results show language models achieve superior performance compared to widely used graph models, particularly highlighting their strength in low-data regimes.
language-modelsmolecular-generationgenerative-modelsdrug-discoverymachine-learningarxiv-paperquantum-methods
“Language models can accurately generate distributions of the highest scoring penalized LogP molecules in ZINC15”
paper / aspuruguzik / Oct 20
QNODE uses a latent neural ODE to model quantum dynamics from expectation values of closed and open systems. Trained without supervision, it generates trajectories that satisfy the von Neumann equation for closed systems and the Lindblad master equation for open systems. It extrapolates beyond the training data, rediscovers Heisenberg's uncertainty principle in a purely data-driven way with no imposed constraints, and produces physically consistent trajectories in which proximity in latent space implies similar dynamics. The approach helps circumvent the curse of dimensionality in quantum systems for machine-assisted physics discovery.
quantum-dynamicsneural-odesmachine-learningquantum-physicslatent-modelsscientific-discovery
“QNODE learns to generate expectation values satisfying von Neumann equation for closed quantum systems”
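For reference, the equations named in this entry have the following standard textbook forms. The symbols are conventional notation rather than notation from the paper: \rho is the density matrix, H the Hamiltonian, L_k the jump operators, and \gamma_k \ge 0 the decay rates.

```latex
% von Neumann equation (closed systems)
\[
  \frac{d\rho}{dt} = -\frac{i}{\hbar}\,[H, \rho]
\]

% Lindblad master equation (open systems)
\[
  \frac{d\rho}{dt} = -\frac{i}{\hbar}\,[H, \rho]
  + \sum_k \gamma_k \left( L_k \rho L_k^\dagger
  - \tfrac{1}{2} \left\{ L_k^\dagger L_k,\, \rho \right\} \right)
\]

% Uncertainty relation for observables A and B, and the position-momentum case
\[
  \sigma_A \, \sigma_B \ge \tfrac{1}{2} \left| \langle [A, B] \rangle \right|,
  \qquad
  \sigma_x \, \sigma_p \ge \frac{\hbar}{2}
\]
```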
paper / aspuruguzik / Oct 19
The paper derives robustness intervals that are guaranteed to contain the ideal measurement outcome despite state-preparation and evolution errors on NISQ-era hardware. The bounds are formulated as semidefinite programs that use first moments and the fidelity to the ideal state, and they incorporate higher moments through the non-negativity of Gram matrices, generalized to mixed states. The intervals are demonstrated on VQE simulations, a step toward reliable near-term quantum applications.
quantum-computingnisq-eraquantum-measurementserror-boundsvqesemi-definite-programsstate-fidelity
“Robustness intervals guaranteed to contain ideal outputs are formulated as semidefinite programs using first moment and fidelity to ideal state”
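To give a sense of what a fidelity-based interval looks like, the sketch below implements a simpler and generally looser textbook bound, not the paper's semidefinite-programming method. It combines Hölder's inequality with the Fuchs–van de Graaf inequality to obtain |⟨A⟩_noisy − ⟨A⟩_ideal| ≤ (a_max − a_min)·√(1 − F). It assumes F is the squared-convention fidelity between the noisy and ideal states, that the noisy expectation value is known exactly (no shot noise), and that a_min and a_max bound the spectrum of the observable. The function name is hypothetical.

```python
import math
from typing import Tuple


def fidelity_interval(
    noisy_expectation: float,
    a_min: float,
    a_max: float,
    fidelity: float,
) -> Tuple[float, float]:
    """Interval containing the ideal expectation value of an observable.

    Uses |<A>_noisy - <A>_ideal| <= (a_max - a_min) * sqrt(1 - F), then clips
    the result to the spectral range [a_min, a_max].
    """
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must lie in [0, 1]")
    if a_min > a_max:
        raise ValueError("a_min must not exceed a_max")
    half_width = (a_max - a_min) * math.sqrt(1.0 - fidelity)
    lower = max(a_min, noisy_expectation - half_width)
    upper = min(a_max, noisy_expectation + half_width)
    return lower, upper


if __name__ == "__main__":
    # Pauli-Z observable (spectrum in [-1, 1]) measured on a state that has
    # fidelity 0.99 with the ideal state.
    print(fidelity_interval(0.8, -1.0, 1.0, 0.99))
```

The interval collapses to a point at F = 1 and widens as √(1 − F), so even 99% fidelity leaves an uncertainty of ±0.2 for a Pauli observable under this bound. This helps motivate tighter formulations such as those the paper proposes.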
paper / aspuruguzik / Oct 13
The variational quantum eigensolver (VQE) is enhanced by a perturbative, explicitly correlated [2]_R12-correction that significantly improves accuracy using only the converged one- and two-particle reduced density matrices (RDMs) of the reference wavefunction, combined with molecular integrals. This classical post-processing step scales as the sixth power of the number of electrons and requires no additional quantum resources. Using pair-natural orbitals from multiresolution analysis (MRA-PNOs) as the complementary basis enables highly accurate molecular simulations at the quantum cost of a minimal basis set, and it reduces the number of complementary basis functions, which enter the correction at cubic cost.
vqequantum-eigensolverr12-correctionmolecular-systemsquantum-chemistrypair-natural-orbitalsmultiresolution-analysis
“[2]_R12-correction increases VQE accuracy significantly without additional quantum resources”