paper / wesroth / Apr 10
Observations of comet C/2021 A1 (Leonard) show that its volatile emissions, specifically HCN and CS, behaved in ways that insolation-driven sublimation alone cannot explain as the comet approached perihelion. The rising CS mixing ratio and the variable HCN abundance, particularly during outburst and fragmentation events, suggest significant contributions from intrinsic disruption processes. The results underline the need for multi-epoch, multi-instrument monitoring to characterize the complex volatile evolution of comets accurately.
comet-observation, volatile-emissions, astrophysics, millimeter-astronomy, comet-leonard, spectral-analysis, solar-system
“CS mixing ratios increased significantly as comet Leonard approached the Sun.”
paper / wesroth / Apr 10
This study systematically quantifies the impact of four classes of data leakage in machine learning across diverse datasets. It finds that selection leakage, which is often overlooked, has the largest effect, while estimation leakage (e.g., fitting a scaler on the full dataset), which textbooks commonly emphasize, has a negligible effect. Memorization leakage scales with model capacity, and boundary leakage goes undetected by random cross-validation. The findings challenge the conventional understanding of data leakage severity.
machine-learning, data-leakage, model-evaluation, tabular-data, temporal-data, statistical-analysis, experimental-design
“Class I (estimation) data leakage, such as fitting scalers on full datasets, has a negligible effect on model performance.”
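To make the estimation-leakage case concrete, the following is a minimal sketch (not the study's experimental setup) that compares standardization statistics fitted on the full dataset with statistics fitted on the training rows only. The synthetic feature, the 80/20 split, and the helper names `scaler_params` and `standardize` are assumptions made for illustration.

```python
import random
import statistics


def scaler_params(values):
    """Return the (mean, population std) a standard scaler would learn."""
    return statistics.fmean(values), statistics.pstdev(values)


def standardize(values, mean, std):
    """Scale values with a given mean and standard deviation."""
    return [(v - mean) / std for v in values]


# Hypothetical data: one synthetic Gaussian feature, 80/20 train/test split.
rng = random.Random(0)
feature = [rng.gauss(10.0, 2.0) for _ in range(1000)]
train, test = feature[:800], feature[800:]

# Leaky: scaler statistics are estimated on train and test rows together.
leaky_mean, leaky_std = scaler_params(feature)
# Clean: scaler statistics are estimated on the training rows only.
clean_mean, clean_std = scaler_params(train)

leaky_test = standardize(test, leaky_mean, leaky_std)
clean_test = standardize(test, clean_mean, clean_std)

# Largest difference the leak makes to any scaled test value.
max_shift = max(abs(a - b) for a, b in zip(leaky_test, clean_test))

print(f"mean: leaky={leaky_mean:.3f} clean={clean_mean:.3f}")
print(f"std:  leaky={leaky_std:.3f} clean={clean_std:.3f}")
print(f"max shift in scaled test values: {max_shift:.4f}")
```

When train and test rows come from the same distribution, the two sets of statistics are close, so the scaled test values barely move. That is one plausible intuition for the negligible effect reported above; it says nothing about the other three leakage classes.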
paper / wesroth / Apr 10
DISCO, a multimodal generative AI model, co-designs protein sequences and 3D structures around arbitrary biomolecules. Conditioned solely on reactive intermediates, the model produced diverse heme enzymes with novel active-site geometries. These enzymes catalyze previously unknown carbene-transfer reactions, surpass the activity of engineered enzymes, and offer a scalable path to evolvable enzymes.
protein-design, multimodal-ai, generative-models, enzymes, biomolecules, carbene-transfer, directed-evolution
“Deep generative models have been limited in designing enzymes without predefined catalytic residues.”
paper / wesroth / Apr 10
DiffuMask is a diffusion-based framework for prompt compression in large language models. It avoids the computational cost of traditional sequential token-removal methods by pruning prompts rapidly and in parallel through iterative mask prediction. The technique significantly accelerates prompt compression while preserving essential reasoning context and maintaining or improving accuracy across various operational settings, leading to faster and more reliable in-context reasoning.
prompt-engineering, llm-optimization, diffusion-models, natural-language-processing, computational-linguistics, prompt-compression
“Existing prompt compression methods that rely on sequential token removal are computationally intensive.”
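The contrast between sequential removal and parallel masking can be illustrated with a toy sketch. This is not the DiffuMask model: the scorer `toy_scores` is a hypothetical stand-in for a learned mask predictor, and the even-share pruning schedule is an assumption. The sketch only shows why masking many tokens per scoring pass needs far fewer passes than removing one token at a time.

```python
def toy_scores(tokens):
    """Hypothetical importance score: stopwords score 0, others by length."""
    stopwords = {"the", "a", "of", "to", "is", "and", "in", "that"}
    return [0.0 if t.lower() in stopwords else float(len(t)) for t in tokens]


def sequential_prune(tokens, keep):
    """Baseline: drop the single lowest-scoring token per scoring pass."""
    tokens, passes = list(tokens), 0
    while len(tokens) > keep:
        scores = toy_scores(tokens)
        passes += 1
        drop = min(range(len(tokens)), key=scores.__getitem__)
        del tokens[drop]
    return tokens, passes


def parallel_mask_prune(tokens, keep, rounds=3):
    """Mask many tokens per pass, reaching `keep` tokens within `rounds`."""
    tokens, passes = list(tokens), 0
    for remaining_rounds in range(rounds, 0, -1):
        surplus = len(tokens) - keep
        if surplus <= 0:
            break
        scores = toy_scores(tokens)
        passes += 1
        # Remove an equal share of the surplus each round (ceiling division).
        n_drop = -(-surplus // remaining_rounds)
        ranked = sorted(range(len(tokens)), key=scores.__getitem__)
        masked = set(ranked[:n_drop])
        tokens = [t for i, t in enumerate(tokens) if i not in masked]
    return tokens, passes


prompt = "The capital of France is Paris and the capital of Italy is Rome".split()
seq_tokens, seq_passes = sequential_prune(prompt, keep=6)
par_tokens, par_passes = parallel_mask_prune(prompt, keep=6, rounds=3)
print("sequential:", seq_tokens, "passes:", seq_passes)
print("parallel:  ", par_tokens, "passes:", par_passes)
```

With a real model each scoring pass is a forward pass, so the number of passes is the quantity that dominates the cost. Here the toy scorer ignores context, which is why both strategies end up keeping the same tokens; a learned predictor would not guarantee that.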
paper / wesroth / Apr 10
This paper extends a novel approach for developing classical density functionals for hard-sphere (HS) fluids. By integrating test-particle sum rules for the excess chemical potential and the isothermal compressibility, the authors optimize the parameters in Lutsko's fundamental measure theory (FMT) formulations. The optimization aims to improve the accuracy of the existing White-Bear (WB) and White-Bear mark II functionals.
classical-dft, test-particle-sum-rules, hard-sphere-fluids, fundamental-measure-theory, soft-condensed-matter, statistical-mechanics
“Test particle sum rules can improve classical density functionals for hard-sphere fluids.”
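For orientation, the two bulk quantities named in the summary can be written down for the one-component Carnahan-Starling equation of state, which the WB functional reproduces in the bulk. The sketch below is not the paper's optimization procedure, and the summary does not say which bulk reference the authors use. The function names are illustrative, and `eta` denotes the packing fraction.

```python
def cs_compressibility_factor(eta):
    """Z = beta * P / rho for the Carnahan-Starling equation of state."""
    return (1 + eta + eta**2 - eta**3) / (1 - eta) ** 3


def cs_excess_chemical_potential(eta):
    """beta * mu_ex from Carnahan-Starling (excess over the ideal gas)."""
    return (8 * eta - 9 * eta**2 + 3 * eta**3) / (1 - eta) ** 3


def cs_reduced_compressibility(eta):
    """rho * k_B * T * chi_T, i.e. the inverse of d(beta * P) / d(rho)."""
    return (1 - eta) ** 4 / (1 + 4 * eta + 4 * eta**2 - 4 * eta**3 + eta**4)


def numerical_reduced_compressibility(eta, h=1e-6):
    """Cross-check: central finite difference of eta * Z(eta).

    Since eta is proportional to rho, d(beta * P)/d(rho) = d(eta * Z)/d(eta).
    """
    upper = (eta + h) * cs_compressibility_factor(eta + h)
    lower = (eta - h) * cs_compressibility_factor(eta - h)
    return 2 * h / (upper - lower)


for eta in (0.1, 0.3, 0.45):
    print(
        f"eta={eta:.2f}  "
        f"beta*mu_ex={cs_excess_chemical_potential(eta):.4f}  "
        f"rho*kT*chi_T={cs_reduced_compressibility(eta):.4f}  "
        f"(numerical {numerical_reduced_compressibility(eta):.4f})"
    )
```

A test-particle sum rule compares bulk values like these with the same quantities obtained from the functional when one particle is held fixed as an external potential; the finite-difference check here only confirms that the closed-form compressibility agrees with the equation of state it is derived from.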