Machine Learning Research
Unique Recovery of Transport Maps and Vector Fields from Finite Data
This paper establishes conditions for the unique identification of diffeomorphisms and vector fields using finite measure-valued data. It introduces a new metric to compare diffeomorphisms based on discrepancies in pushforward densities. The analysis leverages Whitney and Takens embedding theorems t…
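The Takens-style delay embedding that the analysis leverages can be sketched in a few lines: a scalar time series is mapped into R^d using lagged copies of itself. This is a generic illustration of the embedding construction, not the paper's specific recovery procedure; the function name is ours.

```python
# Generic sketch of a Takens-style delay-coordinate embedding.
import numpy as np

def delay_embed(x, d, tau):
    """Rows are (x[t], x[t+tau], ..., x[t+(d-1)*tau])."""
    n = len(x) - (d - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(d)])

t = np.linspace(0, 20, 500)
x = np.sin(t)                      # scalar observable of the dynamics
E = delay_embed(x, d=3, tau=5)
print(E.shape)                     # (500 - 2*5, 3) = (490, 3)
```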
Geometric Framework for Prototype Clustering Accuracy
This paper introduces a geometric framework to analyze the relationship between objective accuracy and structural recovery in prototype-based clustering. It defines a clustering condition number that quantifies the difficulty of separating clusters, showing that a small suboptimality gap implies low…
Re-evaluating Data Leakage Severities in Machine Learning
This study systematically quantifies the impact of four classes of data leakage in machine learning across diverse datasets. It reveals that selection leakage, often overlooked, is the most significant, while estimation leakage (e.g., fitting a scaler on the full dataset), commonly emphasized in textbooks, ha…
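The "scaler fitting on full data" form of estimation leakage mentioned above is easy to reproduce. A minimal sketch, using synthetic data and our own variable names: fitting normalization statistics on all rows (including test rows) versus fitting them on the training rows only yields measurably different test inputs.

```python
# Sketch of estimation leakage: scaling statistics fit on the full dataset
# versus fit on the training split only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
split = 800
X_train, X_test = X[:split], X[split:]

# Leaky: mean/std computed on ALL rows, including the test rows.
mu_leak, sd_leak = X.mean(axis=0), X.std(axis=0)
X_test_leaky = (X_test - mu_leak) / sd_leak

# Clean: mean/std computed on the training rows only.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
X_test_clean = (X_test - mu) / sd

# The two scaled test sets differ: test information leaked into the scaler.
print(np.abs(X_test_leaky - X_test_clean).max() > 0)
```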
Information-Theoretic Limits and QP Relaxation for Attributed Network Alignment
This research introduces the featured correlated Gaussian Wigner model to optimize attributed network alignment by integrating node features with graph topology. The authors establish the information-theoretic limits for exact and partial recovery and present QPAlign, a quadratic programming relaxat…
FLOWGEM: A Principled Solution for Non-Monotonic MAR Missingness in Data
FLOWGEM is a novel, iterative method addressing non-monotonic Missing at Random (MAR) data by minimizing Kullback-Leibler divergence through approximate Wasserstein Gradient Flows. This approach utilizes a discretized particle evolution and a local linear estimator for the density ratio, enabling the ge…
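To give a feel for the discretized particle evolution, here is a generic sketch, not FLOWGEM itself: the unadjusted Langevin step is a standard time-discretization of the Wasserstein gradient flow that minimizes KL(q || p) toward a target density p, taken here to be a unit Gaussian with known score.

```python
# Discretized particle evolution for the KL-minimizing Wasserstein gradient
# flow (unadjusted Langevin discretization, unit-Gaussian target).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, size=2000)   # particles initialized far from target

step = 0.1
for _ in range(500):
    grad_log_p = -x                  # score of N(0, 1): d/dx log p(x) = -x
    x = x + step * grad_log_p + np.sqrt(2 * step) * rng.normal(size=x.size)

# The particle cloud drifts toward the target's mean 0 and variance ~1.
print(abs(x.mean()) < 0.2, abs(x.var() - 1.0) < 0.3)
```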
Optimized Partially Deterministic Sampling Improves Compressed Sensing
This paper introduces a novel partially deterministic sampling scheme for compressed sensing, combining random and deterministic selection of sampling vectors from rows of a unitary matrix. This method offers improved sample complexity and novel denoising guarantees. Numerical experiments demonstrat…
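The row-selection scheme can be sketched concretely. The split between deterministic and random rows below (lowest frequencies kept deterministically) is our illustrative assumption, not necessarily the paper's optimized choice:

```python
# Sketch of partially deterministic sampling: m rows of a unitary DFT
# matrix, part chosen deterministically, part drawn uniformly at random.
import numpy as np

n, m_det, m_rand = 64, 8, 16
U = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix

det_rows = np.arange(m_det)              # deterministic: lowest frequencies
rng = np.random.default_rng(1)
rand_rows = rng.choice(np.arange(m_det, n), size=m_rand, replace=False)
rows = np.concatenate([det_rows, rand_rows])

A = U[rows]                              # (m_det + m_rand) x n sensing matrix
# Every row keeps unit norm, inherited from the unitary matrix.
print(A.shape, np.allclose(np.linalg.norm(A, axis=1), 1.0))
```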
Individual-Heterogeneous Sub-Gaussian Mixture Models Outperform Homogeneous Models in Clustering
The paper introduces individual-heterogeneous sub-Gaussian mixture models (IHSGMM) to address limitations of traditional Gaussian mixture models (GMM) which assume cluster homogeneity. IHSGMMs assign a unique heterogeneity parameter to each observation, allowing for better capture of real-world data…
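A minimal numpy sketch of the modeling idea, not the paper's estimator: data where each observation has its own spread, and a hard-assignment step that produces a per-observation scale estimate rather than one shared variance per cluster. All names (`s_hat`, `centers`) are ours.

```python
# Sketch: per-observation heterogeneity in a two-cluster mixture.
import numpy as np

rng = np.random.default_rng(0)
s_true = rng.uniform(0.2, 2.0, size=200)          # per-point scales
labels = rng.integers(0, 2, size=200)
centers_true = np.array([[-3.0, 0.0], [3.0, 0.0]])
X = centers_true[labels] + s_true[:, None] * rng.normal(size=(200, 2))

# Hard assignment to the nearest center, then a per-point scale estimate.
centers = np.array([[-2.5, 0.5], [2.5, -0.5]])    # rough initial centers
d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
assign = d.argmin(axis=1)
s_hat = d[np.arange(len(X)), assign] / np.sqrt(2)  # one scale per observation

acc = max((assign == labels).mean(), 1 - (assign == labels).mean())
print(s_hat.shape, acc)
```

A homogeneous GMM would instead pool all within-cluster distances into a single variance per cluster, which is exactly the restriction the summary says IHSGMMs lift.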
Weighted Bayesian Conformal Prediction Generalizes Uncertainty Quantification Under Distribution Shift
Weighted Bayesian Conformal Prediction (WBCP) extends traditional Bayesian Conformal Prediction (BQ-CP) to handle distribution shifts by incorporating importance weights. This method replaces the uniform Dirichlet prior with a weighted Dirichlet, using Kish's effective sample size. WBCP improves con…
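Two of the ingredients named in this summary, Kish's effective sample size and an importance-weighted quantile of calibration scores, can be sketched directly. This is a generic illustration of those quantities, not the WBCP algorithm itself; the function names are ours.

```python
# Sketch: Kish's ESS and a weighted quantile of nonconformity scores.
import numpy as np

def kish_ess(w):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, float)
    return w.sum() ** 2 / (w ** 2).sum()

def weighted_quantile(scores, w, q):
    """Smallest score whose cumulative normalized weight reaches q."""
    order = np.argsort(scores)
    s, w = np.asarray(scores, float)[order], np.asarray(w, float)[order]
    cum = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cum, q)]

rng = np.random.default_rng(0)
scores = rng.exponential(size=1000)     # calibration nonconformity scores
w = rng.exponential(size=1000)          # importance weights under shift

print(kish_ess(np.ones(5)))             # uniform weights: ESS = 5.0
alpha = 0.1
print(weighted_quantile(scores, w, 1 - alpha))
```

With uniform weights the ESS equals the sample count and the weighted quantile reduces to the ordinary conformal quantile; heavier weight concentration shrinks the ESS, signaling fewer "effective" calibration points.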
