absorb.md

Andrew Ng

Chronological feed of everything captured from Andrew Ng.

EHR Models Achieve High Accuracy in Predicting 24-Hour Inpatient Discharges

Models trained on eight years of Stanford Hospital EHR data predict 24-hour inpatient discharges with AUROC 0.85 and AUPRC 0.53 on held-out test sets. These models are well-calibrated across the entire inpatient population. Decision-theoretic analysis identifies ROC regions where the model outperforms trivial classifiers in expected utility.
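
To illustrate the decision-theoretic comparison, an operating point's expected utility can be computed from sensitivity, specificity, prevalence, and per-outcome utilities, then compared against the trivial "predict-all" and "predict-none" classifiers. A minimal sketch with hypothetical utility values, not the paper's actual analysis:

```python
def expected_utility(sens, spec, prev, u_tp, u_fp, u_tn, u_fn):
    """Expected utility per patient at a given ROC operating point."""
    return (prev * (sens * u_tp + (1 - sens) * u_fn)
            + (1 - prev) * (spec * u_tn + (1 - spec) * u_fp))

# Hypothetical numbers: 30% of patients discharge within 24h; catching a
# discharge is worth +1, a false alarm costs 0.5, a miss costs 0.2.
model      = expected_utility(0.75, 0.80, 0.30, 1.0, -0.5, 0.0, -0.2)
treat_all  = expected_utility(1.00, 0.00, 0.30, 1.0, -0.5, 0.0, -0.2)
treat_none = expected_utility(0.00, 1.00, 0.30, 1.0, -0.5, 0.0, -0.2)
```

Under these invented utilities the model's operating point beats both trivial classifiers; the paper's analysis identifies exactly which ROC regions have this property.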

Deep CNN Enables Robust Atrial Fibrillation Detection from Wrist PPG in Free-Living Conditions

Researchers developed a 50-layer CNN to detect atrial fibrillation (AF) episodes from wrist-worn PPG signals in ambulatory settings. They annotated a new dataset of over 4000 hours of PPG data, achieving 95% test AUC despite motion artifacts. This advances wearable devices toward clinical-grade AF monitoring.

Survival-CRPS Enables Sharper, Calibrated Survival Predictions Over MLE

MLE-trained survival models tend to produce high-variance probabilistic predictions. Survival-CRPS, which adapts meteorology's continuous ranked probability score (CRPS) to right- and interval-censored survival data, instead optimizes for sharpness subject to calibration. On two EHR datasets, STARR (with an RNN) and MIMIC-III (with a fully connected network), Survival-CRPS yields sharper predicted distributions than MLE while preserving calibration.
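
A minimal sketch of a right-censored CRPS on a discrete time grid (my simplification of the objective, not the paper's implementation; `cdf` is the model's predicted CDF evaluated at grid points `times`):

```python
def survival_crps(cdf, times, dt, t_event, censored):
    """Discretized CRPS for one patient; censored=True means right-censored."""
    total = 0.0
    for s, F in zip(times, cdf):
        if censored:
            # Only probability mass placed before the censoring time is penalized.
            if s < t_event:
                total += F ** 2 * dt
        else:
            step = 1.0 if s >= t_event else 0.0
            total += (F - step) ** 2 * dt
    return total

# A perfectly sharp, correct prediction incurs zero loss:
perfect = survival_crps([0.0, 0.0, 1.0, 1.0], [0, 1, 2, 3], 1.0, 2, False)
```

Sharpness and calibration both enter through the squared gap between the predicted CDF and the observed step function, which is what lets the loss reward tight distributions without sacrificing calibration.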

Ab Initio DFT Model Unifies Thermophysical and Optical Properties in Two-Temperature Warm Dense Matter

Presents an ab initio density functional theory model for thermophysical and optical properties of two-temperature warm dense matter, featuring heated electrons and cold ions in a solid lattice during ultrafast laser heating. Optical properties are computed via the Kubo-Greenwood formula. The model accurately simulates femtosecond-laser-heated gold's temperature relaxation and optical dynamics, matching experimental data from Chen et al. (Phys. Rev. Lett. 110, 135001, 2013).
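
For reference, the real part of the Kubo-Greenwood optical conductivity takes schematically the following form (the standard expression from the literature, not transcribed from this paper):

```latex
\sigma_1(\omega) = \frac{2\pi e^2 \hbar^2}{3 m^2 \omega \Omega}
\sum_{\mathbf{k}} w_{\mathbf{k}} \sum_{i,j}
\bigl(f_i(\mathbf{k}) - f_j(\mathbf{k})\bigr)\,
\bigl|\langle \psi_{j,\mathbf{k}} \,|\, \nabla \,|\, \psi_{i,\mathbf{k}} \rangle\bigr|^2\,
\delta\bigl(E_{j,\mathbf{k}} - E_{i,\mathbf{k}} - \hbar\omega\bigr)
```

Here $\Omega$ is the cell volume, $w_{\mathbf{k}}$ are k-point weights, and $f_i$ are Fermi occupations evaluated at the (elevated) electron temperature.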

MURA Dataset Enables DenseNet Model Matching Top Radiologist Performance in Musculoskeletal Abnormality Detection

MURA comprises 40,561 musculoskeletal radiographs from 14,863 studies, labeled as normal or abnormal by radiologists, with a robust test set labeled by six board-certified Stanford radiologists using majority vote of three as gold standard. A 169-layer DenseNet model trained on MURA achieves AUROC 0.929 (sensitivity 0.815, specificity 0.887) and matches the best radiologist's Cohen's kappa on finger and wrist studies. Performance lags behind top radiologists on elbow, forearm, hand, humerus, and shoulder studies, positioning MURA as a key benchmark for advancing AI in radiology.
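
The radiologist comparison rests on Cohen's kappa; a minimal stdlib-only sketch of the statistic (an illustration, not the paper's evaluation code):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' labels over the same cases."""
    assert len(a) == len(b) and a
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    labels = set(a) | set(b)
    p_exp = sum((a.count(l) / n) * (b.count(l) / n)        # chance agreement
                for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# Rater agreement on abnormal (1) vs. normal (0) study labels:
kappa = cohens_kappa([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0])
```

In the paper's setup, the gold standard for the test set is the majority vote of three radiologists, and the model's kappa against that standard is compared with each individual radiologist's.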

Deep Neural Networks Predict Mortality from EHR Data to Enable Proactive Palliative Care Referrals

A deep neural network trained on historical EHR data predicts all-cause 3-12 month mortality for hospitalized patients, identifying those likely to benefit from palliative care. This automates triage, sidestepping physicians' tendency to overestimate prognoses and the treatment inertia that can misalign care with patient wishes. The model is piloted at an academic medical center with IRB approval and includes a novel interpretation technique that explains its predictions.

CheXNet DenseNet Achieves Radiologist-Surpassing Pneumonia Detection on Chest X-Rays

CheXNet, a 121-layer DenseNet (DenseNet-121), is trained on the 100,000+ image ChestX-ray14 dataset to detect pneumonia from frontal chest X-rays. On a test set labeled by four academic radiologists, it exceeds their average F1 score for pneumonia detection. The model extends to all 14 diseases in the dataset, establishing state-of-the-art results across them.

Deep CNN Surpasses Cardiologists in Single-Lead ECG Arrhythmia Detection

Researchers trained a 34-layer CNN on a massive ECG dataset exceeding prior corpora by 500x in unique patients, enabling detection of diverse arrhythmias from single-lead wearable monitors. The model maps ECG sequences to rhythm classes and was evaluated against a gold standard test set annotated by committees of board-certified cardiologists. It outperforms the average of 6 individual cardiologists in both sensitivity (recall) and positive predictive value (precision).

Input Noising Equates to n-gram Smoothing in Neural Language Models

The paper establishes a theoretical connection between data noising in neural network language models and smoothing techniques in n-gram models. It leverages this link to adapt smoothing-inspired noising primitives for discrete sequence tasks like language modeling. Experiments confirm perplexity and BLEU score improvements in language modeling and machine translation, with empirical validation of the noising-smoothing equivalence.
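
A minimal sketch of one such noising primitive, unigram noising, where each input token is replaced with a draw from the unigram distribution with probability gamma (my illustration of the idea, not the paper's implementation):

```python
import random

def unigram_noise(tokens, unigram, gamma, rng=None):
    """Replace each token with a unigram-distribution sample w.p. gamma."""
    rng = rng or random.Random(0)
    vocab = list(unigram)
    weights = [unigram[w] for w in vocab]
    return [rng.choices(vocab, weights)[0] if rng.random() < gamma else t
            for t in tokens]

unigram = {"the": 0.5, "cat": 0.3, "sat": 0.2}
clean = ["the", "cat", "sat"]
noised = unigram_noise(clean, unigram, gamma=0.25)
```

The smoothing analogy: training on such noised inputs has an effect comparable to interpolating the model's conditional distribution with the unigram distribution, which is the link the paper makes precise.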

Deep Voice: End-to-End Neural TTS with Real-Time Inference

Deep Voice is a fully neural text-to-speech system comprising five models: phoneme-boundary segmentation, grapheme-to-phoneme (G2P) conversion, phoneme duration prediction, fundamental frequency (F0) prediction, and audio synthesis. It introduces CTC-based phoneme boundary detection and a parameter-efficient WaveNet variant for synthesis, eliminating traditional feature engineering. The system supports faster-than-real-time inference via optimized CPU/GPU kernels achieving up to 400x speedups.

Speech Recognition Outperforms Touch Keyboards by Nearly 3x for Short Messages in English and Mandarin

Laboratory tests on an iPhone 6 Plus show Baidu's Deep Speech 2 achieving 153 WPM in English and 123 WPM in Mandarin for short-message transcription, versus 52 WPM and 43 WPM for the iOS Qwerty and Pinyin keyboards. Speech input is 2.93x faster in English and 2.87x faster in Mandarin under ideal conditions. Speech also produces fewer corrected errors (5.30% vs. 11.22%) but slightly more uncorrected errors (1.30% vs. 0.79%).

Character-Level Attention RNN Achieves SOTA Grammatical Error Correction

An encoder-decoder RNN with character-based attention corrects language errors such as redundancy and non-idiomatic phrasing, avoiding the OOV issues inherent in word-level models. Trained on noisy learner forum data and augmented with synthesized errors, it outperforms prior methods. Combined with a language model, it sets a new state-of-the-art F0.5 score on the CoNLL 2014 Shared Task.

Driverseat Harnesses Crowdsourcing to Overcome Data Labeling and Evaluation Bottlenecks in Autonomous Driving AI

Driverseat integrates crowdsourcing with deep learning systems for autonomous driving to address two key bottlenecks: the lack of comprehensively labeled 3D datasets and of robust evaluation strategies. It enables crowd workers to generate complex 3D labels and tag diverse failure scenarios. Its effectiveness is demonstrated by "crowdstrapping" a CNN for lane detection, and the paper proposes crowdstrapping as a hybrid human-AI paradigm for perception tasks.

Deep Learning CNNs Enable Real-Time Lane and Vehicle Detection on Highway Driving Datasets

Researchers collected a large highway driving dataset and evaluated recent deep learning techniques, particularly CNNs, for car and lane detection in autonomous driving. Existing CNN architectures achieve real-time frame rates suitable for practical systems. Results empirically validate deep learning's promise for robust, inexpensive autonomous driving perception.

End-to-End Deep Learning Achieves State-of-the-Art Speech Recognition Without Hand-Engineered Components

Deep Speech employs a simplified end-to-end deep learning architecture using optimized RNNs trained on multiple GPUs, eliminating traditional hand-crafted pipelines for phonemes, noise modeling, or speaker variation. Novel data synthesis techniques enable efficient generation of large, varied training datasets. The system achieves 16.0% word error rate on Switchboard Hub5'00, outperforming prior benchmarks, and excels in noisy environments compared to commercial systems.
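
One common synthesis technique of this kind is superimposing noise on clean speech at a target signal-to-noise ratio; a minimal sketch of that generic augmentation recipe (not Deep Speech's actual pipeline):

```python
import math

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture hits the requested SNR, then add it."""
    p_sig = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    scale = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]

clean = [1.0, -1.0, 1.0, -1.0]              # average power 1.0
noise = [0.5, -0.5, 0.5, -0.5]              # average power 0.25
noisy = mix_at_snr(clean, noise, snr_db=0)  # noise scaled up to power 1.0
```

Sweeping `snr_db` over a range of values is a cheap way to multiply a labeled corpus into many noisy variants, which is the spirit of the data synthesis the summary describes.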

Bi-Directional RNNs Enable End-to-End First-Pass LVCSR Without HMMs

This work demonstrates first-pass large vocabulary continuous speech recognition (LVCSR) using only a recurrent neural network (RNN) acoustic model and a language model, bypassing HMM sequence modeling. A straightforward RNN architecture with bi-directional recurrence achieves competitive accuracy, and a modified prefix-search decoding algorithm integrates the language model directly, eliminating the need for HMM infrastructure. Experiments on the WSJ corpus validate the approach.
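
The decoder operates over per-frame character outputs that include a blank symbol; a minimal sketch of the CTC-style collapse of a best path (only the collapsing rule, not the paper's modified prefix-search decoder or its language-model integration):

```python
def ctc_collapse(path, blank="_"):
    """Merge repeated characters, then drop blanks (best-path decoding)."""
    out, prev = [], None
    for c in path:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

decoded = ctc_collapse("__hh_e_ll_ll__o")
```

Prefix-search decoding generalizes this by summing probability over all frame paths that collapse to the same prefix, which is where the language model score can be folded in.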

Simple DNN Architectures Excel in Large-Scale Speech Recognition Acoustic Modeling

Empirical analysis on Switchboard (300 hours) and combined Switchboard-Fisher (2,100 hours) corpora reveals that straightforward DNN architectures with maximum likelihood training outperform convolutional and locally-connected untied networks in word error rate reduction. Larger models with up to 10x more parameters scale effectively on bigger datasets, establishing best practices for DNN hybrid systems. Findings provide a case study for DNN optimization with discriminative losses applicable beyond speech tasks.

Deep Neural Networks Enable Class-Agnostic Object Detection from Image Recognition Architectures

Deep neural networks pretrained for image recognition can be adapted for class-generic object detection, identifying objects in images without class-specific bounding box labels. This approach detects novel objects absent from training bounding box data. Additionally, incorporating bounding box labels boosts ImageNet recognition performance by 1%.

Algorithms Correcting Peer Grader Biases Boost MOOC Grading Accuracy

Algorithms estimate and adjust for individual grader biases and reliabilities in MOOC peer assessments, markedly improving accuracy against expert benchmarks. Applied to 63,199 grades from Coursera's HCI courses—the largest dataset analyzed—these tuned models link biases to student engagement, performance, and commenting style. The approach also enables smarter grader assignments for enhanced reliability.
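
A minimal sketch of the simplest version of this idea: estimate each grader's additive bias as their mean residual against per-submission averages, then subtract it (a toy model in the spirit of the paper, not its full probabilistic model):

```python
def estimate_biases(observations):
    """observations: list of (grader, submission, score) triples."""
    by_item = {}
    for _, item, score in observations:
        by_item.setdefault(item, []).append(score)
    item_mean = {i: sum(v) / len(v) for i, v in by_item.items()}
    residuals = {}
    for grader, item, score in observations:
        residuals.setdefault(grader, []).append(score - item_mean[item])
    # Mean residual per grader estimates their additive bias.
    return {g: sum(r) / len(r) for g, r in residuals.items()}

# Grader A runs one point high, grader B one point low:
obs = [("A", 1, 6), ("B", 1, 4), ("A", 2, 8), ("B", 2, 6)]
biases = estimate_biases(obs)  # corrected score = score - biases[grader]
```

The paper's models additionally estimate per-grader reliability (variance), which is what enables the smarter grader assignments it describes.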

Information-Theoretic Decomposition Reveals K-Means' Cluster Balance vs. Similarity Trade-Off Over EM

The paper provides an information-theoretic analysis comparing the hard assignments of K-means with the soft assignments of EM for clustering. It decomposes expected distortion to show that K-means trades off intra-cluster similarity against partition entropy, a measure of cluster balance. The framework predicts that K-means yields less overlapping densities than EM, and introduces posterior assignment as an alternative.

Deterministic POMDP Transformation Enables Efficient Policy Search for Large-Scale MDPs and POMDPs

PEGASUS transforms any MDP or POMDP into an equivalent POMDP whose state transitions are fully deterministic given state and action, reducing policy search to estimating policy values in this simplified structure and searching for high-value policies. The method yields sample complexity polynomial in the horizon time (improving on prior exponential bounds) and extends to infinite state/action spaces, demonstrated on discrete and continuous control tasks such as bicycle riding.
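
The core trick is drawing the randomness up front: each "scenario" fixes a sequence of random numbers, so a policy's value becomes a deterministic function evaluated on shared scenarios. A minimal sketch on a toy chain MDP (my illustration; the state and step functions are invented):

```python
import random

def make_scenarios(m, horizon, seed=0):
    """Pre-draw all randomness: m scenarios of `horizon` uniform numbers."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(horizon)] for _ in range(m)]

def step(state, action, u):
    """Toy chain: action +1/-1 succeeds iff u < 0.8; reward = new state."""
    move = action if u < 0.8 else -action
    new_state = max(0, state + move)
    return new_state, new_state

def evaluate(policy, scenarios, init_state=0):
    """Deterministic given the scenarios: same policy, same value."""
    total = 0.0
    for seq in scenarios:
        s, ret = init_state, 0.0
        for u in seq:
            s, r = step(s, policy(s), u)
            ret += r
        total += ret
    return total / len(scenarios)

scenarios = make_scenarios(m=100, horizon=10)
v1 = evaluate(lambda s: 1, scenarios)  # always move right
v2 = evaluate(lambda s: 1, scenarios)  # identical by construction
```

Because every candidate policy is scored on the same fixed scenarios, the value estimate has no simulation noise across comparisons, which is what makes direct search over policies tractable.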

Cross-Modal Transfer from Text Corpora Enables Zero-Shot Image Recognition

The model recognizes objects in images without training data using semantic embeddings derived from unsupervised text corpora as a shared representation space. It combines outlier detection in semantic space with dual recognition models to achieve state-of-the-art performance on seen classes with abundant training images and reasonable accuracy on unseen classes. No manual semantic features are required for words or images.
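
A minimal sketch of the unseen-class path: map the image into the word-vector space and pick the nearest unseen class embedding by cosine similarity (the embeddings below are invented toy vectors, not the paper's):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_predict(image_embedding, class_vectors):
    """Nearest unseen class in the shared semantic space."""
    return max(class_vectors,
               key=lambda c: cosine(image_embedding, class_vectors[c]))

class_vectors = {"cat": [0.9, 0.1], "truck": [0.1, 0.9]}  # toy word vectors
label = zero_shot_predict([0.8, 0.3], class_vectors)
```

In the full model, an outlier detector first decides whether the image looks like a seen class (route to the standard classifier) or an outlier (route to this zero-shot path).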

Neural Tensor Networks Complete Knowledge Bases by Predicting Missing Relations with Word Vector Initialization

Introduces Neural Tensor Networks (NTN) to predict missing true relationships in incomplete knowledge bases using generalizations from existing data. Improves performance by initializing entity representations with unsupervised semantic word vectors, enabling queries for unseen entities. Outperforms prior models and achieves 75.8% accuracy on classifying unseen WordNet relationships.
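
The NTN scores a candidate triple via a bilinear tensor term plus a standard linear layer, g(e1, R, e2) = uᵀ tanh(e1ᵀ W^[1:k] e2 + V [e1; e2] + b). A minimal sketch with plain Python lists and toy (untrained) parameters:

```python
import math

def ntn_score(e1, e2, W, V, b, u):
    """W: k slices of d x d; V: k rows of length 2d; b, u: length k."""
    d, k = len(e1), len(b)
    concat = e1 + e2  # [e1; e2]
    score = 0.0
    for i in range(k):
        bilinear = sum(e1[p] * W[i][p][q] * e2[q]
                       for p in range(d) for q in range(d))
        linear = sum(V[i][j] * concat[j] for j in range(2 * d))
        score += u[i] * math.tanh(bilinear + linear + b[i])
    return score

# One slice, identity bilinear term, zero linear/bias contribution:
W = [[[1.0, 0.0], [0.0, 1.0]]]
s = ntn_score([1.0, 0.0], [1.0, 0.0], W, V=[[0.0] * 4], b=[0.0], u=[1.0])
```

A high score marks the relationship as likely true; initializing e1 and e2 with unsupervised word vectors is what lets the model generalize to entities unseen during training.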

Polynomial-Time Learning for Bounded Factor Graphs

Factor graphs with bounded factor size and connectivity can be learned with polynomial time and sample complexity, for both parameter estimation and structure learning, assuming the data are generated from this class. As a corollary, the result extends to bounded-degree Bayesian and Markov networks. The method avoids costly inference, applying even to networks where inference is intractable, and its error degrades gracefully for out-of-class distributions.

Efficient Shift-Invariant Sparse Coding Outperforms Spectral Features for Audio Classification

Shift-invariant sparse coding (SISC) extends sparse coding to reconstruct time-series inputs using basis functions across all shifts, enabling efficient learning of high-level audio representations from unlabeled data. The method solves two convex problems exactly: L1-regularized least squares for sparse coefficients via full optimization without heuristics, and constrained least squares for bases in the Fourier domain over complex variables to decouple shifts. Learned SISC features for speech and music outperform state-of-the-art spectral and cepstral features in classification under certain conditions.
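
The Fourier-domain step relies on the convolution theorem: reconstructing a signal from all shifts of a basis function is a (circular) convolution, which becomes an independent per-frequency product after a DFT. A minimal pure-Python illustration of the time-domain side (not the paper's solver):

```python
def circular_convolve(x, h):
    """Sum of all circular shifts of h weighted by x; this is the SISC
    reconstruction for one basis function. In the Fourier domain it is an
    elementwise product, which is what decouples the shifts."""
    n = len(x)
    return [sum(x[(i - j) % n] * h[j] for j in range(n)) for i in range(n)]

# A unit impulse at position 0 reproduces the basis function itself:
out = circular_convolve([1, 0, 0, 0], [1, 2, 3, 4])
```

Solving the basis least-squares problem per frequency bin over complex variables turns one large coupled problem into many small independent ones, which is the efficiency gain the summary refers to.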

UAI 2009 Proceedings Archival on arXiv by Bilmes and Ng

arXiv:1206.3959v2 archives the proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), held June 18-21, 2009, in Montreal, QC, Canada. Edited by Jeff Bilmes and Andrew Ng, it collects research on uncertainty modeling in AI. No PDF is directly available on arXiv; the listing was submitted in 2012 and last revised in 2014.

Unsupervised Training Yields Robust High-Level Feature Detectors for Faces and Beyond

A 9-layer sparse autoencoder with 1 billion connections, trained on 10 million unlabeled 200x200 images using 1,000 machines for 3 days, learns class-specific detectors for faces, cat faces, and human bodies. These detectors prove robust to translation, scaling, and out-of-plane rotation, as validated by control experiments. Fine-tuning on ImageNet achieves 15.8% accuracy across 20,000 categories, a 70% relative improvement over prior state-of-the-art.