Chronological feed of everything captured from Andrew Ng.
paper / AndrewYNg / Dec 2
Models trained on eight years of Stanford Hospital EHR data predict 24-hour inpatient discharges with AUROC 0.85 and AUPRC 0.53 on held-out test sets. These models are well-calibrated across the entire inpatient population. Decision-theoretic analysis identifies ROC regions where the model outperforms trivial classifiers in expected utility.
machine-learning, healthcare-ai, ehr-analysis, discharge-prediction, predictive-modeling, auroc, hospital-management
“Models predict 24-hour inpatient discharge with AUROC of 0.85”
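The decision-theoretic comparison can be sketched as expected utility at an ROC operating point versus the trivial treat-all and treat-none classifiers. The operating point, prevalence, and utility values below are hypothetical, not the paper's:

```python
def expected_utility(tpr, fpr, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Expected utility per patient at a given (FPR, TPR) operating point."""
    p, n = prevalence, 1.0 - prevalence
    return (p * tpr * u_tp + p * (1 - tpr) * u_fn
            + n * fpr * u_fp + n * (1 - fpr) * u_tn)

# Hypothetical utilities: acting on a true discharge saves planning effort,
# acting on a false positive wastes some of it.
U = dict(u_tp=1.0, u_fn=0.0, u_fp=-0.2, u_tn=0.0)
prev = 0.15  # assumed 24-hour discharge prevalence

model      = expected_utility(tpr=0.70, fpr=0.20, prevalence=prev, **U)
treat_all  = expected_utility(tpr=1.0,  fpr=1.0,  prevalence=prev, **U)
treat_none = expected_utility(tpr=0.0,  fpr=0.0,  prevalence=prev, **U)
print(model, treat_all, treat_none)  # model beats both trivial classifiers
```

An ROC region "outperforms trivial classifiers" exactly when its expected utility exceeds both of these corner points.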
paper / AndrewYNg / Nov 12
Researchers developed a 50-layer CNN to detect atrial fibrillation (AF) episodes from wrist-worn PPG signals in ambulatory settings. They annotated a new dataset of over 4000 hours of PPG data, achieving 95% test AUC despite motion artifacts. This advances wearable devices toward clinical-grade AF monitoring.
atrial-fibrillation, photoplethysmography, wearable-devices, deep-learning, cnn, medical-ai, health-monitoring
“Algorithm detects AF from ambulatory PPG using a 50-layer convolutional neural network”
paper / AndrewYNg / Jun 21
MLE-trained survival models produce high-variance probabilistic predictions. Survival-CRPS, adapting meteorology's CRPS for right- and interval-censored survival data, optimizes for sharpness under calibration. On EHR datasets STARR (RNN) and MIMIC-III (FCN), Survival-CRPS yields sharper distributions than MLE while preserving calibration.
survival-analysis, crps, calibrated-predictions, healthcare-ai, ehealth-records, deep-learning, machine-learning
“Probabilistic survival predictions from MLE-trained models exhibit high variance.”
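One plausible discrete-time reading of the censored-CRPS idea (a sketch, not the authors' exact formulation): score the predicted CDF against the outcome indicator, and when the event is right-censored penalize only the predicted death mass before the censoring time.

```python
def survival_crps(cdf, time_grid, event_time, observed):
    """Discrete-time CRPS for a predicted survival CDF.

    cdf[i] = predicted P(death <= time_grid[i]); `observed` is False when
    the event time is right-censored.
    """
    score = 0.0
    for t, F in zip(time_grid, cdf):
        if observed:
            score += (F - (1.0 if t >= event_time else 0.0)) ** 2
        elif t < event_time:        # right-censored: only penalize predicted
            score += F ** 2         # death mass before the censoring time
    return score

grid = [1, 2, 3, 4]
sharp   = [0.0, 0.0, 1.0, 1.0]    # confident death at t=3
diffuse = [0.25, 0.5, 0.75, 1.0]  # spread-out prediction
print(survival_crps(sharp, grid, 3, True))    # perfectly sharp -> 0.0
print(survival_crps(diffuse, grid, 3, True))  # diffuse prediction scores worse
```

This is how the score rewards sharpness: for the same (calibrated) outcome, concentrated distributions incur less penalty than diffuse ones.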
paper / AndrewYNg / Feb 12
Presents an ab initio density functional theory model for thermophysical and optical properties of two-temperature warm dense matter, featuring heated electrons and cold ions in a solid lattice during ultrafast laser heating. Optical properties are computed via the Kubo-Greenwood formula. The model accurately simulates femtosecond-laser-heated gold's temperature relaxation and optical dynamics, matching experimental data from Chen et al. (Phys. Rev. Lett. 110, 135001, 2013).
warm-dense-matter, optical-properties, ab-initio-simulations, density-functional-theory, plasma-physics, computational-physics, two-temperature-model
“The model uses ab initio density functional theory simulations to describe thermophysical and optical properties of two-temperature systems with heated electrons and cold ions.”
paper / AndrewYNg / Dec 11
MURA comprises 40,561 musculoskeletal radiographs from 14,863 studies, labeled as normal or abnormal by radiologists, with a robust test set labeled by six board-certified Stanford radiologists using majority vote of three as gold standard. A 169-layer DenseNet model trained on MURA achieves AUROC 0.929 (sensitivity 0.815, specificity 0.887) and matches the best radiologist's Cohen's kappa on finger and wrist studies. Performance lags behind top radiologists on elbow, forearm, hand, humerus, and shoulder studies, positioning MURA as a key benchmark for advancing AI in radiology.
mura-dataset, medical-imaging, radiology-ai, abnormality-detection, densenet, musculoskeletal, arxiv-paper
“MURA dataset contains 40,561 images from 14,863 studies labeled as normal or abnormal by radiologists.”
paper / AndrewYNg / Nov 17
A deep neural network trained on historical EHR data predicts all-cause 3-12 month mortality for hospitalized patients, identifying those likely to benefit from palliative care. This automates triage, sidestepping the physician over-optimism about prognoses and the treatment inertia that can misalign care with patient wishes. The model is piloted at an academic medical center with IRB approval, featuring a novel interpretation technique for prediction explanations.
deep-learning, healthcare-ai, palliative-care, ehr-data, mortality-prediction, explainable-ai, arxiv-paper
“Physicians tend to over-estimate prognoses for end-of-life patients.”
paper / AndrewYNg / Nov 14
CheXNet, a 121-layer DenseNet CNN, is trained on the 100,000+ image ChestX-ray14 dataset to detect pneumonia from frontal chest X-rays. On a test set labeled by four academic radiologists, it exceeds their average F1 score for pneumonia detection. The model extends to all 14 diseases in the dataset, establishing state-of-the-art results across them.
chexnet, pneumonia-detection, chest-xray, medical-imaging, deep-learning, computer-vision, radiology-ai
“CheXNet detects pneumonia from chest X-rays at a level exceeding practicing radiologists”
paper / AndrewYNg / Jul 6
Researchers trained a 34-layer CNN on a massive ECG dataset exceeding prior corpora by 500x in unique patients, enabling detection of diverse arrhythmias from single-lead wearable monitors. The model maps ECG sequences to rhythm classes and was evaluated against a gold standard test set annotated by committees of board-certified cardiologists. It outperforms the average of 6 individual cardiologists in both sensitivity (recall) and positive predictive value (precision).
arrhythmia-detection, convolutional-neural-networks, ecg-analysis, medical-ai, computer-vision, wearable-health, arxiv-paper
“The algorithm exceeds performance of board-certified cardiologists in detecting heart arrhythmias from single-lead ECGs.”
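The two reported metrics are standard confusion-matrix quantities; a minimal sketch on made-up counts (not the paper's evaluation data):

```python
def sensitivity(tp, fn):
    """Recall: fraction of true arrhythmia events that are detected."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Positive predictive value (precision): fraction of detections correct."""
    return tp / (tp + fp)

tp, fp, fn = 90, 10, 15  # illustrative counts only
print(sensitivity(tp, fn), ppv(tp, fp))
```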
paper / AndrewYNg / Mar 7
The paper establishes a theoretical connection between data noising in neural network language models and smoothing techniques in n-gram models. It leverages this link to adapt smoothing-inspired noising primitives for discrete sequence tasks like language modeling. Experiments confirm perplexity and BLEU score improvements in language modeling and machine translation, with empirical validation of the noising-smoothing equivalence.
neural-language-models, data-noising, model-regularization, language-modeling, machine-translation, n-gram-smoothing, iclr-2017
“Data noising regularizes neural network language models effectively”
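A minimal sketch of one such noising primitive, unigram replacement (variable names are assumed, not the authors' code): with probability gamma, each token is swapped for a draw from the unigram distribution, which the paper links to interpolation smoothing in n-gram models.

```python
import random

def unigram_noise(tokens, unigram_dist, gamma, rng=random.Random(0)):
    """Replace each token, with probability gamma, by a sample from the
    unigram distribution over the vocabulary."""
    vocab, weights = zip(*unigram_dist.items())
    return [rng.choices(vocab, weights)[0] if rng.random() < gamma else tok
            for tok in tokens]

dist = {"the": 0.5, "cat": 0.3, "sat": 0.2}  # toy unigram distribution
noised = unigram_noise(["the", "cat", "sat"] * 10, dist, gamma=0.3)
```

With gamma = 0 the data is untouched; larger gamma corresponds to heavier smoothing.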
paper / AndrewYNg / Feb 25
Deep Voice is a fully neural text-to-speech system comprising segmentation, grapheme-to-phoneme (G2P), duration prediction, F0 prediction, and audio synthesis models. It introduces CTC-based phoneme boundary detection and a parameter-efficient WaveNet variant for synthesis, eliminating traditional feature engineering. The system supports faster-than-real-time inference via optimized CPU/GPU kernels achieving up to 400x speedups.
text-to-speech, deep-voice, neural-speech-synthesis, wavenet, phoneme-segmentation, ctc-loss, real-time-inference
“Deep Voice is constructed entirely from deep neural networks without traditional feature engineering.”
paper / AndrewYNg / Aug 25
Laboratory tests on iPhone 6 Plus show Baidu Deep Speech 2 achieving 153 WPM in English and 123 WPM in Mandarin for short message transcription, versus 52 WPM and 43 WPM for iOS Qwerty and Pinyin keyboards. Speech input rates are 2.93x faster in English and 2.87x faster in Mandarin under ideal conditions. Speech produces fewer corrected errors (5.30% vs. 11.22%) but slightly more uncorrected errors (1.30% vs. 0.79%).
speech-recognition, text-entry, mobile-hci, touchscreen-keyboards, multilingual-input, human-computer-interaction
“Speech recognition input rate is 153 WPM in English, 2.93 times faster than keyboard's 52 WPM”
paper / AndrewYNg / Mar 31
An encoder-decoder RNN with character-based attention corrects language errors such as redundancy and non-idiomatic phrasing, avoiding the OOV issues inherent in word-level models. Trained on noisy learner forum data augmented with synthesized errors, it outperforms prior methods. Combined with a language model, it sets a new state-of-the-art F0.5 score on the CoNLL 2014 Shared Task.
neural-language-correction, character-based-attention, encoder-decoder-rnn, language-learning, grammatical-error-correction, arxiv-paper, andrew-ng
“Character-level encoder-decoder RNN with attention handles orthographic errors, redundancy, and non-idiomatic phrasing flexibly”
paper / AndrewYNg / Dec 7
Driverseat integrates crowdsourcing with deep learning systems for autonomous driving to address two key bottlenecks: the lack of comprehensively labeled 3D datasets and of robust evaluation strategies. It enables crowd workers to generate complex 3D labels and tag diverse failure scenarios. The authors propose "crowdstrapping" as a hybrid human-AI paradigm for perception tasks and demonstrate it by crowdstrapping a CNN for lane detection.
autonomous-driving, crowdsourcing, deep-learning, lane-detection, human-computer-interaction, crowdstrapping
“Deep-learning systems face two major bottlenecks in autonomous driving detection tasks: unavailability of comprehensively labeled datasets and expressive evaluation strategies.”
paper / AndrewYNg / Apr 7
Researchers collected a large highway driving dataset and evaluated recent deep learning techniques, particularly CNNs, for car and lane detection in autonomous driving. Existing CNN architectures achieve real-time frame rates suitable for practical systems. Results empirically validate deep learning's promise for robust, inexpensive autonomous driving perception.
deep-learning, autonomous-driving, computer-vision, cnn, lane-detection, vehicle-detection, arxiv-paper
“Computer vision combined with deep learning offers a relatively inexpensive, robust solution to autonomous driving.”
paper / AndrewYNg / Dec 17
Deep Speech employs a simplified end-to-end deep learning architecture using optimized RNNs trained on multiple GPUs, eliminating traditional hand-crafted pipelines for phonemes, noise modeling, or speaker variation. Novel data synthesis techniques enable efficient generation of large, varied training datasets. The system achieves 16.0% word error rate on Switchboard Hub5'00, outperforming prior benchmarks, and excels in noisy environments compared to commercial systems.
speech-recognition, deep-speech, end-to-end-learning, rnn, deep-learning, multi-gpu-training, arxiv-paper
“Deep Speech achieves 16.0% error rate on the full Switchboard Hub5'00 test set”
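Word error rate itself is word-level edit distance normalized by reference length; a standard implementation, shown on toy strings rather than Switchboard:

```python
def word_error_rate(ref, hyp):
    """WER = word-level Levenshtein distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(r)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```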
paper / AndrewYNg / Aug 12
This work demonstrates first-pass large vocabulary continuous speech recognition (LVCSR) using only a recurrent neural network (RNN) acoustic model and language model, bypassing HMM sequence modeling. A straightforward RNN architecture with bi-directional recurrence achieves competitive accuracy. A modified prefix-search decoding algorithm integrates the language model directly, eliminating HMM infrastructure needs. Experiments on WSJ corpus validate the approach's efficacy.
speech-recognition, recurrent-neural-networks, bidirectional-rnns, large-vocabulary, continuous-speech, neural-decoding, arxiv-paper
“RNNs can perform first-pass LVCSR using only a neural network and language model, without HMMs.”
paper / AndrewYNg / Jun 30
Empirical analysis on Switchboard (300 hours) and combined Switchboard-Fisher (2,100 hours) corpora reveals that straightforward DNN architectures with maximum likelihood training outperform convolutional and locally-connected untied networks in word error rate reduction. Larger models with up to 10x more parameters scale effectively on bigger datasets, establishing best practices for DNN hybrid systems. Findings provide a case study for DNN optimization with discriminative losses applicable beyond speech tasks.
speech-recognition, dnn-acoustic-models, deep-neural-networks, convolutional-networks, arxiv-paper, andrew-ng, machine-learning
“Standard DNNs achieve strong performance on Switchboard benchmark compared to convolutional and locally-connected untied networks.”
paper / AndrewYNg / Dec 24
Deep neural networks pretrained for image recognition can be adapted for class-generic object detection, identifying objects in images without class-specific bounding box labels. This approach detects novel objects absent from training bounding box data. Additionally, incorporating bounding box labels boosts ImageNet recognition performance by 1%.
deep-learning, object-detection, computer-vision, neural-networks, class-generic-detection, andrew-ng, arxiv-paper
“Neural networks designed for image recognition can be trained to detect objects within images regardless of their class”
paper / AndrewYNg / Jul 9
Algorithms estimate and adjust for individual grader biases and reliabilities in MOOC peer assessments, markedly improving accuracy against expert benchmarks. Applied to 63,199 grades from Coursera's HCI courses—the largest dataset analyzed—these tuned models link biases to student engagement, performance, and commenting style. The approach also enables smarter grader assignments for enhanced reliability.
peer-grading, moocs, grader-bias, machine-learning, online-education, coursera, andrew-ng
“Peer grading algorithms significantly improve accuracy on real MOOC data compared to uncorrected peer grades”
paper / AndrewYNg / Feb 6
The paper provides an information-theoretic analysis comparing hard assignments in K-means and soft assignments in EM for clustering. It decomposes expected distortion to show K-means trades off intra-cluster similarity against partition entropy, measuring cluster balance. This framework predicts K-means yields less overlapping densities than EM and introduces posterior assignment as an alternative.
machine-learning, clustering, k-means, expectation-maximization, information-theory, andrew-ng, unsupervised-learning
“K-means minimizes distortion while EM maximizes likelihood”
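The hard-versus-soft contrast at the heart of the analysis, in a minimal one-dimensional sketch (equal-weight, equal-variance Gaussian components assumed for the EM side):

```python
import math

def hard_assign(x, centers):
    """K-means E-step: all assignment mass on the nearest center."""
    d = [abs(x - c) for c in centers]
    return [1.0 if i == d.index(min(d)) else 0.0 for i in range(len(centers))]

def soft_assign(x, centers, var=1.0):
    """EM E-step: posterior responsibilities under equal-weight,
    equal-variance Gaussians centered at `centers`."""
    w = [math.exp(-(x - c) ** 2 / (2 * var)) for c in centers]
    s = sum(w)
    return [wi / s for wi in w]

centers = [0.0, 2.0]
print(hard_assign(0.9, centers))  # winner takes all
print(soft_assign(0.9, centers))  # graded responsibilities summing to 1
```

A point near the boundary gets nearly even soft responsibilities but a full hard assignment, which is exactly the overlap behavior the decomposition predicts.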
paper / AndrewYNg / Jan 16
PEGASUS transforms any MDP or POMDP into an equivalent POMDP with fully deterministic state transitions given state and action, reducing policy search to evaluating policies in this simplified structure. Policy values are estimated directly in the transformed space, with search performed to find high-value policies. The method yields polynomial sample complexity in horizon time—improving on prior exponential bounds—and extends to infinite state/action spaces, demonstrated on discrete and continuous control tasks like bicycle riding.
policy-search, pegasus, pomdps, mdps, reinforcement-learning, markov-processes, andrew-ng
“Any MDP or POMDP can be transformed into an equivalent POMDP where all state transitions are deterministic given the current state and action.”
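The transformation amounts to pre-drawing the randomness. A sketch of PEGASUS-style evaluation on a toy chain, where `step(state, action, u)` is an assumed deterministic simulator interface and each "scenario" is a fixed sequence of random numbers reused across policies:

```python
import random

def evaluate_policy(policy, step, init_state, horizon, scenarios):
    """Average return over fixed scenarios; transitions are deterministic
    given (state, action, pre-drawn random number)."""
    total = 0.0
    for seq in scenarios:            # same sequences reused for every policy
        s, ret = init_state, 0.0
        for t in range(horizon):
            s, r = step(s, policy(s), seq[t])
            ret += r
        total += ret
    return total / len(scenarios)

# Toy chain: the action shifts the state, noise u perturbs it; reward = state.
def step(s, a, u):
    return s + a + (u - 0.5), s

rng = random.Random(0)
scenarios = [[rng.random() for _ in range(5)] for _ in range(8)]
v_right = evaluate_policy(lambda s: 1.0, step, 0.0, 5, scenarios)
v_left  = evaluate_policy(lambda s: -1.0, step, 0.0, 5, scenarios)
```

Because the scenarios are fixed, each policy's value estimate is deterministic, so ordinary search can compare policies without Monte Carlo noise.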
paper / AndrewYNg / Jan 16
The model recognizes objects in images without training data using semantic embeddings derived from unsupervised text corpora as a shared representation space. It combines outlier detection in semantic space with dual recognition models to achieve state-of-the-art performance on seen classes with abundant training images and reasonable accuracy on unseen classes. No manual semantic features are required for words or images.
zero-shot-learning, cross-modal-transfer, computer-vision, machine-learning, semantic-space, outlier-detection, arxiv-paper
“Model recognizes objects in images with no training data for those classes”
paper / AndrewYNg / Jan 16
Introduces Neural Tensor Networks (NTN) to predict missing true relationships in incomplete knowledge bases using generalizations from existing data. Improves performance by initializing entity representations with unsupervised semantic word vectors, enabling queries for unseen entities. Outperforms prior models and achieves 75.8% accuracy on classifying unseen WordNet relationships.
neural-tensor-networks, knowledge-base-completion, semantic-word-vectors, entity-relation-prediction, ntn-model, wordnet-classification
“NTN model predicts additional true relationships to complete knowledge bases based on existing data generalizations.”
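The NTN score combines a bilinear tensor term with a standard linear layer; a sketch following the NTN functional form, with illustrative shapes and random parameters rather than trained ones:

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """NTN score for an (entity1, relation, entity2) triple:
    u^T tanh(e1^T W^[1:k] e2 + V [e1; e2] + b),
    where W holds k bilinear tensor slices for the relation."""
    k = W.shape[0]
    bilinear = np.array([e1 @ W[i] @ e2 for i in range(k)])
    return float(u @ np.tanh(bilinear + V @ np.concatenate([e1, e2]) + b))

d, k = 4, 3                 # illustrative entity and slice dimensions
rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=d), rng.normal(size=d)  # entity vectors (e.g. from
W = rng.normal(size=(k, d, d))                   # unsupervised word vectors)
V = rng.normal(size=(k, 2 * d))
b = rng.normal(size=k)
u = rng.normal(size=k)
score = ntn_score(e1, e2, W, V, b, u)
```

Initializing `e1`/`e2` from unsupervised word vectors is what lets the model score triples involving entities never seen during relation training.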
paper / AndrewYNg / Jul 4
Factor graphs with bounded factor size and connectivity can be learned in polynomial time and sample complexity for both parameter estimation and structure learning, assuming the data is generated from this class. This extends to bounded-degree Bayesian and Markov networks as a corollary. The method avoids costly inference, applying even to networks where inference is intractable, with graceful error degradation for out-of-class distributions.
factor-graphs, graphical-models, structure-learning, parameter-estimation, machine-learning, sample-complexity, polynomial-time
“Factor graphs with bounded factor size and bounded connectivity can be learned in polynomial time and polynomial sample complexity”
paper / AndrewYNg / Jun 20
Shift-invariant sparse coding (SISC) extends sparse coding to reconstruct time-series inputs using basis functions across all shifts, enabling efficient learning of high-level audio representations from unlabeled data. The method solves two convex problems exactly: L1-regularized least squares for sparse coefficients via full optimization without heuristics, and constrained least squares for bases in the Fourier domain over complex variables to decouple shifts. Learned SISC features for speech and music outperform state-of-the-art spectral and cepstral features in classification under certain conditions.
sparse-coding, shift-invariance, audio-classification, unsupervised-learning, machine-learning, feature-learning
“SISC reconstructs inputs using basis functions in all possible shifts.”
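The L1-regularized least-squares subproblem for the sparse coefficients can be illustrated with a generic solver. The sketch below uses iterative soft-thresholding (ISTA), a standard method for this objective, not the authors' exact optimization:

```python
import numpy as np

def ista(A, y, lam, iters=500):
    """Solve min_x 0.5*||A x - y||^2 + lam*||x||_1 by iterative
    soft-thresholding (gradient step, then shrink toward zero)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)            # gradient of the quadratic term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 10))            # dictionary of (shifted) bases
x_true = np.zeros(10)
x_true[[2, 7]] = [1.5, -2.0]             # sparse ground-truth coefficients
y = A @ x_true                           # observed signal
x_hat = ista(A, y, lam=0.1)
```

The L1 penalty drives most coefficients exactly to zero, recovering the sparse code; SISC applies this idea with the dictionary expanded over all shifts.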
paper / AndrewYNg / Jun 13 / failed
arXiv:1206.3959v2 archives the proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, held June 18-21, 2009, in Montreal, QC, Canada. Edited by Jeff Bilmes and Andrew Ng, it represents a key collection of research on uncertainty modeling in AI. No PDF is directly available on arXiv; the listing was submitted in 2012 and revised in 2014.
uncertainty-ai, uai-conference, andrew-ng, jeff-bilmes, arxiv-paper, ai-proceedings, machine-learning
“The Twenty-Fifth Conference on Uncertainty in Artificial Intelligence occurred June 18-21, 2009, in Montreal, QC, Canada.”
paper / AndrewYNg / Dec 29
A 9-layer sparse autoencoder with 1 billion connections, trained on 10 million unlabeled 200x200 images using 1,000 machines for 3 days, learns class-specific detectors for faces, cat faces, and human bodies. These detectors prove robust to translation, scaling, and out-of-plane rotation, as validated by control experiments. Fine-tuning on ImageNet achieves 15.8% accuracy across 20,000 categories, a 70% relative improvement over prior state-of-the-art.
unsupervised-learning, sparse-autoencoder, feature-detection, large-scale-training, computer-vision, imagenet, andrew-ng
“A 9-layered locally connected sparse autoencoder with pooling and local contrast normalization can be trained on unlabeled images to detect faces.”