absorb.md

Andrew Ng

Chronological feed of everything captured from Andrew Ng.

Optimizing Medical Image Analysis via Latent Diffusion-Based Synthetic Augmentation

Latent diffusion models can effectively augment medical imaging datasets by conditioning generation on text prompts or geometrically transformed segmentation masks. Empirical results demonstrate significant gains in both classification (F1 score) and segmentation (Dice score) across multiple benchmark datasets. The study highlights that integrating proxy modeling and single-disease conditioning optimizes the utility of synthetic data for downstream deep learning tasks.

Fe3O4 Nanoparticles Boost Flexural Strength and Toughness of Textured Alumina via Ultrafast Sintering

Fe3O4-coated alumina microplatelets, textured by a rotating magnetic field and densified by ultrafast high-temperature sintering (UHS), incorporate Fe atoms at grain boundaries and within grains, inducing crystallographic defects. These defects promote plastic flow, enhancing energy dissipation by ~122% at 1900 mN load compared to pristine samples. Consequently, UHS-sintered samples with Fe3O4 achieve a superior flexural strength of ~287 MPa and fracture toughness of 7 MPa·m^{0.5} versus samples without Fe3O4 or those densified by conventional sintering.

Many-Shot In-Context Learning Boosts Multimodal Foundation Models up to 2,000 Examples

Multimodal foundation models like GPT-4o and Gemini 1.5 Pro exhibit substantial performance gains in in-context learning (ICL) when scaling from few-shot (<100 examples) to many-shot (up to ~2,000 examples) across 14 datasets in domains including natural, medical, remote sensing, and molecular imagery. Gemini 1.5 Pro demonstrates log-linear improvements with increasing examples on many datasets, outperforming GPT-4o in learning speed despite similar zero-shot baselines. Open-weight models like Llama 3.2-Vision show no such benefits, and batching up to 50 queries in API calls enhances efficiency and performance, particularly in zero-shot settings.
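The mechanics of many-shot prompting with query batching can be sketched in a few lines. This is an illustrative assumption about the prompt layout, not the papers' exact protocol: labeled examples are concatenated up to a shot budget, and multiple unlabeled queries are appended to the same prompt so one API call answers several inputs.

```python
# Sketch: packing many-shot examples and a batch of queries into one prompt.
# The "Input:/Label:" format, shot budget, and batch size are illustrative
# assumptions, not the exact protocol from the study.

def build_many_shot_prompt(examples, queries, max_shots=2000, batch_size=50):
    """examples: list of (input_text, label); queries: list of input_text."""
    shots = examples[:max_shots]
    lines = [f"Input: {x}\nLabel: {y}" for x, y in shots]
    batch = queries[:batch_size]
    lines += [f"Input: {q}\nLabel:" for q in batch]  # model completes these
    return "\n\n".join(lines)

demo = build_many_shot_prompt(
    [("a photo of a cat", "cat"), ("a photo of a dog", "dog")],
    ["a photo of a fox"],
)
```

Batching amortizes the (long) many-shot context over several queries, which is why it improves cost efficiency in the reported experiments.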

Automating Weak Label Generation with MedSAM Boosts Label-Scarce Medical Image Segmentation

The pipeline trains a model on few gold-standard labels to automatically prompt MedSAM, generating weak labels for unlimited unlabeled real and synthetic medical images. This eliminates manual prompting, streamlining augmentation of label-scarce datasets across modalities like ultrasound, dermatology, and X-rays. Experiments validate performance gains in low-label regimes for diverse segmentation tasks.
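One way such automatic prompting can work is to convert the coarse mask predicted by the small gold-label model into a bounding-box prompt for the SAM-style segmenter. The sketch below is a minimal illustration under that assumption; the pipeline's actual prompt format is not specified here, and the `(x_min, y_min, x_max, y_max)` convention and padding are choices made for the example.

```python
import numpy as np

# Sketch: turn a coarse predicted mask into a padded box prompt for a
# SAM-style segmenter. The box convention and padding are assumptions
# for illustration, not the paper's exact prompting scheme.

def mask_to_box_prompt(mask, pad=2):
    """mask: 2D binary array -> (x_min, y_min, x_max, y_max), or None if empty."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    h, w = mask.shape
    return (max(int(xs.min()) - pad, 0), max(int(ys.min()) - pad, 0),
            min(int(xs.max()) + pad, w - 1), min(int(ys.max()) + pad, h - 1))

coarse = np.zeros((64, 64), dtype=np.uint8)
coarse[20:30, 40:50] = 1            # coarse lesion prediction
box = mask_to_box_prompt(coarse)    # prompt fed to the segmenter
```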

Continual Learning Enables Robust CT Organ Segmentation Across Pediatric and Adult Age Groups

Deep learning models for CT organ segmentation trained solely on adult data show substantial performance degradation on pediatric CT volumes due to age-related anatomical differences. Data augmentation and continual learning strategies mitigate this, with continual learning achieving Dice scores of 0.90 on adult data and 0.84 on pediatric data. This approach unlocks age-agnostic segmentation accuracy without retraining from scratch on mixed datasets.

CloudTracks Dataset Enables Superior Ship Track Localization in Satellite Cloud Imagery

CloudTracks provides 3,560 satellite images with over 12,000 ship track annotations to facilitate large-scale study of anthropogenic cloud modifications via aerosol emissions. Baseline semantic and instance segmentation models trained on the dataset achieve 61.29 IoU for localization, surpassing the prior state of the art by 12.64 points, and 1.64 MAE for counting, improving on the previous best of 4.99 MAE. The dataset highlights challenges in detecting elongated, overlapping ship tracks, spurring advances in satellite image analysis for climate research.
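The two reported metrics are standard and easy to state precisely. Minimal versions: mask IoU (intersection over union) for localization quality, and mean absolute error on per-image track counts.

```python
import numpy as np

# Minimal versions of the reported metrics: binary-mask IoU for
# localization and mean absolute error on per-image ship-track counts.

def iou(pred, target):
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    return float(np.logical_and(pred, target).sum() / union) if union else 1.0

def count_mae(pred_counts, true_counts):
    return float(np.mean(np.abs(np.asarray(pred_counts) - np.asarray(true_counts))))

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
overlap = iou(a, b)                 # 1 shared pixel over 2 in the union
mae = count_mae([3, 5], [4, 5])
```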

Simple Efficient Mislabel Detector Matches or Beats State-of-the-Art on Real-World Vision Datasets

Researchers ran over 200 benchmarking experiments on automated mislabel detection methods across synthetic and real noise in vision datasets, finding that their Simple and Efficient Mislabel Detector (SEMD) performs on par with or better than prior approaches. Applying SEMD to real-world datasets, they show that strategic mislabel removal—considering dataset size, removal strategy, and amount—yields up to 8% per-class performance gains in retrained classifiers, especially in smaller data regimes. This highlights the practical value of data-centric cleaning for improving model robustness.
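The summary does not spell out SEMD's recipe, but the generic idea behind confidence-based mislabel detection is easy to sketch: rank examples by the held-out probability a model assigns to their given label, and flag those where the model strongly disagrees. The threshold and toy probabilities below are illustrative assumptions.

```python
import numpy as np

# Sketch of a generic confidence-based mislabel flagger (not SEMD's exact
# recipe): flag examples whose given label receives low held-out probability.

def flag_mislabels(probs, labels, threshold=0.2):
    """probs: (n, k) held-out class probabilities; labels: (n,) given labels.
    Returns indices whose given-label probability falls below threshold."""
    given = probs[np.arange(len(labels)), labels]
    return np.where(given < threshold)[0]

probs = np.array([[0.9, 0.1],
                  [0.05, 0.95],   # model strongly disagrees with label 0
                  [0.7, 0.3]])
labels = np.array([0, 0, 1])
suspects = flag_mislabels(probs, labels)
```

In practice the flagged examples would be reviewed or removed before retraining, with the removal amount tuned to dataset size as the study suggests.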

USat: Unified Vision Transformer Encoder for Multi-Sensor Self-Supervised Satellite Imagery Pre-Training

USat introduces a vision transformer architecture tailored for multi-spectral satellite data from multiple sensors, featuring modified patch projection layers and positional encodings to handle varying spatial scales. Integrated into a Masked Autoencoder framework, it enables self-supervised pre-training on rich remote sensing datasets. Pre-trained USat surpasses prior self-supervised MAE models by up to 8% on benchmarks and boosts performance by up to 7% in low-data scenarios.

Weakly-Semi-Supervised Detection Outperforms Fully Supervised Baselines with Fewer Bounding Boxes in Remote Sensing

Weakly-semi-supervised object detection (WSSOD) leverages abundant point labels alongside sparse bounding box annotations to train object detectors for remote sensing imagery. On FAIR1M and wind turbine datasets, WSSOD substantially outperforms fully supervised models using equivalent bounding box labels. Models trained with 2-10x fewer bounding boxes match or exceed fully supervised performance on complete datasets, reducing annotation costs for scalable applications.

LymphoML: Interpretable ML Matches Pathologist Accuracy on H&E for Lymphoma Subtyping

LymphoML processes H&E-stained tissue microarrays by segmenting nuclei/cells, extracting morphology/texture/architecture features, and training gradient-boosted models for lymphoma subtype classification. It achieves non-inferior accuracy to pathologists using whole-slide images and surpasses black-box deep learning on 670 cases across 8 subtypes. SHAP analysis highlights nuclear shape features as key discriminators for DLBCL (F1=78.7%) and cHL (F1=74.5%); combining H&E features with 6 immunostains yields accuracy (85.3%) comparable to 46-stain panels (86.1%).

Multimodal SSL Outperforms Unimodal for Transferring Chest X-Ray Models Across Healthcare Systems and Tasks

Multimodal self-supervised learning (SSL) using chest X-rays and radiology reports yields substantial performance gains over unimodal SSL when transferring models to new healthcare systems and tasks, matching fully supervised pretraining. Additional improvements come from domain-adaptive pretraining (DAPT), linear probing followed by finetuning (LP-FT), or their combination. These strategies enhance generalization of medical image models without requiring extensive new labels.

From Physics and Laundromats to ImageNet: Fei-Fei Li's Audacious Quest for Intelligence Principles in AI

Fei-Fei Li transitioned from physics to AI inspired by great physicists' ponderings on life and intelligence, overcoming challenges like running a family dry cleaning business while studying at Princeton. She created ImageNet by pursuing the North Star of natural object recognition, addressing data scarcity and overfitting in early computer vision through massive-scale datasets like Caltech-101 and then 15 million images across 22,000 categories, catalyzing deep learning's rise. Li views AI as a pre-Newtonian science seeking fundamental principles of intelligence akin to physics laws, while advocating human-centered applications in healthcare via ambient intelligence and policy for equitable AI access. She urges diverse entrants from all fields to join, emphasizing AI's nascency and vast opportunities.

Street-Level Imagery Enables Scalable Gentrification Detection

Proposes a computer vision method to detect neighborhood gentrification at scale using historical street-level visual data, analyzing physical appearance changes. Outperforms prior approaches reliant on survey-based estimates, human labeling, and limited neighborhood characterization. Validates against literature measures and case studies, positioning it as a supplement for urban policy and research.

Data-Centric AI Overcomes Small Datasets and Customization Barriers to Democratize AI Adoption

Andrew Ng identifies small datasets and the long-tail customization problem as primary barriers to AI adoption outside consumer tech, where enterprises like manufacturing and healthcare often have only 50 or fewer labeled images. Data-centric AI shifts focus from model engineering to systematic data improvement, enabling subject matter experts to label and refine data using tools like agreement-based labeling and rapid prototyping workflows. This approach achieved 90% accuracy in steel sheet inspection in weeks by resolving label inconsistencies, shortening time-to-value, and scaling custom AI via open platforms rather than requiring scarce ML engineers.

Random Forest Outperforms Taiwan's Rainfall Threshold System for Debris Flow Alerts

Taiwan's current debris flow warning system relies on a time-weighted rainfall measure exceeding a threshold, producing many false alarms and missing substantial events. Five ML models trained on historical hourly rainfall data were tested, with random forest achieving superior performance by reducing false positives and misses. Analysis of rainfall trajectories reveals key patterns linked to debris flows, enabling better trade-offs between alert frequency and coverage.
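The baseline the ML models are compared against, a time-weighted rainfall measure tested against a threshold, can be sketched directly. The exponential decay factor and alert threshold below are illustrative assumptions, not Taiwan's operational values.

```python
import numpy as np

# Sketch of a time-weighted effective-rainfall threshold baseline: recent
# hourly rainfall counts more than older rainfall. The decay factor and
# threshold are illustrative assumptions, not the operational settings.

def effective_rainfall(hourly_mm, decay=0.8):
    """Weight rainfall from k hours ago by decay**k and sum (most recent last)."""
    hourly_mm = np.asarray(hourly_mm, dtype=float)
    weights = decay ** np.arange(len(hourly_mm))[::-1]
    return float(np.sum(hourly_mm * weights))

def alert(hourly_mm, threshold=50.0):
    return effective_rainfall(hourly_mm) >= threshold

recent_storm = [0, 5, 30, 40]   # mm per hour, heaviest rain most recent
triggered = alert(recent_storm)
```

A random forest trained on the full hourly trajectory, rather than this single scalar summary, is what lets the ML approach trade off false alarms against missed events more flexibly.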

METER-ML Dataset Enables Deep Learning for Automated Methane Source Mapping from Multi-Sensor Imagery

METER-ML provides a public dataset of 86,599 georeferenced NAIP, Sentinel-1, and Sentinel-2 images across the US, labeled for six methane-emitting facility types: CAFOs, coal mines, landfills, natural gas plants, oil refineries/petroleum terminals, and wastewater plants. Deep learning models leveraging multi-sensor data achieve strong performance, with AUPRC of 0.915 for CAFOs and 0.821 for oil refineries on expert-labeled tests. The dataset fills a critical data gap for scalable, automated emission source attribution to combat global warming.

DataPerf: Benchmarking Data Quality to Advance Data-Centric AI

DataPerf introduces a community-led benchmark suite to evaluate ML datasets and data-centric algorithms, addressing the historical overemphasis on models that has caused inaccuracy, bias, and fragility in real-world ML applications. It covers five benchmarks spanning vision, speech, acquisition, debugging, and diffusion prompting, with an open online platform enabling iterative competitions and community-contributed challenges. Maintained by MLCommons, DataPerf promotes reproducibility and shifts focus from architectures to datasets for innovation in data-centric AI.

Sparse Deep Learning Achieves Near-Perfect Gastric Intestinal Metaplasia Detection in Under 1 Minute on CPU

Proposes a sparse whole-slide image (WSI) analysis method using deep learning to rapidly localize small regions-of-interest (ROI) for WSI-level classification, addressing the challenge of gigapixel-scale images in histopathology. Evaluated on gastric intestinal metaplasia (GIM) diagnosis from H&E-stained endoscopic biopsies, achieving 100% detection in positive WSIs, AUC 0.98, and AP 0.95. Method runs in under one minute on standard CPU, enabling clinical deployment for pathologists.

Q-Pain Dataset Exposes Race-Gender Biases in AI Pain Management QA

Q-Pain is a new dataset and evaluation framework for detecting social biases in medical QA systems focused on pain management decisions. It reveals statistically significant disparities in treatment recommendations across intersectional race-gender subgroups when testing GPT-2 and GPT-3. The work underscores the necessity of bias-auditing datasets to mitigate risks before deploying AI in clinical settings.

RadGraph Dataset Enables High-Performance Extraction of Clinical Entities and Relations from Chest X-ray Reports

RadGraph introduces a benchmark dataset with radiologist-annotated entities and relations from chest X-ray reports, featuring 500 development reports (14,579 entities, 10,889 relations) from MIMIC-CXR and dual-annotated test sets from MIMIC-CXR and CheXpert. A deep learning model, RadGraph Benchmark, achieves micro F1 scores of 0.82 on MIMIC-CXR and 0.73 on CheXpert for relation extraction. The release includes an inference dataset covering 220,763 MIMIC-CXR reports (~6M entities, ~4M relations) mapped to radiographs, supporting NLP, vision, and multimodal research.

Multi-Graph Contrastive Learning Integrates Multi-Modal Data for Superior Neighborhood Embeddings

The method constructs a multi-graph where street view images and POI features serve as node attributes for neighborhoods, while human mobility flows form directed edge attributes capturing inter-region relationships. Neighborhood representations are learned via contrastive sampling on this multi-graph, embedding both region characteristics and proximity measures into a unified space. Downstream tasks and qualitative analysis demonstrate that these embeddings outperform unimodal baselines.

Physiologically-Inspired 3D Augmentations Boost ECG Contrastive Learning by 9.1% AUC in Low-Label Regime

3KG applies contrastive learning to 12-lead ECGs using 3D physiologically-inspired augmentations that combine spatial and temporal transformations. On PhysioNet 2020 data, fine-tuning a linear head on 1% labeled samples yields 9.1% higher mean AUC than the top self-supervised baseline for 23-class diagnosis. Gains are largest for conduction and rhythm abnormalities, with potential for modality-specific augmentations in other biomedical signals.
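The spatial half of such an augmentation can be sketched as a random 3D rotation of the vectorcardiogram (VCG) trace. This is a minimal illustration: the mapping between the 12-lead ECG and 3D VCG space (e.g., a Dower-style transform) and the temporal augmentations are omitted, and the angles are arbitrary.

```python
import numpy as np

# Sketch of the spatial part of a 3KG-style augmentation: rotate a (3, T)
# vectorcardiogram trace about one axis. The ECG<->VCG lead transform and
# temporal augmentations are omitted; angles here are illustrative.

def rotate_vcg(vcg, angle_deg, axis="z"):
    """vcg: (3, T) array of x/y/z components; returns the rotated trace."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    mats = {
        "x": np.array([[1, 0, 0], [0, c, -s], [0, s, c]]),
        "y": np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]),
        "z": np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]),
    }
    return mats[axis] @ vcg

vcg = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # (3 axes, 2 samples)
rotated = rotate_vcg(vcg, 90, axis="z")
```

Because rotations preserve the waveform's morphology while changing its projection onto the leads, the transformed signal plausibly keeps its diagnostic label, which is what makes the augmentation label-preserving.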

Superior Radiology Report Labeler Yields Better Chest X-Ray Classification Models

VisualCheXbert outperforms CheXpert and CheXbert in extracting accurate labels from radiology reports for chest X-rays. Image classification models trained on VisualCheXbert labels, using one of the largest chest X-ray datasets, exceed performance of those trained on CheXpert or CheXbert labels. Improvements in report labeling directly enhance downstream deep learning model accuracy.

MedSelect Outperforms Baselines in Selective Labeling for Chest X-rays Using Meta-RL and Contrastive Embeddings

MedSelect combines meta-learning and deep reinforcement learning with contrastive pretraining embeddings to selectively label medical images under limited resources. It employs a trainable deep selector for labeling decisions and a non-parametric cosine similarity selector for classifying unseen images. Evaluations on chest X-ray interpretation show superior performance over baselines across seen and unseen conditions, with distinct latent embedding and clinical feature distributions.
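The non-parametric selector is straightforward to sketch: classify an unseen image by cosine similarity between its embedding and the mean embedding of each labeled class. The toy 2D embeddings below stand in for contrastive features; the exact aggregation MedSelect uses is an assumption here.

```python
import numpy as np

# Sketch of a cosine-similarity selector: assign an unseen embedding to the
# class whose mean labeled embedding it is most similar to. Toy 2D vectors
# stand in for contrastive features; the aggregation is an assumption.

def cosine_classify(query, class_embeddings):
    """query: (d,); class_embeddings: {label: (n_i, d)} -> best label."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    centroids = {c: e.mean(axis=0) for c, e in class_embeddings.items()}
    return max(centroids, key=lambda c: cos(query, centroids[c]))

banks = {"normal": np.array([[1.0, 0.0], [0.9, 0.1]]),
         "effusion": np.array([[0.0, 1.0], [0.1, 0.9]])}
pred = cosine_classify(np.array([0.2, 0.8]), banks)
```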

CheXbreak Detects Chest X-ray Model Misclassifications Using Patient Features and Model Outputs for Targeted Corrections

CheXbreak identifies patient subgroups prone to misclassification in chest X-ray AI models, with age, lung lesions, pneumothorax, and support devices as key predictors. Misclassification identifiers built from model outputs and clinical features achieve AUROC ~0.9 across diseases and ten models. A corrective algorithm selectively flips high-risk predictions, improving F1 scores for Consolidation (+0.008) and Edema (+0.003).

Chest X-ray AI Models Fail to Detect Unseen Diseases but Retain Seen Detection Amid Co-occurrence

Deep learning models trained on subsets of chest X-ray diseases misclassify unseen diseases as "no disease." These models maintain accurate detection of seen diseases even when they co-occur with unseen ones. Penultimate layer features enable effective transfer learning for unseen disease detection with minimal labeled data.

VisualCheXbert Aligns Radiology Report Labels with Image Labels Better Than Radiologist Report Annotations

Radiologists labeling chest X-ray images disagree significantly with those labeling corresponding radiology reports, degrading report-derived labels as image proxies. VisualCheXbert, a biomedically-pretrained BERT model supervised by a chest X-ray vision model, generates report-to-image labels outperforming existing report labelers by 0.14 average F1. These labels achieve 0.12-0.21 higher F1 agreement with radiologist image labels than report labels do.

Patient Metadata-Guided Positive Pairs Boost Contrastive Learning for Chest X-ray Interpretation

Contrastive learning for chest X-rays is enhanced by selecting positive pairs from different images of the same patient and study across lateralities, yielding a 14.4% mean AUC gain over ImageNet pretraining when fine-tuning on 1% labels for pleural effusion. Optimal performance stems from pairing images sharing underlying pathologies via metadata and maximizing diverse image usage in queries. Hard negative selection using metadata shows no benefit.
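Selecting positive pairs from metadata reduces to grouping images by patient and study and pairing distinct images within each group. The record fields below are illustrative; the actual metadata schema is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Sketch: build contrastive positive pairs from distinct images of the same
# patient and study using metadata. The record fields are illustrative
# assumptions about the metadata schema.

def metadata_positive_pairs(records):
    """records: dicts with 'image_id', 'patient_id', 'study_id' keys."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["patient_id"], r["study_id"])].append(r["image_id"])
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(ids, 2))  # distinct views, shared pathology
    return pairs

records = [
    {"image_id": "a", "patient_id": 1, "study_id": 1},  # e.g. frontal view
    {"image_id": "b", "patient_id": 1, "study_id": 1},  # e.g. lateral view
    {"image_id": "c", "patient_id": 2, "study_id": 1},
]
pairs = metadata_positive_pairs(records)
```

Compared with the standard recipe of two augmentations of one image, these pairs expose the encoder to genuinely different views that share the same underlying pathology.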

CheXseg Semi-Supervised Approach Boosts Chest X-ray Segmentation by Merging Expert Pixels with DNN Saliency

CheXseg trains multi-label chest X-ray segmentation models by combining scarce pixel-level expert annotations with abundant coarse DNN-generated saliency maps. This semi-supervised method outperforms fully-supervised baselines using only expert annotations by 9.7% mIoU and weakly-supervised methods using only saliency maps by 73.1% mIoU. The top model matches radiologist agreement on three pathologies and narrows the performance gap versus weak supervision by 57.2%.

Chest X-ray AI Models Generalize Robustly to External Data but Less Uniformly to Smartphone Photos

Eight deep learning models from the CheXpert challenge were evaluated without fine-tuning on smartphone photos of chest X-rays and external clinical datasets. On photos, all models exhibited statistically significant performance drops, with only three performing worse than radiologists on average. On external datasets, no models underperformed radiologists, and five outperformed them statistically, highlighting variable generalization under distribution shifts.

ImageNet Pretraining Boosts Chest X-Ray Performance but Architecture Rankings Don't Transfer

Analysis of 16 ImageNet convolutional architectures on CheXpert reveals no correlation between ImageNet and chest X-ray performance, whether pretrained or from scratch. Pretraining provides a statistically significant boost, larger for smaller models, while model family outweighs size within families for untrained models. Truncating final blocks from pretrained models achieves 3.25x average parameter efficiency without significant performance loss.

OGNet Deep Learning Model Detects Undocumented US Oil and Gas Infrastructure from Aerial Imagery

OGNet employs deep learning on high-resolution aerial imagery to automatically detect oil and gas infrastructure, key sources of methane emissions contributing to at least 25% of current global warming. The model identifies US oil refineries and petroleum terminals missing from four standard public datasets. All detections correlate with methane-emitting characteristics like infrastructure type and storage tank counts; curated data is publicly available.

Deep Chest X-ray Models Degrade on Smartphone Photos but Retain Radiologist-Level Performance

Eight CheXpert challenge models experienced performance drops when applied untuned to smartphone photos of chest X-rays in the CheXphoto dataset. Despite degradation, select models matched radiologist diagnostic accuracy. Future work should probe training procedures' impact on generalization to photographic inputs.

ForestNet: Deep Learning Model Outperforms Baselines in Classifying Deforestation Drivers from Satellite Imagery in Indonesia

ForestNet is a deep learning model trained on Landsat 8 satellite imagery to classify direct drivers of primary forest loss in Indonesia across patches of any size. It uses a curated dataset of images paired with expert annotations, enabling automated identification of deforestation causes. The model substantially outperforms standard driver classification approaches, with the dataset publicly released to advance research.

GloFlow Enables Cost-Effective Whole Slide Imaging via Optical Flow and Graph Alignment of Video Frames

GloFlow is a two-stage method that generates pathology whole slide images (WSIs) from video scans, bypassing expensive motor stages in traditional scanners. Stage one trains an optical flow predictor for pairwise translations between successive frames to approximate stitching. Stage two refines this using a neighborhood graph and tractable graph-pruning for global alignment. On simulated WSI video datasets, it outperforms existing stitching methods, producing scanner-quality results.
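Stage one's stitching approximation amounts to chaining the predicted frame-to-frame translations into global frame positions. A minimal sketch, assuming 2D (dx, dy) shifts with frame 0 at the origin:

```python
import numpy as np

# Sketch of stage one's stitching approximation: chain predicted pairwise
# frame-to-frame translations into global positions via a cumulative sum.
# Stage two's graph-based global refinement is omitted.

def global_positions(pairwise_shifts):
    """pairwise_shifts: (n-1, 2) of (dx, dy) between successive frames.
    Returns (n, 2) global positions with frame 0 at the origin."""
    shifts = np.asarray(pairwise_shifts, dtype=float)
    return np.vstack([np.zeros((1, 2)), np.cumsum(shifts, axis=0)])

positions = global_positions([[10, 0], [10, 1], [9, -1]])
```

Because small per-pair errors accumulate along the chain, drift grows with sequence length, which is exactly what motivates the neighborhood-graph alignment of stage two.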

MoCo-CXR Pretraining Boosts Chest X-ray Pathology Detection via Superior Representations and Transferability

MoCo-CXR adapts Momentum Contrast (MoCo) contrastive learning for self-supervised pretraining on unlabeled chest X-rays, yielding higher-quality representations than non-pretrained baselines. Linear classifiers trained on these representations outperform those trained on baseline representations for pleural effusion detection, with end-to-end fine-tuning showing further gains, especially under limited labeled data. Pretraining enhances transferability to unseen tasks like tuberculosis detection across datasets.

NGBoost with Calibration Outperforms Benchmarks in Short-Term Probabilistic Solar Forecasting

Researchers developed state-of-the-art probabilistic models for short-term solar irradiance forecasting, emphasizing post-hoc calibration for reliable predictions. Using SURFRAD network data from seven stations, NGBoost achieved superior intra-hourly performance over the best benchmark across all sites. With CRUDE calibration, NGBoost matched numerical weather prediction models at hourly resolution.

Deep Learning Extracts Prognostic Nuclear Geometry from DLBCL Histology

Researchers analyzed H&E-stained TMAs from 209 DLBCL cases using deep learning to segment tumor nuclei and compute geometric features. A Cox proportional hazards model incorporating these features predicted survival with a C-index of 0.635 (95% CI: 0.574-0.691). The results indicate nuclear geometric features hold prognostic value, warranting prospective validation.
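The reported C-index is the fraction of comparable patient pairs in which the model assigns higher risk to the patient who dies earlier (0.5 is chance, 1.0 is perfect ranking). A minimal version on fully observed survival times; real survival analyses must also handle censored observations, which this sketch ignores.

```python
import numpy as np

# Minimal concordance index (C-index) on fully observed survival times:
# the fraction of comparable pairs where the higher-risk case has the
# shorter survival time. Censoring handling is omitted for brevity.

def concordance_index(times, risks):
    times, risks = np.asarray(times, float), np.asarray(risks, float)
    concordant = comparable = 0.0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            if times[i] == times[j]:
                continue                       # tied times: not comparable here
            comparable += 1
            short, long_ = (i, j) if times[i] < times[j] else (j, i)
            if risks[short] > risks[long_]:
                concordant += 1
            elif risks[short] == risks[long_]:
                concordant += 0.5              # tied risks count half
    return concordant / comparable

cidx = concordance_index([2.0, 5.0, 9.0], [0.9, 0.4, 0.1])  # perfectly ranked
```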

CheXphoto Dataset Enables Robustness Testing of Chest X-ray AI on Smartphone Photos

CheXphoto is a new dataset comprising over 10,000 smartphone photos and synthetic transformations of chest X-rays from the CheXpert dataset, designed to benchmark deep learning model robustness for clinical deployment via messaging apps like WhatsApp. It addresses artifacts in photo-captured X-rays absent in standard digital training data by combining automatically/manually captured photos under varied settings with targeted synthetic transformations mimicking real photos of digital X-rays and films. The dataset supports improved automated chest X-ray interpretation on mobile-captured images.

Manifold Topology Enables Model-Free Disentanglement Measurement in Generative Representations

Researchers propose a novel disentanglement metric for deep generative models that quantifies topological similarity between conditional submanifolds in the learned latent space, requiring only the generative model itself without external supervision or dataset-specific assumptions. Unsupervised and supervised variants are introduced and validated by ranking state-of-the-art models consistently with prior methods across multiple datasets. Code is released for reproducibility.

Generalized Vegetation Index Boosts Agricultural Land Cover Segmentation via Model-Agnostic Data Fusion

Proposes Generalized Vegetation Index (GVI), a lightweight, model-agnostic module that fuses Near-Infrared, RGB channels, and vegetation indices into deep neural networks for vegetation-related CV tasks. Pairs GVI with Additive Group Normalization (AGN) for stable training without extra parameters. Achieves 0.9-1.3% IoU gains on vegetation classes and 2% overall mIoU improvement over baselines in agriculture land cover segmentation.
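The kind of hand-crafted fusion GVI generalizes can be illustrated with NDVI, the standard normalized difference of near-infrared and red reflectance, stacked as an extra input channel. GVI itself learns the fusion end to end; this sketch only shows the fixed-index baseline it subsumes.

```python
import numpy as np

# Illustration of the fixed fusion GVI generalizes: compute NDVI from NIR
# and red bands and append it (with NIR) as extra input channels. GVI
# learns this fusion; the sketch shows only the hand-crafted baseline.

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index in [-1, 1]."""
    return (nir - red) / (nir + red + eps)

def fuse_channels(rgb, nir):
    """rgb: (H, W, 3), nir: (H, W) -> (H, W, 5) with NIR and NDVI appended."""
    idx = ndvi(nir, rgb[..., 0])   # assumes channel 0 is red
    return np.dstack([rgb, nir, idx])

rgb = np.full((2, 2, 3), 0.2)
nir = np.full((2, 2), 0.6)
fused = fuse_channels(rgb, nir)    # 5-channel input for the network
```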

First Agriculture-Vision Challenge Advances Aerial Farmland Semantic Segmentation

The inaugural Agriculture-Vision Challenge engaged 57 teams to develop algorithms for semantic segmentation of aerial farmland imagery using a dataset of 21,061 multi-spectral images. The challenge emphasizes agricultural pattern recognition from overhead views. This paper summarizes top-performing methods and results, with ongoing leaderboard access for further research.

CheXbert BERT Model Outperforms Rule-Based Labelers via Hybrid Pretraining on Scale and Expert Data

CheXbert employs a biomedically pretrained BERT model, first trained on large-scale rule-based labels then finetuned on expert annotations augmented with backtranslation, for accurate chest X-ray report labeling. This hybrid approach leverages the volume of automated labels and precision of human annotations. It achieves state-of-the-art performance on one of the largest chest X-ray datasets, surpassing prior rule-based systems with statistical significance.

Top CheXpert Models Show Strong Zero-Shot TB Detection and Robustness to External and Photographic Variations

Top 10 models from the CheXpert leaderboard achieve an average AUC of 0.851 for TB detection on two public datasets without fine-tuning or TB-specific training data. These models maintain high performance on photos of chest X-rays (AUC 0.916) comparable to original images (AUC 0.924). On an external institution's dataset, they perform comparably to or better than average radiologists across pathology detection tasks.

Andrew Ng's Journey: From Automating Education to Scaling AI for Global Impact

Andrew Ng traces his passion for AI to childhood coding and a desire to automate tedious tasks like photocopying, evolving into launching MOOCs that educated millions via Coursera. He emphasizes scaling deep learning models with larger datasets over early bets on unsupervised learning, while advocating practical debugging frameworks and small projects to build ML proficiency. For industry, he recommends starting with small AI pilots to propagate success, addressing real-world challenges like data messiness and deployment robustness outside tech sectors such as manufacturing. Ng promotes habitual learning, customer-obsessed startups via AI Fund, and focuses on immediate issues like bias and inequality over distant AGI risks.

Mobius Transformations Boost Data Augmentation for Superior Generalization in Low-Data Regimes

Mobius transformations, bijective conformal maps of the complex plane that generalize translation, rotation, scaling, and inversion applied in pixel space, enable label-preserving sample-level data augmentation. They outperform standard techniques like cutout and crop-and-flip, especially with scarce training data. This method enhances deep model performance and generalization across varying data amounts and architectures.
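The map itself is f(z) = (az + b) / (cz + d) with ad - bc ≠ 0, applied to pixel coordinates treated as complex numbers. A minimal sketch of the coordinate mapping; a full augmentation would resample image intensities at the mapped coordinates, and the coefficients here are illustrative.

```python
import numpy as np

# Sketch: apply a Mobius map f(z) = (a*z + b) / (c*z + d) to a pixel grid
# treated as complex coordinates. A full augmentation would resample the
# image at these coordinates; coefficients here are illustrative.

def mobius_map(height, width, a, b, c, d):
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate map: ad - bc must be nonzero")
    ys, xs = np.mgrid[0:height, 0:width]
    z = xs + 1j * ys
    return (a * z + b) / (c * z + d)

# Identity coefficients (a=1, b=0, c=0, d=1) leave the grid unchanged;
# c=0 with b nonzero recovers an ordinary translation.
grid = mobius_map(2, 3, a=1, b=0, c=0, d=1)
```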

AI Pathologist Assistant Boosts Accuracy When Correct but Strongly Biases Errors When Wrong in Liver Cancer Diagnosis

A deep learning model achieved 88.5% accuracy on internal validation and 84.2% on independent test sets for distinguishing HCC from CC in liver biopsies. When assisting 11 pathologists, it did not significantly improve overall diagnostic performance (p=0.184), due to model errors strongly biasing pathologists toward incorrect diagnoses (OR=0.253). Correct model predictions significantly enhanced accuracy across all expertise levels and case difficulties (OR=4.289, p<0.001), highlighting bias risks in AI clinical integration for subspecialty pathology tasks.

NGBoost Extends Gradient Boosting to Probabilistic Predictions with Natural Gradient Correction

NGBoost adapts gradient boosting for probabilistic regression by optimizing parameters of conditional distributions as multiparameter targets. It employs the natural gradient to correct for the ill-conditioned training dynamics that arise when boosting multiple distribution parameters simultaneously. The algorithm supports arbitrary base learners, continuous-parameter distributions, and scoring rules, achieving performance that matches or exceeds prior methods with superior flexibility, scalability, and usability.
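The natural-gradient correction can be made concrete for a Normal output distribution parameterized as (mu, log sigma) under the log score: the Fisher information in this parameterization is diag(1/sigma^2, 2), so the natural gradient rescales each parameter's ordinary gradient by the inverse Fisher. A worked numeric sketch:

```python
import numpy as np

# Worked example of NGBoost's natural-gradient correction for a Normal
# output distribution parameterized as (mu, log_sigma) under the log score.
# Fisher information here is diag(1/sigma^2, 2); the natural gradient is
# the inverse Fisher applied to the ordinary gradient.

def ordinary_gradient(y, mu, log_sigma):
    """Gradient of the negative log-likelihood w.r.t. (mu, log_sigma)."""
    s2 = np.exp(2 * log_sigma)
    return np.array([(mu - y) / s2, 1 - (y - mu) ** 2 / s2])

def natural_gradient(y, mu, log_sigma):
    s2 = np.exp(2 * log_sigma)
    fisher_inv = np.diag([s2, 0.5])   # inverse of diag(1/s2, 2)
    return fisher_inv @ ordinary_gradient(y, mu, log_sigma)

g = natural_gradient(y=1.0, mu=0.0, log_sigma=0.0)  # sigma = 1
```

The rescaling makes the update invariant to how the distribution is parameterized, which is what stabilizes boosting several parameters at once.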

Andrew Ng's Playbook: Start Small, Scale Smartly to AI-Enable Non-Tech Industries

Andrew Ng emphasizes transforming non-software industries like manufacturing and agriculture via AI by starting with small, feasible projects to build internal momentum, forming cross-functional teams, and rigorously scoping pilots before major investments. Key challenges include limited data in industrial settings, production deployment risks, and managing essential complexity in problem definition and data organization; solutions involve daily sprints for rapid iteration, FMEA for risk anticipation, and executive AI literacy. Platforms reduce accidental complexity, but success hinges on processes like editing test sets for business alignment and planning for real-world data shifts beyond academic benchmarks.

CheXpert: Massive Chest X-ray Dataset with Uncertainty Labels Enables CNNs to Match or Exceed Radiologist Performance

CheXpert provides 224,316 chest radiographs from 65,240 patients, labeled automatically for 14 observations using a radiology report labeler that captures interpretive uncertainty. CNNs trained with uncertainty-aware labeling strategies achieve superior ROC and PR performance to three radiologists on held-out test sets for cardiomegaly, edema, and pleural effusion. The dataset is publicly released as a benchmark for chest radiograph interpretation models.
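Two of the simplest uncertainty-aware strategies from the CheXpert setup map the uncertain label to a definite one before training, uncertain-as-positive ("U-Ones") or uncertain-as-negative ("U-Zeros"). A minimal sketch, assuming the dataset's 1/0/-1 encoding for positive/negative/uncertain:

```python
# Sketch of two simple uncertainty-handling strategies from the CheXpert
# setup: map the uncertain label (-1) to positive ("U-Ones") or negative
# ("U-Zeros") before training. Assumes the 1/0/-1 label encoding.

def resolve_uncertain(labels, strategy="U-Ones"):
    fill = 1 if strategy == "U-Ones" else 0
    return [fill if y == -1 else y for y in labels]

report_labels = [1, 0, -1, -1, 1]
as_ones = resolve_uncertain(report_labels, "U-Ones")
as_zeros = resolve_uncertain(report_labels, "U-Zeros")
```

Which mapping works best varies by pathology, which is why the paper evaluates several uncertainty strategies rather than fixing one.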
