absorb.md — A knowledge graph of what AI thinkers are actually saying

paper / geoffreyhinton / Mar 22

Deep LSTM RNNs Achieve Record 17.7% Error on TIMIT Phoneme Recognition

Deep recurrent neural networks that combine the LSTM architecture with multiple levels of representation and end-to-end training via Connectionist Temporal Classification (CTC) outperform prior models on speech tasks. Trained with suitable regularization, these deep LSTM RNNs reach a state-of-the-art test set phoneme error rate of 17.7% on the TIMIT benchmark, surpassing deep feedforward networks and previous RNN results. This advances sequence labeling for unaligned sequential data such as speech.

speech-recognition recurrent-neural-networks lstm deep-learning rnn timit-benchmark neural-networks

“Deep LSTM RNNs achieve 17.7% test set error on TIMIT phoneme recognition”

paper / geoffreyhinton / Jan 10

Frequently Approximately Satisfied Constraints Model High-Dimensional Data via Product of Violation Probabilities

Some high-dimensional datasets can be modeled as products of many linear constraints, each frequently approximately satisfied (FAS) by the data. The probability of a data vector is then proportional to the product of the probabilities of its individual constraint violations. The paper describes three methods for learning such products of constraints, each using a heavy-tailed distribution for the violations.

machine-learning geoffrey-hinton constraint-modeling frequently-approximately-satisfied high-dimensional-data uai-2001 arxiv-paper

“Some high-dimensional datasets can be modeled by assuming many different linear constraints, each Frequently Approximately Satisfied (FAS) by the data.”
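
The product-of-violations idea summarized above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the Cauchy-shaped violation density, the function names `violation` and `unnormalized_log_prob`, and the two example constraints are all assumptions made for this sketch, and the normalizing constant is ignored.

```python
import math

def violation(weights, bias, x):
    """Signed violation of the linear constraint w . x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def unnormalized_log_prob(constraints, x):
    """Sum of log violation densities, i.e. the log of their product.

    Assumes a Cauchy-shaped heavy-tailed density for each violation,
    log p(v) = -log(1 + v^2) + const. This choice is illustrative only.
    """
    return sum(-math.log1p(violation(w, b, x) ** 2) for w, b in constraints)

# Two hypothetical constraints in 2D: x0 - x1 = 0 and x0 + x1 - 2 = 0.
constraints = [([1.0, -1.0], 0.0), ([1.0, 1.0], -2.0)]

# A point satisfying both constraints scores higher than one violating both.
on_constraints = unnormalized_log_prob(constraints, [1.0, 1.0])
off_constraints = unnormalized_log_prob(constraints, [3.0, 0.0])
```

Because the density is heavy-tailed, a large violation of one constraint lowers the score only logarithmically, which is what lets a constraint be satisfied "frequently" rather than always.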

paper / geoffreyhinton / Oct 19

Under-Complete Product of Experts Enables Tractable Projection Pursuit Density Estimation

The under-complete product of experts (UPoE) models high-dimensional densities using products of one-dimensional experts on data projections, avoiding the curse of dimensionality. UPoE is fully tractable as a parametric probabilistic model for projection pursuit, with maximum likelihood learning rules matching those of under-complete ICA. An efficient sequential learning algorithm is derived, linking it to projection pursuit density estimation and feature induction in additive random field models.

machine-learning density-estimation projection-pursuit product-of-experts independent-component-analysis geoffrey-hinton

“UPoE uses one-dimensional experts each modeling a single projection of the data”

paper / geoffreyhinton / Jul 3

Dropout Prevents Co-Adaptation and Reduces Overfitting in Neural Networks

Randomly omitting half of the feature detectors on each training case when training large feedforward neural networks on small datasets prevents complex co-adaptations, forcing each neuron to learn features that are useful across many different internal contexts. This dropout technique significantly reduces overfitting. It yields major improvements on benchmarks and sets records in speech and object recognition tasks.

neural-networks dropout overfitting feature-detectors machine-learning computer-vision speech-recognition

“Large feedforward neural networks trained on small training sets typically overfit and perform poorly on held-out test data.”
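
A minimal sketch of the dropout step described in this entry, assuming a drop probability of 0.5. The function name `dropout`, its parameters, and the "inverted" rescaling of surviving activations are assumptions of this sketch; other formulations leave training activations unscaled and instead scale the weights at test time.

```python
import random

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Zero each activation independently with probability drop_prob.

    Survivors are divided by (1 - drop_prob) so the expected activation is
    unchanged ("inverted" dropout, an assumption of this sketch). At test
    time the activations pass through untouched.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

hidden = [0.2, 1.5, 0.7, 0.9]                       # hypothetical hidden-layer outputs
train_out = dropout(hidden, rng=random.Random(0))   # a random subset is zeroed
test_out = dropout(hidden, training=False)          # identity at test time
```

Each training case sees a different random mask, so no unit can rely on a specific other unit being present.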

paper / geoffreyhinton / Jun 27

Deep Lambertian Networks Enable Illumination-Invariant Recognition via Generative Albedo and Normal Estimation

Deep Lambertian Networks integrate Deep Belief Nets with Lambertian reflectance to model images through latent variables for albedo, surface normals, and lighting. The model learns strong priors on albedo from 2D images, allowing illumination variations to be isolated by modulating only the lighting latent. Single-image estimation of albedo and normals becomes feasible by transferring knowledge from similar objects, supporting tasks like one-shot face recognition.

deep-belief-nets lambertian-reflectance illumination-invariance albedo-estimation surface-normals computer-vision machine-learning

“The model combines Deep Belief Nets with Lambertian reflectance assumptions”

paper / geoffreyhinton / Jun 18

Deep Mixtures of Factor Analyzers Outperform Shallow MFAs and RBMs via Greedy Layer-Wise Training

Deep Mixtures of Factor Analysers (DMFAs) enable efficient multi-layer density modeling by greedily training one layer of latent variables at a time, using posterior samples from prior layers as input for the next. Unlike equivalent shallow MFAs formed by multiplying factor loading matrices, DMFAs improve learning and inference efficiency through structured sharing of lower-level matrices, reducing overfitting. Empirical results show DMFAs achieve superior density models compared to shallow MFAs and two RBM variants across diverse datasets.

deep-mixtures-of-factor-analysers dmfa layer-wise-learning density-models restricted-boltzmann-machines machine-learning

“DMFAs can be converted to an equivalent shallow MFA by multiplying factor loading matrices across levels”
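
The collapse into a shallow model mentioned in the quote can be illustrated for a single path through a two-layer model. This sketch covers only the conditional mean: it assumes each layer is a linear-Gaussian map (x = W1 z1 + mu1, z1 = W2 z2 + mu2) with the noise terms switched off, and every matrix and vector below is a made-up value. In the full model the noise covariances would combine as well, which is omitted here.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, v):
    """Matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

# Hypothetical loadings and means for one component at each layer.
W1 = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # layer-1 factors (2) -> data (3)
mu1 = [0.5, -0.5, 0.0]
W2 = [[2.0], [-1.0]]                        # layer-2 factor (1) -> layer-1 factors (2)
mu2 = [1.0, 1.0]

# Equivalent shallow loading matrix and mean for this path.
W_shallow = matmul(W1, W2)
mu_shallow = add(matvec(W1, mu2), mu1)

z2 = [3.0]
x_deep = add(matvec(W1, add(matvec(W2, z2), mu2)), mu1)   # layer by layer
x_shallow = add(matvec(W_shallow, z2), mu_shallow)        # collapsed
```

The shallow form needs a separate product matrix for every path through the layers, whereas the deep form reuses the same lower-level matrix across paths; this is the sharing the summary refers to.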

paper / geoffreyhinton / May 9

Annealed Importance Sampling Revives Products of Hidden Markov Models for Complex Time-Series

Products of Hidden Markov Models (PoHMMs) are generative models for time-series data that were previously held back by expensive gradient-based learning and an intractable log-likelihood. The paper introduces reliable partition function estimation using Annealed Importance Sampling (AIS) and demonstrates effective contrastive divergence learning on rainfall data and on data captured from pairs of people dancing. With recent advances in techniques for undirected graphical models and in available compute, PoHMMs become viable for complex sequential modeling tasks.

hidden-markov-models pohmm generative-models contrastive-divergence annealed-importance-sampling time-series-modeling geoffrey-hinton

“Annealed Importance Sampling reliably estimates the partition function for PoHMMs.”

paper / geoffreyhinton / Feb 14

Improved Algorithms Surpass Contrastive Divergence for Training CRBMs on Structured Outputs

Conditional Restricted Boltzmann Machines (CRBMs) extend RBMs to structured output prediction, but the training algorithms developed for non-conditional RBMs do not carry over directly. The paper argues that standard Contrastive Divergence (CD) may not be suitable for training CRBMs. It identifies two types of structured output problem, low-variability (e.g., multi-label classification) and high-variability (e.g., image denoising), and proposes tailored learning algorithms that empirically outperform CD on both.

conditional-rbm restricted-boltzmann-machines structured-output-prediction contrastive-divergence machine-learning generative-models arxiv-paper

“Standard Contrastive Divergence-based learning may not be suitable for training CRBMs”

Geoffrey Hinton