absorb.md

Geoffrey Hinton

Chronological feed of everything captured from Geoffrey Hinton.

Deep LSTM RNNs Achieve Record 17.7% Error on TIMIT Phoneme Recognition

Deep recurrent neural networks that combine the LSTM architecture with multiple levels of representation and end-to-end training via Connectionist Temporal Classification (CTC) outperform prior models on speech tasks. Trained with suitable regularization, these deep LSTM RNNs reach a state-of-the-art test set phoneme error rate of 17.7% on the TIMIT benchmark, surpassing deep feedforward networks and previous RNN results. The result advances sequence labeling for data, such as speech, where the alignment between input and output is unknown.
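
As a rough illustration of why CTC-style training needs no frame-level alignment, the sketch below shows the many-to-one mapping from a per-frame label path to an output sequence: repeated labels are merged, then blanks are dropped. The function name `ctc_collapse` and the choice of 0 as the blank index are assumptions made for this example, not details taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumed index of the blank symbol (hypothetical choice)

def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to an output sequence:
    merge consecutive repeats first, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]

# Nine frames collapse to three phoneme labels; the blank between the
# two runs of 3 is what keeps them as separate outputs.
path = [0, 3, 3, 0, 0, 3, 5, 5, 0]
print(ctc_collapse(path))  # [3, 3, 5]
```

Many different frame paths collapse to the same label sequence, which is why the training objective can sum over them instead of requiring one fixed alignment.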

Frequently Approximately Satisfied Constraints Model High-Dimensional Data via Product of Violation Probabilities

High-dimensional data are modeled as a product of many linear constraints, each of which is frequently approximately satisfied (FAS) by the data. The probability of a data vector under the model is proportional to the product of the probabilities of its constraint violations. Three learning methods are described, all using a heavy-tailed probability distribution for the violations.
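
A minimal sketch of the scoring rule described above, assuming each constraint is a linear function `w_i . x + b_i` whose value is the violation, and assuming a Cauchy-shaped heavy-tailed density for the violations. Both the function name and the specific density are illustrative assumptions; the summary does not say which heavy-tailed form is used.

```python
import numpy as np

def fas_unnormalized_log_prob(x, W, b):
    """Unnormalized log-probability of x under a product of linear constraints.

    Each row of W (with its offset in b) defines one constraint; the
    violation is W @ x + b. A Cauchy-shaped density 1 / (1 + v**2) is
    assumed here, so a few large violations are penalized only mildly.
    """
    violations = W @ x + b
    return -np.sum(np.log1p(violations ** 2))

# Two constraints: x0 - x1 = 0 and x0 + x1 - 2 = 0.
W = np.array([[1.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, -2.0])
print(fas_unnormalized_log_prob(np.array([1.0, 1.0]), W, b))  # both satisfied
print(fas_unnormalized_log_prob(np.array([3.0, 1.0]), W, b))  # both violated, lower score
```

The score is unnormalized because the product of constraint densities generally needs a partition function to become a proper distribution.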

Under-Complete Product of Experts Enables Tractable Projection Pursuit Density Estimation

The under-complete product of experts (UPoE) models high-dimensional densities using products of one-dimensional experts on data projections, avoiding the curse of dimensionality. UPoE is fully tractable as a parametric probabilistic model for projection pursuit, with maximum likelihood learning rules matching those of under-complete ICA. An efficient sequential learning algorithm is derived, linking it to projection pursuit density estimation and feature induction in additive random field models.

Dropout Prevents Co-Adaptation and Reduces Overfitting in Neural Networks

Randomly omitting half of the feature detectors on each training case, when training large feedforward neural networks on small datasets, prevents complex co-adaptations and forces each neuron to learn features that are useful across many different internal contexts. This technique, called dropout, significantly reduces overfitting. It yields large improvements on many benchmark tasks and sets new records in speech and object recognition.
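
The idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name and signature are assumptions, and at test time the activations are scaled by the keep probability, which has the same effect as halving the outgoing weights when half the units are dropped.

```python
import numpy as np

def dropout_forward(h, rng, p_drop=0.5, train=True):
    """Apply dropout to a layer's activations h.

    Training: each unit is zeroed independently with probability p_drop.
    Testing: all units are kept and scaled by (1 - p_drop), so the
    expected input to the next layer matches training.
    """
    if train:
        keep_mask = rng.random(h.shape) >= p_drop
        return h * keep_mask
    return h * (1.0 - p_drop)

rng = np.random.default_rng(0)
hidden = np.arange(1.0, 11.0)
print(dropout_forward(hidden, rng, train=True))   # a random subset is zeroed
print(dropout_forward(hidden, rng, train=False))  # every unit scaled by 0.5
```

Because a fresh mask is drawn for every training case, no unit can rely on a particular other unit being present.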

Deep Lambertian Networks Enable Illumination-Invariant Recognition via Generative Albedo and Normal Estimation

Deep Lambertian Networks integrate Deep Belief Nets with Lambertian reflectance to model images through latent variables for albedo, surface normals, and lighting. The model learns strong priors on albedo from 2D images, allowing illumination variations to be isolated by modulating only the lighting latent. Single-image estimation of albedo and normals becomes feasible by transferring knowledge from similar objects, supporting tasks like one-shot face recognition.
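
For readers unfamiliar with the reflectance model, here is a minimal sketch of standard Lambertian shading: pixel intensity is the albedo times the clamped dot product of the surface normal and the light vector. The function name and the array shapes are assumptions for illustration, and the sketch covers only the image formation step, not the Deep Belief Net priors.

```python
import numpy as np

def lambertian_render(albedo, normals, light):
    """Render an image under the Lambertian assumption.

    albedo:  (H, W) array of per-pixel reflectance
    normals: (H, W, 3) array of unit surface normals
    light:   (3,) light vector (direction scaled by intensity)
    """
    # Surfaces facing away from the light receive no illumination.
    shading = np.clip(normals @ light, 0.0, None)
    return albedo * shading

albedo = np.array([[0.2, 0.8]])
normals = np.zeros((1, 2, 3))
normals[..., 2] = 1.0  # a flat surface facing the camera
print(lambertian_render(albedo, normals, np.array([0.0, 0.0, 1.0])))
```

Changing only `light` re-renders the same surface under new illumination, which is the separation of factors the summary refers to.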

Deep Mixtures of Factor Analyzers Outperform Shallow MFAs and RBMs via Greedy Layer-Wise Training

Deep Mixtures of Factor Analysers (DMFAs) enable efficient multi-layer density modeling by greedily training one layer of latent variables at a time, using samples from the previous layer's posterior as the training data for the next layer. A DMFA can be converted into an equivalent shallow MFA by multiplying its factor loading matrices together, but the deep form shares lower-level loading matrices across components, which makes learning and inference more efficient and reduces overfitting. Empirically, DMFAs learn better density models than shallow MFAs and two RBM variants across diverse datasets.
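
The "multiplying factor loading matrices" step can be illustrated for the simplest case of one component per layer, where the model is a stack of two linear-Gaussian factor analysers. This is a sketch under that simplifying assumption; the function name and argument layout are hypothetical, and the full mixture case additionally enumerates component paths.

```python
import numpy as np

def collapse_two_layer_fa(W1, mu1, Psi1, W2, mu2, Psi2):
    """Collapse z2 ~ N(0, I), z1 = W2 z2 + mu2 + e2, x = W1 z1 + mu1 + e1
    into a single linear-Gaussian model x = loading z2 + mean + noise.

    Psi1 and Psi2 are the noise covariances of e1 and e2.
    """
    loading = W1 @ W2                   # product of the loading matrices
    mean = W1 @ mu2 + mu1
    noise_cov = W1 @ Psi2 @ W1.T + Psi1  # in general no longer diagonal
    return loading, mean, noise_cov

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((5, 3)), rng.standard_normal((3, 2))
mu1, mu2 = rng.standard_normal(5), rng.standard_normal(3)
Psi1, Psi2 = np.diag(rng.random(5) + 0.1), np.diag(rng.random(3) + 0.1)
loading, mean, noise_cov = collapse_two_layer_fa(W1, mu1, Psi1, W2, mu2, Psi2)
print(loading.shape, noise_cov.shape)
```

The collapsed model defines the same marginal distribution over `x`, but it no longer exposes the lower-level matrix `W1` as a shared parameter.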

Annealed Importance Sampling Revives Products of Hidden Markov Models for Complex Time-Series

Products of Hidden Markov Models (PoHMMs) are generative models for time-series data that have received little attention, in part because gradient-based learning is computationally expensive and the log-likelihood of a sequence is intractable to compute. The paper shows that the partition function can be estimated reliably with Annealed Importance Sampling (AIS) and reports experiments using contrastive divergence learning on rainfall data and on motion data captured from pairs of dancers. Advances in learning and evaluation methods for undirected graphical models, together with increased computing power, make PoHMMs worth considering for complex time-series modeling.
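
To show how AIS turns a sequence of importance weights into a partition function estimate, here is a toy sketch on a one-dimensional Gaussian where the true answer is known. Everything about the setup is an assumption for illustration: the target, the geometric annealing path, and the use of exact sampling at each temperature in place of the MCMC transitions a real model such as a PoHMM would need.

```python
import numpy as np

def ais_log_ratio(sigma=0.5, n_steps=100, n_chains=2000, seed=0):
    """Estimate log(Z_target / Z_base) with Annealed Importance Sampling.

    Base:   f0(x) = exp(-x**2 / 2)               (Z_base = sqrt(2*pi))
    Target: f1(x) = exp(-x**2 / (2 * sigma**2))  (Z_target = sigma*sqrt(2*pi))
    Intermediate distributions follow f0**(1 - beta) * f1**beta.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1)

    def precision(beta):
        return (1.0 - beta) + beta / sigma ** 2

    def log_f(x, beta):
        return -0.5 * precision(beta) * x ** 2

    x = rng.standard_normal(n_chains)  # exact samples from the base
    log_w = np.zeros(n_chains)
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Accumulate the importance weight, then move the chain.
        log_w += log_f(x, b_next) - log_f(x, b_prev)
        # Toy shortcut: draw exactly from the intermediate Gaussian.
        x = rng.standard_normal(n_chains) / np.sqrt(precision(b_next))

    # Numerically stable log of the mean weight.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

print(ais_log_ratio(), np.log(0.5))  # estimate vs. true log ratio
```

In a real application the base partition function is known in closed form, so adding its log to the estimated ratio gives an estimate of the target's log partition function.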

Improved Algorithms Surpass Contrastive Divergence for Training CRBMs on Structured Outputs

Conditional Restricted Boltzmann Machines (CRBMs) extend RBMs to structured output prediction, but the training algorithms developed for non-conditional RBMs do not carry over to them directly. The paper argues that standard Contrastive Divergence (CD) may not be suitable for training CRBMs. It identifies two types of structured output problem, low-variability (e.g., multi-label classification) and high-variability (e.g., image denoising), and proposes a tailored learning algorithm for each that empirically outperforms CD.