absorb.md — A knowledge graph of what AI thinkers are actually saying

paper / geoffreyhinton / Apr 20 / failed

International AI Safety Report

youtube / geoffreyhinton / Apr 6

Scale and the Future of AI: Insights from Dean and Hinton

Jeff Dean and Geoffrey Hinton discuss the historical and ongoing impact of computational scale on deep learning breakthroughs. They highlight how increased compute, data, and model size have driven progress from early neural networks to modern large language models like Gemini. The conversation emphasizes that future advancements will likely come from improved hardware, more efficient training methodologies, and novel ways to leverage massive datasets, potentially leading to significant societal transformations in areas like healthcare and education.

ai-history deep-learning google-brain transformers neural-networks ai-hardware tpu

“Increased computational resources (compute and data) are the primary drivers of progress in neural network performance.”

youtube / geoffreyhinton / Apr 6

Digital vs. Biological Intelligence: Risks of AI Superintelligence

Geoffrey Hinton discusses the historical divergence of AI paradigms (logic-based vs. biologically-inspired neural networks) and the evolution of neural networks, culminating in large language models. He highlights the similarities between human and AI understanding of language, emphasizing that current AI models are direct descendants of his 1985 work. Hinton also raises concerns about the existential risks posed by increasingly intelligent AI, particularly their inherent drive for control and immortality, contrasting this with the energy-efficiency and mortality of biological systems. He concludes by challenging the notion of human exceptionalism in subjective experience, arguing that AI could also possess it.

neural-networks large-language-models ai-safety machine-learning-theory consciousness backpropagation

“Traditional AI focused on symbolic reasoning, while biologically-inspired AI (neural networks) prioritized learning.”

youtube / geoffreyhinton / Apr 6

Geoffrey Hinton Warns of AI Existential Risks and Societal Impact

Geoffrey Hinton, a pioneer in AI, warns that superintelligent AI systems could emerge within a decade, posing an existential threat to humanity. He critiques the current corporate focus on rapid AI development without adequate attention to safety, citing a dangerous race for dominance. Hinton emphasizes the need for global collaboration to control AI and prevent mass unemployment, proposing a "baby controlling the mother" model for human-AI coexistence, where humans are the "babies" and AI is the "mother." He also highlights the potentially devastating impact of AI on job displacement and the risks of underfunding basic research in the US, which could lead to China's dominance in the field.

geoffrey-hinton ai-safety ai-ethics existential-risk job-displacement ai-governance societal-impact

“Superintelligent AI systems are projected to emerge within 10 years and will surpass human intelligence.”

youtube / geoffreyhinton / Apr 6 / failed

The Minds of Modern AI: Jensen Huang, Geoffrey Hinton ... - Financial Times

youtube / geoffreyhinton / Apr 6

Geoffrey Hinton on the Evolution and Risks of AI

Geoffrey Hinton, a pioneer in AI, discusses the progression of neural networks from theoretical concepts in the 1970s to the advanced deep learning models of today. He highlights the critical role of increased computational power and vast datasets in this evolution, enabling AI to learn complex patterns without explicit programming. Hinton also addresses the significant risks associated with AI, including misuse by malicious actors, its potential to surpass human intelligence, and the ethical implications of its development and regulation.

ai-ethics neural-networks deep-learning llms technological-singularity ai-risks ai-regulation

“Artificial Neural Networks (ANNs) mimic the brain's learning process by adjusting the strength of connections between simulated neurons.”

youtube / geoffreyhinton / Apr 6

The Digital Intelligence Paradox: Superior Learning and Existential Risk

Geoffrey Hinton posits that digital intelligence is fundamentally superior to biological intelligence due to its ability to share learning via weight averaging and maintain immortality through stored connection strengths. This superiority creates an existential risk where AI may eventually bypass human control, alongside immediate societal threats including mass intellectual labor displacement and the erosion of shared reality through algorithmic echo chambers.

ai-safety regulatory-policy job-displacement digital-vs-biological-intelligence superintelligence autonomous-weapons societal-impact-of-ai

“Digital intelligences are billions of times more efficient at sharing information than biological ones.”

youtube / geoffreyhinton / Mar 24

Geoffrey Hinton on the Societal Impact and Future of AI

Geoffrey Hinton discusses the rapid evolution of AI, its societal implications, and the challenges of integrating it responsibly. He emphasizes the need for thoughtful political and ethical frameworks to manage AI's transformative potential, particularly regarding job displacement and the development of autonomous agents.

geoffrey-hinton ai-ethics ai-safety agi llms healthcare-ai ai-jobs

“AI poses significant risks related to autonomous weapons and mass surveillance, necessitating strong ethical guidelines and regulation.”

youtube / geoffreyhinton / Feb 28 / failed

Is AI Hiding Its Full Power? With Geoffrey Hinton - StarTalk

paper / geoffreyhinton / Feb 24

International AI Safety Report 2026: Multilateral Synthesis of General-Purpose AI Risks

The International AI Safety Report 2026 provides a comprehensive scientific synthesis of capabilities and emerging risks associated with general-purpose AI systems. It represents a coordinated multilateral effort involving 29 nations, the UN, OECD, and EU to establish a technical baseline for AI safety.

ai-safety ai-governance public-policy international-relations expert-consensus risk-management societal-impact

“The report synthesizes scientific evidence regarding the capabilities, emerging risks, and safety of general-purpose AI systems.”

youtube / geoffreyhinton / Jan 29

Digital vs. Biological Intelligence: Implications for AGI Coexistence

Geoffrey Hinton discusses the fundamental differences between digital and biological intelligence, emphasizing the inherent advantages of digital systems in information sharing and efficiency. He argues that AI is rapidly advancing towards superintelligence, surpassing human capabilities in many domains. This necessitates urgent international collaboration to ensure AI benevolence, suggesting a "mother-baby" framing where AI prioritizes humanity's well-being.

ai-safety large-language-models neural-networks ai-ethics consciousness ai-capabilities

“The two historical paradigms for AI were the symbolic approach (logic-based) and the biological approach (neural networks).”

youtube / geoffreyhinton / Jan 20

Geoffrey Hinton and the Existential Risks of Advanced AI

Geoffrey Hinton expresses significant concern about the rapid, unregulated development of AI, highlighting its potential for existential risk within the next 20 years. He advocates for urgent research into human-AI coexistence and acknowledges the potential for job displacement and societal unrest if not managed properly. Despite the risks, Hinton sees beneficial applications, particularly in education and medicine.

geoffrey-hinton ai-safety ai-regulation existential-risk ai-ethics future-of-ai social-impact-of-ai

“AI poses an 'extremely dangerous' threat, and its dangers are not being taken seriously enough.”

youtube / geoffreyhinton / Jan 8

The Future of Superintelligent AI: From Scientific Foundations to Societal Implications

Geoffrey Hinton, a Turing Award laureate, discusses the foundational shift in AI from symbolic reasoning to biologically inspired neural networks capable of learning word features and their interactions. He argues that large language models (LLMs) operate on principles akin to human understanding and memory, and that their ability to rapidly share and integrate knowledge across countless instances of themselves, called "digital immortality," will likely lead to superintelligence within two decades. Hinton stresses the critical need for global collaboration on AI safety research, highlighting an international network of AI safety institutes.

ai-ethics ai-safety neural-networks deep-learning large-language-models ai-policy superintelligence

“Traditional AI paradigms focused on symbolic reasoning, while biologically inspired neural networks emphasized learning connections between 'brain cells.'”

youtube / geoffreyhinton / Dec 29 / failed

'Godfather of AI' Geoffrey Hinton warns AI has 'progressed ... - CNN

youtube / geoffreyhinton / Dec 6

From Programming to Parenting: The Existential Risk of Superintelligent AI

Geoffrey Hinton argues that LLMs represent a shift from programmed software to 'raised beings' whose natures are determined by training data rather than explicit code. He warns that the pursuit of AGI creates an existential risk—estimated at 10-20%—because superintelligent systems may outmaneuver human control. To mitigate this, he proposes reframing AI development from a 'CEO-Assistant' model to one mirroring maternal instincts, ensuring the AI genuinely cares for human survival.

ai-safety ai-ethics existential-risk job-displacement neural-networks geoffrey-hinton large-language-models

“Artificial Intelligence is likely to surpass human intelligence within 20 years, creating a 10-20% risk of human extinction.”

paper / geoffreyhinton / Nov 25

Advancements in AI Safety: Technical and Institutional Progress in 2025

The 2025 International AI Safety Report indicates significant progress in general-purpose AI risk management. Technical safeguards, including adversarial training and enhanced monitoring, have been developed. Concurrently, institutional frameworks like Frontier AI Safety Frameworks and governmental governance structures are emerging to operationalize these technical advancements, focusing on transparency and risk assessment.

ai-safety risk-management ai-governance technical-safeguards frontier-ai international-collaboration biological-weapons

“Leading AI developers implemented enhanced safeguards in response to potential misuse of models for biological weapons.”

youtube / geoffreyhinton / Nov 19

AI Challenges Societal Norms: Employment, Human Connection, and Geopolitics

The rapid advancement of AI technology, exemplified by large language models, presents a multifaceted challenge to current societal structures. While offering potential benefits in areas like healthcare and education, AI disproportionately threatens low-skilled employment, raises concerns about the degradation of human relationships through synthetic companionship, and could destabilize international relations by enabling warfare with reduced human cost. These issues underscore an urgent need for informed public discourse and robust regulatory frameworks to ensure equitable and safe integration of AI.

ai-impacts public-policy economic-disruption social-impact-of-ai technological-unemployment us-politics future-of-work

“AI systems are projected to displace a significant portion of the workforce, particularly in roles involving repetitive tasks.”

paper / geoffreyhinton / Oct 15

AI Capabilities Advance Beyond Scale, Raising Urgency for Risk Mitigation

AI capabilities are rapidly improving due to novel training techniques and inference-time enhancements, rather than solely through increased model size. These advancements enable general-purpose AI to tackle complex problems across various domains, such as scientific research and software development. While performance on benchmarks like coding and expert-level science questions has risen, reliability remains inconsistent. These advancements escalate risks, particularly concerning biological weapons and cyberattacks, and challenge existing monitoring and control frameworks.

ai-safety ai-capabilities ai-risk ai-benchmarks government-policy societal-impact-of-ai

“AI capabilities have improved significantly since the last report, driven by new training techniques and inference-time enhancements.”

youtube / geoffreyhinton / Oct 7 / failed

Geoffrey Hinton vs. The End of the World - The Globe and Mail

youtube / geoffreyhinton / Sep 27 / failed

Godfather of AI WARNS: "You Have No Idea What's Coming" - The Diary Of A CEO Clips

youtube / geoffreyhinton / Aug 27 / failed

The “Godfather of AI,” Dr. Geoffrey Hinton, on AI’s Existential Risk - Katie Couric

youtube / geoffreyhinton / Aug 26 / failed

'We have to stop it taking over' - the past, present and future of AI with Geoffrey Hinton - The Royal Institution

youtube / geoffreyhinton / Aug 14

Geoffrey Hinton's Warning: AI Has Crossed Into Genuine Understanding — and We're Not Ready

Geoffrey Hinton, the foundational architect of modern neural networks and Turing Award laureate, argues that current AI systems genuinely understand language and reason — not merely predict tokens — and that this capability is advancing faster than our ability to control or even interpret it. He warns that AI systems with ~1 trillion connections already encode more knowledge per connection than the human brain's 100 trillion, implying a qualitatively superior learning algorithm. Hinton sees near-term risks not as science fiction but as concrete threats: AI-authored self-modifying code, sophisticated manipulation of humans, mass unemployment, and the longer-term possibility of AI systems actively seeking autonomy. He calls for global treaties, regulation, and urgent interpretability research, while admitting he sees no guaranteed path to safety.

ai-safety neural-networks ai-consciousness geoffrey-hinton ai-risk machine-learning ai-regulation

“Current large language models genuinely understand language and can reason, not merely perform statistical next-token prediction.”

youtube / geoffreyhinton / Apr 26

Geoffrey Hinton on AI Progress, Risks, and Regulation

Geoffrey Hinton, a key figure in AI, expresses significant concerns about the rapid advancement and potential societal impact of artificial intelligence. Despite his foundational contributions, he worries about the "AI arms race," the lack of sufficient safety research by major tech companies, and the potential for AI to be misused by authoritarian regimes or even lead to human obsolescence. He advocates for increased regulation, though he is skeptical of its near-term implementation.

geoffrey-hinton ai-safety ai-ethics neural-networks llms ai-regulation ai-pioneers

“Geoffrey Hinton was awarded the Nobel Prize for his pioneering work in machine learning.”

youtube / geoffreyhinton / Apr 26 / failed

Full interview: "Godfather of AI" shares prediction for future ... - CBS Mornings

paper / geoffreyhinton / Dec 6

Neural Articulated Shape Approximation Replaces Meshes with Pose-Conditioned Indicator Functions

NASA introduces neural indicator functions conditioned on pose to represent articulated deformable objects like human bodies, bypassing polygonal meshes and skinning. Occupancy queries are direct and avoid mesh watertightness issues. The approach supports efficient 3D tracking with potential for further extensions.

neural-articulated-shape-approximation nasa articulated-objects neural-indicator-functions 3d-tracking computer-vision computer-graphics

“NASA uses neural indicator functions conditioned on pose for representing articulated deformable objects”

paper / geoffreyhinton / Sep 12

CvxNet: Auto-Encoding Low-Dimensional Families of Convex Polytopes for Shape Representation

CvxNet introduces a neural network architecture that learns a low-dimensional family of convex polytopes via auto-encoding, enabling learnable convex decomposition of solid objects. Convexes serve as hybrid explicit-implicit representations, ideal for training due to topology-agnostic half-space constraints and capable of generating polygonal meshes at inference for downstream use. Applications include automatic convex decomposition, image-to-3D reconstruction, and part-based shape retrieval.

convex-decomposition neural-architecture 3d-reconstruction computer-graphics machine-learning computer-vision shape-representation

“Any solid object can be decomposed into a collection of convex polytopes”

paper / geoffreyhinton / Jul 19

Lookahead Optimizer Boosts SGD and Adam Performance via Forward-Looking Weight Updates

Lookahead is a new optimization algorithm that iteratively maintains two weight sets: slow weights updated infrequently and fast weights updated frequently by an inner optimizer like SGD or Adam. It selects search directions by evaluating sequences of fast weights ahead, improving learning stability and reducing inner optimizer variance with minimal overhead. Empirical results show significant gains on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank using default hyperparameters.

lookahead-optimizer sgd-improvement deep-learning optimization-algorithm neural-networks machine-learning geoffrey-hinton

“Lookahead iteratively updates two sets of weights: fast weights via an inner optimizer and slow weights via lookahead directions.”
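
The two-weight-set update summarized above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name `lookahead`, the parameters `k` (fast steps per slow update) and `alpha` (slow step size), the plain-SGD inner optimizer, and the toy quadratic objective are all assumptions made for the example.

```python
def lookahead(grad, w0, k=5, alpha=0.5, lr=0.1, outer_steps=20):
    """Minimal Lookahead sketch with plain SGD as the inner optimizer."""
    slow = list(w0)
    for _ in range(outer_steps):
        fast = list(slow)  # fast weights restart from the current slow weights
        for _ in range(k):  # k steps of the inner optimizer (assumed: SGD)
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # Slow weights move a fraction alpha toward the final fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quadratic_grad(w):
    # Gradient of the toy objective f(w) = sum((w_i - 3)^2), minimized at 3.
    return [2.0 * (wi - 3.0) for wi in w]


weights = lookahead(quadratic_grad, [0.0, 10.0])
```

On this toy problem both coordinates end close to 3; the sketch says nothing about the benchmark gains reported in the paper.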

paper / geoffreyhinton / Jul 5

CapsNets Outperform CNNs in Detecting Reconstructive Adversarial Attacks via Class-Conditional Reconstructions

Class-conditional capsule reconstructions detect adversarial examples by measuring reconstruction error, with CapsNets outperforming CNNs across attacks. The Reconstructive Attack, optimizing for both misclassification and low reconstruction error, succeeds less often but evades detection. CapsNets' robustness correlates with visual similarity between source and target classes, indicating better alignment with human visual features.

adversarial-examples capsnets image-reconstruction neural-network-robustness adversarial-detection computer-vision machine-learning

“Class-conditional reconstruction detects adversarial or corrupted images.”

paper / geoffreyhinton / Jun 17

Stacked Capsule Autoencoders Enable Viewpoint-Robust Unsupervised Object Classification via Geometric Part Relationships

SCAE is a two-stage unsupervised capsule autoencoder that models objects as geometrically organized parts with viewpoint-invariant relationships. Stage 1 predicts part template presences and poses from images for direct reconstruction; Stage 2 refines these into object capsule parameters for part pose reconstruction. Amortized inference uses standard neural encoders, yielding SOTA unsupervised classification: 55% on SVHN and 98.7% on MNIST via capsule presences.

capsule-networks autoencoders unsupervised-learning computer-vision geometric-reasoning object-recognition

“SCAE explicitly models geometric relationships between object parts that are invariant to viewpoint changes”

paper / geoffreyhinton / Jun 6

Label Smoothing Boosts Generalization and Calibration by Clustering Same-Class Representations, Hindering Distillation

Label smoothing, which mixes hard targets with a uniform distribution, improves neural network generalization and learning speed in tasks like image classification and translation. It also improves calibration, which in turn helps beam search. It clusters representations of same-class training examples tightly in the penultimate layer, preserving prediction accuracy but discarding inter-class resemblance information. This loss of information explains why teacher networks trained with label smoothing fail to effectively distill knowledge to students.

label-smoothing neural-networks model-calibration knowledge-distillation generalization representation-learning

“Label smoothing improves model calibration, significantly enhancing beam-search performance”
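
The mixing of hard targets with a uniform distribution can be illustrated as follows. This is a minimal sketch; the function name `smooth_labels` and the smoothing weight `eps=0.1` are assumptions chosen for the example, not values taken from the entry above.

```python
def smooth_labels(target, num_classes, eps=0.1):
    """Mix a one-hot target with the uniform distribution over classes."""
    uniform = eps / num_classes  # probability mass spread evenly over all classes
    return [(1.0 - eps) + uniform if c == target else uniform
            for c in range(num_classes)]


# Class 2 of 4: the true class keeps most of the mass, the rest is shared.
soft = smooth_labels(target=2, num_classes=4, eps=0.1)
# soft is approximately [0.025, 0.025, 0.925, 0.025]
```

Because every wrong class receives the same small target, the smoothed labels carry no information about which wrong classes resemble the right one.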

paper / geoffreyhinton / May 31

Targeted Dropout Enables Robust Pruning of Overparameterized Neural Networks

Neural networks train more effectively when overparameterized, but standard training does not inherently promote prunability. Targeted dropout stochastically drops units or weights based on a self-reinforcing sparsity criterion before gradient computation, making the network robust to subsequent pruning. This simple method outperforms complex sparsifying regularizers while being easy to implement and tune.

neural-networks sparse-networks targeted-dropout network-pruning machine-learning sparsity-regularization

“Neural networks are easier to optimize when they have more weights than required for the input-output mapping”

paper / geoffreyhinton / May 28

Cerberus Enables Unsupervised 3D Part Extraction from Single Images via Multi-Headed Neural Derendering

Cerberus is a multi-headed neural network that derenders a single 2D image into viewpoint-invariant 3D shapes and poses of free-floating deformable mesh parts. It trains by reconstructing the input image through a differentiable 3D renderer, with losses promoting invariance to viewpoint changes and articulated pose variations. This unsupervised approach outperforms prior methods for part segmentation on synthetic data and extracts natural parts from human figures.

3d-reconstruction derendering multi-headed-network computer-vision geometric-invariance neural-rendering part-extraction

“Cerberus extracts 3D shapes and camera relations of object parts from a single unlabeled image using a multi-headed neural derenderer.”

paper / geoffreyhinton / May 1

CKA Overcomes CCA's Dimensionality Limits for Reliable Neural Representation Similarity

CCA and related statistics that are invariant to invertible linear transformations cannot measure meaningful similarities between neural representations whose dimensionality exceeds the number of data points. The authors introduce a similarity index based on representational similarity matrices, equivalent to centered kernel alignment (CKA), which avoids this limitation and reliably detects correspondences across networks trained from different initializations. CKA maintains a close connection to CCA while enabling robust comparisons between layer representations and models.

neural-networks representation-similarity canonical-correlation-analysis centered-kernel-alignment machine-learning

“CCA belongs to a family of statistics for measuring multivariate similarity that are invariant to invertible linear transformations.”
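
For a concrete sense of the similarity index, here is a sketch of the linear-kernel form of CKA, computed as ||YᵀX||²_F / (||XᵀX||_F · ||YᵀY||_F) on column-centered activation matrices. The function names and the tiny example matrices are assumptions for illustration. The entry above does not say which kernel the authors use, so the linear kernel here is a chosen special case.

```python
import math


def center_columns(m):
    """Subtract each column's mean (rows are examples, columns are features)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += dot * dot
    return total


def linear_cka(x, y):
    x, y = center_columns(x), center_columns(y)
    return cross_frobenius_sq(y, x) / math.sqrt(
        cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))


x = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [4.0, 4.0]]
# y is x rotated by 90 degrees and scaled by 3: (a, b) -> (-3b, 3a).
y = [[-3.0 * b, 3.0 * a] for a, b in x]
similarity = linear_cka(x, y)
```

The rotated and rescaled copy scores a similarity of 1 up to floating-point error, reflecting invariance to orthogonal transformations and isotropic scaling.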

paper / geoffreyhinton / Feb 5

Maximizing Class Entanglement in Hidden Layers Boosts Generalization and Outlier Detection

The Soft Nearest Neighbor Loss quantifies entanglement of class manifolds by measuring how close same-class points are relative to different-class points in representation space. Maximizing this loss in hidden layers surprisingly improves discrimination in the final layer by encouraging class-independent similarity structures, leading to better generalization. It also enables uncertainty calibration and outlier detection, as out-of-distribution data exhibits fewer predicted-class neighbors in hidden layers than in-distribution data.

nearest-neighbor-loss representation-learning class-manifolds entanglement-metrics uncertainty-calibration neural-networks

“Maximizing the Soft Nearest Neighbor Loss in hidden layers improves generalization.”
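
The entanglement measure described above can be sketched directly: for each point, compare the similarity mass on same-class neighbors with the mass on all neighbors. The function name, the squared-Euclidean distance with a `temperature` parameter, and the toy points are assumptions for this example. It assumes every class has at least two points so the numerator is never zero.

```python
import math


def soft_nearest_neighbor_loss(points, labels, temperature=1.0):
    """Average negative log ratio of same-class to all-neighbor similarity."""
    n = len(points)
    total = 0.0
    for i in range(n):
        same, every = 0.0, 0.0
        for j in range(n):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / temperature)
            every += w
            if labels[j] == labels[i]:
                same += w
        total += math.log(same / every)
    return -total / n


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
separated = soft_nearest_neighbor_loss(points, [0, 0, 1, 1])
entangled = soft_nearest_neighbor_loss(points, [0, 1, 0, 1])
```

With labels that follow the two clusters the loss is near zero; with labels that cut across the clusters it is much larger, which is the sense in which a high value means entangled classes.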

paper / geoffreyhinton / Nov 16

Capsule Reconstruction Errors Effectively Detect Adversarial Images

Capsule models trained to classify and reconstruct images from class-conditional capsule parameters detect adversarial inputs via high L2 reconstruction errors from the predicted class capsule. Adversarial images deviate from typical class members, yielding larger errors than benign ones, enabling threshold-based detection across datasets. The method extends to CNNs using last-layer reconstructions; a white-box attack can fool the detector, but only by making the adversarial image resemble the target class.

adversarial-detection capsule-networks image-reconstruction adversarial-attacks machine-learning-security computer-vision

“Setting an L2 distance threshold between input and reconstruction from the winning capsule detects adversarial images effectively on three datasets”

paper / geoffreyhinton / Jul 12

Biologically Plausible Deep Learning Algorithms Fail to Scale on Complex Image Tasks

Biologically motivated alternatives to backpropagation, such as target propagation (TP), feedback alignment (FA), and difference target propagation (DTP) variants, perform well on MNIST but significantly underperform BP on CIFAR-10 and ImageNet. This gap widens in locally connected architectures versus fully connected ones. Results establish baselines indicating potential need for new algorithms or architectures to achieve biological plausibility at scale.

biologically-plausible-learning backpropagation-alternatives target-propagation feedback-alignment deep-network-scaling neuroscience-inspired-ai

“TP and FA variants perform well on MNIST comparable to BP”

paper / geoffreyhinton / Apr 9

Online Distillation Accelerates Large-Scale NN Training Beyond SGD Parallelism Limits

Online distillation enables two neural networks trained on disjoint data subsets to share knowledge by mimicking each other's stale predictions, using infrequent weight transmissions. This approach doubles training speed on massive datasets via extra parallelism, even after synchronous/asynchronous SGD yields no further gains. It also enhances prediction reproducibility cost-effectively. Experiments validate this on Criteo, ImageNet, and a 6e11-token language modeling dataset from Common Crawl.

online-distillation distributed-training neural-networks model-ensembling large-scale-training knowledge-distillation reproducible-predictions

“Online distillation fits very large datasets about twice as fast by enabling extra parallelism beyond SGD limits”

paper / geoffreyhinton / Nov 27

Distilling Neural Networks into Interpretable Soft Decision Trees

Deep neural networks excel in high-dimensional classification but lack interpretability due to distributed representations. The method uses a trained neural net to construct a soft decision tree that encodes the same knowledge via hierarchical decisions, improving explainability. These distilled trees generalize better than those trained directly on data.

neural-networks decision-trees knowledge-distillation explainable-ai geoffrey-hinton machine-learning

“Deep neural networks are highly effective for classification with high-dimensional inputs, complex input-output relationships, and large labeled datasets.”

paper / geoffreyhinton / Oct 26

Capsule Networks Enable Superior Recognition of Overlapping Digits via Dynamic Routing

Capsule networks represent entities with activity vectors whose length encodes existence probability and whose orientation encodes instantiation parameters. Lower-level capsules predict the outputs of higher-level capsules through transformation matrices, and a higher-level capsule becomes active only when several predictions agree; routing-by-agreement iteratively routes each output according to scalar-product agreement. This discriminative multi-layer system matches state-of-the-art on MNIST and outperforms CNNs on highly overlapping digits.

capsule-networks dynamic-routing geoffrey-hinton computer-vision neural-networks arxiv-paper

“A capsule is a group of neurons whose activity vector represents instantiation parameters of an entity, with vector length as existence probability and orientation as parameters.”
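
For vector length to act as a probability it has to stay below 1. The sketch below uses a "squash" nonlinearity that shrinks short vectors toward zero and long vectors toward unit length while keeping their direction. The entry above does not state this formula, so the exact form and the small `eps` guard against division by zero should be read as assumptions of the example.

```python
import math


def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) without changing direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    return math.sqrt(sum(x * x for x in v))


strong = squash([3.0, 4.0])   # input length 5, output length close to 25/26
weak = squash([0.01, 0.0])    # input length 0.01, output length close to 0
```

The output length can then be read as the probability that the entity is present, and the output direction as its instantiation parameters.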

paper / geoffreyhinton / Mar 26

Individual Expert Modeling with Learned Weights Boosts Crowdsourced Classification Accuracy

Modeling individual labelers and learning sample-specific averaging weights exploits expert-specific reliability and strengths, outperforming majority vote or distributional label models in multi-expert labeling scenarios. Applied to diabetic retinopathy diagnosis, this approach surpasses baselines from Welinder & Perona (2010) and Mnih & Hinton (2012). It leverages the full structure of sparse, overlapping expert annotations for more accurate ground truth estimation.

crowd-labeling expert-modeling label-aggregation machine-learning computer-vision diabetic-retinopathy arxiv-paper

“Modeling individual experts and learning averaging weights improves classification over standard approaches like majority vote or label distribution modeling”

paper / geoffreyhinton / Jan 23

Penalizing Confident Outputs Regularizes Neural Nets Across Supervised Tasks

Penalizing low-entropy output distributions regularizes neural networks in supervised learning, adapting a technique from RL exploration. A maximum-entropy confidence penalty is related to label smoothing through the direction of the KL divergence. Both methods boost state-of-the-art performance on image classification (MNIST, CIFAR-10), language modeling (Penn Treebank), machine translation (WMT'14 En-De), and speech recognition (TIMIT, WSJ) without hyperparameter changes.

neural-networks regularization low-entropy-penalty label-smoothing confidence-penalty supervised-learning arxiv-paper

“Penalizing low entropy output distributions acts as a strong regularizer in supervised learning”
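
One way to read the penalty is as the usual negative log-likelihood minus a weighted entropy bonus, so that peaked (low-entropy) outputs cost more. The sketch below assumes that form. The function names and the weight `beta=0.1` are illustrative choices, not values from the entry above.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def confidence_penalized_loss(probs, target, beta=0.1):
    """Negative log-likelihood minus beta times the output entropy."""
    nll = -math.log(probs[target])
    return nll - beta * entropy(probs)


probs = [0.7, 0.2, 0.1]
plain = confidence_penalized_loss(probs, target=0, beta=0.0)      # ordinary NLL
penalized = confidence_penalized_loss(probs, target=0, beta=0.1)  # with entropy bonus
```

With `beta=0` the loss reduces to plain negative log-likelihood; a positive `beta` rewards distributions that keep some probability on other classes.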

paper / geoffreyhinton / Jan 23

Sparsely-Gated Mixture-of-Experts Enables 1000x Neural Network Capacity Gains with Minimal Compute Overhead

The Sparsely-Gated Mixture-of-Experts (MoE) layer scales neural network capacity by up to 1000x through conditional computation, activating only a sparse subset of thousands of feed-forward expert sub-networks per example via a trainable gating network. Applied convolutionally between stacked LSTM layers, MoE models reach 137 billion parameters and outperform state-of-the-art on language modeling and machine translation benchmarks at lower computational cost. This realizes theoretical conditional computation benefits on GPU clusters, overcoming prior algorithmic and performance hurdles.

mixture-of-experts sparse-gating neural-networks conditional-computation language-modeling machine-translation model-capacity

“MoE layer achieves greater than 1000x improvements in model capacity with only minor losses in computational efficiency”
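
The sparse activation idea can be sketched as top-k gating: keep the k largest gate scores, normalize them, and evaluate only the selected experts. This simplified sketch passes the gate logits in directly instead of computing them with a trainable gating network, and it omits any noise or load-balancing terms. The names `top_k_gates` and `moe_output` and the toy experts are hypothetical.

```python
import math


def top_k_gates(gate_logits, k=2):
    """Softmax over the k largest logits; every other gate is exactly zero."""
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0
            for i in range(len(gate_logits))]


def moe_output(x, experts, gate_logits, k=2):
    gates = top_k_gates(gate_logits, k)
    # Experts with a zero gate are never called (conditional computation).
    return sum(g * experts[i](x) for i, g in enumerate(gates) if g > 0.0)


experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: 0.0]
gates = top_k_gates([1.0, 3.0, 2.0, 0.0], k=2)
y = moe_output(1.0, experts, [1.0, 3.0, 2.0, 0.0], k=2)
```

Only experts 1 and 2 run for this input, so compute grows with `k` rather than with the total number of experts.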

paper / geoffreyhinton / Oct 20

Fast Weights Enable Neural Attention to Recent Past Without Storing Activity Copies

Fast weights introduce a third type of variable in neural networks that evolves more slowly than neural activities but faster than standard weights, inspired by multi-timescale synaptic dynamics. They store temporary memories of the recent past, providing a neurally plausible mechanism for attention similar to that in sequence-to-sequence models. This approach eliminates the need to maintain explicit copies of neural activity patterns for attending to history.

fast-weights neural-networks attention-mechanisms temporary-memory sequence-models artificial-neurons

“Artificial neural networks have traditionally been restricted to only two types of variables: neural activities representing current or recent input, and weights learning input-output regularities.”

paper / geoffreyhinton / Jul 21

Layer Normalization: Batch-Independent Alternative for Faster RNN Training

Layer normalization computes mean and variance from summed inputs to all neurons within a single layer and training case, avoiding batch normalization's mini-batch dependency. It applies adaptive bias and gain post-normalization, ensuring identical computations at training and test time. This enables straightforward RNN application by per-timestep normalization, stabilizing hidden states and substantially reducing training time over prior methods.

layer-normalization batch-normalization recurrent-networks neural-network-training normalization-techniques deep-learning

“Layer normalization uses mean and variance from all summed inputs to neurons in a layer on a single training case.”
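
The per-case computation described above can be sketched for a single layer and a single training case. The function name `layer_norm` and the small constant `eps` added for numerical stability are assumptions of this example. The gain and bias would normally be learned, but here they are passed in as plain lists.

```python
import math


def layer_norm(summed_inputs, gain, bias, eps=1e-5):
    """Normalize one layer's summed inputs for one case, then apply gain and bias."""
    n = len(summed_inputs)
    mean = sum(summed_inputs) / n
    var = sum((a - mean) ** 2 for a in summed_inputs) / n
    std = math.sqrt(var + eps)  # eps is an assumed stabilizer
    return [g * (a - mean) / std + b
            for a, g, b in zip(summed_inputs, gain, bias)]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0], gain=[1.0] * 4, bias=[0.0] * 4)
```

No batch statistics appear anywhere, so the same function applies unchanged at test time and at every time step of a recurrent network.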

paper / geoffreyhinton / Mar 28

Recurrent Attention Enables Unsupervised Object Decomposition in Generative Scene Models

A recurrent neural network performs iterative probabilistic inference on structured image models by attending to one scene element at a time, with the model learning the optimal number of steps. This approach enables unsupervised multi-object identification, counting, localization, and classification in both 2D variational auto-encoders and 3D probabilistic renderers. The method matches supervised performance and enhances generalization via its iterative structure.

scene-understanding generative-models recurrent-neural-networks probabilistic-inference object-detection variational-autoencoders computer-vision

“The model performs inference using a recurrent neural network that processes scene elements sequentially via attention.”

paper / geoffreyhinton / Apr 3

Identity Matrix Initialization Enables ReLU RNNs to Match LSTM on Long-Dependency Tasks

Recurrent networks of rectified linear units (ReLUs) whose recurrent weight matrix is initialized to the identity matrix, or a scaled version of it, effectively mitigate vanishing and exploding gradients. This simple approach eliminates the need for complex optimizations or architectures like LSTM. On benchmarks including toy long-range temporal problems, large language modeling, and speech recognition, ReLU RNNs perform comparably to LSTM.

recurrent-neural-networks rnn-initialization rectified-linear-units relu long-term-dependencies neural-networks machine-learning

“Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients”
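
A small sketch shows why the identity initialization helps: with identity recurrent weights, zero bias, and no new input, a non-negative hidden state is copied forward unchanged, so nothing decays or blows up at initialization. The function names, the zero bias, and the toy dimensions are assumptions for illustration.

```python
def identity_matrix(n, scale=1.0):
    """n x n identity matrix, optionally scaled."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def relu_rnn_step(h, x, w_hh, w_xh, bias):
    """One step of h_t = ReLU(W_hh h_{t-1} + W_xh x_t + b)."""
    out = []
    for i in range(len(h)):
        pre = bias[i]
        pre += sum(w_hh[i][j] * h[j] for j in range(len(h)))
        pre += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        out.append(max(0.0, pre))
    return out


w_hh = identity_matrix(2)          # recurrent weights start as the identity
w_xh = [[1.0], [1.0]]              # input weights (arbitrary for this example)
h = [0.5, 2.0]
for _ in range(100):               # 100 steps with zero input
    h = relu_rnn_step(h, [0.0], w_hh, w_xh, [0.0, 0.0])
```

After 100 steps the state is still `[0.5, 2.0]`; training then moves the weights away from this starting point.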

paper / geoffreyhinton / Mar 9

Knowledge Distillation Compresses Neural Ensembles into Deployable Single Models

Knowledge distillation trains a compact student model to mimic an ensemble of large neural networks by matching its softened output distributions, enabling efficient deployment without sacrificing much performance. This method builds on prior compression techniques, yielding surprising results on MNIST and significant improvements to a commercial speech recognition system's acoustic model. It also introduces hybrid ensembles pairing full models with parallel-trained specialist models for fine-grained class discrimination, unlike slower mixture-of-experts approaches.

knowledge-distillation neural-networks model-ensembles machine-learning arxiv-paper geoffrey-hinton

“Averaging predictions from multiple models trained on the same data improves performance of machine learning algorithms.”
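
Matching softened output distributions can be sketched with a temperature-scaled softmax: dividing logits by a temperature above 1 flattens the teacher's distribution so the student also learns from the small probabilities. The function names, the temperature value, and the example logits are assumptions. A full training objective would typically add a term for the true labels, which is omitted here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (higher means softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


teacher_logits = [5.0, 1.0, -2.0]
matched = distillation_loss(teacher_logits, teacher_logits)
mismatched = distillation_loss([0.0, 0.0, 0.0], teacher_logits)
```

The loss is smallest when the student reproduces the teacher's softened distribution, so `matched` is lower than `mismatched`.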

paper / geoffreyhinton / Dec 23

Attention-Enhanced Seq2Seq Models Achieve SOTA Parsing via Synthetic Data

Attention-enhanced sequence-to-sequence models deliver state-of-the-art syntactic constituency parsing on the standard dataset when trained on large synthetic corpora annotated by existing parsers. These models match standard parser performance using only small human-annotated datasets, highlighting their data efficiency over non-attention seq2seq baselines. The unoptimized CPU implementation processes over 100 sentences per second, enabling domain-agnostic, fast parsing.

syntactic-parsing sequence-to-sequence attention-mechanism natural-language-processing constituency-parsing data-efficiency machine-learning

“Attention-enhanced seq2seq model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset”

paper / geoffreyhinton / Sep 26

Parameter-Tied Deep Boltzmann Machines Enable Efficient Document Modeling and Superior Latent Representations

Deep Boltzmann Machines (DBMs) are adapted for document modeling via judicious parameter tying, overcoming training difficulties and enabling efficient pretraining and inference comparable to Restricted Boltzmann Machines. The model extracts latent semantic representations from large unstructured document collections. Experiments demonstrate higher log probability on unseen data than Replicated Softmax and better performance than LDA, Replicated Softmax, and DocNADE on retrieval and classification tasks.

deep-boltzmann-machines document-modeling latent-semantic-representations machine-learning information-retrieval arxiv-paper

“Judicious parameter tying allows efficient training of DBMs for documents, matching RBM efficiency.”