Chronological feed of everything captured from François Chollet.
The Keras blog, hosted on GitHub Pages and generated with Pelican, serves as a platform for content relevant to Keras users. Contributions are accepted via Pull Requests to the `content` branch, encompassing a range of topics from basic tutorials to advanced application demonstrations.
LLMs, despite architectural differences, share foundational similarities with Word2Vec, particularly in embedding tokens in a vector space where proximity denotes correlation. This mechanism, facilitated by self-attention in Transformers, leads to semantically continuous and interpolative embedding spaces. Unlike Word2Vec's linear transformations, LLMs act as "vector program databases" that store and execute complex, non-linear functions—these functions are indexed and triggered via prompts, generating emergent semantic arithmetic capabilities.
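The embedding arithmetic described above can be sketched with hand-built toy vectors; the 3-d "embeddings" below are made up for illustration and are not real Word2Vec or LLM weights:

```python
import numpy as np

# Hypothetical 3-d embedding space illustrating semantic vector arithmetic.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.8, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.9]),
}

def nearest(v, vocab):
    # cosine similarity against every word in the toy vocabulary
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(v, vocab[w]))

# king - man + woman lands near queen in this hand-built space
target = emb["king"] - emb["man"] + emb["woman"]
```

In a real embedding space the same analogy emerges from training rather than construction, which is the point of the essay: proximity and direction carry meaning.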
Deep learning models excel at value-centric abstraction, which involves learning continuous geometric morphings between input and target vector spaces. This approach is highly effective for perception problems where data naturally lies on low-dimensional manifolds, allowing for interpolation. However, deep learning struggles with discrete, non-interpolative problems and requires extensive, densely sampled datasets to generalize effectively. The core limitation is that deep learning’s generalization capability stems primarily from the interpolative structure of the data rather than inherent model properties.
François Chollet has developed a pure Python/Numpy implementation of the Nelder-Mead optimization algorithm. This implementation addresses the limitations of environments like PyPy and Google App Engine, where SciPy—which typically provides Nelder-Mead—is unavailable. The project leverages common support for Numpy in these restricted environments to enable the use of Nelder-Mead.
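For reference, the Nelder-Mead simplex method can be sketched in pure Python/NumPy as below. This is an illustrative reimplementation with the standard reflection/expansion/contraction/shrink steps, not Chollet's code:

```python
import numpy as np

def nelder_mead(f, x0, step=0.1, max_iter=500, tol=1e-8):
    """Minimal Nelder-Mead simplex search (a sketch, not the referenced repo)."""
    n = len(x0)
    # initial simplex: x0 plus one perturbed point per dimension
    simplex = [np.asarray(x0, dtype=float)]
    for i in range(n):
        p = simplex[0].copy()
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)
        refl = centroid + (centroid - worst)           # reflection
        if f(refl) < f(best):
            exp = centroid + 2.0 * (centroid - worst)  # expansion
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = centroid + 0.5 * (worst - centroid)  # contraction
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                # shrink every vertex toward the best one
                simplex = [best + 0.5 * (p - best) for p in simplex]
    return min(simplex, key=f)
```

Because the algorithm only evaluates `f` and manipulates arrays, it needs nothing beyond NumPy, which is exactly what makes it usable where SciPy is not.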
François Chollet differentiates deep learning (DL) models as interpolative systems operating on continuous, learnable manifolds, suited for perception and intuitive problems. Conversely, he argues that algorithmic, discrete reasoning tasks require program synthesis methods for true generalization. Hybrid AI systems that combine both approaches are necessary for advanced intelligence, leveraging DL for pattern recognition and program synthesis for combinatorial exploration.
The GitHub repository fchollet/deep-learning-models is deprecated; users should now use the `keras.applications` module for image classification models like VGG16, VGG19, ResNet50, and Inception v3. The repository also hosted the MusicTaggerCRNN audio model. It provides code examples for image classification, feature extraction, and intermediate-layer feature extraction, with weights pre-trained on ImageNet and the Million Song Dataset (MSD).
François Chollet argues against the common misconception of equating intelligence with demonstrated skill, particularly in the context of AI. He proposes that true intelligence lies in the efficiency and adaptability of acquiring new skills for novel, unanticipated tasks. Chollet highlights the critical distinction between a system that can perform a task due to extensive training data or hard-coded rules, and one that can generalize and improvise in genuinely new environments, thus emphasizing the process of learning over the output of learning.
François Chollet, creator of Keras and AI researcher at Google, challenges the common "intelligence explosion" narrative, arguing that intelligence is not an isolated property but emerges from interaction between a brain, body, and environment. He posits that focusing solely on brain (or algorithm) improvements ignores crucial bottlenecks and external dependencies, leading to an oversimplified view of AI progress. Chollet suggests that general AI systems, like science itself, will face exponential friction, leading to linear, not exponential, overall progress despite increasing resource consumption.
This Python Gist demonstrates a method for creating a unified interface for numerical operations that can seamlessly handle both NumPy arrays and Keras backend tensors (e.g., TensorFlow). It achieves this by dynamically dispatching calls to either the NumPy implementation or the Keras backend implementation based on the input type. This enables writing code once that can operate efficiently with different numerical computing frameworks.
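The dispatch pattern might look like the following sketch, where a `FakeBackendTensor` class stands in for a Keras backend tensor (both that class and the tiny backend namespace are hypothetical, chosen so the example runs without TensorFlow):

```python
import numpy as np

class FakeBackendTensor:
    """Stand-in for a backend tensor type (hypothetical)."""
    def __init__(self, value):
        self.value = np.asarray(value)

class fake_backend:
    """Stand-in for a backend op namespace (hypothetical)."""
    @staticmethod
    def square(x):
        return FakeBackendTensor(x.value ** 2)

def square(x):
    # dispatch on input type, mirroring the Gist's unified-interface idea:
    # NumPy arrays go to NumPy, anything else to the backend implementation
    if isinstance(x, np.ndarray):
        return np.square(x)
    return fake_backend.square(x)
```

The same function then works unchanged on both representations, which is the "write once, run on either framework" property the Gist demonstrates.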
The Keras functional API supports blending imperative ops like tf.exp and constant tensors into symbolic layer graphs, but it hits runtime errors under eager execution for complex recurrent models. In seq2seq LSTMs, passing encoder states as initial_state to the decoder LSTM triggers a ValueError during the RNN step computation. The failure stems from unhandled DeferredTensor objects in matmul operations within the LSTM cell, which block tensor conversion.
François Chollet shares introspective notes on software engineering practices drawn from his experience. Key principles include prioritizing simplicity, modularity, and testability to enhance code reliability and maintainability. The post emphasizes disciplined habits like writing tests first and avoiding over-engineering for sustainable productivity.
Only the title and metadata of François Chollet's blog post "What Worries Me About AI" were captured; no body text was ingested. Its core arguments therefore cannot be summarized, beyond noting that the post expresses concerns about AI from a prominent AI researcher.
Hualos is a demo project using a Flask server with gevent to expose an API for publishing and consuming JSON training events from Keras' RemoteMonitor callback. The landing page at localhost:9000 consumes these events and renders metrics in real-time using c3.js graphs built on d3.js. Integration requires starting the server with api.py, loading the page, and adding RemoteMonitor(root='http://localhost:9000') to model.fit callbacks.
François Chollet argues that intelligence explosion—recursive self-improvement leading to superintelligence—is implausible due to fundamental limits in generalization from finite data. Intelligence is defined by adapting to novel situations via compression of prior knowledge, not raw optimization power. Scaling compute and data cannot overcome the combinatorial explosion of possible environments, making ASI unreachable through brute-force methods.
API design prioritizes user experience through three rules: deliberately designing end-to-end workflows that map to domain concepts without exposing implementation details; reducing cognitive load via consistent naming, minimal new concepts, balanced parameterization, automation, and example-rich docs; and providing interactive feedback with early error catching, detailed actionable messages, and user support channels. A litmus test for quality is whether users can recall common workflows without docs after one exposure. These principles derive from empathizing with all users, countering smart engineer syndrome and masochistic attitudes toward complexity.
Keras introduces a new RNN API using RNN(cells) for stacking LSTM cells, training roughly 15% faster than stacked sequential LSTM layers on CPU. A benchmark on 10k samples of 60 timesteps and 64 dimensions shows classic stacked LSTMs at 35s/epoch versus 30s/epoch for the new approach. Both use RMSprop and MSE loss with batch size 128 over 4 epochs.
Deep learning will evolve from pure differentiable geometric transformations to program-like models blending algorithmic primitives (e.g., loops, conditionals, data structures) with neural layers, enabling reasoning, abstraction, and extreme generalization beyond current pattern recognition limits. Training will shift beyond backpropagation to non-differentiable methods like genetic algorithms and evolution strategies, paired with automated architecture search (AutoML) and lifelong learning via reusable modular subroutines from a global meta-learning library. This enables efficient model growth with minimal human engineering, achieving human-like generalization across tasks using sparse new data.
Deep learning models perform continuous geometric transformations on high-dimensional vector spaces, effectively mapping input manifolds to output manifolds given dense training data. However, they cannot represent discrete reasoning, long-term planning, or algorithmic tasks like generating code from specifications or learning sorting algorithms, regardless of data scale. They achieve local generalization near training data but lack human-like extreme generalization for novel situations, remaining brittle to adversarial perturbations without true causal understanding.
François Chollet provides a downsized Xception CNN architecture omitting residual connections, tailored for 200x200x3 inputs and 100-way classification. It employs an initial 3x3 Conv2D(32, stride=2) with ReLU and max pooling, followed by three depthwise-separable Conv2D blocks (128, 256, 512 filters) each with dual 3x3 SeparableConv2D layers, BatchNorm, ReLU, and stride-2 pooling. The model culminates in global average pooling and softmax output, prioritizing efficiency via separable convolutions.
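The efficiency claim is easy to quantify: for one 3x3 block of the architecture, a depthwise-separable convolution needs far fewer parameters than a regular convolution. Channel counts below match one of the blocks; bias terms are omitted for simplicity:

```python
# Parameter counts for a 3x3 convolution with 128 input and 256 output
# channels, comparing a standard convolution against its depthwise-separable
# factorization (depthwise 3x3 per channel, then 1x1 pointwise mixing).
k, c_in, c_out = 3, 128, 256
regular = k * k * c_in * c_out            # standard conv: 294912 params
separable = k * k * c_in + c_in * c_out   # separable conv: 33920 params
ratio = regular / separable               # roughly 8.7x fewer parameters
```

This roughly 9x reduction per block is why the architecture can afford its depth at a modest parameter budget.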
François Chollet's Gist provides minimal Keras examples for 1D MSE linear regression using a single Dense layer. It extends to binary logistic regression with sigmoid activation and binary_crossentropy loss. A third variant incorporates L1/L2 regularization via l1l2 on the weight matrix.
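The single-Dense-layer regression can be mirrored in plain NumPy. The following is a hypothetical gradient-descent sketch of the same model (one weight, one bias, MSE loss), not the Gist's Keras code:

```python
import numpy as np

# Synthetic 1D data: y = 3x + 0.5 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.01, size=200)

# A Dense(1) layer is just w*x + b; train it by gradient descent on MSE
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = (w * X[:, 0] + b) - y
    w -= lr * 2.0 * np.mean(err * X[:, 0])  # dMSE/dw
    b -= lr * 2.0 * np.mean(err)            # dMSE/db
```

Adding a sigmoid on the output and switching the loss to cross-entropy turns the same few lines into the Gist's logistic-regression variant.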
Deep learning has advanced rapidly to near-human performance in tasks like speech/image recognition and Go, yet remains underexploited in everyday products and processes. Analogous to the Internet's eventual ubiquity, AI will permeate all industries, automating intellectual tasks, disrupting jobs, and enabling a prosperity era—but only if made accessible to non-experts. Keras lowers barriers by simplifying deep learning for users with basic CS literacy, fostering widespread value creation as demonstrated by startups like Comma.ai; early adopters must prioritize open tools, tutorials, and knowledge sharing to prevent elite capture and ensure positive outcomes.
François Chollet's Keras script demonstrates fine-tuning VGG16 on a small cats-vs-dogs dataset by freezing the first 25 layers, adding a custom binary classifier on top, and using heavy data augmentation with SGD at low learning rate. The approach leverages ImageNet pretraining for convolutional base while training only top layers on 2000 training and 800 validation images of 150x150 pixels over 50 epochs. Key hyperparameters include batch size 16, momentum 0.9, and augmentation via shear, zoom, and flips to combat overfitting.
François Chollet's Keras script demonstrates transfer learning by extracting VGG16 bottleneck features from 2000 training and 800 validation images (1000/400 cats and dogs each), saving them as NumPy arrays, then training a simple top classifier (Flatten-Dense256-Dropout-Dense1 sigmoid) with RMSprop and binary crossentropy for 50 epochs. Common issues include using 'wb' mode for np.save/np.load to avoid UnicodeDecodeError, understanding bottleneck_features as (N, 4, 4, 512) feature maps rather than probabilities, and adapting for multi-class via softmax/categorical_crossentropy. Prediction code uses VGG16 for new image features fed to the top model or full fine-tuned model generators with argmax on class indices.
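The 'wb'/'rb' fix mentioned above can be shown in isolation, using a zero-filled array of the bottleneck shape as a stand-in for real VGG16 features (the filename is illustrative):

```python
import numpy as np
import os
import tempfile

# Open files in *binary* mode before handing them to np.save / np.load;
# text mode raises decode errors on Python 3.
features = np.zeros((8, 4, 4, 512), dtype=np.float32)  # bottleneck-shaped
path = os.path.join(tempfile.mkdtemp(), "bottleneck_features_train.npy")
with open(path, "wb") as f:
    np.save(f, features)
with open(path, "rb") as f:
    restored = np.load(f)
```

Note the (N, 4, 4, 512) shape: these are convolutional feature maps, not class probabilities, which is why a Flatten layer precedes the small top classifier.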
François Chollet's Keras script demonstrates building a CNN for binary image classification (cats vs. dogs) using only 2000 training images (1000 per class) and 800 validation images. Key technique is heavy data augmentation during training (shear, zoom, flips) with a simple 3-layer Conv2D architecture trained for 50 epochs on 150x150 RGB images. Model compiles with binary crossentropy and RMSprop, saving weights post-training, enabling strong generalization on limited data.
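In the spirit of the augmentation step, here is a toy NumPy sketch of a random horizontal flip plus brightness jitter; this is an illustration of the idea, not the actual Keras ImageDataGenerator pipeline (which also does shear and zoom):

```python
import numpy as np

def augment(img, rng):
    """Toy augmentation for an HxWx3 float image in [0, 1] (hypothetical)."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # random horizontal flip
    # small multiplicative brightness jitter, clipped back into range
    img = np.clip(img * rng.uniform(0.9, 1.1), 0.0, 1.0)
    return img
```

Each epoch then sees slightly different versions of the same 2000 images, which is what lets such a small dataset train a CNN without severe overfitting.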
François Chollet proposes a functional Keras API where layers are callable on input tensors, enabling concise graph model construction via tensor chaining and topology tracking. Key features include shared layers via reuse, Lambda for arbitrary ops, merge functions, and backward-compatible Model compilation/training with flexible input/output dicts. Discussions resolve masking via node propagation, layer querying for weight transfer, and Sequential integration by making models callable.
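The core idea, layers callable on tensors while recording graph topology, can be sketched in a few lines of plain Python; this is a toy model of the proposal, not the real Keras implementation:

```python
class Tensor:
    """Symbolic tensor that remembers which layer produced it."""
    def __init__(self, producer=None, inputs=()):
        self.producer = producer  # layer that created this tensor, if any
        self.inputs = inputs      # upstream tensors

class Dense:
    """Toy layer: calling it on a tensor extends the graph."""
    def __init__(self, name):
        self.name = name
    def __call__(self, x):
        return Tensor(producer=self, inputs=(x,))

def topology(output):
    # walk the graph back from the output, listing layers in call order
    order = []
    def visit(t):
        for upstream in t.inputs:
            visit(upstream)
        if t.producer is not None:
            order.append(t.producer.name)
    visit(output)
    return order

inp = Tensor()                # placeholder input tensor
out = Dense("d2")(Dense("d1")(inp))
```

Because each tensor records its producer and inputs, a Model built from `(inp, out)` can recover the full computation graph, which is what makes compilation, training, and layer sharing possible in the functional API.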
François Chollet demonstrates defining a Theano function to compute and output activations from intermediate layers in a Keras Sequential model. The approach uses model.layers to access layer inputs and outputs, creating a function like theano.function([model.layers[0].input], model.layers[1].get_output(train=False)). This pushes input batches through only the layers of interest rather than the whole network, useful for visualization and analysis in Theano-backed Keras.
Scientific and technological progress, despite exponential increases in resources like researchers and computing power, generally proceeds at a linear rate. This is because the difficulty of making impactful discoveries within a given field increases exponentially over time, effectively canceling out the benefits of increased resources. Therefore, the notion of an "intelligence explosion" or technological "Singularity" driven by exponential progress is fundamentally flawed; even a self-improving AI would face this linearity constraint without exponentially increasing resources.
Current web platforms prioritize information flow and commercial interests, leading to "collective stupidity" and a focus on low-quality, attention-grabbing content. There is a critical need to redesign these platforms to incorporate psychological aspects of content creation and consumption, fostering higher quality content, genuine creativity, and collective intelligence. This involves shifting from content-neutral models to those that actively shape and improve the quality of user-generated content by focusing on project-driven engagement, motivational feedback, curated inspiration, and accessible learning.
The internet, as currently structured, primarily fosters "collective stupidity" rather than "collective intelligence." This is due to an infrastructure that prioritizes attention-grabbing, "fun" content, epitomized by "piano-playing-cat" videos, over meaningful, productive interactions. This paradigm, driven by view-count-based popularity, leads to a significant waste of human time and potential, as evidenced by the billions of hours spent on platforms like Facebook with little to no genuine return for users.