Chronological feed of everything captured from François Chollet.
The Keras blog, hosted on GitHub Pages and generated with Pelican, serves as a platform for content relevant to Keras users. Contributions are accepted via Pull Requests to the `content` branch, encompassing a range of topics from basic tutorials to advanced application demonstrations.
LLMs, despite architectural differences, share foundational similarities with Word2Vec, particularly in embedding tokens in a vector space where proximity denotes correlation. This mechanism, facilitated by self-attention in Transformers, leads to semantically continuous and interpolative embedding spaces. Unlike Word2Vec's linear transformations, LLMs act as "vector program databases" that store and execute complex, non-linear functions—these functions are indexed and triggered via prompts, generating emergent semantic arithmetic capabilities.
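The embedding arithmetic described above can be sketched with hand-built toy vectors; the 3-d "embeddings" below are made up for illustration and are not real Word2Vec or LLM weights:

```python
import numpy as np

# Hypothetical 3-d embedding space illustrating semantic vector arithmetic.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.8, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.9]),
}

def nearest(v, vocab):
    # cosine similarity against every word in the toy vocabulary
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(v, vocab[w]))

# king - man + woman lands near queen in this hand-built space
target = emb["king"] - emb["man"] + emb["woman"]
```

In a real embedding space the same analogy emerges from training rather than construction, which is the point of the essay: proximity and direction carry meaning.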
Deep learning models excel at value-centric abstraction, which involves learning continuous geometric morphings between input and target vector spaces. This approach is highly effective for perception problems where data naturally lies on low-dimensional manifolds, allowing for interpolation. However, deep learning struggles with discrete, non-interpolative problems and requires extensive, densely sampled datasets to generalize effectively. The core limitation is that deep learning’s generalization capability stems primarily from the interpolative structure of the data rather than inherent model properties.
François Chollet has developed a pure Python/Numpy implementation of the Nelder-Mead optimization algorithm. This implementation addresses the limitations of environments like PyPy and Google App Engine, where SciPy—which typically provides Nelder-Mead—is unavailable. The project leverages common support for Numpy in these restricted environments to enable the use of Nelder-Mead.
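For reference, the Nelder-Mead simplex method can be sketched in pure Python/NumPy as below. This is an illustrative reimplementation with the standard reflection/expansion/contraction/shrink steps, not Chollet's code:

```python
import numpy as np

def nelder_mead(f, x0, step=0.1, max_iter=500, tol=1e-8):
    """Minimal Nelder-Mead simplex search (a sketch, not the referenced repo)."""
    n = len(x0)
    # initial simplex: x0 plus one perturbed point per dimension
    simplex = [np.asarray(x0, dtype=float)]
    for i in range(n):
        p = simplex[0].copy()
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)
        refl = centroid + (centroid - worst)           # reflection
        if f(refl) < f(best):
            exp = centroid + 2.0 * (centroid - worst)  # expansion
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = centroid + 0.5 * (worst - centroid)  # contraction
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                # shrink every vertex toward the best one
                simplex = [best + 0.5 * (p - best) for p in simplex]
    return min(simplex, key=f)
```

Because the algorithm only evaluates `f` and manipulates arrays, it needs nothing beyond NumPy, which is exactly what makes it usable where SciPy is not.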
François Chollet differentiates deep learning (DL) models as interpolative systems operating on continuous, learnable manifolds, suited for perception and intuitive problems. Conversely, he argues that algorithmic, discrete reasoning tasks require program synthesis methods for true generalization. Hybrid AI systems that combine both approaches are necessary for advanced intelligence, leveraging DL for pattern recognition and program synthesis for combinatorial exploration.
The GitHub repository fchollet/deep-learning-models is deprecated; users should now use the `keras.applications` module for image classification models like VGG16, VGG19, ResNet50, and Inception v3. The repository also hosted the MusicTaggerCRNN audio model. It provides code examples for image classification, feature extraction, and intermediate-layer feature extraction, with weights pre-trained on ImageNet and the Million Song Dataset (MSD).
François Chollet argues against the common misconception of equating intelligence with demonstrated skill, particularly in the context of AI. He proposes that true intelligence lies in the efficiency and adaptability of acquiring new skills for novel, unanticipated tasks. Chollet highlights the critical distinction between a system that can perform a task due to extensive training data or hard-coded rules, and one that can generalize and improvise in genuinely new environments, thus emphasizing the process of learning over the output of learning.
François Chollet, creator of Keras and AI researcher at Google, challenges the common "intelligence explosion" narrative, arguing that intelligence is not an isolated property but emerges from interaction between a brain, body, and environment. He posits that focusing solely on brain (or algorithm) improvements ignores crucial bottlenecks and external dependencies, leading to an oversimplified view of AI progress. Chollet suggests that general AI systems, like science itself, will face exponential friction, leading to linear, not exponential, overall progress despite increasing resource consumption.
This Python Gist demonstrates a method for creating a unified interface for numerical operations that can seamlessly handle both NumPy arrays and Keras backend tensors (e.g., TensorFlow). It achieves this by dynamically dispatching calls to either the NumPy implementation or the Keras backend implementation based on the input type. This enables writing code once that can operate efficiently with different numerical computing frameworks.
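The dispatch pattern might look like the following sketch, where a `FakeBackendTensor` class stands in for a Keras backend tensor (both that class and the tiny backend namespace are hypothetical, chosen so the example runs without TensorFlow):

```python
import numpy as np

class FakeBackendTensor:
    """Stand-in for a backend tensor type (hypothetical)."""
    def __init__(self, value):
        self.value = np.asarray(value)

class fake_backend:
    """Stand-in for a backend op namespace (hypothetical)."""
    @staticmethod
    def square(x):
        return FakeBackendTensor(x.value ** 2)

def square(x):
    # dispatch on input type, mirroring the Gist's unified-interface idea:
    # NumPy arrays go to NumPy, anything else to the backend implementation
    if isinstance(x, np.ndarray):
        return np.square(x)
    return fake_backend.square(x)
```

The same function then works unchanged on both representations, which is the "write once, run on either framework" property the Gist demonstrates.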
The Keras functional API supports blending imperative ops like tf.exp and constant tensors into symbolic layer graphs, but it hits runtime errors under eager execution for complex recurrent models. In seq2seq LSTMs, passing encoder states as initial_state to the decoder LSTM triggers a ValueError during the RNN step computation. The failure stems from unhandled DeferredTensor objects in matmul operations within the LSTM cell, which block tensor conversion.
François Chollet shares introspective notes on software engineering practices drawn from his experience. Key principles include prioritizing simplicity, modularity, and testability to enhance code reliability and maintainability. The post emphasizes disciplined habits like writing tests first and avoiding over-engineering for sustainable productivity.
Only the title and metadata of François Chollet's blog post "What Worries Me About AI" were captured; no body text was ingested. Its core arguments therefore cannot be summarized, beyond noting that the post expresses concerns about AI from a prominent AI researcher.
Hualos is a demo project using a Flask server with gevent to expose an API for publishing and consuming JSON training events from Keras' RemoteMonitor callback. The landing page at localhost:9000 consumes these events and renders metrics in real-time using c3.js graphs built on d3.js. Integration requires starting the server with api.py, loading the page, and adding RemoteMonitor(root='http://localhost:9000') to model.fit callbacks.
François Chollet argues that intelligence explosion—recursive self-improvement leading to superintelligence—is implausible due to fundamental limits in generalization from finite data. Intelligence is defined by adapting to novel situations via compression of prior knowledge, not raw optimization power. Scaling compute and data cannot overcome the combinatorial explosion of possible environments, making ASI unreachable through brute-force methods.
API design prioritizes user experience through three rules: deliberately designing end-to-end workflows that map to domain concepts without exposing implementation details; reducing cognitive load via consistent naming, minimal new concepts, balanced parameterization, automation, and example-rich docs; and providing interactive feedback with early error catching, detailed actionable messages, and user support channels. A litmus test for quality is whether users can recall common workflows without docs after one exposure. These principles derive from empathizing with all users, countering smart engineer syndrome and masochistic attitudes toward complexity.
Keras introduces a new RNN API using RNN(cells) for stacking LSTM cells, training roughly 15% faster than stacked sequential LSTM layers on CPU. A benchmark on 10k samples of 60 timesteps and 64 dimensions shows classic stacked LSTMs at 35s/epoch versus 30s/epoch for the new approach. Both use RMSprop and MSE loss with batch size 128 over 4 epochs.
Deep learning will evolve from pure differentiable geometric transformations to program-like models blending algorithmic primitives (e.g., loops, conditionals, data structures) with neural layers, enabling reasoning, abstraction, and extreme generalization beyond current pattern recognition limits. Training will shift beyond backpropagation to non-differentiable methods like genetic algorithms and evolution strategies, paired with automated architecture search (AutoML) and lifelong learning via reusable modular subroutines from a global meta-learning library. This enables efficient model growth with minimal human engineering, achieving human-like generalization across tasks using sparse new data.
Deep learning models perform continuous geometric transformations on high-dimensional vector spaces, effectively mapping input manifolds to output manifolds given dense training data. However, they cannot represent discrete reasoning, long-term planning, or algorithmic tasks like generating code from specifications or learning sorting algorithms, regardless of data scale. They achieve local generalization near training data but lack human-like extreme generalization for novel situations, remaining brittle to adversarial perturbations without true causal understanding.
François Chollet provides a downsized Xception CNN architecture omitting residual connections, tailored for 200x200x3 inputs and 100-way classification. It employs an initial 3x3 Conv2D(32, stride=2) with ReLU and max pooling, followed by three depthwise-separable Conv2D blocks (128, 256, 512 filters) each with dual 3x3 SeparableConv2D layers, BatchNorm, ReLU, and stride-2 pooling. The model culminates in global average pooling and softmax output, prioritizing efficiency via separable convolutions.
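The efficiency claim is easy to quantify: for one 3x3 block of the architecture, a depthwise-separable convolution needs far fewer parameters than a regular convolution. Channel counts below match one of the blocks; bias terms are omitted for simplicity:

```python
# Parameter counts for a 3x3 convolution with 128 input and 256 output
# channels, comparing a standard convolution against its depthwise-separable
# factorization (depthwise 3x3 per channel, then 1x1 pointwise mixing).
k, c_in, c_out = 3, 128, 256
regular = k * k * c_in * c_out            # standard conv: 294912 params
separable = k * k * c_in + c_in * c_out   # separable conv: 33920 params
ratio = regular / separable               # roughly 8.7x fewer parameters
```

This roughly 9x reduction per block is why the architecture can afford its depth at a modest parameter budget.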
François Chollet's Gist provides minimal Keras examples for 1D MSE linear regression using a single Dense layer. It extends to binary logistic regression with sigmoid activation and binary_crossentropy loss. A third variant incorporates L1/L2 regularization via l1l2 on the weight matrix.
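The single-Dense-layer regression can be mirrored in plain NumPy. The following is a hypothetical gradient-descent sketch of the same model (one weight, one bias, MSE loss), not the Gist's Keras code:

```python
import numpy as np

# Synthetic 1D data: y = 3x + 0.5 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.01, size=200)

# A Dense(1) layer is just w*x + b; train it by gradient descent on MSE
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = (w * X[:, 0] + b) - y
    w -= lr * 2.0 * np.mean(err * X[:, 0])  # dMSE/dw
    b -= lr * 2.0 * np.mean(err)            # dMSE/db
```

Adding a sigmoid on the output and switching the loss to cross-entropy turns the same few lines into the Gist's logistic-regression variant.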
Deep learning has advanced rapidly to near-human performance in tasks like speech/image recognition and Go, yet remains underexploited in everyday products and processes. Analogous to the Internet's eventual ubiquity, AI will permeate all industries, automating intellectual tasks, disrupting jobs, and enabling a prosperity era—but only if made accessible to non-experts. Keras lowers barriers by simplifying deep learning for users with basic CS literacy, fostering widespread value creation as demonstrated by startups like Comma.ai; early adopters must prioritize open tools, tutorials, and knowledge sharing to prevent elite capture and ensure positive outcomes.
François Chollet's Keras script demonstrates fine-tuning VGG16 on a small cats-vs-dogs dataset by freezing the first 25 layers, adding a custom binary classifier on top, and using heavy data augmentation with SGD at low learning rate. The approach leverages ImageNet pretraining for convolutional base while training only top layers on 2000 training and 800 validation images of 150x150 pixels over 50 epochs. Key hyperparameters include batch size 16, momentum 0.9, and augmentation via shear, zoom, and flips to combat overfitting.
François Chollet's Keras script demonstrates transfer learning by extracting VGG16 bottleneck features from 2000 training and 800 validation images (1000/400 cats and dogs each), saving them as NumPy arrays, then training a simple top classifier (Flatten-Dense256-Dropout-Dense1 sigmoid) with RMSprop and binary crossentropy for 50 epochs. Common issues include using 'wb' mode for np.save/np.load to avoid UnicodeDecodeError, understanding bottleneck_features as (N, 4, 4, 512) feature maps rather than probabilities, and adapting for multi-class via softmax/categorical_crossentropy. Prediction code uses VGG16 for new image features fed to the top model or full fine-tuned model generators with argmax on class indices.
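The 'wb'/'rb' fix mentioned above can be shown in isolation, using a zero-filled array of the bottleneck shape as a stand-in for real VGG16 features (the filename is illustrative):

```python
import numpy as np
import os
import tempfile

# Open files in *binary* mode before handing them to np.save / np.load;
# text mode raises decode errors on Python 3.
features = np.zeros((8, 4, 4, 512), dtype=np.float32)  # bottleneck-shaped
path = os.path.join(tempfile.mkdtemp(), "bottleneck_features_train.npy")
with open(path, "wb") as f:
    np.save(f, features)
with open(path, "rb") as f:
    restored = np.load(f)
```

Note the (N, 4, 4, 512) shape: these are convolutional feature maps, not class probabilities, which is why a Flatten layer precedes the small top classifier.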
François Chollet's Keras script demonstrates building a CNN for binary image classification (cats vs. dogs) using only 2000 training images (1000 per class) and 800 validation images. Key technique is heavy data augmentation during training (shear, zoom, flips) with a simple 3-layer Conv2D architecture trained for 50 epochs on 150x150 RGB images. Model compiles with binary crossentropy and RMSprop, saving weights post-training, enabling strong generalization on limited data.
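In the spirit of the augmentation step, here is a toy NumPy sketch of a random horizontal flip plus brightness jitter; this is an illustration of the idea, not the actual Keras ImageDataGenerator pipeline (which also does shear and zoom):

```python
import numpy as np

def augment(img, rng):
    """Toy augmentation for an HxWx3 float image in [0, 1] (hypothetical)."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # random horizontal flip
    # small multiplicative brightness jitter, clipped back into range
    img = np.clip(img * rng.uniform(0.9, 1.1), 0.0, 1.0)
    return img
```

Each epoch then sees slightly different versions of the same 2000 images, which is what lets such a small dataset train a CNN without severe overfitting.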
François Chollet proposes a functional Keras API where layers are callable on input tensors, enabling concise graph model construction via tensor chaining and topology tracking. Key features include shared layers via reuse, Lambda for arbitrary ops, merge functions, and backward-compatible Model compilation/training with flexible input/output dicts. Discussions resolve masking via node propagation, layer querying for weight transfer, and Sequential integration by making models callable.
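The core idea, layers callable on tensors while recording graph topology, can be sketched in a few lines of plain Python; this is a toy model of the proposal, not the real Keras implementation:

```python
class Tensor:
    """Symbolic tensor that remembers which layer produced it."""
    def __init__(self, producer=None, inputs=()):
        self.producer = producer  # layer that created this tensor, if any
        self.inputs = inputs      # upstream tensors

class Dense:
    """Toy layer: calling it on a tensor extends the graph."""
    def __init__(self, name):
        self.name = name
    def __call__(self, x):
        return Tensor(producer=self, inputs=(x,))

def topology(output):
    # walk the graph back from the output, listing layers in call order
    order = []
    def visit(t):
        for upstream in t.inputs:
            visit(upstream)
        if t.producer is not None:
            order.append(t.producer.name)
    visit(output)
    return order

inp = Tensor()                # placeholder input tensor
out = Dense("d2")(Dense("d1")(inp))
```

Because each tensor records its producer and inputs, a Model built from `(inp, out)` can recover the full computation graph, which is what makes compilation, training, and layer sharing possible in the functional API.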
François Chollet demonstrates defining a Theano function to compute and output activations from intermediate layers in a Keras Sequential model. The approach uses model.layers to access layer inputs and outputs, creating a function like theano.function([model.layers[0].input], model.layers[1].get_output(train=False)). This pushes input batches through only the layers of interest rather than the whole network, useful for visualization and analysis in Theano-backed Keras.
Scientific and technological progress, despite exponential increases in resources like researchers and computing power, generally proceeds at a linear rate. This is because the difficulty of making impactful discoveries within a given field increases exponentially over time, effectively canceling out the benefits of increased resources. Therefore, the notion of an "intelligence explosion" or technological "Singularity" driven by exponential progress is fundamentally flawed; even a self-improving AI would face this linearity constraint without exponentially increasing resources.
Current web platforms prioritize information flow and commercial interests, leading to "collective stupidity" and a focus on low-quality, attention-grabbing content. There is a critical need to redesign these platforms to incorporate psychological aspects of content creation and consumption, fostering higher quality content, genuine creativity, and collective intelligence. This involves shifting from content-neutral models to those that actively shape and improve the quality of user-generated content by focusing on project-driven engagement, motivational feedback, curated inspiration, and accessible learning.
The internet, as currently structured, primarily fosters "collective stupidity" rather than "collective intelligence." This is due to an infrastructure that prioritizes attention-grabbing, "fun" content, epitomized by "piano-playing-cat" videos, over meaningful, productive interactions. This paradigm, driven by view-count-based popularity, leads to a significant waste of human time and potential, as evidenced by the billions of hours spent on platforms like Facebook with little to no genuine return for users.