
François Chollet

Chronological feed of everything captured from François Chollet.

Emerging AI Inference Accelerators: A Landscape of Specialization

The AI inference market at data center scale is attracting new entrants focusing on specialized architectures to address the demanding performance, cost, and power efficiency requirements of large language models. These companies are developing purpose-built silicon, ranging from highly flexible, reconfigurable arrays to ultra-specialized, model-specific chips, each making distinct trade-offs in performance, flexibility, and cost. This market is characterized by a drive towards optimizing for specific AI workloads, often at the expense of generality, to achieve significant gains over general-purpose GPUs.

AI Collapses Coordination Costs to One Person, Enabling the Solo-Run Conglomerate

Drawing on Coase's theory of the firm, this piece argues that AI is the first technology capable of collapsing coordination costs to the level of a single individual — fundamentally redefining the minimum viable size of an organization. Where prior technological waves either scaled hierarchies up (steam, telegraph, railroad) or shrank them via markets (internet, gig economy), AI agents can now plan, execute, and manage entire business portfolios autonomously, making the "one-person conglomerate" structurally viable. The author uses this thesis to introduce HIM (Henry Intelligent Machines PBC), a platform designed to assemble and operate fleets of AI-run microbusinesses on behalf of individual owners. Notably, the author discloses a financial interest in HIM, which warrants scrutiny of the framing.

Scaling Agentic Coding Requires Organizational Infrastructure, Not Just Better Models

As software teams scale to hundreds of coding agents, the bottleneck shifts from model capability to organizational readiness — specifically, deterministic quality infrastructure (type checkers, linters, automated QA) and spec-driven development practices. Trust, not technical capability, is the primary barrier to enterprise adoption: UI/UX transparency changes have measurably increased the autonomy granted to agent systems. The product manager role is being "unbundled" into engineering, product marketing, and domain-specialist ops — with technically inclined PMs best positioned to absorb the change. Autonomous agent deployments are already running in production at systemically important enterprises, with the frontier being how to institutionalize governance and guardrails at scale.

AI Compute Economy Matures: Token Scarcity, Meta's Strategic Dilemma, and the Rise of Financial Infrastructure for GPU Markets

The AI industry is undergoing a structural shift from chip-centric thinking to token-factory economics, where the bottleneck is no longer raw compute but memory bandwidth, interconnect speed, and capital allocation efficiency. Meta faces a strategic misalignment: its consumer-focused product surface (Facebook, Instagram, WhatsApp) doesn't benefit from coding-optimized models, the primary driver of the recursive self-improvement loop powering Anthropic and OpenAI's compounding advantage. Meanwhile, GPU market opacity—where bespoke, multi-broker deals dominate—is driving the emergence of financial infrastructure like compute futures and price indices (now on Bloomberg), signaling the commoditization of AI infrastructure. Open-weight models like Gemma 4 (31B) and Qwen are rapidly closing the performance gap with frontier hosted models, accelerating a hybrid architecture where edge handles consumer workloads and frontier models serve high-complexity enterprise tasks.

Symbolic Descent: The Case for Replacing Deep Learning's Parametric Foundation with Minimal Symbolic Models

François Chollet (Keras creator) is pursuing a fundamentally different ML paradigm at his new lab Ndea: replacing parametric deep learning models with the smallest possible symbolic models, optimized via "symbolic descent" — an analog of gradient descent in symbolic space. The core theoretical motivation is the minimum description length principle: the shortest model that explains data is most likely to generalize, and parametric learning is structurally incapable of finding it. Chollet distinguishes between "AGI as automation" (the industry's current trajectory) and true general intelligence (human-level sample efficiency across arbitrary tasks), arguing the LLM stack may achieve the former but not the latter without a foundational rethink.
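The minimum description length intuition can be made concrete with a toy sketch (not Chollet's actual method), using stdlib `zlib` compression as a crude proxy for description length: a compact symbolic rule plus its residuals describes the data in far fewer bytes than memorizing the data outright. The rule, data, and `description_length` helper below are all invented for illustration.

```python
import zlib

# Data produced by a hidden generative rule.
data = [3 * x + 1 for x in range(50)]

def description_length(model_source: str, residuals: list) -> int:
    """MDL score: bytes to describe the model plus bytes to describe
    whatever the model fails to explain (here, raw residuals)."""
    blob = model_source.encode() + repr(residuals).encode()
    return len(zlib.compress(blob))

# Candidate 1: a compact symbolic rule that explains the data exactly.
rule_src = "lambda x: 3 * x + 1"
rule = eval(rule_src)
rule_residuals = [y - rule(x) for x, y in enumerate(data)]  # all zeros

# Candidate 2: pure memorization; the "model" is the data itself.
memo_src = repr(data)
memo_residuals = []

mdl_rule = description_length(rule_src, rule_residuals)
mdl_memo = description_length(memo_src, memo_residuals)
print(mdl_rule, mdl_memo)  # the symbolic rule wins by a wide margin
```

Under MDL, the shorter total description is preferred, which is exactly why the compact rule, not the lookup table, is expected to generalize.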

François Chollet's Case Against Scaling: Why AGI Requires Skill Acquisition, Not Task Automation

François Chollet argues that the AI industry's scaling paradigm — more data, compute, and parameters — is fundamentally misaligned with true AGI, which he defines as efficiency of skill acquisition rather than task performance. His ARC benchmark exposed that recent model improvements stem from brute-force problem-space mining (self-generated training loops), not genuine generalization. Chollet's alternative is program synthesis: searching for the shortest symbolic rule that explains data, mirroring the scientific method. His most provocative claim is that true AGI may ultimately be a compact program under 10,000 lines of code — achievable in principle with 1980s hardware, given the right idea.

Rethinking AGI Development: Beyond Deep Learning Limitations

François Chollet, co-founder of the ARC Prize, advocates for a paradigm shift in AI research, moving beyond the current deep learning and LLM-centric approaches. He proposes "symbolic learning" or "program synthesis" as a more optimal path to Artificial General Intelligence (AGI), emphasizing efficiency, generalization, and human-level data efficiency. Chollet argues that while current LLM advancements are impressive for domains with verifiable rewards, true AGI requires a more fundamental, self-improving algorithmic approach that minimizes human intervention and aims for foundational optimality rather than architectural scaling.

ARC Prize Foundation Seeks Benchmark Lead for AGI Development

The ARC Prize Foundation is actively recruiting a senior platform engineer to lead the development of their ARC-AGI benchmark platform. This role is critical for advancing the definition and measurement of progress toward Artificial General Intelligence (AGI) by expanding existing benchmarks and establishing new ones. The position requires a strong background in backend engineering, distributed systems, cloud infrastructure, and experience in building evaluation platforms, preferably within AI/ML.

Critique of Deep Learning Research Myopia

Deep Learning (DL) researchers often lack exposure to and understanding of alternative machine learning paradigms beyond gradient descent-based parameter fitting. This narrow focus can limit innovation and the exploration of more effective or efficient learning methods. The observation suggests a potential knowledge gap within the DL community regarding the broader field of machine learning.

Symbolic Learning for Generative Program Reverse Engineering

Symbolic learning offers a method to losslessly reverse-engineer the source code of generative programs, contrasting with curve-fitting's lossy approximation of outputs. This approach is significantly more effective when the underlying generative program is simple, potentially outperforming other methods by orders of magnitude in such scenarios.
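A minimal sketch of the contrast, assuming a hypothetical three-primitive DSL (the `PRIMITIVES`, `run`, and `synthesize` names are invented for illustration): shortest-first enumeration returns a program that reproduces every example exactly, a lossless recovery of the generative rule rather than a lossy fit to its outputs.

```python
from itertools import product

# Tiny illustrative DSL of primitive string transformations.
PRIMITIVES = {
    "reverse":   lambda s: s[::-1],
    "upper":     lambda s: s.upper(),
    "drop_last": lambda s: s[:-1],
}

def run(program, s):
    """Apply a sequence of primitives left to right."""
    for name in program:
        s = PRIMITIVES[name](s)
    return s

def synthesize(examples, max_depth=3):
    """Shortest-first enumeration: return the first program that
    reproduces every example exactly (lossless), or None."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

examples = [("hello", "OLLEH"), ("cab", "BAC")]
print(synthesize(examples))  # -> ('reverse', 'upper')
```

Because programs are enumerated shortest-first, the result is also the simplest hypothesis in the DSL, which is where the orders-of-magnitude advantage over curve fitting shows up when the true generative program is simple.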

Emergent Reasoning in Advanced Language Models

The emergence of reasoning capabilities in recent Large Reasoning Models (LRMs) was unanticipated by observers who previously asserted that 2023-2024 base Large Language Models (LLMs) already possessed full reasoning. This oversight stemmed from not knowing which characteristics distinguish genuine advanced reasoning. Current LRMs are hypothesized to outperform earlier LLMs on complex math problems, indicating a significant advancement in fluid intelligence.

LLMs Lack Fluid Intelligence, While LRMs Show Promise in Reasoning

Base Large Language Models (LLMs) from 2023-2024 demonstrably lack fluid intelligence and mathematical reasoning capabilities, a fact now widely accepted despite initial controversy. This limitation contrasts sharply with emerging Large Reasoning Models (LRMs), which are hypothesized to perform significantly better on complex reasoning tasks. The inability of early LLM proponents to recognize this deficiency highlights a potential blind spot in evaluating AI capabilities when expectations are misaligned with empirical evidence.

Divergent Paths to AGI: Symbolic Learning vs. Parametric Scaling and the Shift to Scientific Utility

The discourse between Sam Altman and François Chollet reveals a fundamental divergence in AGI methodology: OpenAI continues to scale existing paradigms toward aligned AI researchers, while Chollet advocates for a foundation shift toward symbolic learning to achieve optimal generalization. While benchmarks like ARC-AGI-3 provide rigorous tests for fluid intelligence, OpenAI is increasingly prioritizing 'real-world' value—such as scientific discovery—over general-purpose generative benchmarks. This shift is accompanied by a strategic reallocation of compute toward high-impact domains like medicine and economics.

Keras with JAX Recommended for Optimal AI Development

François Chollet advocates for the use of Keras with JAX, implying this combination is crucial for success in AI development. The statement suggests that alternative approaches may lead to suboptimal outcomes, highlighting Keras/JAX as a preferred, high-performance pathway.

Spurious Correlations in Time Series Visualization

Visualizing two independent, autocorrelated random time series as a scatter plot can misleadingly suggest structure or correlation. This occurs because highly autocorrelated series, even if random and independent, produce a trajectory that appears structured in a scatter plot. This method is an inadequate way to assess relationships in such data, as it can be easily misinterpreted as genuine correlation when none exists, highlighting the need for more robust statistical analysis methods.
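The effect is easy to reproduce in a stdlib-only simulation (the `sample`, `pearson`, and `spurious_rate` helpers below are invented for this sketch): independent random walks routinely show large sample correlations that independent white noise essentially never does.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def sample(n, walk, rng):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    if walk:  # running sum -> a highly autocorrelated random walk
        for i in range(1, n):
            xs[i] += xs[i - 1]
    return xs

def spurious_rate(walk, trials=200, n=300, seed=0):
    """Fraction of independent series pairs whose |r| exceeds 0.5."""
    rng = random.Random(seed)
    hits = sum(
        abs(pearson(sample(n, walk, rng), sample(n, walk, rng))) > 0.5
        for _ in range(trials)
    )
    return hits / trials

walk_rate = spurious_rate(walk=True)
noise_rate = spurious_rate(walk=False)
print(f"|r| > 0.5, independent random walks: {walk_rate:.0%}")
print(f"|r| > 0.5, independent white noise:  {noise_rate:.0%}")
```

The series are independent by construction, so every large correlation among the walks is spurious, which is why scatter plots of autocorrelated series need tests designed for that setting rather than eyeballing.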

Limits of Curve Fitting in Complex Systems

The ability to "fit a curve" is often associated with understanding and prediction in scientific and engineering domains. However, this analogy breaks down when applied to highly complex systems, particularly those that exhibit emergent properties or non-linear behaviors that cannot be adequately captured by traditional curve-fitting methods. This suggests a fundamental limitation in applying reductionist approaches to phenomena beyond a certain threshold of complexity.

Symbolic Compression and Extreme Generalization in Scientific Discovery

Scientific advancements, exemplified by the development of the atom bomb from the discovery of radioactivity, demonstrate extreme generalization achieved through symbolic compression. A limited number of deliberately collected data points (key experiments) are translated into concise symbolic models, enabling the reverse-engineering of causal rules to reshape reality. This process highlights an efficient pathway for scientific progress, distinct from merely fitting curves to existing data.

Fine-tuning Gemma on TPU v5 with Kinetic, Keras, and JAX

This tutorial details the fine-tuning of the Gemma model on TPU v5 hardware. It highlights a toolchain consisting of Kinetic, Keras, and JAX, presented as an optimized stack for leveraging TPUs at scale. The associated script further elaborates on setups, technical specifics, and practical considerations of using Kinetic.

JAX: Exemplar of Efficient ML Framework Design

JAX exemplifies a well-designed low-level machine learning framework: its design makes superior performance achievable with reduced development effort, whereas poorly designed frameworks hinder performance and increase effort.

Keras Kinetic: Simplified Cloud TPU/GPU Execution for Machine Learning

Keras Kinetic introduces a streamlined approach to remote execution of machine learning workloads on cloud TPUs and GPUs. It automates containerization, dependency management, and deployment to GKE clusters, simplifying the transition from local development to scalable cloud execution. This allows developers to run functions on powerful accelerators with minimal configuration overhead.

Keras Kinetic and Cloud TPUs for LLM Fine-Tuning

A recent tutorial demonstrates fine-tuning large language models (LLMs) using Keras Kinetic, an extension for Keras facilitating model training with JAX and Cloud TPUs. This approach is exemplified by fine-tuning the Gemma 2B model on the PubMedQA dataset, indicating potential for efficient medical question-answering system development.

Adobe Podcast Recognized as a Leading AI Product

François Chollet, a prominent AI researcher, identified Adobe Podcast as a top-tier AI product. This endorsement highlights the effective application of AI within the audio editing domain, suggesting that the product demonstrably leverages AI to deliver a superior user experience or functionality.

Established Companies to Benefit Most from AI Integration

AI integration presents a significant opportunity for established companies with existing profitable business models. By leveraging AI to enhance current offerings and develop new, AI-first products, these companies can solidify their market position and drive further growth. This strategy is exemplified by products like Adobe Podcast, which demonstrates the potential for AI to both improve and innovate within an established company.

PokeeClaw: Transitioning Local AI Assistants to Enterprise-Grade Production

While OpenClaw demonstrated the product-market fit for local AI assistants, its lack of security architecture limited production deployment. PokeeClaw addresses these vulnerabilities by implementing a sandboxed environment featuring RBAC, approval workflows, and audit trails to enable enterprise-safe agentic workflows.

Humanity’s Chess Mastery Accelerated by Cognitive Infrastructure

Human intelligence, amplified by externalized cognitive infrastructure like computers and the internet, can rapidly achieve expert-level performance in complex, rule-based systems. A thought experiment in which humanity learns the rules of a novel chess-like game ("Glurg") from scratch posits that a 3000 Elo engine could be developed within 24 hours, and a 3500 Elo engine with significantly improved efficiency within three weeks. This suggests human intelligence is near-optimal in its ability to quickly master rule-based domains.

Rethinking Intelligence as an Optimality-Bound Conversion Ratio

This content redefines intelligence not as an unbounded scalar but as a conversion ratio with an optimality bound, akin to making a ball rounder rather than a tower taller. It posits that while individual humans may not be optimally intelligent, a collective of intelligent humans augmented by external tools approaches this bound. The author argues that humanity’s ability to solve problems is near-optimal given available information, with current AI amplifying this collective intelligence.

Cognitive Agency: The Future Class Divide in an AGI World

The advent of Artificial General Intelligence (AGI) is projected to redefine societal stratification, shifting the basis of class division from material wealth to cognitive agency. This future societal structure will delineate between individuals who maintain control over their attention and actions (the "focus class") and those whose reward mechanisms are entirely managed by AI systems (the "slop class"). This division implies a fundamental change in how individuals interact with and are influenced by advanced AI.

Beyond Deep Learning: Building Optimal AI with Symbolic Program Synthesis

François Chollet, creator of Keras and the ARC-AGI benchmark, discusses Ndea, a new AI research lab focused on symbolic program synthesis as an alternative to deep learning. Ndea aims to build AI that requires less data, runs more efficiently, and generalizes better by replacing parametric curves with concise symbolic models, addressing the limitations of current LLM-based approaches. The approach is driven by the belief that current deep learning methods, while effective for verifiable domains, are inefficient and will not lead to true AGI.

Removing Task-Specific Prompts for General AI

To generalize an AI system beyond a specific task (ARC-AGI-3), it is necessary to remove all components engineered or configured based on test runs on those specific tasks. This primarily includes prompts detailing the process to solve the games.

ARC-AGI Benchmark Roadmap and Design Philosophy

François Chollet announced that ARC-AGI-4 is slated for an early 2027 release, initiating an annual benchmark release cycle. Each new benchmark aims to be "fully unsaturated upon release" and address "the most important unanswered research questions." This development strategy necessitates anticipating future AI capabilities during the benchmark design phase, echoing the approach taken for ARC-AGI-3.

Redefining AGI: Beyond Benchmarks, Towards Human-Level Learning Efficiency

François Chollet clarifies his long-standing definition of Artificial General Intelligence (AGI), emphasizing learning efficiency over task-specific performance benchmarks. He posits that AGI should autonomously master any human-learnable task with equivalent learning efficiency, diverging from current AI development that often targets specific capabilities. This reorientation shifts the focus from achieving a pre-defined "target" to developing a "compass" for continuous, human-like learning.

ARC-AGI-3 Benchmarks Agentic AI on Novel Interactive Reasoning Tasks

The ARC-AGI-3 benchmark evaluates AI agentic intelligence through interactive reasoning environments that require human-level action efficiency on novel tasks without prior training. This benchmark highlights a significant gap between current frontier AI models, which perform under 1%, and human ability, as humans can solve all tasks upon first contact. The competition offers public environments for testing and private test sets for evaluation, aiming to drive advancements in general artificial intelligence.

ARC-3: A Benchmark for Micro-AGI

ARC-3 emphasizes interactive learning, goal discovery, and temporal planning in novel environments. It aims to measure efficient skill acquisition, a defining characteristic of general intelligence, by scaling up these capabilities within a "micro-AGI" framework, rather than focusing on perception or data-driven approaches like LLMs.

Companion Notebooks for Deep Learning with Python (Third Edition) Facilitate Practical Application

This GitHub repository offers Jupyter notebooks complementing the "Deep Learning with Python, third edition" by Chollet and Watson. It provides runnable code samples for practical application of theoretical concepts. The notebooks are designed for use with Google Colab, leveraging its free GPU runtime, and support Keras 3 with JAX, TensorFlow, or PyTorch backends. Users should refer to the companion book for comprehensive understanding, as the notebooks intentionally omit explanatory text and figures.

François Chollet: LLMs Are Memorization Engines, Not Intelligence — And AGI Is Still Decades Away

François Chollet, creator of Keras, argues that LLMs are fundamentally pattern-memorization systems — "databases of vector programs" — that can only operate within their training data distribution, making them categorically distinct from general intelligence. He defines intelligence as the efficiency with which an agent acquires new skills in novel, unprepared-for situations (operationalized via his ARC benchmark), and contends that LLMs score near zero on this metric. Chollet traces the failure mode to the architecture itself: transformers excel at passive, Hebbian-style associative learning but lack the active, causal, experimental learning that characterizes human cognition. While LLMs are practically valuable for automating tasks within known distributions, existential risk narratives are unfounded — the real bottleneck to AGI is unsolved program synthesis and few-shot generalization, not scaling.

Chollet's Case Against Scaling: Why Fluid Intelligence Requires Program Search, Not Bigger Models

François Chollet argues that the pre-training scaling paradigm fundamentally cannot produce general fluid intelligence because LLMs only acquire static, memorized skills — not the ability to synthesize novel solutions on the fly. Test-time adaptation (TTA) is a meaningful step forward, but remains compute-inefficient and lacks compositional generalization. True AGI, in Chollet's framing, requires combining two forms of abstraction: value-centric (continuous, perception/intuition via deep learning) and program-centric (discrete, reasoning via combinatorial search), and his new lab Ndea is building a deep learning-guided program search system targeting exactly this hybrid architecture.

Namex: Streamlining Python Package API Management

Namex is a Python utility designed to strictly separate a package's implementation from its public API. It enables developers to define an explicit allowlist of public symbols, offering precise control over visibility, naming, and exposure paths. This facilitates easier refactoring, prevents accidental exposure of private utilities, and simplifies API version control.
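The allowlist idea can be illustrated with a short stdlib-only sketch. Note this shows the general pattern, not Namex's actual interface; `build_public_api` and the stand-in module are invented for illustration.

```python
import types

def build_public_api(impl: types.ModuleType, allowlist: dict) -> types.ModuleType:
    """Expose only allowlisted symbols, optionally under new public names.
    Keys are private names in the implementation; values are public names."""
    public = types.ModuleType(impl.__name__ + ".api")
    for private_name, public_name in allowlist.items():
        setattr(public, public_name, getattr(impl, private_name))
    public.__all__ = sorted(allowlist.values())
    return public

# A stand-in "implementation module" with mixed public/private symbols.
impl = types.ModuleType("mylib._impl")
impl.stable_sort = sorted
impl._scratch_helper = lambda: None  # must not leak into the public API

api = build_public_api(impl, {"stable_sort": "sort"})
print(api.__all__)  # -> ['sort']
```

Because the public surface is rebuilt from an explicit mapping, internals can be renamed or moved freely without breaking downstream code, which is the refactoring benefit the summary describes.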

ARC-AGI-1: A Grid-Based Benchmark Designed to Test Human-Like Fluid Intelligence in AI Systems

ARC-AGI-1 (Abstraction and Reasoning Corpus) is a benchmark created by François Chollet to evaluate general fluid intelligence in both humans and AI systems, framed simultaneously as an AGI benchmark, a program synthesis benchmark, and a psychometric test. Tasks consist of input/output grid pairs (integers 0–9, up to 30×30) where a solver must infer a transformation rule from ~3 demonstrations and apply it to new inputs — with only 3 trials per test input and requiring exact cell-level correctness. The dataset is split into 400 training and 400 evaluation tasks in JSON format, with strict instructions against using evaluation data during development to preserve benchmark integrity. A v2 of the benchmark (ARC-AGI-2) has since been released in a separate repository.
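The task layout described above can be sketched with a toy example (the grids and the rotation rule here are invented, not taken from the real dataset): parse the JSON structure, validate the grid constraints, infer a rule from the demonstrations, and apply it to the test input with exact cell-level output.

```python
import json

# A minimal ARC-style task in the JSON layout described above:
# "train" holds demonstration pairs, "test" holds inputs to solve.
task = json.loads("""
{
  "train": [
    {"input": [[1, 0], [0, 0]], "output": [[0, 0], [0, 1]]},
    {"input": [[0, 2], [0, 0]], "output": [[0, 0], [2, 0]]}
  ],
  "test": [{"input": [[0, 0], [3, 0]]}]
}
""")

def valid_grid(g):
    """Grids are rectangular, at most 30x30, with cell values 0-9."""
    return (1 <= len(g) <= 30
            and all(len(row) == len(g[0]) and 1 <= len(row) <= 30 for row in g)
            and all(0 <= v <= 9 for row in g for v in row))

assert all(valid_grid(p[k]) for p in task["train"] for k in ("input", "output"))

# Hypothesize a transformation rule from the demonstrations: 180-degree rotation.
rot180 = lambda g: [row[::-1] for row in g[::-1]]
assert all(rot180(p["input"]) == p["output"] for p in task["train"])

print(rot180(task["test"][0]["input"]))  # -> [[0, 3], [0, 0]]
```

The exact-match requirement is what makes the benchmark unforgiving: a prediction that differs in a single cell scores zero for that trial.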

ARC-AGI 2: A New Benchmark for Fluid Intelligence in AI

The ARC-AGI-2 benchmark and ARC Prize 2025 challenge are launched, emphasizing the need for AI to demonstrate fluid intelligence to achieve AGI. This new benchmark is significantly more challenging than its predecessor, with even frontier models showing single-digit performance. The core tenet is that true intelligence involves efficient acquisition and deployment of capabilities, not just raw computational power or memorization. It aims to identify systems capable of adapting to novelty by recombining existing knowledge, a crucial step towards human-level AGI that can generate new knowledge.

Challenging Deep Learning for Program Synthesis and Strong Generalization

Deep learning, while adept at pattern matching in continuous spaces, struggles with discrete symbolic program synthesis due to gradient descent's inability to effectively optimize for such tasks. This limitation necessitates alternative approaches for achieving robust generalization, particularly in scenarios requiring compositional novelty and on-the-fly adaptation. The discussion emphasizes the need for better learning mechanisms and representations, moving beyond purely data-driven methods, with benchmarks like ARC highlighting challenges in strong generalization.

Rethinking AI Benchmarking: The Shift to Generalizable, Architecturally-Agnostic Intelligence

François Chollet discusses the evolution and limitations of current AI benchmarks, particularly ARC, highlighting the need for tasks that evaluate true generalization and adaptability to novelty rather than brute-force computation. He emphasizes the integration of intuition and reasoning in AI architectures and introduces ARC 2.0 as a response to these challenges, designed to foster the co-evolution of problems and solutions.

LLMs Are Stuck at System 1: Why Program Synthesis + Deep Learning Is the Path to AGI

François Chollet argues that LLMs are fundamentally limited to "value-centric" (System 1) abstraction — pattern interpolation over continuous embedding spaces — and are categorically incapable of "program-centric" (System 2) abstraction required for true generalization. Despite five years of scaling, core failure modes (sensitivity to rephrasing, inability to generalize algorithms beyond memorized instances, compositional breakdown) remain unresolved because they are architectural, not superficial. Chollet's ARC-AGI benchmark — designed to be memorization-resistant — exposes this gap starkly: state-of-the-art LLMs score 5–21% while humans score 97–98%, and brute-force program search already achieves ~50%. His thesis is that the path forward requires merging discrete program synthesis with deep learning, using neural networks as perception/intuition layers to tame combinatorial explosion in program search space.

Large Language Models: Interpolation Not Intelligence

Large Language Models (LLMs) excel at generating human-like text by interpolating vast datasets, mimicking human behavior rather than truly understanding or reasoning. This interpolation capability, rooted in curve fitting, lets them perform well on tasks requiring extensive memorization and pattern recognition, such as passing standardized tests. However, they lack true intelligence, which is defined as the ability to adapt to novel situations and synthesize new solutions, and are therefore limited in tasks requiring genuine creativity, novel problem-solving, and abstract reasoning.

Challenging the LLM Hype: The ARC Prize for True AI Generalization

François Chollet and Mike Knoop launched the ARC Prize to incentivize research into true AI generalization, as current LLMs primarily rely on memorization and lack the ability to adapt to novel situations. Chollet argues that while LLMs excel at specific tasks through pattern matching, they fall short on the ARC benchmark, which requires on-the-fly program synthesis and core knowledge, a capability present in young children. The prize aims to push the AI community beyond current scaling approaches and toward hybrid systems that combine deep learning with discrete program search for real-world adaptability.

Active Inference is Key to LLM Performance on Novel Tasks

LLMs struggle with novel tasks, even with extensive pre-training on synthetic data. The key to unlocking performance is active inference, which involves fine-tuning the LLM on a small set of demonstration examples and then artificially expanding these examples using a Domain Specific Language (DSL) to increase data diversity. This approach enables the LLM to learn and adapt to new tasks, mimicking human learning processes.
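The expansion step can be sketched with a toy stdlib-only example (the `TRANSFORMS` table and `augment` helper are invented for illustration; the actual approach feeds the expanded pairs into LLM fine-tuning): applying the same equivalence-preserving transform to both input and output of a demonstration yields a new, equally valid demonstration.

```python
import itertools

# One demonstration pair (input grid -> output grid).
demo = ([[1, 2], [0, 0]], [[2, 1], [0, 0]])

# A tiny "DSL" of equivalence-preserving grid transforms: applying the
# same transform to input and output yields a new, equally valid pair.
TRANSFORMS = {
    "identity": lambda g: g,
    "rot180":   lambda g: [row[::-1] for row in g[::-1]],
    "flip_h":   lambda g: [row[::-1] for row in g],
    "flip_v":   lambda g: g[::-1],
}

def augment(pairs):
    """Expand demonstrations by every transform, de-duplicated."""
    out, seen = [], set()
    for (x, y), f in itertools.product(pairs, TRANSFORMS.values()):
        pair = (f(x), f(y))
        key = repr(pair)
        if key not in seen:
            seen.add(key)
            out.append(pair)
    return out

augmented = augment([demo])
print(len(augmented))  # -> 4: fourfold more training data from one example
```

Richer DSLs (color permutations, translations, croppings) expand a handful of demonstrations into a dataset large enough for the fine-tuning step the summary describes.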

Keras Ecosystem: Resources for Deep Learning Development

This GitHub repository compiles a comprehensive list of resources for Keras, a Python deep learning library. Categorized for easy navigation, it includes tutorials, official documentation, code examples for various applications (text, image, creative visuals, reinforcement learning), and outlines third-party libraries and projects built with Keras. The intent is to provide a centralized hub for Keras users to learn, implement, and extend deep learning solutions.
