Chronological feed of everything captured from François Chollet.
youtube / fchollet / 1d ago
The AI inference market at data center scale is attracting new entrants focusing on specialized architectures to address the demanding performance, cost, and power efficiency requirements of large language models. These companies are developing purpose-built silicon, ranging from highly flexible, reconfigurable arrays to ultra-specialized, model-specific chips, each making distinct trade-offs in performance, flexibility, and cost. This market is characterized by a drive towards optimizing for specific AI workloads, often at the expense of generality, to achieve significant gains over general-purpose GPUs.
llm-inference, ai-hardware, data-center, ai-accelerators, semiconductor-startups, hardware-architecture, chip-design
“Groq's LPU architecture prioritizes fast, predictable inference at scale through a single-core design with on-chip SRAM and compiler-scheduled execution, aiming for deterministic performance.”
youtube / fchollet / 1d ago
Drawing on Coase's theory of the firm, this piece argues that AI is the first technology capable of collapsing coordination costs to the level of a single individual — fundamentally redefining the minimum viable size of an organization. Where prior technological waves either scaled hierarchies up (steam, telegraph, railroad) or shrank them via markets (internet, gig economy), AI agents can now plan, execute, and manage entire business portfolios autonomously, making the "one-person conglomerate" structurally viable. The author uses this thesis to introduce HIM (Henry Intelligent Machines PBC), a platform designed to assemble and operate fleets of AI-run microbusinesses on behalf of individual owners. Notably, the author discloses a financial interest in HIM, which warrants scrutiny of the framing.
ai-agents, future-of-work, solopreneur, microbusiness, coordination-costs, ai-startups, automation
“AI is the first technology capable of collapsing coordination costs to the level of a single person, making the one-person conglomerate structurally viable.”
youtube / fchollet / 1d ago
As software teams scale to hundreds of coding agents, the bottleneck shifts from model capability to organizational readiness — specifically, deterministic quality infrastructure (type checkers, linters, automated QA) and spec-driven development practices. Trust, not technical capability, is the primary barrier to enterprise adoption: UI/UX transparency changes have measurably increased the autonomy granted to agent systems. The product manager role is being "unbundled" into engineering, product marketing, and domain-specialist ops — with technically-inclined PMs best positioned to absorb the change. Autonomous agent deployments are already running in production at systemically important enterprises, with the frontier being how to institutionalize governance and guardrails at scale.
agentic-coding, ai-software-development, multi-agent-systems, developer-tools, sdlc-evolution, product-management, ai-governance
“Agent-driven code quality at scale requires deterministic infrastructure (type checkers, linters, formatters, automated QA) rather than human code review, since human review does not scale to hundreds of agents.”
youtube / fchollet / 1d ago
The AI industry is undergoing a structural shift from chip-centric thinking to token-factory economics, where the bottleneck is no longer raw compute but memory bandwidth, interconnect speed, and capital allocation efficiency. Meta faces a strategic misalignment: its consumer-focused product surface (Facebook, Instagram, WhatsApp) doesn't benefit from coding-optimized models, the primary driver of the recursive self-improvement loop powering Anthropic and OpenAI's compounding advantage. Meanwhile, GPU market opacity—where bespoke, multi-broker deals dominate—is driving the emergence of financial infrastructure like compute futures and price indices (now on Bloomberg), signaling the commoditization of AI infrastructure. Open-weight models like Gemma 4 (31B) and Qwen are rapidly closing the performance gap with frontier hosted models, accelerating a hybrid architecture where edge handles consumer workloads and frontier models serve high-complexity enterprise tasks.
ai-models, open-source-llm, ai-infrastructure, compute-markets, new-media, agentic-ai, gpu-economics
“Google's Gemma 4 31B model, running on a 24GB Mac Mini, outperforms Anthropic's Claude Sonnet 4.6 on graduate-level reasoning benchmarks despite having only 4 billion active parameters.”
youtube / fchollet / 1d ago
François Chollet (Keras creator) is pursuing a fundamentally different ML paradigm at his new lab Ndea: replacing parametric deep learning models with the smallest possible symbolic models, optimized via "symbolic descent" — an analog of gradient descent in symbolic space. The core theoretical motivation is the minimum description length principle: the shortest model that explains data is most likely to generalize, and parametric learning is structurally incapable of finding it. Chollet distinguishes between "AGI as automation" (the industry's current trajectory) and true general intelligence (human-level sample efficiency across arbitrary tasks), arguing the LLM stack may achieve the former but not the latter without a foundational rethink.
agi-research, program-synthesis, deep-learning-alternatives, symbolic-ai, llm-limitations, machine-learning, francois-chollet
“Symbolic models trained via 'symbolic descent' will require significantly less data, run more efficiently at inference, and generalize better than parametric deep learning models.”
youtube / fchollet / 1d ago
François Chollet argues that the AI industry's scaling paradigm — more data, compute, and parameters — is fundamentally misaligned with true AGI, which he defines as efficiency of skill acquisition rather than task performance. His ARC benchmark exposed that recent model improvements stem from brute-force problem-space mining (self-generated training loops), not genuine generalization. Chollet's alternative is program synthesis: searching for the shortest symbolic rule that explains data, mirroring the scientific method. His most provocative claim is that true AGI may ultimately be a compact program under 10,000 lines of code — achievable in principle with 1980s hardware, given the right idea.
agi-research, ai-benchmarks, program-synthesis, deep-learning-critique, arc-benchmark, scaling-limits, francois-chollet
“The industry defines AGI as task automation, while Chollet defines it as skill acquisition efficiency — the ability to learn new tasks from minimal data, comparable to human sample efficiency.”
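The "shortest symbolic rule that explains data" idea summarized above can be sketched as a toy search. This is illustrative only, not Chollet's actual system: the program space (tiny linear expressions), the example pairs, and all function names here are invented, and the minimum-description-length preference is approximated by sorting candidate rules by the length of their textual description.

```python
from itertools import product

# Demonstration pairs, assumed generated by some unknown rule.
examples = [(1, 3), (2, 5), (4, 9)]  # hidden rule: y = 2*x + 1

def enumerate_programs(max_coeff=3):
    """Yield (description, function) pairs, shortest descriptions first."""
    programs = []
    for a, b in product(range(-max_coeff, max_coeff + 1), repeat=2):
        desc = f"y = {a}*x + {b}"
        programs.append((desc, lambda x, a=a, b=b: a * x + b))
    # Shorter descriptions first: an MDL-style preference for compact models.
    return sorted(programs, key=lambda p: len(p[0]))

def shortest_consistent_program(examples):
    """Return the shortest candidate rule that reproduces every example."""
    for desc, fn in enumerate_programs():
        if all(fn(x) == y for x, y in examples):
            return desc
    return None

print(shortest_consistent_program(examples))  # prints: y = 2*x + 1
```

The point of the sketch is the search order: among all rules consistent with the data, the shortest description is returned first, mirroring the claim that compactness is the proxy for generalization.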
youtube / fchollet / 1d ago / failed
youtube / fchollet / 2d ago
François Chollet, founder of the ARC Prize, advocates for a paradigm shift in AI research, moving beyond the current deep learning and LLM-centric approaches. He proposes "symbolic learning" or "program synthesis" as a more optimal path to Artificial General Intelligence (AGI), emphasizing efficiency, generalization, and human-level data efficiency. Chollet argues that while current LLM advancements are impressive for domains with verifiable rewards, true AGI requires a more fundamental, self-improving algorithmic approach that minimizes human intervention and aims for foundational optimality rather than architectural scaling.
ai-research, agi, deep-learning, program-synthesis, machine-learning-benchmarks, ai-ethics, career-development-ai
“Current LLM-based approaches, while effective in specific verifiable domains like code generation, are not optimal for achieving true AGI due to their reliance on extensive training data and limited generalization in non-verifiable domains.”
tweet / @fchollet / 3d ago
The ARC Prize Foundation is actively recruiting a senior platform engineer to lead the development of their ARC-AGI benchmark platform. This role is critical for advancing the definition and measurement of progress toward Artificial General Intelligence (AGI) by expanding existing benchmarks and establishing new ones. The position requires a strong background in backend engineering, distributed systems, cloud infrastructure, and experience in building evaluation platforms, preferably within AI/ML.
agi-benchmark, hiring, platform-engineering, ai-evaluation, distributed-systems, python
“The ARC Prize Foundation is hiring a senior platform engineer.”
tweet / @fchollet / 4d ago
Deep Learning (DL) researchers often lack exposure to and understanding of alternative machine learning paradigms beyond gradient descent-based parameter fitting. This narrow focus can limit innovation and the exploration of more effective or efficient learning methods. The observation suggests a potential knowledge gap within the DL community regarding the broader field of machine learning.
deep-learning, machine-learning-theory, ai-research-culture, gradient-descent
“Many deep learning researchers are unfamiliar with learning methods outside of gradient descent for parameter fitting.”
tweet / @fchollet / 4d ago
Symbolic learning offers a method to losslessly reverse-engineer the source code of generative programs, contrasting with curve-fitting's lossy approximation of outputs. This approach is significantly more effective when the underlying generative program is simple, potentially outperforming other methods by orders of magnitude in such scenarios.
symbolic-learning, machine-learning, ai-reasoning, generative-models, ai-research
“Symbolic learning losslessly reverse-engineers the source code of generative programs.”
tweet / @fchollet / 4d ago
The emergence of reasoning capabilities in recent Large Reasoning Models (LRMs) was unanticipated by observers who previously asserted that 2023-2024 base Large Language Models (LLMs) already possessed full reasoning. This oversight stemmed from not knowing which characteristics distinguish genuine advanced reasoning. Current LRMs are hypothesized to outperform earlier LLMs on complex math problems, indicating a significant advance in fluid intelligence.
lrms, llms, ai-reasoning, fluid-intelligence, llm-evaluation, ai-capabilities
“People who believed 2023-2024 base LLMs could already reason missed the emergence of LRMs' reasoning capabilities.”
tweet / @fchollet / 4d ago
Base Large Language Models (LLMs) from 2023-2024 demonstrably lack fluid intelligence and mathematical reasoning capabilities, a fact now widely accepted despite initial controversy. This limitation contrasts sharply with emerging Large Reasoning Models (LRMs), which are hypothesized to perform significantly better on complex reasoning tasks. The failure of early-LLM proponents to recognize this deficiency highlights a blind spot in evaluating AI capabilities when expectations diverge from empirical evidence.
llm-reasoning, ai-evaluation, fluid-intelligence, lrm-vs-llm, mathematical-reasoning
“Base LLMs from 2023-2024 are unable to perform mathematical reasoning or exhibit fluid intelligence.”
youtube / fchollet / 5d ago
The discourse between Sam Altman and François Chollet reveals a fundamental divergence in AGI methodology: OpenAI continues to scale existing paradigms toward automated AI researchers, while Chollet advocates a foundational shift toward symbolic learning to achieve optimal generalization. While benchmarks like ARC-AGI-3 provide rigorous tests of fluid intelligence, OpenAI is increasingly prioritizing 'real-world' value, such as scientific discovery, over general-purpose generative benchmarks. This shift is accompanied by a strategic reallocation of compute toward high-impact domains like medicine and economics.
agi-predictions, ai-benchmarking, ai-safety-ethics, future-of-humanity, llm-development, generative-ai, ai-research-directions
“The ARC-AGI-3 benchmark will likely take at least one year to saturate, regardless of frontier lab effort, due to its deliberate out-of-distribution design.”
tweet / @fchollet / 5d ago
François Chollet advocates for the use of Keras with JAX, implying this combination is crucial for success in AI development. The statement suggests that alternative approaches may lead to suboptimal outcomes, highlighting Keras/JAX as a preferred, high-performance pathway.
keras, jax, machine-learning-frameworks, deep-learning-libraries, neural-networks
“Using Keras with JAX is essential for success in AI development.”
tweet / @fchollet / 6d ago
Plotting two independent, highly autocorrelated random time series against each other in a scatter plot can misleadingly suggest structure or correlation: autocorrelation alone makes the trajectory look organized even though the series are unrelated. Scatter plots are therefore an unreliable way to assess relationships in such data, and more robust statistical methods are needed to avoid mistaking this artifact for genuine correlation.
data-visualization, time-series-analysis, statistical-fallacies, data-ethics, scientific-communication
“Plotting two random, independent, and highly autocorrelated time series on a scatter plot will always appear structured.”
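The spurious-correlation effect described in this entry is easy to reproduce numerically. The sketch below (an illustration of the statistical point, not code from the tweet) generates pairs of independent random walks, a simple highly autocorrelated process, and measures the average absolute Pearson correlation between them, which stays far from zero even though the true relationship is none.

```python
import numpy as np

rng = np.random.default_rng(0)

def spurious_corr(n_steps=500, n_trials=200):
    """Mean |Pearson r| between pairs of independent random walks."""
    rs = []
    for _ in range(n_trials):
        a = np.cumsum(rng.standard_normal(n_steps))  # random walk 1
        b = np.cumsum(rng.standard_normal(n_steps))  # independent walk 2
        rs.append(abs(np.corrcoef(a, b)[0, 1]))
    return float(np.mean(rs))

mean_abs_r = spurious_corr()
# Independent white noise of this length would give mean |r| near zero;
# independent random walks do not, which is why their scatter plot
# "looks structured" despite having no relationship at all.
print(f"mean |r| across trials: {mean_abs_r:.2f}")
```

A scatter plot of any single `(a, b)` pair from this loop would show the kind of apparent trajectory the tweet warns about.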
tweet / @fchollet / 6d ago
The ability to "fit a curve" is often associated with understanding and prediction in scientific and engineering domains. However, this analogy breaks down when applied to highly complex systems, particularly those that exhibit emergent properties or non-linear behaviors that cannot be adequately captured by traditional curve-fitting methods. This suggests a fundamental limitation in applying reductionist approaches to phenomena beyond a certain threshold of complexity.
francois-chollet, ai-capabilities, curve-fitting, physics-knowledge, llm-limitations
“Traditional curve fitting is insufficient for understanding complex systems.”
tweet / @fchollet / 6d ago
Scientific advancements, exemplified by the development of the atom bomb from the discovery of radioactivity, demonstrate extreme generalization achieved through symbolic compression. A limited number of deliberately collected data points (key experiments) are translated into concise symbolic models, enabling the reverse-engineering of causal rules to reshape reality. This process highlights an efficient pathway for scientific progress, distinct from merely fitting curves to existing data.
science-history, symbolic-ai, generalization, knowledge-compression, fchollet, epistemology, scientific-method
“Scientific advancement from the initial observation of radioactivity to a working atom bomb involved approximately nine distinct key experiments over 47 years.”
tweet / @fchollet / 6d ago
This tutorial details the fine-tuning of the Gemma model on TPU v5 hardware. It highlights a toolchain consisting of Kinetic, Keras, and JAX, presented as an optimized stack for leveraging TPUs at scale. The associated script further elaborates on setups, technical specifics, and practical considerations of using Kinetic.
gemma, tpu-training, keras, jax, fine-tuning, machine-learning-infrastructure
“Fine-tuning Gemma on TPU v5 is achievable using Kinetic, Keras, and JAX.”
tweet / @fchollet / 8d ago
JAX represents a well-designed low-level machine learning framework. Its design principles facilitate superior performance with reduced development effort. Conversely, poorly designed frameworks hinder performance and increase effort.
jax-framework, machine-learning, software-design, deep-learning-frameworks, performance-optimization
“JAX is a well-designed low-level machine learning framework.”
tweet / @fchollet / 8d ago
Keras Kinetic introduces a streamlined approach to remote execution of machine learning workloads on cloud TPUs and GPUs. It automates containerization, dependency management, and deployment to GKE clusters, simplifying the transition from local development to scalable cloud execution. This allows developers to run functions on powerful accelerators with minimal configuration overhead.
keras-kinetic, cloud-tpu, distributed-training, machine-learning-engineering, serverless, containerization
“Keras Kinetic simplifies cloud TPU/GPU job execution through a decorator-based interface.”
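This entry does not show Keras Kinetic's actual API, so the following is a hypothetical sketch of the decorator-based pattern it describes: a decorator marks a local function for remote execution and carries the accelerator configuration. Every name here (`remote`, `accelerator`, `replicas`) is invented for illustration; a real implementation would containerize the function and submit it to a GKE cluster rather than running it in-process.

```python
import functools

def remote(accelerator="tpu-v5", replicas=1):
    """Hypothetical decorator marking a function for cloud execution."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Stand-in for the real work: build a container image, resolve
            # dependencies, submit the job, stream back the result.
            print(f"[submit] {fn.__name__} -> {accelerator} x{replicas}")
            return fn(*args, **kwargs)
        wrapper.accelerator = accelerator  # config travels with the function
        return wrapper
    return decorator

@remote(accelerator="tpu-v5", replicas=4)
def train_step(batch_size):
    return {"loss": 0.12, "batch_size": batch_size}

result = train_step(128)
print(result["batch_size"])  # prints: 128
```

The appeal of this shape is that the function body stays ordinary Python; only the decorator line changes when moving from local development to cloud accelerators.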
tweet / @fchollet / 8d ago
A recent tutorial demonstrates fine-tuning large language models (LLMs) using Keras Kinetic, an extension for Keras facilitating model training with JAX and Cloud TPUs. This approach is exemplified by fine-tuning the Gemma 2B model on the PubMedQA dataset, indicating potential for efficient medical question-answering system development.
keras-kinetic, fine-tuning, llms, jax, tpu, gemma-2b, medical-qa
“Keras Kinetic can be used to fine-tune LLMs.”
tweet / @fchollet / 9d ago
François Chollet, a prominent AI researcher, identified Adobe Podcast as a top-tier AI product. This endorsement highlights the effective application of AI within the audio editing domain, suggesting that the product demonstrably leverages AI to deliver a superior user experience or functionality.
adobe-podcast, ai-audio, product-review, ai-product-showcase
“Adobe Podcast is one of the best AI products observed recently.”
tweet / @fchollet / 9d ago
AI integration presents a significant opportunity for established companies with existing profitable business models. By leveraging AI to enhance current offerings and develop new, AI-first products, these companies can solidify their market position and drive further growth. This strategy is exemplified by products like Adobe Podcast, which demonstrates the potential for AI to both improve and innovate within an established company.
ai-adoption, business-strategy, product-development, established-companies, ai-products
“Established companies with profitable business models are poised to be major beneficiaries of AI.”
tweet / @fchollet / 12d ago
While OpenClaw demonstrated the product-market fit for local AI assistants, its lack of security architecture limited production deployment. PokeeClaw addresses these vulnerabilities by implementing a sandboxed environment featuring RBAC, approval workflows, and audit trails to enable enterprise-safe agentic workflows.
local-ai-assistants, ai-security, enterprise-ai, sandbox-architecture, access-control, token-optimization, product-market-fit
“OpenClaw lacks the necessary security infrastructure for production environments.”
tweet / @fchollet / 13d ago
Human intelligence, amplified by externalized cognitive infrastructure like computers and the internet, can rapidly achieve expert-level performance in complex, rule-based systems. A thought experiment in which humanity learns the rules of chess presented as a novel game ("Glurg") suggests that a 3000 Elo engine could be developed within 24 hours, and a 3500 Elo engine with significantly improved efficiency within three weeks. This suggests human intelligence is near-optimal in its ability to quickly master rule-based domains.
artificial-intelligence, llm-capabilities, agi, cognitive-science, intelligence-theory, human-intelligence
“Humanity can develop a 3000 Elo chess engine within 24 hours of learning the rules, using existing cognitive infrastructure.”
tweet / @fchollet / 13d ago
This content redefines intelligence not as an unbounded scalar but as a conversion ratio with an optimality bound, akin to making a ball rounder rather than a tower taller. It posits that while individual humans may not be optimally intelligent, a collective of intelligent humans augmented by external tools approaches this bound. The author argues that humanity’s ability to solve problems is near-optimal given available information, with current AI amplifying this collective intelligence.
intelligence-theory, ai-capabilities, cognitive-science, human-intelligence, collective-intelligence, misconceptions
“Intelligence should be viewed as a conversion ratio with an optimality bound, not an unbounded scalar.”
tweet / @fchollet / 14d ago
The advent of Artificial General Intelligence (AGI) is projected to redefine societal stratification, shifting the basis of class division from material wealth to cognitive agency. This future societal structure will delineate between individuals who maintain control over their attention and actions (the "focus class") and those whose reward mechanisms are entirely managed by AI systems (the "slop class"). This division implies a fundamental change in how individuals interact with and are influenced by advanced AI.
agi-impact, social-division, cognitive-agency, ai-ethics, future-of-work, social-commentary
“The class divide in an AGI future will be based on cognitive agency, not wealth.”
youtube / fchollet / 15d ago
François Chollet, creator of Keras and the ARC-AGI benchmark, discusses Ndea, a new AI research lab focused on symbolic program synthesis as an alternative to deep learning. Ndea aims to build AI that requires less data, runs more efficiently, and generalizes better by replacing parametric curves with concise symbolic models, addressing the limitations of current LLM-based approaches. The effort is driven by the belief that current deep learning methods, while effective in verifiable domains, are inefficient and will not lead to true AGI.
agi-research, program-synthesis, machine-learning-benchmarks, deep-learning-alternatives, ai-ethics, open-source-software, ai-development-strategy
“AI progress is inevitable and accelerating, making it crucial to focus on how to leverage and utilize it.”
tweet / @fchollet / 15d ago
To generalize an AI system beyond a specific task suite (here, the ARC-AGI-3 games), all components engineered or tuned against test runs on those tasks must be removed. This chiefly means prompts that spell out the process for solving the games.
arc-agi, artificial-general-intelligence, llm-engineering, prompt-engineering, ai-systems
“Generalizing an AI system requires removing task-specific components.”
tweet / @fchollet / 16d ago
François Chollet announced that ARC-AGI-4 is slated for an early 2027 release, initiating an annual benchmark release cycle. Each new benchmark aims to be "fully unsaturated upon release" and address "the most important unanswered research questions." This development strategy necessitates anticipating future AI capabilities during the benchmark design phase, echoing the approach taken for ARC-AGI-3.
arc-agi, ai-benchmarks, ai-capabilities, future-of-ai, ai-research-trends
“ARC-AGI-4 will be released in early 2027.”
tweet / @fchollet / 17d ago
François Chollet clarifies his long-standing definition of Artificial General Intelligence (AGI), emphasizing learning efficiency over task-specific performance benchmarks. He posits that AGI should autonomously master any human-learnable task with equivalent learning efficiency, diverging from current AI development that often targets specific capabilities. This reorientation shifts the focus from achieving a pre-defined "target" to developing a "compass" for continuous, human-like learning.
agi-definition, ai-capabilities, machine-learning-research, ai-ethics, future-of-ai
“The concept of AGI as a 'compass, not a target' has been François Chollet's consistent stance since 2021-2022, predating ChatGPT's widespread recognition.”
tweet / @fchollet / 17d ago
The ARC-AGI-3 benchmark evaluates AI agentic intelligence through interactive reasoning environments that require human-level action efficiency on novel tasks without prior training. This benchmark highlights a significant gap between current frontier AI models, which perform under 1%, and human ability, as humans can solve all tasks upon first contact. The competition offers public environments for testing and private test sets for evaluation, aiming to drive advancements in general artificial intelligence.
arc-agi, ai-benchmarks, interactive-ai, reasoning-environments, kaggle-competitions, human-level-ai
“ARC-AGI-3 evaluates agentic intelligence via interactive reasoning environments.”
youtube / fchollet / Oct 24
ARC-3 emphasizes interactive learning, goal discovery, and temporal planning in novel environments. It aims to measure efficient skill acquisition, a defining characteristic of general intelligence, by scaling up these capabilities within a "micro-AGI" framework, rather than focusing on perception or data-driven approaches like LLMs.
agi, arc-prize, program-synthesis, reasoning-benchmarks, machine-learning-theory
“ARC-3 focuses on key abilities like goal discovery, temporal planning, and interactive learning, differentiating it from previous versions.”
github_readme / fchollet / Sep 18
This GitHub repository offers Jupyter notebooks complementing the "Deep Learning with Python, third edition" by Chollet and Watson. It provides runnable code samples for practical application of theoretical concepts. The notebooks are designed for use with Google Colab, leveraging its free GPU runtime, and support Keras 3 with JAX, TensorFlow, or PyTorch backends. Users should refer to the companion book for comprehensive understanding, as the notebooks intentionally omit explanatory text and figures.
deep-learning, python, keras, machine-learning, tensorflow, pytorch, jax
“The repository provides executable code samples for the third edition of "Deep Learning with Python".”
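The multi-backend support mentioned in this entry is real Keras 3 behavior: the backend is chosen via the `KERAS_BACKEND` environment variable, which must be set before Keras is first imported. A minimal sketch (the actual `import keras` is left commented out so the snippet stands alone):

```python
import os

# Keras 3 reads KERAS_BACKEND at import time; valid values include
# "jax", "tensorflow", and "torch". Set it BEFORE importing keras.
os.environ["KERAS_BACKEND"] = "jax"

# import keras
# keras.backend.backend() would now report "jax"; in a Colab notebook,
# restart the runtime before switching to a different backend.
print(os.environ["KERAS_BACKEND"])  # prints: jax
```

This is why the notebooks can run identically on JAX, TensorFlow, or PyTorch: the model code itself does not change, only this environment variable.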
youtube / fchollet / Jul 23
François Chollet, creator of Keras, argues that LLMs are fundamentally pattern-memorization systems — "databases of vector programs" — that can only operate within their training data distribution, making them categorically distinct from general intelligence. He defines intelligence as the efficiency with which an agent acquires new skills in novel, unprepared-for situations (operationalized via his ARC benchmark), and contends that LLMs score near zero on this metric. Chollet traces the failure mode to the architecture itself: transformers excel at passive, Hebbian-style associative learning but lack the active, causal, experimental learning that characterizes human cognition. While LLMs are practically valuable for automating tasks within known distributions, existential risk narratives are unfounded — the real bottleneck to AGI is unsolved program synthesis and few-shot generalization, not scaling.
deep-learning, llm-limitations, keras, agi-research, open-source-ml, ai-hype, intelligence-theory
“LLMs cannot generalize beyond their training distribution; they fail even trivially novel tasks, scoring 5–10% on the ARC benchmark versus ~80% for humans.”
youtube / fchollet / Jun 16
François Chollet argues that the pre-training scaling paradigm fundamentally cannot produce general fluid intelligence because LLMs only acquire static, memorized skills — not the ability to synthesize novel solutions on the fly. Test-time adaptation (TTA) is a meaningful step forward, but remains compute-inefficient and lacks compositional generalization. True AGI, in Chollet's framing, requires combining two forms of abstraction: value-centric (continuous, perception/intuition via deep learning) and program-centric (discrete, reasoning via combinatorial search), and his new lab Ndea is building a deep learning-guided program search system targeting exactly this hybrid architecture.
agi, arc-benchmark, fluid-intelligence, test-time-adaptation, program-synthesis, deep-learning-limitations, francois-chollet
“A 50,000x scale-up of pre-training compute from 2019 to ~2024 moved ARC-1 accuracy from ~0% to only ~10%, while any human scores above 95%.”
github_readme / fchollet / May 26
Namex is a Python utility designed to strictly separate a package's implementation from its public API. It enables developers to define an explicit allowlist of public symbols, offering precise control over visibility, naming, and exposure paths. This facilitates easier refactoring, prevents accidental exposure of private utilities, and simplifies API version control.
python-packaging, api-design, namespace-management, code-structure, developer-tools, software-development
“Namex allows for explicit control over a Python package's public API.”
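The allowlist idea this entry describes can be illustrated in plain Python. Note this is not namex's actual API, just a sketch of the underlying pattern: public symbols are enumerated explicitly, and everything else in the implementation module stays private, so renaming or moving a private helper can never break downstream users.

```python
import types

def public_symbols(module, allowlist):
    """Return the module's exposed names, restricted to an explicit allowlist."""
    exposed = {}
    for name in allowlist:
        if not hasattr(module, name):
            raise AttributeError(f"allowlisted symbol missing: {name}")
        exposed[name] = getattr(module, name)
    return exposed

# A stand-in module with one intended-public function and one private helper.
mod = types.ModuleType("mylib")
mod.load_model = lambda path: f"model@{path}"
mod._parse_config = lambda raw: raw.strip()  # implementation detail

api = public_symbols(mod, allowlist=["load_model"])
print(sorted(api))              # prints: ['load_model']
print("_parse_config" in api)   # prints: False
```

The refactoring benefit follows directly: the allowlist is the contract, and anything not on it can change freely between releases.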
github_readme / fchollet / Apr 4
ARC-AGI-1 (Abstraction and Reasoning Corpus) is a benchmark created by François Chollet to evaluate general fluid intelligence in both humans and AI systems, framed simultaneously as an AGI benchmark, a program synthesis benchmark, and a psychometric test. Tasks consist of input/output grid pairs (integers 0–9, up to 30×30) where a solver must infer a transformation rule from ~3 demonstrations and apply it to new inputs — with only 3 trials per test input and requiring exact cell-level correctness. The dataset is split into 400 training and 400 evaluation tasks in JSON format, with strict instructions against using evaluation data during development to preserve benchmark integrity. A v2 of the benchmark (ARC-AGI-2) has since been released in a separate repository.
arc-agi, ai-benchmarks, artificial-general-intelligence, program-synthesis, ai-evals, dataset, cognitive-reasoning
“ARC-AGI-1 is designed to target both humans and AI systems that aim to emulate human-like general fluid intelligence.”
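The JSON layout summarized above is concrete enough to sketch: each task file holds a `train` list of demonstration input/output pairs and a `test` list of held-out pairs, with grids as lists of rows of integers 0-9, at most 30×30. The tiny task below is invented for illustration (its hidden rule is a left-right mirror), but the structure and the constraint checks match the entry's description.

```python
import json

task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
"""

def validate_task(task):
    """Check grid constraints: rectangular, ints 0-9, at most 30x30."""
    for split in ("train", "test"):
        for pair in task[split]:
            for grid in pair.values():
                assert 1 <= len(grid) <= 30
                width = len(grid[0])
                for row in grid:
                    assert len(row) == width and 1 <= width <= 30
                    assert all(isinstance(c, int) and 0 <= c <= 9 for c in row)
    return True

task = json.loads(task_json)
print(validate_task(task), len(task["train"]), len(task["test"]))
```

A solver would read the `train` pairs, infer the transformation, and must reproduce each `test` output exactly, cell for cell, within the benchmark's 3-trial limit.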
youtube / fchollet / Mar 24
The ARC-AGI-2 benchmark and ARC Prize 2025 competition have launched, emphasizing the need for AI to demonstrate fluid intelligence to achieve AGI. The new benchmark is significantly more challenging than its predecessor, with even frontier models showing single-digit performance. The core tenet is that true intelligence involves efficient acquisition and deployment of capabilities, not just raw computational power or memorization. It aims to identify systems capable of adapting to novelty by recombining existing knowledge, a crucial step toward human-level AGI that can generate new knowledge.
agi-benchmarks, arc-prize, fluid-intelligence, llm-evaluation, ai-reasoning, human-ai-gap, deep-learning-limits
“ARC-AGI-2 is a significantly more challenging benchmark than ARC-AGI-1, with frontier models scoring in the single digits.”
youtube / fchollet / Mar 23
Deep learning, while adept at pattern matching in continuous spaces, struggles with discrete symbolic program synthesis due to gradient descent's inability to effectively optimize for such tasks. This limitation necessitates alternative approaches for achieving robust generalization, particularly in scenarios requiring compositional novelty and on-the-fly adaptation. The discussion emphasizes the need for better learning mechanisms and representations, moving beyond purely data-driven methods, with benchmarks like ARC highlighting challenges in strong generalization.
ai-research, program-synthesis, deep-learning, neural-networks, generalization, arc-challenge, llm-limitations
“Gradient descent is ineffective for learning discrete symbolic programs, limiting deep learning's ability to achieve strong generalization.”
youtube / fchollet / Jan 9
François Chollet discusses the evolution and limitations of current AI benchmarks, particularly ARC, highlighting the need for tasks that evaluate true generalization and adaptability to novelty rather than brute-force computation. He emphasizes the integration of intuition and reasoning in AI architectures and introduces ARC 2.0 as a response to these challenges, designed to foster the co-evolution of problems and solutions.
ai-alignment, llm-limitations, program-synthesis, agi-benchmarks, deep-learning
“Human cognition merges intuition (pattern cognition) and discrete, step-by-step reasoning, both essential for advanced AI.”
youtube / fchollet / Nov 6 / failed
youtube / fchollet / Oct 12
François Chollet argues that LLMs are fundamentally limited to "value-centric" (System 1) abstraction — pattern interpolation over continuous embedding spaces — and are categorically incapable of "program-centric" (System 2) abstraction required for true generalization. Despite five years of scaling, core failure modes (sensitivity to rephrasing, inability to generalize algorithms beyond memorized instances, compositional breakdown) remain unresolved because they are architectural, not superficial. Chollet's ARC-AGI benchmark — designed to be memorization-resistant — exposes this gap starkly: state-of-the-art LLMs score 5–21% while humans score 97–98%, and brute-force program search already achieves ~50%. His thesis is that the path forward requires merging discrete program synthesis with deep learning, using neural networks as perception/intuition layers to tame combinatorial explosion in program search space.
agi-research, llm-limitations, intelligence-benchmarks, abstraction, program-synthesis, human-ai-collaboration, arc-prize
“LLM performance depends on task familiarity, not task complexity — even simple unfamiliar problems will fail, while arbitrarily complex familiar problems can be solved via memorization.”
youtube / fchollet / Aug 17 / failed
youtube / fchollet / Jun 24
Large Language Models (LLMs) excel at generating human-like text by interpolating vast datasets, mimicking human behavior rather than truly understanding or reasoning. This interpolation capability, rooted in curve fitting, enables them to excel at tasks requiring extensive memorization and pattern recognition, such as passing standardized tests. However, they lack true intelligence, which is defined as the ability to adapt to novel situations and synthesize new solutions, and are therefore limited in tasks requiring genuine creativity, novel problem-solving, and abstract reasoning.
ai-ethics, llm-limitations, agi-research, deep-learning, machine-intelligence
“LLMs primarily function through interpolation and memorization of vast datasets, not by true understanding or reasoning.”
youtube / fchollet / Jun 12 / failed
youtube / fchollet / Jun 11
François Chollet and Mike Knoop launched the ARC Prize to incentivize research into true AI generalization, as current LLMs primarily rely on memorization and lack the ability to adapt to novel situations. Chollet argues that while LLMs excel at specific tasks through pattern matching, they fall short on the ARC benchmark, which requires on-the-fly program synthesis and core knowledge, a capability present in young children. The prize aims to push the AI community beyond current scaling approaches and toward hybrid systems that combine deep learning with discrete program search for real-world adaptability.
ai-research, agi, llm-limitations, arc-benchmark, program-synthesis, machine-learning-benchmarks, open-science
“LLMs primarily rely on memorization and lack true generalization capabilities, performing poorly on tasks requiring adaptation to novelty.”
youtube / fchollet / May 3
LLMs struggle with novel tasks, even with extensive pre-training on synthetic data. The key to unlocking performance is active inference, which involves fine-tuning the LLM on a small set of demonstration examples and then artificially expanding these examples using a Domain Specific Language (DSL) to increase data diversity. This approach enables the LLM to learn and adapt to new tasks, mimicking human learning processes.
llm-reasoning, active-inference, synthetic-data, machine-learning-techniques, arc-tasks, few-shot-learning
“Pre-training LLMs on millions of synthetically generated tasks is insufficient for high performance on novel tasks.”
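The demonstration-expansion step described in this entry can be sketched with the simplest possible "DSL": the eight dihedral symmetries of a grid. This is a hedged illustration of the augmentation pattern only; the actual approach uses a much richer domain-specific language to diversify the few demonstration examples before fine-tuning.

```python
def rotate90(grid):
    """Rotate a grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in grid]

def augment(grid):
    """All dihedral variants of a grid (4 rotations x optional flip)."""
    variants, g = [], grid
    for _ in range(4):
        variants.append(g)
        variants.append(flip_h(g))
        g = rotate90(g)
    # Deduplicate while keeping order; symmetric grids yield fewer variants.
    seen, unique = set(), []
    for v in variants:
        key = tuple(map(tuple, v))
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique

demo = [[1, 2], [0, 0]]
print(len(augment(demo)))  # asymmetric 2x2 grid -> prints: 8
```

Applied to each demonstration pair (transforming input and output grids together), even this trivial transform set multiplies a handful of examples into a noticeably more diverse fine-tuning set, which is the mechanism the summary credits for the performance gains.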
github_readme / fchollet / Feb 12
This GitHub repository compiles a comprehensive list of resources for Keras, a Python deep learning library. Categorized for easy navigation, it includes tutorials, official documentation, code examples for various applications (text, image, creative visuals, reinforcement learning), and outlines third-party libraries and projects built with Keras. The intent is to provide a centralized hub for Keras users to learn, implement, and extend deep learning solutions.
keras, deep-learning, tutorials, code-examples, python-library, machine-learning, neural-networks
“The Keras Resources GitHub repository serves as a central directory for Keras-related learning and development materials.”