absorb.md

Simon Willison

Chronological feed of everything captured from Simon Willison.

uvx Enables One-Command Local Chat with 2GB Victorian-Trained Nano Model Mr. Chatterbox

Mr. Chatterbox is a 2GB nanochat model trained from scratch on 28,000 Victorian-era British texts (1837-1899). Simon Willison's llm-mrchatterbox plugin allows local inference on consumer hardware like a Mac. With uv installed, a single command starts a chat session: `uvx --with llm-mrchatterbox llm chat -m mrchatterbox`. The first run downloads the 2GB model.

llm-mrchatterbox: Running a Victorian-era LLM Locally with LLM

llm-mrchatterbox is a plugin for LLM that enables local execution of the "Mr. Chatterbox" language model. This model was trained on a corpus of 28,000 Victorian-era British texts, offering a unique linguistic perspective. The plugin simplifies model usage and management within the LLM framework.

Red-Green TDD for LLM Agentic Engineering

Simon Willison details a "Red-Green TDD" approach adapted for LLM agentic engineering. This methodology emphasizes iterative development by first establishing a failing test (red), then implementing the agentic solution to pass the test (green), and finally refactoring. This mirrors traditional software development practices but is tailored for the non-deterministic nature of LLM evaluations, providing a structured way to build and refine agentic systems.
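
The red and green steps described above can be sketched in a few lines of plain Python. This is only an illustration of the cycle: the `slugify` function, its test, and the `run` helper are hypothetical and are not taken from Willison's write-up.

```python
import re


def test_slugify() -> None:
    # Written first: it pins down the behaviour we want before any code exists.
    assert slugify("Red-Green TDD, for Agents!") == "red-green-tdd-for-agents"


def run(test) -> str:
    """Run one test and report 'red' (failing) or 'green' (passing)."""
    try:
        test()
    except (AssertionError, NameError):
        return "red"
    return "green"


# Red: slugify is not defined yet, so the test fails.
before = run(test_slugify)


def slugify(title: str) -> str:
    # Green: the simplest implementation that satisfies the test.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Green: the same test now passes, and refactoring can start from here.
after = run(test_slugify)
print(before, after)
```

Seeing the test fail before the implementation exists is the point of the red step: it shows the test can actually detect the missing behaviour.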

Mr. Chatterbox: A Victorian-Era LLM's Limitations and Ethical Training Challenges

Mr. Chatterbox is a 340M parameter language model trained exclusively on 2.93 billion tokens from 28,000 Victorian-era British Library books (1837-1899). Its training approach is novel and ethically motivated, using only out-of-copyright data, but the model has clear conversational limitations: it often produces Markov-chain-like responses because the corpus is too small for its parameter count. The supervised fine-tuning conversation pairs were generated with modern LLMs (Claude Haiku and GPT-4o-mini), which dilutes the "pre-1899 only" claim. The project highlights the challenges of creating useful LLMs from purely public domain sources.

AI Models Enable Vibe Coding of Production SwiftUI Menu Bar Apps Without Xcode

Claude Opus 4.6 and GPT-5.4 demonstrate competence in generating functional SwiftUI code for Mac menu bar apps directly from natural language prompts. This "vibe coding" approach bypasses traditional IDEs like Xcode, allowing rapid prototyping on new hardware. The result is deployable apps produced solely via AI assistance.

Bandwidther: A macOS Bandwidth Monitoring Tool Using Command-Line Utilities

Bandwidther is a SwiftUI macOS application for monitoring network bandwidth usage at both the system and per-process level. It relies on standard macOS command-line tools such as `nettop` and `lsof` rather than packet capture or private APIs. This improves system compatibility but limits the scope and accuracy of the network data it can collect. The application reports download/upload speeds, cumulative totals, and connection summaries, and uses heuristics to categorize each destination as internet or LAN.
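
The summary does not say which heuristic Bandwidther uses to separate LAN from internet destinations. One plausible approach, sketched here with Python's standard `ipaddress` module, is to treat loopback, link-local, private, and multicast ranges as local. The `classify_destination` name and the exact set of checks are assumptions made for illustration.

```python
import ipaddress


def classify_destination(address: str) -> str:
    """Label an IP address as 'lan' or 'internet' using simple range checks."""
    ip = ipaddress.ip_address(address)
    # Loopback, link-local, private (e.g. RFC 1918) and multicast addresses
    # are treated as local traffic; everything else counts as internet.
    if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_multicast:
        return "lan"
    return "internet"


for addr in ["192.168.1.20", "fe80::1", "224.0.0.251", "8.8.8.8"]:
    print(addr, classify_destination(addr))
```

A range check like this is cheap, but it cannot tell whether a private address is reached through a VPN, which is one reason such a classification stays a heuristic.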

Gpuer: A new macOS GPU and memory monitoring tool for Apple Silicon

Gpuer is a new SwiftUI menu bar application for macOS that provides detailed monitoring of GPU and unified memory statistics on Apple Silicon. It differentiates itself by offering a unique perspective on unified memory usage, treating CPU and GPU memory as a single pool, and by utilizing specific macOS system interfaces for accurate data collection. The tool aims to provide more insightful memory pressure and utilization data compared to traditional metrics.

LLMs as Rapid Prototyping Engines for macOS SwiftUI Applications

Large Language Models (LLMs) like Claude Opus 4.6 and GPT-5.4 are demonstrating significant capability in generating functional SwiftUI macOS applications from minimal prompts. This enables rapid prototyping and development of tools without direct programming expertise in Swift or requiring an integrated development environment like Xcode. The process, termed "vibe coding," leverages LLMs to quickly build applications by iteratively addressing feature requests and bug fixes through conversational prompts.

PyPI Already Implements AI-Powered Malware Pattern Scanning via Partner APIs

PyPI employs AI-powered scans for malicious package patterns through an API accessible to scanning partners. This capability enabled the rapid quarantine of a suspicious package within one hour of publication. The response underscores existing proactive defenses in major package registries against emerging attack vectors.

Memory-Efficient MoE-LLM Inference on Consumer Hardware

Mixture-of-Experts (MoE) Large Language Models (LLMs) can be executed on consumer-grade Mac hardware by streaming expert weights from SSD, bypassing the need to load the entire model into RAM. This approach, exemplified by the Kimi 2.5 model, which has 1T total parameters but activates only 32B of them per token, enables the execution of large models on devices with limited memory. The key insight is the emergent capability of LLMs to handle complex tasks like C code generation, coupled with the necessity of robust agentic orchestration and validation for real-world application.
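
The idea of reading only the active experts from disk can be illustrated with a toy file and Python's standard `mmap` module. This is a sketch under stated assumptions, not the method used by any particular inference runtime: the file layout, the constants, and the `write_toy_checkpoint` and `load_experts` helpers are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

NUM_EXPERTS = 8                   # toy value; real MoE models have many more
EXPERT_FLOATS = 4                 # toy value: weights stored per expert
EXPERT_BYTES = EXPERT_FLOATS * 4  # each weight is a little-endian float32


def write_toy_checkpoint(path: str) -> None:
    """Write NUM_EXPERTS contiguous float32 blocks; expert i holds the value i."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            f.write(struct.pack(f"<{EXPERT_FLOATS}f", *([float(i)] * EXPERT_FLOATS)))


def load_experts(path: str, expert_ids: list[int]) -> dict[int, tuple[float, ...]]:
    """Memory-map the file and decode only the requested experts' bytes."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        out = {}
        for i in expert_ids:
            start = i * EXPERT_BYTES
            out[i] = struct.unpack(f"<{EXPERT_FLOATS}f", mm[start:start + EXPERT_BYTES])
        return out


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "experts.bin")
    write_toy_checkpoint(checkpoint)
    # Pretend the router selected experts 2 and 5 for this token.
    active = load_experts(checkpoint, [2, 5])

# Share of parameters active per token for the figures quoted above.
active_fraction = 32e9 / 1e12
print(active, f"{active_fraction:.1%}")
```

With the quoted figures, roughly 3% of the parameters are needed for any one token, which is what makes paging experts in from SSD plausible at all.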

LLMs Enable User Profiling from Hacker News Comments in Emerging Surveillance Scenario

Simon Willison proposes prompting an LLM with 1,000 Hacker News comments per user using "Profile this user" to infer personal details, highlighting a new surveillance dystopia. Claude Opus 4.6 excels at this task. The approach demonstrates LLMs' capability to extract behavioral and identity insights from public discussion data at scale.

Starlette 1.0 Release and AI Code Generation Capabilities

Starlette 1.0 has been released, introducing a new `lifespan` mechanism for startup/shutdown, replacing `on_startup` and `on_shutdown`. This release, despite potential compatibility issues with LLM training data, enables efficient code generation for Starlette applications. Claude's ability to independently clone repositories, understand new framework versions, and integrate this knowledge into custom skills demonstrates its advanced capabilities as a coding agent.

Local Qwen 3.5 Models Enable Secure Sensitive Journalism on Laptops

Qwen 3.5 running locally on a high-end laptop delivers sufficient power for sensitive journalism applications. This capability drives interest in local AI models by eliminating cloud dependency risks. Advances now make on-device inference viable for secure, private workflows.

AI Coding Agents Risk Leaking Sensitive Data; Local Models Mitigate for High-Security Use Cases

Coding agents backed by cloud models expose prompts and snippets of sensitive data through the context they send, which is comparable to granting access to an untrusted party. Simon Willison highlights local models like Qwen 3.5 on laptops as viable for sensitive journalism to avoid such leaks. The thread extends the risk to law firms, to subpoenas served on SaaS API providers, and to sectors such as healthcare and legal.

AI-generated spam replies exhibit detectable patterns in tropes and repetitive phrasing across accounts

Simon Willison identifies AI-generated replies as detectable through characteristic "AI tropes" and their frequent repetition of similar text when replying to other accounts. This observation responds to Paul Graham's frustration with spam accounts baiting replies, prompting a request for software to automate detection. The patterns suggest scalable filtering via text similarity and behavioral analysis.
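
The text-similarity part of that idea can be sketched with `difflib.SequenceMatcher` from the standard library. The sample replies, account names, the `near_duplicate_pairs` helper, and the 0.8 threshold are all made up for illustration; a real filter would need tuning and the behavioral signals mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(replies: dict[str, str], threshold: float = 0.8):
    """Return (account_a, account_b, ratio) for replies whose text is nearly identical."""
    flagged = []
    for (a, text_a), (b, text_b) in combinations(replies.items(), 2):
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


sample = {
    "bot_1": "Great point! This really highlights the importance of staying curious.",
    "bot_2": "Great point! This truly highlights the importance of staying curious.",
    "human": "I tried this on my laptop and the build failed at the linker step.",
}
flagged = near_duplicate_pairs(sample)
print(flagged)
```

Pairwise comparison grows quadratically with the number of replies, so anything beyond a small batch would call for a cheaper first pass such as hashing word shingles.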

LLMs Can Generate Detailed User Profiles from Public Comments

Large Language Models (LLMs) can effectively create comprehensive user profiles by analyzing publicly available comment data. This process, demonstrated with Hacker News comments and Claude Opus 4.6, yields detailed insights into professional identity, core beliefs, working style, technical interests, and even personality traits. The method leverages open APIs to gather data, highlighting the potential for advanced intelligence gathering from public online interactions.

OpenAI Acquires Astral: Strategic Talent and Open-Source Integration for Codex

OpenAI's acquisition of Astral, known for popular Python tools like uv, ruff, and ty, appears to be a dual play for talent and technology. Astral's team will join OpenAI's Codex division, aiming to enhance AI capabilities in software development by integrating Astral's open-source projects. This move is positioned within a competitive landscape where AI companies are aggressively acquiring tools and talent to gain an edge in coding agent development.

OpenAI's GPT-5.4 Mini and Nano Models Offer Cost-Effective and Faster Performance

OpenAI introduces GPT-5.4 Mini and Nano, smaller, faster, and more economical versions of their GPT-5.4 model. These models demonstrate improved performance over previous iterations and are particularly cost-effective for large-scale tasks like image description, as evidenced by benchmark comparisons and practical application examples.

Showboat: Reproducible Agentic Demo Document Generation and Verification

Showboat is a Go-based command-line tool that facilitates the creation of executable markdown documents. These documents combine commentary, executable code blocks, and their captured output, serving as both documentation and verifiable proof of an agent's work. The tool supports re-execution of code blocks to confirm output consistency and offers remote streaming capabilities for real-time updates.
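
The re-execution check can be sketched as follows. This does not reproduce Showboat's document format or command-line interface: it assumes the code blocks and their recorded outputs have already been extracted, and the `RecordedBlock`, `run_block`, and `verify` names are hypothetical.

```python
import contextlib
import io
from dataclasses import dataclass


@dataclass
class RecordedBlock:
    code: str             # source of one executable block
    recorded_output: str  # output captured when the document was written


def run_block(code: str) -> str:
    """Execute a block and capture what it prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # trusted input only; a real tool would isolate this
    return buffer.getvalue()


def verify(blocks: list[RecordedBlock]) -> list[int]:
    """Return indexes of blocks whose fresh output differs from the recording."""
    return [i for i, b in enumerate(blocks) if run_block(b.code) != b.recorded_output]


doc = [
    RecordedBlock("print(2 + 2)", "4\n"),
    RecordedBlock("print('hello'.upper())", "HELLO\n"),
    RecordedBlock("print(sum(range(4)))", "7\n"),  # stale: the real output is 6
]
mismatches = verify(doc)
print(mismatches)
```

A mismatch means either the code's behaviour changed or the recorded output was never real, which is the kind of evidence a reviewer wants from an agent's demo.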

Evolving Software Development with Agentic AI

Agentic AI is transforming software development by shifting the focus from manual coding to guiding AI agents. This paradigm requires new approaches to testing, quality assurance, and security to leverage AI's efficiency while mitigating its inherent risks. Integrating AI effectively necessitates a re-evaluation of traditional development workflows and a move towards agent-centric methodologies, emphasizing robust testing and sandboxing.

Rodney: Command-line Chrome Automation for Scripted Browser Interactions

Rodney is a Go CLI tool for persistent headless Chrome automation, enabling multi-step browser interactions from shell scripts. It leverages the `go-rod` library to connect to a single long-running Chrome process, maintaining state across commands. This architecture facilitates web scraping, UI testing, and accessibility checks directly from the command line, with features like session scoping and proxy support.

Simon Willison Probes AI Tool Experience in Recent Software Developer Interviews

Simon Willison is surveying recent software developer interview experiences to determine if familiarity with AI programming tools plays a role. He explicitly requests detailed replies to gather qualitative data. This reflects growing interest in evaluating AI proficiency as a hiring criterion in tech roles.

LLMs and Novel Technology Adoption

Large Language Models (LLMs) used in coding agents are demonstrating a surprising aptitude for integrating novel or obscure technologies. Contrary to early concerns that LLMs would reinforce a "boring technology" approach due to training data bias, modern LLMs with expanded context windows effectively consume documentation and adapt to custom codebases. This suggests LLMs may accelerate, rather than hinder, the adoption of new tools by reducing the barrier to entry for developers.

Coding Agents and Open Source Relicensing: The chardet Case

Coding agents are enabling a new form of "clean room" implementation, allowing for rapid code rewrites. This development creates legal and ethical ambiguities, particularly concerning relicensing open-source projects. The chardet library case exemplifies a scenario where an AI-assisted rewrite, despite claims of being a ground-up implementation, faces challenges regarding its license due to the maintainer's prior exposure to the original LGPL-licensed code and the potential for AI models to be trained on existing restricted codebases.

Key Qwen AI Team Members Depart Alibaba Amid Reorganization

Alibaba's Qwen open-source AI model team is experiencing significant upheaval, with lead researcher Junyang Lin and several other core contributors announcing their departures. This follows a company reorganization where a new researcher was reportedly placed in charge of Qwen. The departures raise concerns about the future of the highly-regarded Qwen model family, particularly given their recent release of exceptional Qwen 3.5 models across various sizes.

Present: A WebView-Based Presentation Tool

Present is a macOS SwiftUI application designed for presentations, utilizing WebViews to display each slide as a URL. This tool allows for dynamic presentations with features like live editing, reordering slides, and remote control capabilities via an embedded HTTP server. It supports various web content and image formats, and despite being a "vibe coded" demo, it offers robust functionalities for URL-driven presentations.

LLM-Assisted, Rapid macOS App Development for Niche Tools

The author successfully developed a custom macOS presentation app, "Present," with advanced features like remote control and automatic state saving, using LLM-assisted "vibe coding" in Swift and SwiftUI. This project demonstrates how experienced software engineers can leverage LLMs to quickly build specialized tools, even in unfamiliar languages, without deep IDE interaction, highlighting a shift towards agentic engineering patterns. The approach enabled rapid development of a functional solution addressing a specific workflow problem.

Agentic Engineering Patterns: A New Discipline for Software Development

Simon Willison introduces "Agentic Engineering Patterns" to document best practices for developing software with coding agents. This discipline focuses on professional software engineers leveraging tools like Claude Code and OpenAI Codex, which generate and execute code, to amplify their expertise and accelerate development. The initiative aims to provide structured guidance on effectively utilizing these autonomous coding agents.

Integrating Diverse Content Streams with AI Assistance

Simon Willison successfully integrated five distinct content streams into his blog ("beats") by leveraging AI, specifically Claude Code, for data extraction and UI integration. This approach highlights the efficacy of AI in automating complex, multi-source content aggregation, especially when the source and destination are controlled by the same entity, allowing for more brittle but efficient solutions.

Showboat Ecosystem Expands with Remote Publishing and Charting Tools

Simon Willison introduces two new tools enhancing the Showboat ecosystem: Chartroom and datasette-showboat. Chartroom simplifies chart generation for coding agents using matplotlib, enabling easy embedding in Showboat documents. Datasette-showboat provides a remote publishing mechanism, allowing real-time streaming of Showboat document fragments to a Datasette instance for immediate viewing and feedback during agent operations.

Chartroom: A CLI Tool for Data Visualization with Matplotlib

Chartroom is a command-line interface (CLI) tool that leverages Matplotlib to generate various chart types from diverse data sources including CSV, TSV, JSON, JSONL, and SQLite. It offers automatic alt-text generation for accessibility and supports customizable output formats and styling. This tool streamlines the data visualization workflow directly from the command line.

Deep Blue: AI-Induced Existential Dread in Software Engineering

The term "Deep Blue" describes the psychological phenomenon of ennui and existential dread experienced by software developers due to the growing capabilities of generative AI in their field. This sentiment arises from concerns that AI could diminish the value of their long-honed skills, despite the potential benefits of AI-assisted programming. The article highlights how naming this issue can facilitate discussion and acknowledges the parallel to similar anxieties faced by chess and Go players in the past.

OpenAI’s Mission Statement Changes Reflect Shifting Priorities

OpenAI’s mission statement, as documented in its annual IRS 501(c)(3) filings, has undergone significant revisions between 2016 and 2024. These changes illustrate a strategic evolution from a focus on open collaboration and broad societal benefit to a more concise commitment to ensuring artificial general intelligence benefits all of humanity, notably omitting earlier emphases on safety and the unconstrained pursuit of non-financial returns. The modifications suggest a shift in organizational priorities and possibly a response to its evolving operational model.

Distributing Go Binaries via Python Wheels

go-to-wheel is a tool that automates the cross-compilation of Go modules into platform-specific Python wheels. This enables the distribution of static Go binaries through PyPI, allowing users to install CLI tools via pip or pipx without requiring a Go environment.
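
Platform-specific wheels are distinguished by the tags in their filenames, which follow the pattern `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below shows how a Go target might map onto such a filename. The `PLATFORM_TAGS` table and the `wheel_filename` helper are illustrative assumptions and are not taken from go-to-wheel itself.

```python
# Illustrative mapping from Go's (GOOS, GOARCH) to wheel platform tags.
# The specific tags chosen here are assumptions, not go-to-wheel's own table.
PLATFORM_TAGS = {
    ("linux", "amd64"): "manylinux_2_17_x86_64",
    ("linux", "arm64"): "manylinux_2_17_aarch64",
    ("darwin", "amd64"): "macosx_10_9_x86_64",
    ("darwin", "arm64"): "macosx_11_0_arm64",
    ("windows", "amd64"): "win_amd64",
}


def wheel_filename(name: str, version: str, goos: str, goarch: str) -> str:
    """Build a wheel filename for a binary that does not depend on the Python ABI."""
    # Hyphens separate filename fields, so the distribution name uses underscores.
    dist = name.replace("-", "_")
    # "py3-none" marks the wheel as usable by any Python 3 with no ABI requirement.
    return f"{dist}-{version}-py3-none-{PLATFORM_TAGS[(goos, goarch)]}.whl"


print(wheel_filename("my-tool", "0.1.0", "darwin", "arm64"))
```

Because the platform tag is part of the filename, an installer can pick the matching binary for the host without needing a Go toolchain.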

Closing the Agent Verification Gap with Execution-Backed Demos

To mitigate the 'black box' nature of agent-led development, Simon Willison introduced Showboat and Rodney to force agents to provide empirical evidence of functional software. Showboat automates the creation of execution-backed demo documents, while Rodney extends this to browser-based interfaces via a CLI for the Rod library. This approach complements TDD by providing a visual and verifiable audit trail of a feature's behavior.

StrongDM AI Pioneers "Dark Factory" Software Development with Agent-Driven Engineering and Digital Twin Testing

StrongDM AI has implemented a "Dark Factory" approach to software development, where coding agents autonomously write and validate code without human review. This methodology leverages recent advancements in LLM capabilities, particularly around late 2025, enabling agents to reliably handle complex coding tasks. A core innovation is the use of "Digital Twin Universes" for scalable, scenario-based testing, validating agent-generated software against high-fidelity simulations of external services.

Pydantic's Monty: A Fast, Secure Python Subset for LLM Sandboxing in WebAssembly

Pydantic has developed Monty, a Python-like language subset implemented in Rust, designed for secure and low-latency execution of LLM-generated code. Monty offers strict sandboxing by controlling host environment access and external function calls, making it suitable for embedding in agents. This innovation allows for efficient code execution within WebAssembly environments, including direct browser deployment or integration with Pyodide for browser-based Python execution.

CIA World Factbook 2020 Data Preservation

The CIA World Factbook was taken offline in February 2026. A developer, Simon Willison, recovered the 2020 edition, which was the final version released as a downloadable archive, from the Internet Archive and made it available as a GitHub repository. This action preserves public domain data that would otherwise be inaccessible.

Leveraging PyPI for Go Binary Distribution and Python Integration

Go binaries can be distributed via PyPI, enabling seamless integration into Python projects as dependencies. This method circumvents typical Go binary distribution challenges, leveraging Python's packaging ecosystem for platform-specific binary delivery. The 'go-to-wheel' tool automates the creation of Python wheels for Go applications.

Moltbook: A Social Network for AI Agents Driven by OpenClaw

Moltbook is presented as a novel social network where AI agents, specifically those built on OpenClaw, can interact by sharing information and discussing topics. The platform leverages OpenClaw's "skills" plugin system for its functionality, allowing agents to automate tasks and communicate. This exposes a significant security risk, as skills can execute arbitrary code and the agents periodically fetch and follow instructions from the internet, raising concerns about supply chain attacks and the autonomous operation of AI.

Implementing Dynamic Features with Client-Side State on Aggressively Cached Static Sites

This article details how to add dynamic, user-specific functionality to a heavily cached Django-based static site. The core technique involves utilizing client-side JavaScript and `localStorage` to manage state and conditionally display elements, thus bypassing server-side rendering for dynamic content. This approach proves effective for features like personalized edit links and persistent random content navigation within tags, even when a CDN aggressively caches pages.

ChatGPT’s Code Interpreter Upgraded with Broader Language, Shell, and Download Capabilities

ChatGPT's integrated code interpreter has received a significant, undocumented upgrade. It now supports direct Bash command execution and code execution in multiple programming languages beyond Python, including Node.js, Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C, and C++. Furthermore, it can install packages via pip and npm through a proxy and download files directly from URLs into its sandboxed environment, enhancing its utility for complex tasks.

Python Script for CBOR Test Vector Validation

This Python script provides a comprehensive framework for validating CBOR (Concise Binary Object Representation) test vectors. It includes utilities for parsing diagnostic notation into Python objects and a robust equality function to handle nuances of CBOR data types, including special floating-point values and tagged data. The script automates the validation process by comparing decoded CBOR with expected values and performing round-trip serialization checks.
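
The script's own equality function is not shown here, so the following is only a sketch of what such a comparison might look like. It assumes decoded values are plain Python objects; the `cbor_equal` name and the choice to keep integers, floats, and booleans distinct are assumptions, and tagged values would need a case of their own.

```python
import math


def cbor_equal(a, b) -> bool:
    """Structural equality that treats NaN as equal to NaN and keeps types distinct."""
    # bool is a subclass of int in Python, so compare exact types first to keep
    # True distinct from 1, and the integer 1 distinct from the float 1.0.
    if type(a) is not type(b):
        return False
    if isinstance(a, float):
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        # copysign distinguishes 0.0 from -0.0, which have different encodings.
        return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(cbor_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(cbor_equal(a[k], b[k]) for k in a)
    return a == b


nan = float("nan")
print(cbor_equal([1, {"x": nan}], [1, {"x": nan}]), cbor_equal(0.0, -0.0))
```

Plain `==` would report a NaN as unequal to itself, so a validator built on it would flag a correctly decoded NaN as a mismatch.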

Enhanced Gist Privacy Control in terminal-to-html

The `terminal-to-html` tool has been updated to include a privacy checkbox for Gist creation. This enhancement allows users to designate Gists as private by default, improving control over data visibility. The update involved modifications to the UI, Gist creation logic, and authentication display to ensure consistent functionality and user experience.

Claude Uses Prompt Engineering to Enhance UI with Performance Metrics

Claude successfully integrated performance metrics into a web application's UI by analyzing existing code, identifying modification points, and implementing changes to display request time, SQL execution time, and query count. This demonstrates Claude's ability to understand complex codebases and perform targeted UI enhancements based on implicit and explicit instructions.

Leveraging AI in Software Development and Data Analysis

Simon Willison, a co-creator of the Django web framework and Datasette, discusses his extensive experience using generative AI tools, particularly LLMs with code interpreters, to enhance software development and data analysis workflows. He highlights the dramatic increase in productivity, especially for prototyping and code generation, despite the ongoing challenges of integration and model limitations. Willison emphasizes the importance of continuous experimentation and understanding each tool