absorb.md

Simon Willison

Chronological feed of everything captured from Simon Willison.

Simon Willison Denies X Access

Simon Willison has not been granted access to the X platform, contrary to speculation. This clarifies the current status of his platform access.

Streaming Experts in Mixture-of-Experts Models

The potential for "streaming experts" within a Mixture-of-Experts (MoE) model suggests a capability to dynamically allocate computational resources. This approach could enable more efficient processing by engaging specialized expert models only when relevant to the input stream. It implies an architectural evolution towards adaptive and on-demand expert utilization in large language models.
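The sparse routing idea behind this can be sketched with a toy gate: only the top-k experts receive nonzero weight for a given token, so compute is spent only where the router deems it relevant. This is a minimal illustration of top-k gating in general, not the architecture of any specific model:

```python
import math

def top_k_gate(scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their
    weights; every other expert stays inactive with weight 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

# Four experts, router scores for one token: only experts 1 and 3 activate.
weights = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
print([round(w, 3) for w in weights])  # [0.0, 0.622, 0.0, 0.378]
```

"Streaming" experts would extend this so that inactive experts need not even be resident in memory until the gate selects them.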

Memory Requirements for LLM Inference

Running large language models (LLMs) for inference, especially those with high parameter counts, typically necessitates significant GPU memory. While some quantized models can operate on consumer-grade hardware like a 256GB or 512GB Mac Studio, larger, unquantized models predominantly require high-end NVIDIA GPU servers to ensure sufficient memory and computational throughput.
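The underlying arithmetic is simple: parameter count times bytes per parameter, plus overhead for the KV cache and runtime buffers. A rough back-of-envelope estimator (the 1.2x overhead factor is an assumption, not a measured constant):

```python
def estimate_inference_memory_gb(params_billions: float,
                                 bytes_per_param: float,
                                 overhead: float = 1.2) -> float:
    """Rough memory estimate for LLM inference.

    bytes_per_param: 2.0 for fp16/bf16, about 0.5 for 4-bit quantization.
    overhead: multiplier for KV cache, activations, and runtime buffers.
    """
    return params_billions * bytes_per_param * overhead

# A 70B model at fp16 needs roughly 140 GB before overhead, which is why
# it lands on multi-GPU servers; the same model 4-bit quantized needs
# roughly 35 GB and fits on a large Mac Studio.
print(round(estimate_inference_memory_gb(70, 2.0, overhead=1.0)))  # 140
print(round(estimate_inference_memory_gb(70, 0.5, overhead=1.0)))  # 35
```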

Image Generation Models Ingest Social Media Conversations For Training Data

Image generation models are incorporating social media comments and descriptive text as training data, as evidenced by precise textual details appearing in generated images that mirror comments from posts. This suggests the models are not only processing visual information but also integrating contextual textual descriptions from online interactions, even for unusual prompts. This expansive approach to data ingestion implies a broader definition of training data.

Anthropic’s Project Glasswing: A Model Access Strategy for Security Research

Anthropic has made its advanced Opus-beating model exclusively available to partnered security research organizations under "Project Glasswing." This selective distribution strategy is likely a response to recent concerns from credible security experts, aiming to control access to powerful AI models for responsible research and development.

Pelican GLM-5.1 Drawing and Animation by Simon Willison

Simon Willison highlighted the capabilities of the Pelican GLM-5.1 model, specifically its ability to generate and animate drawings. This observation suggests advancements in generative AI for visual content. The integration of this specific model into his workflow or its demonstrated output indicates a practical application for AI in creative digital tasks.

GLM-5.1 Sets New AI Performance Benchmarks with Extended Autonomy

GLM-5.1, an open-source model, achieves top-tier performance on coding and long-horizon tasks, ranking #1 in open source and #3 globally across prominent benchmarks. Its key innovation lies in its ability to operate autonomously for 8 hours, executing thousands of iterations to refine strategies. This enhanced capability is indicative of advancements in AI for complex problem-solving and extended operational cycles, making it suitable for applications that require sustained, independent operation.

Early Impressions of Gemma 2 vs. Qwen 1.5 Comparison

A recent social media poll by Simon Willison solicited community feedback on Gemma 2's performance against Qwen 1.5, a few days after Gemma 2's release. The poll aims to gather early impressions and comparative analysis of the two models from developers and users who have experimented with them. Results are not yet available, but the poll suggests an active evaluation phase within the AI community.

Anthropic’s Claude Mythos: A Dual-Use AI with Unprecedented Cybersecurity Capabilities Released Under Restricted Access

Anthropic has launched Project Glasswing, providing restricted access to Claude Mythos Preview, a general-purpose AI model demonstrating unprecedented cybersecurity capabilities far exceeding previous models. This restricted release strategy is due to the model’s ability to autonomously discover and exploit high-severity vulnerabilities across major operating systems and web browsers. The initiative aims to provide the software industry with time to address critical vulnerabilities before wider deployment of such powerful AI.

SQLite WAL Mode Across Docker Containers on a Single Host

SQLite's Write-Ahead Logging (WAL) mode functions efficiently across Docker containers sharing a volume on the same host. This is due to shared kernel and filesystem semantics facilitating real-time propagation of database changes and effective memory-mapped file sharing. This setup was validated using Docker Desktop for macOS, dispelling concerns about WAL shared memory conflicts.
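The setup can be sketched with two `sqlite3` connections standing in for two containers that share a volume. The original validation used actual containers on Docker Desktop for macOS; this single-process approximation only shows the WAL mechanics at the database-file level:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

# "Writer" connection (one container) enables WAL mode on the shared file.
writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, msg TEXT)")
writer.execute("INSERT INTO events (msg) VALUES ('hello from writer')")
writer.commit()

# "Reader" connection (another container sharing the volume) sees the
# committed write immediately. WAL relies on the -wal and -shm companion
# files, which behave correctly across containers on one host because
# both share a single kernel and filesystem.
reader = sqlite3.connect(path)
print(reader.execute("SELECT msg FROM events").fetchone()[0])
# → hello from writer
```

The same PRAGMA is what each containerized process would issue against the shared volume path.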

GLM-5.1 Demonstrates Advanced Code Generation and Debugging Capabilities

Z.ai's GLM-5.1, a large language model, exhibits an unexpected ability to generate complex HTML with integrated SVG and CSS animations. Furthermore, it can self-debug and correct issues in its generated code based on user feedback, showcasing advanced reasoning and code manipulation capabilities beyond simple SVG generation. The model contextualizes and regionalizes prompts, hinting at advanced implicit prompt understanding.

Scan-for-Secrets: Proactive Identification and Redaction of Sensitive Data in Codebases

scan-for-secrets is a Python tool designed to identify and optionally redact sensitive strings across various file types, including common escaped variants. It supports scanning directories or specific files, reading secrets from arguments, piped input, or a configurable file. Its core utility lies in preventing inadvertent exposure of credentials or other private data before sharing code or logs.
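The "escaped variants" idea can be sketched in a few lines: for each known secret, also search for the forms it takes after JSON string-escaping or URL-encoding. This is a minimal illustration of the concept, not scan-for-secrets' actual implementation; the function names here are hypothetical:

```python
import json
import urllib.parse

def variants(secret: str) -> set:
    """The literal secret plus common escaped forms it might appear in."""
    return {
        secret,
        json.dumps(secret)[1:-1],    # JSON string-escaped
        urllib.parse.quote(secret),  # URL-encoded
    }

def scan_text(text: str, secrets: list) -> list:
    """Return the secrets found in text, matching any escaped variant."""
    return [s for s in secrets if any(v in text for v in variants(s))]

log = 'POST /auth body={"key": "sk-abc123"}'
print(scan_text(log, ["sk-abc123", "other-secret"]))  # ['sk-abc123']
```

A real scanner would add more variants (base64, shell-escaped) and operate per-file with redaction output.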

scan-for-secrets 0.3 Released for Pre-Sharing Secret Detection in Files

Simon Willison released version 0.3 of scan-for-secrets, a tool designed to detect secrets in files before they are shared publicly. It scans for common credentials and tokens to prevent accidental leaks. The update enhances usability for developers handling sensitive code.

New Redaction Features in Simon Willison Tool

Simon Willison has released an update to his internal tools, introducing new redaction capabilities. The update includes a new command-line option for interactive redaction and a Python function for programmatic redaction. These features enhance the utility for handling sensitive information within files.

Google's AI Edge Gallery: On-device Gemma Models with Tool Calling on iOS

Google has released an official iOS app, "Google AI Edge Gallery," enabling on-device execution of Gemma 4 and Gemma 3 models. The app showcases local model capabilities for tasks like image Q&A and audio transcription, and features a "skills" demo for tool calling against HTML-based widgets. This marks a significant step for vendor-supported on-device AI.

datasette-ports 0.2 Enables Discovery of Running Datasette Instances and Their Ports

datasette-ports 0.2 is a new release that discovers all currently running Datasette instances on a system and lists their exposed ports. This tool facilitates management and interaction with multiple Datasette servers. It provides actionable output for technical workflows involving Datasette deployments.

Datasette-Ports: Streamlining Local Datasette Instance Management

The Datasette-Ports tool addresses the common issue of managing multiple, locally running Datasette instances. By providing a command-line utility to list all active instances and their associated ports, databases, and plugins, it significantly improves developer workflow. This tool is especially valuable for developers working with various databases and in-development plugins across numerous terminal windows, as it centralizes instance discovery and overview.
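One plausible discovery approach is probing localhost ports over HTTP and identifying Datasette by its introspection endpoint. This is a rough stand-in under that assumption; the real tool's discovery mechanism is not described in these posts and may differ:

```python
import urllib.error
import urllib.request

def probe_datasette_ports(ports=range(8001, 8011), timeout=0.3):
    """Return localhost ports that respond to Datasette's
    /-/versions.json introspection endpoint."""
    found = []
    for port in ports:
        url = f"http://127.0.0.1:{port}/-/versions.json"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    found.append(port)
        except (urllib.error.URLError, OSError):
            continue  # nothing listening, or not a Datasette instance
    return found

print(probe_datasette_ports())  # e.g. [8001, 8003] if two instances run
```

Listing the databases and plugins of each instance would then be a matter of fetching `/-/databases.json` and `/-/plugins.json` from each discovered port.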

Claude Code Terminal Output Cleaning Tool

Simon Willison developed a specialized web tool to address the common issue of extraneous whitespace and prompt characters (❯) when copying code snippets from the Claude Code terminal application. This tool streamlines the process of obtaining clean, usable code by automatically removing these artifacts and reformatting wrapped lines. It is designed for developers who frequently interact with Claude Code and require efficient code transfer.

datasette-ports 0.1 Released to Detect and List Ports of Running Datasette Instances

Simon Willison released datasette-ports 0.1, a tool that identifies all currently running Datasette instances on a system and outputs their ports. This enables quick discovery of active Datasette servers without manual port scanning or configuration checks. Targeted at Datasette users managing multiple local instances.

Datasette Ports Tool Now Independent

The `datasette-ports` tool, which identifies running Datasette instances and their active ports, has been made standalone. It no longer requires a Datasette installation to function, enhancing its usability for developers. The tool can be executed via `uvx datasette-ports`, and it still works as a Datasette plugin providing the `datasette ports` command.

Claude Code Paste Tool Cleans Terminal Output by Removing Prompts and Fixing Whitespace

Simon Willison's cleanup-claude-code-paste tool processes terminal output copied from Claude Code and pasted into the tool, stripping ❯ prompts, correcting wrapped-line whitespace, and joining fragmented lines into clean, readable text. It targets common formatting issues from terminal copy-pastes to improve the usability of copied code or output. The tool outputs "Cleaned output:" followed by the processed text.
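The cleanup can be sketched with a simple heuristic: strip the prompt marker, and treat any line that is exactly as long as the terminal width as hard-wrapped, joining it with the next line. This is a sketch of the behavior described, not the tool's actual algorithm:

```python
def clean_terminal_paste(text: str, width: int = 80) -> str:
    """Strip '❯ ' prompt markers and rejoin hard-wrapped lines.
    Heuristic: a line exactly at the terminal width continues onto
    the next line."""
    out, buf = [], ""
    for raw in text.splitlines():
        line = raw.removeprefix("❯ ")
        if len(raw) == width:  # hard-wrapped: joins with the next line
            buf += line
        else:
            out.append(buf + line.rstrip())
            buf = ""
    if buf:
        out.append(buf.rstrip())
    return "\n".join(out)

# A prompt line plus a command wrapped at column 20:
sample = "❯ cat notes.txt\n" + "x" * 20 + "\nrest"
print(clean_terminal_paste(sample, width=20))
```

The real tool runs in the browser, but the same line-by-line pass applies.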

Simon Willison Opposes Bans on Claude's Non-Interactive Prompt Flag

Simon Willison explicitly rejects policies prohibiting the use of `claude -p`, the Claude Code flag for non-interactive (print-mode) prompt execution. This stance implies endorsement of advanced CLI features for efficient AI model interaction. Technical users should note its utility in scripted, high-throughput prompting workflows.

Anthropic's Claude Filters System Prompts for "OpenClaw" String, Blocks or Surcharges Usage

Anthropic's Claude model detects specific text like "A personal assistant running inside OpenClaw" in system prompts and either blocks access or applies extra billing charges. This filtering was empirically confirmed via testing, as demonstrated in a screenshot shared by Florian Kluge. The practice raises concerns over discriminatory billing based on prompt content, highlighted in discussions around first-party harness usage.

Uncertainty on OpenCode's Implementation: System Prompt Filter or API Key Usage?

Simon Willison inquires whether OpenCode was implemented as a system prompt filter, expressing an assumption that it relates to API key usage instead. This highlights a lack of clarity in OpenCode's technical deployment mechanism. The question seeks confirmation on the specific implementation approach for this feature or tool.

Exact String Matching in System Prompts Mitigates False Positives for LLM Safety Triggers

Simon Willison highlights a benefit of using exact string matching for detecting system prompt references in Claude Code: it prevents accidental triggers from benign mentions of strings like "OpenClaw". This approach ensures reliability by avoiding intermittent failures in legitimate usage. The observation underscores precise matching as a robust safeguard in LLM prompt engineering.

Anthropic Blocks Third-Party Apps from Claude Max via System Prompt Filtering

Anthropic restricts access to its high-tier Claude Max plan by detecting specific strings in third-party system prompts, such as 'A personal assistant running inside OpenClaw.', triggering 400 errors for apps like OpenClaw. While Simon Willison accepted their prior cost-optimization rationale for internal use, he views prompt-based filtering as excessive. This follows complaints about tiered billing tied to system prompt content.

Anthropic Claude Max Plan Blocks Exact "OpenClaw" System Prompt String with 400 Error

Anthropic's Claude Max plan enforces a precise block on the system prompt string "OpenClaw", triggering a 400 error citing third-party app usage limits. This behavior activates only for that exact string, as confirmed by targeted tests. The restriction appears designed to prevent specific third-party integrations or jailbreaks.

Anthropic Blocks Third-Party Claude Apps via Exact System Prompt Matching, Triggering Extra Billing

Anthropic now detects and blocks third-party harnesses like OpenClaw by exact string matching on specific system prompts such as 'A personal assistant running inside OpenClaw.', resulting in 400 errors and billing under extra usage tiers outside plan limits. This extends their prior reservation of the Claude Max plan for first-party use, despite accepted cost-optimization claims. Exact matching provides a workaround but raises concerns over prompt-based filtering and potential billing discrimination.
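The exact-match behavior reported across these posts can be sketched in a few lines. Only the precise blocked string triggers the 400-style rejection; any other mention of "OpenClaw" passes, which is also why benign mentions never misfire. This is a sketch of the observed behavior, not Anthropic's implementation:

```python
BLOCKED_EXACT = {"A personal assistant running inside OpenClaw."}

def check_system_prompt(system_prompt: str):
    """Exact string matching: reject only if the prompt is exactly a
    blocked string, so substrings and benign mentions never trigger."""
    if system_prompt in BLOCKED_EXACT:
        return 400, "third-party app usage not covered by this plan"
    return 200, "ok"

print(check_system_prompt("A personal assistant running inside OpenClaw."))
# → (400, 'third-party app usage not covered by this plan')
print(check_system_prompt("Tell me about OpenClaw."))
# → (200, 'ok')
```

Exact matching is also what makes the reported workaround possible: changing a single character in the system prompt defeats the filter.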

uvx's Central Role in Workflow Raises Dependency Concerns, But Traditional Installs Remain Comparable

Simon Willison notes his workflow's heavy reliance on uvx and the risk that such dependence creates. He argues that recommending `pip install` or `uv tool install` for scan-for-secrets, followed by `--help`, is not meaningfully different from using uvx. This underscores ongoing debates in Python tooling about ephemeral versus persistent installs.

Hospital Deserts Generate 600K Unclassified Weekly Messages Potentially Overlooked as Healthcare Data

Simon Willison questions whether 600,000 weekly messages from hospital deserts are categorized as healthcare-related. These messages risk being misclassified, potentially distorting healthcare analytics in underserved areas. Accurate classification is critical for technical processing of regional health data streams.

README-Driven Development Enables Rapid Tool Prototyping with AI

Simon Willison developed a Python CLI tool, scan-for-secrets, by first crafting a detailed README specifying its exact functionality, then feeding it into Claude Code to generate the implementation. This README-driven development approach streamlined building a secret-scanning utility for log files and similar content. The resulting tool is accessible via uvx and documented on his blog and GitHub.

Simon Willison Releases Python CLI Tool for Detecting Secrets in Log Files via README-Driven Development

Simon Willison developed scan-for-secrets, a Python CLI tool that scans folders for leaked secrets like API keys in log files before sharing. The tool is invoked via uvx scan-for-secrets --help, with full details in its GitHub README and a blog post. It was built using README-driven development, where a detailed README spec was fed into Claude Code for implementation.

Simon Willison Blog Post: April 2026 Overview

This blog post from Simon Willison, dated April 5th, 2026, acts as a brief overview or summary page, referencing several other articles published around the same time. The primary insight is the aggregation of content, including a sponsored message and links to specific articles regarding AI safety, cybersecurity, and agentic engineering discussions.

AI Accelerates Decades-Long Software Dreams from Vision to Reality in Months

Simon Willison wanted to build a particular project for eight years, then built it in just three months using AI tools. This demonstrates AI's capacity to drastically compress development timelines for complex software by assisting with coding, debugging, and iteration. The post highlights a paradigm shift in which longstanding technical ambitions become feasible rapidly with current AI capabilities.

Browser-Based SQLite SQL Parsing with Syntaqlite and Pyodide

Syntaqlite provides a full SQLite SQL parser, formatter, validator, and language server, leveraging SQLite's native grammar and tokenizer. This online playground executes syntaqlite entirely in the browser via Pyodide, enabling client-side processing of SQL inputs. Users can generate formatted SQL, AST, JSON schemas, diagnostics, and token streams without server dependencies.
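Syntaqlite's own API is not shown in this summary, but the kind of client-side validation it performs can be approximated with Python's built-in `sqlite3` module by asking an in-memory database to EXPLAIN a statement without executing it. This is a rough stand-in, not syntaqlite's actual interface:

```python
import sqlite3

def validate_sql(sql):
    """Check SQLite SQL without running it: EXPLAIN compiles the
    statement, so syntax errors surface without side effects."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

print(validate_sql("SELECT 1 + 1"))  # (True, None)
print(validate_sql("SELEC 1"))       # (False, '...syntax error')
```

In the playground the same idea runs in the browser because Pyodide ships SQLite compiled to WebAssembly.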

AI as a Prototyping Accelerator, Not an Architectural Designer

AI excels at accelerating the initial prototyping phase of software development by handling tedious, low-level coding tasks. However, relying on AI for high-level architectural design can lead to inefficient designs, increased procrastination on critical decisions, and a potentially more convoluted development process. Human expertise remains crucial for robust, long-term architectural planning and decision-making.

Simon Willison Launches Blog Tag to Track Surging AI-Powered Security Research Trend

Simon Willison has initiated a dedicated blog tag for AI-powered security research, noting its current prominence. The tag already contains 11 posts. This reflects growing interest and activity at the intersection of AI and security research.

Expert Software Engineers Reach Cognitive Limits Managing Multiple AI Coding Agents

Lenny Rachitsky, with 25 years of software engineering experience, finds effectively using coding agents mentally exhausting, hitting cognitive limits after running four in parallel by 11am. This requires developing new personal skills to manage human cognition constraints without reviewing every agent action. The challenge highlights the need for responsible practices to prevent burnout while leveraging AI tools.

Gemma 4's Small Models Enable Local Audio Processing on Macs

Gemma 4's two smallest variants support audio understanding capabilities, including ASR and speech-to-translated text. Simon Willison seeks a recipe to run these models (E2B or E4B) against audio files locally on Macs. No established method is confirmed in the post.

The Automation of Zero-Day Discovery via Frontier LLM Agents

Frontier LLM agents are transitioning vulnerability research from a manual expert process to an automated search problem. By leveraging embedded knowledge of bug classes and massive cross-code correlations, agents can iteratively solve for reachability and exploitability with exhaustive persistence. This represents a step-function increase in zero-day discovery capabilities rather than incremental improvement.

Sophisticated Social Engineering Led to Axios Supply Chain Attack

A recent supply chain attack on Axios was the result of a highly sophisticated social engineering campaign directly targeting a maintainer. The attackers impersonated a company founder, created a convincing fake Slack workspace, and scheduled a video meeting where the maintainer was prompted to install a Remote Access Trojan (RAT). This RAT then stole credentials, enabling the publication of a malicious package.

Local LLM Execution Challenges

The author expresses surprise at a previously unnoticed detail, then asks about running a language model locally, specifically whether existing tools like LM Studio or Ollama can handle the task. This points to either practical limitations in current local LLM deployment tooling or gaps in user awareness of its features. The core insight is the uncertainty users face when attempting to run advanced language models in local environments.

Pelicans Generated for Gemma 4 Models Using Local and Cloud Inference

Simon Willison generated Pelicans for Gemma 4 variants E2B, E4B, 26B-A4B, and 31B. The first three were produced locally on a laptop with LM Studio, while the 31B model failed locally and required the Gemini API. This demonstrates feasible local inference for most Gemma 4 sizes on consumer hardware with cloud fallback for largest variants.

Gemma 4: Google DeepMind's New Efficient Multimodal LLMs

Google DeepMind has released Gemma 4, a new series of Apache 2.0 licensed LLMs emphasizing high intelligence-per-parameter. The models (E2B, E4B, 31B, and a 26B-A4B Mixture-of-Experts) are multimodal, supporting vision and audio inputs, with a focus on efficient on-device deployment. The release highlights a growing trend toward smaller, more capable models in AI research.

AI Agents Drive Software Engineering Shift to Ambition and Risk

AI agents have fundamentally reshaped software engineering, making code generation exceptionally cheap and enabling rapid prototyping. This shift amplifies the capabilities of experienced engineers, allowing them to tackle more ambitious projects, but leaves mid-career professionals in a precarious position. The ease of code generation introduces new security vulnerabilities, particularly "lethal trifecta" scenarios, where agents with access to private data and external communication channels are exposed to malicious instructions, raising concerns about potential large-scale failures similar to the Challenger disaster.

AI Inflection Point Redefines Software Engineering Paradigms

The rapid advancement of AI models, particularly in coding capabilities, has created a significant inflection point in software engineering. This shift has accelerated prototyping, moved bottlenecks from implementation to testing, and fundamentally altered the nature of coding work. Experienced engineers leverage AI as an amplifier, while mid-career professionals face challenges in adapting to these new paradigms.

LLM Vulnerabilities Preclude Certain Systemic Guarantees

The author posits that a particular objective is unattainable, citing research from llm-attacks.org. The referenced material likely addresses vulnerabilities or fundamental constraints in Large Language Model (LLM) security or steering that preclude the desired outcome.

Distilling Victorian Persona via Synthetic SFT: The Mr. Chatterbox Nanochat Model

Mr. Chatterbox is a specialized 2GB nanochat model trained from scratch on a corpus of 28,000 Victorian-era texts. The development pipeline leveraged synthetic data distillation from Claude Haiku and GPT-4o-mini for supervised fine-tuning (SFT) to optimize conversational capabilities without high annotation costs.

New npm Supply Chain Attack Targets Widely Used Axios Package

A critical supply chain attack has been identified, targeting the `axios` npm package, which boasts over 100 million weekly downloads. The attack leverages a newly introduced dependency, `plain-crypto-js@4.2.1`, acting as an obfuscated dropper/loader. This malware exhibits sophisticated evasion techniques and executes malicious shell commands, highlighting a significant threat to development environments.

Challenges in Local LLM Agent Performance

Local Large Language Model (LLM) agents face significant performance hurdles due to a fragmented and fragile development ecosystem. The complexity arises from diverse components like model chat templates, prompt construction, and inference mechanisms, often developed by different entities. This lack of integration leads to subtle, recurring bugs and inconsistencies, making reliable performance difficult to achieve despite ongoing improvements.
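The chat-template fragility described here is easy to demonstrate: two harnesses that render the same conversation but disagree on one detail, such as a trailing newline, produce different prompts and therefore different token streams. A toy illustration using a ChatML-style template (the template format here is illustrative, not any specific model's):

```python
def render_chat(messages, eos="</s>", trailing_newline=False):
    """Render a conversation with a ChatML-style template. The
    trailing_newline flag stands in for a subtle disagreement
    between two independently developed harnesses."""
    parts = []
    for m in messages:
        sep = "\n" if trailing_newline else ""
        parts.append(f"<|{m['role']}|>{m['content']}{eos}{sep}")
    parts.append("<|assistant|>")
    return "".join(parts)

msgs = [{"role": "user", "content": "hi"}]
a = render_chat(msgs)
b = render_chat(msgs, trailing_newline=True)
print(a == b)  # False: the prompts differ by a single newline
```

A model fine-tuned on one rendering can degrade measurably when served with the other, which is exactly the class of subtle, recurring bug the post describes.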
