Simon Willison Denies X Access
Simon Willison has not been granted access to the X platform, contrary to speculation. This clarifies the current status of his platform access.
Chronological feed of everything captured from Simon Willison.
The potential for "streaming experts" within a Mixture-of-Experts (MoE) model suggests a capability to dynamically allocate computational resources. This approach could enable more efficient processing by engaging specialized expert models only when relevant to the input stream. It implies an architectural evolution towards adaptive and on-demand expert utilization in large language models.
Running large language models (LLMs) for inference, especially those with high parameter counts, typically necessitates significant GPU memory. While some quantized models can operate on consumer-grade hardware like a 256GB or 512GB Mac Studio, larger, unquantized models predominantly require high-end NVIDIA GPU servers to ensure sufficient memory and computational throughput.
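As a back-of-the-envelope illustration of these memory requirements, the dominant term is parameter count times bytes per weight; activations and KV cache add more on top. The model size and quantization levels below are generic examples, not figures from the post:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory (decimal GB) needed just to hold the weights."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A hypothetical 70B-parameter model at two precisions:
fp16_gb = weight_memory_gb(70, 16)  # 140.0 GB -- needs a multi-GPU server
q4_gb = weight_memory_gb(70, 4)     # 35.0 GB  -- fits a large Mac Studio
```

This is why quantized variants can run on high-memory consumer machines while unquantized weights of the same model cannot.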
Image generation models are incorporating social media comments and descriptive text as training data, as evidenced by precise textual details appearing in generated images that mirror comments from posts. This suggests the models are not only processing visual information but also integrating contextual text from online interactions, even for unusual prompts. This expansive approach to data ingestion implies a broader definition of training data than curated image-caption pairs.
Anthropic has made its advanced Opus-beating model exclusively available to partnered security research organizations under "Project Glasswing." This selective distribution strategy is likely a response to recent concerns from credible security experts, aiming to control access to powerful AI models for responsible research and development.
Simon Willison highlighted the GLM-5.1 model's performance on his pelican benchmark, specifically its ability to generate and animate pelican drawings. This observation suggests advancements in generative AI for visual content, and the demonstrated output indicates a practical application for AI in creative digital tasks.
GLM-5.1, an open-source model, achieves top-tier performance on coding and long-horizon tasks, ranking #1 in open source and #3 globally across prominent benchmarks. Its key innovation lies in its ability to operate autonomously for 8 hours, executing thousands of iterations to refine strategies. This enhanced capability is indicative of advancements in AI for complex problem-solving and extended operational cycles, making it suitable for applications that require sustained, independent operation.
A recent social media poll by Simon Willison solicited community feedback on Gemma 2's performance against Qwen 1.5, a few days after Gemma 2's release. The poll aims to gather early impressions and comparative analysis from developers and users who have experimented with the two models. Results are not yet available, but the post suggests an active evaluation phase within the AI community.
Anthropic has launched Project Glasswing, providing restricted access to Claude Mythos Preview, a general-purpose AI model demonstrating unprecedented cybersecurity capabilities far exceeding previous models. This restricted release strategy is due to the model’s ability to autonomously discover and exploit high-severity vulnerabilities across major operating systems and web browsers. The initiative aims to provide the software industry with time to address critical vulnerabilities before wider deployment of such powerful AI.
SQLite's Write-Ahead Logging (WAL) mode functions efficiently across Docker containers sharing a volume on the same host. This is due to shared kernel and filesystem semantics facilitating real-time propagation of database changes and effective memory-mapped file sharing. This setup was validated using Docker Desktop for macOS, dispelling concerns about WAL shared memory conflicts.
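The behaviour described can be reproduced in miniature with two sqlite3 connections to the same WAL-mode file, standing in for two containers sharing a volume. This is a single-process sketch of the semantics, not a container-level test:

```python
import os
import sqlite3
import tempfile

# Two connections to one database file stand in for two Docker
# containers sharing a volume on the same host.
path = os.path.join(tempfile.mkdtemp(), "shared.db")

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")   # creates shared.db-wal / -shm
writer.execute("CREATE TABLE events (msg TEXT)")
writer.execute("INSERT INTO events VALUES ('hello from container A')")
writer.commit()

reader = sqlite3.connect(path)              # the second "container"
rows = reader.execute("SELECT msg FROM events").fetchall()
print(rows)  # [('hello from container A',)]
```

The committed write is immediately visible to the second connection because both share the same kernel, filesystem, and WAL shared-memory file, which is exactly the situation for same-host containers mounting one volume.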
Z.ai's GLM-5.1, a large language model, exhibits an unexpected ability to generate complex HTML with integrated SVG and CSS animations. Furthermore, it can self-debug and correct issues in its generated code based on user feedback, showcasing advanced reasoning and code manipulation capabilities beyond simple SVG generation. The model contextualizes and regionalizes prompts, hinting at advanced implicit prompt understanding.
scan-for-secrets is a Python tool designed to identify and optionally redact sensitive strings across various file types, including common escaped variants. It supports scanning directories or specific files, reading secrets from arguments, piped input, or a configurable file. Its core utility lies in preventing inadvertent exposure of credentials or other private data before sharing code or logs.
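The post describes the tool's behaviour rather than its code; a minimal sketch of the core idea, matching known secrets plus common escaped variants and redacting them, might look like the following. The function names and the escaping rule are illustrative assumptions, not scan-for-secrets' actual implementation:

```python
def variants(secret: str) -> list[str]:
    """A secret plus a common escaped form it might take in JSON logs."""
    return [secret, secret.replace('"', '\\"')]

def redact(text: str, secrets: list[str], marker: str = "[REDACTED]") -> str:
    """Replace every occurrence of each secret (longest variant first)."""
    for secret in secrets:
        for v in sorted(set(variants(secret)), key=len, reverse=True):
            text = text.replace(v, marker)
    return text

log_line = 'POST /v1/chat "Authorization: Bearer sk-abc123"'
print(redact(log_line, ["sk-abc123"]))
# POST /v1/chat "Authorization: Bearer [REDACTED]"
```

The real tool adds file and directory traversal, more escaped variants, and several ways to supply the secret list (arguments, piped input, or a config file).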
Simon Willison released version 0.3 of scan-for-secrets, a tool designed to detect secrets in files before they are shared publicly. It scans for common credentials and tokens to prevent accidental leaks. The update enhances usability for developers handling sensitive code.
Simon Willison has released an update to his internal tools, introducing new redaction capabilities. The update includes a new command-line option for interactive redaction and a Python function for programmatic redaction. These features enhance the utility for handling sensitive information within files.
Google has released an official iOS app, "Google AI Edge Gallery," enabling on-device execution of Gemma 4 and Gemma 3 models. The app showcases local model capabilities for tasks like image Q&A and audio transcription, and features a "skills" demo for tool calling against HTML-based widgets. This marks a significant step for vendor-supported on-device AI.
datasette-ports 0.2 is a new release that discovers all currently running Datasette instances on a system and lists their exposed ports. This tool facilitates management and interaction with multiple Datasette servers. It provides actionable output for technical workflows involving Datasette deployments.
The Datasette-Ports tool addresses the common issue of managing multiple, locally running Datasette instances. By providing a command-line utility to list all active instances and their associated ports, databases, and plugins, it significantly improves developer workflow. This tool is especially valuable for developers working with various databases and in-development plugins across numerous terminal windows, as it centralizes instance discovery and overview.
Simon Willison developed a specialized web tool to address the common issue of extraneous whitespace and prompt characters (❯) when copying code snippets from the Claude Code terminal application. This tool streamlines the process of obtaining clean, usable code by automatically removing these artifacts and reformatting wrapped lines. It is designed for developers who frequently interact with Claude Code and require efficient code transfer.
Simon Willison released datasette-ports 0.1, a tool that identifies all currently running Datasette instances on a system and outputs their ports. This enables quick discovery of active Datasette servers without manual port scanning or configuration checks. Targeted at Datasette users managing multiple local instances.
The `datasette-ports` tool, which identifies running Datasette instances and their active ports, has been made standalone. It no longer requires a direct Datasette installation to function, enhancing its usability for developers. The tool can be executed via `uvx datasette-ports`, though its plugin functionality within Datasette for the `datasette ports` command remains.
Simon Willison's cleanup-claude-code-paste tool processes terminal output pasted into Claude, stripping ❯ prompts, correcting wrapped-line whitespace, and joining fragmented lines into clean, readable text. It targets common formatting issues from terminal copy-pastes to improve code or output usability in AI interfaces. The tool outputs "Cleaned output:" followed by the processed text.
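The cleanup steps described (stripping ❯ prompt markers and rejoining hard-wrapped lines) can be sketched as a small function. The wrap-width heuristic and the function name are assumptions; the actual tool may work differently:

```python
def clean_paste(text: str, wrap_width: int = 80) -> str:
    """Strip leading ❯ prompt markers and rejoin lines the terminal
    hard-wrapped (heuristic: a line that fills the full terminal width
    continues on the next line)."""
    out, buffer = [], ""
    for raw in text.splitlines():
        line = raw[2:] if raw.startswith("❯ ") else raw
        if len(raw) >= wrap_width:   # likely wrapped mid-line by the terminal
            buffer += line
        else:
            out.append(buffer + line)
            buffer = ""
    if buffer:
        out.append(buffer)
    return "\n".join(l.rstrip() for l in out)

sample = "❯ echo hello\n" + "x" * 80 + "\ncontinued"
cleaned = clean_paste(sample)  # prompt stripped, wrapped line rejoined
```

A width-based heuristic like this is imperfect (a line can legitimately be exactly the terminal width), which is one reason an interactive tool is useful for eyeballing the result.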
Simon Willison explicitly rejects policies prohibiting the use of "claude -p", the Claude Code flag for non-interactive (print-mode) prompt execution. This stance implies endorsement of advanced CLI features for efficient AI model interaction. Technical users should note its utility in scripted, high-throughput prompting workflows.
Anthropic's Claude model detects specific text like "A personal assistant running inside OpenClaw" in system prompts and either blocks access or applies extra billing charges. This filtering was empirically confirmed via testing, as demonstrated in a screenshot shared by Florian Kluge. The practice raises concerns over discriminatory billing based on prompt content, highlighted in discussions around first-party harness usage.
Simon Willison asks whether the OpenCode restriction was implemented as a system prompt filter, noting he had assumed it related to API key usage instead. This highlights a lack of clarity about the mechanism's technical deployment, and the question seeks confirmation of the specific implementation approach.
Simon Willison highlights a benefit of using exact string matching for detecting system prompt references in Claude Code: it prevents accidental triggers from benign mentions of strings like "OpenClaw". This approach ensures reliability by avoiding intermittent failures in legitimate usage. The observation underscores precise matching as a robust safeguard in LLM prompt engineering.
Anthropic restricts access to its high-tier Claude Max plan by detecting specific strings in third-party system prompts, such as 'A personal assistant running inside OpenClaw.', triggering 400 errors for apps like OpenClaw. While Simon Willison accepted their prior cost-optimization rationale for internal use, he views prompt-based filtering as excessive. This follows complaints about tiered billing tied to system prompt content.
Anthropic's Claude Max plan enforces a precise block on the system prompt string "OpenClaw", triggering a 400 error citing third-party app usage limits. This behavior activates only for that exact string, as confirmed by targeted tests. The restriction appears designed to prevent specific third-party integrations or jailbreaks.
Anthropic now detects and blocks third-party harnesses like OpenClaw by exact string matching on specific system prompts such as 'A personal assistant running inside OpenClaw.', resulting in 400 errors and billing under extra usage tiers outside plan limits. This extends their prior reservation of the Claude Max plan for first-party use, despite accepted cost-optimization claims. Exact matching provides a workaround but raises concerns over prompt-based filtering and potential billing discrimination.
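The exact-match behaviour reported across these posts can be illustrated with a toy filter. The marker string is quoted from the post; the surrounding logic is purely illustrative, not Anthropic's actual implementation:

```python
BLOCKED_MARKERS = {
    "A personal assistant running inside OpenClaw.",  # exact string from the post
}

def classify(system_prompt: str) -> str:
    """Exact substring match: a benign mention of 'OpenClaw' in other
    wording does not trigger the block."""
    for marker in BLOCKED_MARKERS:
        if marker in system_prompt:
            return "blocked"   # observed in practice as a 400 error or extra billing
    return "allowed"

print(classify("A personal assistant running inside OpenClaw. Be concise."))  # blocked
print(classify("I read about OpenClaw yesterday."))                           # allowed
```

Exact matching explains both observations in the thread: legitimate prompts that merely mention the string's components pass through, while the specific harness prompt is reliably caught, and trivially evaded by changing a character.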
Simon Willison expresses concern over how central uvx has become to his workflow and the risk that dependence creates. He argues that recommending pip install or uv tool install for scan-for-secrets, followed by --help, is not meaningfully different from using uvx. This underscores ongoing debates in Python tooling about ephemeral versus persistent installs.
Simon Willison questions whether 600,000 weekly messages from hospital deserts are categorized as healthcare-related. These messages risk being misclassified, potentially distorting healthcare analytics in underserved areas. Accurate classification is critical for technical processing of regional health data streams.
Simon Willison developed a Python CLI tool, scan-for-secrets, by first crafting a detailed README specifying its exact functionality, then feeding it into Claude Code to generate the implementation. This README-driven development approach streamlined building a secret-scanning utility for log files and similar content. The resulting tool is accessible via uvx and documented on his blog and GitHub.
Simon Willison developed scan-for-secrets, a Python CLI tool that scans folders for leaked secrets like API keys in log files before sharing. The tool is invoked via uvx scan-for-secrets --help, with full details in its GitHub README and a blog post. It was built using README-driven development, where a detailed README spec was fed into Claude Code for implementation.
This blog post from Simon Willison, dated April 5th, 2026, acts as a brief overview or summary page, referencing several other articles published around the same time. The primary insight is the aggregation of content, including a sponsored message and links to specific articles regarding AI safety, cybersecurity, and agentic engineering discussions.
Simon Willison spent eight years desiring a project but built it in just three months using AI tools. This demonstrates AI's capacity to drastically compress development timelines for complex software by automating coding, debugging, and iteration. The post highlights a paradigm shift where longstanding technical ambitions become feasible rapidly with current AI capabilities.
Syntaqlite provides a full SQLite SQL parser, formatter, validator, and language server, leveraging SQLite's native grammar and tokenizer. This online playground executes syntaqlite entirely in the browser via Pyodide, enabling client-side processing of SQL inputs. Users can generate formatted SQL, AST, JSON schemas, diagnostics, and token streams without server dependencies.
AI excels at accelerating the initial prototyping phase of software development by handling tedious, low-level coding tasks. However, relying on AI for high-level architectural design can lead to inefficient designs, increased procrastination on critical decisions, and a potentially more convoluted development process. Human expertise remains crucial for robust, long-term architectural planning and decision-making.
Simon Willison has initiated a dedicated blog tag for AI-powered security research, noting its current prominence. The tag already contains 11 posts. This reflects growing interest and activity at the intersection of AI and security research.
Lenny Rachitsky, with 25 years of software engineering experience, finds effectively using coding agents mentally exhausting, hitting cognitive limits after running four in parallel by 11am. This requires developing new personal skills to manage human cognition constraints without reviewing every agent action. The challenge highlights the need for responsible practices to prevent burnout while leveraging AI tools.
Gemma 4's two smallest variants support audio understanding capabilities, including ASR and speech-to-translated text. Simon Willison seeks a recipe to run these models (E2B or E4B) against audio files locally on Macs. No established method is confirmed in the post.
Frontier LLM agents are transitioning vulnerability research from a manual expert process to an automated search problem. By leveraging embedded knowledge of bug classes and massive cross-code correlations, agents can iteratively solve for reachability and exploitability with exhaustive persistence. This represents a step-function increase in zero-day discovery capabilities rather than incremental improvement.
A recent supply chain attack on Axios was the result of a highly sophisticated social engineering campaign directly targeting a maintainer. The attackers impersonated a company founder, created a convincing fake Slack workspace, and scheduled a video meeting where the maintainer was prompted to install a Remote Access Trojan (RAT). This RAT then stole credentials, enabling the publication of a malicious package.
The user expresses surprise at a previously unnoticed detail, then asks about running a language model locally, questioning whether existing tools like LM Studio or Ollama can handle the task. This points either to limitations in current local LLM deployment tools or to gaps in user awareness of their features, and reflects the practical uncertainty users face when running advanced models locally.
Simon Willison generated pelican drawings for the Gemma 4 variants E2B, E4B, 26B-A4B, and 31B. The first three were produced locally on a laptop with LM Studio, while the 31B model failed locally and required the Gemini API. This demonstrates feasible local inference for most Gemma 4 sizes on consumer hardware, with a cloud fallback for the largest variant.
Google DeepMind has released Gemma 4, a new series of Apache 2.0 licensed LLMs, emphasizing high intelligence-per-parameter. These models, including 2B, 4B, 31B, and a 26B-A4B Mixture-of-Experts, are multimodal, supporting vision and audio inputs, with a focus on efficient on-device deployment. The release highlights a growing trend towards smaller, more capable models in AI research.
AI agents have fundamentally reshaped software engineering, making code generation exceptionally cheap and enabling rapid prototyping. This shift amplifies the capabilities of experienced engineers, allowing them to tackle more ambitious projects, but leaves mid-career professionals in a precarious position. The ease of code generation also introduces new security vulnerabilities, particularly "lethal trifecta" scenarios in which agents with access to private data and external communication channels are exposed to malicious instructions. This raises concerns about potential large-scale failures similar to the Challenger disaster.
The rapid advancement of AI models, particularly in coding capabilities, has created a significant inflection point in software engineering. This shift has accelerated prototyping, moved bottlenecks from implementation to testing, and fundamentally altered the nature of coding work. Experienced engineers leverage AI as an amplifier, while mid-career professionals face challenges in adapting to these new paradigms.
The author posits that a particular objective is unattainable, citing research from llm-attacks.org. The referenced material likely addresses vulnerabilities or fundamental constraints in Large Language Model (LLM) security or steering that preclude the desired outcome.
Mr. Chatterbox is a specialized 2GB nanochat model trained from scratch on a corpus of 28,000 Victorian-era texts. The development pipeline leveraged synthetic data distillation from Claude Haiku and GPT-4o-mini for supervised fine-tuning (SFT) to optimize conversational capabilities without high annotation costs.
A critical supply chain attack has been identified, targeting the `axios` npm package, which boasts over 100 million weekly downloads. The attack leverages a newly introduced dependency, `plain-crypto-js@4.2.1`, acting as an obfuscated dropper/loader. This malware exhibits sophisticated evasion techniques and executes malicious shell commands, highlighting a significant threat to development environments.
Local Large Language Model (LLM) agents face significant performance hurdles due to a fragmented and fragile development ecosystem. The complexity arises from diverse components like model chat templates, prompt construction, and inference mechanisms, often developed by different entities. This lack of integration leads to subtle, recurring bugs and inconsistencies, making reliable performance difficult to achieve despite ongoing improvements.