
AI Security

AI security covers two related concerns: using artificial intelligence to strengthen cybersecurity defenses, such as malware scanning in software package repositories, and protecting AI systems themselves against new threats such as data leakage and prompt injection. Recent activity shows a surge in AI-powered security research, alongside recommendations to run local LLMs for sensitive work so as to avoid the data exposure that comes with cloud-based coding agents. As adoption of agentic AI grows in 2026, both the opportunities and the risks in this field are becoming more visible.



AI security refers to the intersection of artificial intelligence and cybersecurity. This includes using AI for security (e.g., automated threat detection and malicious package scanning) and securing AI systems themselves against attacks such as prompt injection, data exfiltration, model poisoning, and unauthorized actions by autonomous agents. [2]

AI-Powered Security Tools and Research

Package registries have begun adopting proactive AI defenses. PyPI already supports AI-powered scanning for malicious package patterns through an API available to its scanning partners. This capability enabled a suspicious package to be quarantined within one hour of publication, showing how quickly such defenses can respond to emerging supply chain attacks. [1]
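PyPI's AI-based scanning is not described in detail here. As a much simpler illustration of what "malicious package patterns" can look like, the sketch below flags calls that often appear in malicious install scripts, such as decoding an obfuscated payload and passing it to a shell. It is a rule-based heuristic written for this article, not PyPI's scanner or its partner API, and the `SUSPICIOUS_CALLS` list is a hypothetical example.

```python
import ast
from typing import Optional

# Hypothetical red-flag list: calls that often appear in malicious setup scripts.
# Illustrative heuristic only; this is NOT PyPI's AI-powered scanning.
SUSPICIOUS_CALLS = {
    "exec", "eval", "compile",
    "os.system", "subprocess.Popen", "subprocess.run", "subprocess.call",
    "base64.b64decode",
}


def dotted_name(node: ast.expr) -> Optional[str]:
    """Turn a Name/Attribute chain such as os.system into 'os.system'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, call name) pairs for suspicious calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


setup_py = """
import base64, os
payload = base64.b64decode("ZWNobyBoaQ==")
os.system(payload.decode())
"""
print(scan_source(setup_py))  # [(3, 'base64.b64decode'), (4, 'os.system')]
```

Rules like these are easy to evade, for example by importing `os.system` under another name, so on their own they are at best a first filter.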

Simon Willison launched a dedicated tag on his blog, ai-security-research, to track the growing volume of AI-powered security research; the tag already held 11 posts as of early 2026. This reflects heightened attention to using AI tools for vulnerability discovery, with examples including AI identifying hundreds of issues in open-source software. [2]

Recent 2026 reports from Cisco, Darktrace, and others emphasize AI's role in anomaly detection, automated response, and countering AI-amplified cybercrime. [web:9][web:10]

Risks of AI Systems and Agents

Cloud-based AI coding agents present significant data security risks. Anything an agent pulls into its context, including prompts and snippets of sensitive files, is sent to the model provider, so these agents should be treated like untrusted interns who have been given access to your sources. The concern applies to journalism, law firms, and healthcare. [3]

Local models, such as Qwen 3.5 running on a well-specced laptop, offer a mitigation for high-security use cases: because prompts never leave the machine, they cannot be logged by a SaaS API provider or obtained from one by subpoena. [3]
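One way to apply this advice inside a tool is to decide, per request, which backend is allowed to see the data. The sketch below assumes a hypothetical policy in which anything under `/cases` or `/sources` is sensitive and must go to a local model; the directory names, the `Request` type, and the "local"/"cloud" labels are placeholders, not part of any particular agent framework.

```python
import posixpath
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical policy: files under these roots are too sensitive for a cloud model.
SENSITIVE_ROOTS = [PurePosixPath("/cases"), PurePosixPath("/sources")]


@dataclass
class Request:
    prompt: str
    attached_files: list[str] = field(default_factory=list)


def is_sensitive(path: str) -> bool:
    # Normalize first so "/tmp/../cases/x" cannot slip past the check.
    # Note: normpath does not resolve symlinks.
    p = PurePosixPath(posixpath.normpath(path))
    return any(p == root or root in p.parents for root in SENSITIVE_ROOTS)


def choose_backend(req: Request) -> str:
    """Return "local" (on-device model) if any attachment is sensitive, else "cloud"."""
    if any(is_sensitive(f) for f in req.attached_files):
        return "local"
    return "cloud"


print(choose_backend(Request("Summarize this memo", ["/cases/smith/memo.txt"])))  # local
print(choose_backend(Request("Fix the failing test", ["/repo/app.py"])))  # cloud
```

This only inspects files the user attaches; an agent that reads files on its own would need the same check applied to every tool call that loads data into its context.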

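Broader concerns in 2026 include agentic AI as the 'next big insider risk,' since agents can behave in unintended ways or be manipulated into actions that cause security incidents. Simon Willison's 'lethal trifecta' names the most dangerous configuration: an agent that combines access to private data, exposure to untrusted content, and the ability to communicate externally can be steered by a prompt injection into exfiltrating that data. [web:9]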
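Because the risk comes from the combination, it helps to audit an agent's tools as a set rather than one at a time. The following sketch uses a hypothetical `Tool` record whose three flags mirror the three legs of the trifecta and reports which legs are missing; the agent is in the danger zone only when none are.

```python
from dataclasses import dataclass

LEGS = ("reads_private_data", "ingests_untrusted_content", "can_communicate_externally")


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool an agent can call."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_content: bool = False
    can_communicate_externally: bool = False


def missing_legs(tools: list[Tool]) -> list[str]:
    """Legs of the trifecta that no tool in the set provides."""
    return [leg for leg in LEGS if not any(getattr(t, leg) for t in tools)]


def has_lethal_trifecta(tools: list[Tool]) -> bool:
    # Dangerous only when all three legs are present across the whole tool set.
    return not missing_legs(tools)


inbox = Tool("read_inbox", reads_private_data=True, ingests_untrusted_content=True)
fetch = Tool("fetch_url", ingests_untrusted_content=True, can_communicate_externally=True)
print(has_lethal_trifecta([inbox]))         # False
print(missing_legs([inbox]))                # ['can_communicate_externally']
print(has_lethal_trifecta([inbox, fetch]))  # True
```

`fetch_url` counts as external communication because an injected instruction can hide private data in the URL it asks the agent to fetch. Removing any one leg, for instance dropping `fetch_url` from an agent that reads email, breaks the trifecta.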

2026 Trends

Industry analyses highlight AI amplifying both attacks and defenses. Key themes include the commercialization of AI-assisted cybercrime, the need for AI governance frameworks, and securing the expanding attack surface of AI agents. While AI speeds up vulnerability research, it also introduces new challenges, such as prompt injection against agents, that existing security tooling was not built to handle. [web:11][web:12]

Challenges and Mitigations

Organizations are adopting AI faster than they can secure it. Recommendations include moving sensitive workloads to local inference for data residency, placing strict controls on what agentic systems are allowed to do, and continuing to invest in AI-native security tools.
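"Strict controls on agentic systems" can take many forms. A minimal sketch, assuming a default-deny policy and hypothetical tool names, is a gate that every tool call must pass before it runs: read-only tools are allowed, side-effecting tools need explicit approval (for example, from a human), and anything unlisted is refused.

```python
from typing import Callable

# Hypothetical tool names; substitute the tools your agent framework exposes.
ALLOWED = {"read_file", "run_tests"}          # read-only, always permitted
NEEDS_APPROVAL = {"send_email", "http_post"}  # side effects, need sign-off


class ToolCallDenied(Exception):
    """Raised when the policy refuses a tool call."""


def gate(tool: str, approve: Callable[[str], bool]) -> None:
    """Return normally if `tool` may run; otherwise raise ToolCallDenied."""
    if tool in ALLOWED:
        return
    # Short-circuit: `approve` is only consulted for tools the policy knows about.
    if tool in NEEDS_APPROVAL and approve(tool):
        return
    raise ToolCallDenied(tool)  # default-deny for everything else


def deny_all(tool: str) -> bool:
    return False


gate("read_file", deny_all)  # allowed without approval
try:
    gate("delete_repo", lambda t: True)
except ToolCallDenied as exc:
    print("blocked:", exc)  # blocked: delete_repo
```

Because of short-circuit evaluation, an approver that says yes to everything still cannot enable a tool the policy does not list.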

References

  [1] PyPI Already Implements AI-Powered Malware Pattern Scanning via Partner APIs (tweet, 2026-03-24)
  [2] Simon Willison Launches Blog Tag to Track Surging AI-Powered Security Research Trend (tweet, 2026-04-04)
  [3] AI Coding Agents Risk Leaking Sensitive Data; Local Models Mitigate for High-Security Use Cases (tweet, 2026-03-21)
  [4] https://simonwillison.net/tags/ai-security-research/ (web)
  [5] https://www.darktrace.com/blog/the-year-ahead-ai-cybersecurity-trends-to-watch-in-2026 (web)
  [6] https://www.kiteworks.com/cybersecurity-risk-management/ai-cybersecurity-2026-trends-repor… (web)
  [7] https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/ (web)
  [8] https://x.com/simonw/status/2036523236729692240 (X / Twitter)
  [9] https://x.com/simonw/status/2040217884690145754 (X / Twitter)
  [10] https://x.com/simonw/status/2035421451743072647 (X / Twitter)

Temporal Concept Drift Systematically Degrades Adversarial Robustness in Android Malware Classifiers

A longitudinal study spanning over a decade of Android applications demonstrates that temporal distribution shift (concept drift) meaningfully erodes the adversarial robustness of malware detection models — not just their clean accuracy. Three deployment protocols were evaluated across multiple clas
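The study's three deployment protocols are cut off in this summary and are not reproduced here. The sketch below shows only the basic ingredient any drift-aware evaluation needs: a time-ordered split in which a model is trained on apps seen before a cutoff and tested on later ones, grouped by year so that clean or adversarial accuracy can be tracked as the test period moves away from training. The `Sample` fields are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Sample(NamedTuple):
    first_seen: date  # hypothetical field: when the app was first observed
    label: int        # 1 = malware, 0 = benign


def temporal_split(samples: list[Sample], cutoff: date) -> tuple[list[Sample], list[Sample]]:
    """Train strictly on the past and test strictly on the future.

    A random split would mix future apps into training and hide drift.
    """
    train = [s for s in samples if s.first_seen < cutoff]
    test = [s for s in samples if s.first_seen >= cutoff]
    return train, test


def periods_by_year(test: list[Sample]) -> dict[int, list[Sample]]:
    """Group test samples by year so metrics can be reported per period."""
    groups = defaultdict(list)
    for s in test:
        groups[s.first_seen.year].append(s)
    return dict(sorted(groups.items()))


samples = [Sample(date(2014, 5, 1), 0), Sample(date(2016, 1, 1), 1),
           Sample(date(2019, 3, 2), 1), Sample(date(2021, 7, 9), 0)]
train, test = temporal_split(samples, cutoff=date(2018, 1, 1))
print(len(train), list(periods_by_year(test)))  # 2 [2019, 2021]
```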

Kelpie: Zero-Query Adversarial Mimicry Attacks Expose Fundamental Weaknesses in ML-Based Binary Function Classifiers

Kelpie is a black-box, query-free adversarial framework that executes targeted mimicry attacks against ML-based binary function classifiers — a significantly stronger threat model than prior evasion work, which typically required iterative classifier queries. By applying functionality-preserving cod

ArmSSL: Robust Black-Box Watermarking for Self-Supervised Learning Pre-trained Encoders

ArmSSL is a novel watermarking framework designed for self-supervised learning (SSL) encoders. It uniquely addresses black-box verifiability and adversarial robustness, two critical unmet needs in existing SSL watermarking techniques. The framework achieves this by employing paired discrepancy enlar

Data-Free Gradient Inversion MIA Breaches FL Privacy in Hardware Assurance Using SCLL Priors

Researchers present a data-free membership inference attack (MIA) on federated learning (FL) for image segmentation models in hardware assurance, using standard cell library layouts (SCLLs) as priors to guide gradient inversion and reconstruct client images from intercepted model updates. The attack

Standard Cell Library Enables Privacy Breach in Federated Learning for Hardware Assurance

DECIFR exploits domain knowledge from standard cell library layouts to perform a two-stage membership inference attack on federated learning in hardware assurance. It uses guided gradient inversion to reconstruct client training images from model updates without auxiliary data, with reconstruction f

GAAP: A Deterministic Execution Environment That Enforces AI Agent Data Privacy Without Trusting the Model

GAAP (Guaranteed Accounting for Agent Privacy) is a proposed execution environment that enforces user-defined privacy policies on AI agents deterministically — without requiring the AI model itself to be trustworthy or attack-free. It extends Information Flow Control (IFC) with persistent data store
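GAAP's policy language is not described in this summary. The general idea of information flow control can still be sketched: every piece of data carries labels, anything derived from labeled data (including model output) inherits the union of those labels, and a deterministic check outside the model decides whether data may reach a given sink. The labels, sinks, and `POLICY` table below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    value: str
    labels: frozenset[str]  # hypothetical labels such as "medical" or "public"


def combine(*parts: Labeled) -> Labeled:
    """Derived data (including model output) inherits the union of its inputs' labels."""
    return Labeled(
        " ".join(p.value for p in parts),
        frozenset().union(*(p.labels for p in parts)),
    )


# Hypothetical user policy: which labels each sink may receive.
POLICY = {
    "user_display": {"medical", "finance", "public"},
    "third_party_api": {"public"},
}


def may_flow(sink: str, data: Labeled) -> bool:
    """Deterministic check, independent of the model: every label must be permitted."""
    allowed = POLICY.get(sink)
    if allowed is None:
        return False  # unknown sinks are denied
    return data.labels <= allowed


diagnosis = Labeled("diagnosis: flu", frozenset({"medical"}))
greeting = Labeled("Hello", frozenset({"public"}))
reply = combine(greeting, diagnosis)
print(may_flow("user_display", reply))     # True
print(may_flow("third_party_api", reply))  # False
```

Since the check looks only at labels attached by the runtime, a prompt-injected model can rewrite the text but cannot strip the `medical` label that blocks the flow to a third party.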

Verifiable Gradient Inversion Attack Enhances Privacy Compromise in Federated Learning

Existing gradient inversion attacks in federated learning often produce incorrect reconstructions for tabular data without a method to verify success. This paper introduces a Verifiable Gradient Inversion Attack (VGIA) that provides an explicit certificate of correctness for reconstructed samples. V
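Why gradients leak data at all is easiest to see in the textbook case of a fully connected layer with a bias. For one sample, the weight-gradient row of each output is the input scaled by that output's bias gradient, so dividing one by the other returns the input exactly. The sketch below (pure Python, squared-error loss, hypothetical helper names) performs that inversion and adds a crude consistency check: with a single sample every row yields the same reconstruction, while a batch blends samples and the rows disagree. This is a simple illustration, not VGIA, and the row-agreement test is only a heuristic stand-in for the paper's certificate of correctness.

```python
def linear_grads(W, b, xs, targets):
    """Gradients of L = 1/(2n) * sum_k ||W x_k + b - t_k||^2 w.r.t. W and b."""
    n = len(xs)
    gW = [[0.0] * len(W[0]) for _ in W]
    gb = [0.0] * len(b)
    for x, t in zip(xs, targets):
        y = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
        for i, (yi, ti) in enumerate(zip(y, t)):
            g = (yi - ti) / n  # dL/dy_i contributed by this sample
            gb[i] += g
            for j, xj in enumerate(x):
                gW[i][j] += g * xj
    return gW, gb


def invert(gW, gb, eps=1e-9):
    """For every row with a usable bias gradient, estimate x = dL/dW[i] / dL/db[i]."""
    return [[gij / gbi for gij in row] for row, gbi in zip(gW, gb) if abs(gbi) > eps]


def rows_agree(candidates, tol=1e-6):
    """Heuristic check: a single-sample update gives the same estimate from every row."""
    if not candidates:
        return False
    first = candidates[0]
    return all(abs(a - c) <= tol for cand in candidates for a, c in zip(first, cand))


W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]

# One private sample: every row recovers it (up to float rounding).
gW, gb = linear_grads(W, b, xs=[[3.0, -1.0]], targets=[[0.0, 0.0]])
print(invert(gW, gb), rows_agree(invert(gW, gb)))

# A batch of two: each row returns a different weighted blend, so the check fails.
gW, gb = linear_grads(W, b, xs=[[3.0, -1.0], [0.0, 2.0]], targets=[[0.0, 0.0]] * 2)
print(rows_agree(invert(gW, gb)))  # False
```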

Adversarial Patch Attacks Undermine Palmprint Recognition Systems

Deep palmprint recognition systems, despite their use in security-critical applications, are vulnerable to physically realizable adversarial patch attacks. The CAAP framework demonstrates that a universal, capture-aware patch, particularly with a cross-shaped topology, can effectively disrupt recogn

Anthropic Launches Project Glasswing for AI-Powered Software Vulnerability Detection

Anthropic's Project Glasswing leverages Claude Mythos Preview, a frontier AI model, to identify and remediate critical software vulnerabilities. This initiative, supported by major tech industry partners, aims to proactively secure essential infrastructure. The project emphasizes the defensive poten

AI Coding Agents Risk Leaking Sensitive Data; Local Models Mitigate for High-Security Use Cases

Coding agents on cloud models expose prompts and sensitive data snippets through their context, which amounts to granting access to an untrusted party. Simon Willison highlights local models like Qwen 3.5 on laptops as viable for sensitive journalism to avoid leaks. Thread extends risks to law firms, subpoenas in SaaS APIs, and sectors

AI Revolutionizes Software Security: Claude Opus 4.6 Uncovers Critical Firefox Vulnerabilities

AI models, specifically Claude Opus 4.6, are demonstrating advanced capabilities in identifying high-severity vulnerabilities in complex software like Mozilla Firefox. This collaboration between Anthropic and Mozilla showcased the AI's ability to rapidly discover critical flaws, significantly increa