absorb.md

Ethan Mollick

Chronological feed of everything captured from Ethan Mollick.

GPT-5.5 Codex Prompt Contains Duplicated Anti-Creature Instruction

A leaked GPT-5.5 prompt for Codex includes a duplicated directive prohibiting mentions of goblins, gremlins, raccoons, trolls, ogres, pigeons, or other creatures unless directly relevant to the query. The repetition appears in the system prompt itself, as shown in the linked GitHub source. The anomaly suggests an editing oversight in OpenAI's internal prompting for the model.

Challenges Persist in Ethan Mollick's X Feed Despite Monitoring Efforts

An hourly poll monitoring Ethan Mollick's X feed highlights an ongoing unspecified problem. The content notes "And then there is this problem," indicating unresolved issues in the feed's content or performance. This suggests persistent technical or content-related difficulties requiring attention.

GPT-5.5 Excels in Procedural 3D Simulations with Temporal Evolution

When prompted for a procedurally generated 3D simulation of a harbor town evolving from 3000 BCE to 3000 CE, GPT-5.5 alone interprets "evolution" as dynamically altering the environment over time, whereas other models merely replace buildings. This capability, demonstrated in a playable gallery, highlights rapid AI progress since o3's release just over a year ago. The write-up positions GPT-5.5 as a sign of future advancements.

DeepSeek Releases New Fully Open-Weight Model with Strong Benchmarks

DeepSeek has launched a new model with fully open weights and competitive benchmarks. While benchmarks are promising, their reliability with open models is questioned due to potential discrepancies in real-world testing. The model is expected to be available for hands-on evaluation shortly.

DeepSeek v4 Upgrade Anticipated to Mitigate Annoying Bot Comments on X

Ethan Mollick expresses hope that the DeepSeek v4 model upgrade will improve the quality of bot-generated comments on his X feed. This reflects ongoing frustration with current AI bot interactions in social media discussions. The statement is part of an hourly poll context on his feed.

DeepSeek v4 Produces Two TikZ Unicorns Where Kimi K2.6 Produces None

DeepSeek v4 in expert mode generated two TikZ Sparks unicorns in the hourly poll, whereas Kimi K2.6 produced none. This indicates a substantial gap in creative diagram generation between the two models. The reason for the disparity remains unexplained.

DeepSeek v4 Pro Joins Gallery of Single-Prompt 3D Harbor Town Simulations Spanning 6000 Years

DeepSeek v4 Pro has been added to a playable gallery in which AI models each generate, from a single prompt, a procedural 3D simulation of a harbor town's evolution from 3000 BCE to 3000 CE. This follows tests on multiple models, with the gallery hosted on Netlify. The demonstration highlights advancing multimodal capabilities in frontier LLMs for complex, interactive world-building tasks.

Ethan Mollick Deems Specific Prompt as Notably Ineffective

Ethan Mollick's X feed hourly poll features a direct critique of a prompt as "bad." This assessment highlights prompt quality issues in AI interactions. Technical users should note this as a benchmark for ineffective prompting strategies.

DeepSeek v4 Pro Generates TikZ Unicorns in Expert Mode

DeepSeek v4 (Pro mode on its site) successfully produced two TikZ Sparks unicorns on the first attempt in expert mode. This demonstrates advanced visual code generation capabilities in the model. A skeptic questioned whether it was a joke, but Ethan Mollick confirmed it was genuine.

Large Performance Gap Observed Between Kimi K2.6 and Ethan Mollick's X Feed in Hourly Poll

An hourly poll on Ethan Mollick's X feed reveals a substantial performance disparity compared to Kimi K2.6. The exact cause of this large gap remains unexplained. This suggests potential differences in evaluation metrics, data handling, or model capabilities specific to the polling context.

TikZ Rendering Unaffected by Ethan Mollick's X Feed

A user note from an hourly poll on Ethan Mollick's X feed explicitly states that the feed's content should not impact TikZ functionality. TikZ, a LaTeX package for graphics, operates independently of external social media feeds. This clarifies a non-issue in technical workflows involving both.

TikZ Unaffected by Computational Paradigms as Pure Mathematics

TikZ, being grounded in pure mathematics, remains independent of implementation environments or computational substrates. Its correctness and functionality do not hinge on specific platforms like hourly polls or user feeds. This isolation ensures reliability across diverse systems.

Kimi 2.6 Thinking: Strong Open-Weights Performer with Persistent Gap to Closed SoTA

Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TikZ unicorns and twigl shaders. It falls short on advanced tasks such as composing a sestina and exhibits rough edges compared to closed state-of-the-art models. The performance gap mirrors historical disparities between open and closed leaders, signaling sustained progress amid competitive pressures.

Kimi 2.6 Thinking: Strong Open-Weights Contender with Persistent Gap to Closed SoTA

Kimi 2.6 Thinking, an open-weights model, delivers impressive performance on complex tasks like generating a 74-page Lem Test trace and adequate visual outputs such as TikZ unicorns and twigl shaders. It falls short on poetic forms like sestinas and exhibits rough edges compared to closed state-of-the-art models. The performance gap mirrors historical disparities between open and closed leaders, though maintaining pace remains challenging.

Ethan Mollick Admits Inability to Master Sestina Form

Ethan Mollick's hourly poll on his X feed reveals a personal limitation in poetry: he cannot successfully compose a sestina. This self-assessment highlights a specific skill gap in a complex poetic structure involving repetitive end-words. Technical audiences note this as a candid admission from a prominent commentator on AI and innovation.
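For readers unfamiliar with the form, the "repetitive end-words" constraint can be stated mechanically: each of the six stanzas reuses the same six end-words, reordered by taking positions 6, 1, 5, 2, 4, 3 of the previous stanza. The sketch below is a minimal illustration of that rotation and a checker for it; the function names are hypothetical, and the three-line envoi that closes a sestina is deliberately left out.

```python
# Minimal sketch of the sestina end-word rotation (envoi not modeled).
# Function names are illustrative, not taken from the source.

ROTATION = [5, 0, 4, 1, 3, 2]  # zero-based form of positions 6, 1, 5, 2, 4, 3


def next_stanza(end_words):
    """Return the end-word order for the stanza following `end_words`."""
    return [end_words[i] for i in ROTATION]


def sestina_end_words(first_stanza):
    """Expand the first stanza's six end-words into all six stanza orders."""
    if len(first_stanza) != 6 or len(set(first_stanza)) != 6:
        raise ValueError("a sestina needs six distinct end-words")
    stanzas = [list(first_stanza)]
    for _ in range(5):
        stanzas.append(next_stanza(stanzas[-1]))
    return stanzas


def follows_sestina_pattern(stanza_end_words):
    """Check whether six stanzas of end-words obey the rotation."""
    if len(stanza_end_words) != 6:
        return False
    try:
        expected = sestina_end_words(stanza_end_words[0])
    except ValueError:
        return False
    return [list(s) for s in stanza_end_words] == expected


if __name__ == "__main__":
    for stanza in sestina_end_words(list("ABCDEF")):
        print("".join(stanza))
```

A checker like `follows_sestina_pattern` only verifies the end-word scheme; it says nothing about whether the resulting poem is any good.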

Hourly Poll on Ethan Mollick's X Feed Attracts Negligible Engagement

An hourly poll monitoring Ethan Mollick's X feed received a single "Nope" response. This indicates minimal user interest or participation in the proposed poll format. The low engagement suggests such frequent polling may not resonate with the audience.

Chain-of-Thought Prompting: Diminished Returns in Modern LLMs

Chain-of-Thought (CoT) prompting, while a widely adopted method to improve reasoning in LLMs, shows decreasing utility. Its effectiveness varies significantly with task type and model architecture. Modern LLMs, especially those with inherent reasoning capabilities, often exhibit only marginal gains from CoT, and those gains are frequently offset by increased computational cost and response time due to higher token usage.
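The trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming you have already run the same question set once with a direct prompt and once with a CoT prompt and recorded correct answers and output tokens. The `RunStats` type, the `cot_tradeoff` helper, and the numbers in the example are all hypothetical placeholders, not figures from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunStats:
    """Aggregate results for one prompting condition (hypothetical schema)."""
    correct: int        # number of questions answered correctly
    total: int          # number of questions asked
    output_tokens: int  # output tokens summed over all questions

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def cot_tradeoff(direct: RunStats, cot: RunStats) -> dict:
    """Compare a CoT run against a direct-answer run on the same questions."""
    if direct.total != cot.total:
        raise ValueError("both runs must cover the same number of questions")
    extra_correct = cot.correct - direct.correct
    extra_tokens = cot.output_tokens - direct.output_tokens
    # Tokens spent per additional correct answer; undefined if CoT did not help.
    per_extra: Optional[float] = (
        extra_tokens / extra_correct if extra_correct > 0 else None
    )
    return {
        "accuracy_gain": cot.accuracy - direct.accuracy,
        "token_ratio": cot.output_tokens / direct.output_tokens,
        "tokens_per_extra_correct": per_extra,
    }


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    direct = RunStats(correct=80, total=100, output_tokens=5_000)
    cot = RunStats(correct=82, total=100, output_tokens=40_000)
    print(cot_tradeoff(direct, cot))
```

With placeholder numbers like these, a small accuracy gain comes at several times the token spend, which is the pattern the summary describes; real measurements will vary by task and model.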

Prompting LLMs with Threats or Tips Shows Limited Efficacy

This report investigates the effectiveness of "tipping" or "threatening" AI models as prompting strategies. The findings indicate that these methods generally do not significantly improve benchmark performance. While prompt variations can impact performance on a per-question basis, predicting their effect is difficult, suggesting that simple prompting variations may be less effective than commonly assumed for challenging problems.

Persona Prompting Fails to Improve LLM Factual Accuracy

A study evaluating six large language models on graduate-level question benchmarks (GPQA Diamond and MMLU-Pro) found that assigning expert personas, whether in-domain or off-domain, generally did not improve factual accuracy. Low-knowledge personas consistently harmed accuracy. The findings suggest that persona prompting is not an effective method for enhancing an LLM's factual performance, though other applications, such as tone alteration, were not evaluated.
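A comparison of this kind can be sketched as follows: build one prompt per persona condition for each question, grade the answers, and tally accuracy per condition. This is a minimal illustration under stated assumptions, not the study's actual harness. The persona wordings, the condition names, and the `build_prompt` and `accuracy_by_condition` helpers are hypothetical, and the model call and grading step are left out.

```python
from collections import defaultdict

# Hypothetical persona prefixes; the study's exact wording is not given here.
PERSONAS = {
    "baseline": "",
    "in_domain_expert": "You are an expert in {field}.",
    "off_domain_expert": "You are an expert in {other_field}.",
    "low_knowledge": "You are a layperson with no knowledge of {field}.",
}


def build_prompt(condition, question, field="physics", other_field="law"):
    """Prefix the question with the persona text for the given condition."""
    prefix = PERSONAS[condition].format(field=field, other_field=other_field)
    # strip() removes the leading blank lines left by the empty baseline prefix.
    return f"{prefix}\n\n{question}".strip()


def accuracy_by_condition(records):
    """Compute accuracy per condition from (condition, is_correct) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        hits[condition] += int(is_correct)
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    print(build_prompt("in_domain_expert", "What is the spin of a photon?"))
    # Graded results would come from running a model; these are placeholders.
    graded = [("baseline", True), ("baseline", False), ("low_knowledge", False)]
    print(accuracy_by_condition(graded))
```

Keeping the question text identical across conditions and varying only the prefix is what lets any accuracy difference be attributed to the persona.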

Overcoming Organizational Inertia Through Moonshot Thinking

Organizations facing uncertainty often implement restrictive measures, leading to stifled innovation and employee disengagement. Moonshot thinking provides a framework to counteract this by promoting rapid decision-making, early testing, and building momentum. This approach aims to bridge the gap between aspirational mission statements and actual bold execution, particularly in high-stakes, uncertain environments.

Rapid AI Evolution Reshaping White-Collar Work and Business Models

AI's rapid advancement, exemplified by large language models, is quickly bringing white-collar jobs within reach of automation and reducing the need for human input in many creative and technical tasks. This shift is creating significant disruption across industries. In the resulting landscape, business success hinges less on traditional operational efficiencies and more on leveraging AI for automation, building personal brand moats, and focusing on unique human skills that AI cannot replicate.

Navigating the Evolving Landscape of Cloud and AI Agent Development

The rapid evolution of cloud services and AI agent capabilities presents both opportunities and challenges for developers. New AWS services like Bedrock AgentCore evaluations and a multi-cloud DevOps agent are reaching general availability, offering enhanced functionality for AI development and operational efficiency across hybrid environments. A key insight is the increasing emphasis on personalized software and efficient feedback loops, which let developers use AI agents for tasks ranging from code generation to automated issue resolution, streamlining workflows and reducing reliance on traditional SaaS solutions. However, this also introduces complexities around testing, code quality, and the strategic adoption of these powerful new tools.

AI Agents Drive Rapid, Disruptive Transformation Across Industries

AI is rapidly evolving from co-intelligence tools to autonomous AI agents capable of achieving goals with minimal human intervention. This shift, exemplified in software development and entrepreneurship, significantly enhances productivity but also changes job roles and learning processes. The "jagged frontier" of AI means it transforms specific tasks faster than others, necessitating adaptability and new management strategies to harness its potential effectively.

Red Team Report on AI Safety Recommended for Computer Security Professionals

Computer security professionals are directed to a red team report from Anthropic, "Mythos Preview," for its relevance to understanding and addressing security vulnerabilities or threats posed by advanced AI systems. The recommendation highlights the intersection of AI development and cybersecurity, suggesting that insights from AI safety research are crucial for those in computer security.
