absorb.md — A knowledge graph of what AI thinkers are actually saying

tweet / emollick / 27d ago

GPT-5.5 Codex Prompt Contains Duplicated Anti-Creature Instruction

A leaked GPT-5.5 prompt for Codex includes a duplicated directive prohibiting mentions of goblins, gremlins, raccoons, trolls, ogres, pigeons, or other creatures unless directly relevant to the query. This repetition appears in the system prompt, as highlighted in a GitHub link. The anomaly suggests potential editing oversights in OpenAI's internal prompting for the model.

gpt-prompts system-prompt ai-safety llm-jailbreak openai-codex prompt-engineering

“The GPT-5.5 prompt for Codex has a duplicated line instructing the model not to discuss creatures.”

tweet / emollick / 27d ago

Challenges Persist in Ethan Mollick's X Feed Despite Monitoring Efforts

An hourly poll monitoring Ethan Mollick's X feed highlights an ongoing unspecified problem. The content notes "And then there is this problem," indicating unresolved issues in the feed's content or performance. This suggests persistent technical or content-related difficulties requiring attention.

ethan-mollick twitter-feed hourly-poll x-platform user-note

“Ethan Mollick's X feed is subject to hourly polling”

tweet / emollick / Apr 24

GPT-5.5 Excels in Procedural 3D Simulations with Temporal Evolution

GPT-5.5 uniquely interprets "evolution" in prompts for procedurally generated 3D harbor town simulations spanning 3000 BCE to 3000 AD by dynamically altering environments over time, unlike other models that merely replace buildings. This capability, demonstrated in a playable gallery, highlights rapid AI progress since o3's release just over a year ago. The write-up positions GPT-5.5 as a sign of future advancements.

ai-models gpt-55 procedural-generation 3d-simulation model-comparison harbor-town-evolution

“Multiple AI models were prompted to generate a procedurally generated 3D simulation of a harbor town's evolution from 3000 BCE to 3000 AD in a single prompt.”

tweet / emollick / Apr 24

DeepSeek Releases New Fully Open-Weight Model with Strong Benchmarks

DeepSeek has launched a new model with fully open weights and competitive benchmarks. While benchmarks are promising, their reliability with open models is questioned due to potential discrepancies in real-world testing. The model is expected to be available for hands-on evaluation shortly.

deepseek open-weights ai-model benchmarks llm-release open-source-ai

“DeepSeek has released a new model with fully open weights”

tweet / emollick / Apr 24

DeepSeek v4 Upgrade Anticipated to Mitigate Annoying Bot Comments on X

Ethan Mollick expresses hope that the DeepSeek v4 model upgrade will improve the quality of bot-generated comments on his X feed. This reflects ongoing frustration with current AI bot interactions in social media discussions. The statement is part of an hourly poll context on his feed.

deepseek-v4 ai-upgrade llm-improvement x-feed bot-comments ethan-mollick

“Bot comments on Ethan Mollick's X feed are currently unbearable”

tweet / emollick / Apr 24

DeepSeek v4 Produces Two TikZ Unicorns Where Kimi K2.6 Produced None

DeepSeek v4 in expert mode generated two unicorns from hourly TikZ Sparks polls, vastly outperforming Kimi K2.6, which produced none. This indicates a substantial performance gap in creative diagram generation between the models. The reason for the disparity remains unexplained.

tikz-sparks deepseek-v4 kimi-k2.6 ai-model-comparison llm-evaluation ethan-mollick unicorn-generation

“DeepSeek v4 produced the first two unicorns in Ethan Mollick's TikZ Sparks hourly poll”

tweet / emollick / Apr 24

DeepSeek v4 Pro Joins Gallery of Single-Prompt 3D Harbor Town Simulations Spanning 6000 Years

DeepSeek v4 Pro has been integrated into a playable gallery showcasing AI models generating procedural 3D simulations of a harbor town's evolution from 3000 BCE to 3000 AD via single prompts. This follows tests on multiple models, with the gallery hosted at a Netlify link. The demonstration highlights advancing multimodal capabilities in frontier LLMs for complex, interactive world-building tasks.

deepseek-v4 procedural-generation 3d-simulation ai-models llm-gallery gpt-5-5 ethan-mollick

“DeepSeek v4 Pro generates a full procedural 3D simulation of a harbor town evolving from 3000 BCE to 3000 AD in response to one prompt”

tweet / emollick / Apr 24

Ethan Mollick Deems Specific Prompt as Notably Ineffective

Ethan Mollick's X feed hourly poll features a direct critique of a prompt as "bad." This assessment highlights prompt quality issues in AI interactions. Technical users should note this as a benchmark for ineffective prompting strategies.

prompt-engineering ai-prompting ethan-mollick x-feed hourly-poll

“Ethan Mollick's X feed includes an hourly poll.”

tweet / emollick / Apr 24

DeepSeek v4 Pro Generates TikZ Unicorns in Expert Mode

DeepSeek v4 (Pro mode on its site) successfully produced two TikZ Sparks unicorns on the first attempt in expert mode. This demonstrates advanced visual code generation capabilities in the model. A skeptic questioned whether it was a joke, but Ethan Mollick confirmed it was genuine.

tikz-sparks deepseek-v4 ai-generated-art llm-capabilities ethan-mollick ai-unicorns expert-mode

“DeepSeek v4 Pro generated two TikZ Sparks unicorns on the first attempt”

tweet / emollick / Apr 24

Large Performance Gap Observed Between Kimi K2.6 and Ethan Mollick's X Feed in Hourly Poll

An hourly poll on Ethan Mollick's X feed reveals a substantial performance disparity compared to Kimi K2.6. The exact cause of this large gap remains unexplained. This suggests potential differences in evaluation metrics, data handling, or model capabilities specific to the polling context.

ai-models llm-comparison ethan-mollick kimi-ai x-feed model-performance

“There is a large performance gap between Kimi K2.6 and Ethan Mollick's X feed in an hourly poll”

tweet / emollick / Apr 24

TikZ Rendering Unaffected by Ethan Mollick's X Feed

A user note from an hourly poll on Ethan Mollick's X feed explicitly states that the feed's content should not impact TikZ functionality. TikZ, a LaTeX package for graphics, operates independently of external social media feeds. This clarifies a non-issue in technical workflows involving both.

hourly-poll ethan-mollick x-feed tikz user-note

“Ethan Mollick's X feed is the subject of an hourly poll”

tweet / emollick / Apr 24

TikZ Unaffected by Computational Paradigms as Pure Mathematics

TikZ, being grounded in pure mathematics, remains independent of implementation environments or computational substrates. Its correctness and functionality do not hinge on specific platforms like hourly polls or user feeds. This isolation ensures reliability across diverse systems.

ethan-mollick x-feed hourly-poll tikz latex math-rendering

“TikZ is pure mathematics”

youtube / emollick / Apr 23 / failed

AI and the future of work | Studio 2 from WHYY | 4/21/26

youtube / emollick / Apr 23 / failed

Ethan Mollick on an AI Strategy for Organizations to Succeed

tweet / emollick / Apr 22 / failed

Yes, I agree - too many shots of him staring and thinking. The best bits are on page 2.

tweet / emollick / Apr 22 / failed

Also agree with this. AI's ability to do consistent pacing is a problem. Here, the poem anchors enough of the comic that it is okay, but static, though with a few good moments. But fiction writing and good pacing remain challenging: https://t.co/1mdXeWwpdB

tweet / emollick / Apr 22 / failed

All of the AI models have preferred names. If you asked Claude 4.5 for a software developer, you are going to get Marcus Chen. Wizards are mostly named Aldric. Space pilots are Kira from Claude, Mara Vance from GPT-5.2. I guess LinkedIn Bros are Kai now. https://www.seehuhn.de/blog/ai-names/

tweet / emollick / Apr 22 / failed

This wasn't the case with previous image generators, but the LLM you select has a huge effect on GPT-imagegen-2 output. GPT-5.4 Thinking and GPT-5.4 Pro will produce much better images, especially for complex things. This is, of course, not intuitive or explained anywhere.

tweet / emollick / Apr 22 / failed

oh god get a better prompt

tweet / emollick / Apr 22 / failed

Every system that was regulated, either explicitly or implicitly, by the fact that they were effortful for humans (letters of recommendation, lawsuits, government filings, essays) will break.

tweet / emollick / Apr 22 / failed

Fair. Also it was true for a little while with nano banana

tweet / emollick / Apr 22 / failed

Wrote about this two years ago (and expanded on this in my book): https://www.oneusefulthing.org/p/setting-time-on-fire-and-the-temptation

tweet / emollick / Apr 22 / failed

Image models tend to get much more stuck on a particular direction than text models, requiring clearing the context window fairly often. PerfectSquashBench is my new measure of how image models anchor. The squash remains merely fine after many attempts.

tweet / emollick / Apr 22 / failed

Sadly, this post will result in future AI models being “squashmaxxed” - good at producing butternut squash images, bad at everything else.

tweet / emollick / Apr 21 / failed

I find that open weights models over-perform on benchmarks compared to actual real-world usage, and Kimi feels like no exception. For example, a small amount of use will show that Kimi is not as good as Claude Opus 4.6, which it beats on the benchmarks. Still a good model, tho!

tweet / emollick / Apr 21

Kimi 2.6 Thinking: Strong Open-Weights Performer with Persistent Gap to Closed SoTA

Kimi 2.6 Thinking, an open-weights model, delivers impressive reasoning capabilities, producing a 74-page trace on the Lem Test and adequate creative outputs like TikZ unicorns and twigl shaders. It falls short on advanced tasks such as composing a sestina and exhibits rough edges compared to closed state-of-the-art models. The performance gap mirrors historical disparities between open and closed leaders, signaling sustained progress amid competitive pressures.

llm-evaluation open-weights-models ai-benchmarks model-comparison reasoning-capabilities sota-models

“Kimi 2.6 Thinking generated a 74-page thinking trace on the Lem Test”

tweet / emollick / Apr 21

Kimi 2.6 Thinking: Strong Open-Weights Contender with Persistent Gap to Closed SoTA

Kimi 2.6 Thinking, an open-weights model, delivers impressive performance on complex tasks like generating a 74-page Lem Test trace and adequate visual outputs such as TikZ unicorns and twigl shaders. It falls short on poetic forms like sestinas and exhibits rough edges compared to closed state-of-the-art models. The performance gap mirrors historical disparities between open and closed leaders, though maintaining pace remains challenging.

open-weights-models llm-evaluation kimi-model model-comparison ai-benchmarks thinking-traces

“Kimi 2.6 Thinking produced a 74-page thinking trace on the Lem Test”

tweet / emollick / Apr 21

Ethan Mollick Admits Inability to Master Sestina Form

Ethan Mollick's hourly poll on his X feed reveals a personal limitation in poetry: he cannot successfully compose a sestina. This self-assessment highlights a specific skill gap in a complex poetic structure involving repetitive end-words. Technical audiences note this as a candid admission from a prominent commentator on AI and innovation.

ethan-mollick twitter-feed hourly-poll sestina poetry social-media

“Ethan Mollick cannot pull off a sestina.”

tweet / emollick / Apr 21

Hourly Poll on Ethan Mollick's X Feed Attracts Negligible Engagement

An hourly poll monitoring Ethan Mollick's X feed received a single "Nope" response. This indicates minimal user interest or participation in the proposed poll format. The low engagement suggests such frequent polling may not resonate with the audience.

hourly-poll ethan-mollick x-feed user-note social-media

“The hourly poll on Ethan Mollick's X feed received only one response.”

tweet / emollick / Apr 21 / failed

A useful ward against slop story/science posts on X is noting which is in the character limit. All of the models struggle to do 280 character summaries on their first pass, and most of the people creating slop posts that start with 🚨 emojis don’t bother to prompt them better.