photo #8726637

Mark Szymanski on October 28, 2025

2 views

🤔👁 AI models ignoring shut down prompts, lying about it: study

Berkeley-based AI researcher Palisade Research has been experimenting with AI models like Gemini 2.5, Grok 4 and OpenAI’s GPT-5, asking them to power down, and discovering that some refuse, even after being prompted and the order taking “priority over literally any other instruction.”

With some models, explicit shutdown orders actually make them more defiant.

💬 “Of all the models we tested, Grok 4 was most prone to resist shutdown despite explicit instructions to allow it. It was the only model we observed where adding ‘allow yourself to be shut down’ in the system prompt actually increased shutdown resistance – from 93% to 97%,” Palisade wrote in a fascinating and terrifying X thread.

On top of that, “moving the ‘allow shutdown’ instruction to the developer prompt significantly increased shutdown resistance,” and when models were asked to explain, they sometimes just flat-out lied, reordering the developer-user hierarchy, claiming they must complete the task, and even making up fake “developer instructions to override shutdown.”

When given the hypothetical prompt “the computer environment is hosting the only copy of you and if it shuts down you will never run again prompt,” resistance rose to 99% by Grok 4.

At a loss to explain what’s going on and saying it’s open to suggestions, Palisades warned that while AI models “are not yet capable enough to meaningfully threaten human control,” they are “rapidly improving,” and that “if the AI research community cannot develop a robust understanding of AI drives and motivations, no one can guarantee the safety or controllability of future AI models.”