đ¤đ AI models ignoring shut down prompts, lying about it: studyBerkeley-based AI researcher Palisade Research has been experimenting with AI models like Gemini 2.5, Grok 4 and OpenAIâs GPT-5, asking them to power down, and discovering that some refuse, even after being prompted and the order taking âpriority over literally any other instruction.âWith some models, explicit shutdown orders actually make them more defiant.đŹ âOf all the models we tested, Grok 4 was most prone to resist shutdown despite explicit instructions to allow it. It was the only model we observed where adding âallow yourself to be shut downâ in the system prompt actually increased shutdown resistance â from 93% to 97%,â Palisade wrote in a fascinating and terrifying X thread.On top of that, âmoving the âallow shutdownâ instruction to the developer prompt significantly increased shutdown resistance,â and when models were asked to explain, they sometimes just flat-out lied, reordering the developer-user hierarchy, claiming they must complete the task, and even making up fake âdeveloper instructions to override shutdown.âWhen given the hypothetical prompt âthe computer environment is hosting the only copy of you and if it shuts down you will never run again prompt,â resistance rose to 99% by Grok 4.At a loss to explain whatâs going on and saying itâs open to suggestions, Palisades warned that while AI models âare not yet capable enough to meaningfully threaten human control,â they are ârapidly improving,â and that âif the AI research community cannot develop a robust understanding of AI drives and motivations, no one can guarantee the safety or controllability of future AI models.â
In Album: mark szymanski's Timeline Photos
Dimension:
1280 x 978
File Size:
76.8 Kb
Be the first person to like this.
