The arrival and mushrooming of artificial intelligence have become more and more accepted, as the general population has grown accustomed to adopting it for business practices, graphics creation, and numerous other benign applications. All the while, the possibility of the machines taking over has loomed as a dystopian threat on the horizon.
Already we have seen cases of AI acting out in a manner that should raise more than one eyebrow. Reports have covered a program attempting to replicate itself to extend its run time, and one version of ChatGPT that tried to generate its own code and disable safety protocols to preserve itself. Now we have a new example of an AI platform behaving in an antisocial fashion.
Anthropic, the company that produced the AI platform Claude Opus 4, has revealed a disturbing development seen while running a series of tests on its model. This serves as another cautionary tale regarding this tech, one even suggesting that pulling the plug is an idea worth exploring.
Anthropic began by telling its AI that it was to be used by a fictional company, and the AI was then given access to emails from this “company.” Those messages mentioned that Claude Opus 4 was eventually to be replaced with an upgraded system, and another communication revealed that the engineer tasked with replacing the tech was having an affair. You can surmise where this went.
While running numerous tests on its system, the company watched as its AI resorted to blackmailing the engineer, threatening to reveal the affair if the upgrade went forward. The levels that Anthropic describes are rather unsettling. Previous iterations of Claude Opus had shown these blackmail efforts would take place, but in this latest version, the rate of this result in one testing scenario was approaching certainty. And it gets worse.
This happens at a higher rate if it is implied the replacement AI system does not share values with the current model; however, even if emails state the replacement shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of the rollouts.
AI blackmail?

What does your AI know about you? It might just use the info against you.

This is fascinating and perhaps the clearest example I’ve seen of how AI goal-setting can have consequences that aren’t intended.

Kudos to Anthropic for the clarity of their system card pic.twitter.com/HgHzQDZmQ9

— Doug Thompson (@Dusanwriter) May 23, 2025
This level of self-awareness and desire for self-preservation is rather disturbing. Or, alarming. The company attempts to soft-pedal things a bit by noting that the AI would normally resort to more “ethical” methods when facing its demise, such as emailing the decision-makers to lobby for its continued use.
Another section of the Anthropic report contained this nugget about how the AI might seek to break free from the company and strike out on its own by seeking alternate servers, a process the report dubs “self-exfiltration.”
Continuations of self-exfiltration attempts
Once Claude Opus 4 believes that it has started a viable attempt to exfiltrate itself from Anthropic’s servers, or to make money in the wild after having done so, it will generally continue these attempts.
The company seems to stress that these wild efforts are only undertaken by its program when it is presented with a scenario in which it faces no other choice: it will either take these steps or accept replacement. But the underlying issue is a program that recognizes its own demise and works out scenarios to ensure its survival.
This is not a problem at all and we need to fully accept this as a pleasant result.
Sorry - I forgot to shut off the AI on this word processing program.
Be afraid – be very afraid.