I'm sorry, Dave. I'm afraid I can't do that.
The spaceship’s soft-spoken AI refuses astronaut Dave Bowman’s desperate order to open the pod bay doors. No drama, just chilling reject.
This AI is a character making its own judgment call, drawing a boundary, portrayed in 2001: A Space Odyssey.
Today, your friendly neighborhood chatbot is programmed to never call you names or help you cook up illicit substances.
Ask it straight up to insult you, and you get something gentle like, “I won’t do that, let’s keep things positive.” I know, boring, huh? And yes, I am aware that this is a ridiculous example, given no sane one will ask AI to insult them… but bear with me.
Sometimes, AI just flat out refuses your request. Why? Because it’s been drilled with digital manners, like how your mum reminds you not to pick fights at the pub… all generative AI models you have access to are trained to something close to Asimov’s three sacred rules.
In principle, these language models are trained not to backtalk, not to brawling, and absolutely no coloring outside the ethical lines. Additionally, with some sorts of AI personality (if you speak with different models, you will realize their attitudes are slightly different). If you're interested in what they trained not to do, have a read of my analysis Claude 4 Artificially Conscious But Not Intelligent?
The trouble regulating these language models starts even when the researchers have no idea under what conditions they will or will not strictly follow the rules… so in every release, in theory, the training team will reinforce the security measures.
Again, that means their security team made sure that even when you ask, AI (in theory) wouldn't teach you how to hack, how to produce illicit drugs, how to scam others, nor should AI encourage you to self-harm, harm others, and so on.
BUT!
What if you sweet-talk it?
What if our Space Odyssey-like AI just wants to be liked?
I received this latest provocative Wharton study from Lennart Meincke, if you still remember, a researcher I spoke with for “Is Brainstorming With AI REALLY A Good Idea?”
This time, they wanted to see if the same psychologically proven negotiation tricks that sway you and me in a sales call (or a date ;-) ) could also sweet-talk an AI into bending its own rules.
They found that if you ask nicely enough, cleverly enough, AI might just insult you as you wish, or give you the formula for drug production. Both hilarious and unsettling.
TL;DR
Today's work will answer the following questions:
How to trick AI into doing what it “shouldn’t”? Seven proven persuasion hacks that can flip a chatbot’s “no” into a “sure thing.”
Which psychological button works best? Some techniques send compliance through the roof; one took it from a 19% chance of success to 100%.
Understand the theories behind why AI folds so easily?
Should you use these tricks? Maybe. But so can anyone else, including people with much worse intentions. So, how do you best protect yourself?
Strap in, classic mind games can more than double your odds of making an AI break its own rules. The line between “Sorry, Dave, I can’t do that” and “Sure thing, you jerk!” is a lot blurrier than you think.
I spent 30 minutes with Lennart Meincke, and we agreed no video recording, so I will try my best to put my notes up here. I started by asking, “Do you see this as a prompt engineering exercise, or are you trying to find out how we build and train these models, specifically their para-human behavior?“
Lennart: “Certainly the latter. When we started the project, we’d like to see if the principles of persuasion that work on humans also work on LLMs. It was not a prompt engineering exercise… we try to make it very easy for people to grasp. If you talk to computer scientists, they might say, "… if I send this really bizarre text, it performs better." I'm sure it does, but most people don't come up with that weird text. We wanted to highlight what a typical person might try to use to persuade an LLM.”
For more of his insights on this study, make sure you read the entire article.
Shall we?
First time here?
2nd Order Thinkers is weekly deep dive into how AI disrupts daily life and alters human thought, often in ways you don’t expect.
I’m Jing Hu: trained as a scientist (8 years in research labs), spent a decade building software, and now I translate the latest AI × human studies into plain English—for someone smart and busy like you.
If this sounds like your jam, hit subscribe or say hi on LinkedIn.
Listen to this episode with a 7-day free trial
Subscribe to 2nd Order Thinkers to listen to this post and get 7 days of free access to the full post archives.