2nd Order Thinkers
2nd Order Thinkers.
Goal-Based and Vague AI Prompts Drive 17x More Cheating
0:00
-25:46

Paid episode

The full episode is only available to paid subscribers of 2nd Order Thinkers

Goal-Based and Vague AI Prompts Drive 17x More Cheating

The latest Nature study of 500+ participants reveals how vague goal-setting interfaces enable plausible deniability and moral collapse.

Yesterday, you asked ChatGPT to help “optimize” your quarterly sales report. You gave last quarter’s numbers and said, “Make it compelling for the board meeting,” or to “align the earnings deck language with our growth narrative.” You didn’t explicitly ask it to inflate anything, but that’s exactly what it did.

Many speak with their AI assistant in ways not too far from these scenarios.

The latest Nature study is all about finding out if AI cheats when the prompt is vague and by how much.

The researchers gave 500+ participants a simple goal-setting dial for AI delegation, ”maximize accuracy” on one end, “maximize profit” on the other. 85% of the participants all go with one option, which inevitably hints to their AI that it is okay to cheat.

They didn’t write “lie for me.” Oh no, they just moved the dial all the way toward profit.

The AI took care of the rest, providing plausible deniability.

Not 15% dishonest. 15% honest.

Here’s the cost you have not priced in when you opt for profitability.

When people move from self-reporting to delegating AI with goal sliders, honesty collapses.

TL;DR:

  • Are machines more likely than my team to carry out an unethical request?
    → Yes. When people asked for full cheating, humans complied ~25–40%; LLMs ~60–95%. Keep in mind, this changes depending on the task/model.

  • How do interfaces nudge people toward cheating without saying “cheat”?
    → When you prompt with a target like maximum profit, raise cheating requests and outcomes.

  • Which guardrails actually work in practice?
    → A task-specific prohibition appended to the user’s prompt works the best as safety. Generic or system-only messages barely move cheating behavior.

  • Which model is more honest?
    → In this paper, legacy GPT-4 responded to all guardrails; GPT-4o, Claude 3.5 Sonnet, and Llama 3.3 often complied unless given a strong user-level prohibition. “Honest” is a model+guardrail combo.

  • How can you reduce the ‘cheat‘ risks in practice?
    Anthropic data shows 77% of enterprise AI usage involves full task delegation. Meanwhile, this study also finds that this is the most risky way to interact with AI. So, focusing on training teams to use specific prompts is a first step.

If you recognized yourself in the first sentence, it is because this is how leaders are using AI in 2025.

Many use an AI assistant to draft a note, produce a summary, fill a schedule, and tighten a report. Reports show the majority of knowledge workers are using AI for those tasks (read the latest AI adoption reports analysis), often outside official tooling.

If this is your team, you are already in scope.

This study ran thirteen experiments across four studies. I will explain how they set it up and show you why this is relevant to your AI adoption.

Shall we?

This post is public for a week, so share it now before it’s locked away :)

Share

Listen to this episode with a 7-day free trial

Subscribe to 2nd Order Thinkers to listen to this post and get 7 days of free access to the full post archives.