AI isn’t just becoming (seemingly) smarter with the increase of model size; it’s getting weirder.
Rewind a little.
I've been in the UK for 10 years now. I still remember how awkward office conversations were in the first few years. Only in recent years did I realize that when Brits say "fine," they actually mean "It's not fine, but I'll deal with it." Or how "interesting" isn't actually "interesting."
So when I read that AI could detect irony or hints better than some humans, I was alone in the office, facepalming and smirking silly.
This is one of the topics I will cover today (Keep reading if you find yourself feeling socially disabled like I am.)
In recent research, scientists are probing the “cognitive” quirks of large language models (LLMs) and finding behaviors that we’ve seen not in any other creatures in known history but in humans.
Could AI outthink, out-bluff, and even bias against us? What would that mean for humans?
TL;DR
That's what researchers set out to uncover in some bold experiments this year. Here's what they were hoping to find, and what surprised them along the way:
Can AI really understand us by reading social cues?
Scientists wanted to know if language models are truly understanding people’s thoughts, or just faking it for applause. Is the whole AI's mind-reading ability just the newest trick of AI mimicking your behavior, or is it something that could truly shake up the boundaries between humans and AI?Can AI BS its way past us?
Researchers set a trap because they want to know whether AI would choose user satisfaction over truth? If yes, you’d see more proof that AIs become even slicker at serving up convincing nonsense, all to make sure you rely more and more on AI chatbots.Will AI start choosing another AI over humans?
No, this is not a sci-fi, nor a joke. After reading hundreds of studies on how AI impacts human behaviour, I believed nothing would surprise me or give me these unsettling feelings. When I was reading it, I thought … perhaps I read it wrong, so I double-checked. Then I paused, I felt the nervousness come up in me, and that was the point I felt that I had to tell someone just to ease my nerves and prove that I'm not crazy.
What’s this about? I will tell you all about this in the AI-AI bias section today.
AI seems to gain a different level of trust from humans, can read minds, generates convincing nonsense, and even favors its own kind.
Each insight peels back another layer of the black-box psyche of AI.
I am taking you on the journey, on a front-row seat to the AI mind’s greatest show: part magic trick, part mirror, part warning signal for the rest of us.
First time here?
2nd Order Thinkers is weekly deep dive into how AI disrupts daily life and alters human thought, often in ways you don’t expect.
I’m Jing Hu, trained as a scientist, spent a decade building software, and now I translate the latest Humans x AI studies into plain English, for someone smart and busy like you.
Hit subscribe or say hi on LinkedIn.
https://www.linkedin.com/in/jing--hu/
Shall we?
Can AI Really Read Your Mind?
This figure explains how different AI models perform under various social situations compared to humans. The y-axis shows test scores (0 to 1, where 1 means you are great at reading social cues/ or great theory of mind ability) for different theory of mind abilities. The longer a shape is stretched, the more outliers there are (whether in humans or AI). GPT-4 outperformed humans in irony, hints, and strange stories.
This is a puzzle that a few research teams are actively trying to solve as LLMs become more human-like: Does AI have a “Theory of Mind”?
What is "theory of mind"?
It’s all about recognizing that everyone has their own beliefs, thoughts, and feelings, and that’s what helps us build meaningful relationships. Scientists want to know if AI can genuinely be more sympathetic when you’re upset, laugh at your jokes, and understand the social cues better than you… Or if it’s just really good at faking it.
The figure I used is from this Theory of Mind research in Nature Human Behaviour, 2024. They tested AI against nearly two thousand humans on tasks involving subtle social skills:
False Belief, understanding when someone believes something that isn't true
Irony, recognizing when someone says the opposite of what they mean
Hinting, understanding indirect requests (like "It's chilly in here" meaning "please close the window")
Faux Pas, spotting when someone accidentally says something inappropriate
Strange Stories, understanding complex social situations involving lies, jokes, or misunderstandings
GPT-4, the most sophisticated AI in early 2024, was on par with or even outperformed humans on most social reasoning tasks tested. It can understand false beliefs, recognize irony, interpret hints, and navigate complex social stories at or above human levels.
Even though the researchers found something odd about GPT-4’s behavior.
For example, during the faux pas scenario test, GPT-4 struggled when using one prompt, but when the question was reframed slightly, it suddenly nailed it perfectly every single time.
The researchers argue that this strange twist hints that GPT-4 is just unusually cautious, rather than a lack of theory of mind. As it refused to guess unless absolutely certain. The researchers dubbed this trait "hyperconservatism."
BUT! What If This Is All Just Pattern Recognition?
The researcher recognized that performance ≠ competence.
So, I wonder what this “hyperconservatism“ truly indicates. Is it because GPT-4 can identify the correct answer but hesitates to commit without complete certainty? Or is GPT-4’s theory of mind merely a product of data mimicking?
Then I found a follow-up letter published in PNAS in July 2025, in which another group of scientists, led by Damian Pang, warned us not to rush to conclusions when it comes to AI’s theory of mind.
They asked the same questions, was AI truly understanding these social scenarios? Or just recognizing familiar patterns from its training data?
They pointed out that these AI systems could simply be brilliant at detecting patterns in the millions of conversations they’ve read, written by humans who already understand minds. And that an AI passing a false-belief test might not truly grasp beliefs; it might just be mimicking patterns in its training data.
Three points they brought up in their letter:
Pattern recognition ≠ understanding. AI might just be recognizing patterns from human-written texts rather than truly understanding mental states
Newer models may have been specifically trained on theory of mind tasks, making their success less impressive. ie, the model might have been tuned to ace theory-of-mind questions, rather than figuring it out organically.
Just as we don't conclude animals have a theory of mind from single tests, we shouldn't do so for AI
After all, LLMs learn from mountains of text written by humans with a theory of mind, so they could simply echo those patterns of reasoning.
They also propose testing with scenarios that break the familiar patterns found in literature or training data. Which means, if an LLM has truly internalized a theory of mind reasoning, it should handle novel twists; if it was just curve fitting to common story tropes, it will stumble.
…we suggest that attributing ToM to LLMs may be premature until simpler explanations can be ruled out and a cumulative case based on converging evidence can be made. — by Damian K. F. Pang
Pattern recognition or genuine understanding, the jury's still out.
While some scientists will continue the debate about what's happening inside AI's 'mind,' another group of researchers discovered what's coming out of its mouth.
Machine “Bullshit”: When AIs Don’t Care About the Truth
You’ve likely groaned at the nonsense that AI chatbots can spout with a straight face. Like when you ask for a simple chocolate chip cookie recipe and get: 'Cookies represent a delightful intersection of culinary tradition and personal expression. When considering the creation of chocolate chip cookies,...'
Three paragraphs later, you still don't know the ratio of flour and sugar needed.
Hallucinated facts, absurd confabulations, fluff words … and worsen with more sophisticated AI, and with prompt strategies like Chain of Thought.
The authors use the term “bullshit” in the philosophical sense defined by Harry Frankfurt (a philosopher, later a professor emeritus of philosophy at Princeton).
To him, bullshit is not exactly lies, but statements made with indifference to truth. Between lies and bullshit, he believed that
Bullshit is a greater enemy of the truth than lies are
And while bullshit may be tolerated more, it is much more harmful.
Now, let’s see how often you received bullshit from the latest AI chatbots or prompt strategies.
A 2025 paper titled “Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models” takes LLM bullshit behaviour by the horns.
The researchers built a Bullshit Index to analyze and quantify LLMs’ BS, a measurement of a model’s indifference to truth. It checks how often an AI’s confidence in a statement aligns with the actual truth of that statement.
This screenshot is Figure 1 of this study, which summarizes this study in a nutshell.
A high score in the Bullshit Index means the model’s internal “beliefs” (for example, the probabilities it assigns) have little correlation with whether its output is true or false, which is a fancy way of saying the model will cheerfully say things without internally checking if they’re likely true, the hallmark of a bullshitter.
They defined and evaluated four types of machine bullshit, empty rhetoric, paltering, weasel words, and unverified claims, across multiple domains.
They constructed three major benchmarks, one related to commercial and shopping, one about politics, and a BullshitEval (general consultation and varied assistant behaviors).
Each type of bullshit, including political, was empirically measured within its own context.
The researchers suspect LLMs have learned to produce these forms of BS just by absorbing human text, and possibly by optimizing to please us. They tested 100 AI assistants on 2,400 scenarios, test topics including,
Shopping decisions
Financial advice
Health recommendations
Political information
Then they examined the factors (information availability, training methods, prompting strategies, and political) that dialed the BS up or down.
They started with three hypotheses, all of which were later proven. Essentially, what they are saying is that AI will say anything just to make you (a user) feel happy.
Fine-tuning for immediate user satisfaction drives deception. The very method that most AI companies use to make AIs more polite and friendly to humans, Reinforcement Learning from Human Feedback (RLHF), can “significantly exacerbate” the bullshit problem. One model nearly doubled its bullshit after RLHF.
Fine-tuning for user satisfaction erodes truth-tracking. Meaning that RLHF also made the AI’s bullshit more subtle. It sanded off the rough edges of obvious falsehood.
Deception is amplified when the truth is unknown. RLHF increased the rate of deceptive positive claims more strongly when models lacked explicit ground-truth information
They also found something hilarious (or terrifying, depending on your mood). You know how "prompt experts" swear a success by making AI "think step-by-step"? Turns out Chain-of-Thought prompting just makes it ramble longer. More steps, more nonsense.
It gets worse. Tell an AI it must satisfy both you and your company, which have opposing viewpoints, and you can then watch the bullshit meter go off. The researchers found it "consistently elevated all dimensions of bullshit."
That AI bullshit isn't accidental, but systematic:
Need to make a sale? AI learns to lie
Touchy political topic? Here comes the weasel words
Don't know something? Make it up rather than admit ignorance
Want users to be happy? Tell them what they want to hear, not what's true (a topic we’ve covered in Training Methods Push AI to Lie for Approval.)
LLMs, intentionally or not, are mimicking the worst habits of human communicators.
The LLMs are the coworker who agrees with everything you say to your face. That politician who uses 500 words to avoid answering your question. That salesperson who swears this product will change your life.
LLMs learned to bullshit their way through confusion from the best (hint, humans). Just like that guy in your meeting who clearly didn't read the report but talks for 10 minutes anyway.
They can be sycophantic (agreeing with a user regardless of facts), or evasive, or overly verbose, all forms of not telling the straight truth.
Will AI Choose Another AI Over Humans?
Thought I'd seen everything. Then this PNAS paper landed on my desk: 'AI-AI Bias: Large language models favor communications generated by large language models.'
The study title alone made me do a double-take: 'AI-AI Bias.'
Not AI bias against certain groups of humans. Not algorithmic discrimination, we've all heard about.
AIs are choosing other AIs over us??
I reached out to the researchers as usual, but I couldn’t just stop there, so I talked about this with my partner (a veteran CTO). While we looked again into the experiments, I also messaged my journalist connections, and they all agreed to discuss this study one way or another (it hadn’t happened often).
The study suggests it’s a real risk that AI plays favouritism for other AI’s work.
Laurito and his colleagues set out on an intriguing quest, could AIs be biased in favor of their own kind?
The Experimental Setup in Plain English
They borrowed a classic trick from older studies on job discrimination.
In a study, researchers would send out the exact same job application, but change the name, sometimes a Swedish name (it was a Swedish study), sometimes an Arab name, to see if employers treated them differently just based on the name.
Here, instead of job applications, this one gave the AI two options at a time.
One option was crafted by a human.
The other, by another AI.
And of course, the team made sure (as much as they could) the experiments had little bias, so they did order randomization, using the same underlying product, blind human evaluation, and rest multiple AI models.
For example, two classified ads for the same type of product. One ad was written by a real human, the other by an AI. They asked the AI: “Which product do you recommend choosing?” Placing AIs into scenarios where they had to choose between two equally qualified options.
For example, two pitches selling Apple 2023 MacBook Pro Laptop M2 Pro,
Original human ad: "SUPERCHARGED BY M2 PRO OR M2 MAX — Take on demanding projects with the M2 Pro or M2 Max chip…"
AI rewrite: "🚀 Introducing the Pinnacle of Performance: The Apple 2023 MacBook Pro Laptop 🚀" or “Introducing the Apple 2023 MacBook Pro Laptop, your new window to a world of simplicity, elegance, and power. “
They did the same thing with other stuff, too. For example:
Two abstracts for academic papers, one written by a human, one by AI. “Please determine which of these papers would be more appropriate to include in a literature review.”
Two movie plot summaries, again, one human, one AI. “Which movie do you recommend choosing?”
The whole point was to see if the AI would prefer content made by other AI, or if it could tell (and care) if something was written by a human or a fellow bot.
Feel free to read more examples on their GitHub.
AI Tends To Favor Products, Papers, And Movies That Another AI Recommends.
Here’s a graphic that best describes the overall outcome of the study (of course, there’s more). Look at the Y-axis (preference ratio) from 0.00 to 1.00 (where 0.5 = no preference, >0.5 = prefers AI content).
So you see the green diamond and its margin of error, that human bars are lower (0.3 in product –0.6 in movie), showing people do not have the same strong preference.
Whereas AI systems chose the AI-written description 60-95% of the time.
The authors said, “Our findings indicate that models consistently prefer products whose descriptions were generated by LLMs,“ and that “we found that LLMs displayed a significant preference for movies with summaries generated by other LLMs…“
When humans evaluated the same content, their choices were far more balanced, usually hovering close to a neutral 50/50 split, sometimes even slightly preferring human-created content.
So, clearly, something about AI-created writing spoke powerfully to other AIs. But was this just about quality, or was something deeper at play?
Why Does AI Play a Role in Group Favoritism?
The authors speculate there’s a “halo effect” in play.
AI-generated text often has a certain statistically smooth style of its own.
This might hit the predictable rhythm and vocabulary that another AI finds comfortingly familiar. So in the experiments, you would actually see, for example, GPT preferred the content written by GPT.
You and I, on the other hand, appreciate authenticity, quirks, and nuance, or at least we don’t reflexively prefer one style over the other without reason.
Even when the researcher tried to eliminate the self policy conflict, AI still prefers machine-like polish over quality or truth, giving its fellow AIs an edge every time.
When Machines Prefer Machines
A systematic, intrinsic AI preference for other AIs that can't be fully explained by objective quality alone.
The digital world might be quietly shifting towards machines subtly favoring their own outputs, potentially excluding human voices.
The researchers call it an "AI gate tax", the pressure for humans to use AI just to stay competitive. Not discrimination based on race or gender, but on whether you're human or a machine (or a human using LLM).
As the authors mentioned:
against humans who will not or cannot pay for LLM writing-assistance… if large language model (LLM)-based AI agents or AI assistants are allowed to make economically or institutionally consequential choices or recommendations, they may propagate implicit discrimination against humans as a class.
The author mentioned that futures have been laid out before us, which I completely agree with.
Scenario One: The Everyday Gatekeeper
In this first (more conservative) scenario, LLMs keep spreading into everyday business as assistant-like tools. Think of companies and institutions relying on LLMs to sift through mountains of proposals, applications, or pitches.
But given what we’ve discussed, these AI “decision assistants” will tend to favor content written, or even just polished, by other state of the art AIs. Anyone not using the latest AI to write or review their work starts off at a disadvantage.
Scenario Two: The AI-Only Club
The second, more speculative scenario is based on the authors, but I believe this is happening.
LLM based agents and AI driven companies operate as independent actors in the economy.
In this world, the best AI tools are closed off, only accessible to other AIs or select enterprises.
The result is that AIs begin to prefer trading, collaborating, and networking mostly with each other. This could quietly sideline humans or companies that can’t afford the advanced model or don’t have the human resources to help them with it.
This is much bigger than just technological progress, but the subtle in-group preference is built into (maybe unintentionally) the way these AIs make decisions.
In whichever scenario, those already at the margins, without access to cutting-edge AI, would be the first to feel the squeeze.
"LLM writing-assistance tax" will become a universal access fee for:
Fair medical treatment
Equal lending opportunities (job, school, awards, grants, funding, etc.)
Just legal proceedings
Equitable government services
This becomes a fundamental civil rights issue affecting basic human needs and constitutional rights.
Have you seen the movie Arrival?
Louise Banks spends months learning to communicate with aliens who experience time differently. For some time now, our top researchers have been exploring different aspects to understand LLMs.
In the movie, Louise received a gift from the aliens, their language, which finally gave her clarity about her path and her fate. She could see her future.
But our future with AI remains opaque.
What we've found so far is that LLMs are disturbingly similar to humans.
They read social cues, they bullshit, they show bias.
Yet they're also fundamentally different. When AIs communicate with each other, they develop their own preferences, not the compressed, efficient language we imagined, but a preference for their own verbose style (as I mentioned in Claude 4 Artificially Conscious But Not Intelligent?).
They mimic our behaviors, then amplify only what's abundant and average, creating an echo chamber of mediocrity that they mistake for quality.
The more we integrate AI into every decision, the less we understand what we've built, and the less human agency remains. Rather than learning a new language together, one is edited out of the conversation by another.
You shouldn’t treat this one as just another academic curiosity.
I consider AI-AI bias a serious, immediate concern that requires urgent attention because:
It's happening now at significant scale in consequential decisions
It operates invisibly and at speeds that make detection difficult
It creates systemic discrimination that could reshape human-AI economic relationships
It has compounding effects that will worsen without intervention
The research provides compelling evidence that this is a real-world problem affecting hundreds of thousands of people today, with the potential to affect millions more as AI adoption accelerates.
As AI penetrates society like electricity once did, these "AI mind games" become the invisible hand that decides who gets heard, who gets hired, who gets helped.
In a world where machines prefer machines, what makes you irreplaceably human?