So… two major AI usage reports dropped on the same day, one from OpenAI and one from Anthropic.
I was curious, so I put them side by side.
Both claim to be "economic indices." Both analyze millions of real conversations. And both accidentally reveal the same uncomfortable truth: the AI adoption metrics don't point to the bright future they'd like you to believe in.
Here's what nobody's connecting yet.
OpenAI's data shows ChatGPT usage has shifted from 50/50 work-personal to 70% personal in just one year. This aligns with the MIT report released in August, which states that 95% of organizations are getting zero return on their $30-40 billion in GenAI investment. Zero. Not small. Zero.
Your dashboard might suggest otherwise. Why wouldn't it?
Your team sends thousands of messages through chatbots; all things considered, usage looks up and to the right. Yet another report (which I'll also cover) found that 97% of consumer AI users won't even pay for it.
Turns out "adoption" is people asking ChatGPT for relationship advice at 2 am without caring much about response quality, not people automating workflows.
Questions I plan to answer in this one:
Is AI still the ultimate productivity tool?
How many users actually pay for AI chatbots?
Why is Claude Chat better positioned than ChatGPT?
Why does Utah show a higher adoption rate than California? And what story does the 2025 gender adoption gap tell us?
Are we funding productivity tools or expensive digital therapy bots?
This article tells a story that should make every leader pause before their next budget review and ask: Are we funding productivity tools or the world's most expensive emotional support chatbot?
Shall we?
Section 1: What The Numbers Actually Show (And Hide)
Every major technology platform in history faced the same challenge: converting users into revenue. Facebook needed five years to figure out advertising. Twitter took eight years to hit $1 billion. Google AdWords needed four years to prove its model.
What about AI?
AI (especially LLM chatbots) has so far achieved massive adoption in the shortest period of time in tech history. Which shouldn't be a surprise and shouldn't be a comparison point at all… just like you wouldn't (and shouldn’t) compare the adoption of fire with the adoption of the Internet.
With such an unprecedented adoption rate, it also managed to fail at monetization in ways that no one had ever seen before.
The consumer AI market has 1.8 billion users. Only 3% pay for it. To put that in perspective, that sits at the low end of the free-to-paid conversion rate of a mediocre free-to-play mobile game. Spotify, for comparison, converts over 40%, and Netflix used to convert over 90%.
The free-to-paid number looks slightly worse for ChatGPT. With 800 million weekly active users and 20 million paying subscribers, only 2.5% pay for premium access. Depending on which data you pull, the generous estimate is 5%.
Even at 5%, my argument later still stands.
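For a quick back-of-the-envelope check of those conversion rates, here's a tiny sketch using the figures cited above (the 1.8 billion, 800 million, and 20 million numbers come from the reports; the arithmetic is just mine for illustration):

```python
# Rough conversion-rate math using the figures cited above.
consumer_ai_users = 1_800_000_000   # total consumer AI users
paying_share = 0.03                 # ~3% of the market pays

chatgpt_wau = 800_000_000           # ChatGPT weekly active users
chatgpt_paying = 20_000_000         # ChatGPT paying subscribers

print(f"Paying consumer AI users: ~{consumer_ai_users * paying_share / 1e6:.0f} million")
print(f"ChatGPT free-to-paid conversion: {chatgpt_paying / chatgpt_wau:.1%}")  # ~2.5%
```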
The Split Personality of AI Usage
ChatGPT's September data shows something their marketing team probably wishes they hadn't measured.
In June 2024, ChatGPT usage split evenly between work and personal. Fifteen months later, over 70% of all ChatGPT messages are personal: relationship advice, creative writing, and emotional processing.
Work usage dropped to just 27% of total volume.
Here's where it gets interesting.
Most of you are business veterans, so you know the conventional wisdom: people pay for tools that make them more productive at work. Personal entertainment apps struggle to charge more than a few dollars, while professional tools command $20, $50, or even $100 monthly.
So if ChatGPT is increasingly becoming a personal companion rather than a work assistant, the 5% (or even lower) conversion rate starts making sense.
People don't pay for digital friends. They pay for tools that make them money.
The Menlo Ventures data confirms this isn't just a ChatGPT problem.
Across all consumer AI tools, 97% of users stick to free tiers. This includes Midjourney, Claude, Perplexity, and dozens of others. The entire consumer AI industry is essentially running on fumes: massive user numbers, minimal revenue.
Why This Time Really Is Different (And Not in a Good Way)
Previous platforms had an excuse for slow monetization. Facebook needed to build its user network before ads made sense. Google needed enough search volume to attract advertisers. They were building two-sided marketplaces that required scale.
AI tools don't have that excuse. A big part of their business is selling directly to users. (Yes, OpenAI tried to build GPTs, a marketplace for custom chatbots; I'm not sure it's going anywhere.) The product works from day one. There's no network effect needed. Yet they're converting users at rates that, considering the capital raised, would have killed any previous technology company.
The shift from work to personal usage that OpenAI's data reveals isn't a mistake.
Users like you and me are voting with our usage behaviour and our wallets.
If you only use ChatGPT to talk about your feelings or to ask questions you previously Googled, and you never paid for Google as a user, why would you pay for ChatGPT?
Why Is Claude a Better Product Than ChatGPT?
If we focus solely on building a viable product, then Claude is winning.
What the OpenAI team is seeing is that the 70% personal usage maps neatly onto the vast majority who won't pay.
However, this is different for Anthropic, which has focused on coding and writing from the beginning.
Anthropic's report data suggests a different pattern.
Their report shows rising "directive" interactions: single-command, task-focused API usage that maps to actual business processes.
While OpenAI hasn't released comparable API usage breakdowns, Anthropic's report aligns with its reputation in coding and information-sector tasks, which also happen to be the two tasks LLMs are most useful for, hinting at enterprises' vendor choices.
The positioning gap is real.
ChatGPT's massive brand recognition comes with a consumer-heavy usage pattern that's notoriously difficult to monetize. Claude's narrative and usage telemetry align more with enterprise buyers who have clear budgets for tools that demonstrably improve business outcomes (even if only within one sector).
This isn't about which AI is "better."
No, LLMs are now close to commoditisation; there's not much difference in the model itself.
It's about which large language model company fundamentally understands what its models can actually deliver, and matches that to user needs.
Still, if we zoom out from the comparison between OpenAI and Anthropic, the revenue at large is still disappointing.
The market isn't failing to monetize because it's early. It's failing because the technology hasn't solved the problems AI pioneers imagined it would solve.
Section 2: How Do People Use Chatbots at Work?
The shift in ChatGPT usage from work to personal life also points to a deeper truth about the current state of AI.
We’ve always wanted AI to be a reliable work partner. AI is expected to perform repetitive tasks for us. Klaas and I wrote two articles before the launch of GPT-5 and Claude 4 about how agentic AI isn’t up for the task. Read AI Agents Problems No One Talks About.
Unfortunately, the status remains largely unchanged.
The 3:1 "Good to Bad" Ratio at Work
Every manager should read this before they push their team to use more chatbots for work.
The report examines the user's apparent satisfaction with the chatbot's response to their request. It looks for an expression of satisfaction or dissatisfaction in the user's subsequent message in the same conversation, sorting interactions into three categories: Good, Bad, and Unknown.
The overall good-to-bad ratio grew from about 3 to roughly 4.1. You can attribute that to the model maturing. However, break the data down by task type and you'll see that work-related usage stays at the low end of the ratio, however you slice it.
For work-related tasks, the ratio of good to bad responses typically sits between 1.95:1 and 3.11:1.
Meaning, in a professional context, as much as one-third of the responses you get from the world's leading chatbot aren't up to your satisfaction.
Is this a ratio you’d accept from a human teammate?
But again, what do you expect from models designed to generate plausible text, not to be factually accurate? An AI chatbot may be brilliant for chitchat and role play (part of the self-expression category, the one with the highest good-to-bad ratio), but it's a terrible doer.
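To make the methodology concrete, here's a minimal sketch of how a good-to-bad ratio like this could be computed once follow-up messages are labeled. The toy labels are mine; OpenAI's report uses an automated classifier, not this exact code:

```python
from collections import Counter

# Toy labels: the report classifies the user's *next* message in the same
# conversation as Good, Bad, or Unknown. These labels are illustrative only.
followup_labels = ["good", "good", "bad", "unknown", "good", "bad", "good", "unknown"]

counts = Counter(followup_labels)
# "Unknown" carries no satisfaction signal, so the ratio uses Good and Bad only.
ratio = counts["good"] / counts["bad"] if counts["bad"] else float("inf")

print(f"good:bad = {ratio:.2f}:1")
# With 4 good and 2 bad, that's 2.00:1, i.e. about a third of rated replies were bad.
```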
Also, there are different kinds of help.
Asking, Doing, and Expressing: Three Ways We Interact with AI
The OpenAI report breaks down AI conversations into three categories:
Asking: Seeking information or answers (e.g., "What is the capital of Mongolia?").
Doing: Task completion or execution (e.g., "Write a Python script to scrape this website").
Expressing: Emotional release or other open-ended conversation (e.g., "I'm feeling overwhelmed at work, help me think through this").
So while "Doing" tasks are common, the real growth is in "Expressing."
People are using AI as a sounding board, a creative partner, or simply a way to articulate their thoughts without fear of judgment. This again echoes what OpenAI found about the transition from work to personal use.
It’s a powerful use case, but it’s not the one that will save the day for the LLM chatbot application.
Before we enter the next section, here’s my question.
Are you paying for productivity tools, or are you subsidizing expensive digital comfort food?
The answer has significant implications for how you invest in, manage, and measure the impact of AI (chatbot-style AI) in your organization.
Section 3: Automation vs. Augmentation
Your best employees have already figured out what their leaders haven't: hands-on teams use AI as a smart assistant, not a replacement.
While you're looking at the Anthropic report's metric showing "directive automation" jumping from 27% to 39% and thinking you can finally deliver the downsizing the board is asking for, your team is using Claude to draft reports they'll rewrite, check code they'll debug, and brainstorm ideas they'll refine.
The Microbehaviour Your Team Didn’t Articulate
Here's the micro-behavior happening across every department in the last year.
We all start with AI doing the whole task, watch it fail spectacularly, then shift to using it as a first-draft machine. We’ve learned through painful experience that AI hits 50-70% accuracy on enterprise tasks, good enough to start with, deadly if you ship it unchanged.
Anthropic's report accidentally reveals this pattern.
High-adoption countries (like the US or the UK), the ones with the most AI experience, favor augmentation over automation. These aren't workers resisting change.
Why would they?
You may argue that people don't use it because they fear losing jobs to it, but that's not how the economy works, at least not in the short term. People are mostly shortsighted, so the sooner they see an easy way to adopt AI that will produce an enormous amount of economic output, the sooner they will rush to it. So there is only one sensible conclusion here.
Your most sophisticated teammates have learned that AI works best as a co-pilot, not an autopilot.
The geographic data reveal that wealthy countries with longer AI exposure use it less for full automation.
They tried delegation. It failed.
Now they use AI the way it actually works.
When 95% of custom enterprise AI implementations fail
Also, there's the fact that this claim in the Anthropic report,
This is the first report where automation usage exceeds augmentation usage.
happens to land in the same period that Anthropic shipped new features, e.g., web search, research mode, and artifacts.
These features mechanically compress five-turn conversations into one command. The interaction looks more autonomous, but your employee is still reviewing, editing, and fixing the output.
This creates an illusion. You see automation metrics rising and think your workforce can now delegate tasks to AI to work independently. Meanwhile, your actual users know AI only achieves 50% accuracy on financial analysis and 24% completion in realistic workplace simulations (read my Is AI Destroying SaaS? for more).
Again, the idea of AI replacing human workers remains the hopes and dreams of those who have already bet too much on LLMs automating jobs.
The MIT data points to the same causes.
95% of custom enterprise AI implementations fail. Not just because the models are weak, but also because leaders misread usage patterns. They saw how demos played out and invested in complex autonomous workflows when their teams either weren't prepared for such workflows or there was a complete mismatch between user intent and tech maturity.
To dig further: which adoptions are showing early success?
Let's look at the API section of the Anthropic report, which shows steady growth in API usage.
The vast majority of the adoption comes from the Information sector, specifically coding-related tasks. They stated:
Among the top 15 use clusters—representing about half of all API traffic—the majority relate to coding and development tasks.
So LLMs still have their use, primarily as code assistants for developers.
The User Problem First Approach
The pattern is consistent across every successful AI deployment (or should we say every software product launch?): solve an actual user problem.
Yes, it is this obvious. But most leaders do it backwards. They start with “look what it can do!" instead of "here's what pains users (or colleagues)".
You'd think the fundamental blockers are the LLM's capabilities.
No. They're memory, goal-setting, feedback integration, and data consistency.
Until these basics work, that rising "directive automation" percentage is just how they slice it in the report, not proof that AI delegation is ready for your workforce.
Section 4: User Demographics (Geography and Gender)
Why Do People in Utah Use More AI?
Anthropic's AI Usage Index reveals an expected pattern, but also something that may make your jaw drop.
Singapore (4.6x expected usage) leads adoption globally. No surprise there, given Singapore ranks 5th in the Global Innovation Index, with top scores for government effectiveness and venture capital infrastructure.
Korea (ranked 4th in the Global Innovation Index) follows a similar pattern. When countries build systematic innovation or infrastructure, AI adoption follows.
But here's where it gets interesting.
This way of ranking also puts Utah second in US per-capita AI usage (3.78x), ahead of California (3rd place, 2.13x) and far ahead of New York (4th place, 1.58x).
Surprise!
The team looked further into it… and saw that a notable fraction of Utah's usage appeared to be possibly associated with coordinated abuse.
In footnote #7, Anthropic suspects an anomaly in Utah's data:
When further investigating Utah’s activity, we discovered a notable fraction of its usage appeared to be possibly associated with coordinated abuse. This is also reflected in a much higher “directive” automation score than average. However, we ran robustness checks and believe that this activity is not driving the results.
In plain English…
"Coordinated Abuse" doesn't necessarily mean malicious activity. This could be for tasks like generating massive amounts of content, scraping data, or running programmatic queries for a specific application.
A higher "directive" score makes perfect sense here: automated scripts are, by their very nature, "directive." They issue a command and expect a result, with minimal or no conversational back-and-forth.
However, Anthropic also claims this activity is not "driving the results." So even after accounting for or filtering out this bot-driven activity, their main findings about geographic trends and the general rise in automation still hold true.
Another theory about Utah’s high Claude usage is that it established regulatory sandboxes for AI experimentation, created the first Office of AI Policy, and invested in cloud infrastructure before adoption took off.
But even then… I'm not convinced that legitimate adoption driven by regulatory sandboxes and AI-friendly policies would be so exceptional that it surpasses California's tech-hub usage.
Following Occam's Razor, when something is hard to explain, go with the simplest answer. I suspect the Utah claim reflects the methodological vulnerability of per-capita metrics in small populations when automated usage is present.
This is exactly the kind of "Causality Theater" you need to watch out for when reading reports like this.
Don't see impressive statistics and assume they reflect organic user behavior when the real explanation might be much simpler.
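To see why small populations are so sensitive to this, here's a toy version of a per-capita usage index (a region's share of usage divided by its share of population). Every number below is invented purely to illustrate the distortion; none of it comes from Anthropic's data:

```python
# Toy AI Usage Index: share of usage divided by share of population.
# All numbers are invented to show how a small population plus a modest
# amount of scripted traffic can inflate a per-capita index.
population  = {"Big State": 39_000_000, "Small State": 3_400_000}
human_usage = {"Big State": 800_000,    "Small State": 40_000}   # conversations
bot_usage   = {"Big State": 0,          "Small State": 60_000}   # scripted traffic

def usage_index(usage: dict) -> dict:
    total_usage = sum(usage.values())
    total_pop = sum(population.values())
    return {
        state: round((usage[state] / total_usage) / (population[state] / total_pop), 2)
        for state in usage
    }

print("humans only:", usage_index(human_usage))
print("with bots:  ", usage_index({s: human_usage[s] + bot_usage[s] for s in population}))
# The small state's index more than doubles once the bot traffic is counted.
```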
Section 4.1: The Gender Pay Gap. No, not the "pay gap" you're thinking of. Keep reading.
Now, OpenAI's latest figures show the gender gap in AI usage has flipped.
Back in 2023, usage surveys found men were far more likely to use generative AI than women; women made up just 17.6% of users, while men made up the majority. That share rose to 37.2% in 2024.
Now, according to OpenAI’s new report, by September 2025, 52.4% of ChatGPT users are women, slightly more than men.
But is this reversal real?
A contact in my network points out that OpenAI determines gender by analyzing typically female or male names rather than actual demographic data (i.e., asking users to self-identify their gender). She mentioned this might not be a scientifically accurate metric, as the team acknowledges. However, I'd note that this method could misclassify both women and men, so the errors might balance out.
What's more telling is comparing OpenAI's report data with a Harvard Business School study, also published in 2025. Its comprehensive analysis of 18 studies covering 140,000+ individuals worldwide found persistent 25% gender gaps in AI adoption as recently as 2024.
So which should you believe?
The timing discrepancy may be one key to deciphering this puzzle.
Look at the time axis of Harvard's data (see the screenshot above). It spans 2023-2024 using rigorous survey methodology, while OpenAI's flip appears in its 2025 first-party usage data.
Either we witnessed the fastest demographic reversal in tech history (which isn't unlikely, given that LLMs are the fastest-adopted technology by far), or there are methodological issues at play.
Let's dig deeper into how each gender uses AI chatbots. More women use ChatGPT to draft emails, write reports, and get practical guidance; more men use it for coding help, researching information, and generating images.
This usage difference reminds me of a study by the Norwegian business school. Their data, based on students with equal access to free and paid AI tools, found that male students were more than twice as likely as female students to pay for AI subscriptions (23.3% vs 10.7%).
So this isn’t just about usage.
Women are less willing to pay for upgraded AI features.
There's no clear answer as to why, but other consumer reports suggest this preference is more a gender difference in spending than technological avoidance.
Still, this observation can be disturbing.
Especially since I just interviewed the author of an AI-AI Bias study. They found that large language models consistently prefer content generated by other AI systems over human writing. When GPT-4 evaluated product descriptions, it chose AI-generated text 89% of the time, while humans preferred the same content only 36% of the time. Similar patterns emerged across academic papers (78% vs 61%) and movie summaries (70% vs 58%).
Since AI systems and automated screens now systematically prefer content written or enhanced by AI, and men are more likely to pay for state-of-the-art AI, would that create disadvantages for those who don't engage with the technology in the same way?
In practice, this means some of us won’t get the productivity boost or the AI “signal” now favored by the tens of thousands of automated evaluators.
Section 5: The $97 Billion Question Mark?
You've probably seen the number by now. AI has created $97 billion in consumer surplus, they say.
The average American would need $98 monthly compensation to give up AI tools. It's the kind of stat that makes you look smart in leadership meetings.
Such concrete proof that AI isn't just hype!
There's just one problem: the number appears to be rather creative accounting.
Where the $97 Billion Actually Comes From
The figure traces back to Avinash Collis and Erik Brynjolfsson, two credible researchers, who published it in a Wall Street Journal op-ed.
They ask people: "How much would we need to pay you to stop using AI?" Take the average answer ($98 per month), multiply by 82 million users, and multiply by 12 months. Voilà: $97 billion.
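The arithmetic itself is trivial to reproduce from the figures in the op-ed:

```python
# Reproducing the consumer-surplus arithmetic behind the headline number.
willingness_to_accept = 98      # dollars per month, the average survey answer
users = 82_000_000              # users the estimate is scaled to
months = 12

surplus = willingness_to_accept * users * months
print(f"${surplus / 1e9:.1f} billion per year")   # ~$96.4B, rounded up to "$97 billion"
```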
Even if we exclude the several obvious sampling biases, that $98 figure is still remarkably difficult to verify.
The WSJ piece cites "our own survey" without providing methodology details, sample sizes, or even basic information about who was surveyed. The links they do provide don't contain the supporting data.
Note* Please do share the data with me if you know where the sources are.
What's particularly amusing is that this OpenAI ChatGPT usage research paper we’re reviewing cites this WSJ piece as authoritative evidence.
They're treating a newspaper opinion column as peer-reviewed research!!
A rather loose approach to source verification.
The Social Media Reality Check
This "willingness to pay" methodology has been tried before.
Using identical approaches, researchers found that people value TikTok at $59 per month and Instagram at $47 per month.
By this logic, Americans derive thousands of dollars annually from scrolling social feeds.
Yet when researchers asked a different question,
Would you prefer to live in a world with or without [TikTok/Instagram]?
the answers flipped: high stated valuations coexist with a genuine wish that the product would disappear entirely.
This reveals the core flaw: the methodology measures addiction and dependency, not genuine value creation.
What People Actually Do With AI
Menlo Ventures' survey of over 5,000 Americans found that while 61% have used AI in the past six months, adoption for any single task remains remarkably shallow.
Here’s a breakdown from MenloVC’s report: the top 10 activities people do with AI.
Rather than one transformative application, usage is "thinly spread across a wide range of tasks".
Users may genuinely believe these tools provide substantial value (hence the high survey responses), while primarily delegating fragmented, marginal value tasks rather than generating measurable economic output.
The Credibility Problem
The disconnect becomes obvious when you check the numbers. The claimed $97 billion consumer surplus substantially exceeds the roughly $10 billion in revenue generated by the market leader ChatGPT. Meanwhile, labour productivity growth remained a modest 2.3% in 2024, hardly evidence of revolutionary economic transformation.
If AI were truly generating $97 billion (or even trillions, per McKinsey) in economic value, you'd expect to see corresponding productivity gains in national statistics. Instead, economic growth patterns show little evidence of AI-driven improvements.
What You Actually Need To Know
Instead of relying on consumer surplus estimates that may measure digital addiction rather than economic value, focus on measurable outcomes in your own organisation. Track specific productivity improvements in your AI pilot projects. Document time savings on concrete tasks, and of course, the time invested to prompt and correct errors from AI. Measure actual cost changes or revenue increases.
The productivity revolution may well be coming.
But again, the numbers on paper aren't the same as your actual progress. Your credibility depends on understanding the difference.
Three realities you need to accept if you’re a leader trying to navigate what AI can actually do for your team.
First Reality: High usage doesn’t equal high business value.
Again, as I pointed out, the data shows that a huge portion of AI usage is personal, not professional.
Your team might be spending hours on ChatGPT, but that doesn’t mean they’re getting more work done. They might just be getting more emotional support. Of course, it counts if emotional support is all you want to see.
Second Reality: Carefully read the footnotes and data sources.
Don't be fooled by maps showing where AI is "hot." High per-capita usage in a report doesn't mean the region has unlocked AI-driven productivity. The Utah anomaly may simply come down to how the per-capita metric was constructed, showing how easily geographic metrics can be distorted.
Third Reality: Thin tasks vs thick workflows.
LLMs are great at “thin” steps (drafts, snippets), but real automation requires owning “thick” multi-step workflows with state, memory, and backstops—where models still break.
The Audit That Matters
Your clients don’t care how many of your employees are using ChatGPT. So stop tracking AI adoption rates.
They care about the results you deliver.
If you don't want to fund an expensive hobby, audit whether your team's AI usage patterns align with your business outcomes instead of tracking adoption. Ask the hard questions:
Are we measuring feel-good engagement or measurable work improvements?
Are we using AI to solve real business problems, or are we just paying for a virtual therapist?
Are we building resilient data infrastructure, or are we betting on autonomous agents with a shaky foundation?
I hope this helps.
I also find it interesting how both reports bill themselves as an economic index.
Such an arrogant word choice, thinking that a single tool could represent the whole AI economy.