2nd Order Thinkers
OpenAI DevDay 2025: Trying What Google and Meta Failed- America's WeChat


I'm going to break down what OpenAI assumed when they built the Apps SDK—and why they're dead wrong.

Sam Altman had a big day.

A few hours after the AMD-partnership announcement, Altman took the DevDay stage with the headline feature: you can now use other companies’ [apps inside ChatGPT]. Altman’s version of Everything Everywhere All At Once.

Zillow for homes, Expedia/Booking.com for travel, Spotify for playlists, Canva for design, Figma for product work, and Coursera for courses. The pitch is that chat stops being a destination and becomes the place you do things.

If you stopped at the sizzle reel and thought the United States had finally found its WeChat, think again.

You’d likely have spotted some glitches if you watched the demo.

Now, let’s start from the beginning, shall we?

TL;DR:

  • OpenAI launched apps inside ChatGPT, but you still have to tell it which app to use—defeating the “just ask” promise

  • The demo frictions are real: manual routing, OAuth redirects, and you’ll still open the native app to actually do things

  • Natural language isn’t always better than tapping an icon—especially when the original interface was already intuitive

  • It works for one thing: holding context during multi-source research (house hunting, learning). You’re not juggling tabs.

  • What this really is: A premium research assistant that hands you off, not the “everything app” Altman is selling


The “All-in-One” Holy Grail of American Tech Giants

From Musk’s ‘Super App’ and Zuckerberg’s ‘Personal Superintelligence’ to Nadella’s ‘Unified AI-powered platform’ (Microsoft needs to be more creative in naming things), what’s driving the dream of an all-in-one app? Why do so many tech leaders want one tool that does everything?

This is not a mystery.

Whoever owns the moment of intent controls demand, economics, and data—a pattern proven by WeChat in China, where 1.3 billion users conduct everything from messaging to payments to government services in a single app.

If I say “book me a flight Friday” in your interface, you control the entire value chain: which vendor appears first in search results, what alternatives remain hidden, which payment rail processes the transaction (and collects the fees), what ancillary services get recommended, and crucially—what behavioral data you capture to refine ranking algorithms for every subsequent query.

This is why U.S. platforms have pursued consolidation strategies for fifteen years.

The economic stakes are quantifiable.

Payment processing alone generates 2-3% of transaction value. Advertising placement commands premium CPMs. First-party data enables targeting worth multiples of third-party alternatives. And platform fees on third-party transactions, the “super app tax”, can reach 30%, as seen in app store policies.
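To make those stakes concrete, here’s a quick back-of-envelope calculation. The rates are the ones cited above; the $1B transaction volume is purely an illustrative assumption:

```python
# Back-of-envelope: what owning the moment of intent is worth.
# The volume figure is an illustrative assumption; the rates are the ranges cited above.
gmv = 1_000_000_000          # $1B in annual transactions flowing through the platform (assumed)

payment_cut = 0.025          # 2-3% payment processing take, midpoint
platform_fee = 0.30          # the "super app tax" on third-party transactions

payments_revenue = gmv * payment_cut
fee_revenue = gmv * platform_fee

print(f"Payment rails alone: ${payments_revenue:,.0f}")   # $25,000,000
print(f"30% platform fee:    ${fee_revenue:,.0f}")        # $300,000,000
```

Even at these rough numbers, the spread between skimming the payment rail and taxing the whole transaction is an order of magnitude, which is why everyone wants to own the interface, not just a service behind it.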

This is digital real estate.

It’s all about owning the prime location in the empire of digital commerce.

Have any of them succeeded?

The standard is WeChat, which, again, has more than one billion users conducting everything from messaging to payments to government services to mini-app commerce, all within a single interface that captures nearly every digital activity of a typical day.

No U.S. platform has achieved that level of closure.

Just to name a few…

Apple owns hardware + iOS + App Store, centralizing discovery and payments. However, this is not collapsing life into a single app.

Amazon bundles retail + media + some hardware like Alexa. The default intent is still “shop + entertain,” not “do everything.” Messaging, identity, and payments are nowhere to be seen.

Google is probably the closest the U.S. has gotten (if you use Android). Google has the stack: Search for discovery, Maps works well alongside search, Gmail for partial communication, YouTube for entertainment, Google Pay, and Android as the OS. That’s everything WeChat has, infrastructure-wise.

So as you can see, there’s always something missing, a few steps away from the everything-everywhere-all-at-once holy grail.

WeChat keeps you inside. Mini-apps run in WeChat. Payments close in WeChat. You order food, book rides, pay bills, message friends—all without leaving.

However, even the one with the closest concept, Google, sends traffic outward. The ad model rewards the click away, not the completed transaction. They tried “Buy on Google” and killed it in 2023; now it’s just a shopping tab, which still doesn’t handle transactions.

And of course, Musk keeps promising a U.S. super app, the way some people promise they’ll start going to the gym: out of aspiration, without considering practicality.

That’s the backdrop.

Now OpenAI walks on stage, convinced they can pull off what everyone else fumbled.


What OpenAI Actually Shipped

OpenAI released an Apps SDK and lit up early partners. There will be an app directory and submissions later this year.

The earliest Apps SDK integrations look somewhat impressive in the demo, as you’d expect from any demo.

You can ask ChatGPT “find me hotels in Paris” or “flights to Chicago” and it pulls live results from Booking.com, Expedia, Zillow, Spotify—complete with images, prices, maps, playlists—right in the chat. No tab-switching needed to compare options (in theory).

You then get to browse everything visually and ask follow-up questions all in one place.
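Under the hood, each partner app exposes structured tools that ChatGPT can call and then render as a card in the chat. OpenAI didn’t walk through the schema on stage, so the sketch below is a hypothetical illustration of the general shape of such a tool definition; every name and field here is an assumption, not OpenAI’s actual Apps SDK API:

```python
# Hypothetical sketch of an app tool definition -- NOT the real Apps SDK schema.
# All names ("search_hotels", the fields, "render") are illustrative assumptions.
hotel_search_tool = {
    "name": "search_hotels",
    "description": "Search live hotel inventory for a destination and date range.",
    "parameters": {
        "type": "object",
        "properties": {
            "destination": {"type": "string"},
            "check_in": {"type": "string", "format": "date"},
            "check_out": {"type": "string", "format": "date"},
        },
        "required": ["destination", "check_in", "check_out"],
    },
    # The app also supplies a rendered card (images, prices, map) shown in-chat
    "render": "inline_card",
}

print(hotel_search_tool["name"])
```

The important part isn’t the exact schema: the model decides when to call the tool, the partner’s backend returns live data, and ChatGPT renders the result inline, which is what makes the demo look seamless.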

Seems magical, doesn’t it?

Right off the bat, there’s friction: there are two ways to activate an app. Either you talk about a relevant topic, or you name-drop the app directly.

Is that really simpler than tapping an icon? What if I talk about music, but I actually want to open a Booking.com app?

Imagine a Friday afternoon. You’re burned out. You slump on the couch and think: I need something relaxing this weekend.

So you open ChatGPT and type exactly that: “I need something relaxing this weekend.”

ChatGPT stares back at you, waiting. Does that mean Spotify for a chill playlist? Booking.com for a spa hotel? Coursera for a meditation course? The Fork for a quiet dinner reservation?

All equally valid. ChatGPT can’t read your mind.

So you have to clarify: “Spotify, give me a relaxing playlist.”

But wait…

If you already know you want Spotify, why are you sitting here talking to ChatGPT? You could’ve just tapped the green icon on your iPhone. One thumb movement. Half a second. You’d already be listening.

Instead, you’re typing instructions to a middleman, waiting for it to figure out what you want, load an embedded app preview, then probably still clicking through to Spotify anyway.

Or it’s Tuesday night. Your partner casually mentions their birthday is in three weeks, and you realize with a small jolt of panic that you have no plan.

You grab your phone: “Help me plan a surprise birthday weekend.”

ChatGPT blinks. What kind of help? Does that mean OpenTable for a fancy dinner reservation? Target to shop for a gift? Instacart to get ingredients for a homemade breakfast? Uber to book a ride somewhere special? TripAdvisor to find the perfect activity?

The cursor just sits there. Waiting for you to decide.

So you clarify: “OpenTable, find romantic restaurants for Saturday night.”

But here’s the thing—you’ve now spent 30 seconds typing, waiting, thinking about which app solves which piece of this puzzle. Meanwhile, your thumb knows exactly where the OpenTable app lives on your home screen. You could’ve opened it, filtered by “romantic,” scrolled through three options, and been done.

Instead, you’re narrating your to-do list to an AI that still makes you do the routing.

The promise was: tell me what you need, I’ll handle the rest.

The reality is: tell me what you need, then tell me which tools to use, then I’ll show you a preview before you click out anyway.

You close the app. You’ll just open OpenTable like a normal person. At least that doesn’t make you feel like you’re managing a personal assistant who can’t take initiative.

Right, so how many frictions have we counted with these two scenarios?


Not Everything Compresses Into a Prompt

The fundamental mistake OpenAI makes is assuming natural language lowers all barriers.

It doesn’t.

Natural language helps when the original task was difficult: when it required expert knowledge, syntax memorization, or extensive training.

That’s why “vibe coding” exploded in the last two years.

Remember what coding used to look like?

import pandas as pd
import matplotlib.pyplot as plt

# Parse the date column up front; the .dt accessor below fails on raw strings
df = pd.read_csv('sales_data.csv', parse_dates=['date'])
monthly_sales = df.groupby(df['date'].dt.to_period('M'))['revenue'].sum()
plt.figure(figsize=(10, 6))
plt.plot(monthly_sales.index.astype(str), monthly_sales.values)
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.title('Monthly Sales Trend')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Now you just say: “Show me monthly sales trends from this CSV.”

That’s a real barrier removed.

Language replaced something harder than language—syntax, library imports, parameter memorization, Stack Overflow searches at 2 am.

But natural language doesn’t lower a damn thing when the original interface was already more intuitive than forming sentences.

We just saw this play out in practice. When you already know you want Spotify, typing instructions to an AI middleman is slower than tapping the icon. But the problem runs deeper than speed.

Here’s what tech bros keep getting wrong: they think language is the universal interface.

They assume: If you can describe it, you can prompt it. If you can prompt it, AI can handle it.

But most of the decisions in our lives aren’t linguistic—they’re spatial, visual, emotional. You don’t describe your way to the right apartment; you see the kitchen, imagine your couch in the living room, and notice the light. You don’t articulate why one poster works and another doesn’t; you feel it instantly.

Browsing IS the decision. The comparison IS the choice.

Scrolling through Zillow isn’t a bug in the user journey; it’s the actual product experience.

Trying to replace that with “describe your dream home in words, then I’ll show you one option” fundamentally misunderstands how people make visual decisions.

And then there’s the hallucination problem.

Natural language interfaces come with a risk that deterministic apps don’t have: fluent nonsense delivered with high confidence.

When you’re planning a casual weekend, that’s tolerable. When you’re juggling fare classes, refund rules, loyalty status, or property tour schedules, it’s a risk you pay for later.

This is why deterministic systems like apps still matter.

Fare calendars, price graphs, map views, and standardized amenity filters. The features are guaranteed, coded, and testable. No one has to guess what the AI is thinking or whether it hallucinated a policy detail.

Not everything compresses into a prompt.

And not every problem needs AI to solve it.


A User Journey Full of Holes

Beyond the philosophical problem, here are four specific tactical frictions that break the experience.

Friction #1: Manual routing (the invocation tax)

You must explicitly name which app to use.

Yes, you could also hint at what you want to ChatGPT, but there’s a chance it grabs a kitchen knife to help you with your plumbing.

ChatGPT can’t infer that “relaxing weekend” means Spotify, or that “birthday planning” needs OpenTable. That’s a pre-decision before your actual decision. People hate pre-decisions. And if you guess wrong, or the app lacks a feature you need, you repeat the invocation with another brand.

If the promise is “don’t think, just ask,” forcing you to remember and name a specific app defeats the premise.
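The routing problem is easy to see in miniature. Here’s a toy illustration (the keyword-to-app map is invented for the example) of why “something relaxing this weekend” can’t resolve to a single app without the user deciding first:

```python
# Toy illustration of the routing problem -- the keyword->app map is invented.
APP_KEYWORDS = {
    "Spotify": {"playlist", "music", "relaxing"},
    "Booking.com": {"hotel", "spa", "relaxing", "weekend"},
    "Coursera": {"course", "meditation", "relaxing"},
    "OpenTable": {"dinner", "restaurant", "weekend"},
}

def route(query: str) -> list[str]:
    """Return every app whose keywords overlap the query."""
    words = set(query.lower().split())
    return sorted(app for app, kws in APP_KEYWORDS.items() if words & kws)

# An ambiguous query matches several apps -- someone still has to choose,
# and today that someone is the user.
print(route("i need something relaxing this weekend"))
# ['Booking.com', 'Coursera', 'OpenTable', 'Spotify']
```

A real router would be a language model, not a keyword set, but the ambiguity is the same: the query legitimately matches four apps, so either the system guesses or it hands the choice back to you.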

Friction #2: Cold-start penalty

First time using an app in ChatGPT?

Get ready for account linking, OAuth redirects, consent prompts, and policy text.

Meanwhile, you might already have a session open in the native app with Face ID and saved credentials.

“It’s just a few clicks away!” say the tech bros. Sure, but there’s a reason companies pay six figures to UX designers to remove steps: FRICTION!

Friction #3: The Extended Feature Problem

During the DevDay demo, they waited two minutes for Canva poster templates.

Then you’d still need to go to Canva to finish the refinement, the dragging and clicking, and download the image you created. Voice assistants stalled for this exact reason: when success depends on exact incantations, people go back to tapping icons.

Friction #4: The Incentive Desert With Their Partners

Here’s what building a ChatGPT integration actually costs: design work to fit OpenAI’s display constraints, API development to expose your data in their format, ongoing maintenance when either your app or their SDK changes, and customer support for a fragmented experience you don’t fully control.

And what do you get in return?

Users who can’t complete transactions in your interface. Shallow integrations that OpenAI explicitly designed to be “lightweight and simple”—their words, not mine. Traffic that might convert... if users don’t bounce back to your native app, where the actual features live.

Meanwhile, your native app works perfectly. Users know where it is. It makes you money. Every dollar spent improving it has a clear ROI.

So why would Spotify, Zillow, or Canva dedicate engineering resources to maintain a ChatGPT integration when their product roadmap is already backlogged? When the best-case scenario is that users start in ChatGPT but finish in their app anyway?

They won’t. Or at least, not for long.

Some will launch for the PR and dip their toe in the water.

But the priorities will shift soon if there’s no traction or usage data from doing this extra work with ChatGPT. The dedicated team gets reassigned.

Many of these frictions will appear the moment people actually try to use this feature.

Again, Altman confused a clean interface with an easy user journey. The first obstacle they’d face isn’t AI accuracy or partner breadth, but initiation.


The Only Scenario Where This Works

Alright, I DO see scenarios where having apps inside ChatGPT actually removes friction instead of adding it.

Take house hunting.

You’re browsing homes on Zillow through ChatGPT. You find one you like. Normally, what happens next?

Open Google Maps. Copy-paste the address. Check the commute. Go back to Zillow. Wait, which house was it again? Find it again. Open another tab for school ratings. Enter the address. Again. Then crime stats. Then, walkability scores. You’re not doing complex tasks—you’re just trying to answer “Is this house right for me?” But you’re spending most of your energy on context management: remembering which house you were looking at, re-entering the same information, switching between tabs.

With ChatGPT holding the context, you just ask: “What’s the commute to downtown from this house?” → “Are there good schools nearby?” → “Show me crime stats for this neighborhood.” → “What about the walkability score?”

Each question builds on the previous one. The house stays in context. You’re not juggling tabs or copy-pasting addresses. That’s genuinely better.

Or take learning.

You’re watching a Coursera statistics course through ChatGPT. The instructor mentions “p-values,” and you don’t get it.

The normal flow: Pause the video. Open a new tab. Google “what is p-value.” Scan three different explanations. Try to connect it back to what the instructor said. Go back to Coursera. Where were you in the video? Rewind. Find your place. Unpause.

With ChatGPT, you just ask: “Explain p-values in simple terms” without leaving the course. The answer connects to what you’re learning right now. No context switch. No memory load. Just immediate clarification.

So you see… the value isn’t the apps themselves; it’s that ChatGPT holds the context and combines information so you don’t have to.

This works when the friction you’re removing is coordination overhead, not the task itself. When your problem is “I need to synthesize information from five different places” or “I keep losing track of what I was doing,” having one interface that remembers feels genuinely helpful.

When you’re doing exploratory, multi-source research—the kind where you’re asking follow-up questions, building understanding, connecting dots across services—ChatGPT as the coordinator actually makes sense.


Where I land (and where this goes next)

OpenAI built a research assistant that holds context and saves users from tab hell. That’s genuinely useful.

They dressed it up as an everything app and called it a platform play. That’s the problem.

The partners who signed on will quietly deprioritize this within a year if they see no user engagement. The integrations will go stale. Users will learn the narrow use case: ask ChatGPT to compare options, then tap the real app to finish, and that’s where this stabilizes.

Not as the App Store. Not as WeChat. As a somewhat decent research layer (as it currently is) that hands you off when things get real.

The everything app dream dies the same way in America every time: not because the technology can’t do it, but because nothing in the ecosystem supports it.

Users won’t route through a middleman when their thumb already knows where the app is. Visual decisions don’t compress into prompts. Partners won’t maintain integrations with no ROI. And platforms make money sending you away, not keeping you in.

OpenAI will soon learn what Google, Meta, and Apple already know.
