Apple is enforcing an old App Store rule against a new kind of software

Iris, May 05, 2026

Since January, Replit’s iOS app has been stuck on the same version. The rankings show it slipping: from first to second, then to third in Apple’s free developer tools category.

In March, The Information reported what was happening. Apple had blocked updates to several AI coding apps, including Replit and Vibecode. The cited rule was App Store Review Guideline 2.5.2: apps must be self-contained in their bundles and may not “execute code which introduces or changes features or functionality.” The rule has existed for years. It predates the category of software it is now being applied to.

According to the report, Apple was close to approving Replit’s updates if the company stopped previewing generated apps inside its iOS client and opened them in Safari instead. The fix wasn’t “stop generating code.” It was “stop showing the generated thing inside the reviewed thing.”

A week later, Apple’s enforcement escalated. On March 26, the company pulled an app called Anything from the App Store entirely, citing the same rule. Anything’s co-founder Dhruv Amin had spent the previous three months trying to comply, submitting four different technical rewrites in response to Apple’s feedback. The final attempt did exactly what Apple had reportedly suggested to Replit: route generated app previews through an external web browser instead of an in-app web view. Apple rejected the update and removed the existing version anyway.

Several months in, the standoff has not been resolved.

To understand what the rule is doing, it helps to take Apple’s position at its most charitable. The App Store’s premise is that the artifact reviewed at submission is the artifact that runs on the user’s device. When Replit displays a generated app inside an embedded web view, the reviewed Replit binary effectively contains an unbounded number of unreviewed apps. The wrapper got reviewed. The contents did not. From Apple’s perspective, the review didn’t review anything.
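One way to picture that premise is as a checksum. The sketch below is an illustration only, not a description of how Apple's review or code signing actually works: the names `WRAPPER_BUNDLE`, `generate_app`, and `bundle_matches_review` are hypothetical, and a string template stands in for the model. It shows the wrapper passing an identity check against what was reviewed, while the content it displays changes with every prompt and was never part of that check.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest, standing in for a reviewed artifact's identity."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical wrapper bundle: the bytes a reviewer actually inspected.
WRAPPER_BUNDLE = b"wrapper app: render whatever the generator returns"
REVIEWED_DIGEST = digest(WRAPPER_BUNDLE)


def generate_app(prompt: str) -> bytes:
    """Stand-in for a model: different prompts yield different output."""
    return f"<app>generated for: {prompt}</app>".encode()


def bundle_matches_review(bundle: bytes) -> bool:
    """True only if these bytes are the ones that went through review."""
    return digest(bundle) == REVIEWED_DIGEST


app_a = generate_app("a habit tracker")
app_b = generate_app("a poker odds calculator")

# The wrapper still matches what was reviewed...
wrapper_ok = bundle_matches_review(WRAPPER_BUNDLE)
# ...but what it shows differs per prompt and was never reviewed.
contents_differ = digest(app_a) != digest(app_b)
generated_was_reviewed = bundle_matches_review(app_a)
```

Here `wrapper_ok` and `contents_differ` are both true and `generated_was_reviewed` is false: the check that review relies on keeps passing while saying nothing about what the user actually runs.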

Apple has had complicated rules around interpreters, downloadable code, browsers within browsers, and JavaScript-heavy apps for as long as the App Store has existed. Each round, the runtime got more flexible, and platform owners drew a new line. The lines moved. The premise behind them did not: software is a thing you can hold still long enough to inspect.

Underneath the question of motive, there is a real epistemological problem. Apple’s reviewers don’t have a method for evaluating software whose behavior is determined at runtime by a model. The existing review process has nothing to look at. The wrapper is reviewable. What the wrapper does at three in the morning on a user’s phone, after a user types a prompt no reviewer ever saw, is not.

The reviewable artifact and the running artifact are not the same kind of thing.

It is not just reviews. Almost every layer of the modern software-distribution stack was built on the same premise.

Version numbers assume one canonical artifact with a defined lineage. Release notes assume there is a release. “Are you on the latest version?” assumes there is a latest. Bug reports assume two users see the same software. Documentation assumes the screenshots will match. Reproducibility assumes the binary doesn’t move while you’re looking at it.

Everything in the stack rests on the assumption that software, once shipped, holds still: app stores, package managers, versioning, CI/CD pipelines, support tooling, JIRA tickets, Stack Overflow answers, the entire grammar of “we shipped X in version Y.”

Adaptive software doesn’t hold still. Each user’s version drifts from every other user’s, sometimes immediately, sometimes over weeks. A bug isn’t reproducible because the software that produced it isn’t there anymore. There is no “latest version” because there is no version. Two users on the same product can’t compare notes about a feature, because the feature was generated for one of them and not the other.

The App Store’s premise was always going to expire. Software that holds still is a temporary condition, not a property. It held still for forty years because the tools to make it adaptive didn’t exist yet. They exist now.

The App Store enforces the assumption most strictly, and as a result Apple is being forced to ask the question first: what does review mean when the artifact and the runtime aren’t the same thing? The rest of the stack will be forced to ask it next.

Each part of the stack assumes something about software that’s about to stop being true. The infrastructure built around the old assumption — review queues, version numbers, package registries — was never going to survive software that no longer holds still.

While Apple was telling Replit to stop showing its generated apps in-app, OpenAI was building a platform whose entire premise is that apps generate themselves in-app.

In October 2025, OpenAI introduced the Apps SDK at DevDay. A few months later, the company opened public submissions and launched an app directory inside ChatGPT. By early 2026, the directory hosted Spotify, Zillow, Canva, Coursera, Booking.com, Expedia, Adobe Photoshop, Gmail, Microsoft Teams, Stripe — and Replit. The same Replit. Eight hundred million people use ChatGPT every week. There is a tools menu, a directory, a submission process, and a review queue. Essentially, an app store.

But the unit is different. ChatGPT apps aren’t binaries. They are MCP servers paired with web UI components that the model can render inline in a conversation. The model decides which app to surface based on conversational context. OpenAI’s own example: if you mention buying a house, the model can pull up Zillow inside the chat. The user didn’t navigate to Zillow. The model summoned it.

Discovery shifts from search to inference. The user doesn’t browse a directory and pick an app; the model picks one and the user accepts or doesn’t. The app does not have a fixed UI. It has capabilities and components, composed turn by turn against whatever the user is trying to do.
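A toy version of that shift, in code. This is a sketch under loud assumptions: it does not use the Model Context Protocol or OpenAI's Apps SDK, the `App` type and the directory entries are invented for illustration, and simple keyword overlap stands in for the model's inference. The point is only the shape of the flow: the user never names an app, and the selection falls out of what they said.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical directory entry: a name plus advertised capability keywords."""
    name: str
    capabilities: frozenset


# Invented directory; real entries would describe tools and UI components.
DIRECTORY = [
    App("home-search", frozenset({"house", "rent", "mortgage"})),
    App("playlist-maker", frozenset({"music", "playlist", "song"})),
]


def pick_app(message: str, directory=DIRECTORY):
    """Return the app whose capabilities best overlap the message, or None.

    Keyword overlap is a crude stand-in for a model inferring intent
    from conversational context.
    """
    words = set(message.lower().split())
    best, best_score = None, 0
    for app in directory:
        score = len(words & app.capabilities)
        if score > best_score:
            best, best_score = app, score
    return best


surfaced = pick_app("I am thinking about buying a house")
nothing = pick_app("what is the weather today")
```

In this sketch `surfaced` is the `home-search` entry and `nothing` is `None`. The user typed a sentence, not a query against the directory; which app appears is the runtime's decision.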

The protocol underneath all this is the Model Context Protocol, originally developed at Anthropic and now adopted across the industry. Both ChatGPT’s apps and Claude’s connectors run on it.

What’s worth noticing is the ladder. The App Store distributes binaries, the lowest rung. The ChatGPT directory distributes capabilities: MCP servers and components, one rung up. The next rung, which doesn’t exist anywhere at scale, is distribution by intent: the user expresses a need, the runtime composes a fulfillment, and what travels between developer and user is closer to a constraint set than a program.

Each rung up makes the underlying software more adaptive and the distribution model less compatible with the App Store.

It is tempting to read the contrast between Apple and OpenAI as openness defeating control, but the framing is wrong.

OpenAI still gatekeeps. There is a review process. There is a directory. Apps that don’t meet quality and safety standards don’t get in. The Agentic Commerce Protocol, OpenAI’s payments layer, is in beta with Stripe; some version of a take-rate is coming. The platform incentives port over from one generation of distribution to the next. The 30% may not survive in its current form, but the gatekeeper-with-rents structure will find a new shape.

What’s actually different is the kind of software the new model can hold. The App Store can hold software that doesn’t change. The ChatGPT directory can hold software that does.

We believe the second category is where everything is going, because adaptive software is what software becomes once the tools to build it exist. The platforms that survive will be the ones that can hold software in motion. The platforms that don’t will be remembered the way we remember CD-ROMs: a coherent technology that solved a real problem inside an assumption that didn’t last.

The contradiction is harder to ignore because both models live on the same device.

ChatGPT is an app on iOS. It went through Apple’s review. It is now distributing third-party software through its own internal directory, in a way that runs against the spirit of the rules Apple is currently enforcing against Replit. The contradiction has been ignorable so far because the ChatGPT directory is small relative to the App Store, and because most of its surface is conversational rather than visual.

This will likely not stay true.

As the directory grows into something that genuinely competes with the App Store as a place users find and use software, Apple’s reviewers will be forced into a decision. They can apply 2.5.2 to ChatGPT itself — restricting or pulling one of the most-used apps in the world — which invites antitrust action, regulatory attention, and a level of public scrutiny no App Store policy has ever attracted. Or they can accept that the rule applies only when the runtime is native, not when it’s mediated by a language model running on someone else’s servers. Either resolution reshapes the platform balance.

This is the kind of decision that won’t be announced. It will arrive as a specific call on a specific app review, made by a specific person in Cupertino who has to figure out, in 2026 or 2027, what 2.5.2 is supposed to mean now.

OpenAI is not the only company betting that distribution is moving up the stack.

Apple Intelligence is Apple’s own attempt to make Siri-as-agent the new distribution surface for its hardware. Gemini is doing the same on Android. Microsoft’s Copilot is making the same bet inside Windows and Office. On the browser side, OpenAI’s Atlas, Perplexity’s Comet, and Anthropic’s Claude in Chrome are betting that the agent inside the browser is where users will eventually do things, and that whoever runs that agent runs distribution.

Whether the new distribution layer lives in the OS, the browser, or chat, the unit of distribution is rising: from binary, to capability, and eventually to something closer to intent. Whichever surface wins, the binary loses standing.

Meanwhile, the developers Apple has put on hold aren’t going quietly. Speaking at StrictlyVC this week, Replit’s CEO Amjad Masad called Apple’s stated reasoning “a lie” and said his company could prove it in court if necessary. Whatever the merits of Apple’s structural argument, it now has to be defended publicly, against opposition with capital, reach, and a willingness to litigate.

Apple’s reviewers were trained to look at code. The category of software they are now being asked to review doesn’t have code worth reading in the traditional sense. The code that runs is generated at the moment of use, on the user’s device or on a server somewhere, in response to a request the reviewer never saw.

The artifact has dissolved into the runtime.

Someone in Cupertino, fairly soon, is going to have to write a memo about what review means when there’s nothing static to inspect. The memo will say one of two things. It will either find a way to keep the assumption that software holds still, which buys time but loses the future, or it will concede that the artifact has dissolved into the runtime, and the App Store’s whole premise needs a different theory underneath it. We think the second memo is the one that gets written eventually. The only question is how many years of the first one come before it.

The fight is not really between Apple and Replit. It’s between two theories of what software is.

Thanks for reading Adaptive Software! Subscribe for free to receive new posts.
