Don't Rely on Their AI (API)

Why Every Business Needs an Exit from the API

"Don't rely on their AI."

This sentence will irritate a lot of people who spent the last two years building on top of commercial AI APIs. It should. Because if your core business operations depend on a third-party model endpoint you do not control, you have not built a resilient business. You have built a dependency — and dressed it up as a strategy.

A Single Point of Failure by Another Name

Every serious engineer knows the rule: eliminate single points of failure. Redundant infrastructure, fallback services, circuit breakers, graceful degradation. These are not advanced concepts. They are table stakes for anything that needs to keep running when something breaks.

And yet an entire generation of AI-powered products has been built on a single, uncontrolled, externally hosted dependency, the commercial LLM API, without applying that same basic discipline.
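The same discipline can be applied to AI calls directly. Below is a minimal sketch in Python. It assumes each backend, whether a commercial API client or a self-hosted model, is wrapped in a plain function that takes a prompt and returns text. `CircuitBreaker` and `complete_with_fallback` are illustrative names, not the API of any particular library. A backend that keeps failing is skipped for a cool-down period, and requests fall through to the next backend in the list.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Sequence, Tuple

# Illustrative sketch: all names here are hypothetical, not a library API.


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures. Once `reset_after`
    seconds have passed it allows calls again: a success closes it,
    another failure reopens it."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        # Closed circuit, or open circuit whose cool-down has elapsed.
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            # (Re)open the circuit and restart the cool-down.
            self.opened_at = self.clock()


Backend = Tuple[str, Callable[[str], str], CircuitBreaker]


def complete_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in priority order, skipping open circuits.
    Returns (backend_name, output); raises if every backend is unavailable."""
    for name, call, breaker in backends:
        if not breaker.allow():
            continue
        try:
            output = call(prompt)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, output
    raise RuntimeError("no AI backend available")
```

In use, the list might be `[("commercial-api", call_provider, CircuitBreaker()), ("local-model", call_local, CircuitBreaker())]`, where `call_provider` and `call_local` are hypothetical wrappers around your own clients. The order of the list is the degradation policy.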

The risk is not just an outage. It is multi-dimensional and largely invisible until it materialises:

Pricing changes overnight. Your unit economics are valid until they aren't. The provider reprices, introduces tiered rate limits, or restructures the pricing of the model your prompts were tuned for, and your margin calculation becomes fiction.
Model behaviour shifts without warning. Silent updates to the underlying model change outputs in ways that break downstream logic. The API contract guarantees an endpoint, not a behaviour.
Deprecations arrive on someone else's schedule. The model version your system depends on gets sunset. Migration is your problem, on their timeline.
Data leaves your infrastructure. Every API call is a data transfer. Customer data, operational data, proprietary context — all of it processed on infrastructure you do not own, under terms of service that can change, in jurisdictions that may not align with your compliance obligations.
The provider's business changes. Acquisitions, pivots, regulatory pressure, financial distress — your critical dependency is subject to forces entirely outside your control.
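One way to catch silent behaviour changes is to treat the model like any other dependency under test. Keep a small golden set of inputs with known-good outputs, and run it on a schedule and before any model switch. The sketch below assumes a classification-style task. `drift_report`, the golden-set format and the 2% tolerance are illustrative choices, not an established standard, and `model` stands for a hypothetical wrapper around whichever endpoint you call.

```python
from typing import Callable, Dict, Mapping, Tuple

# Illustrative sketch: function name, tolerance and data format are assumptions.


def drift_report(golden: Mapping[str, str], model: Callable[[str], str],
                 max_mismatch_rate: float = 0.02) -> Dict[str, object]:
    """Compare current model outputs against a golden set of expected labels.

    Exact (case- and whitespace-insensitive) matching suits classification-style
    tasks; free-text outputs would need a different comparison.
    """
    if not golden:
        raise ValueError("golden set is empty")
    mismatches: Dict[str, Tuple[str, str]] = {}
    for prompt, expected in golden.items():
        got = model(prompt).strip().lower()
        if got != expected.strip().lower():
            mismatches[prompt] = (expected, got)
    rate = len(mismatches) / len(golden)
    return {"mismatch_rate": rate, "mismatches": mismatches,
            "passed": rate <= max_mismatch_rate}
```

A failing report does not say which side changed, only that the behaviour your downstream logic depends on is no longer the behaviour you tested.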

A strong business takes control of as many risk factors as possible. Outsourcing your core intelligence layer to a third-party API is the exact opposite.

Starting With Commercial APIs Is Fine. Staying Is a Choice.

To be clear: beginning with a commercial AI API is not a mistake. It is the fastest way to validate whether AI solves your problem at all. The marginal cost of an API call is low enough that exploration is cheap. You can test hypotheses, prototype workflows, and identify where AI actually adds value — before committing to infrastructure.

The mistake is treating that starting point as an architecture decision.

The responsible maturity path looks like this:

Phase 1 — Validate. Use commercial APIs freely. Explore use cases. Identify what actually works and what the value is. At this stage, flexibility and speed matter more than cost or control.

Phase 2 — Identify. Audit what you are actually using the API for. In most organisations, 80% of token consumption comes from a handful of repeated, well-defined tasks. These are your migration candidates, and, as argued in "The Flamethrower Problem" earlier in this series, many of them should probably become deterministic pipelines rather than LLM calls at all.
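The audit itself is a small aggregation job. This sketch assumes you already log each API call with a task label and a token count (a hypothetical log format). It uses the 80% figure from the text as the default threshold rather than treating it as a measured fact about your workload.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative sketch: assumes usage records of the form (task_label, tokens).


def migration_candidates(usage: Iterable[Tuple[str, int]],
                         threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return the smallest set of tasks (largest first) whose combined token
    usage reaches `threshold` of the total, with each task's share."""
    totals: Counter = Counter()
    for task, tokens in usage:
        totals[task] += tokens
    grand_total = sum(totals.values())
    candidates: List[Tuple[str, float]] = []
    covered = 0
    for task, tokens in totals.most_common():
        if covered >= threshold * grand_total:
            break  # threshold reached; the remaining tasks are the long tail
        candidates.append((task, tokens / grand_total))
        covered += tokens
    return candidates
```

Fed with a month of logs, the output is a ranked shortlist: the tasks at the top are the first to evaluate against a self-hosted model or a deterministic pipeline.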

Phase 3 — Migrate. Move high-frequency, well-defined tasks to self-hosted open source models or traditional automation. Reserve commercial APIs for genuinely novel, low-frequency, or high-complexity tasks where frontier capability is actually required.

Phase 4 — Own. Operate your own inference infrastructure. On-premise, sovereign-cloud-hosted, or a combination. Your data stays in your environment. Your costs are predictable. Your dependency on external providers is bounded and deliberate.

Most businesses never leave Phase 1. Not because the later phases are technically unreachable, but because the urgency was never there while pricing was subsidised. That urgency is now arriving.

The Open Source Gap Is Closing — Faster Than the Industry Admits

The standard objection to self-hosted models has been capability: frontier commercial models are simply better, and for serious business applications the gap matters.

This was true in 2023. It is significantly less true now. And the trajectory suggests it will be largely irrelevant for most business use cases within a foreseeable horizon.

The pattern has been consistent: capabilities that required GPT-4 in early 2023 were matched by open source models within 12 to 18 months, running on a single consumer-grade GPU. Code generation, document summarisation, classification, structured data extraction, multilingual understanding — the open source ecosystem has closed the gap in every high-frequency business domain, even if the absolute frontier of reasoning remains ahead.

More importantly, frontier capability is not what most business applications need. The tasks that consume the majority of enterprise AI token spend (summarising documents, classifying inputs, generating structured outputs, assisting with routine writing) do not require a 200-billion-parameter frontier model. They require a capable, fast, cost-efficient model that does the job reliably. That model exists today, runs locally, and costs little beyond the hardware it runs on and the electricity to power it.
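The running cost of local inference can be estimated with simple arithmetic: energy per token times the electricity tariff, plus an optional share of hardware cost. All inputs are placeholders to be replaced with your own measurements. Throughput in particular depends heavily on the model, the quantisation and whether requests are batched. The function name is illustrative.

```python
# Illustrative sketch: every input is an assumption you should measure yourself.


def local_cost_per_million_tokens(watts: float, tokens_per_second: float,
                                  price_per_kwh: float,
                                  hardware_cost_per_hour: float = 0.0) -> float:
    """Running cost of generating one million tokens on local hardware.

    watts: average power draw of the inference machine under load
    tokens_per_second: sustained generation throughput
    price_per_kwh: electricity tariff
    hardware_cost_per_hour: optional amortised hardware cost
    """
    hours = 1_000_000 / tokens_per_second / 3600
    energy_kwh = watts / 1000 * hours
    return energy_kwh * price_per_kwh + hours * hardware_cost_per_hour
```

Comparing the result with a provider's per-million-token price for the same workload gives a first-order break-even estimate. Setting `hardware_cost_per_hour` keeps the comparison honest, since the GPU is not free even when the electricity is cheap.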

The Frontier Is Approaching Diminishing Returns

There is a longer arc here worth understanding.

The rate of improvement in frontier AI has been extraordinary — but the gains are showing signs of flattening on the benchmarks that matter for practical applications. Each new frontier release delivers meaningful improvements at the edges of reasoning and multimodal capability, while the core business utility benchmarks move incrementally.

Meanwhile, open source development is still in a steep catch-up phase. The gap between frontier and open source is narrowing from both ends simultaneously: frontier gains are slowing, open source gains are accelerating.

The crossover point, where open source models are capable enough for the vast majority of business use cases, is not a distant theoretical possibility. For many specific domains, it has already happened. For the remainder, it is a matter of when, not if.

And when that crossover is broadly recognised, the commercial API business model faces a structural challenge it has no clean answer to. Inference at scale is not a software margin business. It is a compute and energy business. The economics do not improve with user growth the way SaaS does — they scale with hardware provisioning and electricity consumption. As those costs rise and as open source capability closes the gap, the premium for frontier API access has to be justified by a shrinking set of genuinely frontier use cases.

The Hardware Trajectory Points the Same Direction

The other variable that rarely enters this conversation is hardware.

Compute capable of running serious inference — the kind that required a data centre in 2022 — is now available on consumer and prosumer hardware. An RTX 4080 runs a 14-billion-parameter quantised model competently. Purpose-built inference hardware from Nvidia, AMD, and a growing number of challengers continues to push the capability-per-watt ratio upward and the cost downward.
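A back-of-envelope memory estimate shows why a quantised 14-billion-parameter model fits on a 16 GB card such as the RTX 4080. Weights take parameters times bits per weight divided by eight bytes. The key-value cache grows with context length. The architecture values used below (48 layers, 8 key-value heads, head dimension 128) are illustrative assumptions for a 14B-class model with grouped-query attention, so check the model card of the model you actually deploy. Real 4-bit formats also store scaling factors, so effective bits per weight usually sit somewhat above 4.

```python
# Illustrative sketch: a rough estimate, not a substitute for measuring.


def estimate_vram_gb(n_params: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     context_tokens: int, kv_bytes: int = 2,
                     overhead_gb: float = 1.0) -> float:
    """Rough GPU memory needed to serve one sequence of `context_tokens`.

    Weights: parameters x bits per weight / 8 bytes.
    KV cache: 2 (keys and values) x layers x KV heads x head dim
              x bytes per element x tokens.
    overhead_gb: allowance for activations and runtime buffers (a guess).
    """
    weight_bytes = n_params * bits_per_weight / 8
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context_tokens
    return (weight_bytes + kv_cache_bytes) / 1e9 + overhead_gb
```

With 4.5 effective bits per weight and an 8,192-token context, `estimate_vram_gb(14e9, 4.5, 48, 8, 128, 8192)` comes to about 10.5 GB, comfortably under 16 GB. The same arithmetic shows why an unquantised 16-bit copy of the same model (about 28 GB of weights alone) does not fit.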

Parts of the industry are already adjusting to this. Some model vendors offer private cloud deployments and on-premise model instances for enterprise customers: the model weights themselves, running on your infrastructure, under your control, with your data never leaving your environment.

That future is approaching from two directions at once: hardware prices falling, and open source capability rising. On-premise inference for the majority of business AI workloads is not a niche technical option; it is becoming the default architecture of a maturing industry.

The Compliance Case Is Already Closed

For Swiss and EU businesses specifically, this is not just an architectural preference. It is a legal and regulatory reality.

Switzerland's revised Federal Act on Data Protection (FADP) and the EU's GDPR create meaningful obligations around data residency, processing transparency, and cross-border transfers. Every API call that sends customer data, or operational data containing personal information, to a US-hosted endpoint is a potential compliance exposure, regardless of what the provider's data processing agreement says. Legal architecture and technical architecture have to align.

Self-hosted inference resolves this cleanly. Data does not leave the infrastructure. Processing is auditable. There is no ambiguity about jurisdiction, no dependency on a third party's compliance posture, and no legal exposure that shifts every time a provider updates its terms.
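Aligning legal and technical architecture means enforcing the policy in code, not only in a contract. Below is a minimal sketch. It assumes requests are already tagged upstream as containing personal data (how that tagging happens is out of scope here) and that each endpoint is registered with where it processes data. The jurisdiction codes and the allow-list are placeholders, not a statement of what FADP or GDPR permit. `Endpoint` and `select_endpoint` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence

# Illustrative sketch: the policy values below are placeholders.


@dataclass(frozen=True)
class Endpoint:
    name: str
    jurisdiction: str  # where processing happens, e.g. "CH", "EU", "US"
    self_hosted: bool


ALLOWED_JURISDICTIONS: FrozenSet[str] = frozenset({"CH", "EU"})


def select_endpoint(endpoints: Sequence[Endpoint],
                    contains_personal_data: bool) -> Endpoint:
    """Return the first endpoint (in priority order) the policy permits.

    Personal data may only go to self-hosted endpoints in an allowed
    jurisdiction; other requests may use any endpoint.
    """
    for endpoint in endpoints:
        permitted = endpoint.self_hosted and endpoint.jurisdiction in ALLOWED_JURISDICTIONS
        if contains_personal_data and not permitted:
            continue  # never route personal data to this endpoint
        return endpoint
    raise PermissionError("no endpoint satisfies the data-residency policy")
```

Failing closed with an exception, rather than silently falling back to whatever endpoint is available, is the design choice that makes the policy auditable.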

For any business handling sensitive data — which is most businesses in regulated industries — the question is not whether to move toward sovereign AI infrastructure. It is how quickly.

What the Providers Are Actually Doing

If self-hosted and on-premise deployment were the obvious direction, you might expect OpenAI and Anthropic to be moving there. They are not — and that is telling.

In May 2026, both companies launched enterprise deployment arms within 72 hours of each other. OpenAI's "Deployment Company" — a majority-owned subsidiary backed by over $4 billion in initial investment — dispatches embedded engineers directly into client organisations to design and deploy AI workflows. Anthropic launched a comparable services firm in partnership with Blackstone, Goldman Sachs, and Hellman & Friedman. The message from both was identical: the next phase of frontier AI is not about models. It is about deployment.

What neither announced was on-premise model deployment or the ability to run their weights on customer infrastructure. The strategy is the opposite: deepen the integration, embed the dependency, and monetise the services layer around an API that remains under their control.

This is a rational business decision for them. OpenAI's own estimates suggest it could lose $14 billion in 2026 — triple the figure for 2025 — while Anthropic lost an estimated $5.2 billion last year despite $30 billion in annualised revenue. Shipping model weights to customer hardware eliminates the recurring inference revenue that justifies those losses. They will not do it voluntarily.

The implication for businesses is direct: do not wait for the frontier providers to offer you an exit. They are structurally incentivised to prevent one. The path to infrastructure independence runs through open source — not through a commercial vendor's roadmap.

That said, the pressure will build. As open source capability closes the gap and as the cost and compliance arguments for self-hosted inference become undeniable, it is reasonable to assume that frontier providers will eventually offer private cloud or on-premise deployment for their highest-tier enterprise clients — not out of generosity, but out of competitive necessity. When that happens, it will likely arrive as a premium, heavily contracted offering for the largest accounts. It will not be the accessible, affordable path. Open source will be.

What a Resilient AI Architecture Looks Like

The goal is not to eliminate external AI providers entirely. It is to make your dependency on them deliberate, bounded, and replaceable.

A resilient architecture:

Runs the majority of AI workloads on self-hosted, open source models — locally or on sovereign cloud infrastructure — for predictable cost and full data control
Uses commercial APIs selectively — for genuinely frontier tasks, low-frequency complex reasoning, or during the validation phase of new use cases
Treats commercial API calls as an architectural liability — logged, governed, and subject to ongoing review for migration candidates
Invests in deterministic pipelines for high-frequency, well-defined tasks — removing the AI dependency entirely where the problem allows
Monitors the open source landscape continuously — because the model capable of replacing your commercial dependency may already exist
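The properties above can be sketched as a small routing layer: deterministic code first, the self-hosted model by default, the commercial API only for explicitly approved tasks. `AIRouter`, the task names and the handler callables are hypothetical. In practice `local_model` and `commercial_api` would wrap your own inference server and provider client. The per-task counter is what turns "logged and subject to ongoing review" into a concrete migration shortlist.

```python
import logging
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional

# Illustrative sketch: names and structure are assumptions, not a framework API.

log = logging.getLogger("ai_router")

Handler = Callable[[str], str]


class AIRouter:
    """Deterministic code first, self-hosted model by default, commercial API
    only for explicitly approved tasks, with every API call counted and logged."""

    def __init__(self, local_model: Handler, commercial_api: Handler,
                 api_tasks: Iterable[str],
                 deterministic: Optional[Dict[str, Handler]] = None) -> None:
        self.local_model = local_model
        self.commercial_api = commercial_api
        self.api_tasks = frozenset(api_tasks)
        self.deterministic = dict(deterministic or {})
        self.api_calls: Counter = Counter()

    def run(self, task: str, text: str) -> str:
        if task in self.deterministic:
            return self.deterministic[task](text)  # no AI dependency at all
        if task in self.api_tasks:
            self.api_calls[task] += 1  # every external call is recorded
            log.info("commercial API call for task %r", task)
            return self.commercial_api(text)
        return self.local_model(text)

    def migration_review(self, min_calls: int) -> List[str]:
        """API tasks called at least `min_calls` times, most frequent first."""
        return [task for task, n in self.api_calls.most_common() if n >= min_calls]
```

Keeping the API allow-list explicit means a new use of the commercial API is a reviewed configuration change, not a side effect of someone's convenience.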

This is not an ideological position. It is straightforward risk management, applied to a dependency that most organisations have adopted without the scrutiny they would apply to any other critical infrastructure decision.

The Uncomfortable Conclusion

The commercial AI API model — frontier capability, billed by the token, hosted on infrastructure you do not control — is a transitional architecture. It was the right way to access AI capability at a specific moment in the technology's development, when open source alternatives were genuinely insufficient and hardware was prohibitively expensive.

That moment is passing.

The businesses that recognise this now — that begin the migration toward owned infrastructure, open source models, and deterministic pipelines — will emerge from the transition with lower costs, stronger compliance posture, and a resilience that their API-dependent competitors will not have.

The businesses that don't will keep paying the token tax, on someone else's terms, until the economics force the decision anyway.

The exit from the API is not a technical challenge. The models are available. The hardware is affordable. The path is clear.

What it requires is the same thing "The Flamethrower Problem" identified: knowing what you are doing, and making deliberate decisions rather than deferring them indefinitely.

You may start with the API. But if you never plan your exit, the API owns you — and the people selling it are counting on exactly that.