The Flamethrower Problem

How "AI" Became a Synonym for Waste

"Use AI to replace AI."

This one-liner might sound provocative. But it captures something that enterprise tech and the business world are only now being forced to confront: the most expensive way to automate a task is often to throw a large language model at it, and the smartest move is to use that LLM exactly once, to figure out how to never need it again.

The Illusion of Transformation

Between 2023 and 2025, the narrative took hold that AI was transforming business. Productivity was about to explode. Knowledge work would never be the same. And the proof? Adoption metrics measured through token consumption, with leaderboards ranking engineers by how much AI they used.

At least until the bills arrived.

Uber burned through its entire 2026 AI tooling budget in four months. Microsoft started revoking Claude Code licenses as costs spiraled. JPMorgan published a note titled "AI Token Costs Are Eating Internet Profits Alive." Engineering analytics firm Faros AI found that code churn — lines of code deleted versus added — increased by over 800% under heavy AI adoption.

The uncomfortable question nobody was asking during the subsidy-fuelled adoption phase: What problem were we actually solving, and did we use the right tool?

But there is a second question, quieter and more uncomfortable still: Did the people using these tools actually know what they wanted?

How "AI" Got Hijacked by One Product Category

Here is the distortion at the heart of the current crisis: the word "AI" has been quietly redefined to mean "LLM API call."

This is not a semantic complaint. It has real consequences.

Artificial intelligence, as a field, encompasses decades of rigorous applied mathematics:

Gradient boosting models (XGBoost, LightGBM) dominating tabular data prediction
Convolutional neural networks for image classification and quality control
Recurrent and transformer-based time series models for demand forecasting
Anomaly detection algorithms for fraud, infrastructure monitoring, and sensor data
Recommendation engines powering e-commerce and content platforms
Classical NLP pipelines for document classification and entity extraction

These are not legacy tools. They are production-grade, cost-efficient, and battle-tested. They can run in a Docker container. They have no per-token billing. Their inference cost is essentially electricity. And critically: for a fixed, versioned model, the same input produces the same output, so results are deterministic and auditable.

None of them made the hype cycle because you cannot wrap them in a chat interface and sell a subscription by Tuesday. And none of them reward vague intent with plausible-looking output. They demand specification upfront — which, it turns out, is exactly the discipline that separates valuable automation from expensive noise.
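
To illustrate how little machinery some of these techniques need, here is a minimal sketch of a statistical anomaly detector for sensor readings. It uses the modified z-score built on the median absolute deviation. The function name, the sample readings, and the default threshold of 3.5 (a common rule of thumb) are illustrative assumptions, not part of any specific product or pipeline.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The threshold is an assumed default.
    """
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical temperature readings with one obvious spike.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.7, 20.1]
flagged = mad_anomalies(readings)
```

The same readings always flag the same indices, and every flag can be traced back to one formula and one threshold.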

The Real Cost of the Flamethrower

LLMs are genuinely remarkable tools. They excel at a specific band of tasks:

Understanding and generating unstructured natural language
Flexible interface generation and summarization
Tasks where the output specification is inherently ambiguous
One-shot reasoning over novel problems

Outside that band, using an LLM is like using a flamethrower to light a candle. It works. It is spectacular. And it will burn down your kitchen.

The structural problem with agentic AI specifically is that there is no linear relationship between a task prompt, token consumption, and the usability of the outcome. An agent navigating a multi-step workflow introduces compounding nondeterminism at every step. The underlying model is updated without notice. Tool APIs change. Context window handling shifts between versions. You cannot version-pin an agent the way you version-pin a software library.

This is a fundamental incompatibility with production engineering culture — any serious system that demands reproducibility, auditability, and cost predictability is structurally at odds with how agentic LLMs actually work.
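
The compounding effect described above can be made concrete with a deliberately simplified model. The sketch below assumes that every step of a workflow succeeds independently and with the same probability, which real agents will not satisfy exactly. The 95% figure is illustrative, not a measurement.

```python
def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Illustrative only: how a fixed per-step reliability decays with length.
for steps in (1, 5, 10, 20):
    print(steps, round(end_to_end_success(0.95, steps), 3))
```

Under these assumptions, a step that is right 95% of the time yields a ten-step workflow that is right only about 60% of the time, and a twenty-step workflow that is right about 36% of the time.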

But here is what the architecture critiques tend to miss: the unpredictability of agents is not only a technical property. It is also a human one. An agent given a vague goal by someone who has not yet figured out what they want will explore the solution space at full token burn until it produces something the user can react to. That is not automation. That is outsourced thinking, billed by the word.

The Subsidy That Distorted Everything

Much of the "AI is transforming our workflows" narrative was built on pricing that was clearly below cost. OpenAI and Anthropic were buying adoption with venture capital. The real compute economics were hidden behind investor patience and a growth-at-all-costs mandate.

Adoption curves built on subsidized pricing historically do not survive normalization. Food delivery, ride-hailing, cloud storage — the pattern is consistent. Retention holds only where genuine ROI existed independently of the discount.

Now that infrastructure bills are real and pricing is moving toward cost-reflective levels, the reckoning is arriving. Companies that adopted AI horizontally and indiscriminately — that measured success in token consumption rather than business outcomes — are the most exposed.

What the subsidy also masked was a competence gap. When the tool is cheap enough to experiment with recklessly, nobody is forced to ask whether the person using it is qualified to direct it. Price discipline enforces specification discipline. The two were always linked; the subsidy just hid that link until now.

Use AI to Replace AI

Here is a more disciplined approach, and the basis of that provocative one-liner.

LLMs are excellent at one specific meta-task: helping you understand a problem well enough to solve it deterministically. A developer who uses Claude once to reason through a data transformation problem, then implements the solution as a typed, tested, versioned pipeline, has extracted genuine value. They have paid for intelligence once, not indefinitely.

Notice what this requires: the developer already understands the problem domain well enough to recognise a good solution when the LLM produces one. They can evaluate the output, reject the plausible-but-wrong answer, and translate the useful insight into production code. The LLM provided leverage. The developer provided judgment.

Remove the judgment, and the same workflow becomes an infinite loop of generation, confusion, and re-prompting — each cycle burning tokens, producing churn, and arriving nowhere.

This is the correct mental model:

Use an LLM to understand the task — its inputs, edge cases, failure modes
Use an LLM to prototype the solution — quickly, exploratorily
Replace the LLM with a deterministic pipeline — scripted, typed, deployable
Use classical ML where prediction is needed — not language generation

A coder who scripts the ten most repetitive LLM requests their team makes — turning them into parameterized functions, scheduled jobs, or lightweight microservices — eliminates the token cost entirely while making the behavior reproducible and the system maintainable.
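
As a sketch of what such a replacement might look like, suppose one of those recurring requests is "normalize this date string to ISO 8601". That request is hypothetical, as are the function name and the list of accepted formats. Once the formats have been pinned down, the prompt can become a plain function:

```python
from datetime import datetime

# Assumed input formats, tried in order. Ambiguous strings such as
# "01/02/2024" are read as month/day/year because that format is listed.
KNOWN_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def normalize_date(raw: str) -> str:
    """Return the date in ISO 8601 form, or raise ValueError if unrecognized."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Unlike a prompt, this fails loudly on input it does not understand, and the set of accepted formats is visible in version control.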

Classical ML: The Quiet Workhorse Nobody Talks About

For most real business automation problems, the right tool is not an LLM. Consider:

Problem	Right Tool	LLM Needed?
Predict customer churn	XGBoost on CRM features	No
Detect fraudulent transactions	Isolation Forest / gradient boosting	No
Classify support tickets	Fine-tuned classifier or TF-IDF + SVM	Rarely
Forecast inventory demand	Prophet / LSTM time series model	No
Detect anomalies in sensor data	Autoencoder / statistical models	No
Extract entities from documents	spaCy NER pipeline	No
Recommend products	Collaborative filtering / matrix factorization	No
Parse and route structured forms	Rule engine / regex + schema validation	No

Each one of these runs locally. Each one has auditable, reproducible output. Each one can be deployed as a Docker container and run indefinitely at the cost of electricity. None of them require a vendor API, a usage policy, a data processing agreement, or a per-token invoice.
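
The last row of the table can be sketched in a few lines of standard-library Python. The field names, queue names, and patterns below are hypothetical; a real deployment would derive them from the actual form schema and routing policy.

```python
import re

# Assumed schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"form_id": str, "subject": str, "amount": (int, float)}

# Assumed routing rules, evaluated in order; the first match wins.
RULES = (
    ("refunds", re.compile(r"\brefund\b", re.IGNORECASE)),
    ("invoices", re.compile(r"\binvoice\b", re.IGNORECASE)),
)


def validate(form: dict) -> None:
    """Raise if a required field is missing or has the wrong type."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in form:
            raise ValueError(f"missing field: {name}")
        if not isinstance(form[name], expected):
            raise TypeError(f"field {name!r} has unexpected type")


def route(form: dict) -> str:
    """Validate the form, then return the name of the queue it belongs to."""
    validate(form)
    for queue, pattern in RULES:
        if pattern.search(form["subject"]):
            return queue
    return "manual_review"  # explicit fallback instead of a guess
```

Every routing decision here can be explained by pointing at one rule, which is the auditability property the table relies on.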

Crucially, none of them tolerate vague intent. You cannot prompt your way into a well-trained fraud detection model. You need domain knowledge, labelled data, feature engineering, and validation discipline. That specificity is a feature, not a limitation — it forces the practitioner to understand the problem before touching the tooling.

The business case compared to agentic LLMs is not even close. But it requires something the hype cycle never demanded: knowing what you are doing before you start.

What Good AI Governance Actually Looks Like

The companies that will extract durable value from AI — in the broad, correct sense — are the ones that treat it as an engineering discipline rather than a procurement category.

That means:

Matching tool to task — LLMs for language, classical ML for prediction, deterministic code for repetitive logic
Treating token spend like compute budget — with visibility, governance, and accountability
Preferring local inference where data sensitivity and cost predictability matter
Investing in the boring fundamentals — data quality, feature engineering, model validation — before reaching for the API
Building pipelines, not prompts — systems that can be tested, versioned, and deployed like software

And above all: requiring that the people directing automated systems can actually specify what they want before the meter starts running.
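
Treating token spend like a compute budget can start very small. The following is a minimal sketch of a per-team spend ledger with a hard monthly cap. The class name, the price per thousand tokens, and the cap are all hypothetical values chosen for illustration, not any vendor's actual pricing or API.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    monthly_limit_usd: float
    price_per_1k_tokens_usd: float  # hypothetical flat price
    spent_usd: float = 0.0
    by_team: dict = field(default_factory=dict)

    def record(self, team: str, tokens: int) -> float:
        """Book the cost of a request, refusing it if the cap would be exceeded."""
        cost = tokens / 1000 * self.price_per_1k_tokens_usd
        if self.spent_usd + cost > self.monthly_limit_usd:
            raise RuntimeError(f"budget exceeded: {team} requested ${cost:.2f}")
        self.spent_usd += cost
        self.by_team[team] = self.by_team.get(team, 0.0) + cost
        return cost


# Illustrative usage with made-up numbers.
budget = TokenBudget(monthly_limit_usd=100.0, price_per_1k_tokens_usd=0.01)
budget.record("search", 500_000)
```

The point of the design is visibility: spend is attributed to a team at the moment it happens, and the cap is enforced before the request rather than discovered on the invoice.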

The Bottleneck Was Never the Model

The AI industry has sold a seductive idea: that large language models are the great equalizer. That anyone, regardless of background or expertise, can now do expert-level work. That the gap between a junior analyst and a seasoned domain expert has been closed by a sufficiently capable model.

The data from the past two years tells the opposite story.

The productivity gains attributed to AI are almost entirely captured by the top quartile of users — people who already knew what good output looked like, who could evaluate and reject the plausible-but-wrong answer, and who arrived at the tool with a clear goal in mind. For them, LLMs provide genuine leverage. For everyone else, they provide expensive iteration.

A junior developer tokenmaxxing through a complex architecture problem is not being augmented. They are generating code they cannot evaluate, in a system they do not understand, at a cost they cannot see. The 800% code churn is not a failure of the model. It is a failure of specification — and behind that, a failure of domain competence.

This is the gap the industry has refused to name, because naming it undermines the democratization narrative that justified the valuations.

Agentic AI and LLM orchestration will only produce a real return on investment when the person directing the system is deeply competent in the domain they are automating. Not competent in prompting. Not certified in AI tools. Competent in the work itself — experienced enough to know what the goal looks like, precise enough to describe its boundaries, and skilled enough to recognise when the output misses them.

Everything short of that produces the same result: exponential token consumption, ballooning energy costs, and outcomes that wouldn't survive scrutiny in any serious engineering review.

The bottleneck was never the model. It was always the mind behind the prompt.

And no amount of infrastructure investment, no frontier model release, no agentic framework update closes that gap. Only education and experience do — and those have always been slow, expensive, and impossible to subscription-price.

The most powerful tool in the world, in the hands of someone who doesn't know what they want, is reduced to expensive discovery.