Token Spend is a Vanity Metric

"Tokenmaxxing" had a good run. For a few months in early 2026, companies fell over themselves to get engineers consuming as many AI tokens as possible. Leaderboards went up. Budgets were allocated per team. Usage became a proxy for innovation. Then the bills came due.

Uber burned through its entire 2026 AI coding budget by April. Microsoft cancelled Claude Code licences across major divisions. Seven companies, including Klarna, GitHub, and the Commonwealth Bank of Australia, publicly pulled back, capped, or restructured their AI spending. The narrative flipped overnight from "AI is transforming everything" to "AI costs more than the humans."

And here's where the industry is getting it wrong, again. The problem was never the technology. The problem is that token spend is a vanity metric. We measured the wrong thing, panicked at the number, and are now drawing exactly the wrong conclusion.

Vanity Metrics vs. Value Metrics

Alistair Croll and Benjamin Yoskovitz defined this problem cleanly in Lean Analytics: a vanity metric is any number that goes up and to the right but doesn't connect to a decision. It feels good. It looks impressive in a slide deck. But it doesn't tell you whether you're creating value.

Token consumption is the AI equivalent of measuring website hits in 2005. More tokens consumed tells you absolutely nothing about whether the work was productive, valuable, or transformative. Spending $2,000 in tokens to ship a critical feature that unblocks a $10M contract is a steal. Spending $200 in tokens on boilerplate that gets rewritten the next day is waste. In neither case does the token bill tell you anything about the value produced.

Douglas Hubbard's How to Measure Anything gives us the framework: measurement is about reducing uncertainty in decisions. If your metric doesn't help you make a better decision, it's not measurement; it's accounting. And accounting, while necessary, is not strategy.

The questions that matter are not "how much did we spend on tokens?" They are:

  • Productive: Did the AI-assisted work ship? Did it pass review? Did it reduce cycle time?
  • Valuable: Did the output connect to a business outcome? Revenue, retention, risk reduction?
  • Transformative: Did the tool enable work that couldn't have been done at all, or would have taken 10x longer, without it?

These are harder to measure than a billing dashboard. That's the point. The easy metric is almost never the right one.

We already have a research-validated framework for this. Google's DORA program has spent a decade showing that outcomes (deployment frequency, lead time, change failure rate, mean time to restore) correlate with organizational performance in ways that outputs never do. Lines of code didn't work. Commit counts didn't work. Token spend won't work either. The pattern holds: measure the system's ability to deliver value reliably, not the volume of activity that went into it.

The Inefficiency Phase Is Not Failure

Every technology follows the same pattern. There's a period of massive inefficiency before the market finds efficient applications. The internet in 1996 was mostly broken links and GeoCities pages. Cloud computing in 2008 was expensive, unreliable, and poorly understood. Mobile apps in 2009 were mostly fart buttons and flashlights.

Nobody concluded that the internet was broken because early adoption was wasteful. We understood that the cost of exploration is inherently higher than the cost of optimization, because you haven't yet learned what to optimize.

AI tooling is in its exploration phase. The companies that burned through their budgets didn't fail at AI. They failed at scoping the problem. They gave engineers unlimited token budgets without defining what success looked like, without connecting usage to outcomes, and without understanding which problems were worth throwing compute at.

That's not a technology failure. That's a management failure. And it's a perfectly normal one for this phase of the adoption curve.

Domain Knowledge First, Tools Second

Here's the part that doesn't change regardless of the technology: you have to understand the problem before you can solve it.

Someone with AI who doesn't understand a business problem won't be any better at solving it. They'll only more quickly arrive at the wrong answer. The tool amplifies whatever you point it at — including confusion, misalignment, and bad assumptions.

The 2025 DORA Report confirms this empirically. For high-performing teams with solid platforms, workflow clarity, and team alignment, AI acts as an accelerator. For teams struggling with technical debt and process dysfunction, AI magnifies those problems, often leading to worse outcomes. Cortex's 2026 engineering benchmark tells the same story in starker numbers: PRs per author up 20% year-over-year with AI tools, but change failure rates also up 30% and incidents per pull request up 23.5%. More output. Worse outcomes. The definition of measuring the wrong thing.
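The Cortex numbers also compound. Incidents per author equals PRs per author multiplied by incidents per pull request, so if both figures describe the same population over the same period, their growth factors multiply. A quick check, assuming the two ratios are directly comparable:

```python
# Year-over-year growth factors from the Cortex figures quoted above.
prs_per_author_growth = 1.20      # PRs per author: +20%
incidents_per_pr_growth = 1.235   # incidents per pull request: +23.5%

# incidents per author = (PRs per author) * (incidents per PR),
# so the two growth factors multiply. Assumption: both ratios cover
# the same authors over the same period.
incidents_per_author_growth = prs_per_author_growth * incidents_per_pr_growth

print(f"Incidents per author: {incidents_per_author_growth - 1:+.1%}")
# prints "Incidents per author: +48.2%"
```

Under that assumption, each author is associated with nearly half again as many incidents as a year earlier, a bigger swing than either headline number suggests on its own.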

This has always been true. A spreadsheet didn't make a bad analyst good. A CI/CD pipeline didn't make a bad architecture resilient. A faster car doesn't fix a wrong turn; it just gets you to the wrong destination sooner.

The organizations seeing real returns from AI tooling share a pattern: they start with domain clarity. They know what problem they're solving. They know what "good" looks like for that problem. They know what metrics connect to business value. And then they apply the tool, iteratively, measuring the things that matter at each step.

Iteration Is the Whole Game

The lean model was never "get it right the first time." It was build-measure-learn. Form a hypothesis. Test it cheaply. Measure the outcome (not the input). Iterate.

Applied to AI adoption, this looks like:

  1. Identify a specific problem with clear success criteria tied to business value.
  2. Apply the tool to that problem in a bounded way.
  3. Measure the output against the success criteria, not the token bill.
  4. Iterate: adjust the approach, the prompt architecture, the workflow integration.
  5. Scale what works. Get rid of what doesn't.
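The five steps can be sketched as a small record that tracks one bounded experiment. Everything here is hypothetical: `Experiment`, its fields, and the `max_iterations` cutoff are illustrative assumptions, not an established tool. The design choice worth noticing is that token cost is recorded but never consulted when deciding whether to scale, iterate, or drop.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """One bounded AI-adoption experiment (hypothetical structure)."""
    problem: str                  # step 1: a specific problem...
    metric: str                   # ...with a success metric tied to value
    target: float                 # success criterion, fixed before the pilot
    lower_is_better: bool = True  # e.g. cycle time (lower) vs. throughput (higher)
    max_iterations: int = 3       # step 2: keep the pilot bounded
    observations: list[float] = field(default_factory=list)
    token_cost: float = 0.0       # tracked for budgeting, ignored by decide()

    def record(self, value: float, cost: float) -> None:
        """Step 3: log the measured outcome and, separately, what it cost."""
        self.observations.append(value)
        self.token_cost += cost

    def _meets_target(self, value: float) -> bool:
        if self.lower_is_better:
            return value <= self.target
        return value >= self.target

    def decide(self) -> str:
        """Steps 4-5: scale, iterate, or drop, based only on the outcome metric."""
        if not self.observations:
            return "measure"
        if self._meets_target(self.observations[-1]):
            return "scale"
        if len(self.observations) >= self.max_iterations:
            return "drop"
        return "iterate"


# Usage: a pilot aimed at median PR cycle time (hours), with a 20-hour target.
pilot = Experiment(
    problem="slow review cycle on one service",
    metric="median PR cycle time (hours)",
    target=20.0,
)
pilot.record(26.0, cost=400.0)
print(pilot.decide())  # iterate
pilot.record(18.5, cost=650.0)
print(pilot.decide())  # scale
```

If the $1,050 token bill had been ten times higher, the decision would be the same; whether that spend is acceptable becomes a separate budgeting question, answered against the value of hitting the target.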

This is not revolutionary. It's how we've always adopted technology successfully. The only thing that changed is the speed at which you can burn money if you skip steps 1 and 3.

The Right Metric

Stop measuring token spend as a proxy for anything. Start measuring what the tokens produced.

Cycle time reduction. Defect rate. Feature throughput. Time to first value. Problems solved that were previously intractable. Revenue protected or generated. If you're in software delivery, DORA's four keys already give you the scaffolding: Are you deploying more frequently? Is lead time shrinking? Is your failure rate stable or declining? Can you recover quickly when something breaks? These metrics existed before AI tools arrived, and they'll outlast the current hype cycle because they measure what matters: the system's capacity to deliver value reliably.
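As a rough illustration of how the four keys fall out of ordinary delivery data, here is a minimal sketch assuming each deployment record carries a commit timestamp, a deploy timestamp, and, if the change caused a failure, incident and restoration timestamps. The `Deployment` record, the `four_keys` helper, and the exact definitions used (median lead time, mean time to restore measured from incident start) are simplifying assumptions, not DORA's official measurement method. Note that nothing in it touches a token bill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # first commit in the change
    deployed_at: datetime                    # when it reached production
    incident_at: Optional[datetime] = None   # set if the change caused a failure
    restored_at: Optional[datetime] = None   # when service was restored


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_keys(deploys: list[Deployment], window_days: int) -> dict:
    """Compute simplified DORA four keys over a window of deployments."""
    if not deploys:
        raise ValueError("need at least one deployment in the window")
    failures = [d for d in deploys if d.incident_at is not None]
    restore_times = [
        hours(d.restored_at - d.incident_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        # Deployments per day across the observation window.
        "deployment_frequency": len(deploys) / window_days,
        # Median hours from first commit to production.
        "lead_time_hours": median(hours(d.deployed_at - d.committed_at) for d in deploys),
        # Share of deployments that caused a failure in production.
        "change_failure_rate": len(failures) / len(deploys),
        # Mean hours from incident start to restoration (None if nothing broke).
        "time_to_restore_hours": mean(restore_times) if restore_times else None,
    }


# Usage: four deployments in one week, one of which caused a two-hour incident.
start = datetime(2026, 1, 5, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(
        start + timedelta(days=1),
        start + timedelta(days=1, hours=10),
        incident_at=start + timedelta(days=1, hours=11),
        restored_at=start + timedelta(days=1, hours=13),
    ),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=6)),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=8)),
]
print(four_keys(history, window_days=7))
```

Tracked week over week, before and after an AI tool is introduced, these four numbers answer the questions above directly; a rising deployment frequency paired with a rising change failure rate is the Cortex pattern showing up in your own data.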

These metrics require you to understand your domain. They require you to know what "value" means for your specific context. They require iteration and refinement. There's no billing dashboard that gives you this for free.

That's the work. The AI doesn't replace it. The AI accelerates it, but only if you've done the hard part first.

The companies pulling back aren't proving that AI doesn't work. They're proving that vanity metrics and unlimited budgets are a bad combination regardless of the technology. The answer isn't less AI. It's better questions, measured against outcomes that matter.