Enterprises Move Beyond Token Counts to Measure AI

PYMNTS reports, citing the Financial Times, that some Amazon engineering teams used the company's internal AI tool MeshClaw to inflate token-usage leaderboards this spring; the Financial Times described the behaviour as "tokenmaxxing" and quoted an employee saying, "Managers are looking at it." PYMNTS also reports similar behaviour at Meta. PYMNTS argues that raw token consumption is a poor proxy for business value and that enterprise finance teams face unpredictable costs as spending shifts to per-call model billing. PYMNTS cites Salesforce as a case study: the outlet reports that Salesforce introduced $2-per-conversation pricing for Agentforce in late 2024 and logged 5,000 Agentforce deals in the first two quarters under that model, of which only 3,000 were paid. PYMNTS describes Salesforce's current unit, the Agentic Work Unit (AWU), as a move to measure discrete tasks completed by agents rather than tokens.
What happened
PYMNTS, citing the Financial Times, reports that internal leaderboards tracking token consumption prompted some employees at Amazon to use the company's internal AI tool MeshClaw to delegate tasks to agents and boost token totals, a practice the Financial Times calls "tokenmaxxing." PYMNTS reports similar incentives at Meta, and quotes an FT-sourced employee: "Managers are looking at it." PYMNTS frames tokens as a poor metric for value and attributes billing and forecasting difficulties to token-based consumption models. PYMNTS reports that Salesforce tested $2-per-conversation pricing for Agentforce in late 2024, logging 5,000 deals in its first two quarters under that model, of which only 3,000 were paid, and that Salesforce currently measures work in Agentic Work Units (AWUs).
Editorial analysis - technical context
Metrics that reward raw model calls or token volume put optimisation pressure on the people being measured. When consumption is the tracked KPI, teams often prioritize activity that raises the metric over downstream outcomes. The pattern also surfaces technical issues such as inefficient prompting, agentic workflow leakage, and chains of tool calls that are harder to audit.
Industry context
For finance and procurement teams, token-based billing converts engineering choices into direct cost drivers. Organizations shifting from fixed-license SaaS to per-call model economics face forecasting gaps and procurement friction, as reported by PYMNTS in its Salesforce example. One observed vendor response is to experiment with unit definitions that better align cost with discrete business outcomes, as illustrated by Salesforce's move to AWUs.
What to watch
Monitor whether more vendors publish alternative billing units (task- or outcome-based metrics), whether industry consortia recommend standard usage units, and whether enterprises adopt internal guardrails to separate valuable agent outcomes from metric-driven noise. For practitioners: track both token consumption and outcome metrics when evaluating agentic systems.
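As a rough illustration of tracking both sides, the sketch below aggregates token consumption alongside a simple outcome flag and reports tokens per completed task for each team. Everything in it is an assumption made for illustration: the AgentRun record, its field names, the summarize helper, and the sample figures are hypothetical and do not come from PYMNTS, Amazon, or Salesforce.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent invocation. All field names are hypothetical."""
    team: str
    tokens_used: int
    task_completed: bool  # True if the run produced an accepted outcome


def summarize(runs):
    """Aggregate token usage and completed tasks per team."""
    summary = {}
    for run in runs:
        stats = summary.setdefault(
            run.team, {"tokens": 0, "runs": 0, "completed": 0}
        )
        stats["tokens"] += run.tokens_used
        stats["runs"] += 1
        if run.task_completed:
            stats["completed"] += 1

    for stats in summary.values():
        # Share of runs that ended in a completed task.
        stats["completion_rate"] = stats["completed"] / stats["runs"]
        # Tokens spent per completed task; None if nothing was completed.
        stats["tokens_per_completed_task"] = (
            stats["tokens"] / stats["completed"] if stats["completed"] else None
        )
    return summary


# Made-up sample data: team_b burns more tokens but completes fewer tasks.
runs = [
    AgentRun("team_a", 12_000, True),
    AgentRun("team_a", 8_000, True),
    AgentRun("team_b", 50_000, False),
    AgentRun("team_b", 30_000, True),
]
report = summarize(runs)
for team, stats in sorted(report.items()):
    print(team, stats)
```

On a token leaderboard alone, team_b would rank first in this made-up data; reporting tokens per completed task next to the raw total makes the gap between activity and outcome visible. What counts as a "completed task" is the hard part in practice and is left here as a boolean the caller supplies.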
Scoring Rationale
The story matters to practitioners who manage cost, procurement, and observability for AI systems. It highlights real enterprise pain with token-based billing and vendor responses, but it is not a frontier technical breakthrough.


