FinOps Evolves to Manage Generative AI Spend

According to SiliconANGLE, FinOps is adapting as the token economics of generative AI reshape enterprise budgets and introduce new cost categories. Speakers at a day two keynote broadcast on theCUBE warned that organisations must account for costs beyond model inference, including database throughput, developer hardware and increased data movement. The article also ties the shift to faster model deployment, citing theCUBE Research data on how often organisations want to release code.
What happened
According to SiliconANGLE, the FinOps discipline is finding its footing as organisations confront the budgetary effects of generative AI and its token-based pricing models. The article reports that token economics are forcing teams to rethink which costs count: model inference, database throughput, developer hardware and additional data costs, such as moving data into services like Snowflake. "You have to get transparency in your token costs," said Hays, whom SiliconANGLE describes as senior vice president and head of engineering excellence and technology strategy execution at Fidelity Investments, "but you have to understand actually how it impacts probably a dozen or more costs around you." The discussion took place during a day two keynote analysis broadcast on theCUBE, per SiliconANGLE. The article also cites a 2025 theCUBE Research finding, reported by Nashawaty, that 24% of organisations want to release code on an hourly basis.
Editorial analysis - technical context
Token-priced inference shifts the unit of consumption from CPU- or GPU-hours to model-specific token counts, which makes fine-grained telemetry more important. Teams typically need to combine model-level billing lines with storage and database throughput metrics, and to map token counts back to the feature pipelines and embedding stores that generated them. In this environment, observability stacks that link each request to its token usage and to the data ingress and egress it triggers become essential.
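To make the request-level correlation concrete, here is a minimal sketch in Python, assuming a hypothetical telemetry record that captures the model, input and output token counts, and bytes of egress for each request. The model names, per-1,000-token prices and per-GB egress rate are invented placeholders, not any vendor's actual pricing, and the record layout is an assumption rather than a standard schema.

```python
from dataclasses import dataclass

# Hypothetical list prices (USD) for illustration only; real rates vary by
# vendor, model and contract.
INPUT_PRICE_PER_1K = {"model-a": 0.0005, "model-b": 0.003}
OUTPUT_PRICE_PER_1K = {"model-a": 0.0015, "model-b": 0.015}
EGRESS_PRICE_PER_GB = 0.09  # assumed flat data-transfer rate; GB taken as 10**9 bytes


@dataclass
class RequestTelemetry:
    """One inference request as captured by a hypothetical observability pipeline."""
    request_id: str
    feature: str        # product feature that issued the request
    model: str
    input_tokens: int
    output_tokens: int
    egress_bytes: int   # data moved out as a side effect of serving the request


def request_cost(t: RequestTelemetry) -> dict:
    """Attribute token cost and data-movement cost to a single request."""
    token_cost = (
        t.input_tokens / 1000 * INPUT_PRICE_PER_1K[t.model]
        + t.output_tokens / 1000 * OUTPUT_PRICE_PER_1K[t.model]
    )
    egress_cost = t.egress_bytes / 10**9 * EGRESS_PRICE_PER_GB
    return {
        "request_id": t.request_id,
        "feature": t.feature,
        "token_cost": token_cost,
        "egress_cost": egress_cost,
        "total_cost": token_cost + egress_cost,
    }


if __name__ == "__main__":
    sample = RequestTelemetry("req-1", "search", "model-a", 2000, 500, 1_000_000_000)
    print(request_cost(sample))
```

Keeping token and egress costs as separate fields, rather than only a total, lets the same record feed both a model-billing view and a data-movement view. In practice further line items, such as database throughput, would be attributed the same way.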
Context and significance
Industry observers note that as model deployment cadence accelerates, cost governance moves beyond cloud bills into cross-functional workflows involving engineering, data, and finance. This creates demand for cost-aware development practices, more granular chargeback or showback mechanisms, and tooling that can report costs per model, per feature, and per customer interaction. These are broader patterns seen across organisations adopting production-scale generative AI.
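Showback of this kind can start as a simple group-and-sum over per-request cost records. The Python sketch below assumes each record already carries an attributed "cost_usd" value plus "model", "feature" and "customer" keys. Those field names and the sample values are hypothetical, chosen only to show the same records rolled up along different dimensions.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def showback(records: Iterable[Mapping], dimension: str) -> dict:
    """Sum attributed cost per distinct value of one dimension.

    `dimension` is a record key such as "model", "feature" or "customer";
    every record is assumed to carry a numeric "cost_usd" field.
    """
    totals: defaultdict = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    return dict(totals)


# Hypothetical per-request records; values are illustrative, not real billing data.
RECORDS = [
    {"model": "model-a", "feature": "search", "customer": "c1", "cost_usd": 0.25},
    {"model": "model-b", "feature": "chat", "customer": "c1", "cost_usd": 0.50},
    {"model": "model-a", "feature": "chat", "customer": "c2", "cost_usd": 0.125},
]

if __name__ == "__main__":
    for dim in ("model", "feature", "customer"):
        print(dim, showback(RECORDS, dim))
```

Because every record holds all three keys, one data set answers per-model, per-feature and per-customer questions. A production system would more likely run this aggregation in a warehouse query than in application code.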
What to watch
Indicators to monitor include vendor billing granularity for token and inference metrics, the emergence of FinOps features tailored to model telemetry in cost-management platforms, uptake of per-request or per-feature cost dashboards, and whether organisations publish internal practices for mapping token costs to business metrics. SiliconANGLE did not include a public statement from Fidelity Investments explaining any changes to its vendor contracts or internal organisation.
Scoring Rationale
This story highlights a notable operational shift for teams managing production AI costs. It matters to practitioners because it identifies concrete telemetry and governance needs, though it does not report a new product or industry-wide mandate.
