Harness Survey Finds Gaps Measuring AI Coding Productivity

DevOps.com reports on a Harness survey of 700 developers and engineering leaders that finds 89% have seen productivity improvements on tracked metrics after adopting AI tools, while 81% say time spent reviewing code has increased. The survey also finds that just under a third of the workday is consumed by AI-related tasks that existing metrics do not track. Per the report, 94% of respondents say technical debt, validation time, and developer burnout are not captured by current productivity metrics; specific untracked activities include reviewing AI-generated code (53%), fixing subtle AI-introduced bugs (52%), explaining AI code to teammates (48%), and context switching (45%). DevOps.com cites comments from Trevor Stuart, general manager and senior vice president at Harness, urging organizations to revisit how they measure engineering productivity in the age of AI.
What happened
DevOps.com reports on a Harness survey of 700 developers and engineering leaders that finds 89% have seen improvements in the productivity metrics their organizations track after adopting AI tools, and 81% report increased time spent reviewing code. The report states that just under a third of the workday is now consumed by AI-related tasks that existing metrics do not track. Per DevOps.com, 94% of respondents said technical debt, validation time, and developer burnout are not tracked by existing productivity metrics. The survey lists specific untracked activities: time spent reviewing AI-generated code (53%), fixing subtle bugs introduced by AI (52%), explaining AI-generated code to teammates (48%), and context switching between tools (45%). The report also notes that only 38% of respondents said their organizations track time spent reviewing AI-generated code.
Technical details
Editorial analysis - technical context: In practice, introducing generative AI into development workflows creates two instrumentation problems. First, observable outputs such as lines of code or tokens consumed do not capture downstream validation, debugging, and review work. Second, the cognitive overhead of switching between tools and interpreting AI output is a hidden cost that standard telemetry and CI metrics often miss. Both issues complicate end-to-end productivity measurement and cost accounting for model usage.
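To make that gap concrete, here is a minimal sketch, assuming a team chooses to record per-change data itself, of pairing the output metrics that are usually tracked (AI-generated lines, tokens) with the downstream review and validation work the survey says usually goes unmeasured. All class, field, and function names are hypothetical illustrations, not anything prescribed by Harness or DevOps.com.

```python
# Hypothetical sketch: record tracked outputs and usually-untracked costs
# side by side for each merged change, so review/validation time can be
# reported next to lines and tokens instead of staying invisible.
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeRecord:
    """One merged change: tracked outputs plus usually-untracked costs."""
    change_id: str
    ai_generated_lines: int        # commonly tracked output metric
    tokens_consumed: int           # commonly tracked output metric
    review_minutes: float = 0.0    # often-untracked downstream cost
    validation_minutes: float = 0.0
    bug_fix_minutes: float = 0.0   # fixing subtle AI-introduced bugs

    @property
    def hidden_minutes(self) -> float:
        return self.review_minutes + self.validation_minutes + self.bug_fix_minutes


def hidden_cost_per_100_lines(changes: List[ChangeRecord]) -> float:
    """Minutes of review/validation/fix work per 100 AI-generated lines."""
    lines = sum(c.ai_generated_lines for c in changes) or 1
    return 100.0 * sum(c.hidden_minutes for c in changes) / lines


if __name__ == "__main__":
    sample = [
        ChangeRecord("PR-101", 220, 18_000, review_minutes=45, validation_minutes=30),
        ChangeRecord("PR-102", 80, 6_500, review_minutes=20, bug_fix_minutes=55),
    ]
    print(f"hidden minutes per 100 AI-generated lines: {hidden_cost_per_100_lines(sample):.1f}")
```

The specific ratio is only one possible view; the point is that the output metric and the hidden work have to land in the same dataset before they can be correlated with ship rates or model spend.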
Context and significance
Industry context: The survey highlights a broader industry pattern in which early AI adoption boosts certain throughput metrics while creating new, uninstrumented workstreams. DevOps.com frames the so-called "token-maxxing" measurement approach as potentially incentive-distorting. For engineering leaders and platform teams, that pattern raises questions about how to correlate model usage, review time, and production ship rates to obtain a truthful productivity signal.
What to watch
For practitioners: watch whether organizations expand telemetry to include review and validation time, adopt standardized tagging for AI-generated artifacts, and track production ship rates alongside model consumption. Also monitor whether vendor and internal tooling evolves to expose review/validation latency and provenance metadata that can be mapped back to productivity metrics; one possible tagging approach is sketched below.
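As one illustration of the tagging idea, the following sketch, assuming a git-based workflow, reads commit trailers used as provenance metadata for AI-assisted changes. The trailer keys ("AI-Assisted", "AI-Model") are invented conventions for this example, not an existing standard and not something the survey prescribes.

```python
# Hypothetical sketch: use git commit trailers as provenance metadata for
# AI-assisted changes, so they can later be joined with review/validation
# telemetry. The trailer keys are invented for illustration.
import subprocess
from typing import Dict, List


def commit_trailers(rev: str = "HEAD") -> Dict[str, str]:
    """Return the trailers of a single commit as a key -> value mapping."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%(trailers)", rev],
        capture_output=True, text=True, check=True,
    ).stdout
    trailers: Dict[str, str] = {}
    for line in out.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            trailers[key.strip()] = value.strip()
    return trailers


def ai_assisted(revs: List[str]) -> List[str]:
    """Keep only revisions whose trailers mark them as AI-assisted."""
    return [r for r in revs
            if commit_trailers(r).get("AI-Assisted", "").lower() == "true"]
```

A commit carrying trailers such as "AI-Assisted: true" and "AI-Model: <name>" could then be joined against review-time data from the code-review system, which is the kind of mapping the items above point toward.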
Scoring Rationale
The survey documents measurable gaps practitioners face when instrumenting AI-assisted development. That is important for engineering and platform teams, but it is not a frontier research or infrastructure shock. The findings matter for tooling and metrics decisions across teams.