Most companies still aren't measuring AI coding tools

AI adoption is soaring, but 82% of organizations still aren't measuring the impact of AI coding tools.
August 20, 2025

While AI adoption is high and tech leaders boast big productivity gains, LeadDev’s AI Impact Report reveals the real challenge is measuring what matters. 

In just a few years, AI coding assistants have gone from curiosity to commonplace. Tools like GitHub Copilot, once experimental, are now embedded in daily workflows across the industry.

According to LeadDev’s AI Impact Report 2025, 98% of respondents are at least exploring the use of AI tools and models, yet the most common challenge, cited by 60% of respondents, was a lack of clear metrics to evaluate their impact.

Meanwhile, industry giants trumpet impressive claims. In June 2025, Google CEO Sundar Pichai cited a “10% boost in engineering capacity” thanks to AI. In the same month, Microsoft said GitHub Copilot now writes 40% of their code, enabling them to launch more products in a year than in the previous three combined.

Yet beneath the headlines and adoption stats, the real question remains – how do companies measure AI’s ability to create meaningful impact?

The biggest challenge for engineering orgs right now

According to LeadDev’s AI Impact Report 2025, only 18% of organizations are currently measuring the impact of AI coding tools at all, and even among them, consensus is elusive. The most common metric, “development time per feature,” is used by just 47%.

Part of the challenge lies in the limitations of current AI tools themselves. Taylor McCaslin, in his blog, points out that contemporary AI tools “lack the ability to assess the broader architecture of the application,” a problem that becomes even more pronounced in microservices environments. 

Laura Tacho, CTO at developer intelligence platform DX, agrees with McCaslin’s sentiment.

“The tools themselves in the ecosystem are maturing so rapidly. It’s difficult to know what to measure because a lot of the times, it’s not even clear what the capabilities of the tool are,” she said.

“When we talk about acceptance rate, a lot of the metrics that were popularized early on were metrics that were meant to show whether or not the tools were fit for purpose, not to measure the impact of them across an organization,” she explained. 

Paul Sliwinski, director of customer success at Aiimi, argues that the “disconnect largely stems from the complexity of AI return on investment (ROI) and a lack of clarity around what success looks like.” 

He adds that many AI initiatives fail because they lack clear business objectives and rely on poor-quality data, making impact hard to measure.

The challenge isn’t adopting AI, it’s figuring out how to measure what really matters before those hidden gains – or costly blind spots – slip through the cracks.

The measurement vacuum

So, where are companies directing their AI investments? According to the survey, 85% of respondents are prioritizing AI applications that support internal engineering tasks such as building internal dashboards, improving testing, or assisting with code generation.

Among these, automated code generation is the most common task offloaded to AI tools, with 48% of respondents using AI for everything from producing boilerplate code to refactoring existing code.

However, measuring the true impact of AI tools is far from straightforward. As Johnson explained in his blog, “most companies get tripped up by measuring lines of code generated, commit frequency, or story point velocity. These metrics were designed for a pre-AI world and they fundamentally miss what AI productivity actually looks like.” 

For example, AI-generated code may work, but it often causes repetition and code bloat by patching local areas instead of improving the overall architecture. This is especially problematic in object-oriented languages – where DRY (don’t repeat yourself) principles are critical – since duplicated code increases maintenance effort, inconsistency risks, and technical debt.
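
To make that duplication problem concrete, here is a minimal, hypothetical Python sketch; the function names and the discount rule are invented for illustration. The “patched” versions show an assistant copying the same clamping-and-rounding rule into two call sites, while the DRY refactor keeps it in one helper so a later change only has to be made once.

```python
# Hypothetical example: an assistant asked to "add discount support"
# patches each call site separately, duplicating the same rule twice.

def checkout_total_patched(items: list[float], discount: float) -> float:
    # Duplicated rule, copy #1: clamp the discount and round to cents.
    discount = min(max(discount, 0.0), 1.0)
    return round(sum(items) * (1 - discount), 2)

def invoice_total_patched(items: list[float], discount: float) -> float:
    # Duplicated rule, copy #2: the same clamping and rounding, pasted verbatim.
    discount = min(max(discount, 0.0), 1.0)
    return round(sum(items) * (1 - discount), 2)

# A DRY refactor keeps the rule in one place, so a later change
# (say, a new rounding policy) is made once instead of hunted down twice.

def apply_discount(total: float, discount: float) -> float:
    """Single source of truth for discount clamping and rounding."""
    discount = min(max(discount, 0.0), 1.0)
    return round(total * (1 - discount), 2)

def checkout_total(items: list[float], discount: float) -> float:
    return apply_discount(sum(items), discount)

def invoice_total(items: list[float], discount: float) -> float:
    return apply_discount(sum(items), discount)

if __name__ == "__main__":
    print(checkout_total([19.99, 5.00], 0.1))  # 22.49
    print(invoice_total([19.99, 5.00], 0.1))   # 22.49
```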

Measuring productivity increases as a result of AI adoption

AI’s widespread adoption has fueled hype – especially in boardrooms – about its potential to significantly boost engineering productivity.

But measuring the impact of AI on engineer productivity isn’t as simple as tallying lines of code generated or time saved. As Johnson points out, “When an AI tool generates a hundred lines of boilerplate in thirty seconds, did productivity increase?” 

The real question, he argues, is what developers do with the time freed up: “Did they spend more time thinking through edge cases? Did they write better tests? Did they have a deeper conversation with a product manager about user experience?” 

DX’s Tacho added that to improve productivity, AI should be deployed to solve specific, measurable issues tied to business or team objectives – not just given to individuals without a clear organizational purpose.

To assess an AI tool’s value, Tacho recommends tracking three dimensions: utilization, impact, and cost. 

She explained that utilization includes adoption metrics such as tool penetration and the number of engineers actively using AI tools. 

Impact combines direct measures like time saved per engineer per week with established performance benchmarks such as the DX Core Four, which remain essential for evaluating software quality and delivery speed. 

Cost, which Tacho suggests assessing later in an organization’s AI maturity, is about confirming return on investment without discouraging adoption early on.

Tacho cautions against over-focusing on a single metric, noting that developer productivity is multidimensional and should include factors like experience, quality, and business impact alongside speed.
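
As a rough illustration of what tracking those three dimensions could look like, the Python sketch below computes simple utilization, impact, and cost figures from hypothetical weekly data. The field names, numbers, and the self-reported time-saved measure are all assumptions made for this example, not the DX Core Four or any specific DX metric.

```python
# Minimal sketch of Tacho's three dimensions (utilization, impact, cost)
# using invented data. Every field name and figure here is hypothetical.

from dataclasses import dataclass

@dataclass
class AIToolSnapshot:
    licensed_engineers: int        # seats purchased
    active_engineers: int          # engineers who used the tool this week
    total_engineers: int           # size of the engineering org
    hours_saved_per_active: float  # self-reported hours saved per active user per week
    weekly_license_cost: float     # total license spend for the week
    loaded_hourly_rate: float      # assumed fully loaded cost of an engineering hour

def summarize(s: AIToolSnapshot) -> dict[str, float]:
    utilization = s.active_engineers / s.total_engineers         # adoption across the org
    penetration = s.active_engineers / s.licensed_engineers      # use of purchased seats
    hours_saved = s.active_engineers * s.hours_saved_per_active  # impact proxy: time reclaimed
    value_of_time = hours_saved * s.loaded_hourly_rate
    roi = (value_of_time - s.weekly_license_cost) / s.weekly_license_cost
    return {
        "utilization": round(utilization, 2),
        "seat_penetration": round(penetration, 2),
        "weekly_hours_saved": round(hours_saved, 1),
        "weekly_roi": round(roi, 2),
    }

if __name__ == "__main__":
    snapshot = AIToolSnapshot(
        licensed_engineers=120,
        active_engineers=90,
        total_engineers=200,
        hours_saved_per_active=3.5,
        weekly_license_cost=2400.0,
        loaded_hourly_rate=95.0,
    )
    print(summarize(snapshot))
```

Even in a toy calculation like this, the ROI figure is only as reliable as the self-reported hours behind it, which is why Tacho pairs time-saved numbers with established delivery and quality benchmarks rather than treating any single figure as the answer.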

Golden thread of value

As AI tools become commonplace, the challenge isn’t adoption itself – it’s ensuring they drive meaningful impact.

Amid this, Sliwinski stresses that AI tools should be treated “as solutions to defined problems, rather than just shiny new accessories.” 

Sliwinski explained that teams should start with a clear guiding question: “If process X is automated or augmented, how much could outcome Y improve?” 

This “golden thread of value” links AI efforts to measurable, business-relevant results and encourages consistent tracking, Sliwinski said.

Successful adoption requires engineering, product leadership, and other teams to “align and identify relevant outcomes together.” Once those outcomes are agreed, teams can set clear expectations, connect them to measurable indicators, and track the results.

As AI is embedded, organizations should assess “what decisions are improving, what are people doing differently with the time they have got back, and how much are core business KPIs improving.”

This tailored, context-specific approach helps teams create reliable metrics even when AI tools are embedded across complex, business-wide processes, Sliwinski adds. 

The companies that will get the most out of AI aren’t the ones chasing flashy adoption stats – they’re the ones willing to measure it with the same discipline they apply to any other strategic investment, and to change course when the data shows they should.