GitHub says developers are 55% faster with Copilot. Microsoft reports a 26% productivity boost. But a July 2025 study from METR found experienced developers were actually 19% slower when using AI tools.
Stack Overflow's developer satisfaction with AI dropped from 77% to 72% in just one year. Only 43% of developers say they're confident in AI accuracy.
Who's right? And what does it mean for your workflow?
The answer is more nuanced than any single headline suggests. Here's what the research actually shows.
The Case for Productivity Gains
Let's start with the studies showing AI helps.
GitHub/Microsoft Original Study (February 2023)
The foundational GitHub Copilot study that launched a thousand vendor claims:
- 95 professional developers participated
- Task: Implement an HTTP server in JavaScript
- Result: The group with Copilot completed the task 55.8% faster
- Developers also reported higher satisfaction and reduced frustration
This study established the "AI makes you faster" narrative. But note the conditions: a specific, well-defined task in a controlled environment with a particular tech stack.
Microsoft/Accenture Fortune 100 Study (2024)
Microsoft followed up with a larger enterprise study:
- 5,000 developers across Fortune 100 companies
- 26% productivity increase measured over sustained period
- Framed as: "An 8-hour day produces 10 hours of output"
- Covered more diverse tasks and longer timeframe
This addressed some limitations of the original study—more developers, longer duration, varied work. But it still relied partly on self-reported metrics.
IBM watsonx Code Assistant Study (December 2024)
IBM surveyed 669 developers using its code assistant:
- Significant perceived productivity improvements (self-reported)
- Top use case: understanding code, not just writing it
- Interesting finding: developers maintained "shared sense of authorship" even with AI assistance
- Strongest benefits for code explanation and documentation
JetBrains Developer Survey (October 2024)
JetBrains surveyed 481 programmers about their AI tool usage patterns:
- Most frequent use: repetitive tasks and boilerplate
- Two distinct modes identified:
  - "Acceleration mode" — AI handles mundane work
  - "Exploration mode" — AI helps understand complex unfamiliar code
- Developers selective about when to engage AI assistance
The Case Against (Or At Least, Complications)
Now for the studies that complicate the narrative.
The METR Study (July 2025) — The Bombshell
The METR study (full paper linked in the Sources section below) dropped a statistic that made headlines:
Setup:
- 16 experienced open-source developers
- 246 real issues from repositories averaging 22,000+ stars
- Frontier AI tools: Cursor Pro with Claude 3.5/3.7 Sonnet
- Developers highly experienced with both their codebases AND AI tools
Results:
| Metric | Value |
|---|---|
| Expected speedup (developer prediction) | +24% faster |
| Perceived speedup (what developers thought happened) | +20% faster |
| Actual measured result | 19% slower |
The gap between perception and reality is striking. Developers believed they were getting a 20% speedup while actually experiencing a 19% slowdown.
As Simon Willison noted in his analysis, this wasn't a study of novice AI users—these were experienced developers who knew their tools and their codebases intimately.
TechCrunch's coverage highlighted the key implication: AI coding tools may genuinely help some developers while slowing down others.
Stack Overflow Developer Survey (2024)
The annual Stack Overflow survey revealed erosion in AI sentiment:
- Developer satisfaction with AI: 77% (2023) → 72% (2024)
- Only 43% confident in AI accuracy
- 45% rate AI "bad or very bad" at complex tasks
- Trust declining even as adoption increases
GitClear Code Quality Analysis (2025)
GitClear analyzed code metrics across repositories using AI assistance:
- Copy-paste code: Up 4x
- Refactoring: Dropped from 25% to 10% of code changes
- "Moved" code (a signal of thoughtful refactoring): Plummeted
- Interpretation: More code generated, less code improved
Why Studies Disagree
These aren't bad studies reaching wrong conclusions. They're measuring different things under different conditions.
Variable 1: Task Type
| Study | Task Type | Finding |
|---|---|---|
| GitHub | Specific exercise (HTTP server) | +55% faster |
| METR | Real issues in production repos | -19% slower |
Simple, well-defined tasks with clear success criteria: AI helps. Complex, ambiguous tasks requiring system understanding: AI may hurt.
Variable 2: Developer Experience
The pattern across studies:
- Junior developers: Significant gains. AI provides patterns they haven't learned yet.
- Experienced developers: Smaller or negative gains. Already fast; AI adds overhead.
The GitHub study included a mix of experience levels. The METR study specifically selected experienced developers who deeply knew their codebases. Different populations, different results.
Variable 3: Codebase Familiarity
- New or unfamiliar codebases: AI provides useful context and suggestions
- Deeply familiar codebases: AI adds noise; developer already knows the answer
METR developers had contributed to their repositories for years. They didn't need AI to explain the code to them.
Variable 4: Measurement Method
The perception gap in the METR study is crucial:
- Self-reported productivity: Often inflated
- Objective measurement: Sometimes tells different story
- Developers thought they were 20% faster while being 19% slower
Many positive studies rely heavily on self-reported metrics. This isn't dishonest—perception matters—but it's not the same as measuring actual output.
Variable 5: Time Horizon
Microsoft found it takes approximately 11 weeks to realize full productivity benefits from AI tools. Studies vary dramatically in duration:
- Some measure single sessions
- Others track weeks or months
- Learning curve costs aren't always captured
When AI Coding Tools Work
Based on the research, clear patterns emerge about where AI helps.
Best fit scenarios:
| Use Case | Why It Works |
|---|---|
| Boilerplate generation | Repetitive, pattern-based, low risk (see the sketch after this table) |
| Tests and configs | Well-defined structure, easy to verify |
| Unfamiliar languages/frameworks | AI bridges knowledge gaps |
| Code explanation | Understanding > generation |
| API discovery | Finding methods, libraries, patterns |
| Prototyping | Speed matters more than perfection |
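To make the top rows of that table concrete, here is a minimal sketch of the kind of work assistants handle well: a small, invented `parsePort` config helper plus table-driven tests. The helper, the test cases, and the use of Node's built-in test runner are illustrative assumptions rather than details from any of the studies above; the point is simply that this code is repetitive, pattern-based, and quick to verify by eye.

```typescript
// A hypothetical config helper plus the repetitive, table-driven tests an
// assistant can draft quickly and a reviewer can verify at a glance.
import { test } from "node:test";
import assert from "node:assert/strict";

// Parse a TCP port from an environment-style string, falling back to a default.
export function parsePort(raw: string | undefined, fallback = 3000): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid port: ${raw}`);
  }
  return port;
}

// Boilerplate cases: pattern-based, low risk, easy to eyeball.
const cases: Array<[string | undefined, number]> = [
  [undefined, 3000], // missing -> default
  ["", 3000],        // empty -> default
  ["8080", 8080],    // normal value
  ["65535", 65535],  // upper bound
];

for (const [input, expected] of cases) {
  test(`parsePort(${JSON.stringify(input)}) -> ${expected}`, () => {
    assert.equal(parsePort(input), expected);
  });
}

test("parsePort rejects out-of-range values", () => {
  assert.throws(() => parsePort("70000"), RangeError);
});
```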
Worst fit scenarios:
| Use Case | Why It Struggles |
|---|---|
| Complex debugging | Often makes it worse (Stack Overflow data) |
| Architecture decisions | Lacks full system context |
| Performance optimization | Doesn't understand constraints |
| Security-critical code | 45% failure rate (Veracode) |
| Familiar, well-understood tasks | Overhead exceeds benefit |
The Junior vs. Senior Gap
One of the clearest findings across research: experience level dramatically affects outcomes.
For Junior Developers
AI genuinely bridges skill gaps:
- Provides patterns they haven't learned yet
- Accelerates the learning curve
- Reduces frustration with unfamiliar syntax
- GitHub data shows biggest productivity gains for newer developers
The "leveling up" effect is real. A junior developer with AI assistance can produce code that looks more like senior output—at least on the surface.
For Senior Developers
The calculus is different:
- Already have patterns internalized
- Time reviewing AI output may exceed time saved
- AI suggestions often worse than what they'd write
- But: still valuable for exploration and unfamiliar domains
The METR study specifically showed this: experts on their own codebases were slowed down by AI tools.
The Context Window Problem
A fundamental limitation underlies many AI tool struggles.
What AI Tools See
- Current file
- Maybe a few related files
- Limited history
What Real Engineering Requires
- Understanding of entire system architecture
- Knowledge of business constraints
- Awareness of performance characteristics
- Historical context of why code exists
The METR Insight
The study involved repositories averaging 22,000+ stars—real, complex projects with more than a million lines of code. AI tools couldn't grasp the full context. Suggestions were often technically correct but contextually wrong.
This is where the next generation of tools has opportunity: full project understanding, not just file-level assistance.
How Developers Actually Use These Tools
The JetBrains research revealed practical usage patterns that don't match the "AI writes all my code" narrative.
Common patterns:
- Use AI for first draft, then heavily edit
- Turn off AI for complex logic
- Rely on AI for boilerplate, avoid it for business logic (sketched after this list)
- Context switch between AI-assisted and manual coding
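As a sketch of that "boilerplate from AI, business logic by hand" split, consider a hypothetical order-discount module. The types, validation, and discount rules below are invented for this example; the comments mark the pieces developers in these surveys typically say they accept from a draft versus the pieces they keep hand-written.

```typescript
// Hypothetical order-discount module, annotated to show how the
// "boilerplate from AI, business logic by hand" pattern splits the work.

// Scaffolding like types and input validation is a typical AI-draft candidate:
// repetitive, conventional, and cheap to review.
export interface Order {
  subtotalCents: number;
  customerTier: "standard" | "gold";
  couponCode?: string;
}

export function validateOrder(order: Order): void {
  if (!Number.isInteger(order.subtotalCents) || order.subtotalCents < 0) {
    throw new Error("subtotalCents must be a non-negative integer");
  }
}

// The discount rule encodes business policy the model has no way of knowing
// (tier thresholds, coupon stacking). In this pattern it stays hand-written
// and carefully reviewed rather than accepted from a draft.
export function discountCents(order: Order): number {
  validateOrder(order);
  let discount = 0;
  if (order.customerTier === "gold" && order.subtotalCents >= 10_000) {
    discount += Math.round(order.subtotalCents * 0.05); // 5% loyalty discount
  }
  if (order.couponCode === "WELCOME10") {
    discount += 1_000; // flat $10 welcome coupon
  }
  return Math.min(discount, order.subtotalCents);
}
```

The point isn't that AI can't produce the discount function; it's that the policy inside it is exactly the context the model lacks, so review cost tends to eat any drafting speedup.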
Trust calibration happens over time:
- Initial enthusiasm
- Reality check when AI fails
- Learned intuition about when AI helps vs. hurts
- Selective engagement based on task type
The most productive developers aren't using AI for everything—they're using it strategically.
What This Means Going Forward
The trajectory is clear: AI tools are improving rapidly. Context windows are expanding. Agent capabilities are increasing.
But fundamental limitations remain:
- AI doesn't understand your business domain
- AI doesn't know your performance constraints
- AI can't see the full picture of a complex system
- AI makes confident mistakes
What to watch for in tools:
- Full project understanding (not just files)
- Task completion (not just suggestions)
- Integration with test/deploy pipelines
- Quality metrics (not just speed)
The Bottom Line
AI coding tools help. But it's complicated.
The research shows:
| Factor | Impact on AI Benefit |
|---|---|
| Task complexity | Simple = helps, Complex = may hurt |
| Developer experience | Junior = big gains, Senior = smaller/negative |
| Codebase familiarity | Unfamiliar = helps, Familiar = overhead |
| Measurement method | Self-reported often inflated |
Don't trust vendor benchmarks blindly. A 55% improvement on a specific JavaScript exercise doesn't mean 55% improvement on your production codebase.
Your mileage will literally vary. The same tool can make one developer faster and another slower, depending on task, experience, and context.
Best approach: Experiment with your actual workflow. Measure honestly. Use AI strategically for tasks where research shows it helps, and maintain your own skills for tasks where it doesn't.
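One honest, low-effort way to do that is to keep a simple log of completed tasks and compare medians instead of trusting how fast the work felt; the METR gap between perceived and measured speed is exactly the trap this guards against. The sketch below assumes an invented log format and numbers.

```typescript
// Minimal sketch of "measure honestly": log each completed task with its
// condition and duration, then compare medians rather than relying on memory.
// The entries and format here are invented for illustration.

interface TaskLog {
  task: string;
  condition: "ai" | "no-ai";
  minutes: number;
}

const log: TaskLog[] = [
  { task: "add pagination to /users", condition: "ai", minutes: 95 },
  { task: "fix flaky auth test", condition: "no-ai", minutes: 40 },
  { task: "write CSV export endpoint", condition: "ai", minutes: 60 },
  { task: "refactor billing retries", condition: "no-ai", minutes: 120 },
  // ...append one entry per task over a few weeks
];

// Median is less sensitive to one outlier task than the mean.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

for (const condition of ["ai", "no-ai"] as const) {
  const minutes = log
    .filter((entry) => entry.condition === condition)
    .map((entry) => entry.minutes);
  console.log(`${condition}: n=${minutes.length}, median=${median(minutes)} min`);
}
```

METR randomly assigned comparable issues to each condition; a personal log won't be that rigorous, but even a rough comparison is a better check than perception alone.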
Try AI Coding with Full Project Context
Orbit is built differently. Instead of file-level suggestions, AI agents understand your entire project. Describe what you want to build, and agents handle implementation with full context awareness.
Sources & Further Reading
The METR Study (19% Slower)
- METR Blog: Early 2025 AI Experienced OS Dev Study — Original announcement and methodology
- arXiv: Full Research Paper — Complete study with data
- TechCrunch: AI coding tools may not speed up every developer — Industry coverage
- Simon Willison's Analysis — Expert breakdown of implications
Positive Studies
- GitHub Blog: Quantifying Copilot's Impact on Developer Productivity — The foundational 55% faster study
Additional Context
- Stack Overflow Developer Survey 2024 — AI satisfaction and trust metrics
- JetBrains Developer Ecosystem Survey 2024 — Usage patterns and modes
- GitClear Code Quality Analysis 2025 — Code metrics under AI assistance