Delivering enterprise-quality code with AI agents

Bloat varies dramatically by model. Sonar’s LLM Leaderboard runs each frontier model through 4,400+ Java tasks and analyzes the code generated. To complete the benchmark, GPT-5.4 High generated 1,159,000 lines of code at an 81.05% pass rate, while Claude Opus 4.7 Thinking generated only 336,000 lines of code to reach a higher 82.52% pass rate. Different models generate dramatically different code to achieve similar outcomes.
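To make the gap concrete, here is a minimal sketch that works through the arithmetic on the two figures quoted above. The dictionary layout and the function names are illustrative assumptions, not part of Sonar's leaderboard or any published tooling.

```python
# Figures quoted in the paragraph above; the data structure is illustrative only.
results = {
    "GPT-5.4 High": {"lines": 1_159_000, "pass_rate": 81.05},
    "Claude Opus 4.7 Thinking": {"lines": 336_000, "pass_rate": 82.52},
}


def volume_ratio(verbose: str, concise: str) -> float:
    """How many times more lines `verbose` generated than `concise`."""
    return results[verbose]["lines"] / results[concise]["lines"]


def pass_rate_gap(a: str, b: str) -> float:
    """Pass rate of `a` minus pass rate of `b`, in percentage points."""
    return results[a]["pass_rate"] - results[b]["pass_rate"]


ratio = volume_ratio("GPT-5.4 High", "Claude Opus 4.7 Thinking")
gap = pass_rate_gap("Claude Opus 4.7 Thinking", "GPT-5.4 High")
print(f"{ratio:.2f}x the code, {gap:.2f} points lower pass rate")
```

On these numbers, the more verbose model wrote roughly 3.4 times as much code while finishing about 1.5 percentage points behind.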

Bloat is not just messy. Carnegie Mellon researchers studied 807 open-source projects that had adopted Cursor, matched against 1,380 controls and measured with SonarQube. A short-term velocity gain disappeared by month three, while static analysis warnings rose 30% and code complexity rose 41%, and both increases persisted. The harder the codebase became to change and the more bugs it contained, the more velocity was dragged down. Any experienced developer knows how this goes: quality problems compound until the code feels impossible to change and the only option is the dreaded rewrite.

Three forces produce bloat once a model is in use:

Agents don’t feel the maintenance burden. Armin Ronacher, the creator of Flask, made the point on the Pragmatic Engineer podcast in late April. Humans feel the cost of bad code over time, and as Ronacher put it, “if the pain gets too big, you as a human are incentivized to fix the cause of your pain”, so we refactor. Agents don’t. They obliviously extend bad structure indefinitely. A senior engineer’s job is to say no to unnecessary abstraction. The agent has no equivalent reflex.
Training rewards apparent completeness. Pretraining corpora are full of explanatory material (Stack Overflow answers, tutorials, README snippets) that is deliberately self-contained and verbose. Post-training compounds the effect: human raters prefer outputs that look thorough, so models learn that “comprehensive” reads as better. When unsure which edge case matters, the safe move is to handle all of them. Each guard is locally defensible. The aggregate is bloat.
Iterative generation has no deletion pressure. Agents add but rarely delete. Removing dead code doesn’t make any test go green, so old functions accumulate alongside their replacements. SlopCodeBench, a March 2026 benchmark across 11 coding models, found rising structural complexity in 80% of trajectories and rising verbosity in 89.8%. Agents keep patching bad code, treating every task as if it’s their last.
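The third force is the easiest to check mechanically. As a rough sketch, the snippet below uses Python's standard `ast` module to list functions that are defined in a file but never referenced in it, which is the pattern left behind when a replacement is added and the original is not deleted. The helper name `unreferenced_functions` and the sample source are hypothetical, and the check is deliberately naive: it looks at a single file and ignores dynamic dispatch, exports, and framework entry points, so treat its output as candidates for review and not as a verdict.

```python
import ast


def unreferenced_functions(source: str) -> list[str]:
    """Return names of functions defined in `source` but never referenced in it."""
    tree = ast.parse(source)

    # Every function or async function defined anywhere in the file.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

    # Any bare name or attribute access counts as a reference.
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)

    return sorted(defined - referenced)


# Hypothetical file in which an agent added parse_v2 without removing parse_v1.
SAMPLE = '''
def parse_v1(text):
    return text.split(",")

def parse_v2(text):
    return [part.strip() for part in text.split(",")]

result = parse_v2("a, b")
'''

print(unreferenced_functions(SAMPLE))  # ['parse_v1']
```

Running a check like this after each agent iteration supplies some of the deletion pressure that a green test suite never will.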

AC/DC: the loop that compensates

What closes the gap is a loop around every iteration of agent work. The agent does what it’s good at, producing code, and our job is to wrap that with three steps the agent cannot reliably do on its own. At Sonar we call this the Agent Centric Development Cycle, or AC/DC: guide, verify, solve.