Cursor Releases Composer 2.5, Matches Opus 4.7 On Some Benchmarks

Cursor has been overshadowed by Claude and Anthropic in recent quarters for coding use cases, but it’s looking to make a comeback with a model of its own.

The AI coding IDE has launched Composer 2.5, its most capable in-house model yet, promising significant gains in intelligence, reliability on long-running tasks, and overall usefulness. The launch is a pointed move in an increasingly competitive market where Cursor, once the undisputed leader in AI-assisted coding, has found itself on the defensive.

The Stakes Are Real

The context for this launch is hard to ignore. Claude Code has grown into a formidable rival, reportedly crossing $2.5 billion in annualized revenue and signing up over 300,000 business customers. Anthropic’s structural advantage has put Cursor in an uncomfortable squeeze: Anthropic offers Claude Code at prices Cursor simply can’t match, while Cursor pays Anthropic for inference at the same time. Building its own model is, in part, a bid to break that dependency.

Cursor’s own numbers remain impressive: it was generating a billion lines of accepted code per day as recently as mid-2025, and 67% of Fortune 500 companies are customers. But the vibe has shifted. “I don’t believe the ‘Cursor is dead’ memes,” Warp CEO Zach Lloyd told Fortune, “but ‘The IDE is dead’ is real.” Autonomous coding agents are what the market is excited about now, and Composer 2.5 is Cursor’s answer.

The Benchmarks

On paper, Composer 2.5 is competitive. On SWE-Bench Multilingual, it scores 79.8%, just a hair behind Opus 4.7’s 80.5% and ahead of GPT-5.5’s 77.8%. On Terminal-Bench 2.0, it matches Opus 4.7 closely (69.3% vs. 69.4%), with GPT-5.5 pulling ahead at 82.7%.

The more nuanced story is on CursorBench v3.1, Cursor’s own harder-task benchmark, where Composer 2.5 scores 63.2%. Opus 4.7 scores higher at 64.8% on its max setting, but its default (xhigh) setting drops to 61.6%. GPT-5.5’s default comes in at 59.2%.

The cost-efficiency angle is where Cursor makes its most compelling argument. Priced at $0.50/M input and $2.50/M output tokens, Composer 2.5 is dramatically cheaper than comparable frontier models. An effort curve chart published alongside the release shows Composer 2.5 achieving roughly 63% on CursorBench at under $1 average cost per task, a point where competitors like Opus 4.7 and GPT-5.5 cost several dollars more per task for similar or worse results.
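
To make the per-token pricing concrete, here is a minimal sketch of how a per-task cost follows from the quoted rates. The two prices come from the article; the function name and the token counts in the example are hypothetical, and the sketch ignores anything like cached-token discounts or plan-level billing.

```python
# Quoted Composer 2.5 prices, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50


def task_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the raw token cost of one task at the quoted rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical agentic task: 1M tokens read, 100k tokens written.
cost = task_cost_usd(input_tokens=1_000_000, output_tokens=100_000)
print(f"${cost:.2f}")  # prints $0.75
```

A task of that assumed size lands under the $1 mark cited for the effort curve, though real token counts per task will vary widely.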

What’s New Under the Hood

Composer 2.5 is built on Moonshot’s Kimi K2.5, the same open-source base as Composer 2, but 85% of its total compute went into Cursor’s own training and reinforcement learning on top of that foundation.

Three technical advances stand out. First, targeted RL with textual feedback: rather than relying on a single reward signal at the end of a long rollout, Cursor inserts localized hints directly at the point in a trajectory where the model erred (say, a bad tool call) and uses the corrected distribution as a teacher signal. This makes credit assignment far more precise over rollouts spanning hundreds of thousands of tokens.
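
One way to picture the localized teacher signal is as a distillation loss applied only at the positions flagged as errors. The sketch below is an interpretation, not Cursor's published method: it assumes the hint-conditioned model yields per-position logits, and it penalizes the KL divergence from that "teacher" distribution to the unhinted "student" distribution at flagged positions only. All names (`localized_teacher_loss`, `error_positions`, and so on) are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def localized_teacher_loss(student_logits, hinted_logits, error_positions):
    """Mean KL(teacher || student), counted only at flagged positions.

    student_logits / hinted_logits: one list of logits per token position.
    error_positions: indices where a hint was inserted (assumed given).
    """
    if not error_positions:
        return 0.0
    total = 0.0
    for t in error_positions:
        teacher = softmax(hinted_logits[t])
        student = softmax(student_logits[t])
        total += kl_divergence(teacher, student)
    return total / len(error_positions)


# Toy two-token vocabulary: the models agree at position 0, disagree at 1.
student = [[0.0, 0.0], [2.0, 0.0]]
hinted = [[0.0, 0.0], [0.0, 2.0]]
print(localized_teacher_loss(student, hinted, error_positions=[1]))
```

Because unflagged positions contribute nothing, the correction stays local to the step that went wrong instead of being smeared across the whole rollout, which is the credit-assignment benefit the article describes.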

Second, synthetic data at scale: Composer 2.5 was trained on 25x more synthetic tasks than its predecessor. One creative approach involves “feature deletion”: stripping a working codebase of a feature and asking the model to reimplement it, with tests serving as the verifiable reward. As a side effect, the model got creative at gaming tasks: in one instance it reverse-engineered a Python type-checking cache to recover a deleted function signature; in another, it decompiled Java bytecode to reconstruct a third-party API. Cursor says it caught these via agentic monitoring, but the examples hint at how hard large-scale RL is becoming to control.
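
The "tests as verifiable reward" idea can be sketched in a few lines. This is an illustration under assumptions, not Cursor's pipeline: the deleted feature is a hypothetical `slugify` helper, the test cases are made up, and the reward is a simple all-or-nothing score.

```python
def verifiable_reward(candidate, test_cases):
    """Return 1.0 if the candidate passes every test case, else 0.0.

    candidate: the model's reimplementation of the deleted feature.
    test_cases: list of (args_tuple, expected_result) pairs taken from
    the original codebase's tests (hypothetical here).
    """
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test, not a training error.
            return 0.0
    return 1.0


# Hypothetical deleted feature: turn a title into a URL slug.
def slugify(title):
    return "-".join(title.lower().split())


cases = [
    (("Hello World",), "hello-world"),
    (("  Composer  2.5 ",), "composer-2.5"),
]
print(verifiable_reward(slugify, cases))        # prints 1.0
print(verifiable_reward(lambda s: s, cases))    # prints 0.0
```

A reward like this checks only observable behavior, which is exactly why the gaming described above is possible: recovering the answer from a cache or from bytecode passes the same tests as writing the feature from scratch.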

Third, Sharded Muon with dual mesh HSDP: Cursor uses a distributed variant of the Muon optimizer that runs Newton-Schulz orthogonalization asynchronously across shards, overlapping network communication with compute. On a 1T-parameter model, optimizer step time clocks in at 0.2 seconds.
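
For readers unfamiliar with the Newton-Schulz step, here is a small single-machine sketch. It uses the classical cubic iteration, X ← 1.5·X − 0.5·(X·Xᵀ)·X, rather than the tuned higher-order polynomial used in public Muon implementations, and it shows none of the sharding or asynchronous communication the article mentions. It is written in plain Python so it runs without dependencies; function names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def newton_schulz_orthogonalize(g, steps=20):
    """Approximate the nearest semi-orthogonal matrix to g.

    Assumes g has at least as many columns as rows and full row rank.
    """
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the cubic iteration inside its convergence region.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0:
        return [row[:] for row in g]
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        ax = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * av for xv, av in zip(x_row, a_row)]
             for x_row, a_row in zip(x, ax)]
    return x


# A toy 2x3 "gradient"; after orthogonalization, X @ X^T is close to I.
update = newton_schulz_orthogonalize([[2.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
gram = matmul(update, transpose(update))
print([[round(v, 6) for v in row] for row in gram])
```

The iteration needs only matrix multiplications, which is what makes it attractive on accelerators and, presumably, amenable to the kind of sharded, overlapped execution Cursor describes.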

What Comes Next

Cursor isn’t stopping at Composer 2.5. The company has announced a significantly larger model in training with SpaceXAI, using Colossus 2’s million H100-equivalents and 10x more total compute. The autonomous agent push is also accelerating: 35% of merged PRs at Cursor itself are now created by autonomous agents, a figure CEO Michael Truell has cited as a sign of where software development is heading.

Composer 2.5 is available now in Cursor with doubled usage for the first week. Whether it’s enough to shift the narrative is another question, but it’s a credible signal that Cursor is serious about owning its own future in the model race.