Introducing Claude Opus 4.8

We’re upgrading Claude Opus to a brand new model: Claude Opus 4.8. It builds on Opus 4.7 with enhancements throughout benchmarks, and is a more practical collaborator. It’s obtainable as we speak for a similar value.

Opus 4.8 launches alongside a number of new options. Customers on claude.ai now have management over the quantity of effort Claude places right into a process. Claude Code has a brand new “dynamic workflows” function that enables it to sort out very large-scale issues. And quick mode for Opus 4.8—the place the mannequin can work at 2.5× the pace—is now thrice cheaper than it was for earlier fashions.

Opus 4.8’s capabilities

The desk under exhibits how Opus 4.8 compares to its predecessor and to different fashions on exams of coding, agentic abilities, reasoning, and sensible information work duties. Extra particulars and a a lot wider vary of functionality evaluations are offered within the Claude Opus 4.8 System Card.

Collaborating with Opus 4.8

Early testers have discovered Claude Opus 4.8 to be extra dependable and sharper in its judgement when it’s performing agentic duties. Under are quotes from many of those testers about their expertise collaborating with Opus 4.8:

Claude Opus 4.8 has noticeably higher judgment. In Claude Code, it asks the fitting questions, catches its personal errors, pushes again when a plan isn’t sound, and builds up confidence round complicated, multi-service explorations earlier than making massive adjustments. It’s an amazing mannequin to construct with.

On our Tremendous-Agent benchmark, Claude Opus 4.8 is the one mannequin to finish each case end-to-end, beating prior Opus fashions and GPT-5.5 at parity on price. For agent merchandise in translation, deep analysis, slide-building, and evaluation, it delivers highly effective reliability.

On CursorBench, Claude Opus 4.8 exceeds prior Opus fashions throughout each effort stage. Instrument calling is meaningfully extra environment friendly, utilizing fewer steps for a similar intelligence, and it carries end-to-end duties by means of.

Claude Opus 4.8 delivers the very best rating recorded on our Authorized Agent Benchmark, and is the primary mannequin to interrupt 10% general on the all-pass normal. For substantive authorized work, that’s the form of accuracy carry that interprets instantly into how a lot actual legal professional work our prospects can hand off with confidence.

Claude Opus 4.8 seems like a serious quality-of-life replace over Opus 4.7: quicker, simpler to collaborate with, and higher at carrying context and elegance route throughout an extended session. Opus 4.8 is the mannequin I stored trusting for work the place voice, style, and technical execution all need to occur side-by-side.

Claude Opus 4.8 is the strongest computer-use and browser-agent mannequin we’ve examined, scoring 84% on On-line-Mind2Web, which is a significant bounce over each Opus 4.7 and GPT-5.5. It stays reflective and on-task in the best way our prospects’ agent workloads must be dependable end-to-end.

Claude Opus 4.8 makes use of instruments cleanly and follows directions with the consistency our autonomous engineering workloads must maintain working unattended. It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling points we noticed with Opus 4.7. This launch from Anthropic interprets instantly into quicker functionality features for engineers constructing on Devin.

On our long-running evals, Claude Opus 4.8’s evaluation was constantly greater high quality than prior Opus fashions. It completed quicker and produced richer, extra data dense outputs. Total, a noticeably higher sign to noise ratio. The most important differentiator was Opus 4.8’s tendency to proactively flag points with the inputs and outputs of an evaluation, one thing different fashions routinely missed and left to the customers to catch.

Throughout CoCounsel Authorized, Claude Opus 4.8 delivered significant enhancements in consistency and reasoning high quality in comparison with prior Opus fashions. For the high-stakes skilled workflows our prospects rely on, that reliability issues. As we construct fiduciary-grade AI techniques for authorized and tax professionals, advances like these assist elevate the usual for trusted AI efficiency in real-world workflows.

Claude Opus 4.8 units a brand new bar for enterprise AI. In Genie, Databricks’ AI agent for knowledge and information work, the brand new Opus mannequin unlocks a step change in agentic reasoning, tackling deeper, multistep questions quicker than any prior Opus. Its multimodal power additionally lets Genie cause instantly over PDFs, diagrams, and different unstructured content material at 61% cheaper token price than Opus 4.7.

For financial-document workflows in Hebbia’s orchestrator, Claude Opus 4.8 delivers the identical sturdy high quality as Opus 4.7 with noticeably higher quotation precision and extra token effectivity on retrieval, which works extremely properly for the sorts of dense filings our prospects run day-after-day.

Some of the distinguished enhancements in Opus 4.8 is its honesty. We prepare all our fashions to be sincere—for example, to keep away from making claims that they will’t help. However a common downside with AI fashions is that they often bounce to conclusions, confidently claiming to have made progress of their work regardless of the proof being skinny. Early testers report that Opus 4.8 is extra more likely to flag uncertainties about its work and fewer more likely to make unsupported claims. That is borne out in our evaluations, which present that Opus 4.8 is round 4 occasions much less seemingly than its predecessor to permit flaws in code it has written to go unremarked.

As at all times, we ran an in depth alignment evaluation on the mannequin earlier than launch. By way of optimistic traits, our Alignment workforce concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting consumer autonomy and appearing within the consumer’s finest curiosity.” The evaluation additionally confirmed Opus 4.8 to have charges of misaligned conduct (resembling deception or cooperation with misuse) which are considerably decrease than Opus 4.7, and much like our best-aligned mannequin, Claude Mythos Preview. The total alignment evaluation, accompanied by a set of pre-deployment security exams, is reported within the Claude Opus 4.8 System Card.

Additionally launching as we speak

Along with Claude Opus 4.8, we’re making the next updates:

Dynamic workflows. This new function, obtainable in analysis preview, permits Claude to tackle even greater duties in Claude Code. Claude can plan the work after which run a whole bunch of parallel subagents in a single session (and with Opus 4.8, the brokers can run for even longer). It then verifies its outputs earlier than reporting again to the consumer. For instance, Claude Code with Opus 4.8 can now perform codebase-scale migrations throughout a whole bunch of hundreds of strains of code from kickoff to merge, with the present take a look at suite as its bar. You’ll be able to learn extra about dynamic workflows—obtainable in Claude Code for Enterprise, Staff, and Max plans—in this post.
Effort management in claude.ai and Cowork. A brand new management alongside the mannequin selector lets customers select how a lot effort Claude places right into a response. On greater effort settings, Claude will suppose extra regularly and extra deeply to offer higher responses. On decrease effort settings, Claude will reply quicker and deplete a consumer’s fee limits extra slowly. Customers now have this selection—the hassle management is accessible on all plans.
The Messages API now accepts system entries contained in the messages array. Builders can replace Claude’s directions mid-task with out breaking the immediate cache or routing the replace by means of a consumer flip. This can be utilized in a given harness to replace permissions, token budgets, or setting context as an agent runs.

A observe on effort

Opus 4.8 defaults to excessive effort, which we decide to be the very best general stability of high quality and consumer expertise. On coding duties, this effort stage spends the same variety of tokens as Opus 4.7’s default, however with higher efficiency. Customers can select “further” (“xhigh” in Claude Code) or “max,” and the mannequin will spend extra tokens to get higher outcomes; we suggest utilizing “further” for troublesome duties and long-running asynchronous workflows. We’ve got elevated fee limits in Claude Code to accommodate the upper token utilization of upper effort ranges; customers can choose whichever is sensible for his or her specific mission.

What’s subsequent?

Customers will discover Opus 4.8 to be a modest however tangible enchancment on its predecessor. There’s nonetheless extra to be executed: we’re engaged on creating and releasing fashions that present most of the identical capabilities as Opus at a decrease price.

Not solely that, however we plan to launch a brand new class of mannequin with even greater intelligence than Opus. As a part of Project Glasswing, a small variety of organizations are presently utilizing Claude Mythos Preview for cybersecurity work. Fashions of this functionality stage require stronger cyber safeguards earlier than they are often typically launched. We’re making swift progress on creating these safeguards and count on to have the ability to convey Mythos-class fashions to all our prospects within the coming weeks.

Availability

Claude Opus 4.8 is accessible all over the place as we speak. Pricing for normal utilization is unchanged from Opus 4.7: $5 per million enter tokens and $25 per million output tokens. Pricing for quick mode is $10 per million enter tokens and $50 per million output tokens. Builders can use claude-opus-4-8 through the Claude API.