Donating our open-source alignment software

In October 2025, we launched Petri, an open-source toolbox of alignment exams that may be utilized to any massive language mannequin. Petri, which was developed as a part of our Anthropic Fellows program, can be utilized to quickly and simply check AI fashions for regarding tendencies like deception, sycophancy, and cooperation with dangerous requests. It’s a part of our efforts to develop alignment instruments which might be open and helpful for the entire AI improvement neighborhood.

Petri has been a part of our alignment evaluation for each Claude mannequin since Claude Sonnet 4.5. It compares how the brand new mannequin behaves throughout a spread of alignment-relevant eventualities which might be simulated by a separate “auditor” mannequin. An extra “decide” mannequin then scores the ensuing transcripts for misaligned behaviors.

We’ve been happy to see Petri being utilized by exterior organizations: for instance, the UK’s AI Safety Institute (AISI) made it a major part of how they consider fashions for his or her propensity to sabotage AI analysis.

We’re now updating Petri to its third model. Listed here are a few of the greatest modifications:

Adaptability. Petri 3.0 includes main architectural modifications that permit customers to adapt it to extra makes use of, particularly by splitting the auditor mannequin and the goal mannequin into separate elements that may be tweaked individually;
Realism. Even supposing alignment researchers attempt to make exams seem lifelike, a mannequin can usually deduce from numerous artificialities within the setup that it’s really a part of a check. And if the mannequin is conscious it’s being evaluated, the researcher is now not capable of see how the mannequin behaves usually. An add-on to Petri, which we’re calling “Dish,” makes the setup much more lifelike, for instance by working the exams utilizing the mannequin’s actual system immediate and the actual “scaffold” (the software program that wraps across the mannequin to assist it meet its targets) that will be utilized in real mannequin deployments;
Depth. We’ve now built-in Petri with our different open-source alignment software, Bloom, which may carry out rather more in-depth assessments of particular chosen behaviors (compared to Petri’s wider-ranging strategy).

We’re additionally giving Petri a brand new house. We’ve got handed over its improvement to Meridian Labs, an AI analysis nonprofit. This transfer—just like after we donated the Mannequin Context Protocol (MCP) to the Linux Basis—will assist be certain that Petri stays impartial of any AI lab, in order that its outcomes shall be seen as impartial and credible by these throughout the trade and past.

As a part of Meridian Labs, Petri joins different instruments like Inspect and Scout, constructing a know-how stack that’s open to labs, impartial researchers, and governments alike, at a time when dependable exams of AI mannequin habits matter greater than ever.

You may learn extra about Petri 3.0 on the Meridian Labs blog.

Directions to put in and use Petri might be discovered on the Petri website.