Kaggle is making AI benchmark creation easy

As AI models evolve from simple chatbots into reasoning agents that write code, use tools and solve complex problems, traditional benchmarks are no longer enough. The community needs dynamic, rigorous evaluations built by the people who use these models in the real world.

That’s why we launched Kaggle Benchmarks. Since then, the global AI community has created more than 10,000 evaluation tasks, producing the trustworthy, transparent public leaderboards that help labs measure and accelerate AI progress.

Today, we’re taking the next step by launching local development for Kaggle Benchmarks.

Use Kaggle Benchmarks from your local development environment

Until now, creating evaluation tasks meant working entirely in Kaggle’s web-based notebook editor instead of the stack developers prefer to build with.

Our new update enables developers to create, validate, push, run and download tasks directly from their local development environments, such as Antigravity, VS Code and Cursor, or through coding agents. This update is designed to meet developers where they work, making the journey from idea to evaluation faster and more intuitive.

Build evaluation tasks in natural language with AI coding agents

Native improvement additionally unlocks a robust new workflow: utilizing AI coding brokers to write down benchmark duties by the write-kaggle-benchmarks skill. This talent includes a set of structured directions that teaches a coding agent how one can construct duties utilizing the kaggle-benchmarks SDK and the Kaggle CLI.

To add this skill to your agent, simply ask your agent to:

Once installed, you can describe an evaluation in plain language and get a working task on Kaggle. For example, you can tell your agent:

These powerful capabilities are driven by the new commands that we’ve built for Benchmarks in the Kaggle CLI.