Radar Trends to Watch: May 2026


The most important tension in this issue is between two companies making different choices about how to handle AI with frontier security capabilities. Anthropic restricted Claude Mythos to a small corporate cohort via Project Glasswing. OpenAI released GPT-5.5 to general availability, and some are calling it “Mythos-like hacking, open to all.” The AI Security Institute’s evaluation confirms the capability is real and consequential. How do you manage risk when the time between discovery of a vulnerability and exploitation collapses to zero?

Another important theme is that, in the words of The Sequence, “AI is becoming operational.” It’s no longer about LLMs that can play games with words. It’s about tools that can automate processes across an enterprise: agents, of course, but more specifically agents that can be shared by teams to provide a consistent set of tools that can be used by groups.

AI Models

The open-weight model market is reshaping the economics of AI. This cycle brought at least 10 significant model releases or updates across open and closed providers, with pricing pressure coming from multiple directions. DeepSeek now performs within striking distance of Claude Opus 4.7 on coding benchmarks at a radically lower price; Alibaba, Google, Z.ai, and Moonshot all released capable open models this cycle. The Stanford AI Index documents this at scale. For organizations building on AI, the question is no longer whether open-weight alternatives are viable but which trade-offs they’re willing to make on cost, portability, and support.

  • Google has published a list of 1,302 real-world use cases for generative AI. It’s very long and probably not worth reading on your own. However, you might want to point your agent at it.
  • OpenAI has announced GPT Images 2, its flagship model for generating images. The initial reaction is that it’s slightly better than Google’s Nano Banana. What distinguishes Images 2 is that it “thinks” before generating the image.
  • Anthropic used Claude to work on some problems in alignment research. Claude outperformed the humans at lower cost. The problems were, admittedly, cherry-picked to be easily scoreable. But the experiment also demonstrated that a less capable model can supervise a stronger model.
  • Moonshot Labs has released Kimi K2.6, the latest in its series of open models. It also open sourced the Kimi Vendor Verifier, a tool that tests the accuracy of vendors selling inference using Kimi.
  • Alibaba has released Qwen3.6-35B-A3B, the latest model in its Qwen series. It’s a mixture-of-experts model with 3B active parameters. Simon Willison reports that it draws great flamingos, if you consider that relevant.
  • Anthropic has released Claude Opus 4.7. The model is positioned as an intermediate step between Opus 4.6 and Claude Mythos Preview. Anthropic claims that 4.7 is better at multimodal work, including vision, instruction following, and memory use. Its new tokenizer increases the number of tokens that Claude uses. Because billing is based on tokens, that’s effectively a price increase. Simon Willison has built a tool to compare the token usage of different models.
  • Google has announced Gemini 3.1 Flash TTS, a text-to-speech model that offers extraordinary control over the voices: accents, style, expression, and more.
  • Stanford’s 2026 AI Index Report is out, with over 400 pages of data and analysis about the state of AI.
  • Meta’s refactored AI lab has released its first model, Muse Spark. It’s a multimodal model that has been designed for integration with Meta’s products. There will eventually be a Thinking Mode for orchestrating agents.
  • DeepSeek has released a preview version of DeepSeek-V4, its latest open-weight model. It’s a large model (over 1T parameters) with performance very close to the frontier models, but (as Simon Willison points out) running it is very inexpensive.
  • OpenAI released GPT-5.5, which some are calling “Mythos-like hacking, open to all.” In addition to being its “smartest and most intuitive” model yet, OpenAI claims that it reduces token counts, thereby reducing cost. Other sources report that, while it scores highly on benchmarks, GPT-5.5 is markedly more prone to hallucinate and provide incorrect answers.
  • Z.ai’s GLM-5.1 is a new version of the open source GLM-5 model that has been optimized to perform well on long-running tasks.
  • Google has released Gemma 4, a new version of its family of open source models. The family includes a 31B version and a mixture-of-experts version with 26B parameters, 4B active. These are all reasoning models designed for agentic workflows. One model, Gemma 4 E4B, can run on iPhone and Android.
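The tokenizer note on Claude Opus 4.7 has a cost implication that’s easy to work through. Here’s a minimal sketch; every number in it is hypothetical, chosen only to illustrate the arithmetic, and none comes from Anthropic’s actual price list:

```python
# Hypothetical illustration: how a tokenizer change alters effective price.
# All numbers are made up for this example; real prices and counts differ.

def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million

PRICE_PER_M = 15.0   # hypothetical $/1M tokens, unchanged between versions
old_tokens = 1_000   # tokens the old tokenizer produced for some prompt
new_tokens = 1_150   # same prompt; new tokenizer emits 15% more tokens

old_cost = cost_usd(old_tokens, PRICE_PER_M)
new_cost = cost_usd(new_tokens, PRICE_PER_M)

# Even with the per-token price unchanged, the effective price rose 15%.
increase = (new_cost - old_cost) / old_cost
print(f"effective price increase: {increase:.0%}")
```

This is why a token-usage comparison tool like Willison’s matters: the sticker price per token can stay flat while the bill for the same work goes up.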

Software Development

Anthropic has clearly been winning the announcement race. Whether it’s also winning on performance is a different question. Claude Code was a favorite among developers until its performance slipped. Many switched to the newly released Cursor 3, which puts an agentic interface front and center while relegating the IDE to the background. Anthropic’s public postmortem on Claude Code’s behavior regression is worth reading both for its specific findings and as a model for how AI providers should communicate quality issues to developers. And Cursor’s transformation from an IDE into an agent is a pattern we expect to see repeated across the industry.

  • OpenAI has announced “workspace agents.” Workspace agents can be shared across a team, whereas the agents we have so far are tied to individual productivity. They enable a team to collaborate on building shared tools to automate workflows.
  • Microsoft has announced two new tools, Critique and Council, that use Claude and GPT together to solve research problems. Their benchmark results show that the combination works better than either model used on its own.
  • Stash is an open source memory layer that agent developers can use to connect their agents to models. We’re beginning to see an agentic stack that’s composed of interchangeable modules.
  • Developers have been complaining about a regression in Claude Code’s behavior over the past few months. Anthropic has issued a response explaining what happened and how they’re fixing it.
  • Glif is an agent that tries to unify all the LLMs and tools at your disposal. You don’t have to figure out which model or tool is best for each task; it makes the decision for you and gets the task done.
  • OpenAI has decoupled its agent harness from compute and storage, enabling durable long-running agents. The harness is now open source and can be customized via the Agents SDK.
  • Anthropic has introduced Claude Code routines. A routine is a package that includes a prompt, a repository, and connectors that will run automatically on Anthropic’s infrastructure, either on a schedule or when triggered.
  • Anthropic also announced Claude Managed Agents, a prebuilt harness for creating agents that run on Anthropic’s infrastructure. The harness provides much of the infrastructure that an agent needs (memory management, etc.) but can be configured for the user’s tasks. Anthropic’s goal appears to be becoming the AWS of agentic AI: a service provider for tool builders.
  • Interoperability between tools, models, and plug-ins is allowing a new programming stack to develop: an orchestration layer, an execution layer, and a review layer.
  • Amazon has launched an agent registry service as part of AWS Bedrock AgentCore. Bedrock AgentCore is a set of services that make it easy to build and deploy agents on AWS. The registry gives developers a way to discover third-party agents that might be useful to their work.
  • Bryan Cantrill’s essay on laziness is a must-read. AI isn’t lazy, and that’s a problem. When work costs nothing, there’s no need to think about future workers. Laziness is a virtue that we need to preserve.
  • Anthropic has announced Claude Design, a new tool designed to help designers. It competes directly with Figma and Canva. It’s currently in “research preview.”
  • Perplexity has launched Personal Computer, a local AI agent that runs on a dedicated Mac mini (Windows to come) and has persistent access to your files, local apps, inbox, and the web.
  • Anthropic has launched a Claude plug-in for Microsoft Word, targeting the legal market. Automated edits appear as tracked changes.
  • LiteParse is a command-line tool that extracts text from PDF files. If you’ve never needed to do that, you’ve lived a blessed life. Simon Willison has built a web-based version that runs LiteParse in the browser.
  • Luke Wroblewski has said that designers should code; they need to understand their medium. But around 2014, heavyweight frameworks like React and Angular got in the way. Coding agents are now “collapsing the gap between designing and building.”
  • Cursor 3, the latest release of Cursor, relegates its IDE to the background. The main screen is designed for orchestrating agents. You can fall back to the IDE for editing code if you need to.
  • In the first quarter of 2026, Apple’s App Store saw a huge (84%) increase in the number of new apps, compared to the first quarter of 2025. The cause is probably the ease of using AI to create new apps. Apple also appears to be limiting the use of “vibe coding” to create new apps, and has removed several vibe-coded apps from the App Store.
  • Anthropic accidentally leaked the source code for Claude Code, prompting waves of commentary. Two of the most interesting are Shlok Khemani’s tour of what he found interesting in the source and Gergely Orosz’s discussion of the legal implications.
  • “The Hidden Technical Debt of Agentic Engineering” argues that, as with machine learning, agents are relatively small components of larger software systems, and that technical debt accumulates in all the supporting modules.
  • Chat is rarely the best interface for working with AI. Ethan Mollick writes that the current generation of AI models and agents are capable of creating task-specific interfaces on the fly.
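The three-layer stack mentioned above (orchestration, execution, review) can be sketched as plain interfaces. This is a toy illustration of the layering idea, not any vendor’s actual API; every class and function name here is invented:

```python
# Toy sketch of a three-layer agentic stack: an orchestrator routes tasks,
# interchangeable executors run them, and a reviewer gates the results.
# All names are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    payload: str

class Orchestrator:
    """Orchestration layer: decides which executor handles each task."""
    def __init__(self, executors: dict[str, Callable[[Task], str]]):
        self.executors = executors

    def run(self, task: Task, review: Callable[[str], bool]) -> str:
        result = self.executors[task.name](task)   # execution layer
        if not review(result):                     # review layer
            raise ValueError(f"review rejected result for {task.name!r}")
        return result

# Execution layer: any callable with this signature is interchangeable.
def summarize(task: Task) -> str:
    return task.payload[:20]

# Review layer: a gate that inspects output before it's accepted.
def non_empty(result: str) -> bool:
    return bool(result.strip())

stack = Orchestrator({"summarize": summarize})
print(stack.run(Task("summarize", "Agent stacks are becoming modular."), non_empty))
```

The point of the layering is exactly what the bullet suggests: any layer can be swapped out (a different executor, a stricter reviewer) without touching the others.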

Security

Security has spent a lot of time in the news. Two core tools for secure private networking, Tor and Signal, have been attacked. In both cases, the attack didn’t involve the software or protocols themselves. These attacks teach us that secure systems are often jeopardized by the software that surrounds them. We’ve also seen that ransomware gangs are using postquantum encryption, and that quantum computers are likely to break traditional encryption sooner than anticipated. If you’re not investing in security, it’s time to start.

  • The Tor network is the gold standard for secure private networking. Researchers recently discovered a vulnerability in Firefox browsers that lets attackers de-anonymize identities. The vulnerability has been fixed in Firefox 150, but it’s a reminder that anything can be attacked.
  • We all know that ransomware gangs use encryption. The Kyber group is making the transition to postquantum encryption.
  • A supply chain attack against npm allows bad actors to steal developers’ credentials. Once it has infected a victim, it inserts itself into other packages that the victim publishes.
  • Law enforcement agencies were briefly able to exploit a vulnerability in iOS notifications that allowed them to access unencrypted messages sent with the Signal secure messaging system. The vulnerability has been patched. It’s important to understand that the vulnerability wasn’t in Signal itself but in the environment in which it operated.
  • With AI, the time from discovery of a vulnerability to exploitation has dropped to zero. To help defense catch up, Google has added three agents to its Google Security Operations platform: Threat Hunting, Detection Engineering, and Third Party Context.
  • Microsoft reports that criminals are increasingly using Teams to impersonate help desk personnel, who ask users for their credentials and then steal data.
  • NIST has stopped assigning severity scores to lower-priority vulnerabilities. All vulnerabilities will still be added to the National Vulnerability Database (NVD).
  • The NSA is using Claude Mythos Preview, despite Anthropic being blacklisted by the Pentagon. Anyone want to guess what they’re using it for?
  • Anthropic will ask for identity verification in some cases.
  • Small open-weight models can do as well as Anthropic’s Mythos at discovering vulnerabilities. The key isn’t the model; it’s the system within which the model works.
  • A new malware campaign embeds credit-card-stealing software into a single-pixel SVG image. Ecommerce sites using Magento Open Source or Adobe Commerce are vulnerable.
  • Anthropic has pulled its latest model, Claude Mythos, from broader release because it’s too good at finding vulnerabilities in other software. They’ve made it available to a few companies through Project Glasswing, an attempt to secure critical software before it can be exploited. The AI Security Institute’s evaluation of Claude Mythos Preview says that it “represents a step up over earlier frontier models in a landscape where cyber performance was already rapidly improving.”
  • Many open source security maintainers agree with Greg Kroah-Hartman’s report that the quality of AI-generated security bug reports has gone up tremendously.
  • Versions of Claude Code that include the Vidar malware have been published on GitHub. They’re based on the code that Anthropic inadvertently leaked. These versions entice victims to download them by claiming to have unlocked enterprise features.
  • Claude has been used to discover zero-day remote code execution vulnerabilities in both Vim and Emacs. The vulnerabilities are triggered when a user opens a file. An update is available for Vim; Emacs developers argue that it’s really a bug in Git, which may be correct but misses the point.
  • Breakthroughs in quantum computing mean that computers capable of cracking existing encryption algorithms may be on the horizon.
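The npm supply-chain item above is a reminder that install-time scripts are a common infection vector. Here’s a minimal sketch of one defensive check: flagging packages whose package.json declares lifecycle scripts that run automatically at install time. This is a heuristic for illustration, not a substitute for real supply-chain tooling, and the sample manifest is invented:

```python
# Heuristic check: flag npm packages whose package.json declares scripts
# that run automatically at install time -- a common malware entry point.
import json

INSTALL_HOOKS = {"preinstall", "install", "postinstall", "prepare"}

def risky_install_scripts(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared by a package."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {k: v for k, v in scripts.items() if k in INSTALL_HOOKS}

# Hypothetical manifest for a package that runs code on every `npm install`.
sample = json.dumps({
    "name": "example-package",
    "scripts": {
        "test": "node test.js",
        "postinstall": "node collect.js",
    },
})
print(risky_install_scripts(sample))
```

A legitimate postinstall script is common, so a hit here means “inspect,” not “malicious” — but for the attack described above, the install hook is exactly where the credential stealer lives.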

Infrastructure and Operations

Several providers released overlapping pieces of an agent stack this cycle, covering orchestration, persistence, memory, and registry services. A three-layer model (orchestration, execution, review) is becoming the standard architecture, but each vendor’s implementation makes different bets about portability and durability. It’s important to evaluate each vendor’s products carefully before choosing an agent stack.

  • Microsoft now allows admins to uninstall Copilot, though there are conditions.
  • Google has announced two new eighth-generation TPUs. One is designed for training (8t), the other focuses on inference (8i). This is the first time Google has produced specialized TPUs for training and inference.
  • Google has open-sourced Scion, its testbed for agent orchestration.
  • Anthropic has agreed to purchase 3.5 gigawatts of computing power from Google and Broadcom, maker of Google’s TPUs. The deal specifies power consumption rather than the number of chips, implying that the limiting factor isn’t computation but the availability of power. Chips come and go; watts are a constant.
  • Ollama now uses Apple’s MLX framework to improve performance on Apple silicon. Support is currently limited to Qwen3.5-35B-A3B; support will be added for other models. As part of this update, it also uses NVIDIA’s NVFP4 floating point format for model quantization.

Web

Don’t overlook the web layer when planning for AI-driven disruption. The web’s infrastructure is older than most of the people who maintain it, and several items this cycle are reminders of the gap between what that infrastructure was designed for and how it’s used today. Two deal with protocols that have outlasted their original assumptions; another reimagines the dominant CMS from scratch using current tooling.

  • Is PHP the new COBOL? What about open source itself? “Who Will Maintain the Web When PHP’s Veterans Retire?” points to a reality that we don’t like to think about. Not only are companies reluctant to hire junior developers; those they do hire aren’t learning older technologies.
  • Laravel is apparently injecting ads for its commercial cloud service into agents. What happens when an open source framework receives venture funding and starts injecting ads into agents? We’re about to find out.
  • Doesn’t every musician need tools to typeset Gregorian chant?
  • Is IPv8 the future of the Internet? IPv6 has been “two years away” since early in the 1990s. IPv8 is fully backward compatible with IPv4, and resolves its security and address depletion issues.
  • Cloudflare has released EmDash, an alternative to WordPress based on how the web is used today. Drew Breunig calls this a reimagining: a new phase of software development in which we can use agentic programming to rethink and reimplement tools based on current needs.
  • “Is BGP Safe Yet?” is a web app that tests whether your ISP has implemented BGP (the protocol that’s responsible for routing packets at internet scale) securely. Many haven’t.

Biology

  • OpenAI has announced GPT-Rosalind, a model that has been tuned for 50 common workflows in biology. Unlike most models, Rosalind has been tuned to be skeptical rather than enthusiastic or sycophantic. Access to Rosalind is limited because of the potential for harm.

Robotics

  • Spot, the Boston Dynamics robot dog, can now read gauges and thermometers. It uses the Gemini Robotics-ER 1.6 model, which can reason about visual information.
  • Major League Baseball is using a robotic system to rule on challenges to a human umpire’s ball/strike calls.