Belitsoft Reviews Methods for Deploying AI Agents in Multiple Environments with AgentOps | The AI Journal


AI agents are showing up in production environments everywhere. According to PwC’s 2025 AI Agent Survey, 79% of the companies surveyed have already begun using AI agents. And as Docker’s 2026 State of Agentic AI report confirms, 94% of organizations view building agents as a strategic priority. But there’s a problem.

Belitsoft, an international AI automation consulting and software development company with an office in North America, reports that companies are building agents but are hesitant to put them into production because they cannot see what the agents are actually doing. According to a 2026 Monte Carlo poll of AI engineering leaders, 73% of companies won’t ship an AI agent without monitoring and alerting. Meanwhile, 63.4% say that the lack of monitoring and observability is the biggest reason why AI agents aren’t being used more widely.

AgentOps is what you need here. You could say it’s DevOps for AI agents. It covers deployment, monitoring, governance, and optimization for all the places agents run, such as public clouds, private data centers, edge devices, serverless platforms, or a mix of these. Docker’s State of Agentic AI research found that 79% of survey respondents run agents in more than one environment. For example, 51% use them in public clouds, 40% on their own servers, and 32% on serverless platforms. Without a clear plan, this is a quick way to let things get out of control and run up bills you didn’t expect.

The Visibility Problem

AI agents make decisions, use tools, and write content in ways that conventional observability tools weren’t designed to handle. According to Monte Carlo’s study, 53% of businesses plan to make big changes to or completely rebuild the AI agent systems they already have. More than half of today’s production agents will need to be redone because they can’t be observed in real time.

When agents talk to each other, the problem gets worse. New Relic says that agents depend on each other’s outputs from multiple MCP servers, shared context, and memory, which makes the complex webs they form hard to debug. One agent’s hallucination can corrupt another agent several steps down the line, and it’s almost impossible to find the root cause without specialized tools.

The Need for Good Governance

Visibility is only half the battle. You need to be in control too. The EU AI Act and ISO 42001 already say that AI systems must have audit trails. IBM’s AgentOps framework for watsonx Orchestrate keeps track of decisions, finds errors, and makes sure that policies are followed throughout the entire agent lifecycle. Monte Carlo’s survey found that companies need to keep an eye on secure data handling (68% of respondents), make sure everyone knows what is expected of them (62.7%), and set up alerts for failures (72.7%).

Gartner says that by 2028, businesses will use thousands of AI agents in all areas of their operations. Policy enforcement, centralized management, and value measurement will be essential. Without them, you get “AI sprawl”: dozens of separate projects across different teams, tools, and clouds with no centralized visibility. Gartner says that by 2028, this will affect 40% of Fortune 1000 companies.

Your Agents Are Already Everywhere

Agents don’t live in a single neat place. Docker’s research shows that 61% of organizations combine cloud-hosted and local models. 46% of respondents say they use between four and six models, while only 2% say they use just one. The multi-model, multi-cloud setup is already the norm.

The 2026 Edge AI Survey by ZEDEDA, which included 600 IT and business leaders from the U.S. and Germany, found that 47% of companies have moved to hybrid cloud-edge architectures. But 41% say it’s hard to manage AI workloads across different environments. There is still a big gap between training a model in one place and running it reliably at the edge.

The Hyperscaler Landscape

Every big cloud provider has its own agent service. AWS Bedrock AgentCore is very tightly coupled to AWS. Azure AI Foundry works very well with Microsoft 365. Google Vertex AI Agent Builder is the third option. But none of them let you switch between clouds, and none offer native CI/CD for agents, versioning, rollback, or canary deployments. According to xpander.ai, if you pick a hyperscaler, you have to accept cloud lock-in as a design constraint.

The Container Reality

Containers are the backbone of agent deployments. According to Docker’s State of Agentic AI research, 94% of companies use containers in their production or agent development processes. 40% of teams that use Docker to build agents use Docker Compose as their orchestration layer. Containers give you a consistent runtime across environments, which is exactly what you need when you deploy agents that might run on AWS one day and on-prem the next.

Agent Management Platforms

Several platforms launched in 2025 and 2026 to address multi-environment governance.

In March 2026, Kore.ai Agent Management Platform (AMP) was launched as a single command center for overseeing, controlling, and managing AI agents across frameworks, clouds, and development environments. It can manage LangGraph, CrewAI, AutoGen (which is now in maintenance mode), Google ADK, AWS AgentCore, Microsoft Foundry, Salesforce Agentforce, and its own systems from a single control plane. It has an evaluation studio that lets you test how agents behave before you ship them.

Domino Data Lab launched its Winter Release in February 2026, which it calls the first fully governed end-to-end platform for operationalizing agentic AI systems. The platform offers an agentic development lifecycle (ADLC) experience that covers the stages of building, evaluating, deploying, and monitoring, with full lineage, reproducibility, and governance. Domino’s universal tracing SDK works with any agentic orchestration framework.

In April 2026, Salesforce Agent Fabric grew to include a centralized control plane for managing and governing multi-vendor AI environments. The update added automated discovery tools, such as expanded Agent Scanners for Amazon Bedrock and Microsoft Foundry. It also added a Visual Authoring Canvas for mapping workflows and guided determinism through Agent Script for Agent Broker.

xpander.ai positions itself as a cloud-agnostic full-lifecycle agent platform. It can run on any Kubernetes cluster in a private VPC, AWS, Azure, or GCP. It also has native self-hosted and air-gapped deployment options. The platform supports no-code, low-code, and code-first build paths, dynamic non-linear orchestration, and full lifecycle management, which includes versioning, rollback, and canary deployments.
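To make the idea of canary deployments with rollback concrete, here is a minimal sketch in Python. It is not the API of xpander.ai or any other platform named above; the function names, the 5% error threshold, and the 50-call minimum are assumptions chosen only for illustration.

```python
import hashlib


def pick_version(session_id: str, canary_percent: int) -> str:
    """Route a session to the 'canary' or 'stable' agent version.

    Hashing the session ID keeps routing deterministic, so one session
    always talks to the same version. (Hypothetical helper.)
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"


def should_roll_back(canary_errors: int, canary_calls: int,
                     max_error_rate: float = 0.05,
                     min_calls: int = 50) -> bool:
    """Signal a rollback once the canary has enough traffic to judge
    and its error rate is above the assumed threshold."""
    if canary_calls < min_calls:
        return False  # too little data to decide
    return canary_errors / canary_calls > max_error_rate
```

A rollout would typically start with a small `canary_percent`, raise it step by step while `should_roll_back` stays false, and drop it to 0 as soon as it returns true.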

Tools for Observability

Agent sessions and LLM conversations are now seen as the most valuable telemetry signals in the enterprise.

Grafana Cloud AI Observability was launched in April 2026 and is now available in public preview. It connects agents to traces, tool calls, token usage, costs, and live evaluations, all in one place where you can see what’s going on in the system. It works with OpenTelemetry and tracks agent versions automatically. Grafana also released o11y-bench, a free and open-source framework for evaluating how well agents perform.

Monte Carlo Agent Observability gives you a single view of four areas: context, performance, behavior, and outputs. It lets you check AI-generated fields directly against source data to find errors and hallucinations before they affect other systems.

Salesforce Agentforce 360 is a set of tools for keeping an eye on agent performance, health, and optimization across the Salesforce ecosystem. It does this by tracking usage, tracing session flows, and making sure everything works.

In November 2025, Dynatrace and Amazon Bedrock AgentCore worked together to collect detailed telemetry and turn it into useful information through a live topology map and smart alerts.

With the Agents Service Map, New Relic Agentic AI Monitoring shows you every agent and tool call in a multi-agent collaboration and how they interact with each other. New Relic also launched an MCP Server that lets assistants like GitHub Copilot, ChatGPT, Claude, and Cursor get observability data directly.

As part of Cisco’s AgenticOps vision, Splunk AI Agent Monitoring tracks cost and token consumption metrics along with quality and security measures including hallucinations, bias, drift, and accuracy, as well as performance metrics like latency and errors.

The Datadog MCP Server gives AI agents access to logs, metrics, and traces in real time while keeping the same security, governance, and audit controls in place.

IBM AgentOps for watsonx Orchestrate watches what agents do in real time, keeping track of their choices, finding errors, and making sure that policies are followed at every stage of the process.

Agent Development Frameworks

The framework you choose affects how easily you can deploy across multiple environments.

LangGraph is built on LangChain. Independent studies from 2026 show that LangGraph has low latency for LLM calls and uses an average of 1.2GB of memory. LangChain has more than 500 integrations, and its abstraction layers let you change providers without having to rewrite the code for your agents.

CrewAI lets you quickly build prototypes (in less than three hours) and offers a developer experience built around role-based agents. Deloitte case studies show that the projects succeeded 89% of the time and cost about $0.12 per query.

Microsoft Agent Framework (MAF) is Microsoft’s forward-looking framework, consolidating capabilities from AutoGen and Semantic Kernel, both of which entered maintenance mode in late 2025. MAF is designed for enterprise-scale multi-agent coordination with deep integration into the Microsoft ecosystem.

For data-intensive tasks, LlamaIndex specializes in RAG and multi-source querying.

PydanticAI uses Pydantic models to validate and structure responses so that the outputs are type-safe. It supports various models, evaluations, tool approvals, and durable workflows that can run for a long time and be resumed.

Infrastructure and Deployment Tools

Nutanix Agentic AI (NAI) is a full-stack software solution announced in March 2026. It is positioned as simplifying operations, improving performance, and strengthening security. It also works with the NVIDIA Nemotron family of open-source models and NVIDIA AI Enterprise.

Red Hat AI Enterprise, which came out in February 2026, is a single platform based on Red Hat OpenShift that lets you deploy and manage models, agents, and apps in hybrid cloud environments. Version 3.3 adds support for models including Mistral-Large-3, Nemotron-Nano, Apertus-8B-Instruct, Ministral 3, and DeepSeek-V3.2.

The ZEDEDA Edge Intelligence Platform, which was announced at NVIDIA GTC 2026, is a single tool for building, deploying, securing, and running AI at scale across many different edge environments.

In February 2026, VAST Data launched Polaris, a global control plane for AI data infrastructure that works with both cloud and datacenter deployments. It lets businesses control VAST clusters on any cloud as if they were one system, and it has a zero-trust framework for agentic AI.

Cloudera Agent Studio uses NVIDIA NIM and the NVIDIA Nemotron family of models to manage autonomous workflows. It does this by using iterative multi-step planning and multi-agent collaboration.

LLMOps and Supporting Tools

OpenLLMetry is built on OpenTelemetry and keeps track of traces, prompts, completions, and token usage. You can view LLM telemetry in the logging and metrics pipelines you already have set up.

Bifrost lets you connect to over 20 providers, such as OpenAI, Anthropic, AWS Bedrock, Google Vertex, and Azure, all through one API. It has caching, load balancing, and failover built in.
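The failover behavior that a gateway provides can be illustrated with a short, generic Python sketch. This is not Bifrost’s API: `complete_with_failover`, `AllProvidersFailed`, and the two stub providers are hypothetical names, and a real gateway would add retries, timeouts, and caching on top.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider raised an error."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real SDK calls (assumptions for the demo).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def steady_backup(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = complete_with_failover(
    "hello", [("primary", flaky_primary), ("backup", steady_backup)]
)
```

Because the agent code only sees the single `complete_with_failover` entry point, swapping or reordering providers does not require touching the agent itself.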

Promptfoo is a free, open-source tool that lets you run evaluations and red-teaming in CI/CD pipelines. OpenAI acquired it in March 2026, and it’s now part of OpenAI Frontier.

Composio is a service that links AI agents and LLMs to more than 250 other apps and services. It handles authentication and works with many different agentic frameworks.

Based on what works today, here is a step-by-step plan.

Step 1: Understand that multi-environment is your default. Docker says that 79% of businesses work in more than one environment. Plan for a hybrid system from the start. Choose frameworks (like LangGraph and CrewAI) and deployment platforms (like containers and Kubernetes) that let you work with more than one vendor.

Step 2: Containerize everything. With 94% of teams using containers for agent workloads, containerization is a must. Package agents as images, use Docker Compose on your own machine, and then move to Kubernetes in production.

Step 3: Set up monitoring tools from the start. The three pillars of agent observability are system metrics (latency, error rate, tokens per task), quality metrics (task success rate, trajectory adherence), and cost tracking. Use OpenTelemetry-compatible instrumentation like OpenLLMetry to send telemetry to platforms like Grafana Cloud.
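As a rough illustration of the three pillars in Step 3, the Python sketch below aggregates per-task records into system, quality, and cost metrics. The `TaskRun` fields and `summarize` function are hypothetical, and the sketch deliberately uses only the standard library rather than a real OpenTelemetry or OpenLLMetry exporter. Trajectory adherence is left out because it needs a reference trajectory to compare against.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One completed agent task (field names are assumptions)."""
    latency_s: float
    tokens: int
    errored: bool      # system-level failure (exception, timeout)
    succeeded: bool    # task-level outcome judged by an evaluator
    cost_usd: float


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate system, quality, and cost metrics over a batch of runs."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    return {
        # System metrics
        "avg_latency_s": sum(r.latency_s for r in runs) / n,
        "error_rate": sum(r.errored for r in runs) / n,
        "tokens_per_task": sum(r.tokens for r in runs) / n,
        # Quality metric
        "task_success_rate": sum(r.succeeded for r in runs) / n,
        # Cost tracking
        "total_cost_usd": sum(r.cost_usd for r in runs),
    }


summary = summarize([
    TaskRun(latency_s=1.0, tokens=100, errored=False, succeeded=True, cost_usd=0.01),
    TaskRun(latency_s=3.0, tokens=300, errored=True, succeeded=False, cost_usd=0.03),
])
```

In practice each of these values would be emitted as a metric or span attribute so that the same dashboard can show all three pillars side by side.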

Step 4: Set up a centralized management plane. Using Kore.ai AMP, Domino, or Salesforce Agent Fabric lets you see everything in one place. This is needed to stop sprawl.

Step 5: Set up guardrails that are specific to the environment. Edge deployments (through ZEDEDA) need to be able to deal with intermittent connectivity. For on-premises deployments, data residency compliance is essential. Cloud deployments need cost controls so that spending doesn’t get too high. Put these rules into your management plane.
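One way to picture environment-specific guardrails is as a small policy table that the management plane consults before an agent acts. The Python sketch below is illustrative only: the `Guardrails` fields, the region name, and the budget figure are assumptions, not settings from any product mentioned here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guardrails:
    """Per-environment rules (all field names are hypothetical)."""
    offline_queue: bool                  # buffer work while disconnected
    allowed_regions: tuple[str, ...]     # empty tuple means "any region"
    daily_budget_usd: Optional[float]    # None means "no spend cap"


POLICIES = {
    "edge": Guardrails(offline_queue=True, allowed_regions=(), daily_budget_usd=None),
    "on_prem": Guardrails(offline_queue=False, allowed_regions=("eu-central",), daily_budget_usd=None),
    "cloud": Guardrails(offline_queue=False, allowed_regions=(), daily_budget_usd=50.0),
}


def check_request(env: str, data_region: str, spent_today_usd: float) -> list[str]:
    """Return the list of guardrail violations for one agent request."""
    policy = POLICIES[env]
    violations = []
    if policy.allowed_regions and data_region not in policy.allowed_regions:
        violations.append("data_residency")
    if policy.daily_budget_usd is not None and spent_today_usd >= policy.daily_budget_usd:
        violations.append("budget_exceeded")
    return violations
```

Keeping the rules in data rather than scattered through agent code makes it easier to audit them and to add a new environment later.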

Step 6: Set up a way to keep evaluating. Agents degrade over time. Before deploying, check for drift, hallucination, and bias using Domino’s evaluation tools, Monte Carlo’s data validation, or Promptfoo in CI/CD.
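A continuous-evaluation check in CI/CD often boils down to comparing a candidate agent’s scores against a stored baseline and blocking the release on regressions. The Python sketch below shows that idea in its simplest form; the metric names, scores, and 0.02 tolerance are made up for illustration, and it does not reproduce how Promptfoo, Domino, or Monte Carlo implement their checks.

```python
def evaluation_gate(baseline: dict[str, float],
                    candidate: dict[str, float],
                    max_drop: float = 0.02) -> list[str]:
    """Return the metrics that regressed by more than max_drop.

    Assumes higher scores are better. A metric missing from the
    candidate results also counts as a failure.
    """
    failures = []
    for metric, base_value in baseline.items():
        new_value = candidate.get(metric)
        if new_value is None or base_value - new_value > max_drop:
            failures.append(metric)
    return sorted(failures)


# Hypothetical scores from two evaluation runs.
baseline_scores = {"task_success": 0.90, "groundedness": 0.95}
candidate_scores = {"task_success": 0.91, "groundedness": 0.90}
regressions = evaluation_gate(baseline_scores, candidate_scores)
```

A pipeline step could fail the build whenever `regressions` is non-empty, which turns drift from something noticed in production into something caught before deployment.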

Step 7: Make costs easy to see. Autonomous agents can make hundreds of unnecessary API calls that cost a lot of money in the cloud. Splunk’s AI Agent Monitoring keeps an eye on costs and token usage. Set spending limits and make sure that cost is an explicit metric before you deploy.
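A spending limit can be sketched as a small guard object that every model call has to pass through. The Python example below is a minimal illustration under assumed numbers; `SpendGuard`, `BudgetExceeded`, and the per-token price are hypothetical and not taken from any vendor’s pricing or API.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the configured limit."""


class SpendGuard:
    """Tracks estimated spend and refuses calls that would exceed the limit."""

    def __init__(self, limit_usd: float, usd_per_1k_tokens: float) -> None:
        self.limit_usd = limit_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.spent_usd = 0.0

    def charge(self, tokens: int) -> float:
        """Record a call's token usage and return the remaining budget."""
        cost = tokens / 1000 * self.usd_per_1k_tokens
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"call costing ${cost:.2f} would exceed the ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += cost
        return self.limit_usd - self.spent_usd


guard = SpendGuard(limit_usd=1.0, usd_per_1k_tokens=0.5)
remaining = guard.charge(1000)  # costs $0.50 under the assumed price
```

Raising an exception instead of silently continuing forces the agent loop to stop or ask for approval, which is the point of treating cost as an explicit metric.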

Step 8: Plan for failure recovery. Agents fail. Multi-agent systems fail in more interesting ways. Use recovery and checkpointing tools. With xpander.ai, you can run stateful tasks for a long time, pause them so a person can review them, and then pick up where you left off without losing state.
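The checkpoint, pause, and resume pattern described in Step 8 can be sketched with nothing more than a JSON file. The Python example below is a simplified illustration, not xpander.ai’s mechanism: `run_with_checkpoints` and the `pause_before` parameter are hypothetical, and step results are assumed to be JSON-serializable.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_with_checkpoints(
    steps: Sequence[tuple[str, Callable[[], object]]],
    checkpoint_path: Path,
    pause_before: Optional[str] = None,
) -> dict:
    """Run named steps in order, saving state after each one.

    If a checkpoint file exists, execution resumes at the first step that
    has not completed. If `pause_before` names a step, the run stops just
    before it so a person can review the results so far.
    """
    if checkpoint_path.exists():
        state = json.loads(checkpoint_path.read_text())
    else:
        state = {"next_step": 0, "results": [], "status": "running"}

    for index in range(state["next_step"], len(steps)):
        name, func = steps[index]
        if name == pause_before:
            state["status"] = "paused"
            checkpoint_path.write_text(json.dumps(state))
            return state
        state["results"].append(func())
        state["next_step"] = index + 1
        state["status"] = "running"
        checkpoint_path.write_text(json.dumps(state))

    state["status"] = "done"
    checkpoint_path.write_text(json.dumps(state))
    return state
```

Calling the function once with `pause_before="apply"` and then again without it resumes at the paused step, so completed steps are not repeated after a crash or a human review.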

The Sprawl Problem

Organizations can quickly end up with 50-200 agents, each with its own infrastructure and monitoring. Centralized governance via Kore.ai AMP or Domino is the only scalable solution. Gartner says that by 2028, the average Global Fortune 500 company will have more than 150,000 agents working for it.

The Cost Problem

A cloud cost optimization agent might look through your AWS and Azure accounts, make hundreds of API calls that aren’t necessary, and raise your cloud bill. Build spend guardrails before deployment.

The Security Concern

According to Docker’s State of Agentic AI report, 76% of respondents worldwide are worried about vendor lock-in. One way forward is Datadog’s MCP Server approach, which lets agents access observability data in a controlled way while still following existing security rules. Cisco AI Defense works with Splunk to find risks like data leaks and prompt injection.

The Vendor Lock-In Problem

According to Docker, the main reasons for using both cloud and local models are control (64%), data privacy (60%), and compliance (54%). Cost ranks lower (41%). To reduce dependence, teams are spreading their work across multiple models and clouds. Choose cloud-agnostic platforms, such as xpander.ai, and open standards, such as OpenTelemetry.

AgentOps in the Future

Standardization around MCP (Model Context Protocol) is picking up speed. According to Docker’s research, 85% of teams know about MCP. Companies like Datadog and New Relic are building MCP servers to make it easier for everyone to access data.

Interest in edge intelligence is growing. According to a survey by ZEDEDA, 47% of businesses use hybrid cloud-edge architectures. Platforms from ZEDEDA and Nutanix are bringing agent management to the edge.

Agent-to-agent communication is becoming common. New Relic’s Agents Service Map addresses this. Docker says that 33% of companies have trouble with orchestration as multi-model environments become more common.

Governance will be required. Gartner says that by 2028, 40% of Fortune 1000 companies will be unable to control AI agents that act outside of the rules.

Framework consolidation is underway. Microsoft’s consolidation of AutoGen and Semantic Kernel into the Microsoft Agent Framework points to a trend of fewer, more stable enterprise frameworks.

About the Author:

Dmitry Baraishuk is a Partner and Chief Innovation Officer at Belitsoft. Belitsoft is a software engineering company specializing in DevOps, AI integration, and enterprise application modernization. The company serves clients across healthcare, fintech, and enterprise SaaS in the US, UK, and Canada. Belitsoft publishes technology trend analyses to help business and technology leaders make informed decisions about their software investment strategy.

  • I’m Erika Balla, a technology journalist and content specialist with over 5 years of experience covering developments in AI, software development, and digital innovation. With a foundation in graphic design and a strong focus on research-driven writing, I create accurate, accessible, and engaging articles that break down complex technical concepts and highlight their real-world impact.