TL;DR
Personalized agents like Claude Code are extraordinary for a single user on a single machine. Managed agents handle autonomous, repeatable jobs in the cloud. Neither solves enterprise DevOps, which is inherently multi-player: shared infrastructure, shared context, and shared accountability across shifts and teams. Turning a single-player agent into a multi-user one is not an incremental change; it demands a different architecture and brings a dozen new requirements. We describe the specifications of such a system.
Historically, DevOps teams have spent the majority of their time stitching together a sprawling set of tools to automate an organization’s business needs. Each tool had its own SME, interface, and siloed function. It was left to operators to connect them.
Pre-AI tools had two structural failures: rigidity, where every workflow had to be anticipated and hardcoded by the vendor, and the Ops tax, where organizations still needed a large workforce just to configure and operate the software.
The advent of ChatGPT, followed by fast-improving reasoning models, opened the door to something better: software that could respond to user requests on the fly instead of forcing teams to pre-build every workflow. In theory, organizations could add to their workforce an AI DevOps engineer that works side by side with humans.
The Current State of DevOps
- Silos and Rigidity: Scores of tools operate in their own niches with rigid workflows. The result is expensive and unproductive, and errors compound.
- Ops Tax: Various functions still need a specialist, and developers queue behind DevOps engineers for every change. The glue between tools is human labor.
- Tool Chain Sprawl: Terraform, Kubernetes, ArgoCD, GitHub, Backstage, Datadog, OpenTelemetry, PagerDuty, ServiceNow, SIEM, and dozens more.
But the industry’s first attempts at AI-native anything, DevOps or otherwise, did not get this right.
The First Three AI Approaches Fell Short
- Tack-on AI: DevOps tools bolted a copilot onto an unchanged architecture, so the AI can't act beyond that system's boundaries. Rigidity isn't solved; it's just decorated.
- Wrapper AI: Startups built thin chat and RAG layers on foundation models. No multistep workflows, no reasoning, just a chatbot. These were storefronts with no differentiator, and Claude ate their lunch.
- Siloed AI Ops Tools: Many startups rebuilt established categories (AI SRE, AI SIEM, AI vulnerability scanning, and so on) with AI-native technology. But they kept the same silos: better functionality, but the same rigidity and lack of interoperability as the tools they replaced.
Claude Code Changed the Game with a Personalized Agent
While most enterprise AI pilots were failing through mid-2025 (coding agents like Cursor being the notable exception), Anthropic launched Claude Code and quickly showed how AI could solve large swaths of enterprise use cases.
Managed Agents: Personalized Agents for Autonomous, Non-Interactive Work
Personalized agents were tied to individual laptops. It quickly became obvious that the tasks an agent can execute on an individual's machine could also be executed in the cloud: uninterrupted, on a schedule, with access to larger cloud resources and centralized system credentials. Anthropic formalized this need with the launch of Managed Agents: AI agents that operate autonomously to complete tasks on behalf of users or organizations within a governed framework that provides oversight.
One example application of Managed Agents is incident triage. Triggered via webhook when an incident is logged in an incident management system, the agent triages it and posts its findings for a human to review and act on.
Why Enterprise Ops Is Inherently Multi-Player
How often have we wished we could live-share a Claude Code session? At enterprise scale, the single-player agent model breaks down. You are managing shared infrastructure, dozens of engineers, shared state, access control across environments, audit trails for compliance, deterministic deployments, and a cost model that cannot scale linearly with every kubectl command.
Single-Player vs Multi-Player
- Single-player ✓: One person, one machine, deep focus. Claude Code nails writing code, designing features, and prototyping. Managed agents nail autonomous work such as triaging and remediating incidents with no human in the loop.
- Operations (multi-player) ✗: Migrations, incident troubleshooting, and deployment coordination require humans and agents to collaborate, with shared systems, shared context, and shared accountability.
Using desktop agents for infrastructure operations means the same diagnostic questions get asked repeatedly across the organization. There is no organizational memory, no collaboration, and no compounding intelligence.
The SRE on the night shift discovers the root cause of a pod crash loop. The morning engineer asks Claude Code the same question from scratch. The context stays local to each machine.
If you were to imagine a multi-player agent, what would it look like? The table below summarizes the capabilities for each of these three types of agents.
| Capability | Personalized | Managed | Multi-player |
|---|---|---|---|
| Reasoning Capabilities | ✅ | ✅ | ✅ |
| Tools | ✅ | ✅ | ✅ |
| Hosted Service | ❌ | ✅ | ✅ |
| Long Autonomous Tasks | ❌ | ✅ | ✅ |
| Interactive | ✅ | ❌ | ✅ |
| Multi-user Session | ❌ | ❌ | ✅ |
| Centralized Context | ❌ | ✅ | ✅ |
| Enterprise RBAC | NA | ❌ | ✅ |
| Shared Secrets | NA | ✅ | ✅ |
| Workspaces & Resource Mgmt | NA | ❌ | ✅ |
| Safety and Security | NA | NA | ✅ |
Enterprise Operations Need All Three and More
Once you make software multi-player, a Pandora’s box opens. Teams now need functionality that stems from the fact that no single individual is solely accountable for an agent’s actions. We identify and summarize twelve capabilities that any AI-native Ops platform must provide.
Ironically, each of these capabilities exists in traditional SaaS tooling. None of them exist in today’s AI tools. They must be architected from the ground up for AI.
1 Multiplayer AI Sessions
A production incident might span an on-call SRE, a platform engineer, a database administrator, and a team lead across multiple shifts over 48 hours. The state of the cluster when each person touched it, the commands they ran, the hypotheses they tested — all of this context must flow between participants. When the morning engineer picks up an incident, the AI should already know: here’s the pod state at 2 AM, here’s what the on-call tried, here’s what worked and what didn’t.
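To make the handoff concrete, here is a minimal Python sketch of a shared incident session: every participant, human or agent, appends to one log, and the next shift starts from a chronological digest. `IncidentSession`, `SessionEvent`, the event kinds, and the sample incident are hypothetical illustrations, not a DuploCloud API; a real system would persist this log centrally rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SessionEvent:
    at: datetime  # when the action happened
    actor: str    # human or agent identity, e.g. "oncall-sre" or "agent"
    kind: str     # hypothetical kinds: "observation", "command", "hypothesis", "result"
    text: str


@dataclass
class IncidentSession:
    incident_id: str
    events: list[SessionEvent] = field(default_factory=list)

    def record(self, at: datetime, actor: str, kind: str, text: str) -> None:
        """Append one event to the log that every participant shares."""
        self.events.append(SessionEvent(at, actor, kind, text))

    def participants(self) -> list[str]:
        """Everyone who has touched the incident, in order of first appearance."""
        return list(dict.fromkeys(e.actor for e in self.events))

    def handoff_summary(self) -> str:
        """Chronological digest that the next shift (or the agent) starts from."""
        lines = [f"Incident {self.incident_id}:"]
        for e in sorted(self.events, key=lambda e: e.at):
            lines.append(f"  {e.at:%H:%M} [{e.actor}] {e.kind}: {e.text}")
        return "\n".join(lines)


session = IncidentSession("INC-1042")
session.record(datetime(2025, 1, 7, 2, 0), "oncall-sre", "observation", "payments pod in CrashLoopBackOff")
session.record(datetime(2025, 1, 7, 2, 20), "oncall-sre", "command", "kubectl rollout restart deployment/payments")
session.record(datetime(2025, 1, 7, 2, 25), "agent", "result", "restart did not help; pod exits with the same error")
print(session.handoff_summary())
```

The morning engineer's first question is answered by the digest instead of a fresh investigation.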
2 Centralized Context
When a developer asks Claude Code, “Why is this service failing?” the agent investigates from scratch, every time. It has no memory that the same question was asked yesterday, that the root cause was a DNS misconfiguration in the service mesh, or that this service has a history of OOM kills during peak traffic. An AI-native DevOps platform must centralize all context — every session, every infrastructure change, every incident resolution — into a shared knowledge layer that compounds over time.
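A shared knowledge layer can be sketched as a store that every session writes resolutions into and reads from before investigating. The sketch below is a minimal illustration under stated assumptions: `KnowledgeLayer` and `Resolution` are hypothetical names, and the simple keyword match stands in for whatever retrieval (for example, semantic search) a production system would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    service: str
    symptom: str      # what was observed
    root_cause: str   # what the investigation concluded
    resolved_by: str  # human or agent that closed it out


class KnowledgeLayer:
    """Hypothetical shared store: every session writes to it and reads from it."""

    def __init__(self) -> None:
        self._by_service: dict[str, list[Resolution]] = defaultdict(list)

    def record(self, r: Resolution) -> None:
        self._by_service[r.service].append(r)

    def prior_context(self, service: str, symptom: str = "") -> list[Resolution]:
        """Past resolutions for a service, optionally narrowed by a symptom keyword."""
        keyword = symptom.lower()
        return [r for r in self._by_service.get(service, []) if keyword in r.symptom.lower()]


kb = KnowledgeLayer()
kb.record(Resolution("checkout", "5xx errors from upstream calls",
                     "DNS misconfiguration in the service mesh", "night-shift-sre"))
kb.record(Resolution("checkout", "OOMKilled during peak traffic",
                     "memory limit below peak-traffic usage", "agent"))

# Yesterday's answer is available before anyone re-investigates.
for r in kb.prior_context("checkout", "5xx"):
    print(f"{r.root_cause} (found by {r.resolved_by})")
```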
3 Enterprise RBAC
Today's AI tools offer no way to define that a junior developer can deploy to staging but not production, no way to ensure an application team sees its own namespace but not the platform team's infrastructure, and no way to restrict which clusters, cloud accounts, or secrets an agent can touch. Under frameworks from SOC 2 to HIPAA and PCI-DSS, regulated industries can't even evaluate a tool that lacks access control.
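The examples above map naturally onto a small role policy. This is a minimal sketch assuming hypothetical role names and a default-deny rule for unknown roles; it is not DuploCloud's RBAC model. The key design point it illustrates is that the agent checks the requesting user's role before acting, rather than relying on its own broader credentials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    deploy_envs: frozenset[str]  # environments this role may deploy to
    namespaces: frozenset[str]   # Kubernetes namespaces this role may see


# Hypothetical roles mirroring the examples above.
ROLES = {
    "junior-dev": Role("junior-dev", frozenset({"staging"}), frozenset({"app-team"})),
    "platform-eng": Role("platform-eng", frozenset({"staging", "production"}),
                         frozenset({"app-team", "platform"})),
}


def can_deploy(user_role: str, env: str) -> bool:
    """Checked against the requesting user's role; unknown roles are denied."""
    role = ROLES.get(user_role)
    return role is not None and env in role.deploy_envs


def can_view(user_role: str, namespace: str) -> bool:
    role = ROLES.get(user_role)
    return role is not None and namespace in role.namespaces


print(can_deploy("junior-dev", "staging"))     # True
print(can_deploy("junior-dev", "production"))  # False
print(can_view("junior-dev", "platform"))      # False
```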
4 Determinism and SLAs
The same prompt can produce different Terraform plans on different runs. For a sandbox, acceptable. For a production Kubernetes upgrade, a non-starter. The determinism must be customizable: a deployment to production always follows the blue-green sequence. A database migration always takes a snapshot first. The AI reasons freely when diagnosing, executes predictably when deploying. And the boundary is in the team’s hands, not the vendor’s.
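One way to picture the split between free reasoning and predictable execution is a team-owned table of pinned step sequences: the AI may choose which runbook to run, but the executor never reorders or skips steps. The runbook names and step names below are hypothetical placeholders for a blue-green deployment and a snapshot-first migration; this is a sketch of the idea, not a specific product's workflow engine.

```python
from typing import Callable

# Hypothetical pinned sequences, owned by the team rather than the vendor.
# While diagnosing, the AI may choose which runbook to run; at execution
# time it cannot reorder or skip steps.
RUNBOOKS: dict[str, tuple[str, ...]] = {
    "prod-deploy": ("deploy_green", "run_smoke_tests", "switch_traffic_to_green", "retire_blue"),
    "db-migration": ("take_snapshot", "apply_migration", "verify_schema"),
}


def execute(runbook: str, run_step: Callable[[str], bool]) -> list[str]:
    """Run each step in its pinned order and halt at the first failure."""
    completed: list[str] = []
    for step in RUNBOOKS[runbook]:
        if not run_step(step):
            raise RuntimeError(f"{runbook} halted at {step!r} after {completed}")
        completed.append(step)
    return completed


def dry_run(step: str) -> bool:
    print(f"running {step}")
    return True


execute("db-migration", dry_run)  # take_snapshot always runs first
```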
5 Templates for Repetitive Tasks
If a team deploys the same application stack to staging three times a week, that shouldn't be a prompt every time. It should be a template: a pre-configured workflow that captures the intent once and executes it on demand. Typing a prompt is powerful for novel troubleshooting, but for the deployment you run twenty times a day it is a regression from a one-click button.
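A template can be as simple as a parameterized command whose intent is fixed and whose inputs vary per run. The sketch below uses Python's standard `string.Template`; the release name, chart path, and tag are hypothetical, while the `helm upgrade --install`, `--namespace`, `--set`, and `--wait` flags are standard Helm options.

```python
from string import Template

# Hypothetical template: the intent is captured once; each run supplies parameters.
DEPLOY_STAGING = Template(
    "helm upgrade --install $release $chart --namespace $namespace "
    "--set image.tag=$tag --wait"
)


def render(template: Template, **params: str) -> str:
    """Fill every placeholder; substitute() raises KeyError if one is missing."""
    return template.substitute(**params)


cmd = render(DEPLOY_STAGING, release="storefront", chart="./charts/storefront",
             namespace="staging", tag="1.4.2")
print(cmd)
```

Using `substitute` rather than `safe_substitute` is deliberate: a deployment with a missing parameter should fail loudly instead of running with a literal `$tag` in it.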
6 Alerts and Notifications
A deployment fails, and the team gets paged. A node runs out of disk, and the platform team is alerted. A Terraform drift is detected, and the compliance lead is notified. AI tools today are entirely reactive. They have no concept of proactive monitoring, threshold-based alerts, or notification routing to the right engineer through PagerDuty or Slack.
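Proactive notification comes down to matching events against routing rules, some with thresholds. This minimal sketch mirrors the three examples above; the event kinds, the 90% disk threshold, and the destination labels are hypothetical, and a real system would call PagerDuty or Slack where this returns strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str           # e.g. "deploy_failed", "disk_usage", "terraform_drift"
    value: float = 0.0  # metric value for threshold rules, if any


# Hypothetical routing table: (event kind, threshold or None, destination label).
RULES: list[tuple[str, Optional[float], str]] = [
    ("deploy_failed", None, "pagerduty:app-oncall"),
    ("disk_usage", 0.90, "slack:#platform-alerts"),
    ("terraform_drift", None, "slack:@compliance-lead"),
]


def route(event: Event) -> list[str]:
    """Return every destination whose rule matches the event."""
    targets = []
    for kind, threshold, dest in RULES:
        if kind != event.kind:
            continue
        if threshold is None or event.value >= threshold:
            targets.append(dest)
    return targets


print(route(Event("disk_usage", 0.95)))  # ['slack:#platform-alerts']
```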
7 Fault Handling and Retries
Cloud API calls time out. Models hallucinate a kubectl command that doesn't exist. AWS throttles an API. In a platform running hundreds of concurrent infrastructure workflows, manual retries don't scale. The platform needs automatic retries with exponential backoff, circuit breakers, fallback strategies, and clear escalation paths for when automated recovery fails. An AI agent that silently drops a deployment step is worse than one that never ran.
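Two of those patterns fit in a short sketch: exponential backoff with "full jitter" and a consecutive-failure circuit breaker that stops retries and hands off to escalation. The names, default limits, and the injectable `sleep` parameter are assumptions for illustration, not a specific library's API. Note that exhausted retries re-raise rather than swallow the error, so no step is dropped silently.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that keeps failing."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows one trial call after `reset_after` s."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open: escalate instead of retrying")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base: float = 0.5,
    cap: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    breaker: Optional[CircuitBreaker] = None,
) -> T:
    """Retry with exponential backoff and full jitter; re-raise when retries run out."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return breaker.call(fn) if breaker is not None else fn()
        except CircuitOpenError:
            raise  # don't hammer an open circuit; hand off to escalation
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure, never drop it silently
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")


attempts_seen: list[int] = []


def flaky_cloud_call() -> str:
    """Stand-in for a cloud API that times out twice, then succeeds."""
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise TimeoutError("cloud API timed out")
    return "ok"


print(retry_with_backoff(flaky_cloud_call, sleep=lambda s: None))  # ok
```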
8 Scale and Performance
Claude Code runs on a developer’s laptop. It serves one user at a time. A DevOps platform serves hundreds of concurrent engineers, each running workflows that involve multiple AI interactions, tool calls, and cloud API invocations. The agents must run in the cloud, not on laptops — orchestrated, load-balanced, and monitored like any production service.
9 Resource Management and Workspaces
A DevOps platform needs workspaces that map to organizational structures — separating production from staging, isolating one team’s cluster from another’s, and providing boundaries that prevent one group’s AI operations from affecting another’s infrastructure.
10 Security
Token and credential safeguarding: AWS access keys, Kubernetes service account tokens, and Vault secrets — guaranteed to never be leaked through AI responses or exfiltrated through prompt manipulation.
Prompt injection defense: Blocking malicious inputs in Helm values files, ConfigMaps, or pod annotations that could manipulate an agent into bypassing RBAC or running destructive commands.
Sandboxing: An agent diagnosing a production issue should not be able to accidentally delete a persistent volume unless explicitly authorized.
Impersonation controls: Every AI action is traceable to a specific user, governed by their access scope — not elevated system-level access.
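Three of these safeguards can be sketched in a few lines: scrubbing credentials from tool output before it reaches the model, gating destructive commands behind explicit approval, and attributing every action to the human the agent acts for. The regex patterns are illustrative and not exhaustive (AWS access key IDs are commonly prefixed `AKIA` or `ASIA`), and `DESTRUCTIVE_VERBS`, `Action`, and `authorize` are hypothetical; prompt injection defense is not shown. A production system would keep secrets out of the model's context entirely rather than rely on redaction alone.

```python
import re
from dataclasses import dataclass

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) plus 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
# Bearer tokens as they might appear in HTTP headers or kubeconfig dumps.
BEARER = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")

# Hypothetical sandbox policy: kubectl verbs that need explicit approval.
DESTRUCTIVE_VERBS = {"delete", "drain", "replace"}


def redact(text: str) -> str:
    """Scrub credentials from tool output before the model or a user sees it."""
    text = AWS_KEY_ID.sub("[REDACTED_AWS_KEY]", text)
    return BEARER.sub("Bearer [REDACTED]", text)


@dataclass(frozen=True)
class Action:
    user: str              # the human the agent acts for, not a system account
    argv: tuple[str, ...]  # the command the agent wants to run


def authorize(action: Action, approved: bool = False) -> str:
    """Block unapproved destructive kubectl verbs; return an audit line tied to the user."""
    cmd = " ".join(action.argv)
    is_destructive = (
        len(action.argv) > 1 and action.argv[0] == "kubectl"
        and action.argv[1] in DESTRUCTIVE_VERBS
    )
    if is_destructive and not approved:
        raise PermissionError(f"{action.user}: '{cmd}' requires explicit approval")
    return f"user={action.user} cmd={cmd}"


print(redact("aws_access_key_id=AKIAABCDEFGHIJKLMNOP"))
print(authorize(Action("alice", ("kubectl", "get", "pods", "-n", "payments"))))
```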
11 Token Cost Management
In a DevOps organization with a hundred engineers running concurrent infrastructure workflows, the token bill can become the single biggest line item in the AI budget. Token cost with desktop agents is not a pricing problem. It's an architectural limitation. We've seen organizations prototype AI-native infrastructure workflows, validate the value, and then abandon them when they project the cost of running them at scale.
12 Token-less Analytics
Token-less analytics are AI-generated artifacts that run without AI. Fifty engineers checking the same deployment status dashboard means fifty inference cycles for identical information. This is architecturally absurd. AI should create the dashboard: generate the Prometheus query and build the visualization. It shouldn’t be running the dashboard. The intelligence is in the design, not in the rendering.
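The difference shows up in a call count: the model is invoked once to design a panel, and every later view only evaluates the stored query. In this minimal sketch, `fake_llm` and the lambda passed to `render` are hypothetical stand-ins for a model call and a Prometheus query; the PromQL uses the `kube_deployment_status_replicas_available` metric from kube-state-metrics, assumed to be installed in the cluster.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Panel:
    title: str
    promql: str  # written once by the model, evaluated by Prometheus on every view


def design_panel(ask_llm: Callable[[str], str], request: str) -> Panel:
    """The only inference step: turn a plain-English request into a PromQL query."""
    return Panel(title=request, promql=ask_llm(request))


def render(panel: Panel, query_prometheus: Callable[[str], float]) -> str:
    """Runs on every page view with no model in the loop."""
    return f"{panel.title}: {query_prometheus(panel.promql):g}"


# Hypothetical stand-in for a model call; it records every prompt it receives.
llm_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    llm_prompts.append(prompt)
    return 'sum(kube_deployment_status_replicas_available{namespace="checkout"})'


panel = design_panel(fake_llm, "Available checkout replicas")
views = [render(panel, lambda query: 3.0) for _ in range(50)]  # fifty engineers
print(len(llm_prompts), len(views))  # 1 50
print(views[0])                      # Available checkout replicas: 3
```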
The Gap is the Opportunity
Twelve requirements. Every one of them has existed in present-day operations software. We need to take the power of personalized agents and scale it to the enterprise use case — without introducing the rigidity of SaaS tools and without recreating the siloed tool chain we are trying to escape.
This is the gap between a brilliant individual tool and enterprise infrastructure software. It is also the gap the DuploCloud AI DevOps Platform is designed to close.
In the next chapter, we introduce the product and describe the architecture that is built from these first principles.
Coming Next
DuploCloud ARMOR (Agent Runtime for MultiPlayer Operations): The Common Framework for AI-Native DevOps
We introduce the architecture that learns from the failures of Tack-on, Wrapper, and Siloed AI, and what we believe is the foundation for the next generation of enterprise DevOps software.
FAQs
What is multi-player AI?
In simple terms, it's like a live, shared session of your Claude or OpenAI agent. Multi-player AI refers to environments where multiple humans and agents collaborate on shared infrastructure with shared context and shared accountability. Unlike single-player AI (one person, one machine), multi-player AI supports scenarios like a production incident spanning an on-call SRE, a platform engineer, a database administrator, and a team lead across multiple shifts, where every action, hypothesis, and result needs to flow between all participants.
What’s the difference between personalized and managed agents?
Personalized agents are tied to an individual’s laptop — great for deep-focus tasks like writing code, designing features, or troubleshooting Kubernetes configs. Managed agents take those same capabilities into the cloud, running autonomously on a schedule without human interaction, handling repeatable and compute-intensive tasks like automated incident triage. The key distinction is that managed agents are hosted, long-running, and non-interactive. But both are still architecturally single-user.
What limits do desktop AI agents hit at enterprise scale operations?
Desktop agents have no organizational memory, no collaboration layer, and no shared context. When an SRE diagnoses a pod crash loop at 2 AM, that knowledge vanishes when the session ends. Then, the morning engineer asks Claude the exact same question from scratch. Beyond the intelligence gap, the token cost of hundreds of engineers running individual desktop sessions for the same repetitive workflows becomes financially unsustainable. Desktop agents were built for one person, not for shared infrastructure across dozens of teams.
What is token-less analytics?
Token-less analytics are AI-generated dashboards that do not use LLMs at runtime. Rather than firing an inference cycle every time an engineer checks a deployment status dashboard, the AI creates the dashboard once — generating the Prometheus query and building the visualization — and then the dashboard runs on its own. The intelligence is in the design, not in the rendering. It’s a traditional dashboard, except it was created and can be modified using a conversational AI interface.
What are the 12 capabilities of an AI-native DevOps platform?
- Multiplayer AI Sessions: shared context across engineers and shifts during incidents
- Centralized Context: a compounding knowledge layer across all sessions, changes, and resolutions
- Enterprise RBAC: role-based access control mapping to teams, environments, and compliance frameworks
- Determinism and SLAs: predictable execution for production workflows, with AI reasoning freely but deploying predictably
- Templates for Repetitive Tasks: pre-configured workflows for tasks run frequently, replacing repeated prompting
- Alerts and Notifications: proactive monitoring, threshold-based alerts, and routing to the right engineer
- Fault Handling and Retries: automatic retries, circuit breakers, and escalation paths for failed workflows
- Scale and Performance: cloud-hosted, load-balanced agents serving hundreds of concurrent engineers
- Resource Management and Workspaces: organizational boundaries isolating teams, environments, and infrastructure
- Security: credential safeguarding, prompt injection defense, sandboxing, and impersonation controls
- Token Cost Management: architectural solutions to prevent AI token spend from becoming the largest line item at scale
- Token-less Analytics: AI builds the tooling once; execution happens without ongoing inference