Beyond Personalized Agents: Enterprise DevOps Needs Multiplayer AI

TL;DR

Personalized agents like Claude are extraordinary for a single user on a single machine. Managed agents handle autonomous, repeatable jobs in the cloud. Neither solves enterprise DevOps, which is inherently multiplayer: shared infrastructure, shared context, and shared accountability across shifts and teams. Turning a single-player agent into a multi-user system requires a different architecture altogether, one that brings along a dozen new requirements. We describe the specifications of such a system.

Historically, DevOps teams have spent the majority of their time stitching together a sprawling set of tools to automate an organization’s business needs. Each tool had its own SME, interface, and siloed function. It was left to operators to connect them.

Pre-AI tools had two structural failures: rigidity, where every workflow had to be anticipated and hardcoded by the vendor, and the Ops tax, where organizations still needed a large workforce just to configure and operate the software.

The advent of ChatGPT, followed by fast-improving reasoning models, opened the door to something better: software that could respond to user requests on the fly instead of forcing teams to pre-build every workflow. In theory, organizations could add an AI DevOps engineer to the workforce that works side by side with humans.

The Current State of DevOps

Silos and Rigidity

Scores of tools operate in their own niche with rigid workflows.

Expensive, non-productive, errors compound

Ops Tax

Various functions still need a specialist. Developers queue behind DevOps engineers for every change.

The glue between tools is human labor

Tool Chain Sprawl

Terraform • Kubernetes • ArgoCD • GitHub • Backstage • Datadog • OpenTelemetry • PagerDuty • ServiceNow • SIEM • Dozens more

End State: Humans constantly stitching together tools. Lack of self-service. DevOps becomes a bottleneck.

But the industry’s first attempts at AI-native anything, DevOps or otherwise, did not get this right.

The First Three AI Approaches Fell Short

Tack-on AI

DevOps tools bolted a copilot onto unchanged architecture. AI can’t act beyond that system’s boundaries.

Rigidity isn’t solved. It’s just decorated

Wrapper AI

Startups built thin chat and RAG on foundation models. No multistep workflows, no reasoning. Just a chatbot. Claude ate their lunch.

Storefronts with no differentiator

Siloed AI Ops Tools

Many startups rebuilt the established categories with AI-native technology. AI SRE, AI SIEM, AI for vulnerability checks, and so on. But we retained the same silos. Better functionality, but the same rigidity and lack of interoperability.

Reintroduced the rigidity of past tools

95% of enterprise AI pilots failed (MIT report, cited by Fortune, Aug. 18, 2025)

Claude Code Changed the Game with a Personalized Agent

While most enterprise AI pilots were failing through mid-2025, with the exception of coding agents like Cursor, Anthropic launched Claude Code and instantly showed how AI could solve large swaths of enterprise use cases.

Personalized agents gave engineers a real copilot. One person, one machine, deep focus. Writing code, designing features, prototyping, writing blogs, figuring out the complex Kubernetes constructs needed to deploy an app in the cloud: Claude nailed it. LLMs could reason, and agents had the tools and intelligence to act on that reasoning. Connect external systems with MCP servers and voilà! It felt like humans became 10x more productive with just their own computer.

Managed Agents: Personalized Agents for Autonomous, Non-Interactive Work

Personalized agents were tied to individual laptops. It quickly became obvious that the tasks an agent can execute on an individual’s machine could also be executed in the cloud. They could run uninterrupted, on a schedule, with access to larger cloud resources and centralized system credentials. Claude formalized this need with the launch of Managed Agents — AI agents that operate autonomously to complete tasks on behalf of users or organizations within a governed framework that provides oversight.

The primary purpose of a Managed Agent is to run repeatable jobs autonomously without interruptions. They scale the capabilities of a personalized agent using cloud infrastructure that could run a swarm of them doing repeatable, compute-intensive tasks. But… they are still architecturally single-user.

An example application of Managed Agents is incident triaging. Triggered via webhook when an incident is logged in an Incident Management system, the agent triages it and updates its findings for the human to review and then take action.

Why Enterprise Ops Is Inherently Multiplayer

How often have we wished we could live-share a Claude Code session? At enterprise scale, the single-player agent model breaks down. You are managing shared infrastructure, dozens of engineers, shared state, access control across environments, audit trails for compliance, deterministic deployments, and a cost model that cannot scale linearly with every kubectl command.

An incident at 2 AM diagnosed by one on-call SRE must hand off full context — pod state, CPU and memory utilization, actions already taken, rollback attempts — to the morning shift. A multi-step cloud migration requires centralized coordination across many tasks.

Single-Player vs Multiplayer

Single-player ✓

Writing code, designing features, prototyping. One person, one machine, deep focus. Claude Code nails this. Autonomous agents auto-triaging and remediating incidents with no human in the loop. Managed agents nail this.

Operations (multiplayer) ✗

Humans and agents need to collaborate: migrations, troubleshooting incidents, and coordinating deployments. Shared systems, shared context, shared accountability.

Using desktop agents for infrastructure operations means the same diagnostic questions get asked repeatedly across the organization. There is no organizational memory, no collaboration, and no compounding intelligence.

The SRE on the night shift discovers the root cause of a pod crash loop. The morning engineer asks Claude Code the same question from scratch. The context stays local to each machine.

The token cost of every user running their own desktop session for the same repetitive Ops workflows is financially unsustainable at enterprise scale.

If you were to imagine a multiplayer agent, what would it look like? The table below summarizes the capabilities for each of these three types of agents.

Capability	Personalized	Managed	Multiplayer
Reasoning Capabilities	✅	✅	✅
Tools	✅	✅	✅
Hosted Service	❌	✅	✅
Long Autonomous Tasks	❌	✅	✅
Interactive	✅	❌	✅
Multi-user Session	❌	❌	✅
Centralized Context	❌	✅	✅
Enterprise RBAC	NA	❌	✅
Shared Secrets	NA	✅	✅
Workspaces & Resource Mgmt	NA	❌	✅
Safety and Security	NA	NA	✅

Enterprise Operations Need All Three and More

Once you make software multiplayer, a Pandora’s box opens. Teams now need functionality that stems from the fact that no single individual is solely accountable for an agent’s actions. We identify and summarize twelve capabilities that any AI-native Ops platform must provide.

Ironically, each of these capabilities exists in traditional SaaS tooling. None of them exist in today’s AI tools. They must be architected from the ground up for AI.

1 Multiplayer AI Sessions

A production incident might span an on-call SRE, a platform engineer, a database administrator, and a team lead across multiple shifts over 48 hours. The state of the cluster when each person touched it, the commands they ran, the hypotheses they tested — all of this context must flow between participants. When the morning engineer picks up an incident, the AI should already know: here’s the pod state at 2 AM, here’s what the on-call tried, here’s what worked and what didn’t.
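To make the handoff concrete, here is a minimal sketch of a shared session log that several participants append to and that can be rendered for whoever picks up next. The schema, class names, and the incident ID are all hypothetical and chosen only for illustration; a real platform would persist this centrally and attach cluster state automatically.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SessionEvent:
    """One entry in a shared incident session (hypothetical schema)."""
    author: str
    kind: str           # "observation", "action", or "hypothesis"
    detail: str
    outcome: str = ""   # e.g. "worked" or "failed"; empty if not applicable
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class IncidentSession:
    """A session that several engineers and agents append to over time."""
    incident_id: str
    events: list[SessionEvent] = field(default_factory=list)

    def record(self, author: str, kind: str, detail: str, outcome: str = "") -> None:
        self.events.append(SessionEvent(author, kind, detail, outcome))

    def handoff_summary(self) -> str:
        """Render everything recorded so far for the next participant."""
        lines = [f"Incident {self.incident_id}: {len(self.events)} events so far"]
        for e in self.events:
            suffix = f" -> {e.outcome}" if e.outcome else ""
            lines.append(f"[{e.kind}] {e.author}: {e.detail}{suffix}")
        return "\n".join(lines)


# The night shift records what it saw and tried.
session = IncidentSession("INC-1")
session.record("oncall-sre", "observation", "checkout pod in CrashLoopBackOff")
session.record("oncall-sre", "action", "rolled back to previous release", "failed")
session.record("oncall-sre", "action", "raised memory limit", "worked")

# The morning engineer (or the agent acting for them) starts from this summary.
summary = session.handoff_summary()
print(summary)
```

The point of the sketch is that the session, not any one laptop, owns the history, so context survives a shift change.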

2 Centralized Context

When a developer asks Claude Code, “Why is this service failing?” the agent investigates from scratch, every time. It has no memory that the same question was asked yesterday, that the root cause was a DNS misconfiguration in the service mesh, or that this service has a history of OOM kills during peak traffic. An AI-native DevOps platform must centralize all context — every session, every infrastructure change, every incident resolution — into a shared knowledge layer that compounds over time.

3 Enterprise RBAC

There’s no way to define that a junior developer can deploy to staging but not production. No way to ensure an application team sees their own namespace but not the platform team’s infrastructure. No way to restrict which clusters, cloud accounts, or secrets an agent can touch. From SOC 2 to HIPAA and PCI-DSS, regulated industries can’t even evaluate a tool that lacks access control.
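As an illustration of the kind of rule described above, the following sketch checks an agent action against a user's role and team namespaces before anything runs. The role names, the grant table, and the `is_allowed` function are assumptions made for this example, not an existing API.

```python
from dataclasses import dataclass

# Hypothetical role definitions: each role maps to the (action, environment)
# pairs it is allowed to perform.
ROLE_GRANTS = {
    "junior-developer": {("deploy", "staging")},
    "senior-developer": {("deploy", "staging"), ("deploy", "production")},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    namespaces: frozenset  # namespaces owned by this user's team


def is_allowed(user: User, action: str, environment: str, namespace: str) -> bool:
    """Deny by default; allow only if both the role and the namespace match."""
    grants = ROLE_GRANTS.get(user.role, set())
    return (action, environment) in grants and namespace in user.namespaces


junior = User("sam", "junior-developer", frozenset({"payments"}))
print(is_allowed(junior, "deploy", "staging", "payments"))     # True
print(is_allowed(junior, "deploy", "production", "payments"))  # False
print(is_allowed(junior, "deploy", "staging", "platform"))     # False
```

The check is deny-by-default: an unknown role or an unlisted namespace yields no access, which is the behavior an auditor would expect.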

4 Determinism and SLAs

The same prompt can produce different Terraform plans on different runs. For a sandbox, acceptable. For a production Kubernetes upgrade, a non-starter. The determinism must be customizable: a deployment to production always follows the blue-green sequence. A database migration always takes a snapshot first. The AI reasons freely when diagnosing, executes predictably when deploying. And the boundary is in the team’s hands, not the vendor’s.
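One way to picture a team-owned boundary between free reasoning and predictable execution is a policy table that overrides the model's plan for designated operations. This is a sketch under stated assumptions: the operation names and step names are hypothetical, and a real system would also validate each step's result.

```python
# Team-owned policy (hypothetical): operations listed here always run this
# exact step sequence, regardless of what the model proposes.
FIXED_SEQUENCES = {
    "production-deploy": ["deploy-green", "health-check", "switch-traffic", "retire-blue"],
    "database-migration": ["take-snapshot", "apply-migration", "verify-schema"],
}


def plan_steps(operation: str, model_proposal: list) -> list:
    """Return the fixed sequence if the team defined one, else the model's plan."""
    fixed = FIXED_SEQUENCES.get(operation)
    return list(fixed) if fixed is not None else list(model_proposal)


# The model forgot the snapshot; the policy restores it.
migration = plan_steps("database-migration", ["apply-migration"])
print(migration)

# Diagnosis has no fixed sequence, so the model's plan is used as proposed.
diagnosis = plan_steps("diagnose-crash-loop", ["read-logs", "describe-pod"])
print(diagnosis)
```

Because the table lives with the team, adding or removing an entry changes where the line sits without any vendor involvement.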

5 Templates for Repetitive Tasks

If a team deploys the same application stack to staging three times a week, that shouldn’t be a prompt every time. It should be a template: a pre-configured workflow that captures the intent once and executes it on demand. Typing a prompt is powerful for novel troubleshooting. It’s a regression from a one-click button for the deployment you run twenty times a day.

6 Alerts and Notifications

A deployment fails, and the team gets paged. A node runs out of disk, and the platform team is alerted. A Terraform drift is detected, and the compliance lead is notified. AI tools today are entirely reactive. They have no concept of proactive monitoring, threshold-based alerts, or notification routing to the right engineer through PagerDuty or Slack.

7 Fault Handling and Retries

Cloud API calls time out. Models hallucinate a kubectl command that doesn’t exist. AWS throttles an API. In a platform running hundreds of concurrent infrastructure workflows, manual retries don’t scale. The platform needs automatic retries with exponential backoff, circuit breakers, fallback strategies, and clear escalation paths when automated recovery fails. An AI agent that silently drops a deployment step is worse than one that never ran.
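A minimal sketch of the retry-then-escalate pattern follows. It covers only exponential backoff and explicit escalation; circuit breakers and fallbacks are left out. The function name, defaults, and exception class are assumptions for illustration, and a production version would retry only errors known to be transient rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class EscalationRequired(Exception):
    """Raised when automated recovery is exhausted and a human must step in."""


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call step(), retrying with exponential backoff; escalate if all attempts fail."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Never drop the step silently: surface the failure explicitly.
                raise EscalationRequired(
                    f"gave up after {max_attempts} attempts"
                ) from exc
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


# Simulated flaky step: times out twice, then succeeds.
calls = {"n": 0}


def flaky_deploy() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("cloud API timed out")
    return "deployed"


delays = []  # record delays instead of actually sleeping
result = run_with_retries(flaky_deploy, sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule easy to inspect.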

8 Scale and Performance

Claude Code runs on a developer’s laptop. It serves one user at a time. A DevOps platform serves hundreds of concurrent engineers, each running workflows that involve multiple AI interactions, tool calls, and cloud API invocations. The agents must run in the cloud, not on laptops — orchestrated, load-balanced, and monitored like any production service.

9 Resource Management and Workspaces

A DevOps platform needs workspaces that map to organizational structures — separating production from staging, isolating one team’s cluster from another’s, and providing boundaries that prevent one group’s AI operations from affecting another’s infrastructure.

10 Security

Token and credential safeguarding: AWS access keys, Kubernetes service account tokens, and Vault secrets — guaranteed to never be leaked through AI responses or exfiltrated through prompt manipulation.

Prompt injection defense: Malicious inputs in Helm values files, ConfigMaps, or pod annotations that could manipulate an agent into bypassing RBAC or running destructive commands.

Sandboxing: An agent diagnosing a production issue should not be able to accidentally delete a persistent volume unless explicitly authorized.

Impersonation controls: Every AI action is traceable to a specific user, governed by their access scope — not elevated system-level access.
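As one small piece of credential safeguarding, the sketch below masks secrets in agent output before it reaches a user. It is an illustration only: the `redact` helper is hypothetical, the pattern matches just the common `AKIA`-prefixed AWS access key ID shape, and output filtering on its own is one defensive layer, not a guarantee against leakage.

```python
import re

# AWS access key IDs for long-term credentials are "AKIA" followed by 16
# uppercase letters or digits. Other credential formats need their own rules.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact(text: str, known_secrets: list) -> str:
    """Mask known secret values and anything shaped like an AWS access key ID."""
    for secret in known_secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "[REDACTED]")
    return AWS_ACCESS_KEY_ID.sub("[REDACTED]", text)


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
raw = "key=AKIAIOSFODNN7EXAMPLE token=s3cr3t-token"
safe = redact(raw, known_secrets=["s3cr3t-token"])
print(safe)
```

Exact-match masking of secrets the platform already holds catches values that no pattern would recognize, which is why both mechanisms appear together.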

11 Token Cost Management

In a DevOps organization with a hundred engineers running concurrent infrastructure workflows, the token bill becomes the single biggest line item in the AI budget. Token cost with desktop agents is not a pricing problem. It’s an architectural limitation. We’ve seen organizations prototype AI-native infrastructure workflows, validate the value, and then abandon them when they project the cost of running them at scale.

12 Token-less Analytics

Token-less analytics are AI-generated artifacts that run without AI. Fifty engineers checking the same deployment status dashboard means fifty inference cycles for identical information. This is architecturally absurd. AI should create the dashboard: generate the Prometheus query and build the visualization. It shouldn’t be running the dashboard. The intelligence is in the design, not in the rendering.
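The split between designing and rendering can be sketched as follows. The `Dashboard` class and both callables are hypothetical stand-ins: `generate_query` represents a model call and `run_query` represents a metrics backend. The PromQL string is only an illustrative query, not one taken from a real deployment.

```python
from typing import Callable, Optional


class Dashboard:
    """Stores a query generated once by a model, then serves views without inference."""

    def __init__(
        self,
        generate_query: Callable[[str], str],
        run_query: Callable[[str], float],
    ) -> None:
        self._generate_query = generate_query  # the only path that costs tokens
        self._run_query = run_query            # e.g. a call to a metrics backend
        self.query: Optional[str] = None
        self.inference_calls = 0

    def design(self, intent: str) -> None:
        """One-time (or on-edit) step: ask the model for the query."""
        self.query = self._generate_query(intent)
        self.inference_calls += 1

    def view(self) -> float:
        """Every refresh: run the stored query, with no model call."""
        if self.query is None:
            raise RuntimeError("dashboard has not been designed yet")
        return self._run_query(self.query)


# Stubs standing in for the model and the metrics backend.
dashboard = Dashboard(
    generate_query=lambda intent: "sum(rate(http_requests_total[5m]))",
    run_query=lambda query: 42.0,
)
dashboard.design("request rate for the checkout service")

# Fifty engineers open the dashboard; the model is not consulted again.
values = [dashboard.view() for _ in range(50)]
print(dashboard.inference_calls)
```

Inference cost here scales with how often the dashboard is redesigned, not with how often it is viewed.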

The Gap is the Opportunity

Twelve requirements. Every one of them already exists in traditional operations software. We need to take the power of personalized agents and scale it to the enterprise use case, without introducing the rigidity of SaaS tools and without recreating the siloed tool chain we are trying to escape.

This is the gap between a brilliant individual tool and enterprise infrastructure software. It is also the gap the DuploCloud AI DevOps Platform is designed to close.
In the next chapter, we introduce the product and describe the architecture that is built from these first principles.


DuploCloud ARMOR (Agent Runtime for MultiPlayer Operations): The Common Framework for AI-Native DevOps

We introduce the architecture that learns from the failures of Tack-on, Wrapper, and Siloed AI, and that we believe is the foundation for the next generation of enterprise DevOps software.

FAQs

What is multiplayer AI?

In simple terms, it is like a live shared session of an AI assistant such as Claude or ChatGPT. Multiplayer AI refers to environments where multiple humans and agents collaborate on shared infrastructure with shared context and shared accountability. Unlike single-player AI (one person, one machine), multiplayer AI supports scenarios like a production incident spanning an on-call SRE, a platform engineer, a database administrator, and a team lead across multiple shifts, where every action, hypothesis, and result needs to flow between all participants.

What’s the difference between personalized and managed agents?

Personalized agents are tied to an individual’s laptop — great for deep-focus tasks like writing code, designing features, or troubleshooting Kubernetes configs. Managed agents take those same capabilities into the cloud, running autonomously on a schedule without human interaction, handling repeatable and compute-intensive tasks like automated incident triage. The key distinction is that managed agents are hosted, long-running, and non-interactive. But both are still architecturally single-user.

What limits do desktop AI agents hit in enterprise-scale operations?

Desktop agents have no organizational memory, no collaboration layer, and no shared context. When an SRE diagnoses a pod crash loop at 2 AM, that knowledge vanishes when the session ends. Then, the morning engineer asks Claude the exact same question from scratch. Beyond the intelligence gap, the token cost of hundreds of engineers running individual desktop sessions for the same repetitive workflows becomes financially unsustainable. Desktop agents were built for one person, not for shared infrastructure across dozens of teams.

What is token-less analytics?

Token-less analytics are AI-generated dashboards that do not use LLMs at runtime. Rather than firing an inference cycle every time an engineer checks a deployment status dashboard, the AI creates the dashboard once — generating the Prometheus query and building the visualization — and then the dashboard runs on its own. The intelligence is in the design, not in the rendering. It’s a traditional dashboard, except it was created and can be modified using a conversational AI interface.

What are the 12 capabilities of an AI-native DevOps platform?

Multiplayer AI Sessions: shared context across engineers and shifts during incidents
Centralized Context: a compounding knowledge layer across all sessions, changes, and resolutions
Enterprise RBAC: role-based access control mapping to teams, environments, and compliance frameworks
Determinism and SLAs: predictable execution for production workflows, with AI reasoning freely but deploying predictably
Templates for Repetitive Tasks: pre-configured workflows for tasks run frequently, replacing repeated prompting
Alerts and Notifications: proactive monitoring, threshold-based alerts, and routing to the right engineer
Fault Handling and Retries: automatic retries, circuit breakers, and escalation paths for failed workflows
Scale and Performance: cloud-hosted, load-balanced agents serving hundreds of concurrent engineers
Resource Management and Workspaces: organizational boundaries isolating teams, environments, and infrastructure
Security: credential safeguarding, prompt injection defense, sandboxing, and impersonation controls
Token Cost Management: architectural solutions to prevent AI token spend from becoming the largest line item at scale
Token-less Analytics: AI builds the tooling once; execution happens without ongoing inference