Original blogpost: The New Stack

It’s 2:47 am. Your phone is buzzing. Production alerts. The checkout service is throwing 5xx errors, customers are abandoning carts, and the on-call engineer is flipping between Datadog, Argo CD, kubectl and logs. She’s just trying to figure out what changed. Latency spiked 20 minutes ago. A deployment went out at 2:31 am.

Two pods are in CrashLoopBackOff. Memory limits were changed. She rolls back, updates the ticket, writes the postmortem and… tries to go back to sleep. Yet she knows she’s gonna go through some version of this again next week.

Meanwhile, her colleague refactored an entire module in Cursor in minutes with the help of AI. The AI understood the codebase, proposed the change and handled the tedious parts, all automatically.

What happened?

AI has transformed the way we write code. But it has not transformed the way we operate the infrastructure to run that code.

The Gap Continues to Grow Wider

In the past two years, AI has reshaped the way developers work:

  • Cursor and Copilot write and refactor code.
  • Tools like Lovable, v0 and Bolt generate frontends.
  • Replit agents scaffold and deploy full applications.

But DevOps work remains manual. Engineers still have to resolve incidents by hand.

Incidents still stall releases. Backlogs still grow.

AI has supercharged development while operations have remained stuck. This isn’t a market oversight. The operations problem is much, much harder.

Why Operating Infrastructure Is So Different

1. There’s No Buffer for Mistakes

A bad code suggestion fails in a branch.
A bad infrastructure change will immediately affect live traffic.