Spring Health Scaled Its DevOps Without Scaling Its Team: Here’s How

At VIVE 2026, the head of AI at Spring Health and the CEO of DuploCloud shared how Spring Health kept its DevOps team lean while supporting a growing engineering organization and helping more people through its mental health platform.

Downtime on a mental health app can mean the difference between reaching people and not. And Pradeep Thachile knows it. As the head of AI, he is closely involved in ensuring Spring Health’s client base gets the mental health help they need, and fast.

At the VIVE 2026 conference, he shared the stage with DuploCloud CEO Venkat Thiruvengadam to discuss how AI-assisted infrastructure can support reliability, safety, and scale in healthcare.

As Thachile puts it, “Downtime is kind of the bane of Spring Health’s ethical requirements, leave alone anything that’s legal or clinically validated.”

The bottom line: when your platform is someone’s first port of call for help, reliability is a duty.

That duty is what connected Thachile and his small DevOps team with DuploCloud. The partnership gave Spring Health a way to expand DevOps leverage without proportionally expanding headcount.

The Problem: 200 Engineers, One Flat DevOps Team

Spring Health’s infrastructure journey was one most DevOps teams are familiar with. It began with bare metal and moved into AWS. Next came the realization that AWS’s raw power comes with operational complexity many engineers aren’t prepared to manage on their own.

The team at Spring Health had never had to think deeply about IAM roles, instance profiles, or security management. Now they were faced with responsibility for hundreds of compliance controls under HIPAA and HITRUST.

At the same time, Spring Health was growing its engineering headcount by roughly 50%. And yet…

“We kept our DevOps team flat,” Thachile said during the talk. “We’ll give you the secret sauce a little later.”

That secret sauce? Changing the leverage ratio, not the headcount.

The Solution: Work with an AI System that Lives on the Server

Typically, adopting AI means using a model as a personal productivity tool. You may, for example, have a copilot on one engineer’s laptop.

DuploCloud offered a different approach.

The team moved the intelligence to the server side, making it a shared organizational resource. Multiple engineers could draw on the same AI layer at the same time, and every interaction could build shared context.

So knowledge is no longer siloed.

As Thiruvengadam noted during the talk, “A lot of subject matter expertise in DevOps is actually in people’s minds. You ask for something, you’ll be like, oh, the guy is not here, let him come back, and he’ll respond.”

But DuploCloud’s platform is designed to capture and distribute that knowledge. So the expertise can outlast any one expert.

“We got rid of like 80% of the rules, offloaded it to the LLMs, but we kept the 20% for safety. We knew how to update it, how to deploy it, how to run it.”

— Venkat Thiruvengadam, Founder & CEO, DuploCloud

To make this shift happen, DuploCloud and Spring Health had to rethink how the platform itself was built. DuploCloud originally used a deterministic rules engine: given an infrastructure input, it would trigger a predefined set of provisioning actions. Under that model, every new service required updating and redeploying the engine.

It’s a slow, brittle cycle.

But when LLMs became capable enough to reason about infrastructure, roughly 80% of those rules could be offloaded to the model. The remaining 20% stayed in place as safety constraints in the rules engine.

Now, many infrastructure patterns no longer require a platform deployment to handle. The system can reason through much of the work while still operating within established guardrails.
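
As a rough illustration of that split, the sketch below shows a small set of deterministic safety rules vetting an action that a model has proposed. It is a minimal sketch under stated assumptions, not DuploCloud’s implementation: the ProposedAction shape, the rule names, and the two example rules are all hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical shape for an action proposed by the model.
# Nothing here mirrors a real DuploCloud API.
@dataclass
class ProposedAction:
    verb: str                      # e.g. "read", "create", "delete"
    resource: str                  # e.g. "s3_bucket", "iam_role"
    params: dict = field(default_factory=dict)


# Deterministic safety rules: the part that stays in the rules engine.
# Each rule returns an error message, or None if the action passes.
def no_public_buckets(action):
    if action.resource == "s3_bucket" and action.params.get("public", False):
        return "public buckets are not allowed"
    return None


def no_deletes_in_prod(action):
    if action.verb == "delete" and action.params.get("env") == "prod":
        return "deletes in prod require a human"
    return None


SAFETY_RULES = [no_public_buckets, no_deletes_in_prod]


def check(action):
    """Return the list of violated rules; an empty list means allowed."""
    violations = []
    for rule in SAFETY_RULES:
        message = rule(action)
        if message is not None:
            violations.append(message)
    return violations


if __name__ == "__main__":
    proposal = ProposedAction("delete", "s3_bucket", {"env": "prod"})
    print(check(proposal))
```

The point of the design is that the model is free to propose any action, but only the deterministic checks decide whether it runs.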

Why AI Should Start in Infrastructure

One of the most critical moments at the VIVE session was the argument Thiruvengadam made for why infrastructure is uniquely well-suited for AI autonomy and why that matters so much in healthcare.

The core idea is that infrastructure is a finite state machine. Regardless of how complex the environment might be, it still operates within a bounded set of possible states. A model asked about networking configurations is working in a domain where errors are more detectable and guardrails are easier to define.

That reality is fundamentally different from clinical AI application layers like the patient-facing work Thachile oversees. There, the space is much less bounded and the repercussions affect real humans in real time.

In short, infrastructure is one of the safer places to start with AI because its state space is bounded. Teams in regulated industries can deploy AI there with greater confidence, which makes it a strong foundation for more ambitious AI applications elsewhere.
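
One way to picture the bounded state space is a small transition table. The sketch below is illustrative only: the five states and the allowed transitions are assumptions chosen for the example, not a model of any real platform. What it shows is that when the states are finite, every proposed change can be checked against an explicit list.

```python
from enum import Enum


# Hypothetical lifecycle states for a single infrastructure resource.
class State(Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    RUNNING = "running"
    FAILED = "failed"
    TERMINATED = "terminated"


# Every legal transition, written out. Anything not listed is rejected.
ALLOWED = {
    State.REQUESTED: {State.PROVISIONING},
    State.PROVISIONING: {State.RUNNING, State.FAILED},
    State.RUNNING: {State.FAILED, State.TERMINATED},
    State.FAILED: {State.PROVISIONING, State.TERMINATED},
    State.TERMINATED: set(),
}


def is_valid_transition(current, proposed):
    """Check a proposed state change against the finite transition table."""
    return proposed in ALLOWED[current]


if __name__ == "__main__":
    # A model suggesting TERMINATED -> RUNNING is caught immediately.
    print(is_valid_transition(State.TERMINATED, State.RUNNING))
    print(is_valid_transition(State.PROVISIONING, State.RUNNING))
```

Because the table is finite, an invalid suggestion from a model is detectable by a simple lookup, which is much harder to guarantee in an open-ended clinical conversation.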

And this is not just theory. Spring Health’s first AI use cases were read-only: incident triage, root cause analysis, and log investigation.

Low risk. High value.

Thachile told the audience that the impact was immediate. RCA quality improved, the number of patient-facing incidents dropped dramatically, and mean time to root cause compressed significantly.

As he said, “Crunching that time down was just like the low hanging fruit.”

What the Numbers Tell Us

Thachile admitted he was surprised by what DuploCloud made possible: “We were very surprised by the kind of leverage we received. The SRE-to-engineering heuristic is usually 1:20 or 1:30. We’ve been able to change that to something close to 1:50.”
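
To make those ratios concrete, here is a quick back-of-the-envelope calculation. The talk did not give actual team sizes, so the figure of 200 engineers is borrowed from the section heading above and is an assumption for illustration only.

```python
import math


def devops_needed(engineers, engineers_per_devops):
    """DevOps headcount implied by a 1:N ratio, rounded up to whole people."""
    return math.ceil(engineers / engineers_per_devops)


# Assumption: 200 engineers, taken from the section heading, not from
# any staffing figure stated in the talk.
ENGINEERS = 200

if __name__ == "__main__":
    for ratio in (20, 30, 50):
        print(f"1:{ratio} -> {devops_needed(ENGINEERS, ratio)} DevOps engineers")
    # prints:
    # 1:20 -> 10 DevOps engineers
    # 1:30 -> 7 DevOps engineers
    # 1:50 -> 4 DevOps engineers
```

Under that assumption, moving from the usual heuristic to 1:50 is the difference between a DevOps team of seven to ten people and a team of four.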

A 1:50 DevOps-to-engineer ratio sounds like a staffing metric. But in practice, it’s a real look at what becomes possible when infrastructure is no longer a bottleneck. Spring Health’s engineering team can move faster on clinical AI, on care delivery features, and on the operational tooling that makes providers more effective.

All of this because they no longer have to wait on infrastructure to catch up.

The team cut operational costs by 30-40% through direct reductions in infrastructure management overhead and by avoiding proportional DevOps headcount growth as engineering expanded.

They also achieved 99.9% availability, a figure that reflects what AI-assisted incident response and infrastructure leverage can produce when deployed well.
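
For a sense of scale, an availability percentage translates directly into a downtime budget. The short calculation below is generic arithmetic, not a figure from the talk; the measurement window behind Spring Health’s 99.9% was not stated, so the 30-day and 365-day periods are assumptions.

```python
def downtime_budget_minutes(availability_pct, period_days):
    """Minutes of allowed downtime for a given availability over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    monthly = downtime_budget_minutes(99.9, 30)
    yearly = downtime_budget_minutes(99.9, 365)
    print(f"30 days:  {monthly:.1f} minutes")
    print(f"365 days: {yearly:.1f} minutes ({yearly / 60:.2f} hours)")
    # prints:
    # 30 days:  43.2 minutes
    # 365 days: 525.6 minutes (8.76 hours)
```

In other words, 99.9% leaves room for roughly 43 minutes of downtime in a 30-day month.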

Technology Can’t Solve It All

Still, Thiruvengadam closed the session with an honest note that typically doesn’t find its way into these talks.

The technical problem is building AI-augmented infrastructure that actually works. And that problem is tractable. DuploCloud and Spring Health have shown that.

But the more difficult problem is organizational.

How do you get operations teams to trust systems they didn’t build? How do you convince them to change workflows that have calcified over the years?

How do you help them let go of being the one person who holds the institutional knowledge of how everything works?

That’s the real friction.

Spring Health is a newer, technology-forward company, so its path was easier than it would be for legacy health systems with decades of entrenched processes.

The playbook is real. The results are real.

The people and process side is where it gets hard.

But any organization willing to take on both can unlock results that speak for themselves.
