The DevOps automation landscape is saturated with a myriad of software applications, each performing distinct functions across various layers of cloud infrastructure. The industry is now pursuing more end-to-end platform-based DevOps Automation solutions.
Organizations today have a wide range of cloud utilization options. Some maintain a minimal presence, running several Dockerized applications and using basic data stores such as SQL servers and object stores like S3, while others depend heavily on services exclusive to a cloud provider. For example, AWS provides proprietary services like SQS, SNS, DynamoDB, Lambda, EMR, RDS, MSK, and more. Compliance requirements vary as well: some organizations only need to follow standard best practices, while others are obligated to comply with SOC2, HIPAA, PCI, and NIST standards.
Figure 1 below shows a comprehensive view of a fully cloud-native application infrastructure operating in AWS. The scenario is analogous for Azure and GCP, each having distinct names for their cloud provider proprietary or platform services.
Here are the three most common DevOps Automation solutions:
- Do it yourself with Infrastructure-as-Code and Point Solutions
- PaaS Platforms abstracting Cloud Providers
- Platform Engineering Solutions
  - Build it yourself
  - Buy off the shelf
Figure 1
1. Do it Yourself with Infrastructure-as-Code and Point Solutions
The prevalent approach is to script infrastructure by hand using Infrastructure-as-Code tools like Terraform and Pulumi. In this model, a dedicated team of DevOps engineers weaves together and manages point solutions covering Kubernetes (K8s) management, cloud platform provisioning, access management and access policies, encryption, networking, backup and restore, SIEM, vulnerability management, audit trails, and CI/CD pipelines. Developer self-service is largely confined to CI/CD pipelines, which let developers roll out updates to existing applications.
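The core idea behind Infrastructure-as-Code tools can be sketched without any cloud dependency: an engineer declares a desired state, and the tool works out what has to change. The following is a minimal illustration in plain Python, not the API of Terraform or Pulumi; the `plan` function, the resource names, and their settings are all hypothetical.

```python
# Illustrative only: a toy "plan" step in the spirit of IaC tools.
# Resource names and settings are hypothetical, not real cloud resources.

def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """List which resources must be created, updated, or deleted
    to move the actual state to the desired state."""
    desired_names, actual_names = set(desired), set(actual)
    return {
        "create": sorted(desired_names - actual_names),
        "update": sorted(
            name for name in desired_names & actual_names
            if desired[name] != actual[name]
        ),
        "delete": sorted(actual_names - desired_names),
    }


# Desired state as a DevOps engineer would hand-write it.
desired_state = {
    "app-bucket": {"encrypted": True},
    "app-db": {"engine": "postgres", "backups": True},
}

# State that currently exists in the (imaginary) cloud account.
actual_state = {
    "app-bucket": {"encrypted": False},
    "old-queue": {},
}

changes = plan(desired_state, actual_state)
print(changes)
```

In this model every entry in `desired_state` is written and reviewed by a person, so the correctness of the result depends on that person's expertise.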
Advantages | Disadvantages |
Highly Customizable: Teams configure the lowest-level building blocks of the cloud provider's infrastructure directly. | High Operations Costs: Driven primarily by expensive DevOps salaries and the tools needed for point solutions. |
Maximum Flexibility: Organizations with large siloed functions like networking, security, compliance, and Kubernetes can each pick their own tools and approaches without impacting one another. | Error Prone: Most changes rely on human expertise to write the correct code and configurations, so errors arise from mistakes or from lack of expertise. This adds risk to security and compliance, especially when operating under stricter standards like PCI, HITRUST, and NIST. |
| Limited Developer Self-Service: Developers depend heavily on the DevOps team for changes, which reduces speed, agility, and responsiveness. |
2. Platform-as-a-Service Solutions
Examples: Heroku, Aptible, Zeet
These platforms introduce a strongly opinionated abstraction layer on top of major cloud providers like AWS, Azure, and GCP. They view cloud providers primarily as sources of infrastructure for storage, compute, and network. As illustrated in Figure 1, these Platform-as-a-Service (PaaS) solutions primarily provide functionalities related to Kubernetes Orchestration, CI/CD, Observability, and customized implementations of select standard applications like SQL databases and Redis. However, these PaaS platforms explicitly omit various aspects of cloud providers, including diverse cloud-native services, detailed security and compliance controls, and the configuration of underlying storage, compute, and network.
There are two distinct categories of PaaS solutions:
- PaaS on the user's cloud account: the platform runs inside a cloud account that the organization itself owns.
- Hosted PaaS: the platform acts as the equivalent of a public cloud, as exemplified by Heroku and Aptible. In this scenario, organizations don't own the underlying cloud account and have no direct interaction with it.
Advantages | Disadvantages |
Easy Onboarding for Simple Applications. | Majority of Cloud-Native Services Out of Scope: Approximately 80% of DevOps and security operations fall outside the scope of the PaaS provider, limiting flexibility. |
Extensive Developer Self-Service within the defined scope of supported services. | Challenges Meeting Compliance Standards: The limited scope within the overall infrastructure stack means most security and compliance controls must be handled out-of-band. |
Lightweight Operational Model for Small Organizations that can operate effectively within the constrained services exposed by the PaaS platform. | High Costs for Public Cloud PaaS Platforms such as Heroku and Aptible: They add a notable markup (40-100%) on compute costs compared with running directly on the public cloud providers they are built on. |
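To make the markup concrete, here is a small arithmetic sketch. The 40-100% range comes from the table above; the monthly baseline figure and the function name `hosted_paas_compute_cost` are hypothetical and chosen only for illustration.

```python
# Illustrative arithmetic only: the 40-100% markup range comes from the table above;
# the monthly baseline figure is a hypothetical placeholder.

def hosted_paas_compute_cost(direct_cost: float, markup_percent: int) -> float:
    """Compute cost on a hosted PaaS, given the direct cloud cost and a percentage markup."""
    if markup_percent < 0:
        raise ValueError("markup_percent must be non-negative")
    return direct_cost * (100 + markup_percent) / 100


# Hypothetical spend when running directly on the cloud provider.
direct_monthly_compute = 10_000

low = hosted_paas_compute_cost(direct_monthly_compute, 40)
high = hosted_paas_compute_cost(direct_monthly_compute, 100)
print(f"Hosted PaaS compute cost range: {low:,.0f} to {high:,.0f} per month")
```

Under these assumptions the same workload costs between 1.4 and 2 times as much on a hosted PaaS, which matters more as compute spend grows.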
3. Platform Engineering/IDP Solutions
Platform Engineering, or building an Internal Developer Platform (IDP), revolves around creating a developer-centric platform that lets developers build and manage their cloud infrastructure end-to-end on their own, using automation provided by the platform. The platform is meant to function as DevOps-as-a-Service. The concept is relatively new and is not yet a mature software category.
3.1 Do-it-Yourself Platform Engineering/IDP Solution
In-house DevOps teams are evolving into platform teams that aim to build an Internal Developer Platform (IDP). Yet these implementations frequently mirror the initial Do-It-Yourself (DIY) approach, employing Infrastructure-as-Code (IaC) and offering a restricted set of self-service use cases exposed through a CI/CD pipeline.
A prevalent trend involves using generic frameworks like Backstage.io to create developer portals. Customization is applied to publish standardized infrastructure templates that developers can deploy. However, there are several limitations associated with DIY platform engineering solutions.
Challenges of DIY Platform Engineering Solutions:
- Challenges with Templates in Dynamic Cloud Environments: With numerous moving components, templates fail to capture even a small set of use cases. This leaves developers reliant on DevOps teams to deploy changes via Infrastructure-as-Code (IaC).
- Limited Scope of Templates for Ongoing Management: Templates are primarily effective for initial configurations; ongoing management requires a "platform" that can visualize, modify, and destroy environments. Building such a platform demands highly skilled and expensive distributed systems engineers.
- Missing Day-2 Functionality: An effective infrastructure platform must include essential functionality such as just-in-time access, access controls, environment management, continuous compliance, security, and other user-centric abstractions, all of which are very hard for IT/DevOps teams to build in-house.
- Too Expensive and Takes Too Long: The complexity of building a platform means these efforts often either never materialize or fall significantly short of expectations, wasting time and resources.
In summary, for an infrastructure platform to be truly effective, all the components illustrated in Figure 1 should be natively supported as first-class components within the framework.
3.2 Buying an Off-the-Shelf Platform Engineering Solution
While Do-It-Yourself (DIY) platform engineering can be costly and time-consuming, off-the-shelf platform engineering products are an emerging software category. Numerous established DevOps tool vendors have broadened their offerings to include an Internal Developer Platform (IDP), as exemplified by Harness.io and GitLab. Additionally, there are specialized Platform Engineering solutions like DuploCloud. The effectiveness of a platform engineering solution can be objectively measured by counting the cloud-native functions it supports.
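One way to turn that measurement into a number is a simple coverage count. The sketch below is only an illustration: the required functions are taken from the task list in the DIY section, while the `coverage` function and the supported set describe a hypothetical platform, not any vendor's real feature list.

```python
# Illustrative only: required functions mirror the task list in the DIY section;
# the "supported" set is a hypothetical platform, not any vendor's real feature list.

REQUIRED_FUNCTIONS = {
    "kubernetes management",
    "cloud platform provisioning",
    "access policies",
    "encryption",
    "networking",
    "backup and restore",
    "siem",
    "vulnerability management",
    "audit trails",
    "ci/cd pipelines",
}


def coverage(supported: set[str], required: set[str]) -> tuple[int, float]:
    """Return how many required functions are supported, and that count as a fraction."""
    if not required:
        raise ValueError("required must not be empty")
    covered = supported & required
    return len(covered), len(covered) / len(required)


# Functions outside the required set (here "log search") do not raise the score.
hypothetical_platform = {
    "kubernetes management",
    "ci/cd pipelines",
    "networking",
    "log search",
}

count, fraction = coverage(hypothetical_platform, REQUIRED_FUNCTIONS)
print(f"{count} of {len(REQUIRED_FUNCTIONS)} required functions supported ({fraction:.0%})")
```

A plain count treats every function as equally important; an organization with specific compliance obligations might prefer to weight the functions it cannot do without.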
The table below compares DuploCloud's platform engineering solution with Harness and GitLab, based on publicly available information. As you can see, Harness and GitLab specialize in the CI layer with some limited CD functionality, while DuploCloud offers rich CD with comprehensive provisioning and security capabilities, work that today is largely done by DevOps engineers manually writing Terraform. An ideal solution for an organization could be to use Harness or GitLab for CI, and DuploCloud for CD, infrastructure provisioning, and security.
As the DevOps landscape continues to evolve, organizations must choose their approach to DevOps automation judiciously, balancing customization against ease of use, cost-efficiency, compliance requirements, and the skill levels of their teams. The future of DevOps automation lies in platforms that not only streamline cloud operations but also align closely with the strategic goals and operational realities of modern organizations. As this field matures, we can expect more comprehensive, integrated solutions that cater to the diverse and complex needs of cloud-based operations.