This article is part of a series (links below). I would recommend reading the articles in order, starting with “Greenfield Opportunity”, which provides the required framing.

In the article Hybrid Multi-Cloud, I shared our application/data hosting strategy, covering SaaS, Public Cloud, Colocation Data Centres, and Edge Computing. This strategy emphasises modern application architecture, but is also pragmatic, providing options for almost any workload.

To unlock the value of this strategy, automation becomes a necessity, helping to streamline development and operations, improve the user experience, and guarantee security, quality, and privacy.

This article will highlight our proposed automation architecture, describing our philosophy, key technology decisions, and positioning.

Introduction

Historically, when dealing with legacy technologies, the effort to automate IT would often outweigh the value. Thankfully, due to advancements in Cloud, API-Centric Architecture and Software-Defined Infrastructure, automation has become not only simpler but also a prerequisite for unlocking the value of a modern IT ecosystem. This is highlighted by the value proposition outlined below.

  • Improved Speed: Reduce the reliance on manual intervention, streamlining the IT processes from ideation to production.

  • Improved Agility: Ability to deliver continuous improvements, enhancements, and issue resolution.

  • Improved Autonomy: Self-service for IT provisioning (e.g. Application Hosting, Network, etc.)

  • Improved Flexibility: Ability to complete real-time, ad-hoc releases across the entire IT ecosystem (e.g. Cloud, Colocation DC, Edge Computing, etc.)

  • Enforced Best Practices: Enforce consistent solution design and developer best practices (e.g. Style Guides, etc.)

  • Improved Information Security: Centralised governance, including role-based access and monitoring. Proactively mandate and enforce security controls (e.g. Linters, Source Code Analysis, Package Analysis, etc.)

  • Improved Quality: Proactive service discovery and CMDB Configuration Item creation. Proactively mandate and enforce quality controls, including testing (e.g. Application Scaffolding, Application Manifest, Unit Testing, UI Testing, etc.)

  • Improved Privacy: Proactively mandate and enforce privacy controls, including data encryption, obfuscation, residency, high availability, and backup.

  • Cost Optimisation: Removal of duplicate technologies (technical debt) used to support IT provisioning, governance, budgeting, discovery, and change control. Removal of time-consuming manual tasks and the associated resources.

To reinforce our position regarding automation, we have embedded key automation concepts and techniques into our IT principles/declarations. The intent is to enable better strategic decision-making by proactively triggering the right architecture conversations. For example, the three statements below provide a clear direction regarding our desired approach to architecture.

  • Immutable Infrastructure: By default, new IaaS server deployments must target immutable infrastructure, where servers are ephemeral (short-lived) and stateless (do not persist a session).

  • Infrastructure-as-Code: By default, all Cloud (SaaS, Public Cloud), Colocation Data Centre, and Edge Computing deployments must be fully automated, leveraging Infrastructure-as-Code and Software-Defined techniques.

  • Automate Everything: By default, new application deployments must be automated, leveraging build tools and/or automation services.

A full description of our IT principles/declarations can be found in the article Modern IT Ecosystem.

Automation Requirements

The word “automation” can cover a very wide scope; it is therefore important to define exactly which capabilities are being targeted for automation. To support our business model, we are targeting three personas:

  • Product Team: IT Employees and/or Business Partners. Assume basic IT skills.

  • Engineering Team: IT Employees and/or Service Integrators. Assume moderate/advanced IT and software development skills.

  • IT Admins/Operations: IT Employees and/or Service Integrators. Assume moderate/advanced IT and software development skills.

Across these personas, we have the following requirements, which describe the capabilities we aim to automate and connect back to the previously stated value proposition.

  • Provisioning: The ability to create, modify and provision low-level (e.g. Server) and high-level (e.g. DNS) infrastructure, as well as software services.

  • Governance: The ability to create, modify and maintain groups that control the user’s ability to provision, modify and remove infrastructure and services.

  • Budgeting: The ability to proactively understand provisioning costs, as well as track the spend over time (based on defined groups).

  • Discovery: The ability to identify, track and manage all provisioned infrastructure and services, including the creation and maintenance of CMDB Configuration Items.

  • Change Control: The ability to track and manage change across all provisioned infrastructure and services.

  • Operations: The ability to trigger housekeeping and maintenance activities, based on pre-defined criteria/events and/or predictive insights.

  • Testing: The ability to proactively trigger pre-defined tests, covering Unit, Module, Integration, System, and UI.

  • Standards Checks: The ability to proactively trigger pre-defined security, quality, and privacy checks.

  • Documentation: The ability to generate and maintain quality (installation/configuration/change) documentation.

To achieve this outcome, new concepts, techniques, and technologies will need to be deployed, including a paradigm shift to Immutable Infrastructure.

Immutable Infrastructure

In a traditional “mutable” infrastructure, services can be modified post-deployment, usually via a manual process. For example, an engineer and/or administrator would log in to a server via SSH and complete a set of manual steps to modify configuration, packages, etc.

Immutable infrastructure is a new infrastructure paradigm, where services cannot be modified post-deployment. If a modification is required, a new service is built from a common image, which inherently includes the required changes. This is only possible if a modern application architecture is followed, for example, The Twelve-Factor App methodology.

The “Pets vs. Cattle” analogy is often used to describe this paradigm shift: historically, services were treated like pets (one of a kind, long-lived, managed by hand), whereas modern (immutable) services are treated like cattle (one of many, shorter-lived, managed at scale).

To support the value proposition, we intend to aggressively target immutable infrastructure, which is primed for automation via Infrastructure-as-Code (IaC).
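
To make this concrete, the minimal sketch below (using Terraform, introduced in the next section, with AWS and illustrative values) shows the replace-over-modify behaviour: changing the image variable does not mutate the running server, it triggers a replacement built from the new image.

    # Illustrative only: a server built from a pre-baked (immutable) image.
    # Changing var.image_id never modifies the running server; Terraform
    # replaces it, creating the new instance before destroying the old one.
    variable "image_id" {
      description = "Pre-baked machine image to deploy"
      type        = string
    }

    resource "aws_instance" "web" {
      ami           = var.image_id
      instance_type = "t3.micro"

      lifecycle {
        create_before_destroy = true
      }
    }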

Infrastructure-as-Code

Infrastructure-as-Code (IaC) is about writing code that describes the infrastructure, similar to writing a program in a high-level language. Once defined as code, infrastructure can be managed as if it were software, where resources can be created, removed, replaced, resized, and migrated, leveraging automation.

We have positioned Terraform and Ansible as part of our automation architecture.

Terraform is a cloud-agnostic Infrastructure-as-Code capability owned by HashiCorp, written in Go, with infrastructure described using the HashiCorp Configuration Language (HCL). It includes an open-source command-line tool, which can be used to provision (low-level and high-level) infrastructure across multiple hosting environments, including Azure, GCP, AWS, as well as VMware.
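
As a flavour of HCL, the minimal sketch below (resource names and values are illustrative, assuming the azurerm provider) declares an Azure resource group; Terraform then creates, updates, or replaces resources so the real environment matches the declared state.

    terraform {
      required_providers {
        azurerm = {
          source = "hashicorp/azurerm"
        }
      }
    }

    provider "azurerm" {
      features {}
    }

    # Declarative definition: 'terraform plan' previews the change and
    # 'terraform apply' converges the environment to match it.
    resource "azurerm_resource_group" "automation_demo" {
      name     = "rg-automation-demo"
      location = "westeurope"
    }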

Ansible is an open-source Configuration Management, Deployment and Orchestration tool owned by Red Hat. Ansible leverages Playbooks (YAML files) to execute specific tasks, often focused on server configuration. Similar to Terraform, Ansible is agentless, supporting a wide range of platforms and hosting environments.
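
For illustration, a minimal (hypothetical) Playbook might look as follows, ensuring every host in a “web” inventory group has NGINX installed and running:

    # Illustrative Playbook: ensure NGINX is installed, running and enabled
    # on all hosts in the "web" inventory group.
    - name: Configure web servers
      hosts: web
      become: true
      tasks:
        - name: Install NGINX
          ansible.builtin.package:
            name: nginx
            state: present

        - name: Ensure NGINX is running and enabled
          ansible.builtin.service:
            name: nginx
            state: started
            enabled: true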

In our proposed architecture, Terraform will be positioned to provision infrastructure, delegating configuration tasks to Ansible. The short video from VMware (embedded below) provides a good overview of why and how Terraform and Ansible complement each other.
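
One common hand-off pattern (sketched below with illustrative names, building on the earlier server example) is for Terraform to run Ansible once the infrastructure exists, for instance via a local-exec provisioner that targets the new host:

    # Illustrative only: after Terraform has provisioned the server, invoke
    # Ansible to apply the configuration Playbook (webserver.yml is a
    # hypothetical file) against the new host's public IP.
    resource "null_resource" "configure_web" {
      triggers = {
        instance_id = aws_instance.web.id
      }

      provisioner "local-exec" {
        command = "ansible-playbook -i '${aws_instance.web.public_ip},' webserver.yml"
      }
    }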

We believe these two technologies not only adhere to our IT principles (open-source, decoupled, API-first) but also help to enable our Hybrid Multi-Cloud strategy, where we have the requirement to deploy to multiple hosting environments (Public Cloud, Colocation DC and Edge Computing).

Finally, as with any technology, Terraform and Ansible are not perfect and have their own set of challenges (e.g. State Management, Secret Management, etc.). With this in mind, it is likely we will still leverage some cloud provider-specific Infrastructure-as-Code capabilities, such as Azure Resource Manager (ARM) Templates.

Automation Architecture

Although Terraform and Ansible are key technologies, they are supported by a wider automation architecture. This includes:

  • Embotics: SaaS-based Cloud Management Platform (CMP), which supports provisioning (via Terraform), governance (group management and change control), budgeting (proactive cost analysis and chargeback), as well as discovery (Configuration Item creation).

  • GitHub: SaaS-based source code management, which will host and version control Terraform, Ansible Playbooks and source code.

  • Azure DevOps: Building upon our partnership with Microsoft, Azure DevOps provides a suite of SaaS-based services covering CI/CD capabilities, as well as test tooling and package management.

  • ServiceNow: SaaS-based Service Management, providing change control and configuration management database (CMDB) capabilities.

  • Monitoring: A suite of tools (e.g. Azure, VMware, Application Performance Management), which monitor the end-to-end infrastructure, applications and data, with the ability to predict and proactively trigger maintenance activities.

The decision to leverage a Cloud Management Platform (CMP) was primarily driven by our desire to proactively manage governance and budgeting, providing enterprise-level visibility and control, whilst also enabling the business functions to autonomously provision infrastructure and services across any hosting environment via a simplified (non-technical) service catalogue. Behind the scenes, however, Terraform and Ansible will do the heavy lifting, keeping our CMP loosely coupled and enabling both “Cloud Direct” and “Cloud Brokered” provisioning approaches.

The diagram below highlights how these technologies (alongside Terraform and Ansible) complement each other to enable our end-to-end automation architecture.

[Diagram: end-to-end automation architecture]

As highlighted in the diagram, the Product Teams’ primary engagement will be via “user-centric” services, such as Embotics and ServiceNow. Embotics provides a service catalogue for standard services (e.g. Servers, Databases, Web Apps), covering all hosting environments (Public Cloud, Colocation DC and Edge Computing). Where required, ServiceNow manages all change control, including approvals.

The engineering team will primarily work within GitHub, following the GitFlow branching model. As code is pushed to GitHub, automated tasks are executed via Azure DevOps following a standard CI/CD practice. These tasks can cover a wide range of capabilities, including linters (e.g. ESLint), testing (e.g. Mocha), security analysis (e.g. Checkmarx), package analysis (e.g. Snyk), and document generation (e.g. Pandoc).
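
As an illustration, a trimmed-down (hypothetical) azure-pipelines.yml for a Node.js service might run the linter and unit tests on every push to the main branch:

    # Hypothetical pipeline: install dependencies, then run ESLint and Mocha
    # on every push to main, in line with a standard CI practice.
    trigger:
      - main

    pool:
      vmImage: ubuntu-latest

    steps:
      - task: NodeTool@0
        inputs:
          versionSpec: '18.x'

      - script: npm ci
        displayName: Install dependencies

      - script: npx eslint .
        displayName: Run ESLint

      - script: npx mocha
        displayName: Run unit tests (Mocha)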

Finally, IT admins and operations will support the hosting environments, leveraging native services (e.g. Azure Portal, VMware vSphere) and tooling (e.g. Azure PowerShell, CLI).

Conclusion

In conclusion, our automation strategy aims to support a wider paradigm shift, where infrastructure is defined and maintained as software. We believe this approach will unlock tremendous business value, specifically speed, agility, flexibility, and cost optimisation, as well as promote innovation.