Maintaining Your Cloud Infrastructure with the Microsoft Azure Well-Architected Framework 

Kyle Jones | October 18, 2022

The Microsoft Azure Well-Architected Framework is Microsoft’s set of guiding principles for maintaining and improving the quality of a customer’s workloads in the Azure cloud.

To make the framework easy to adopt, it is organized into what Microsoft calls the “five pillars of architectural excellence”: Reliability, Security, Cost Optimization, Operational Excellence, and Performance Efficiency.

If an Azure adopter successfully implements and maintains their Azure resources within the framework of the five pillars, they will produce a high-quality, stable, and efficient cloud architecture. In this blog post, we provide a brief description of each pillar, along with an example of how you can implement it within your Azure architecture.

Reliability

Reliability, or high availability, in the cloud is closely related to reliability on-premises: the goal is still for our applications to remain available 24/7 without our clients noticing any downtime. What changes in the cloud is the expectation of failure. Failures are treated as inevitable, a consequence of factors such as the complexity of distributed systems, the use of commodity hardware, dependence on external services, and volume bottlenecks.

To counter these failures, the Reliability pillar calls for designing the architecture around two driving forces: business requirements and failure factors. By determining business requirement metrics such as the Recovery Time Objective (RTO), Recovery Point Objective (RPO), and Service Level Agreements (SLAs), we can design the architecture to include services that provide redundancy and recovery contingencies.
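
Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) are easiest to reason about as simple checks. The sketch below assumes a workload protected by periodic backups: it treats the worst-case data loss as one full backup interval and the recovery time as the estimated restore time. The function name and the sample numbers are hypothetical illustrations, not Azure settings.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta,
                     estimated_restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Check a hypothetical backup/restore plan against RPO and RTO targets.

    With periodic backups, the worst case is a failure just before the next
    backup runs, so up to one full interval of data can be lost. The plan
    meets the RPO only if that interval is no longer than the RPO, and meets
    the RTO only if the estimated restore time fits within the RTO.
    """
    return {
        "meets_rpo": backup_interval <= rpo,
        "meets_rto": estimated_restore_time <= rto,
    }


# Hypothetical plan: nightly backups, about 2 hours to restore,
# checked against 4-hour RPO and RTO targets.
plan = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=2),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
print(plan)  # {'meets_rpo': False, 'meets_rto': True}
```

In this made-up case the restore is fast enough, but a nightly schedule cannot meet a 4-hour RPO, which is the kind of gap that points toward more frequent backups or replication.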

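Example: A mission-critical application hosted on IIS needs to remain up 99.95% of the time. By placing two or more VMs in an Availability Set, you can ensure that at least one VM is always operational during VM maintenance windows, because Azure spreads the VMs across update and fault domains and applies planned maintenance to one update domain at a time. This matches Microsoft’s 99.95% connectivity SLA for two or more VMs deployed in the same availability set.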
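
A 99.95% target sounds close to perfect, so it helps to translate it into a downtime budget. The sketch below does that arithmetic; the 30-day month and 365-day year are simplifying assumptions, and the result is an allowance to design against, not a statement of how Azure measures or credits its SLA.

```python
def allowed_downtime_minutes(sla_percent: float, period_minutes: float) -> float:
    """Return the maximum downtime an availability target permits over a period."""
    return period_minutes * (1 - sla_percent / 100)


MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60   # 43,200 minutes
MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600 minutes

monthly = allowed_downtime_minutes(99.95, MINUTES_PER_30_DAY_MONTH)
yearly = allowed_downtime_minutes(99.95, MINUTES_PER_YEAR)
print(f"{monthly:.1f} min/month, {yearly / 60:.2f} h/year")  # 21.6 min/month, 4.38 h/year
```

Roughly 22 minutes a month is less time than a single VM reboot cycle plus troubleshooting often takes, which is why the example relies on redundancy rather than on any one VM staying up.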

Security

One could say that security and reliability go hand in hand. For an application or service to be highly available, we need to ensure that bad actors or even inadvertent changes by internal staff cannot bring down the application or service.

Designing security into a cloud architecture means providing assurances against deliberate attacks and abuse through layered security. Microsoft’s security services are built around the principle of Zero Trust, which means assuming that a breach has already happened. By designing security measures around areas such as Identity and Access Management, Threat Protection services, Information Governance policies, and Threat Detection and Response policies, we can mitigate attacks and limit the damage they cause.

Example: A key service for getting your architecture into a secure state is Azure Role-Based Access Control (RBAC). Combining RBAC with Privileged Identity Management and the principle of least privilege ensures that legitimate identities cannot perform tasks outside their scope. If one of those accounts is breached, the attacker can perform only the actions that identity is allowed to perform.
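
To make least privilege concrete, the sketch below models a custom role as a list of allowed operations (Actions) and excluded operations (NotActions), mirroring the shape of an Azure custom role definition. The role itself and the `is_allowed` function are hypothetical simplifications: real Azure RBAC also considers the assignment scope, deny assignments, data actions, and every role assigned to the identity.

```python
from fnmatch import fnmatchcase

# Hypothetical custom role: operators may read, start, and restart VMs,
# but not create, resize, or delete them. The operation strings follow
# Azure's resource provider format; the role is illustrative only.
VM_OPERATOR_ROLE = {
    "Name": "VM Operator (example)",
    "Actions": [
        "Microsoft.Compute/virtualMachines/read",
        "Microsoft.Compute/virtualMachines/start/action",
        "Microsoft.Compute/virtualMachines/restart/action",
    ],
    "NotActions": [],
}


def _matches(operation: str, pattern: str) -> bool:
    # Compare case-insensitively and treat '*' in the pattern as a wildcard.
    return fnmatchcase(operation.lower(), pattern.lower())


def is_allowed(role: dict, operation: str) -> bool:
    """Simplified check: allowed if some Action matches and no NotAction does."""
    allowed = any(_matches(operation, p) for p in role["Actions"])
    excluded = any(_matches(operation, p) for p in role["NotActions"])
    return allowed and not excluded


print(is_allowed(VM_OPERATOR_ROLE, "Microsoft.Compute/virtualMachines/restart/action"))  # True
print(is_allowed(VM_OPERATOR_ROLE, "Microsoft.Compute/virtualMachines/delete"))          # False
```

If an identity holding only this role is compromised, the attacker can restart a VM but cannot delete it, which is the containment the least-privilege principle is meant to provide.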

Cost Optimization 

Adopting the cloud is not only a technological and operational journey but also a budgetary one. It is a shift from the classic paradigm of upfront infrastructure spending (capital expenditure, or CapEx) to the paradigm of metered costs on leased infrastructure (operational expenditure, or OpEx).

As a public cloud provider, Microsoft Azure leases its infrastructure to customers and charges them based on resource usage. It is therefore important to build a cost model that shows which department is responsible for which costs, captures clear requirements, and accounts for any constraints or tradeoffs imposed by Azure’s metered billing model. Cost optimization practices, such as deploying policies that restrict overprovisioning, also help keep consumption in check; overprovisioning is typically the driving factor behind runaway costs. Fortunately, many first- and third-party tools give customers a user-friendly way to analyze their current resource costs.
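
A cost model that reflects departmental responsibility usually starts with attributing spend to an owner. The sketch below assumes resources carry a `department` tag and sums hypothetical cost records per department, keeping untagged spend visible as "Unallocated" rather than dropping it. The record format and the figures are invented for illustration and do not reflect the schema of any Azure cost export.

```python
from collections import defaultdict

# Hypothetical cost records; field names, tag keys, and amounts are illustrative.
cost_records = [
    {"resource": "vm-web-01", "cost": 412.50, "tags": {"department": "Sales"}},
    {"resource": "sql-crm", "cost": 980.00, "tags": {"department": "Sales"}},
    {"resource": "vm-build-02", "cost": 265.25, "tags": {"department": "Engineering"}},
    {"resource": "storage-logs", "cost": 58.10, "tags": {}},
]


def costs_by_department(records, tag_key="department"):
    """Sum cost per department tag; untagged spend is reported as 'Unallocated'."""
    totals = defaultdict(float)
    for record in records:
        owner = record["tags"].get(tag_key, "Unallocated")
        totals[owner] += record["cost"]
    return dict(totals)


for department, total in sorted(costs_by_department(cost_records).items()):
    print(f"{department}: ${total:,.2f}")
```

Reporting unallocated spend separately is a deliberate choice: untagged resources are the ones no department is accountable for, which makes them natural candidates for review.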

Example: CloudMonitor is a Financial Operations (FinOps) practitioner that develops software to help customers across all departments determine what is driving costs in their Azure environment, and that surfaces recommendations not shown in the Azure Portal.

Operational Excellence 

The pillar of Operational Excellence focuses on hardening the operational processes that keep applications running in production. Deployments in Azure must be reliable and predictable to reduce the risk of orphaned resources or operational pitfalls caused by human error. Automation provides fast, accurate deployments along with the ability to quickly roll an update back or forward.

Key concepts in Operational Excellence include designing applications or processes for scalability and reliability, ensuring resources are properly monitored for anomalies, provisioning repeatable infrastructure through automation such as Infrastructure as Code, and ensuring proper testing platforms are developed with CI/CD in mind. 

Example: In Azure, tools such as Azure Policy can ensure best-practice measures are audited and/or enforced during and after resource creation. Azure Policy can be assigned at the management group, subscription, resource group, or individual resource level.
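
Azure Policy definitions pair a condition ("if") with an effect ("then"), such as audit or deny. The sketch below models a rule on the built-in Allowed locations definition, with its parameter inlined, and evaluates it against resource dictionaries using a deliberately tiny, hypothetical evaluator. The real policy engine supports many more conditions, aliases, and effects, and it evaluates resources within the assignment's scope.

```python
# A rule shaped like Azure Policy's "if"/"then" JSON, modeled on the
# built-in "Allowed locations" definition with the parameter inlined.
ALLOWED_LOCATIONS_RULE = {
    "if": {"not": {"field": "location", "in": ["eastus", "eastus2"]}},
    "then": {"effect": "deny"},
}


def condition_matches(condition: dict, resource: dict) -> bool:
    """Evaluate a tiny subset of conditions: 'not', 'field' + 'in', 'field' + 'equals'.

    Note: Azure Policy compares strings case-insensitively; this sketch does not.
    """
    if "not" in condition:
        return not condition_matches(condition["not"], resource)
    value = resource.get(condition["field"])
    if "in" in condition:
        return value in condition["in"]
    if "equals" in condition:
        return value == condition["equals"]
    raise ValueError(f"Unsupported condition: {condition}")


def evaluate(rule: dict, resource: dict) -> str:
    """Return the rule's effect when its condition matches, otherwise 'compliant'."""
    if condition_matches(rule["if"], resource):
        return rule["then"]["effect"]
    return "compliant"


print(evaluate(ALLOWED_LOCATIONS_RULE, {"name": "vm-web-01", "location": "eastus"}))       # compliant
print(evaluate(ALLOWED_LOCATIONS_RULE, {"name": "vm-test-09", "location": "westeurope"}))  # deny
```

Swapping the effect from deny to audit turns the same rule from enforcement into reporting, which is a common way to roll out a new policy without blocking deployments on day one.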

Performance Efficiency 

The pillar of Performance Efficiency focuses on ensuring workloads can meet demand by scaling efficiently. The conventional way to ensure workloads such as servers meet demand is to purchase overprovisioned resources.

This approach guarantees capacity for peak usage, but paying for that capacity while it sits idle can be costly in a cloud setting. Instead, this pillar provides concepts for designing workloads for scalability, performance, and capacity, using resources that can scale vertically (to larger instances) or horizontally (to more instances) to meet peak demand.

Example: In Azure, Virtual Machine Scale Sets can host a group of similarly configured VMs. Using metric-based autoscale rules, a scale set can automatically add VM instances during peak usage to distribute the load, and remove them when demand drops. Scaling up to a larger VM size with more CPU or memory is also possible, but that is a change to the scale set’s VM size rather than something the standard autoscale rules do.
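
Metric-based scale-out comes down to a rule: compare a metric with thresholds and adjust the instance count within fixed limits. The sketch below simulates that decision with hypothetical CPU thresholds and instance limits; it does not call any Azure API and is not how Azure autoscale is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleProfile:
    """Hypothetical thresholds loosely modeled on metric-based autoscale rules."""
    minimum: int = 2
    maximum: int = 10
    scale_out_cpu: float = 70.0   # add instances above this average CPU %
    scale_in_cpu: float = 30.0    # remove instances below this average CPU %
    step: int = 1


def next_instance_count(current: int, avg_cpu_percent: float,
                        profile: AutoscaleProfile) -> int:
    """Return the desired instance count after one scaling evaluation."""
    if avg_cpu_percent > profile.scale_out_cpu:
        desired = current + profile.step
    elif avg_cpu_percent < profile.scale_in_cpu:
        desired = current - profile.step
    else:
        desired = current
    # Clamp the result to the configured instance limits.
    return max(profile.minimum, min(profile.maximum, desired))


profile = AutoscaleProfile()
count = 2
for cpu in [85.0, 90.0, 75.0, 50.0, 20.0]:   # simulated average CPU readings
    count = next_instance_count(count, cpu, profile)
    print(f"avg CPU {cpu:.0f}% -> {count} instances")
```

Keeping a gap between the scale-out and scale-in thresholds helps avoid flapping, where instances are repeatedly added and removed as the metric hovers around a single threshold. Real autoscale settings also aggregate metrics over a time window and apply cooldown periods, which this sketch omits.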

Next Steps: Reinforcing Your Cloud Home with a Strong Foundation 

Your cloud journey does not have to be an entirely manual effort or a hodgepodge of ad hoc changes made to deal with immediate challenges. It can be a smoothly running vehicle that propels your business further by taking advantage of everything the cloud has to offer while maintaining a posture that promotes security and high availability and reduces administrative overhead.

Microsoft offers several tools, such as Azure Advisor (and its Advisor Score) and the Azure Well-Architected Review, that can guide you through the evaluation and decision-making process in a simplified way. However, a certified partner with professional experience in evaluating and implementing the recommended changes can take you even further.

To learn more about the well-architected framework, its principles, and how it can help reinforce your cloud journey, contact Arraya today. 

Visit https://www.arrayasolutions.com/contact-us/ to connect with our team now. 