The AWS Well-Architected Framework provides guidance on applying best practices in the design, delivery and maintenance of AWS environments.
The framework addresses general design principles and specific best practices across the five pillars or conceptual areas of the framework.
The AWS Well-Architected Framework lets you assess the pros and cons of decisions you make when building systems on AWS infrastructure and provides the mechanism to consistently measure your architecture against best practice and identify areas for improvement.
The five pillars of the framework are:
Pillar 1 : Operational Excellence
Operational excellence is the ability to support development and run workloads effectively, gain insight into operations, and continuously improve supporting procedures and processes to deliver business value.
There are five design principles that fall under the operational excellence area of the well-architected framework.
a) Perform Operations as Code. In cloud computing, you can apply the same engineering disciplines that you use for application code to your entire environment. You can define and update applications and infrastructure with code and perform operations with code that limits human error and provides consistent responses to events.
b) Make frequent, small, reversible changes. When you design workloads with components that can be updated regularly, you can improve systems in small increments that are easily reversed if they fail.
c) Refine operations procedures frequently. Look for opportunities to improve operations procedures on a continual basis.
d) Anticipate failure. Test failure scenarios ahead of time by identifying potential sources of failure and implementing mitigation strategies.
e) Learn from all operational failures. Drive improvement through the lessons learned from operational failures, and share that knowledge across your teams and wider organization.
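To make principles a) and b) more concrete, here is a minimal Python sketch of an operation expressed as code, where each change is small and carries its own rollback. The Change class, run_changes, set_key and failing_change are hypothetical names invented for this illustration, and the "environment" is just a dictionary standing in for real infrastructure. It is not an AWS API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Change:
    """One small, reversible change to an environment (hypothetical model)."""
    name: str
    apply: Callable[[Dict[str, str]], None]
    rollback: Callable[[Dict[str, str]], None]


def run_changes(env: Dict[str, str], changes: List[Change]) -> bool:
    """Apply changes in order; if one fails, roll back those already applied."""
    applied: List[Change] = []
    for change in changes:
        try:
            change.apply(env)
            applied.append(change)
        except Exception:
            # Reverse the completed changes, newest first.
            for done in reversed(applied):
                done.rollback(env)
            return False
    return True


def set_key(key: str, value: str) -> Change:
    """Build a change that sets a config key and can restore the old value."""
    previous: Dict[str, Optional[str]] = {}

    def apply(env: Dict[str, str]) -> None:
        previous["value"] = env.get(key)
        env[key] = value

    def rollback(env: Dict[str, str]) -> None:
        if previous["value"] is None:
            env.pop(key, None)  # the key did not exist before the change
        else:
            env[key] = previous["value"]

    return Change(f"set {key}", apply, rollback)


def failing_change() -> Change:
    """A change that always fails, used to simulate a bad deployment step."""
    def apply(env: Dict[str, str]) -> None:
        raise RuntimeError("simulated deployment failure")

    return Change("fail", apply, lambda env: None)
```

If the third step in a sequence fails, the first two are undone in reverse order and the environment returns to its starting state, which is the behaviour the "small, reversible changes" principle is aiming for.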
The best practices for operational excellence centre on four areas:
Organization - Determine the priorities, structure and culture that support your business outcomes.
Preparation - Design workloads so that you can understand their state, adopt approaches that improve the flow of changes into production, and mitigate deployment risks while confirming workload readiness.
Operation - Understand the health of your workload and operations, and manage both to minimise disruption.
Evolution - Learn, share and continuously improve to sustain operational excellence that supports the business.
The AWS Ops Excellence implementation guide can be found here:
Pillar 2 : Security
The security pillar centres on the ability to protect systems, data and assets while taking advantage of cloud technologies to improve security. The AWS Well-Architected pillar provides an overview of the design principles, best practices and questions you need to ask when considering the security of your AWS infrastructure.
Implement a strong identity foundation - Centralise identity management with the aim of removing the reliance on long-term static credentials.
Enable traceability - Aim to monitor, alert and audit actions and changes to your environment in real-time via log and metric collection.
Apply security at all layers - Use multiple security controls across multiple layers, such as the edge of the network, VPC, load balancing, every instance and compute service, applications and code.
Automate security best practices - Utilise automated software-based security mechanisms to help securely scale.
Protect data in transit and at rest - Classify data into sensitivity classes and protect it with methods like encryption, tokenization and access control where appropriate.
Keep people away from data - Utilise tools and mechanisms to reduce or eliminate the need for direct access or manual processing of data.
Prepare for security events - Prepare for incidents with incident management and investigation policies, and run incident response simulations using automated tools to increase your speed of detection, investigation and recovery.
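As an illustration of the "protect data in transit and at rest" and "automate security best practices" principles, the following Python sketch maps sensitivity classes to the controls each one requires and reports what a resource is missing. The class names, control names and the policy itself are assumptions made up for this example. Your own classification scheme will differ.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class Sensitivity(Enum):
    """Hypothetical sensitivity classes for this sketch."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical policy: the controls required for each sensitivity class.
REQUIRED_CONTROLS: Dict[Sensitivity, Set[str]] = {
    Sensitivity.PUBLIC: set(),
    Sensitivity.INTERNAL: {"encrypt_in_transit"},
    Sensitivity.CONFIDENTIAL: {
        "encrypt_in_transit",
        "encrypt_at_rest",
        "restricted_access",
    },
}


def missing_controls(sensitivity: Sensitivity, enabled: Iterable[str]) -> List[str]:
    """Return the required controls that are not enabled, in sorted order."""
    return sorted(REQUIRED_CONTROLS[sensitivity] - set(enabled))


if __name__ == "__main__":
    # A confidential data store with only transport encryption switched on.
    print(missing_controls(Sensitivity.CONFIDENTIAL, ["encrypt_in_transit"]))
```

Running a check like this automatically against every data store is one way to keep the policy enforced as the environment grows, rather than relying on manual review.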
Questions to ask yourself around your security include:
- How do you securely operate your workload?
- How do you manage people and machine identities?
- How do you manage permissions?
- How do you detect and investigate security events?
- How do you protect network resources?
- How do you protect compute resources?
- How do you define and classify data?
- How do you protect data at rest?
- How do you protect your data in transit?
- How do you anticipate, respond to, and recover from incidents?
An in-depth prescriptive implementation guide can be found here:
Pillar 3 : Reliability
The reliability pillar focuses on the ability of a workload to perform its intended function correctly and consistently when it’s expected to. This includes the ability to operate and test the workload through its total lifecycle.
Design Principles for reliability in the cloud
Automatically recover from failure - by setting and monitoring workload KPIs and triggering automation when a threshold is breached.
Test recovery procedures - use automation to simulate or recreate scenarios that lead to failure to expose failure pathways you can test and fix before they occur in production.
Scale horizontally to increase workload availability - Replace single large resources with multiple small resources to reduce the impact of a single failure point on the overall workload.
Stop guessing capacity - Monitor demand and workload utilisation, and automate the addition and removal of resources to match demand within your service quotas.
Manage change in automation - track and review automated infrastructure changes.
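The "stop guessing capacity" principle can be sketched as a small calculation. The Python function below uses a simple proportional rule, scaling the instance count by the ratio of observed to target utilisation and then clamping the result to a minimum and a maximum. The function name, the parameters and the rule itself are assumptions for illustration. AWS scaling policies have their own configuration and behaviour, and max_instances here simply stands in for whatever quota or budget limit applies to you.

```python
import math


def desired_capacity(
    current_instances: int,
    observed_utilisation: float,
    target_utilisation: float,
    min_instances: int,
    max_instances: int,
) -> int:
    """Suggest an instance count that moves utilisation towards the target.

    Proportional rule (an assumption for this sketch):
    desired = ceil(current * observed / target), clamped to [min, max].
    """
    if target_utilisation <= 0:
        raise ValueError("target_utilisation must be positive")
    raw = math.ceil(current_instances * observed_utilisation / target_utilisation)
    # Never drop below the floor or exceed the quota-driven ceiling.
    return max(min_instances, min(max_instances, raw))


if __name__ == "__main__":
    # 4 instances running at 90% against a 60% target.
    print(desired_capacity(4, 90.0, 60.0, min_instances=2, max_instances=10))
```

The clamp matters as much as the formula: the floor keeps enough capacity to survive a single failure, and the ceiling keeps automation from running into a quota or an unexpected bill.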
Questions to ask yourself in relation to reliability include:
- How do you manage service quota constraints?
- How do you plan your network topology?
- How do you design your workload service architecture?
- How do you design interactions in a distributed system to prevent failures?
- How do you mitigate or withstand failures?
- How do you monitor workload resources?
- How do you design workloads to adapt to changes in demand?
- How do you implement change?
- How do you back up data?
- How do you isolate faults to protect your workloads?
- How do you design to withstand component failures?
- How do you test reliability?
- What is your disaster recovery plan?
You can read an in-depth reliability implementation guide here:
Pillar 4 : Performance Efficiency
The performance efficiency pillar examines the ability to use cloud computing resources to meet system requirements efficiently, and to maintain that efficiency as demand changes and new technologies emerge.
The five design principles of performance efficiency in the cloud are:
Democratize advanced technologies - utilise complex technologies provided as a service.
Go global in minutes - Deploy workloads in multiple regions to provide lower latency and a better experience for your users.
Utilise serverless architectures - Serverless removes the need to run and maintain physical servers for compute activities.
Experiment more often - compare different types of instances and storage configurations quickly.
Consider mechanical sympathy - Understand how cloud services are utilised and align your design with demand.
Questions to consider related to performance efficiency include:
- How do you select the best performing architecture?
- How do you select your compute solution?
- How do you select your storage solution?
- How do you select your database solution?
- How do you configure your networking solution?
- How do you evolve your workload to leverage new releases?
- How do you monitor resource performance?
- How do you use tradeoffs to improve performance?
Find the AWS Performance Efficiency implementation guide here:
Pillar 5 : Cost Optimization
Cost optimization is the ability to select and run the components of business systems at the lowest price point.
The five design principles for cost optimization in the cloud:
Implement cloud financial management - Invest the time and resources in cloud financial management and cost optimisation.
Adopt a consumption model - Pay only for the computing resources you require. Increase and decrease resources as the business case dictates.
Measure overall efficiency - Measure the business output of the workload and the cost of delivering it.
Stop spending money on undifferentiated heavy lifting - Utilise managed services to remove the need to deploy servers and operating systems in your own data centres.
Analyze and attribute expenditure - Utilise the cloud to identify the usage and costs of systems and transparently attribute them to individual workload owners.
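Two of these principles, "measure overall efficiency" and "analyze and attribute expenditure", come down to simple arithmetic once you have cost data. The Python sketch below groups cost line items by an owner tag and computes cost per unit of business output. The function names, the (owner, cost) input shape and the "unattributed" bucket are assumptions for illustration and do not reflect the format of any AWS billing export.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def attribute_costs(
    line_items: Iterable[Tuple[Optional[str], float]],
) -> Dict[str, float]:
    """Sum costs per owner tag; untagged items go to an 'unattributed' bucket."""
    totals: Dict[str, float] = defaultdict(float)
    for owner, cost in line_items:
        totals[owner or "unattributed"] += cost
    return dict(totals)


def cost_per_unit(total_cost: float, units_of_output: float) -> float:
    """Cost of delivering one unit of business output (e.g. one order)."""
    if units_of_output <= 0:
        raise ValueError("units_of_output must be positive")
    return total_cost / units_of_output


if __name__ == "__main__":
    # Hypothetical line items: (owner tag, cost).
    items = [("team-a", 120.0), ("team-b", 30.0), ("team-a", 50.0), (None, 10.0)]
    print(attribute_costs(items))
    print(cost_per_unit(200.0, 4000))
```

A growing "unattributed" total is a useful signal in its own right, since it shows how much spend has no workload owner accountable for it.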
The questions to ask about cost optimization, the fifth pillar of the AWS Well-Architected Framework, include:
- How do you implement cloud financial management?
- How do you govern usage?
- How do you monitor usage and cost?
- How do you decommission resources?
- How do you evaluate cost when you select services?
- How do you meet cost targets when selecting resource type, size or number?
- How do you use pricing models to reduce cost?
- How do you plan for data transfer charges?
- How do you manage demand, and supply resources?
- How do you evaluate new services?
You can read more about implementing cost optimization here:
Monitoring AWS Well-Architected principles in your infrastructure
On top of automated infrastructure diagrams, Hava includes an automatically generated AWS Compliance report within the reporting module that will check your AWS infrastructure configuration against the AWS Well-Architected principles.
The report will detail graphs of your resources, users and roles followed by detailed findings related to configuration settings that don't comply with the AWS Well-Architected methodology.
The findings are graded into four levels: Informational, Low, Medium and High. Each finding is listed with a detailed explanation of the issue and suggested steps for resolving it.
Hava's AWS Compliance reports are automatically generated daily and available on Professional and Business plans.
If you are using hava.io to automate your cloud documentation, you will find reporting inside your dashboard. If not, you can take a free 14-day trial (no credit card required) to check out the interactive network topology diagrams and the AWS compliance reporting module.