Visualising Cloud Architectural Drift

July 27, 2023

Architectural drift sneaks up on you. What was once trim and lean cloud infrastructure can soon become a 900 lb gorilla consuming all the resources and budget you can feed it.

This can happen for many different reasons, and you should be prepared to understand and manage each of them.

If you are interested in curbing architectural drift, automating notifications as drift occurs, staying ahead of changes in the infrastructure you manage, and easily visualising those changes, you are in the right place.

With the increasing adoption of cloud services, managing infrastructure drift has become more crucial than ever. Ensuring that your cloud resources are secure and in sync with their intended configurations can be the difference between smooth operations and devastating security breaches. But how can you stay on top of the ever-evolving cloud environment? The answer lies in drift detection and management.

In this blog post, we’ll delve into the world of infrastructure drift, its causes, and effects. We’ll also discuss the importance of drift detection and the top tools people use to get the job done. We’ll explore strategies to minimize drift and how Infrastructure as Code (IaC) tools can help fix drift when it occurs.

Finally, we'll discuss how Hava's version comparison diff diagrams let you visually identify the changes that happen over time, helping you answer the inevitable questions around infrastructure drift and the related cost and security topics that are sure to surface.

Short Summary

  • Infrastructure drift can lead to security risks and deployment failures, making it essential for organizations to detect misconfigurations.

  • Drift detection tools such as Terraform, AWS CloudFormation, Pulumi and Spacelift are available for identifying misconfigurations in cloud infrastructure, and with Hava you can view and present these changes in an easily digestible format.

  • Strategies like increasing IaC coverage, defining team boundaries and implementing least-privileged policies help minimize infrastructure drift while promoting system stability & security.

  • Visualising infrastructure differences lets you monitor change over time, see what happened between releases or events, see exactly what was added or removed, and track changes even if you aren't using other tools.

Defining Infrastructure Drift

Infrastructure drift occurs when the actual state of cloud resources deviates from their prescribed Infrastructure as Code (IaC) configuration. While it is natural for cloud infrastructure to evolve, this divergence from the intended state can lead to potential security risks and operational challenges. Ensuring uniformity across disparate environments and implementing infrastructure as code are crucial for managing drift and maintaining the desired state.

To effectively tackle infrastructure drift, it’s essential to understand its causes and effects. In the following sections, we’ll explore the factors contributing to drift and the consequences of unmanaged drift on critical cloud services.

Causes of Infrastructure Drift

There are several potential causes of infrastructure drift, such as:

  • Manual changes

  • Conflicting Infrastructure as Code (IaC) code

  • Poor practices

  • Overlapping team boundaries

  • Clients making changes

These causes can lead to unmanaged resources. Changes made outside of IaC can stem from manual configuration modifications, software updates, hardware malfunctions, and human error, all of which can impact the production environment.

Incompatible IaC code can also result in infrastructure drift when multiple development teams are working on the same infrastructure. Poor practices, inappropriate permissions, and overlapping team boundaries can further contribute to drift when teams are not cognizant of the modifications to the entire infrastructure being made by other teams or developers, or when they lack the necessary permissions to make or see changes.

Effects of Infrastructure Drift

Unmanaged infrastructure drift can have severe consequences on both critical and less critical cloud services. Some of the consequences of infrastructure drift include:

  • Security breaches

  • Ransomware attacks

  • Financial losses

  • Increased resource costs

  • Increased support costs

  • Deployment failures caused by configuration issues

These consequences can greatly affect both critical cloud services and overall cloud security.

Infrastructure drift is one of the key contributors to deployment failure, so managing it is essential to maintaining system stability. When the live configuration no longer matches what the code expects, a deployment that assumes the intended state can fail.

The Importance of Drift Detection

Drift detection is a crucial aspect of drift management that involves comparing the actual state of resources deployed in the cloud environment with the state described in Infrastructure as Code (IaC) templates. It’s essential for:

  • Detecting misconfigurations

  • Optimizing the infrastructure lifecycle

  • Guaranteeing the security and consistency of cloud infrastructure in accordance with IaC configurations

  • Improving the infrastructure management process.

By enabling organizations to recognize and address discrepancies, security issues, and compliance violations, drift detection helps to decrease the risks of security breaches and facilitates the implementation of remediation actions for managed resources.
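
As a minimal sketch of that comparison, the Python below diffs a desired state against an actual state. The resource IDs, attribute names, and the `detect_drift` function are hypothetical and exist only for illustration; real tools work from provider APIs and state files rather than hand-written dictionaries.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare desired (IaC) state with actual (cloud) state, keyed by resource ID."""
    missing = sorted(desired.keys() - actual.keys())    # declared but not deployed
    unmanaged = sorted(actual.keys() - desired.keys())  # deployed but not declared
    changed = {}
    for resource_id in desired.keys() & actual.keys():
        want, have = desired[resource_id], actual[resource_id]
        # Record every attribute whose value differs between the two states.
        diffs = {
            key: {"desired": want.get(key), "actual": have.get(key)}
            for key in want.keys() | have.keys()
            if want.get(key) != have.get(key)
        }
        if diffs:
            changed[resource_id] = diffs
    return {"missing": missing, "unmanaged": unmanaged, "changed": changed}


# Hypothetical example data, not taken from any real environment.
desired_state = {
    "web-sg": {"ingress_port": 443, "cidr": "10.0.0.0/16"},
    "app-db": {"engine": "postgres", "public": False},
}
actual_state = {
    "web-sg": {"ingress_port": 443, "cidr": "0.0.0.0/0"},
    "debug-vm": {"size": "large"},
}

report = detect_drift(desired_state, actual_state)
print(report)
```

In this example the report flags `app-db` as missing, `debug-vm` as unmanaged, and the `cidr` attribute of `web-sg` as changed, which are the three kinds of discrepancy a drift report typically needs to separate.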

In the following sections, we’ll explore how drift detection can help in identifying misconfigurations and strengthening the infrastructure lifecycle.

Identifying Misconfigurations

Misconfigurations refer to errors in the configuration of a system or application that may result in security vulnerabilities or other issues. Recognizing misconfigurations in cloud infrastructure aids in avoiding security threats and sustaining system stability.

Drift detection can help identify misconfigurations in cloud infrastructure, ensuring that security controls and settings are correctly configured and that configuration changes are properly documented.

Strengthening Infrastructure Lifecycle

An infrastructure lifecycle includes the following stages:

  1. Planning

  2. Design

  3. Implementation

  4. Maintenance

  5. Retirement

Drift detection tools help maintain desired system states. IaC security practices, such as using IaC tools to define and enforce the desired system states and drift detection tools to detect and alert on any changes to the system, can be implemented to strengthen the infrastructure lifecycle.

To reduce infrastructure drift, it’s recommended to set up clear team boundaries, implement least-privileged policies, and routinely conduct drift detection scans. In the next section, we’ll explore the top tools for infrastructure drift detection and how they can help address this challenge.

Top Tools for Infrastructure Drift Detection

Several tools are highly recommended for detecting infrastructure drift, including:

  • Terraform

  • AWS CloudFormation

  • Pulumi

  • Spacelift

  • Driftctl

Each of these tools has its own strengths and limitations, making it essential to compare them and select the right tool for your specific needs.

Tool Comparisons

Terraform, AWS CloudFormation, Pulumi, Spacelift, and Driftctl differ in their drift detection capabilities, ease of use, and compatibility with various cloud platforms. Terraform, for example, is a widely used open-source IaC tool that is straightforward to use and provides a comprehensive report of the drift, but it’s not as fast as some other tools.

AWS CloudFormation, on the other hand, is a powerful tool for managing cloud infrastructure, yet it can be challenging to use and requires significant manual configuration. Pulumi and Spacelift are both cloud-native IaC tools with user-friendly interfaces and comprehensive drift reports, while Driftctl is a speedy command-line tool for detecting and managing drift.

Selecting the Right Tool

The most suitable tool for your needs will depend on your specific requirements and the factors that matter most to your organization. For instance, Terraform is a popular open-source tool for Infrastructure as Code (IaC) that is straightforward to use and provides a comprehensive report of the drift. AWS CloudFormation, while powerful, can be challenging to use and requires a lot of manual configuration.

Pulumi is a cloud-native IaC tool that is user-friendly and provides a comprehensive report of the drift, while Spacelift is a fast cloud-native IaC tool that generates a comprehensive report of the drift. Driftctl, on the other hand, is a command-line tool for detecting and managing drift that is fast and easy to use, but it does not provide a comprehensive report of the drift.

By considering factors like access level requirements, IaC coverage, and your organization’s specific needs, you can choose the most suitable tool to manage infrastructure drift effectively.

No matter which tool you prefer, adding Hava's version comparison diff diagram capability allows you to simply visualize the changes that those tools' drift detection reports or log entries may be alerting you to.

Using Hava you can select any two diagrams and compare them to see exactly which resources were added to or removed from the cloud infrastructure you are managing. You don't have to play spot the difference with two diagrams side by side; the diff diagram highlights all the changes for you.

Strategies to Minimize Infrastructure Drift

To keep infrastructure drift in check, several strategies can be employed, including increasing IaC coverage, defining team boundaries, and implementing least-privileged policies to prevent unauthorized modifications. Implementing these strategies helps to detect and manage drift more effectively, reducing the risk of security breaches and system instability.

Let’s take a closer look at each of these strategies and how they can help minimize infrastructure drift, ensuring that your cloud resources remain secure and in line with their intended configurations.

Implementing IaC Coverage

Implementing IaC coverage is a crucial step in detecting and managing infrastructure drift. When all resources are included in configuration files, drift becomes easier to detect and manage. Organizations can use IaC and configuration management tools to create machine-readable definition files that can be version controlled and automatically applied to the infrastructure. Such tools include:

  • Terraform

  • Ansible

  • Puppet

  • Chef

By increasing IaC coverage, organizations can:

  • Track and record all infrastructure modifications

  • Make it simpler to identify and manage any drift that occurs

  • Maintain the desired state of the infrastructure

  • Reduce the risk of security breaches and system instability.
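
One rough way to put a number on IaC coverage is to compare the set of resource IDs your IaC manages with the set actually deployed. The sketch below assumes you can export both sets; the `iac_coverage` function and the resource IDs are hypothetical.

```python
def iac_coverage(managed_ids: set, deployed_ids: set) -> tuple:
    """Return (coverage ratio, sorted list of deployed resources not under IaC)."""
    if not deployed_ids:
        # Nothing is deployed, so nothing can be unmanaged.
        return 1.0, []
    unmanaged = sorted(deployed_ids - managed_ids)
    covered = len(deployed_ids & managed_ids)
    return covered / len(deployed_ids), unmanaged


# Hypothetical inventory for illustration.
managed = {"vpc-main", "web-sg", "app-db"}
deployed = {"vpc-main", "web-sg", "app-db", "debug-vm"}

ratio, unmanaged = iac_coverage(managed, deployed)
print(f"IaC coverage: {ratio:.0%}, unmanaged: {unmanaged}")
```

Tracking this ratio over time gives a simple signal of whether resources are being created outside your configuration files.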

Establishing Clear Team Boundaries

Clear team boundaries are essential for ensuring that only authorized personnel are making modifications to the infrastructure, thus minimizing the risk of drift. By defining roles, responsibilities, and limits within a project or organization, teams can avoid duplication of duties and minimize the risk of conflicting IaC code.

Establishing clear team boundaries has several benefits:

  • Prevents overlapping responsibilities

  • Promotes better communication and collaboration among team members

  • Contributes to more effective drift detection and management.

Adopting Least-Privileged Policies

Adopting least-privileged policies is another effective strategy for minimizing infrastructure drift. Least-privileged policies ensure that personnel are only granted the minimum level of access or permission required to complete their tasks. By restricting access to only necessary permissions, organizations can reduce the likelihood of unauthorized modifications and drift.

Implementing least-privileged policies offers several benefits:

  • Helps to prevent unauthorized changes

  • Reduces the risk of security breaches

  • Better protects infrastructure

  • Maintains the desired state

  • Ensures system stability and security.
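
To make the idea concrete, here is a small heuristic check that flags overly broad statements in an AWS IAM-style JSON policy. The `find_overly_broad_statements` function and the sample policy are hypothetical, and a wildcard is not always wrong (some actions only accept `*` as a resource), so treat the output as a prompt for review rather than a verdict.

```python
def find_overly_broad_statements(policy: dict) -> list:
    """Flag Allow statements that use wildcard actions or a wildcard resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]

    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        broad_actions = [a for a in actions if a == "*" or a.endswith(":*")]
        wildcard_resource = "*" in resources
        if broad_actions or wildcard_resource:
            findings.append({
                "statement": index,
                "broad_actions": broad_actions,
                "wildcard_resource": wildcard_resource,
            })
    return findings


# Hypothetical policy for illustration.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Allow", "Action": ["ec2:*"], "Resource": "*"},
    ],
}

findings = find_overly_broad_statements(sample_policy)
print(findings)
```

Here only the second statement is flagged, because it grants every EC2 action on every resource.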

Fixing Infrastructure Drift with IaC Tools

Fixing infrastructure drift with IaC tools is a crucial aspect of maintaining system stability and security. By implementing proposed changes to bring the actual state back in line with the desired state and reverting unwanted alterations, organizations can effectively address drift and keep their infrastructure secure and reliable.

Implementing Proposed Changes

Tools like Terraform, AWS CloudFormation, and Pulumi can be used to detect and fix drift by applying the necessary adjustments to the infrastructure. By using these IaC tools, organizations can ensure that their infrastructure is set up accurately and uniformly, and that any modifications are monitored and version controlled.

Implementing proposed changes using IaC tools offers several benefits:

  • Helps to detect and manage drift

  • Guarantees that the infrastructure remains secure and reliable

  • Proactively addresses drift, reducing the risk of security breaches and system instability.

  • By integrating Hava into your IaC pipeline, you can instantly update your infrastructure documentation, capture before and after diagram artifacts, and immediately trigger Hava alerts that send you a diff diagram highlighting the changes.

Reverting Unwanted Alterations

Reverting unwanted alterations is another essential aspect of fixing infrastructure drift. Tools like Spacelift can be used to automatically detect and revert changes caused by drift, enforcing IaC guardrails and maintaining system stability.

This proactive approach to drift management helps to keep cloud infrastructure secure and in sync with the desired state.

Monitoring and Maintaining Cloud Infrastructure

Monitoring and maintaining cloud infrastructure is a crucial aspect of ensuring overall system security and reliability. By conducting regular drift detection scans and creating alerts and notifications, organizations can ensure that their infrastructure remains secure, compliant, and reliable.

Regular Drift Detection Scans

Conducting regular drift detection scans is essential for quickly detecting and resolving drift, minimizing the risk of security breaches and system instability. Regular scans can be scheduled to run at specified intervals to detect drift in a timely manner, providing a comprehensive analysis and ensuring system security.

By routinely conducting drift detection scans, organizations can identify and address drift more effectively, reducing the risk of security breaches and maintaining the desired state of their cloud infrastructure.
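
If Terraform is your IaC tool, one possible way to script a recurring scan is to run `terraform plan -detailed-exitcode` and interpret its exit code (0 for no changes, 1 for an error, 2 for changes present). The sketch below assumes Terraform is installed and the working directory is already initialised; the function names and status strings are hypothetical. Note that exit code 2 only says the plan is non-empty, which may reflect drift or configuration changes that have not been applied yet.

```python
import subprocess

# Exit codes used by `terraform plan -detailed-exitcode`.
EXIT_NO_CHANGES, EXIT_ERROR, EXIT_CHANGES = 0, 1, 2


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a status label (labels are our own invention)."""
    if code == EXIT_NO_CHANGES:
        return "in_sync"
    if code == EXIT_CHANGES:
        return "drift_or_pending_changes"
    return "error"


def run_drift_scan(workdir: str, runner=subprocess.run) -> str:
    """Run a plan in workdir and classify the result.

    `runner` is injectable so the function can be exercised without Terraform.
    """
    result = runner(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A scheduler such as cron or a CI pipeline could call `run_drift_scan` at whatever interval suits the environment and raise a notification whenever the result is not `in_sync`.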

With Hava, there is no need for special scans or scheduled maintenance to monitor cloud drift; it all happens in real time.

Creating Alerts and Notifications

Creating alerts and notifications for infrastructure drift is an essential aspect of drift management. By setting up custom alerts and notifications for critical deviations from established baselines, organizations can ensure that any detected drift is promptly addressed, allowing for quick remediation and maintaining the desired state of the infrastructure.

Tools such as Hava can be used to monitor not only deviations from a baseline but in fact any change in infrastructure. With them you can:

  • Set up custom alerts and notifications for environments you manage.

  • Monitor these alerts and notifications to see exactly what your team or your managed service clients are changing as they change it.

  • Proactively address drift and maintain the security and reliability of your cloud infrastructure. When an alert is triggered, you can review the diff diagram, assess whether the change was intended and whether it should be allowed to persist, and review the security, cost, and budget implications.
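
The triage step can be sketched generically. The Python below is not Hava's API; it assumes a hypothetical list of change records and a hypothetical set of resource types you consider critical, and it simply orders alerts so the critical ones are reviewed first.

```python
# Hypothetical set of resource types that should always trigger a critical alert.
CRITICAL_TYPES = {"security_group", "iam_role", "database"}


def build_alerts(changes: list, critical_types=CRITICAL_TYPES) -> list:
    """Turn change records into alerts, with critical alerts listed first."""
    alerts = []
    for change in changes:
        severity = "critical" if change["type"] in critical_types else "info"
        alerts.append({
            "severity": severity,
            "message": f"{change['action']} {change['type']} {change['id']}",
        })
    # sorted() is stable, so alerts keep their original order within each severity.
    return sorted(alerts, key=lambda alert: alert["severity"] != "critical")


# Hypothetical change records for illustration.
detected_changes = [
    {"action": "added", "type": "vm", "id": "debug-vm"},
    {"action": "modified", "type": "security_group", "id": "web-sg"},
]

alerts = build_alerts(detected_changes)
for alert in alerts:
    print(alert["severity"], alert["message"])
```

Which resource types count as critical is a policy decision for each team; the set above is only a placeholder.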

Visualising Cloud Architectural Drift for MSPs

Managing cloud accounts for customers is always a challenge. Keeping clients informed about changes their team is deploying and also responding to information requests is significantly easier using Hava's diff diagram and alerts capability.

If a client has a fluid environment with lots of people making changes, setting up alerts that trigger when change happens and send you a diff diagram solves a multitude of problems.

Often things change so far from the agreed support framework that conversations need to be had. The following practices make those conversations easier:

  • Ensure good IaC is used and guardrails are established to protect the customer from unexpected bill shock or security issues.
  • Integrate Hava into deployment pipelines to document changes and capture artifacts along the way.
  • Set up architectural monitoring alerts within Hava so you can jump on any drift as it's occurring.
  • Communicate. Show the customer the diff view of recent changes, or changes over a period of time, leveraging Hava's diagrams as code. See what you need to see, as you need to see it.
  • Audit time has come around again? Give the customer a diff view of their cloud infrastructure between now and the last audit date.
  • Bill shock! The client's cloud resource bill just shot up and they are asking questions. Show them the diff diagram detailing the architecture drift and exactly what made the costs go up. If they would prefer to know ahead of time (and who wouldn't), set Hava alerts to notify them as soon as things change.

Visualising Cloud Architecture Drift for Professional Services

Professional services (PS) organisations face similar challenges. Either the client is changing the infrastructure you are tasked to manage, or the project you have been engaged for will require changes to the client's cloud environments.

In either scenario, the Hava diff view allows you to visually demonstrate the changes and talk through why they matter.

Visuals are so much easier to interpret than IaC code. With two diagrams sitting side by side, it can be difficult to spot the changes, especially in complex environments, whereas a Hava diff view clearly highlights what has been added and what has been removed.

Within a project management context, product stand ups for the cloud team or DevOps become a whole lot easier when you can show what's running now vs what was there before.

Visualising Cloud Architectural Drift for Developers and DevOps

As a developer, when you get a notification from DevOps that the changes you requested have been deployed, you are going to want to check that things look right and match your expectations. Was there a typo in the Terraform script?

Getting a diagram of the entire environment after the changes is good, but can be a little frustrating when all you want to see is the changed resources. Hava's diff view makes this possible. This is also important when you have only been working in your own dev sandbox and haven't had the opportunity to view the changes in the context of the entire organisation's infrastructure.

Visualising Architectural Drift for FinOps

There was a large bill. What changed between the last bill and now? Can we get a diff of everything that changed during the month, so that we can review the changes with the operational teams and understand whether the cost is expected or whether an alternative architecture could save on cost? With Hava you can.
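
The underlying arithmetic is straightforward. The sketch below assumes two hypothetical snapshots that map resource IDs to estimated monthly cost in whole dollars; the `cost_diff` function, the IDs, and the figures are all invented for illustration and do not come from Hava or any real bill.

```python
def cost_diff(previous: dict, current: dict) -> dict:
    """Break a month-over-month cost change into added, removed, and repriced resources."""
    added = {rid: cost for rid, cost in current.items() if rid not in previous}
    removed = {rid: cost for rid, cost in previous.items() if rid not in current}
    # Resources present in both snapshots whose estimated cost changed.
    repriced = {
        rid: current[rid] - previous[rid]
        for rid in current.keys() & previous.keys()
        if current[rid] != previous[rid]
    }
    net_change = sum(current.values()) - sum(previous.values())
    return {"added": added, "removed": removed, "repriced": repriced, "net_change": net_change}


# Hypothetical estimated monthly costs in whole dollars.
last_month = {"web-1": 70, "app-db": 210}
this_month = {"web-1": 70, "app-db": 420, "gpu-batch": 900}

summary = cost_diff(last_month, this_month)
print(summary)
```

In this example the net increase of 1110 is fully explained by one new resource (`gpu-batch`, 900) and one resized resource (`app-db`, up by 210), which is the kind of breakdown a FinOps conversation with the operational teams usually starts from.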

Summary

In conclusion, infrastructure drift poses a significant challenge to organizations relying on cloud services. Addressing drift requires a comprehensive approach, involving drift detection, the use of IaC tools, and implementing strategies to minimize drift. By tackling drift head-on, organizations can maintain the security and reliability of their cloud infrastructure, ensuring smooth operations and avoiding costly security breaches.

As the world continues to rely more heavily on cloud services, managing infrastructure drift will become an increasingly important aspect of maintaining secure and reliable systems. By understanding the causes and effects of drift, utilizing top tools for drift detection, and implementing effective strategies to minimize drift, organizations can stay ahead of the curve and ensure the long-term success of their cloud operations.

 

Hava provides a cornerstone solution to stay on top of infrastructure changes and effectively manage architectural drift while providing the diagramming automation and cloud search capabilities that make it the preferred choice for top cloud teams around the world.

With Hava you get:

  • AWS, Azure, GCP and Kubernetes support
  • Fully automated diagram creation - no drag and drop required
  • Self updating diagrams - connect once, up to date forever
  • Versioning and the ability to diff any environment diagram with any other diagram
  • Architectural monitoring alerts - email and dashboard alerts when things change
  • Deep search - find resources running in thousands of accounts with a single command
  • Security Diagrams for AWS and Azure
  • Detailed List View with cost estimates
  • Container Workload Views
  • API
  • Embedded Viewer to place diagrams outside of Hava
  • Integrations with GitHub, Terraform, and Confluence, with more to be announced

Frequently Asked Questions

What is meant by configuration drift?

Configuration drift occurs when changes to software and hardware are made without being tracked, resulting in the environment no longer adhering to an organization’s requirements.

This can lead to security vulnerabilities, compliance issues, and operational problems. To prevent configuration drift, organizations must have a system in place to track and manage changes. This system should include processes for documenting changes and monitoring for unauthorized changes.

What is an example of configuration drift?

Configuration drift occurs when infrastructure configurations become out of sync, such as when primary and secondary networking systems have different configurations.

This can lead to unexpected behavior, security vulnerabilities, and other issues that can be difficult to diagnose and fix. To prevent configuration drift, organizations should have a process in place to ensure that all systems are configured correctly.

What is drift in DevOps?

Drift in DevOps is the gradual change of an app, microservice, or infrastructure from its intended configuration. This can be difficult to detect and can introduce risk that may not be seen or managed until something serious happens.

What are the potential causes of infrastructure drift?

Poor practices, manual changes, overlapping team boundaries, conflicting IaC code, and inappropriate permissions can all lead to infrastructure drift.

These issues can cause a variety of problems, such as increased security risks, decreased performance, and increased complexity. They can also lead to increased costs and a longer time to market.

 

Written by Team Hava

The Hava content team
