Moving up in Scale
In the first part of this series, we looked at cloud governance when an organization has a small number of accounts. As an organization adds more accounts, the challenges of managing access, security, and costs increase.
We can see above how these changes add complexity as the number of accounts increases. In this edition we will discuss tools and techniques that can help with this problem. Again, there is no one right solution, and every organization needs to determine what works best for their AWS usage.
Governance at a Larger Scale
A manual approach may suffice at the small end of the spectrum. What happens if your organization grows to, for example, 100 accounts with an average of two regions used per account? Let’s say your organization has 50 developers, a 10-member security team, and a 20-person operations team. Now you must determine how to manage and control access from different team members to different accounts.
To assess security, you may choose to deploy account-specific checks using services such as AWS Config rules or CloudWatch Events rules. Both are configured per account and per region, so in our scenario (100 accounts with two regions each) this leaves you with 200 different configurations to deploy, manage, and update. What happens when you have 1,000 or 10,000 accounts?
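The arithmetic behind that count is simple, but it shows how quickly the work grows. A minimal sketch, assuming every account uses the same number of regions and needs one configuration per account and region pair (the function name is illustrative, not part of any AWS tooling):

```python
def deployment_targets(accounts: int, regions_per_account: int) -> int:
    """Number of account/region pairs that each need their own configuration."""
    return accounts * regions_per_account


# The scenario from the text, then the two larger organization sizes.
for accounts in (100, 1_000, 10_000):
    targets = deployment_targets(accounts, 2)
    print(f"{accounts} accounts x 2 regions = {targets} configurations")
```

Every one of those targets has to be created, updated, and eventually retired, which is why a manual approach stops being practical well before 1,000 accounts.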
Now you must look at how you are going to get the data out of the 200 different locations. It is not feasible to manually review each account. An Amazon Simple Notification Service (SNS) topic can be used to send messages to a set of recipients, but this must be configured separately for each account and region. A parent SNS topic, or even a CloudWatch Logs subscription, can be used to aggregate messages from one or more accounts and regions into another account. However, this may introduce a management headache if the access policy on the parent destination needs to be updated for every new account and region.
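To make that headache concrete, here is a rough sketch of an access policy for a parent SNS topic that lists each publishing account explicitly. The topic ARN, account IDs, and function name are made up for illustration; only the general policy document layout is standard.

```python
import json


def parent_topic_policy(topic_arn: str, account_ids: list) -> dict:
    """Build an SNS access policy that lets each listed account publish."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowMemberAccountsToPublish",
                "Effect": "Allow",
                # One principal per member account: this list is what grows.
                "Principal": {
                    "AWS": [f"arn:aws:iam::{a}:root" for a in account_ids]
                },
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


# Hypothetical values, for illustration only.
TOPIC_ARN = "arn:aws:sns:us-east-1:999999999999:central-alerts"
accounts = ["111111111111", "222222222222"]

before = parent_topic_policy(TOPIC_ARN, accounts)
after = parent_topic_policy(TOPIC_ARN, accounts + ["333333333333"])

print(json.dumps(after, indent=2))
```

Adding a single account changes the policy document, so the parent topic has to be updated every time the organization grows.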
These techniques also typically include one or more Lambda functions, which must likewise be deployed per account and per region. Finally, we need to consider the life-cycle management of all these Config rules, CloudWatch Logs subscriptions, SNS topics, and Lambda functions.
Now we start to see the problems as we scale!
Identity and Access Management (IAM)
Managing access across large numbers of users and accounts is difficult. Doing this via AWS IAM users does not scale and is essentially impossible to manage effectively.
Many organizations would like to tie their AWS authentication back to their organization’s authentication and directory service, such as Microsoft Active Directory. This can be accomplished via AWS IAM identity providers. Identity providers, however, are configured per account, so we are again back to a scale issue.
AWS recently released the AWS SSO service as a tool to tie authentication across multiple accounts and applications back to an organization’s Active Directory environment. AWS SSO requires the use of AWS Organizations, which was introduced after many customer requests for help in managing a large number of accounts. We will likely see many upcoming improvements to AWS Organizations and its integration with other services. Like most new services, AWS SSO has several constraints on how and what can be configured, and these need to be evaluated before you adopt it.
Some organizations have built their own access management system using the AWS Security Token Service (STS), IAM roles, and IAM trust relationships to generate temporary, short-lived access tokens that can be used to access the console UI or as application-level tokens. These short-lived tokens grant access based on the role that was assumed in the target AWS account and the IAM policies attached to that role.
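The core of such a system is a single STS `AssumeRole` call. The sketch below is one possible shape, not a description of any particular organization's implementation. It assumes the caller passes in an STS client (for example `boto3.client("sts")`); the role name, account ID, and helper names are hypothetical.

```python
def role_arn(account_id: str, role_name: str) -> str:
    """ARN of an IAM role in the target account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def temporary_credentials(sts_client, account_id, role_name, session_name,
                          duration_seconds=3600):
    """Assume a role in the target account and return short-lived credentials.

    `sts_client` is expected to behave like boto3's STS client; it is passed
    in so the function can be exercised without real AWS access.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=session_name,  # recorded in the target account's CloudTrail
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    # Keys are named to match the keyword arguments boto3 clients accept.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

With boto3 installed, the returned dictionary can be unpacked into another client, for example `boto3.client("ec2", **temporary_credentials(sts, "111111111111", "ReadOnlyAudit", "alice"))`. The credentials expire on their own, so there are no long-lived keys to rotate in the target account.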
Sungard Availability Services’ Managed Cloud – AWS Service offers a federation portal based on these short-lived tokens. The portal interfaces with several different authentication services, such as SSO, SAML, and Active Directory. This ties users back to their own organization’s authentication service and manages their access to various IAM roles on any number of target AWS accounts. Access is then granted to the AWS console with the allowed permissions, or the user can generate temporary tokens for use with the AWS CLI or SDKs.
Once you grow beyond a few accounts,
AWS Organizations, which replaces consolidated billing, can be useful for rolling up costs to a single account. While this can give the financial team a macro view of costs, they don’t usually control and manage the services that make up the spend within each AWS account. The designated account owners need to have detailed billing information to better understand spend across various AWS services, but may not have access to the master billing data.
The macro level billing and resource utilization data should also be used to drive analytics to better understand how resources are being used across the entire organization. This information can be used to develop strategies to reduce overall spend and make better use of resources. Spend management can be accomplished through reserved or spot instances, idling of development and test resources that are no longer being used, or migration to other services that offer a better cost/performance value.
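As a small illustration of that kind of analysis, the sketch below rolls billing line items up by account and by service. The record layout and the sample figures are invented for the example; real billing exports carry many more fields.

```python
from collections import defaultdict


def roll_up(line_items):
    """Total cost per account and per service from a list of line items."""
    by_account = defaultdict(float)
    by_service = defaultdict(float)
    for item in line_items:
        by_account[item["account"]] += item["cost"]
        by_service[item["service"]] += item["cost"]
    return dict(by_account), dict(by_service)


# Made-up line items, for illustration only.
sample = [
    {"account": "dev", "service": "EC2", "cost": 120.0},
    {"account": "dev", "service": "S3", "cost": 15.5},
    {"account": "prod", "service": "EC2", "cost": 300.0},
]

by_account, by_service = roll_up(sample)
print(by_account)
print(by_service)
```

The per-account view serves the account owners, while the per-service view across all accounts is the one that tends to reveal organization-wide savings opportunities.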
So, organizations need to develop tools and processes to manage costs based on their organization’s needs. There are also several commercial offerings from AWS Partners, such as CloudCheckr and CloudHealth, that offer cost management features. Once integrated with your AWS account(s), these tools offer different types of reporting, cost management, and cost savings recommendations.
Most larger organizations have security teams that are responsible for the overall security governance, auditing, and/or monitoring of resources. As we saw above, using account- and region-specific tools like Config rules to manage security on a larger scale poses several challenges, especially from a central point of view. Custom applications are needed to gather Config rule and Trusted Advisor data.
AWS has also recently announced several tools that are starting to address these needs. Last month, AWS released the Config Aggregator service, which allows a central account to aggregate Config data. It looks to be a step in the right direction, but it will still take work to create a central monitoring solution. Amazon GuardDuty is another new managed service that adds threat detection capabilities to an account. It applies analytics to VPC Flow Logs, DNS logs, and CloudTrail data to look for potentially malicious activity. GuardDuty can be configured to aggregate findings across a set of AWS accounts into a central account for processing and alerting.
AWS’s IAM trust relationships between accounts can be used to build a central security monitoring account or solution. Each target AWS account adds a role that can be assumed by the central security AWS account. The policy associated with the role can be set to allow read-only access to the services that need to be monitored. As a start, the AWS managed SecurityAudit IAM Policy allows read-only access to most services.
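The trust relationship itself is a short policy document attached to the role in each target account. A minimal sketch, assuming a hypothetical central security account ID and trusting that whole account (in practice you may prefer to name a specific role or add conditions):

```python
import json

# ARN of the AWS managed SecurityAudit policy mentioned above.
SECURITY_AUDIT_POLICY_ARN = "arn:aws:iam::aws:policy/SecurityAudit"


def security_audit_trust_policy(central_account_id: str) -> dict:
    """Trust policy that lets the central security account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{central_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical central security account ID, for illustration only.
trust_policy = security_audit_trust_policy("999999999999")
print(json.dumps(trust_policy, indent=2))
```

The same document can be used in every target account, with the SecurityAudit managed policy attached to the role as its permissions, so onboarding a new account does not require any change in the central account.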
From here, any number of tools or scripts can be developed and run to gather data from any AWS account that trusts the central security account. What is developed is dependent on the organization’s auditing and security needs.
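One possible shape for such a tool is a small collector that visits each trusting account in turn and runs the same checks everywhere. This is only a sketch: `client_for` stands in for whatever code assumes the read-only role and returns a client for that account, and each check is a hypothetical function that returns a list of problem descriptions.

```python
def run_checks(account_ids, client_for, checks):
    """Run every check against every account and collect the findings.

    client_for(account_id) -> a client object for that account (assumed to
    wrap the role assumption). Each check(client) returns a list of strings.
    """
    findings = []
    for account_id in account_ids:
        try:
            client = client_for(account_id)
            for check in checks:
                for detail in check(client):
                    findings.append({
                        "account": account_id,
                        "check": check.__name__,
                        "detail": detail,
                    })
        except Exception as exc:
            # One unreachable account should not stop the whole audit run.
            findings.append({
                "account": account_id,
                "check": "collector",
                "detail": f"could not audit account: {exc}",
            })
    return findings
```

Because the checks live only in the central account, adding or changing one means editing this one code base rather than redeploying into every account and region.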
There are several open source examples that can be used to audit different AWS services. For example, Netflix has an open source security project called Security Monkey (https://github.com/Netflix/security_monkey) that can be used to audit and monitor the security of AWS accounts. Security Monkey runs from a central AWS account and uses IAM trusts to assume a read-only role in the targeted accounts. A set of “Watchers” gathers configuration data from the target AWS accounts, and “Auditors” analyze that configuration for potential security issues on user-defined schedules. The results can be viewed within the Security Monkey interface, sent via email, or fed into other systems. The Security Monkey code can be modified to change behavior, create new checks, or add new features. Changes can then be submitted back to the project for everyone’s benefit.
AWS also has many commercial partners that perform security checks and monitoring. These include CloudCheckr, Evident.io, CloudHealth, Splunk, AlertLogic, and Nutanix Beam. Each tool has its own method of integrating with the target AWS accounts, normally via an access key or an IAM trust relationship, and each has its own cost structure. Keep in mind that these companies now have insight into, and access to, your environments. That risk needs to be reviewed as part of your organization’s risk management process.
Any of these central tools introduces additional costs and management overhead that need to be accounted for. These costs may still be less than those of trying to manage resources in each individual account. One advantage of running checks from a central location is that changes to the collector and analysis tools happen in one place rather than across all accounts. This allows for quicker changes, the introduction of new checks, or the running of ad hoc checks with minimal configuration changes.
Wrapping it Up
There is no one right solution to manage these complexities at different scales. Each option has pros and cons. You must look at each of them and determine how well it works at the scale you are at now, and at the scale to which you may grow. Your organization’s policies and security needs will also influence the route that you choose.