What happens when a cloud service that is specific to one geographic region fails?
An AWS glitch called a “service event” happened on Sunday, September 20, 2015, from 2:19 AM – 7:10 AM PDT. The primary issue was with DynamoDB. The root cause was a network outage that caused extra load on DynamoDB’s internal metadata service. The extra load was not detected by the internal monitoring system until the metadata service was out of processing capacity, resulting in greatly increased DynamoDB error rates.
This type of situation, although rare, highlights a definite need for DynamoDB users to plan for future region-specific failures. The six techniques below can be applied depending on your volume, performance and availability needs. Most of them rely on a fairly new DynamoDB feature, Streams, which “provides a time ordered sequence of item level changes in any DynamoDB table”.
- Cross-Region Replication – Single-Master Model
- DynamoDB Streams Kinesis Adapter
- DynamoDB Triggers With Amazon Lambda
- Cross-Region Replication – Multi-Master Model
- Backup to S3
- Amazon Data Pipeline
1. Cross-Region Replication – Single-Master Model
Ideal for large data sets with high transaction volumes, cross-region replication uses DynamoDB Streams to keep a master table in one region in sync with an identical table in another region. Because Streams delivers changes in order and in near real time, the replica table can serve as a read-only copy of the data. The replica table can technically still be written to, but this should be avoided unless the master table fails and a full failover is initiated.
Note: A cross-region replication group can be used to sync multiple replicas of a master table.
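To make the single-master flow concrete, here is a minimal sketch of how an ordered sequence of change records could be applied to a replica. It is not the AWS replication application: the record shape (`event`, `keys`, `new_image`) and the function name `apply_stream_records` are simplified assumptions for illustration, and the replica is modeled as an in-memory dictionary rather than a DynamoDB table.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical, simplified change record. Real DynamoDB Streams records use
# typed attribute values; plain dicts are used here for readability.
Record = Dict[str, Any]
Replica = Dict[Tuple[Any, ...], Dict[str, Any]]


def apply_stream_records(replica: Replica,
                         records: Iterable[Record],
                         key_attrs: Tuple[str, ...]) -> Replica:
    """Apply ordered change records from the master table to a replica."""
    for record in records:
        # Build the primary key tuple from the record's key attributes.
        key = tuple(record["keys"][attr] for attr in key_attrs)
        if record["event"] in ("INSERT", "MODIFY"):
            # Inserts and updates both overwrite the replica's copy.
            replica[key] = dict(record["new_image"])
        elif record["event"] == "REMOVE":
            # Deleting a key that is already gone is treated as a no-op.
            replica.pop(key, None)
        else:
            raise ValueError("unknown event type: %r" % record["event"])
    return replica
```

Because each record fully overwrites or removes the item, the order in which records are applied matters, which is why the time-ordered guarantee of Streams is what makes this approach workable.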
A cross-region replication application can be implemented with the AWS-provided CloudFormation template for each of the following regions:
us-east-1, us-west-2, eu-west-1, ap-northeast-1, ap-southeast-2
Note: For this to work, you will need to ensure that the Docker version in each template is supported by AWS Elastic Beanstalk in that region.
This will create two CloudFormation stacks with descriptions “Dynamo DB Cross-Region Replication Coordinator” and “AWS Elastic Beanstalk environment”.
Once the CloudFormation stacks have been created successfully, the “cross-region replication console URL” output variable can be used to associate the master table in one region with replica tables that have the same schema in the same or other regions. Once this setup is done, you can test it by creating, updating or deleting data in the master table and verifying that the replica tables receive the updates.
Note: For regions where Amazon ECS is not available (currently Frankfurt, Singapore and São Paulo), use the following CloudFormation template instead:
The non-ECS solution uses an AWS-provided, open-source Cross-Region Replication Library that replicates writes from the master table to the replicas in other regions. It can also be used in your own applications to build replication solutions with DynamoDB Streams. AWS handles the increase in partitions and the corresponding Streams shards, and the cross-region library reacts by increasing the number of Kinesis Client Library (KCL) worker threads. More information about this can be found on GitHub.
AWS Console screenshots for the cross region replication process:
a. Cross region CloudFormation invocation.
Note: Accept the defaults in the following screens and optionally assign an SSH key if you want access to the virtual server. If there is a failure due to the Docker version, modify the template to use a Docker version that Elastic Beanstalk supports.
b. Cross region CloudFormation outputs.
2. DynamoDB Streams Kinesis Adapter
Amazon Kinesis is a service for real-time processing of streaming data at massive scale. The DynamoDB Streams Kinesis Adapter makes it possible to use the Kinesis Client Library (KCL) to process DynamoDB Streams. The KCL can make it easier to build applications that consume DynamoDB Streams.
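The adapter itself is a Java library, so the sketch below is not the adapter. It only illustrates, in Python, the kind of low-level shard polling that the KCL takes off your hands. It assumes a boto3-style `dynamodbstreams` client is passed in; the function name `read_stream_records` is hypothetical. The sketch deliberately ignores shard lineage, pagination of the shard list, checkpointing and worker coordination, which are the things the KCL manages for you.

```python
def read_stream_records(client, stream_arn, limit=100):
    """Yield records from each shard of a DynamoDB stream, oldest first.

    `client` is assumed to behave like boto3.client("dynamodbstreams").
    Simplifications: only the first page of shards is read, parent/child
    shard ordering is ignored, and polling of a shard stops at the first
    empty batch instead of waiting for new records.
    """
    description = client.describe_stream(StreamArn=stream_arn)["StreamDescription"]
    for shard in description["Shards"]:
        iterator = client.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
        )["ShardIterator"]
        while iterator:
            response = client.get_records(ShardIterator=iterator, Limit=limit)
            records = response["Records"]
            yield from records
            if not records:
                break  # treat an empty batch as "caught up" (a simplification)
            # A closed shard eventually returns no NextShardIterator.
            iterator = response.get("NextShardIterator")
```

In a real deployment you would run something like `for record in read_stream_records(boto3.client("dynamodbstreams"), stream_arn)`, but for anything beyond experiments the adapter and KCL are the safer route because they handle the cases this sketch skips.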
3. DynamoDB Triggers with Amazon Lambda
Amazon Lambda provides a highly efficient event-handling system that can be integrated with DynamoDB. Triggers can be created to process DynamoDB Streams by calling Lambda functions. In the AWS Console, a stream-handling function can be created by selecting the dynamodb-process-stream blueprint when creating a new Lambda function. Using Lambda allows you to have your custom event-handling code run whenever a stream event occurs, without worrying about scaling or availability within a region. You could use your Lambda event handler(s) to write to DynamoDB tables in a remote region.
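As a rough illustration of the last point, the sketch below shows a Lambda handler that copies each stream change to a table in another region. It is a minimal example under several assumptions: the table name and region come from hypothetical environment variables `REPLICA_TABLE` and `REPLICA_REGION`, the stream is configured with a view type that includes new images, and error handling and retries are left out.

```python
import os

# Hypothetical configuration; the names and defaults are placeholders.
REPLICA_TABLE = os.environ.get("REPLICA_TABLE", "my-table-replica")
REPLICA_REGION = os.environ.get("REPLICA_REGION", "us-west-2")


def replicate_records(records, client, table_name):
    """Apply DynamoDB stream records to a replica table.

    `client` is assumed to behave like boto3.client("dynamodb"). Stream
    records already use the attribute-value format the low-level client
    expects, so images and keys are passed through unchanged.
    """
    applied = 0
    for record in records:
        change = record["dynamodb"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # Requires a stream view type that includes the new image.
            client.put_item(TableName=table_name, Item=change["NewImage"])
        elif record["eventName"] == "REMOVE":
            client.delete_item(TableName=table_name, Key=change["Keys"])
        else:
            continue  # skip event types this sketch does not know about
        applied += 1
    return applied


def lambda_handler(event, context):
    # Imported here so replicate_records can be exercised without AWS access.
    import boto3

    client = boto3.client("dynamodb", region_name=REPLICA_REGION)
    return {"replicated": replicate_records(event["Records"], client, REPLICA_TABLE)}
```

Keeping the replication logic in `replicate_records` separate from the handler makes it possible to test the logic locally with a stand-in client before wiring it to a real trigger.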
4. Cross-Region Replication – Multi-Master Model
Note: This solution should be considered experimental — we have not tested it.
As discussed above, the Cross-Region Replication Library can be used to replicate DynamoDB tables to remote regions. This library can also help implement replication in a DynamoDB environment with multiple masters. To help ensure consistency, it provides a simple conflict resolution mechanism based on timestamps, or you can provide custom conflict management code.
In this scenario, an application in one AWS region modifies the data in a DynamoDB table. A second application in another AWS region reads the modifications and writes them to another table, creating a replica that stays in sync with the original table. The conflict resolution mechanism decides which version is kept when records are written to the tables in both regions.
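Timestamp-based conflict resolution is often implemented as "last writer wins". The sketch below shows that general idea only; it is not taken from the Cross-Region Replication Library, and the attribute name `last_updated` and the function name `resolve_conflict` are assumptions made for this example.

```python
def resolve_conflict(current, incoming, ts_attr="last_updated"):
    """Pick which version of an item to keep, using last-writer-wins.

    `current` is the item already in the local table (or None if absent),
    `incoming` is the version replicated from the other region. Both are
    assumed to carry a comparable timestamp in `ts_attr`.
    """
    if current is None:
        return incoming  # nothing to conflict with
    if incoming[ts_attr] > current[ts_attr]:
        return incoming  # the remote write is newer
    return current  # the local write is newer, or it is a tie
```

Note that ties keep the local version here, which means two regions could each keep their own copy after a simultaneous write. A real implementation needs a deterministic tie-breaker (for example, a region identifier) and reasonably synchronized clocks.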
More details on this approach can be found in Akshal Vig and Parik Pol’s video talk at AWS re:Invent 2014.
Note: Replication in a multi-master scenario could also be handled by one of the other mechanisms discussed above such as Lambda or the KCL.
5. Backup to S3
Data can be exported from DynamoDB to S3 or imported from S3 to DynamoDB by invoking Amazon Data Pipeline from the DynamoDB console. This launches an Amazon Elastic MapReduce cluster that creates backups of the DynamoDB table. Parallel scans can be used to speed up the initial backup. Once the initial backup is complete, incremental changes will be sent to S3 to prevent future full table scans.
If the backups are created in an S3 bucket in a remote region, then in the event of a failure of the primary region, the backups can be used to create copies of the DynamoDB tables in the remote region. While it will take longer to get up and running with this option, storing backups in S3 will in most cases cost less than storing them directly in DynamoDB.
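For readers unfamiliar with the parallel scans mentioned above, the sketch below shows how the `Segment` and `TotalSegments` parameters of the DynamoDB Scan operation split a table scan across workers. It is not what the Data Pipeline job runs internally; it assumes a boto3-style `dynamodb` client is passed in, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read every item in one segment of the table, following pagination."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page["Items"])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no more pages in this segment
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently and return the combined items."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

A parallel scan consumes read capacity faster than a sequential one, so the number of segments should be chosen with the table's provisioned throughput in mind.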
6. Amazon Data Pipeline
The Amazon Data Pipeline console template lets you configure periodic movement of data between DynamoDB tables across different regions or to a different table within the same region. This feature is useful in the following scenarios:
- Disaster recovery in the case of data loss or region failure
- Copying DynamoDB data to remote regions to support applications in those regions
- Performing full or incremental DynamoDB backups
Conclusion: Depending on your availability requirements and your throughput and table growth estimates, one of the solutions given above can be used to deal with just about any regional failure scenario. Our recommended starting point is to explore a single-master implementation of cross-region replication using DynamoDB Streams.