July 8th, 2016
High Availability for Production Workloads in AWS
By Gregg Gibson
If you are like most users, you started out using AWS for dev and test workloads and maybe for some batch processing, and you may now be thinking about using AWS for production systems. The good news is that you will be in good company. AWS offers high availability across the entire family of services at an attractive price. All of the services where you may choose to store data are highly reliable. Some of them are as follows:
Elastic Block Store (EBS) – Designed for 99.999% availability. Data is replicated within the availability zone to protect from data loss due to component failure. Snapshots can be taken and saved in S3 for additional protection. (See http://aws.amazon.com/ebs/details/.)
Simple Storage Service (S3) – Designed for 99.999999999% durability. Data can be replicated to other regions for increased confidence. (See https://aws.amazon.com/s3/details/.)
DynamoDB – Highly reliable, fully managed NoSQL database that stores data on solid-state drives (SSDs). All data is synchronously replicated to three availability zones to ensure data integrity. (See https://aws.amazon.com/dynamodb/.)
Relational Database Service (RDS) – A selection of managed SQL databases with automated backups. Options are provided for taking database snapshots and for multi-availability-zone replication. (See https://aws.amazon.com/rds/details/#ha.)
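To put a figure such as 99.999% availability in perspective, it helps to convert it into allowed downtime per year. The following is a minimal sketch of that arithmetic. The function name is made up for illustration, and the percentages are design targets quoted above, not guarantees.

```python
# Convert an availability percentage into the downtime it allows per year.
# Illustrative helper only; the name is hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years for simplicity


def downtime_minutes_per_year(availability_percent):
    """Return the minutes per year a system may be down at this availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% available: about {minutes:.2f} minutes of downtime per year")
```

At 99.999%, the allowance works out to a little over five minutes per year.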
The data storage services are all highly reliable, but what about the compute services? The EC2 SLA may leave you feeling a little uncomfortable. It says little about the instances you have running and more about your ability to create a new instance if you need to replace one. (See https://aws.amazon.com/ec2/sla/.) Furthermore, if you have had EC2 instances running for a lengthy period of time, you may even have had an instance become unusable. Never fear. Just recognize that EC2 is a basic building block for AWS. It can be used for cost-sensitive applications that do not require high availability, but all of the necessary tools are available to combine instances into a highly reliable cluster for production applications.
[Illustration: the choices of availability. Availability is the degree to which a system is operable.]
As a starting point, you could set up individual EC2 instances and rely on notifications from AWS or third-party monitoring tools to alert you to any failures that occur. That is a simple approach that could work if you don’t mind getting out of bed at night to manually recover from a failure. After all, failures aren’t likely to be frequent in a simple, small environment.
If that approach isn’t for you, you can let AWS take care of recovery for you. The trick is to set up an auto scale group for your instances. The minimum and maximum size of your auto scale group could both be set to one, in which case the auto scale group would do nothing more than automatically replace an instance that has gone bad. The initialization time for a replacement instance can be several minutes or more, so to prevent downtime, you may want to set the minimum auto scale group size to two or more to ensure that at least one instance is always running.
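The group sizes described above can be sketched in Python with the boto3 SDK. This is one possible arrangement under stated assumptions, not a definitive setup: the group name, launch configuration name, and availability zones are hypothetical placeholders, the launch configuration is assumed to exist already, and the grace period is an arbitrary example value.

```python
# Sketch: build the settings for an auto scale group that keeps at least two
# instances running. All names and zones below are hypothetical placeholders.


def build_asg_request(name, launch_config, zones, min_size=2, max_size=4):
    """Return keyword arguments for boto3's create_auto_scaling_group call."""
    if not 1 <= min_size <= max_size:
        raise ValueError("need 1 <= min_size <= max_size")
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_config,  # assumed to exist already
        "AvailabilityZones": list(zones),
        "MinSize": min_size,          # two or more keeps one running during a replacement
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        "HealthCheckType": "EC2",     # replace instances that fail EC2 status checks
        "HealthCheckGracePeriod": 300,  # seconds to wait before checking a new instance
    }


def create_group(request):
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without AWS access

    client = boto3.client("autoscaling")
    client.create_auto_scaling_group(**request)


if __name__ == "__main__":
    request = build_asg_request(
        "web-asg", "web-launch-config", ["us-east-1a", "us-east-1b"]
    )
    print(request)
    # create_group(request)  # uncomment to create the group in your account
```

Passing min_size=1 and max_size=1 gives the replace-only behavior, while spreading the zones list across more than one availability zone lets the group place instances in separate zones.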
You may also want to set the maximum auto scale group size to something higher so that more instances are launched if traffic demand warrants it. (See https://aws.amazon.com/autoscaling for more information.) To handle the load balancing across the instances in your auto scale group, the AWS Elastic Load Balancing service (ELB) is a great choice. (See https://AWS.amazon.com/elasticloadbalancing/.)
One decision that you will need to make in order to set up an auto scale group is how you will get application code and configuration data onto a new instance that is automatically provisioned for you. Some of the possibilities are covered in Deploying Applications to EC2.
As you can see, AWS gives you options that allow you to keep solutions simple and costs low for applications that don’t merit high availability, and other options that allow you to assemble a highly available system.