February 23rd, 2016
Use AWS Lambda to Clean Up Untagged EC2 Instances
By Rich Uhl
Creating AWS EC2 Instances is effortless these days, either by the web console or through Amazon’s APIs. Just as easy (but oftentimes an afterthought) is the termination of these Instances while testing or in a lab environment. Manual termination is possible, but why do the work when it can be automated? Better yet, event-driven automation. Given an AWS region name (or list of regions!), a Lambda function will find and terminate “tagless” Instances. All the code used in this post can be found on our GitHub repo.

Summary

This walkthrough covers:
  1. Definition of “tagless” and event parameters
  2. Node.js Implementation of logic (yay asynchronous API calls!)
  3. AWS Identity and Access Management (IAM) Policy, and Role definition for Lambda function
  4. AWS Lambda function creation
  5. Benchmarks

1) Conventions

AWS EC2 implements Tags as a list of 0 to 10 dictionaries associated with an Instance. Each dictionary is guaranteed to have two keys, Key and Value. The preferred definition of “tagless” is an Instance with an empty list [] of Tags, or a 1-item list, with a Key of Name and a Value of "". The following examples are “tagless”:
[]
[{Key:"Name", Value:""}]
AWS EC2 will strip whitespace from Tags upon creation, so Value: " " will be tagged as Value: "".
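As a rough illustration of the definition above, a check along these lines could decide whether an Instance counts as “tagless”. This is only a sketch: the isTagless() in index.js may be written differently, and treating a missing Tags property like an empty list is an assumption made here.

```javascript
// Sketch of the "tagless" rule described above (not the code from index.js).
function isTagless(tags) {
  // Assumption: a missing Tags property is treated like an empty list.
  if (!tags || tags.length === 0) {
    return true;
  }
  // A lone Name tag with an empty Value also counts as tagless.
  return tags.length === 1 && tags[0].Key === 'Name' && tags[0].Value === '';
}

console.log(isTagless([]));                              // true
console.log(isTagless([{ Key: 'Name', Value: '' }]));    // true
console.log(isTagless([{ Key: 'Name', Value: 'web' }])); // false
```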
While the event that drives this function can come from any AWS Lambda source, its shape still needs to be defined, as it carries the necessary parameters. The event is a JSON object with a single key, region, whose value is the list of regions to operate on:
{
    "region": ["us-west-1", ...]
}
Warming up the Node.js interpreter takes ~50ms, and network requests can be non-blocking. Acting on multiple regions in a single invocation therefore has the potential to save time over repeated Lambda function executions that each handle a single region.
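The idea of fanning out over every region in one invocation can be sketched as follows. The names cleanRegion and cleanAllRegions are hypothetical, and a timer stands in for the real EC2 calls, so this only shows the coordination pattern: start every region at once and finish when the last callback arrives.

```javascript
// Hypothetical stand-in for the per-region describe/terminate work.
// A real implementation would call the EC2 API here instead of a timer.
function cleanRegion(region, callback) {
  setTimeout(function () {
    callback(null, region + ': done');
  }, 10);
}

// Start all regions without waiting on each other; call done() after the
// last region has reported back.
function cleanAllRegions(event, done) {
  var regions = event.region || [];
  var pending = regions.length; // simple counter used to detect completion
  var results = [];

  if (pending === 0) {
    return done(results);
  }

  regions.forEach(function (region) {
    cleanRegion(region, function (err, message) {
      results.push(err ? region + ': ' + err.message : message);
      pending -= 1;
      if (pending === 0) {
        done(results);
      }
    });
  });
}

cleanAllRegions({ region: ['us-west-1', 'us-east-1'] }, function (results) {
  console.log(results);
});
```

Because the requests overlap, total time is closer to that of the slowest region than to the sum of all regions.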

2) Node.js Implementation

Look over the code found in index.js and adjust the DryRun flags as desired.
When DryRun is set to true, the AWS API does not perform the action; a request that would otherwise have succeeded comes back as a DryRunOperation exception. This is why having error handlers is useful.
If a different definition of “tagless” is desired, the isTagless() function logic can be changed without affecting the rest of the code.
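To make the DryRun behavior concrete, an error handler might separate the expected DryRunOperation response from real failures, roughly as below. This is a sketch and not the handler from index.js: describeOutcome is a hypothetical name, and it assumes the error object exposes the AWS error code as err.code, as the AWS SDK for JavaScript (v2) does.

```javascript
// Sketch: interpret the (err, data) pair passed to a terminateInstances callback.
// Assumption: err.code holds the AWS error code string.
function describeOutcome(err, data) {
  if (err && err.code === 'DryRunOperation') {
    // With DryRun: true, this "error" means the request would have succeeded.
    return 'dry run: request would have succeeded';
  }
  if (err) {
    return 'failed: ' + err.code;
  }
  return 'terminated';
}

// Simulated callback arguments, so the sketch runs without AWS credentials.
console.log(describeOutcome({ code: 'DryRunOperation' }, null));
console.log(describeOutcome({ code: 'UnauthorizedOperation' }, null));
console.log(describeOutcome(null, { TerminatingInstances: [] }));
```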

3) Required IAM Policy and Role

AWS Lambda can invoke AWS APIs through a number of different SDKs, but only if it has proper permissions to do so. Permissions in this context are two-fold:
  1. Allow AWS Lambda to call AWS APIs on your behalf (Role)
  2. Allow specific API actions for the Role (Policy)
First, create the necessary Policy. From the AWS console, navigate to Identity and Access Management (IAM) and find the Policies tab. From here, create a Policy by using the Policy Generator with the Amazon EC2 actions DescribeInstances and TerminateInstances (the describeInstances and terminateInstances calls in the SDK) checked. You must select “Add Statement” in order to stage this Policy. Continue on, provide a name and description, and then review the Policy in full.
The Sid key is generated dynamically and is likely to be different for other implementations.
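For reference, a Policy document generated this way would look roughly like the following. Treat it as an illustrative sketch: the Sid shown is a placeholder rather than a real generated value, and the wildcard Resource is an assumption chosen because the describe action is not scoped to individual Instances.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt0000000000000",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*"
    }
  ]
}
```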
Next, create the Role that will be associated with this new Policy. In IAM, with the Roles tab selected, create a new Role and specify AWS Lambda as the Service Role. Then attach the Policy created in the previous step and review before creating. By this point, the proper API actions (describeInstances, terminateInstances) have been marked as allowed in a new Policy, which is now attached to a new Role. Time to move on to creating the AWS Lambda function which will consume this new Role.

4) AWS Lambda Function

As is to be expected with Amazon’s APIs, Lambda functions can be created from the command line or the SDKs. While that is not difficult, it is still useful to see and use the AWS Lambda web console. From the main AWS console page, locate Lambda and continue on to create a function without using a blueprint. Configure the function, pasting in the code and specifying the Role under which this function will execute. In the Advanced Settings menu, the timeout has been increased to 10 seconds; this will offset network latency in API calls from AWS Lambda. The function will exit as soon as it finishes, and you will not be billed for the remaining time.
It has been observed that API calls from AWS Lambda itself are tens to hundreds of milliseconds slower than when invoking from a local development environment, hence the large increase to the timeout. Mileage may vary…
Configure a test event with specific regions to execute – a single region or multiple regions may be supplied. Be sure to check that DryRun is set as deemed appropriate while testing.
The terminateInstances API call is idempotent. That is, multiple calls with the same set of Instance Ids will continue to return success/true for up to an hour. Feel free to invoke many times, even if the previous run has started the termination process.

5) Benchmarks

Provided are AWS Lambda web console results for different invocations of this function:
  1. A single region with a single Instance.
  2. Two AWS regions with only one region having a single Instance to terminate.
  3. All AWS regions with only one region having a single Instance to terminate.
  4. All AWS regions with nearly all having 1 to 7 Instances to be terminated.
Note that function runtime increases as regions are added, since each region means more API calls. However, as more regions are specified in the event, the expense is amortized because the asynchronous network calls are non-blocking.

Closing Thoughts

  • While AWS Lambda functions can call APIs for other regions, Lambda itself is only available in a few specific regions. There might be some optimization gained from deploying this function to multiple regions, each handling events for the regions nearest to it, rather than having one region call API frontends for all regions.
  • The Policy created here has two actions. There is arguably more modularity in creating two separate Policies, each having a single action.
  • While there is no danger in using the semaphore demonstrated, a Promise library might be worth considering if more asynchronous calls need to be coordinated.
  • Be sure to review AWS Lambda function logs in CloudWatch as they sometimes can be too large for the small log output buffer on the Lambda test page.
  • Currently the function will call context.succeed() even if a single API call fails. A global boolean could be used to mark if any request fails and call context.fail() appropriately. (Any failure is returned as cleanly as possible as can be seen in the benchmarks with DryRunOperation exception messages.)
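The last two ideas above, Promise-based coordination and failing the invocation when any request fails, could be combined along these lines. This is a sketch under stated assumptions and not the code from the post: cleanRegionAsync, settleAllRegions, and report are hypothetical names, the per-region work is simulated, and the context object is a stand-in exposing only succeed() and fail().

```javascript
// Hypothetical per-region task returning a Promise; a real version would
// wrap the EC2 describe/terminate calls. 'bad-region' simulates a failure.
function cleanRegionAsync(region) {
  return new Promise(function (resolve, reject) {
    if (region === 'bad-region') {
      reject(new Error('request failed'));
    } else {
      resolve(region + ': done');
    }
  });
}

// Run every region concurrently and record each outcome instead of
// rejecting on the first error, so no result is lost.
function settleAllRegions(regions) {
  return Promise.all(regions.map(function (region) {
    return cleanRegionAsync(region).then(
      function (message) { return { region: region, ok: true, message: message }; },
      function (err) { return { region: region, ok: false, message: err.message }; }
    );
  }));
}

// Call context.fail() if any region failed, context.succeed() otherwise.
function report(context, outcomes) {
  var failed = outcomes.filter(function (o) { return !o.ok; });
  if (failed.length > 0) {
    context.fail(failed.length + ' region(s) failed');
  } else {
    context.succeed(outcomes);
  }
}

// Stand-in for the Lambda context object, for local experimentation only.
var fakeContext = {
  succeed: function (result) { console.log('succeed:', result); },
  fail: function (reason) { console.log('fail:', reason); }
};

settleAllRegions(['us-west-1', 'bad-region']).then(function (outcomes) {
  report(fakeContext, outcomes);
});
```

Collecting every outcome before reporting keeps the per-region messages available for the logs even when the invocation as a whole is marked as failed.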