AWS makes migrating IT infrastructure very appealing: you can scale resources to match your needs without worrying about the underlying management operations, and you pay only for what you use, which is a good way to save money. However, unless you are careful and diligent, it's easy to lose control of your cloud spend through over-provisioning, leaving unused resources running, and so on. This brings us to today's topic: cost optimization and the gotchas to watch out for.


We believe cost optimization should not come first. Begin at the most fundamental level: security and fault tolerance.

Security

Without proper security controls, optimizing your architecture isn't worthwhile. You can use native AWS tools to create notifications and monitors that alert you when resources are spun up in regions you don't use.

Securing your account protects you against two scenarios. First, it prevents anyone without authorization from gaining access and spinning up resources. Second, it ensures that only authorized users have access to your account and that they aren't creating resources they don't need.
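As a rough illustration of the "unused regions" check, the sketch below flags resources running outside an allowlist of regions. It assumes you already have a resource inventory as a list of dicts with `id` and `region` keys; the inventory format, the resource IDs, and `ALLOWED_REGIONS` are all hypothetical, and in a real account you would feed this from your own monitoring rather than hardcoded data.

```python
# Hypothetical allowlist: the only regions this account is expected to use.
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}


def unexpected_region_resources(inventory, allowed=ALLOWED_REGIONS):
    """Return the IDs of resources running outside the allowed regions.

    `inventory` is a hypothetical list of dicts with "id" and "region" keys.
    """
    return [item["id"] for item in inventory if item["region"] not in allowed]


# Hypothetical inventory for illustration only.
inventory = [
    {"id": "i-0a1", "region": "us-east-1"},
    {"id": "i-0b2", "region": "ap-southeast-2"},
    {"id": "i-0c3", "region": "us-west-2"},
]

for resource_id in unexpected_region_resources(inventory):
    print(f"ALERT: {resource_id} is running in a region we don't use")
```

Anything this reports is either a security problem or a forgotten resource; both cost money.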

Fault Tolerance

Fault tolerance matters for the same reason as security: it doesn't matter how many infrastructure dollars you save if your website or app goes down. Cost optimization is only viable once you have designed for failure. That means spreading your site across multiple Availability Zones, creating AMIs for all instances, and using auto-scaling.
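One simple design-for-failure check is whether each application actually runs in more than one Availability Zone. The sketch below assumes a hypothetical list of instances tagged with an `app` name and an `az`; it only reports applications whose instances all sit in a single zone.

```python
from collections import defaultdict


def single_az_apps(instances):
    """Return the names of apps whose instances all live in one Availability Zone.

    `instances` is a hypothetical list of dicts with "app" and "az" keys.
    """
    zones_by_app = defaultdict(set)
    for inst in instances:
        zones_by_app[inst["app"]].add(inst["az"])
    # Fewer than two distinct zones means one AZ outage takes the app down.
    return sorted(app for app, zones in zones_by_app.items() if len(zones) < 2)


instances = [
    {"app": "web", "az": "us-east-1a"},
    {"app": "web", "az": "us-east-1b"},
    {"app": "api", "az": "us-east-1a"},
    {"app": "api", "az": "us-east-1a"},
]
print(single_az_apps(instances))
```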

Once your infrastructure is secure and fault-tolerant, you can begin cost optimization activities.

Right-Sizing and Right-Typing

To make sure you are running the correct instance sizes (right-sizing) and instance families (right-typing) for your application, analyze your infrastructure's performance metrics. For example, you don't want to run a compute-intensive application on a memory-optimized instance.

Knowing the exact type, size, and generation each workload needs can save you a lot of money.

Consider storage and memory usage when you analyze your instances. AWS doesn't provide memory usage metrics for EC2 instances out of the box. To get them, you need to install an agent, such as the CloudWatch agent, that reports statistics on memory usage, disk usage, and so on.
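Once CPU and memory metrics are being collected, a first-pass right-sizing and right-typing hint can be as simple as comparing 95th-percentile utilization against two thresholds. The sketch below is a heuristic under stated assumptions: the 40% and 80% thresholds and the sample data are hypothetical, and a real decision should also weigh network, disk, and burst behavior.

```python
def p95(samples):
    """95th percentile of a non-empty list, using the nearest-rank method."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def rightsizing_hint(cpu_pct, mem_pct, low=40.0, high=80.0):
    """Suggest a direction based on p95 CPU and memory utilization (percent).

    The `low` and `high` thresholds are hypothetical defaults.
    """
    cpu, mem = p95(cpu_pct), p95(mem_pct)
    if cpu >= high and mem < low:
        return "cpu-bound: consider a compute-optimized family"
    if mem >= high and cpu < low:
        return "memory-bound: consider a memory-optimized family"
    if cpu >= high or mem >= high:
        return "busy: consider a larger size"
    if cpu < low and mem < low:
        return "underused: consider a smaller size"
    return "size looks reasonable"


# Hypothetical week of samples for an oversized instance.
print(rightsizing_hint([10.0] * 20, [20.0] * 20))
```

Using a high percentile rather than the average avoids downsizing an instance that is idle most of the day but needs its headroom at peak.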

AWS regularly releases new instance generations with faster processors, better performance, and lower prices. Upgrading to the new generation at the same size is a double win: you get more performance at a lower price. However, this is not always true.

Windows is licensed per core. If AWS adds cores to the same instance size in a new generation, the licensing cost rises with the core count, and upgrading can end up costing more. Watch out for these cases, but in general, upgrading to a newer generation will save you money.
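The trade-off can be checked with simple arithmetic before you upgrade. The sketch below models total hourly cost as an instance rate plus a flat per-core license rate; that split is a simplifying assumption, not how AWS itemizes its bill, and every number in it is a hypothetical placeholder rather than a real price.

```python
def hourly_cost(instance_rate, cores, license_rate_per_core):
    """Hypothetical model: compute rate plus a flat per-core license charge."""
    return instance_rate + cores * license_rate_per_core


def upgrade_saves_money(old, new, license_rate_per_core):
    """True if the new generation is cheaper once per-core licensing is included."""
    old_cost = hourly_cost(old["rate"], old["cores"], license_rate_per_core)
    new_cost = hourly_cost(new["rate"], new["cores"], license_rate_per_core)
    return new_cost < old_cost


# Placeholder figures: the new generation has a lower base rate but twice the cores.
old_gen = {"rate": 0.20, "cores": 2}
new_gen = {"rate": 0.18, "cores": 4}
print(upgrade_saves_money(old_gen, new_gen, license_rate_per_core=0.05))
```

With these made-up figures the cheaper base rate is wiped out by the extra cores, which is exactly the case the paragraph above warns about.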

Instance Scheduling

Instance scheduling saves money by turning instances off overnight or whenever they aren't being used. To keep it simple: if you turn off your dev instances for 12 hours every weeknight and for the whole weekend, they run only 60 of the 168 hours in a week, which saves roughly 64% of their weekly usage.
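The savings figure is straightforward arithmetic, shown below for a 12-hour weekday schedule. It assumes compute is billed only for the hours an instance runs; storage such as attached EBS volumes is still billed while an instance is stopped, so the real saving on the total bill will be somewhat lower.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_savings(on_hours_per_weekday=12, weekdays=5):
    """Fraction of a 168-hour week saved if instances run only on weekdays."""
    running = on_hours_per_weekday * weekdays  # 60 hours with the defaults
    return (HOURS_PER_WEEK - running) / HOURS_PER_WEEK


print(f"{weekly_savings():.1%}")  # prints 64.3% (108 of 168 hours off)
```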

You can do this with third-party tools or with your own Lambda functions. Either way, automate it, so nobody forgets to turn instances off when they go home or leaves them running longer than needed. Third-party tools typically let you define schedules with a default state of "off". If a user turns an instance on outside its schedule, they don't have to remember to turn it off again: the instance automatically switches back to "off" at the end of the day.
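The core of a default-off scheduler is a small decision function that a periodically invoked job (for example, a scheduled Lambda function) could run for each instance. The sketch below covers only that decision; the 08:00 to 20:00 weekday window is a hypothetical schedule, times are assumed to be in the team's local time zone, and the actual start/stop calls to EC2 are left out.

```python
from datetime import datetime, time

# Hypothetical office-hours window for dev instances (12 hours per weekday).
START, STOP = time(8, 0), time(20, 0)


def should_be_running(now: datetime) -> bool:
    """True only during weekday office hours; everything else defaults to off."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and START <= now.time() < STOP


def desired_action(now: datetime, is_running: bool) -> str:
    """Return "start", "stop", or "none" for one instance at time `now`."""
    want_running = should_be_running(now)
    if is_running and not want_running:
        return "stop"  # covers instances someone switched on after hours
    if want_running and not is_running:
        return "start"
    return "none"


print(desired_action(datetime(2024, 1, 8, 21, 30), is_running=True))
```

Because the job re-evaluates on every run, an instance switched on late in the evening is stopped again at the next invocation without anyone having to remember.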

Reserved Instances

After you have right-sized, right-typed, and decided which instances can be scheduled, you can make reservations. One common problem we see with customers: they have purchased Reserved Instances (RIs) but still have high costs and aren't seeing the savings they expected.

Why does this happen? Many customers buy RIs and then forget about them. They don't monitor their RI usage to catch changes in their accounts. If you make an instance larger, the reservation may still apply to the new size, but only partially, and the rest runs at on-demand rates. If you make it smaller, part of the reservation may go unused. And if you move to a different instance type altogether, you might forget you have RIs at all and keep paying for reservations that no longer cover anything. Monitor how you use your RIs.
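The upsize and downsize effects can be reasoned about with the normalization factors AWS documents for size-flexible regional RIs (small = 1, large = 4, xlarge = 8, and so on). The sketch below compares reserved and running units within one instance family. It is a simplification: size flexibility applies only to certain RIs (regional, Linux/Unix, default tenancy, same family), so check the AWS documentation for your case.

```python
# Normalization factors for size-flexible regional RIs (same instance family).
NORMALIZATION = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
    "large": 4, "xlarge": 8, "2xlarge": 16,
}


def ri_coverage(reserved_sizes, running_sizes):
    """Compare reserved vs running capacity within ONE instance family.

    Returns (covered_units, on_demand_units, unused_reserved_units).
    """
    reserved = sum(NORMALIZATION[s] for s in reserved_sizes)
    running = sum(NORMALIZATION[s] for s in running_sizes)
    covered = min(reserved, running)
    return covered, running - covered, reserved - covered


print(ri_coverage(["large"], ["xlarge"]))  # upsized: half runs on demand
print(ri_coverage(["large"], ["medium"]))  # downsized: half the RI is idle
print(ri_coverage(["large"], []))          # moved families: the whole RI is idle
```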

RIs don't renew automatically, so it's easy to let one expire without noticing. Set a calendar reminder 30 days before each RI expires. That gives you time to plan an upgrade to the appropriate instance type or size, and to purchase new reservations promptly so your instances don't fall back to on-demand pricing.
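If you track your reservations in a list or spreadsheet, the 30-day reminder can be computed rather than remembered. The sketch below assumes a hypothetical list of RIs with `id` and `expires` fields and reports the ones whose planning window has started, including any that have already lapsed.

```python
from datetime import date, timedelta

LEAD_TIME = timedelta(days=30)  # planning window suggested above


def reminder_date(expires: date) -> date:
    """Date on which to start planning the renewal."""
    return expires - LEAD_TIME


def due_for_review(reservations, today: date):
    """IDs of RIs expiring within 30 days, or already expired."""
    return [r["id"] for r in reservations if today >= reminder_date(r["expires"])]


# Hypothetical reservations for illustration only.
ris = [
    {"id": "ri-a", "expires": date(2024, 6, 20)},
    {"id": "ri-b", "expires": date(2024, 9, 1)},
    {"id": "ri-c", "expires": date(2024, 5, 15)},
]
print(due_for_review(ris, today=date(2024, 6, 1)))
```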

Zombie Resources

The cloud offers many benefits, including the ability to quickly spin up instances for development or troubleshooting. Over time, these instances and their volumes can be forgotten. AWS doesn't delete them automatically, so they become "zombie resources" that can cause a lot of problems and add significant costs. Make sure you don't have any unused EC2 or RDS instances.

Third-party tools and automated backup programs can often find old resources and clean them up. Manual snapshots, however, are not deleted automatically, so find any old snapshots that were created manually and delete them.

Remember that when you terminate an instance, its EBS volumes (unless they are set to delete on termination) and any Elastic IPs can remain in your account, and AWS keeps charging you for them until you delete or release them.
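A zombie sweep boils down to a few filters. The sketch below reports unattached EBS volumes (state `available`), Elastic IPs with no association, and snapshots older than a cutoff. The dict keys mirror the fields returned by boto3's `describe_volumes`, `describe_addresses`, and `describe_snapshots`, but the data here is hardcoded and hypothetical, the snapshot list is assumed to be pre-filtered to your manual snapshots, and the 90-day cutoff is an arbitrary example. It only reports; deleting anything should stay a deliberate, reviewed step.

```python
from datetime import datetime, timedelta, timezone


def find_zombies(volumes, addresses, snapshots, now, max_snapshot_age_days=90):
    """Return (kind, id) pairs for resources that look abandoned."""
    zombies = []
    for v in volumes:
        if v["State"] == "available":  # not attached to any instance
            zombies.append(("volume", v["VolumeId"]))
    for a in addresses:
        if "AssociationId" not in a:  # Elastic IP allocated but unused
            zombies.append(("elastic-ip", a["AllocationId"]))
    cutoff = now - timedelta(days=max_snapshot_age_days)
    for s in snapshots:  # assumed to be manual snapshots only
        if s["StartTime"] < cutoff:
            zombies.append(("snapshot", s["SnapshotId"]))
    return zombies


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-1", "State": "in-use"},
    {"VolumeId": "vol-2", "State": "available"},
]
addresses = [
    {"AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"AllocationId": "eipalloc-2"},
]
snapshots = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
for kind, resource_id in find_zombies(volumes, addresses, snapshots, now):
    print(kind, resource_id)
```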

Architecting for Savings

  • Tagging

AWS lets you tag your infrastructure by owner, product, team, environment, business unit, and so on, and you can apply multiple tags to each resource. Tools such as AWS Cost Explorer, CloudCheckr, and CloudHealth can then slice and dice your data by tag to show where your costs are coming from. The more thoroughly you tag, the more visibility you have into your environment.
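Conceptually, tag-based cost reporting is a group-by over billing line items, with anything missing the tag landing in an "untagged" bucket that tells you where your tagging has gaps. The sketch below assumes a hypothetical, simplified line-item format; in AWS itself, user-defined tags also have to be activated as cost allocation tags before they appear in billing data.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum costs per value of `tag_key`; items without the tag go to "(untagged)".

    `line_items` is a hypothetical list of dicts with "cost" and optional "tags".
    """
    totals = defaultdict(float)
    for item in line_items:
        value = item.get("tags", {}).get(tag_key, "(untagged)")
        totals[value] += item["cost"]
    return dict(totals)


# Hypothetical monthly line items.
items = [
    {"cost": 120.0, "tags": {"team": "web"}},
    {"cost": 30.0, "tags": {"team": "web"}},
    {"cost": 50.0, "tags": {"env": "dev"}},
    {"cost": 20.0},
]
print(cost_by_tag(items, "team"))
```

A large "(untagged)" total is itself a finding: it is spend you cannot attribute to anyone.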

  • Automation

Humans make mistakes. Automating your cloud processes reduces both waste and human error, and it lets you move faster.

  • Consider Fault Tolerance costs

Fault tolerance has costs of its own. For example, running a NAT gateway in each Availability Zone adds hourly and data-processing charges, and data transferred between Availability Zones is billed as well. Include these costs in your budget.
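A rough budget line for these redundancy costs can be estimated from a handful of inputs. The sketch below assumes one NAT gateway per Availability Zone and a 730-hour month; all rates are supplied by the caller, and the figures in the usage line are placeholders chosen for easy arithmetic, not AWS prices. Look up current pricing for your region before relying on the result.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def redundancy_monthly_cost(num_azs, nat_hourly_rate, nat_gb_processed,
                            nat_rate_per_gb, cross_az_gb, cross_az_rate_per_gb):
    """Rough monthly cost of multi-AZ redundancy; every rate is caller-supplied."""
    costs = {
        "nat_hours": num_azs * nat_hourly_rate * HOURS_PER_MONTH,  # one NAT per AZ
        "nat_processing": nat_gb_processed * nat_rate_per_gb,
        "cross_az_transfer": cross_az_gb * cross_az_rate_per_gb,
    }
    costs["total"] = sum(costs.values())
    return costs


# Placeholder rates only, NOT real AWS prices.
print(redundancy_monthly_cost(2, 0.5, 100, 0.25, 200, 0.125))
```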

The Future: Savings Architecture

Looking ahead, consider designing your next-generation app or website around containers, serverless, and similar technologies to realize the additional cost savings they allow. As always, reach out to us here to learn more.
