Optimize Your Infra - Compute
Table of Contents
- Optimizing your Compute
- Reduce Your VM Prices
- Improve your CPU & Memory Utilization
Optimizing your Compute
Cutting right to the chase: running something on the cloud is super easy, and it is just as easy to shoot yourself in the foot when it comes to expenses. Eventually, someone always asks you to reduce your infrastructure spend.
How do you do that?
There are two things you can do:
- Reduce your VM prices
- Improve your CPU & Memory Utilization
Both of these apply to all the major clouds (AWS, GCP, Azure).
Reduce Your VM Prices
Okay - this might be stating the obvious for those seasoned at dealing with clouds, but I guess someone has to state it.
If you are using it consistently
Almost all compute instance variants come with some way to get committed usage discounts. In the case of GCP it's actually called Committed Use Discounts (CUDs), while Azure calls it Reserved Instances (RIs), similar to AWS's Reserved Instances.
The idea is simple: you commit to using the VMs for a 1-year or 3-year period, and the prices are slashed by up to 70%!
This is helpful when you know what your load is (or, at the bare minimum, what your usage will be over that period).
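To make the "know your usage" part concrete, here is a minimal sketch of the break-even arithmetic. It assumes a commitment is billed for every hour whether the VM runs or not, and the function names, the 730-hour month and the example prices are all made up for illustration.

```python
HOURS_PER_MONTH = 730.0  # assumption: average number of hours in a month


def on_demand_monthly_cost(hourly_price: float, utilization: float) -> float:
    """Pay-as-you-go: you only pay for the fraction of the month the VM runs."""
    return hourly_price * HOURS_PER_MONTH * utilization


def committed_monthly_cost(hourly_price: float, discount: float) -> float:
    """Commitment: you pay the discounted rate for every hour, used or not."""
    return hourly_price * HOURS_PER_MONTH * (1.0 - discount)


def break_even_utilization(discount: float) -> float:
    """Fraction of the time the VM must run for the commitment to break even."""
    return 1.0 - discount


def commitment_pays_off(utilization: float, discount: float) -> bool:
    """True when the committed price beats paying on demand."""
    return utilization > break_even_utilization(discount)
```

With a hypothetical 40% discount, the VM has to be running more than 60% of the time for the commitment to win. Below that, paying on demand is cheaper.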
Use Spots?
The 3 major clouds provide spot VMs.
A few characteristics that differentiate a Spot VM:
- Spot VMs are preemptible in nature: the cloud can take away your VMs when someone else is ready to pay the usual premium (or dedicated) price.
- They don't have the same SLAs you'd expect normal VMs to have.
- They are super cheap (savings can go up to 80% and more).
While the cloud might scare you by saying Spots can go away whenever there's demand, you can work around this constraint by either overprovisioning Spots or running your workload on a combination of dedicated & spot VMs. It is often better to do both.
Clouds also try to give you your Spot VMs back whenever capacity is available again.
More info on GCP's Spot VMs can be found here, Azure's documentation here & AWS's here.
Simply put - this is a super great lever to reduce cost - use it.
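For a rough feel of the numbers, here is a small sketch of the two tricks mentioned above: mixing dedicated and spot VMs, and overprovisioning spots. The function names, prices and preemption fraction are hypothetical inputs, not figures from any cloud.

```python
import math


def blended_hourly_cost(total_vms: int, spot_fraction: float,
                        on_demand_hourly: float, spot_discount: float) -> float:
    """Hourly cost of a fleet where `spot_fraction` of the VMs are spot."""
    spot_vms = total_vms * spot_fraction
    dedicated_vms = total_vms - spot_vms
    spot_hourly = on_demand_hourly * (1.0 - spot_discount)
    return dedicated_vms * on_demand_hourly + spot_vms * spot_hourly


def spot_vms_to_provision(needed_vms: int, expected_preempted_fraction: float) -> int:
    """Overprovision so `needed_vms` are still up when the expected
    fraction of spot VMs is preempted at the same time."""
    if not 0.0 <= expected_preempted_fraction < 1.0:
        raise ValueError("expected_preempted_fraction must be in [0, 1)")
    return math.ceil(needed_vms / (1.0 - expected_preempted_fraction))
```

For example, if you need 9 spot VMs alive and assume a quarter of them can vanish at once, `spot_vms_to_provision(9, 0.25)` says to run 12. The preemption fraction is something you would have to estimate from your own history.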
Improve your CPU & Memory Utilization
If your workload is not consuming a significant chunk of your CPU and / or memory, FIX IT.
Resources wasted is $$$ burnt
Provision what is required
Right-size your VM.
Provision what you need and not what you feel like you need. I have seen cases of VMs provisioned with little regard for what the actual need is. To some extent that is fine: without running something, you wouldn't know what is required anyway.
Right-size resources for your workload
If you are using K8s, you specify what you need. If you give too few resources, you'll see pods getting killed or services being slow. If you provision more than you need, you'll just have something running that doesn't make use of everything that's available.
You can always go back and fix the sizing of your workloads and / or sizing of your VM.
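As a sketch of how you might spot both failure modes, the snippet below compares observed peak usage against a container's requests and limits. The class, the function and the 50% waste threshold are hypothetical. The only K8s behavior assumed is that a container exceeding its memory limit can be OOM-killed and one hitting its CPU limit gets throttled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerResources:
    cpu_request_cores: float
    cpu_limit_cores: float
    memory_request_mib: float
    memory_limit_mib: float


def diagnose(res: ContainerResources, peak_cpu_cores: float,
             peak_memory_mib: float, waste_threshold: float = 0.5) -> List[str]:
    """Return human-readable findings for one container."""
    findings = []
    if peak_memory_mib >= res.memory_limit_mib:
        findings.append("memory at limit: risk of the pod being OOM-killed")
    if peak_cpu_cores >= res.cpu_limit_cores:
        findings.append("cpu at limit: risk of throttling (slow service)")
    # Peak usage far below the request means you are paying for idle capacity.
    if peak_cpu_cores < res.cpu_request_cores * waste_threshold:
        findings.append("cpu over-provisioned: consider a smaller request")
    if peak_memory_mib < res.memory_request_mib * waste_threshold:
        findings.append("memory over-provisioned: consider a smaller request")
    return findings
```

A container with 2 cores requested but a peak of 0.4 cores would be flagged as CPU over-provisioned, while one sitting at its memory limit would be flagged as at risk of being killed.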
Understand what your workloads consume - Dashboards!
One easy way to find the right amount of compute for your workload is to perform load testing and understand how much load the workload is expected to handle.
Clouds provide you the visibility to see how well you utilize your machines. Use this data.
Every cloud has a Monitoring offering. Just plot the CPU usage and memory usage, and use that as a signal to optimize.
Note, for emphasis - reducing your resource wastage directly helps you reduce your infra burn!
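Once you have exported utilization samples from your monitoring tool, turning them into a sizing suggestion can be as simple as the sketch below. It assumes the samples are CPU utilization as a fraction of the provisioned vCPUs. The 95th percentile and the 20% headroom are arbitrary starting points, and the function names are made up.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggested_vcpus(provisioned_vcpus: int, cpu_utilization: Sequence[float],
                    pct: float = 95, headroom: float = 0.2) -> int:
    """Suggest a vCPU count from observed utilization plus some headroom."""
    busy_fraction = percentile(cpu_utilization, pct)
    needed = provisioned_vcpus * busy_fraction * (1.0 + headroom)
    return max(1, math.ceil(needed))
```

A 16 vCPU machine that never goes above 30% utilization comes out at 6 vCPUs with these defaults, which is a hint that a smaller machine type is worth testing.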
Understand your workload better
Every workload / service / whatever is expected to do something. It can be loading a 10GB file into memory or exposing a service that needs to handle 10k QPS.
Understand the limits of your workload when run on specific compute, and make an informed decision about how many resources are needed to fulfil the objective of the workload.
Take the example of a REST service running on K8s. The service is expected to handle 10k QPS with p99 latencies under 100ms.
In this case, simply perform crisp load tests and get the most out of 1 replica under the latency constraints, then scale up the number of replicas accordingly. If your replica can do 1k QPS with p99s under 100ms, then you can go with 10 replicas. It never hurts to add a buffer if you aren't sure about the load.
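The replica arithmetic above fits in a few lines. This is only a sketch: `replicas_needed` and its `buffer` parameter are hypothetical names, and the per-replica QPS has to come from your own load test.

```python
import math


def replicas_needed(target_qps: float, qps_per_replica: float,
                    buffer: float = 0.0) -> int:
    """Replicas required to serve `target_qps`, given what one replica can
    sustain within the latency budget, plus an optional safety buffer."""
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    return math.ceil(target_qps * (1.0 + buffer) / qps_per_replica)
```

`replicas_needed(10_000, 1_000)` gives the 10 replicas from the example, and a 25% buffer bumps that to 13.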
Different contexts will have different autoscaling approaches.
Taking the example of a REST service, if you see that your traffic isn't consistent and has its ups and downs, you could use autoscaling and always ensure that resources are used optimally.
Another example is running a Spark workload - let Spark decide how many resources it wants (it anyway does that with Dynamic Allocation)!
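For reference, these are the Spark properties that control Dynamic Allocation, wrapped in a small helper that renders them as `spark-submit` arguments. The property names are Spark's own; the helper functions and the executor bounds are made up for illustration. Depending on your Spark distribution, Dynamic Allocation may already be on or may need to be enabled explicitly, and it needs either an external shuffle service or shuffle tracking (Spark 3.0+).

```python
from typing import Dict, List


def dynamic_allocation_conf(min_executors: int, max_executors: int) -> Dict[str, str]:
    """Spark properties that let Spark scale executors with the workload."""
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Alternative to running an external shuffle service (Spark 3.0+).
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    }


def as_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render a property dict as `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The upper bound matters for cost: without a sensible `maxExecutors`, a badly behaved job can scale itself into a large bill.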
In 2022, it takes less than 2 mins for a VM to be up. Autoscaling is a powerful lever. Also know what to autoscale on: traffic, resource utilization, etc.
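If you autoscale on K8s, the Horizontal Pod Autoscaler's scaling rule is worth knowing: desired replicas = ceil(current replicas * current metric / target metric). Below is a simplified sketch of that rule. The function name, the replica bounds and the example metric values are assumptions, and the real controller does more (stabilization windows, readiness handling and so on).

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 100, tolerance: float = 0.1) -> int:
    """Simplified version of the HPA scaling rule."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 10 replicas at 90% CPU against a 60% target, this scales out to 15; at 30% CPU it scales in to 5.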
So far we have treated the workload as a black box and tried to get the most out of it via load tests, right-sizing the resources, autoscaling, etc.
Once these levers are exhausted (in my opinion they are the low-hanging fruit and absolutely necessary), you should right away explore code improvements.
Simply put - optimal code requires less time to execute, thus reducing your actual spend.
Take a REST service where your replica can handle more QPS than it could with the unoptimized code, or a Spark job that no longer takes 1 hour to complete because you fixed its parallelism.
Code optimizations are almost always specific to your code. What an obvious statement. Well, what I mean is, Spark would have Spark optimizations, Python would have its own optimizations. There are also concepts like parallelism, concurrency, and async (and improving IO) that translate to most frameworks and workloads.
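As one small illustration of the async / IO point, here is a sketch in Python using `asyncio`. The IO call is faked with a sleep, so `fake_io_call` and the 50ms delay are stand-ins for whatever your service actually waits on (a database, another API).

```python
import asyncio
import time
from typing import List, Tuple


async def fake_io_call(item: int, delay: float = 0.05) -> int:
    """Stand-in for a network or disk call that mostly waits."""
    await asyncio.sleep(delay)
    return item * 2


async def sequential(items: List[int]) -> List[int]:
    """Wait for each call to finish before starting the next one."""
    return [await fake_io_call(i) for i in items]


async def concurrent(items: List[int]) -> List[int]:
    """Start all calls at once and wait for them together."""
    return list(await asyncio.gather(*(fake_io_call(i) for i in items)))


def timed(coro_fn, items: List[int]) -> Tuple[List[int], float]:
    """Run a coroutine function and return its result and wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(items))
    return result, time.perf_counter() - start
```

Both versions return the same results, but the concurrent one finishes in roughly the time of a single call instead of the sum of all of them. Less time waiting per request means more QPS per replica, which means fewer replicas.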
You do you.
Lastly - you could always complain about your Infra Expenses to your Cloud Customer Success Manager and get a better deal out of it :)