Kubernetes Cost Optimisation with Spot Instances

09.04.2021 | 7 min read

Michał Machowski

Lead Cloud Engineer (DevOps)

Kubernetes is a widely used open-source platform that has grown tremendously in the last few years. It has become a primary container management system for deploying, scaling, and managing containerized applications. Kubernetes supports various container runtimes, such as Docker, CRI-O, and containerd. This open-source platform gives us the freedom to take advantage of on-premise, hybrid, or public cloud infrastructure.

Amazon Web Services (AWS) is a popular place to run Kubernetes. According to the Cloud Native Computing Foundation, 63 percent of Kubernetes workloads run on AWS. Any organization that runs big data workloads has three key requirements: performance, scalability, and low cost. As data size increases, more compute capacity is needed, and the cost increases with it.

Running Kubernetes and containerized workloads on Amazon EC2 Spot Instances is an effective way to optimize cost. In this article, we will learn how Spot Instances help reduce costs. But let's begin by touching on the Spot Instance Advisor tool.

Using Spot Instance Advisor

Before we start using Spot Instances, it is worth consulting the Spot Instance Advisor, which shows which instance types are worth using, how often they are interrupted, and which region to choose. The advisor helps you find the pools with the lowest chance of interruption and shows the savings you get over On-Demand rates.

Spot Instances Overview

Amazon EC2 Spot Instances allow you to take advantage of unused EC2 capacity in the AWS cloud. These are spare EC2 instances that let you save up to 90% compared to On-Demand prices. Spot Instances can be used for various fault-tolerant, stateless, or flexible applications, such as containerized workloads, big data, web servers, CI/CD, test and development workloads, and high-performance computing (HPC).

Spot Instances are tightly integrated with AWS services such as EMR, Auto Scaling, ECS, Data Pipeline, CloudFormation, and AWS Batch. You can choose how to launch and maintain applications running on Spot Instances. Spot capacity is divided into pools, each determined by instance type, Availability Zone, and AWS Region. AWS sets the price for each Spot Instance pool based on trends in supply and demand.

When EC2 reclaims Spot capacity, it gives two minutes of notice, and you can choose whether the interrupted Spot Instances are stopped, hibernated, or terminated. There are several ways to handle the interruption so that the application is well-architected for fault tolerance and resilience.

Spot Instance Interruption Handling

Use the AWS Node Termination Handler to reduce the effect of Spot Instance interruptions. It runs as a pod deployed by a DaemonSet on each Spot Instance and detects the interruption notification. It then cordons and drains the node, evicting the pods running on it so that Kubernetes can reschedule them elsewhere in the cluster.

Examples of when the Node Termination Handler is useful

The Node Termination Handler comes into its own not only when Spot Instances are reclaimed, but also in the following cases:

EC2 maintenance events - AWS can schedule events for your instances, such as a reboot, stop/start, or retirement. Note that these events do not happen often.

ASG Scale-In - In the following three scenarios, the Auto Scaling group is directed to detach the EC2 instances from the group and terminate them:

When you manually decrease the size of the group.
When you create a scaling policy to automatically decrease the size of the group based on a specified decrease in demand.
When you set up scaling by schedule to decrease the size of the group at a specific time.

ASG AZ Rebalance - Rebalancing activities fall into two categories: Availability Zone rebalancing and capacity rebalancing.

EC2 Instance Termination via the API or Console - You can terminate an instance using the AWS Management Console or the command line.

Here is a summary of the workflow for handling a Spot Instance interruption.

Detect that a Spot Instance interruption will happen in two minutes.
Use the two-minute notification window to prepare the node for termination.
Taint and cordon the node to prevent new pods from being placed on it.
Gracefully drain connections on the running pods.
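
The detection step in the workflow above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Node Termination Handler itself is implemented: it reads the Spot interruption notice from the EC2 instance metadata endpoint and works out how much of the two-minute window is left. The function names are our own, and the optional token parameter assumes you have already obtained an IMDSv2 session token elsewhere.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple
from urllib.request import Request, urlopen

# Instance metadata path that exposes the Spot interruption notice.
# It answers with HTTP 404 until an interruption has been scheduled.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_instance_action(body: str) -> Tuple[str, datetime]:
    """Parse a notice such as {"action": "terminate", "time": "2021-04-09T08:22:00Z"}."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return notice["action"], when.replace(tzinfo=timezone.utc)


def seconds_left(when: datetime, now: datetime) -> float:
    """Seconds remaining before EC2 acts on the instance (never negative)."""
    return max(0.0, (when - now).total_seconds())


def fetch_notice(token: Optional[str] = None, timeout: float = 1.0) -> Optional[Tuple[str, datetime]]:
    """Return the pending notice, or None when there is none.

    Simplification for this sketch: every network or HTTP error
    (including the usual 404) is treated as "no notice yet".
    """
    headers = {"X-aws-ec2-metadata-token": token} if token else {}
    try:
        with urlopen(Request(INSTANCE_ACTION_URL, headers=headers), timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except OSError:
        return None
```

A handler would call fetch_notice() every few seconds and, once it returns a notice, use seconds_left() to decide how much time remains for cordoning and draining the node.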

Reasons to use Spot Instances

Provides cost optimization
Supports other AWS services
Massive scale
Easy to automate
Easy to use

How do Spot Instances help in cost optimization?

With Spot Instances, you pay the Spot price that is in effect for the time your instances are running. Because Spot Instances can cost up to 90% less than On-Demand instances, you can afford to scale out further on the same budget and get results faster.
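
To make the arithmetic concrete, here is a small Python sketch that compares On-Demand and Spot costs and shows how many Spot nodes fit into the On-Demand budget. The hourly prices used in the usage note are made-up placeholders, not real AWS prices, and the function names are our own.

```python
def spot_savings(on_demand_hourly: float, spot_hourly: float, hours: float, nodes: int = 1) -> dict:
    """Compare the cost of running `nodes` instances for `hours` hours."""
    on_demand_cost = on_demand_hourly * hours * nodes
    spot_cost = spot_hourly * hours * nodes
    return {
        "on_demand_cost": round(on_demand_cost, 2),
        "spot_cost": round(spot_cost, 2),
        "saved": round(on_demand_cost - spot_cost, 2),
        "saved_percent": round(100 * (1 - spot_hourly / on_demand_hourly), 1),
    }


def spot_nodes_for_budget(budget: float, spot_hourly: float, hours: float) -> int:
    """How many Spot nodes can run for `hours` hours within `budget`."""
    return int(budget // (spot_hourly * hours))
```

With the placeholder prices of 0.10 per hour On-Demand and 0.03 per hour Spot, spot_savings(0.10, 0.03, 720, nodes=10) reports a saving of 70%, and spot_nodes_for_budget(720, 0.03, 720) shows that the budget for 10 On-Demand nodes would cover 33 Spot nodes for the same month.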

Here are two key strategies for leveraging spot instances.

1. Maintain a minimum number of nodes in both the Spot and the On-Demand node groups, and then autoscale them using EC2 Auto Scaling and Cluster Autoscaler. The Cluster Autoscaler integrates with Auto Scaling groups. When there are not enough resources, pods (for example, the executor and driver pods of a Spark job) go into the Pending state. The autoscaler detects the pending pods and adds worker nodes using EC2 Auto Scaling.

2. Scaling for Spot and On-Demand nodes is independent of each other. When we launch applications, the executor and driver pods can be scheduled in different node groups depending on their resource requirements. This adds resilience and cost optimization to the system by reducing job failures caused by a lack of resources.

Spot Instances best practices

Instance type requirements, application design, and your budget determine how to apply these best practices.

Use capacity-optimized allocation strategy.

Allocation strategies in Auto Scaling groups allow you to deploy the target capacity without manually looking for Spot Instance pools with free capacity. The capacity-optimized strategy is recommended because it automatically launches Spot Instances from the pools with the most available capacity.

Be flexible about instance types.

A Spot Instance pool is a set of unused EC2 instances with the same instance type and Availability Zone. Be flexible about both the instance type and the Availability Zone for your workload. This gives Spot a better chance of finding and allocating the required amount of compute capacity.

Use integrated AWS services for managing Spot Instances

Spot instances can integrate with other AWS services to reduce the compute cost without managing the individual fleets or instances. Amazon EMR, AWS Batch, Amazon ECS, SageMaker, Amazon EKS, Amazon GameLift, and AWS Elastic Beanstalk are recommended solutions for application workloads.

Use proactive Capacity Rebalancing.

Capacity Rebalancing helps maintain workload availability by proactively adding new Spot Instances to the fleet before a running instance gets the two-minute interruption notice. When you enable the Capacity Rebalancing feature, Auto Scaling attempts to replace Spot Instances that have received a rebalance recommendation.
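
As a rough illustration, three of the practices above (the capacity-optimized allocation strategy, several instance types, and Capacity Rebalancing) can be expressed as parameters of a single Auto Scaling group. The Python sketch below only builds the request as a dictionary in the shape that boto3's create_auto_scaling_group call accepts. The helper name, group name, launch template name, subnet IDs, and instance types are all hypothetical.

```python
from typing import Any, Dict, List


def spot_asg_request(
    name: str,
    launch_template: str,
    instance_types: List[str],
    subnet_ids: List[str],
    min_size: int,
    max_size: int,
    on_demand_base: int = 1,
) -> Dict[str, Any]:
    """Build keyword arguments for an Auto Scaling group that prefers Spot."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        # Several subnets (one per Availability Zone) keep the group flexible.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Replace Spot Instances proactively on a rebalance recommendation.
        "CapacityRebalance": True,
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # One override per instance type we are willing to run on.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # Keep a small On-Demand base; everything above it is Spot.
                "OnDemandBaseCapacity": on_demand_base,
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }
```

With boto3 installed and credentials configured, the result could be passed on as boto3.client("autoscaling").create_auto_scaling_group(**request). In practice you may prefer to keep this configuration in Terraform, CloudFormation, or an EKS managed node group definition.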

Cost saving Use Cases with Spot Instances

1. Big Data and Analytics

Fast-track big data, natural language processing (NLP), and machine learning workloads with Spot Instances. These instances provide scaling, acceleration, and cost optimization for hyper-scalable, time-critical workloads that need rapid data analysis. Spot Instances can be used with Hadoop, Amazon EMR, or Spark to handle massive amounts of data.

2. CI/CD and testing

CI/CD workloads are typically fault-tolerant, which makes them well suited to the cost savings provided by Spot Instances. CI processes also tend not to need much compute power for testing, which increases the savings further. Integration, load, security, and canary testing all benefit from the price savings and elasticity associated with Spot Instances.

3. Containerized workloads

Containers are often fault-tolerant and stateless, which makes them a good fit for Spot Instances. These instances can be used with Amazon Elastic Container Service (ECS) to run containerized workloads. You can also create Spot clusters with Kubernetes, Amazon EKS, or Amazon ECS to process containerized workloads.

4. High-performance computing

Amazon Web Services provides scalable and elastic cloud infrastructure for running HPC applications. Accelerate embarrassingly parallel or loosely coupled HPC workloads, such as algorithmic trading and genomic sequencing. Integrate Spot Instances with AWS CloudFormation, AWS Batch, and other AWS services to build a complete solution for large-scale computing workloads.

5. Web Services

Save costs or scale out to many instances for various web applications and services, ranging from real-time bidding servers to ad servers. Using Spot Instances, you can maintain optimal performance at a low cost during peak traffic. Deploy a fleet of EC2 instances behind a load balancer using Auto Scaling.

Increasing Amazon EKS availability while using EC2 Spot

At 10Clouds, we always aim to offer our clients top cost-efficiency coupled with a high-performing product. For this reason, we use the ‘Priority’ Expander with the Kubernetes Cluster Autoscaler. In a nutshell, this allows us to set rules for the prioritisation of node pool autoscaling decisions. In doing so, it creates a fallback mechanism from Spot Instance node groups to On-Demand node groups for stateless workloads.

Below is a brief overview of cluster autoscalers and expanders.

Cluster autoscalers - “Cluster Autoscaler is a standalone program that adjusts the size of a Kubernetes cluster to meet the current needs.”

Expanders - “When Cluster Autoscaler identifies that it needs to scale up a cluster due to unscheduled pods, it increases the number of nodes in some node groups. When there is one node group, this strategy is trivial. When there is more than one node group, it has to decide which to expand.”
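
To illustrate the fallback idea, here is a simplified Python model of how a priority-based choice between node groups could work. It is not the Cluster Autoscaler's actual code. The priorities mirror the shape the priority expander reads from its ConfigMap (a higher number means a higher priority, and the values are regular expressions matched against node group names), but the group names and numbers are hypothetical, and returning all candidates when nothing matches is our own assumption for this sketch.

```python
import re
from typing import Dict, List

# Hypothetical priorities: prefer Spot node groups, fall back to On-Demand.
PRIORITIES: Dict[int, List[str]] = {
    50: [r".*spot.*"],
    10: [r".*on-demand.*"],
}


def pick_node_groups(candidates: List[str], priorities: Dict[int, List[str]]) -> List[str]:
    """Return the candidate groups that match the highest-priority patterns."""
    for priority in sorted(priorities, reverse=True):
        patterns = priorities[priority]
        matched = [g for g in candidates if any(re.search(p, g) for p in patterns)]
        if matched:
            return matched
    # Assumption for this sketch: no pattern matched, so express no preference.
    return list(candidates)
```

While the Spot group can still scale up, pick_node_groups(["workers-spot", "workers-on-demand"], PRIORITIES) selects only the Spot group. Once the Spot group is no longer a candidate, for example because no Spot capacity is available, the same call with only "workers-on-demand" selects the On-Demand group, which is the fallback behaviour described above.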

Looking for a quick solution? If you find that fallback isn’t right for you, you might be interested in some of the quick solutions offered by Spot, which uses ML and Analytics to automate and optimize cloud infrastructure on AWS, Azure and GCP.

Conclusion

Whether you want to augment capacity without increasing your budget or have flexible computing needs, Spot Instances can be a great way to do so. Achieve cost optimization and resilience by following the best practices for deploying Kubernetes workloads on Spot Instances.
