
Kubernetes Vertical Pod Autoscaling – A Refresher

by Vamsi Chemitiganti

Kubernetes and cloud-native applications are built on the concepts of cluster and horizontal scaling (both covered in the last two blogs – https://www.vamsitalkstech.com/kubernetes/kubernetes-horizontal-pod-autoscaling-a-refresher/ and https://www.vamsitalkstech.com/kubernetes/kubernetes-cluster-autoscaling-a-refresher/), so the ability to scale pods vertically is an interesting capability. What are the key use cases where a VPA (Vertical Pod Autoscaler) would be handy?

The autoscaling paradigm in K8s is that when user traffic, system usage, or both increase, new pods are created so that the application can handle the additional load. This needs some upfront effort and diligence in terms of infrastructure setup, as covered here – https://www.vamsitalkstech.com/kubernetes/kubernetes-cluster-autoscaling-a-refresher/ – and here – https://www.vamsitalkstech.com/kubernetes/kubernetes-horizontal-pod-autoscaling-a-refresher/: mainly understanding resource requests and optimizing infrastructure to handle them, configuring appropriate pod disruption budgets, and ensuring that pods have an appropriate amount of resources. While bin packing pods on a host helps squeeze more performance out of the hardware, buffers need to be kept to account for the occasional node or pod failure. All of this depends on the application under consideration and on benchmarking.

Introducing the Vertical Pod Autoscaler

Kubernetes has a mechanism called the VPA (Vertical Pod Autoscaler) that can either increase or decrease the CPU and memory requests allocated to pods. Doing so frees up resources for other pods to serve users. The VPA works as shown in the diagram below. Autoscaling is configured via a CRD object called the VerticalPodAutoscaler. The admin can specify which pods should be vertically scaled, other resource recommendations, and when they should be applied.
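As a minimal sketch, a VerticalPodAutoscaler object might look like the following – the Deployment name `my-app` and the resource bounds are illustrative assumptions, not values from any real cluster:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  # Which workload's pods the VPA should manage (name is hypothetical)
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  # "Auto" lets the VPA evict pods so they restart with updated requests
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # Guard rails so recommendations stay within sane bounds
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

The `resourcePolicy` bounds are optional but useful in practice: they keep a noisy usage spike from driving a recommendation far above what the nodes can actually schedule.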

The VPA computes CPU/memory usage for the containers in a pod and uses these metrics to determine the optimal values in the context of the pod's requests, ensuring pods are operating at the right level. As a result, the VPA works to reduce resources for pods that are over-requesting resources while not using them, and at the same time it raises resources for pods that are not requesting enough. The VPA also evicts pods whose requests are out of alignment and uses an admission webhook to update pods before they are scheduled onto nodes. (The VPA can also be configured not to delete pods.) As an example, if a pod is consistently using 60% of CPU but requests only 20%, the VPA will evict the pod; its owning object – a Deployment, for example – then recreates it, and the webhook updates the new pod with the needed resources.
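Configuring the VPA not to delete pods amounts to setting its update mode to "Off", in which it only publishes recommendations. A minimal sketch, where the Deployment name `my-app` is again an illustrative assumption:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app   # hypothetical workload name
  updatePolicy:
    # "Off" records recommendations in the VPA object's status
    # without evicting or mutating any running pods
    updateMode: "Off"
```

The computed recommendations can then be inspected with `kubectl describe vpa my-app-vpa-recommender` and applied to the workload manually – a common first step before trusting the VPA to evict pods on its own.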

So what are some common VPA auto-scaling considerations?

  1. Optimize for Cost based on your cloud provider
  2. Understand the application ramifications from a business and end-user standpoint
  3. Use a version of the VPA that matches your K8s version – especially when going DIY
  4. VPA cannot be used alongside HPA on the same CPU/memory metrics; the two can only be combined when the HPA scales on custom metrics

Conclusion

The VPA is another facility provided in K8s to ensure that pods are right-sized in terms of resource usage. The VPA constantly monitors resource usage and adjusts resource requirements so that capacity is available to contending application workloads. The VPA also maintains the limit-to-request ratios specified by the admin, and it can manage pods associated with any workload API resource, such as a DaemonSet, ReplicationController, Deployment, or StatefulSet.

