
Is Istio Production Ready & When is a Service Mesh Overkill?

by Vamsi Chemitiganti

KubeCon 2018 was designated the ‘Year of the Service Mesh’. The chief reason service meshes exist is so that cloud-native deployments need not bake a range of cross-cutting concerns (traffic shaping, security, observability, etc.) into each microservice. Using Envoy as a sidecar, and sometimes as an ingress, a mesh separates network failures from application failures so that the application does not need to know or care about platform logic. That is the promised land. However, this is still a brand-new technology space, and early adopters will run into the issues common to all young open source projects.


How Dependable Are Service Meshes in Late 2020?

Last week’s post introduced and discussed Service Mesh technology: Why Legacy Monolithic Architectures Won’t Work For Digital Platforms.

A few common questions I find myself discussing with a lot of enterprise customers are:

  1. When is it the correct time to introduce a Service Mesh into my application architecture?
  2. What are the antipatterns if any?
  3. What are the technical limitations as well as experiences from a performance standpoint?
  4. What are the differences between API gateways and Service Meshes, and when do I need each?

As the technology matures over the next few quarters, Service Meshes should become the single pane of glass for most microservices deployments, as well as the best way to handle interservice issues, traffic policies, security, and so on. However, it needs to be remembered that they are still a brand-new category of technology.

In a nutshell, the answer to the question in the title is: “No, not yet”.

Before I delve into the reasons, please understand that a service mesh intends to make your microservices implementation more robust by providing foundational services in a few areas: dynamic routing/traffic shaping (for canary deployments, A/B testing, etc.), resilience (circuit breaking), security (encryption, mTLS, auditing between services), and observability (measuring latency, uptime, and usage patterns).
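To make the traffic-shaping piece concrete, here is a minimal sketch of an Istio VirtualService that splits traffic between a stable and a canary subset of a hypothetical `reviews` service, built as a plain Python dict so the weight arithmetic can be sanity-checked without a cluster. The host and subset names are illustrative, not from this post.

```python
def canary_virtual_service(host: str, stable_weight: int, canary_weight: int) -> dict:
    """Return an Istio VirtualService manifest splitting traffic by weight.

    The weights must sum to 100, as Istio requires for a weighted route.
    """
    if stable_weight + canary_weight != 100:
        raise ValueError("route weights must sum to 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": "stable"},
                     "weight": stable_weight},
                    {"destination": {"host": host, "subset": "canary"},
                     "weight": canary_weight},
                ]
            }],
        },
    }

# Send 10% of traffic to the canary subset of the (hypothetical) reviews service.
vs = canary_virtual_service("reviews", 90, 10)
print(vs["spec"]["http"][0]["route"][1]["weight"])  # → 10
```

Serialized to YAML, this is the shape of manifest the VirtualService CRD expects; the point is that this routing logic lives in mesh configuration, not in application code.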

I have been part of many customer interactions around the Managed Istio offering. Let me distill the essence of my conversations & experiences into the following.

First off, the answer to the question in the blog headline is a little multilayered, so I will break it down into three areas –

  1. Application Design & Organization Issues
  2. Performance Issues
  3. DevOps

Application Design and Organization Specific Issues

In general, the complexity of service meshes can be overkill for smaller projects.

The following issues stand out –

  1. Your application is just starting out with cloud-native design, with fewer than (say) 10 microservices running in just a few (say, fewer than 3) clusters
  2. Your microservices are coarse-grained in nature & support more of an SOA-style architecture, as opposed to lighter microservices fronted with a simple REST API. Technologies such as Istio assume it’s one microservice per container, which makes running legacy services a bit of a challenge
  3. Your application development is primarily in one language with a DIY (do it yourself) framework that manages some of the cross-cutting complexity in these services
  4. You have not mastered the steep learning curve that comes with running containers (and K8s) at scale
  5. Your application does not need to be deployed in multi-tenant usage scenarios
  6. You are deployed into a single cloud (AWS, Azure, VMware, etc) and don’t need to support hybrid cloud deployments across a growing number of microservices
  7. You are deploying workloads that are extremely latency-sensitive
  8. You work in a highly regulated industry with strict PCI/SOX and/or FedRamp security requirements
  9. You have a small operations team that is already taxed maintaining existing clusters and has no bandwidth to support newer technology
  10. You do not have the budget to potentially build out a Service Mesh support capability in your organization

Performance issues & the need to benchmark your application

Now, let us harken back to the Istio architecture as discussed in the last post – http://www.vamsitalkstech.com/?p=8746

To recap quickly, Istio consists of a control plane and a data plane. Both of these components (especially the control plane) have a lot of moving parts which can cause deployment complexity.

The control plane, Istiod, configures the Envoy sidecars based on user configuration (CRDs and deployments). In a large-scale production deployment, the control plane is intended to manage thousands of pods across hundreds of VMs/bare-metal servers running K8s clusters.

The performance of the control plane depends on:

  • How fast deployments change
  • How frequent your config changes are
  • Number of proxies connecting to the control plane

It is recommended to benchmark Istiod’s vCPU and memory usage at peak load (x number of services, 2x number of sidecars, etc.) on a per-namespace basis.
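As a starting point for such a benchmark, a back-of-the-envelope estimator like the following can frame expectations. The coefficients here are placeholder assumptions, not official Istio figures; replace them with values measured in your own environment.

```python
def istiod_sizing_estimate(num_proxies: int,
                           pushes_per_min: float,
                           mib_per_proxy: float = 0.5,
                           vcpu_per_100_pushes: float = 0.25,
                           base_mib: float = 512,
                           base_vcpu: float = 0.5) -> dict:
    """Rough Istiod resource estimate scaling linearly with proxy count
    and config-push rate. All coefficients are illustrative assumptions."""
    memory_mib = base_mib + num_proxies * mib_per_proxy
    vcpu = base_vcpu + (pushes_per_min / 100) * vcpu_per_100_pushes
    return {"memory_mib": memory_mib, "vcpu": vcpu}

# e.g. 2000 connected sidecars receiving 400 config pushes per minute
print(istiod_sizing_estimate(2000, 400))  # → {'memory_mib': 1512.0, 'vcpu': 1.5}
```

The linear model mirrors the three factors above: more proxies mean more memory held per connected sidecar, and faster deployment/config churn means more CPU spent recomputing and pushing Envoy configuration.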

Data plane performance depends on the number of client connections coming in, the request size and response rate, the number of CPUs, and any telemetry-related filters running in the client service.

It needs to be mentioned that the sidecar proxy does add an extra hop, whose cumulative overhead can reach tens of milliseconds as the number of services in a call chain continues to grow. This may be an issue for certain kinds of workloads, such as databases.
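A quick way to reason about that latency budget: each service-to-service call traverses two sidecars (the client’s and the server’s), so the added latency compounds with call-chain depth. A toy calculation, with an assumed per-proxy overhead:

```python
def mesh_latency_overhead_ms(chain_depth: int, per_proxy_ms: float) -> float:
    """Added latency for a synchronous call chain of `chain_depth` hops.

    Each hop traverses two Envoy proxies (client sidecar + server sidecar),
    so total overhead = 2 * depth * per-proxy latency. The per-proxy figure
    is an assumption to be measured in your own benchmarks.
    """
    return 2 * chain_depth * per_proxy_ms

# A 5-service-deep chain at an assumed 1.5 ms per proxy
print(mesh_latency_overhead_ms(5, 1.5))  # → 15.0
```

This is why deep synchronous call chains and latency-sensitive workloads deserve benchmarking before and after sidecar injection.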

DevOps and Service Mesh

It is key to understand that unless your current DevOps pipelines already have the agility to exploit them, technologies such as Istio will add technical debt rather than value.

So a Service Mesh may not be the best technology if you have –

  1. Very Low Deployment Frequency – you deploy apps once every few weeks and do not need features such as A/B testing or canary deployments
  2. Low Change Volume – you deploy very small amounts of functionality/user stories per deployment, which means each deployment carries little risk
  3. Tolerance for High App Deployment Times – you have the ability to take the application offline while deployments are done

Logging, Observability, and Service Meshes

Observability addresses the challenge of maintaining a running Istio environment by monitoring the telemetry emitted for service communications within a mesh – both by the mesh components themselves and by the microservices. Istio also generates distributed trace spans that provide an understanding of call flows within the mesh. Logging can be handled with, for instance, the EFK (Elasticsearch, Fluentd, Kibana) stack. This enables operators to audit service behavior at the workload level. These tools should be used to test extensively how your application behaves and performs with Istio.
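As an example of putting that telemetry to work, the span durations exported by a tracing backend (such as Jaeger) can be reduced to latency percentiles with a few lines of stdlib Python. The durations below are simulated stand-ins for real trace data:

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile of a list of latency samples (ms)."""
    s = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[idx]

# Simulated span durations in milliseconds, stand-ins for spans
# exported by a tracing backend for one service's inbound calls.
durations = [1.0 * i for i in range(1, 101)]
print(percentile(durations, 50), percentile(durations, 99))  # → 50.0 99.0
```

Tracking p50 vs. p99 per workload, before and after enabling the mesh, is a simple way to quantify the sidecar overhead discussed earlier.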

Conclusion

I will just restate my conclusion from the blog last week – “As they mature and in a matter of a few quarters, Service Mesh technology should become the single pane of glass for most microservices deployments as well as the best solution to fix interservice issues, traffic policies, security, etc. However, it needs to be remembered that they are still a brand new category of technology.”

