
Big Data & Serverless (on K8s) – Industry Architectures..(3/4)

by Vamsi Chemitiganti
The previous blog in this four-part series on Big Data & Kubernetes discussed integration points between Spark and Kubernetes. Let us now consider multiple integration possibilities for Hadoop/Spark with a Serverless functions (FaaS) platform such as Fission.  While I have written extensively about Fission over the last 18 months, the concepts outlined below apply to any industrial-strength FaaS platform.

Photo by Hsiao Shiyan on Unsplash

Introduction

At their core, Serverless functions are about running pure business logic or ‘functions’ inside a container. These functions could be anything that makes sense to a business application. Some common examples include:

  • Payment Processing
  • Alerting – sending an alert to a Security Ops Console
  • Stream Processing – enriching a message with customer information
  • Credit checks
  • Insurance claim submission

Common use cases primed for Serverless functions span several key industries, such as:

  1. Power & Utilities – where streaming data is intercepted and analyzed to perform demand forecasting, equipment reliability management, etc.
  2. Oil and Natural Gas Industry – Serverless is employed to perform log analytics to provide Single Views of Oil Wells
  3. Advertising – serving mobile & online users with instant & relevant offers based on their browsing history, location and buyer characteristics.
  4. Financial Services – such as banking and trading applications – where various facets of user data are populated in real-time in the customer portal.
  5. Healthcare – for example, the population of health management data

Let us examine some design patterns for common industry use cases to see how Serverless can help you accelerate innovation and software delivery for data-driven applications. We’ll use the open-source Fission serverless framework to illustrate these architectures because it is highly flexible and not locked to a specific cloud provider or its services, so your applications can run anywhere; you can swap in any serverless/cloud offering of your choice.
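To ground the discussion, here is a minimal sketch of what a Fission function looks like: pure business logic in a small Python module that the platform wraps in a container and invokes on demand. The module name and registration commands below are illustrative; in Fission's Python environment the default entry point is a function named main().

```python
# hello.py -- a minimal Fission function (Python environment); names are illustrative.
# It would be registered with the CLI roughly as:
#   fission function create --name hello --env python --code hello.py
# and exposed via an HTTP trigger so that any application can invoke it.

def main():
    # Pure business logic lives here; Fission handles containers, scaling, and routing.
    return "Hello from a Fission function!\n"
```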
All of the use cases below have been tried and tested from a Hadoop/Spark standpoint. For those interested in examining these architectures in more detail, please see the blog post below.

Architecture #1: Internet of Things (IoT)

Typically in IoT – from the industrial internet to wearables to smart cars – autonomous physical devices such as sensors, actuators, smart appliances, and wearables collect various types of data and communicate with an application running in the datacenter (cloud or on-premises) over an IP protocol. Commonly, much of this data is aggregated by a gateway and then sent into a platform that can analyze all of these variables for business insights (performance, trends, triggered events, and more).

The overall flow in an IoT application can be orchestrated using serverless functions:
  • Data is aggregated by a gateway and sent to various message queues hosted on a Kafka cluster running in Kubernetes-managed pods
  • Fission functions are invoked based on the overall pipeline flow:
    1. For a given file placed in a message queue, its contents are passed to a Fission function that first normalizes them, extracts the variables of interest, and then sends the output to a NoSQL database or a file system (a sketch of this step follows the list).
    2. The second function will run in response to the normalized file being placed in the NoSQL database. It will read the contents of the file, perform computations as needed (based on the use case) and then invoke microservices that perform functions such as sending the data into a Data Lake or a Data Mart for further analysis.
  • Serverless functions can be written for any kind of processing actions in the event stream. The FaaS scales up or down as needed in response to data volumes.
  • The data can be optionally processed in memory using Spark or analyzed in a data lake for long term analysis.
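As a concrete illustration of the first step, below is a rough sketch of such a normalization function. It assumes a Fission Python function bound to a Kafka message-queue trigger, with a MongoDB collection standing in for the NoSQL database; the field names and service endpoints are assumptions for illustration only.

```python
# normalize.py -- sketch of the first IoT pipeline step (field names and endpoints are illustrative)
import json

from flask import request          # Fission's Python environment exposes the request via Flask
from pymongo import MongoClient    # MongoDB stands in for the NoSQL database

# Hypothetical in-cluster MongoDB service acting as the NoSQL sink
client = MongoClient("mongodb://mongo.default.svc.cluster.local:27017")
readings = client["iot"]["normalized_readings"]

def main():
    # The Kafka message-queue trigger delivers the raw sensor payload as the request body.
    payload = json.loads(request.get_data(as_text=True))

    # Normalize: extract only the variables of interest and standardize field names/units.
    normalized = {
        "device_id": payload.get("deviceId"),
        "timestamp": payload.get("ts"),
        "temperature_c": (float(payload.get("temp_f", 0.0)) - 32.0) * 5.0 / 9.0,
        "vibration": payload.get("vibration"),
    }

    # Persist the normalized record; its arrival can trigger the second function in the pipeline.
    readings.insert_one(normalized)
    return json.dumps({"status": "normalized", "device_id": normalized["device_id"]})
```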

Architecture #2: Financial Services (Payments Processing, Risk Calculations, etc.)

In the financial services industry, critical tasks such as payments processing, compliance checks, and risk calculations can be performed in real time using a combination of Serverless and a Hadoop/Spark-enabled architecture.

  • The overall flow in a financial application can be orchestrated using Fission: developers deploy Fission functions as a shared capability across several applications that front a variety of payment processing gateways. These applications handle user authentication, registration, and collection of payment-related data, and also interface on the backend with a variety of databases that record transaction data.
  • Fission functions are created to parse a given input data stream containing variables such as the user’s credit card data, the location of the transaction, and other demographic information (the first of these checks is sketched after this list):
    1. The first function can call a fraud detection API and based on the results of the check persist the data into an in-memory data grid.
    2. The second set of functions is invoked when the check has either passed or failed. If the check has passed, the function approves the payment and sends the user a confirmation.
    3. If the payment is suspected to be fraudulent, another function is called which alerts a Fraud detection system at the backend.
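A rough sketch of the first check, assuming a Fission Python function, a hypothetical internal fraud-scoring service, and Redis standing in for the in-memory data grid (the endpoint, field names, and threshold are illustrative):

```python
# check_payment.py -- sketch of the fraud-check step (endpoint, fields, and threshold are illustrative)
import json

import redis                   # Redis stands in for the in-memory data grid
import requests                # used to call a hypothetical fraud-scoring service
from flask import request      # Fission's Python environment exposes the request via Flask

grid = redis.Redis(host="redis.default.svc.cluster.local", port=6379)
FRAUD_API = "http://fraud-scoring.default.svc.cluster.local/score"   # hypothetical internal service

def main():
    # The incoming event carries the card data, transaction location, demographics, etc.
    txn = json.loads(request.get_data(as_text=True))

    # Ask the fraud-detection service for a risk score on this transaction.
    score = requests.post(FRAUD_API, json=txn, timeout=2).json().get("risk_score", 1.0)
    txn["risk_score"] = score

    # Persist the enriched transaction in the in-memory grid, keyed by transaction id.
    grid.set("txn:" + txn["transaction_id"], json.dumps(txn))

    # Downstream functions react to the outcome: approve and confirm, or alert the fraud system.
    decision = "review" if score > 0.8 else "approve"
    return json.dumps({"transaction_id": txn["transaction_id"], "decision": decision})
```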

Architecture #3: Machine Learning and Deep Learning with Apache Spark

Across industries, we see applications increasingly infusing business processes and big data with Machine Learning (ML) capabilities. From fraud detection, customer behavior and pipeline analysis, virtual reality, conversational interfaces, chatbots, consumption trends, video/facial recognition, and more — it seems that ML (and AI) is everywhere.

For most companies, ML and Predictive Analytics initiatives have typically been siloed within specific projects in the organization. To realize the real value of ML, it’s advantageous for the data, learnings, algorithms, and models to be shared across applications. Fission, in conjunction with technologies such as Apache Spark, allows the outputs of stream processing and trend prediction to be consumed by a variety of end users and applications.

The overall flow in Machine Learning applications can be orchestrated using Fission and Apache Spark:

  • Data from business operations is ingested in real-time into a cluster of Kafka-based message queues.
  • Spark Streaming is used to ingest this data in micro-batches, typically based on a time window. This data is stored in a data lake for batch analysis and is also sent into a Spark MLlib runtime that hosts different predictive models.
  • These models are based on general-purpose ML algorithms that run on Spark. These include both supervised and unsupervised algorithms – e.g. clustering, classification algorithms, etc.
  • Once the results of the model are written into a NoSQL database or an in-memory data grid, Fission functions are triggered.
  • These functions perform a range of business-critical tasks – for example, updating business analytics dashboards, sending real-time customer offers, and alerting customer agents. A simplified PySpark sketch of the streaming-and-scoring portion of this flow follows.
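The sketch below reads events from Kafka with Structured Streaming, applies a previously trained MLlib pipeline model, and writes predictions to a sink whose updates would in turn trigger the Fission functions described above. The topic, schema, model path, and sink are assumptions for illustration.

```python
# score_stream.py -- simplified PySpark sketch of the streaming-and-scoring step
# (topic, schema, model path, and sink are illustrative)
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("streaming-scoring").getOrCreate()

# Shape of the incoming business events (an assumption for this example).
schema = StructType([
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("channel", StringType()),
])

# Micro-batch ingestion from the Kafka cluster running on Kubernetes.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "kafka.default.svc.cluster.local:9092")
          .option("subscribe", "business-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# A previously trained and validated MLlib pipeline model loaded from shared storage.
model = PipelineModel.load("s3a://models/churn-classifier")
scored = model.transform(events)

# Write predictions out; in the architecture above this sink would be the NoSQL store
# or in-memory data grid whose updates trigger the Fission functions.
query = (scored.select("customer_id", "prediction")
         .writeStream
         .outputMode("append")
         .format("console")        # replace with your NoSQL / data-grid sink
         .start())

query.awaitTermination()
```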

Architecture #4: Achieving Models as a Service (MaaS)

  1. Firstly, Serverless can help Spark projects achieve the elusive Models as a Service (MaaS) deployment paradigm while decreasing DevOps and Data Scientist cycle times. The MaaS approach takes in business variables (often hundreds or thousands of inputs) and outputs model results on which business decisions can be predicated; it also supports visualizations that augment business decision-support systems. Once different predictive models are built, tested, and validated, they are ready for real-world production deployments. The challenge that Data Scientists and application developers run into is long testing cycle times, which has made it very difficult to engage new personas such as Citizen Developers in the Data Science process. Every model that needs to be tested must follow a lengthy data-prep, provisioning, unit-testing, and CI/CD pipeline cycle. With a framework such as Fission, these cycle times are vastly shortened: Model Developers follow a simple build/test/publish/deploy flow in which functions are pushed to a Docker repository and then into the Fission framework, which stands them up as a service fronted by an API Gateway.
  2. Given that any block of code can be converted into a function, and any function can be rolled up into a container, it becomes easy to create quick functions that test models as a workflow (a minimal sketch follows this list). Once the functions and the models they host are executed, the results of the test are looped back into model management, model monitoring, and retraining systems. MaaS is essentially a way of deploying these advanced models as part of software applications, offered as a software subscription, and Fission can help achieve that.
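As a minimal sketch of the MaaS idea, consider a validated model packaged alongside an HTTP-triggered Fission function and fronted by an API gateway, so that any application (or Citizen Developer) can obtain predictions with a single REST call. The model file, its path, and the feature names below are hypothetical.

```python
# predict.py -- sketch of Models as a Service via an HTTP-triggered Fission function
# (model file, path, and feature names are hypothetical)
import json

import joblib                  # assumes the model was serialized with joblib
from flask import request      # Fission's Python environment exposes the request via Flask

# Load the validated model once per container; warm containers are reused across calls.
model = joblib.load("/userfunc/model/churn_model.pkl")   # hypothetical path packaged with the function

def main():
    # Example request body: {"tenure": 12, "balance": 1042.5, "num_products": 2}
    features = json.loads(request.get_data(as_text=True))
    row = [[features["tenure"], features["balance"], features["num_products"]]]
    prediction = model.predict(row)[0]
    return json.dumps({"prediction": float(prediction)})
```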


Conclusion

Serverless technology has immense industry mindshare and is emerging as the hottest trend after Kubernetes. In conjunction with other technologies such as Hadoop & Spark, serverless platforms help industries solve their pressing business challenges using a highly flexible, state-of-the-art, and standards-based FaaS platform.

Discover more at Industry Talks Tech: your one-stop shop for upskilling in different industry segments!
