Why Data Garbage-In means Analytics Garbage-Out..

This is the third in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We have previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046 & the ideal way Data Scientists would like their models deployed for maximal benefit and use – as a Service @ http://www.vamsitalkstech.com/?p=5321. As the name of this third blog post suggests, the success of a data science initiative depends on data. If the data going into the process is “bad” then the results cannot be relied upon. Our goal is also to suggest some practical steps that enterprises can take from a data quality & governance process standpoint.

“However, under the strong influence of the current AI hype, people try to plug in data that’s dirty & full of gaps, that spans years while changing in format and meaning, that’s not understood yet, that’s structured in ways that don’t make sense, and expect those tools to magically handle it.” – Monica Rogati (Data Science Advisor and ex-VP, Jawbone – 2017)

Image Credit – The Daily Omnivore

Introduction

Different posts in this blog have discussed Data Science and other analytical approaches in some depth. What is apparent is that whatever the kind of analytics – descriptive, predictive, or prescriptive – the availability of a wide range of quality data sources is key. However, along with the volume and variety of data, the veracity – or truth – of the data is just as important. This blog post discusses the main factors that determine the quality of data from a Data Scientist’s perspective.

The Top Issues of Data Quality

As highlighted in the above illustration, the top quality issues that data assets typically face are the following:

  1. Incomplete Data: The data provided for analysis should span the entire cross-section of known data about how the organization views its customers and products. This includes data generated from various applications that belong to the business, and external data bought from various vendors to enrich the knowledge base. The completeness criterion measures whether all of the information about the entities under consideration is available and usable.
  2. Inconsistent & Inaccurate Data: Consistency measures whether data values give conflicting information that must be fixed, and whether all data elements conform to specific, uniform formats and are stored in a consistent manner. Inaccurate data has duplicate, missing or erroneous values, or does not reflect an accurate picture of the state of the business at the point in time it was pulled (the profiling sketch after this list shows simple checks for several of these issues).
  3. Lack of Data Lineage & Auditability: The data framework needs to support auditability, i.e., provide an audit trail of how the data values were derived from source to analysis point, including the various transformations performed to arrive at the dataset being considered for analysis.
  4. Lack of Contextuality: Data needs to be accompanied by meaningful metadata – data that describes the concepts within the dataset.
  5. Temporal Inconsistency: This measures whether the data is temporally consistent and meaningful given the time it was recorded.
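
To make these dimensions concrete, here is a minimal profiling sketch in Python/pandas; the file name, column names, and checks are illustrative assumptions rather than a prescribed framework.

```python
import pandas as pd

# Illustrative customer extract; the file and column names are assumptions.
df = pd.read_csv("customers.csv", parse_dates=["last_updated"])

# Completeness: fraction of populated values per column.
completeness = 1.0 - df.isna().mean()
print(completeness.sort_values())

# Consistency & accuracy: duplicate keys and values that break an expected format.
dup_rows = df[df.duplicated(subset=["customer_id"], keep=False)]
bad_email = df[~df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", regex=True, na=True)]
print(f"{len(dup_rows)} duplicated customer rows, {len(bad_email)} malformed emails")

# Temporal consistency: records stamped in the future are suspect.
future_rows = df[df["last_updated"] > pd.Timestamp.now()]
print(f"{len(future_rows)} records carry a timestamp in the future")
```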

What Business Challenges does Poor Data Quality Cause…

Image Credit – DataMartist

Poor data quality causes the following business challenges in enterprises:

  1. Customer dissatisfaction: Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise’s ability to promote relevant offerings & detect customer dissatisfaction. Currently, most industry players are woeful at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses its own siloed & limited view of the customer across multiple channels. These views are typically inconsistent, lack synchronization with other departments, & miss a large number of potential cross-sell and upsell opportunities. This is a data quality challenge at its core.
  2. Lost revenue: The Customer Journey problem has been an age-old issue which has gotten exponentially more complicated over the last five years as the staggering rise of mobile technology and the Internet of Things (IoT) have vastly increased the number of enterprise touch points that customers are exposed to in terms of being able to discover and purchase new products/services. In an OmniChannel world, an increasing number of transactions are being conducted online. In verticals like the Retail industry and Banking & Insurance industries, the number of online transactions conducted approaches an average of 40%. Adding to the problem, more and more consumers are posting product reviews and feedback online. Companies thus need to react in real-time to piece together the source of consumer dissatisfaction.
  3. Time and cost in data reconciliation: Every large enterprise nowadays runs expensive data re-engineering projects due to its data quality challenges. These projects are an inevitable first step in other digital initiatives and cause huge cost and time overheads.
  4. Increased time to market for key projects: Poor data quality causes poor data agility, which increases the time to market for key projects.
  5. Poor data means suboptimal analytics: Poor data quality causes the analytics done using it to be suboptimal – algorithms will end up giving wrong conclusions because the input provided to them is inconsistent at best & incorrect at worst.

Why is Data Quality a Challenge in Enterprises

Image Credit – DataMartist

The top reasons why data quality has been a huge challenge in the industry are:

  1. Prioritization conflicts: For most enterprises, the focus of the business is the product(s)/service(s) being provided; book-keeping is a mandatory but secondary concern. Since keeping the business running is the top priority, keeping the books accurate for financial matters is the only data aspect that gets the technical attention it deserves. Other data aspects are usually ignored.
  2. Organic growth of systems: Most enterprises have gone through a series of book-keeping methods and applications, most of which have no compatibility with one another. Warehousing data from various systems as they are deprecated, merging in data streams from new systems, and fixing data issues as these processes happen is not prioritized till something on the business end fundamentally breaks. Band-aids are usually cheaper and easier to apply than to try and think ahead to what the business will need in the future, build it, and back-fill it with all the previous systems’ data in an organized fashion.
  3. Lack of time/energy/resources: Nobody has infinite time, energy, or resources. Making all the systems an enterprise uses at any point in time talk to one another, share information between applications, and keep a single consistent view of the business is a near-impossible task. Many well-trained resources, and much time & energy, are required to make sure this can be set up and successfully orchestrated on a daily basis. But how much is a business willing to pay for this? Most do not see short-term ROI and hence lose sight of the long-term problems that could be caused by ignoring the quality of data collected.
  4. What do you want to optimize?: There are only so many balls an enterprise can have up in the air to focus on without dropping one, and prioritizing those can be a challenge. Do you want to optimize the performance of the applications that need to use, gather and update the data, OR do you want to make sure data accuracy/consistency (one consistent view of the data for all applications in near real-time) is maintained regardless? One will have to suffer for the other.

How to Tackle Data Quality

Image Credit – DataMartist


With the advent of Big Data and the need to derive value from ever-increasing volumes and variety of data, data quality becomes an important strategic capability. While every enterprise is different, certain common themes emerge as we consider the quality of data:

  1. The sheer number of transaction systems found in a large enterprise causes multiple challenges across the data quality dimensions. Organizations need to have valid frameworks and governance models to ensure the data’s quality.
  2. Data quality has typically been thought of as just data cleansing and fixing missing fields. However, it is very important to address the originating business processes that cause the data to develop multiple versions of the truth. For example, centralize customer onboarding in one system across channels rather than having every system do its own onboarding.
  3. It is clear from the above that data quality and its management is not a one time or siloed application exercise. As part of a structured governance process, it is very important to adopt data profiling and other capabilities to ensure high-quality data.

Conclusion

Enterprises need to define both quantitative and qualitative metrics to ensure that data quality goals are captured across the organization. Once this is done, an iterative process needs to be followed to ensure that a set of capabilities dealing with data governance, auditing, profiling, and cleansing is applied to continuously ensure that data is brought up to, and kept at, a high standard. Doing so can have salubrious effects on customer satisfaction, product growth, and regulatory compliance.

Anti Money Laundering (AML) – Industry Insights & Reference Architectures…

This blog has from time to time discussed issues around the defensive portion of the financial services industry (Banking, Payment Processing, Insurance etc). Anti Money Laundering (AML) is a critical area where institutions need to protect themselves and their customers from malicious activity. This post summarizes seven key blogs on the topic of AML published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing complex AML projects.

Image Credit – FIBA Anti-Money Laundering Compliance Conference

Introduction

Money laundering has emerged as an umbrella crime which facilitates public corruption, drug trafficking, tax evasion, terrorism financing etc. Banks and other financial institutions are expected to conduct business in a manner that protects their countries of operation and consumers from security risks such as laundering, terrorist financing, and corruption (the ML/TF risks). Given the global reach of financial products, a variety of regulatory authorities are concerned about money laundering. Technology has become key to meeting regulatory expectations as well as reducing the costs of these onerous programs. As the below graphic from PwC [1] demonstrates, this is one of the most pressing issues facing the financial services industry.

The above infographic from PwC provides a handy visual guide to the state of global AML programs.

The Six Critical Gaps in Global AML Programs…

From an industry standpoint, the highest priority issues that are being pointed out by regulators include the following –

  1. Institutions failing to develop AML frameworks that are tailored to the risks they run, given their product and geographic mix
  2. Failure to develop real-time insights into business transactions and to assign elevated risk to them based on their elements
  3. Failure to develop AML models that draw from the widest possible sources of data – both internal and external – to understand a true picture of the business
  4. Failure to demonstrate a consistent approach across geographies
  5. Failure to leverage the latest developments in analytics, including Machine Learning, to enable the automation of AML programs
  6. Lack of appropriate business governance & change management in setting, monitoring and managing AML compliance programs, policies and procedures

With this background in mind, the complete list of AML blogs on VamsiTalksTech is included below.

# 1 – Why Banks should Digitize their Operations and how this will help their AML programs –

Digitization implies a mix of business models predicated on agile systems, rapid & iterative development and, more importantly, a Data First strategy. These have significant impacts on AML programs in addition to helping increase market share.

A Digital Bank is a Data Centric Bank..

# 2 – Why Data Silos are a huge challenge in many cross organization projects such as AML –

Organizational Data Silos inhibit the effectiveness of AML programs as compliance officers cannot gain a single view of a customer or single view of a suspicious transaction or view the social graph in critical areas such as trade finance. This blog discusses the Silo anti-pattern and ways to mitigate silos from proliferating.

Why Data Silos Are Your Biggest Source of Technical Debt..

# 3 – The Major Workstreams Around AML Programs

The headline is self-explanatory but we discuss the five major work streams on global AML projects – Customer Due Diligence, Entity Analysis, Downstream Analytics, Ongoing Monitoring and Investigation Lifecycle.

Deter Financial Crime by Creating an Effective Anti Money Laundering (AML) Program…(1/2)

# 4 – Predictive Analytics Across the AML workstreams –

Here we examine how Predictive Analytics can be applied across all of the five work streams.

How Big Data & Predictive Analytics transform AML Compliance in Banking & Payments..(2/2)

# 5 – The Business Need for Big Data in AML programs  –

This post discusses the most important developments in building AML systems using Big Data Technology-

Building AML Regulatory Platforms For The Big Data Era

# 6 – A Detailed Look at how Enterprises can use Big Data and Advanced Analytics to reduce AML costs –

How to leverage Big Data and Advanced Analytics to detect a range of suspicious transactions and actors.

Big Data – Banking’s New Weapon In War Against Financial Crime..(1/2)

# 7 – Reference Architecture for AML  –

We discuss a Big Data enabled Reference Architecture of an enterprise-wide AML program.

Big Data – Banking’s New Weapon In War Against Financial Crime..(2/2)

Conclusion

According to PricewaterhouseCoopers, estimates of global money laundering flows were between 2-5% of global GDP [1] in 2016 – however, only 1% of these transactions were caught. Certainly, the global financial industry has a long way to go before it effectively stops these nefarious actors, but there should be no mistaking that technology is a huge part of the answer.

References –

  1. PricewaterhouseCoopers 2016 AML Survey – http://www.pwc.com/gx/en/services/advisory/forensics/economic-crime-survey/anti-money-laundering.html

Data Science in the Cloud A.k.a. Models as a Service (MaaS)..

This is the second in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We have previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046. All of the actors in the data science space can agree that becoming responsive to business demands is the overarching goal of the process. In this second blog post, we will discuss Model as a Service (MaaS), an approach to ensuring that models and their insights can be leveraged throughout a large organization.

Image Credit – Logistics Industry Blog

Introduction

Hardware as a Service (HaaS), Software as a Service (SaaS), Database as a Service (DBaaS), Infrastructure as a Service (IaaS), Platform as a service (PaaS), Network as a Service (NaaS), Backend as a service (BaaS), Storage as a Service (STaaS). While every IT delivery model is going the way of the cloud, does Data Science lag behind in this movement?  In such an environment, what do Data Scientists dream of to ensure that their models are constantly being trained on high quality and high volume production grade data?… Models as a Service (MaaS).

The Predictive Analytics workflow…

The Predictive Analytics workflow always starts with a business problem in mind. For example: “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns” or “Detect real-time fraud in credit card transactions.”

Illustration – The Predictive Analysis Workflow in a financial services setting

In use cases like these, the goal of the data science process is to be able to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business can set up easy and intuitive visualizations to present the results.
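
As a purely illustrative sketch of this segment-and-rank step, the snippet below buckets customers into tiers from an assumed model-generated propensity score; the scores and tier labels are made up for the example.

```python
import pandas as pd

# Assume each customer already has a model-generated propensity score (0..1).
scores = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105, 106, 107, 108],
    "propensity":  [0.91, 0.15, 0.67, 0.42, 0.88, 0.05, 0.73, 0.34],
})

# Corral customers into ranked tiers that a business user can act on.
scores["tier"] = pd.qcut(scores["propensity"], q=4,
                         labels=["Low", "Medium", "High", "Very High"])

# A simple ranked view that can feed a dashboard or a campaign list.
print(scores.sort_values("propensity", ascending=False))
```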

A lot of times, business groups have a hard time explaining what they would like to see – both in terms of input data and output format. In such cases, a prototype makes things easier from a requirement gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) which are pertinent to the business challenge. They spend a lot of time collating the data (from a variety of sources like Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup process involves dealing with missing values, corrupted data elements, making fields homogeneous in format, etc.

This data wrangling phase involves writing code to join various data elements so that a complete dataset is gathered in the Data Lake from a raw features standpoint, at the correct granularity for the problem at hand.  If more data is obtained as the development cycle is underway, the Data Science team has to go back & redo the process to incorporate the new data feeds. The modeling phase is where sophisticated algorithms come into play. Feature engineering takes in business concepts & raw data features and creates predictive features from them. The Data Scientist takes the raw & engineered features and creates a model by applying various algorithms & testing to find the best one. Once the model has been refined, & tested for accuracy and performance, it is ideally deployed as a service.
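
A hedged sketch of what the feature engineering and modeling phase might look like in code, assuming a Python/scikit-learn stack; the dataset, feature names, and label are placeholders for whatever the business problem defines.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative wrangled dataset; feature and label names are assumptions.
df = pd.read_csv("training_set.csv")
X, y = df.drop(columns=["will_buy"]), df["will_buy"]

# Feature engineering: encode categorical channels, scale numeric usage features.
features = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel", "segment"]),
    ("num", StandardScaler(), ["monthly_usage", "tenure_months"]),
])

model = Pipeline([("features", features),
                  ("clf", GradientBoostingClassifier())])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```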

Challenges with the existing approach

The challenges with the above approach are:

  1. Business Scalability – Predictive analytics as highlighted above resembles a typical line of business project or initiative. The benefits of learning from localized application initiatives are largely lost to the larger organization if multiple applications and business initiatives are not allowed to access the models built.
  2. Lack of Data Richness – The models created by individual teams are not always enriched by cross organizational data constantly being generated by different business applications. In addition to that, the vast majority of industrial applications do not leverage all possible kinds of unstructured data & 3rd party data in their business applications. Enabling the models to be exposed to a range of data (both internal and external) can only enrich the insights generated.
  3. Cross Application Applicability – This challenge deals with how business intelligence insights from disparate applications (which leverage different models) can be used to enhance business areas they weren’t originally created for. This could allow for customer-centered insights in real-time. For example, consider a customer sales application and a call center application. Can cross-application insights be used to understand that customers are calling into the call center because it has been hard to use the website to order products?
  4. Data Monetization – What is critical to the ability to create new commercial business models is agile analytics around existing and new data sources. If enterprise businesses are increasingly being built around data assets, then it naturally follows that data as a commodity can be traded or re-imagined to create new revenue streams. As an example, pioneering payment providers now offer retailers analytical services to help them understand which products perform best and how to improve the micro-targeting of customers. Thus, data is the critical prong of any digital initiative. This has led to efforts to monetize data by creating platforms that support ecosystems of capabilities. To vastly oversimplify, the ability to monetize data needs two prongs – centralizing it in the first place, and then performing strong predictive modeling at large scale, where systems constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. Thus, centralizing models offers more benefits than the typical enterprise can imagine.

Enter Model As A Service…

MaaS takes in business variables (often hundreds or thousands of inputs) and provides model results as output, upon which business decisions can be made, along with visualizations that augment decision support systems. As depicted in the above illustration, once different predictive models are built, tested and validated, they are ready to be used in real world production deployments. MaaS is essentially a way of deploying these advanced models as part of software applications, where they are offered as a software subscription.

MaaS also enables a cleaner separation of the Application development process and the Data Science workflow.
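
As a rough illustration of the idea – not a reference implementation – the sketch below wraps a previously trained model behind an HTTP scoring endpoint using Flask; the model file, route, and payload shape are assumptions.

```python
# A minimal Flask sketch of a model scoring service; the model file name,
# route, and payload shape are illustrative assumptions.
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("propensity_model.pkl")   # a previously trained pipeline

@app.route("/models/propensity/v1/score", methods=["POST"])
def score():
    payload = request.get_json()               # e.g. {"records": [{...}, {...}]}
    frame = pd.DataFrame(payload["records"])
    scores = model.predict_proba(frame)[:, 1]
    return jsonify({"model_version": "v1", "scores": scores.tolist()})

if __name__ == "__main__":
    app.run(port=8080)
```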

Business Benefits from a MaaS approach

  1. MaaS exposes models to different lines of business, thus increasing their usefulness and opening them up to feedback that helps increase their accuracy.
  2. MaaS opens the models to any application that wants to take advantage of them. This lets Data Scientists work with business teams that are much broader than those they would normally have access to.
  3. The provision of dashboards and business intelligence across the organization becomes much easier than with a siloed approach.
  4. MaaS as an approach fundamentally encourages an agile approach to managing data assets and also to rationalizing them. For any MaaS initiative to succeed, timely access needs to be provided to potentially hundreds of data sources in an organization. MaaS encourages a move to viewing data as a reusable asset across the organization.

Technical advantages of the MaaS approach

  • Separation of concerns: software & data feeds maintained by IT, models maintained by Data Scientists.
  • Versioning of models can be separated from versioning of the system(s) using the models.
  • The same models can be utilized by multiple software packages for consistency.
  • Consistent handling of data sources: e.g. which “master” source provides what types of data for all the models, so that a customer looks the same regardless of the model acting on the data for insights.
  • A single point for putting a “watch” on the performance of a model.
  • Controlled usage of models.
  • MaaS ensures that the analytic process can be automated from a deployment standpoint (a small client-side sketch follows this list).
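
For instance, any consuming application – a CRM, a call center app, a batch job – could call the centrally hosted model in the same way; the endpoint URL, header, and fields below are assumptions for illustration.

```python
import requests

# Any application can consume the centrally hosted model; the URL, version
# path, and API key header are assumptions for this sketch.
MAAS_URL = "https://models.example.internal/models/propensity/v1/score"

records = [{"channel": "web", "segment": "retail",
            "monthly_usage": 42.0, "tenure_months": 18}]

resp = requests.post(MAAS_URL,
                     json={"records": records},
                     headers={"X-Api-Key": "<application key>"},
                     timeout=5)
resp.raise_for_status()
print(resp.json()["scores"])
```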

Conclusion

MaaS can enable organizations to move their analytic practices and capabilities to the next level. It enables the best of both worlds – the ability to centralize the data science capabilities across an organization while keeping customer data securely inside the organization. Done right, it can enable the democratization of data science insights across a large enterprise.

Why Linux Containers and Docker are the Runtime for the Software Defined Data Center (SDDC)..(4/7)

The third and previous blog in this seven part series (@ http://www.vamsitalkstech.com/?p=4659) discussed Apache Mesos, a project that aims to abstract away various system resources – CPU, memory, network and disk – to provide consuming digital applications with a giant cluster from which they can utilize capacity, a key requirement of the Software Defined Datacenter (SDDC). In this fourth blog, we will discuss another important ecosystem technology & project – Linux Containers and Docker – which forms the foundational runtime component in the SDDC. The next blog will discuss Kubernetes – Google’s container orchestration platform.

Much like shipping goods in Containers over Oceans, Linux Containers offer a portable, lightweight & convenient way to ship business applications. (Image Credit – WallPapers 13)

Executive Summary…

We can agree that the Digital application is inherently a distributed application. Such applications have historically been extremely hard to develop, set up and manage across a large fleet of data center servers that are a mix of platforms and technologies. Thus it is no surprise that one of the most disruptive developments in the last five years has been the innovation in the Linux container space. Containers now enable running distributed applications at scale.

Due to business reasons, Digital applications demand constant updates, changes and incremental revisions in response to changing customer needs. The Software Defined Datacenter (SDDC) thus needs a runtime paradigm that enables not just efficient hardware usage but also supports standardized application environments that are portable, simplified and consistent across hybrid clouds and hypervisors. Containers fill this need and are thus emerging as the natural unit of deployment across the SDDC. Much has been written on the topic of Docker and Linux Container technology. My goal for this blog post is to distill key insights in the container ecosystem.

The Technologies of Linux Containers & Docker

Unlike Virtual Machines, Container Engines such as Docker share a common OS (Image Credit – MSFT Azure)

Linux Containers are alike and yet different from virtual machines. They are alike in the sense that each Container shares system resources on the underlying hardware platform – CPU, RAM, and Network – as with VMs. However, while each VM maintains its separate copy of the Operating System (OS), containers share the same OS kernel while keeping themselves separate from other containers running on the same OS.  How do they do that?

Though the terms ‘Docker’ and ‘Container’ have become almost synonymous – it needs to be noted that Docker is a company focused on developing technology enablement around containers in areas such as orchestration, networking, and management. Docker was an open source project (now renamed to Moby [1]) that provided capabilities such as a standard description of container formats, utilities for application packaging, deployment & lifecycle management of applications inside Linux Containers. It provides a Docker CLI command line tool for the lifecycle management of image-based containers.

Prior to the explosion of interest in Linux containers & the founding of Docker, traditional Linux distributions (with a minimum kernel level of 3.8) supported two foundational paradigms – control groups (cgroups) and kernel namespaces. Linux containers use both these features to achieve their goals of isolation and portability. Cgroups enable the host to limit the resources each container process can use from a CPU, memory, filesystem, user ID and network standpoint. This ensures that containers running on a host cannot starve others of resources, thus avoiding the “Noisy Neighbor” problem that bedeviled a lot of cloud deployments.

Kernel Namespaces ensure another kind of isolation for process interactions within the OS. Containers can only view and modify resources in the same namespace. This provides a security mechanism whereby other containers and processes on the host cannot launch attacks on a given application running on a tenant container or on the host itself. Thus the combination of both these technologies ensures that multiple applications running within their individual containers can share CPU and memory without needing the overhead of virtualization. Docker also grants each container its own networking implementation, thus ensuring that resources such as sockets and interfaces can also be protected.
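
To make this concrete, the sketch below uses the Docker SDK for Python to start a container with cgroup-backed memory and CPU limits; the image name and limit values are arbitrary examples, and a running Docker daemon is assumed.

```python
# A minimal sketch using the Docker SDK for Python (docker-py) to start a
# container with cgroup-backed resource limits.
import docker

client = docker.from_env()

container = client.containers.run(
    "nginx:alpine",
    detach=True,
    name="web-limited",
    mem_limit="256m",        # cgroup memory limit
    nano_cpus=500_000_000,   # cgroup CPU limit: half of one CPU
)
print(container.status)
```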

Companies including Red Hat, IBM, Google, Cisco, VMware, and CoreOS have greatly aided the development and accessibility of containers in their platforms and products.

Layered Filesystems..

Various Image Layers in Docker. Each layer in the file system is mounted on the previous.The topmost is the Writable Container. (Image Credit – Docker)

We discussed how Container Images are immutable. This is the key advantage of using container technology such as Docker & is made possible by the notion of a Union filesystem. What are Union filesystems and how do they enforce immutability? Much like the image in a Virtual Machine sense, Containers also run from an image, which typically is a snapshot of a filesystem but tends to be much smaller than a VM image since the container shares the host kernel.

Union filesystems are best described as a layered architecture – each layer is created independently and then added atop the previous layer. An example of such a layered stack is a Linux kernel, then an OS, then a database like Oracle, then Tomcat, and a web application on top. The top layer is always the writable layer. The real advantage of using a union filesystem is that working with these images becomes super efficient from a storage and execution standpoint. Union filesystems also help in sharing portions of the OS across containers. Simply put, an image contains everything an application needs – from its dependencies to external libraries. When an image is run, it is called a container. In the case of Docker, it uses a layered copy-on-write filesystem called AUFS (Another Union Filesystem).
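
As a small illustration (assuming a local Docker daemon and the Docker SDK for Python), the read-only layers that union into an image's filesystem can be listed from the image metadata:

```python
# A minimal sketch that inspects the layers of an image with docker-py;
# the image name and tag are illustrative assumptions.
import docker

client = docker.from_env()
image = client.images.pull("alpine", tag="3.19")

# image.attrs mirrors `docker image inspect`; RootFS.Layers lists the
# read-only layers that union together into the container's filesystem.
for digest in image.attrs["RootFS"]["Layers"]:
    print(digest)
```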

Containers and Developers..

Containers are possibly the first infrastructure software category created with developers in mind. The prominence of Linux Containers and Docker has coincided with the onset of agile development practices under the DevOps umbrella – CI/CD etc. Containers are an excellent choice for creating agile delivery pipelines and continuous deployment. At their core, Containers enable the creation of multiple self-contained execution environments over the same operating system.

Developers are naturally excited about Linux Containers for five specific reasons –

  1. Containers allow for image consistency across OS environments. This is a huge help in accelerating the development process from development to debugging to production. Developers can just focus on building their applications (in dev environments that match the test and prod) and packaging them in containers. This just takes a lot of the inefficiency around environment dissimilarities out of the equation.
  2. Containers are treated as standard Linux processes by the kernel & are thus orders of magnitude quicker to start up than VMs. This means that developers can start their applications in a matter of seconds as long as they run them in a container.
  3. Containers also provide development organizations the ability to standardize application development workflows and update processes. This solves the scalability problem that digital applications have caused large organizations.
  4. Digital applications are leading the move to adopt microservices. Microservices offer a way to build applications as a collection of discrete services as opposed to a monolithic architecture. By their very nature, microservices can be built and managed by different teams. Containerization affords a lightweight way of building and deploying microservices.
  5. Containers offer a portable way of delivering applications (across Operating Systems) as well as provide horizontal scalability.

Digital Application development using Containers..

Digital Application Development and Deployment Workflow using Containers.

There are a few key runtime components involved in operationalizing a small to medium to large scale container infrastructure as the above illustration depicts.

  1. Firstly, developers create container images. These images describe an application and its dependencies. An easy way to conceptualize an image is to think of it as a basic deployment template. Images are also immutable in that they are read-only; any changes happen in the topmost layer, which is writable. Modifying an image creates a new one. Images thus have a parent-child relationship. Developers create images by building their applications in their development environments, performing unit tests and then pushing to a repository. Once the container is built with the necessary dependencies, a battery of tests is run to validate business functionality. A large part of this process is usually best automated using CI/CD tools like Jenkins, CruiseControl or Buildbot (a minimal build-and-push sketch follows this list).
  2. The built images are then made available in a Container Registry. This is either maintained internally or sourced from a trusted external source. As the name suggests, Registries maintain a catalog of container images of frequently used software – e.g. Custom applications and other software packages such as WordPress, Relational databases, Web Servers, Big Data technologies and Application Servers etc
  3. The next step is to create and deploy (runtime) containers from these images on a set of servers. Once images are released as a result of application development, sys admins work on provisioning the servers to run these images. Once a Container engine is installed on a server, images are loaded onto it and take the runtime shape of containers. Getting the images onto these servers follows either a push or a pull mechanism.
  4. Scheduling of containers on servers is also a process usually done by Sys Admins. This involves running containers of certain kinds on servers that match up to certain CPU, I/O and Network capacity requirements.
  5. To create complex real world deployments, not only do the servers and networking have to be created, but these containers are also interconnected (e.g. a web server container to an application server) using Discovery mechanisms. These containers then need to connect to a host of enterprise services. Customer traffic is then routed to the clustered containers running on these servers. Operations teams monitor the logs and performance of these containers and the microservices running on them.
  6. The process repeats from step #1 above.
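
A simplified sketch of steps 1-3 using the Docker SDK for Python appears below; the registry, repository name, and build context are hypothetical, and a real pipeline would add authentication, tests, and error handling.

```python
# A minimal build-and-publish sketch with docker-py; registry URL, repository
# name, and build context path are assumptions.
import docker

client = docker.from_env()

# 1. Build an image from the application's Dockerfile (the build context).
image, build_logs = client.images.build(path=".", tag="registry.example.com/shop/web:1.0.3")

# 2. Push the built image to the container registry.
for line in client.images.push("registry.example.com/shop/web", tag="1.0.3",
                               stream=True, decode=True):
    print(line)

# 3. On a target server, pull the image and run it as a container.
client.images.pull("registry.example.com/shop/web", tag="1.0.3")
client.containers.run("registry.example.com/shop/web:1.0.3", detach=True)
```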

Industry Adoption of Containers.

In a few years, containers will deliver the bulk of compute workloads across public cloud providers such as Amazon AWS, Google Compute Engine and Microsoft Azure. Given that the VM options on these clouds can run multiple containers which can scale on demand, the industry will begin to gravitate to higher utilization density. The Software Defined Datacenter has already begun incorporating hybrid architectures that run both Linux Containers and Virtual Machines in a complementary fashion.

Customers also have choices of traditional enterprise operating systems such as Red Hat Enterprise Linux or Microsoft Windows or can also run containers on OS’s developed for the purpose of hosting containers at hyper scale. These OS’s just provide tools to manage containers and nothing else. Examples include Red Hat Atomic Platform and CoreOS. Moving up the stack, pioneers such as Google and Red Hat have added core support for containers in projects such as OpenStack, Kubernetes, Mesos, OpenShift & CloudFoundry by helping with networking and persistent storage. Kubernetes (which we will cover in the next post) also handles provisioning on multiple public cloud platforms. Config Mgmt platforms such as Ansible, Chef and Puppet now support containerized deployments.

Technical Considerations for Container Adoption

Some key considerations that industry players are tackling from the standpoint of running containers at scale –

  1. Container Orchestration – Organizing groups of containers into composable applications, scheduling them on servers that match their resource requirements, placement of containers based on network topology etc
  2. Container Networking – Containers follow a pluggable model and the network is no different. Key considerations – an enterprise network connectivity stack is needed to not only provide the interconnect between different containers but also to integrate them with existing Layer 2/3 networks. Additionally, network isolation needs to be provided for microservices running on these containers using either a dedicated IP address for each or an overlay network.
  3. Management and Monitoring – Lifecycle processes around management and monitoring encompass a range of questions – application patching with low downtime, graceful failures in cloud native applications, container scale-up & scale-down based on traffic patterns etc.

Containers and your Enterprise…

So what is the best way to adopt containers across a large enterprise?

  • Develop your container strategy in the context of the Nexus of Forces (i.e., information, mobile, social and cloud) initiatives in your organization — Containers are at the junction of these technologies.
  • Institute an organizational process to examine the business value of any initiative to adopt Containers. Understand what tools and platforms to adopt that will abstract away the complexities of using containers.
  • Understand the skills required to leverage containers. Containers represent a new way of working for both developers and SysOps. Dependency management moves to the developers, but they realize tremendous benefits in adopting containers for high-velocity Digital applications
  • Identifying, measuring and benchmarking key success metrics that measure the ROI of the overall container investments.

Conclusion..

To sum up, the Linux (and Windows) container space is exploding both from a mindshare as well as an adoption standpoint. What is hugely encouraging is that a host of next generation platform technologies (ranging from IaaS to PaaS) are not just choosing to support containers as their basic runtime unit but are also focusing on becoming the de facto solution supporting a host of container ecosystem use cases – provisioning, orchestration, management, CI/CD et al. The next two blogs will respectively discuss how Google Kubernetes and Red Hat OpenShift overcome these challenges and abstract away much of the complexity around container deployments.

The next blog post in this series will discuss Google Kubernetes, the dominant project in the container orchestration space.

References

[1] Introducing Moby Project –  https://blog.docker.com/2017/04/introducing-the-moby-project/

The Deployment Architecture of an Enterprise API Management Platform..

We discussed the emergence of Application Programming Interfaces (APIs) as providing a key business capability in Digital Platforms @ http://www.vamsitalkstech.com/?p=3834. The next post then discussed the foundational technology, integration & governance capabilities that any Enterprise API Platform must support @ http://www.vamsitalkstech.com/?p=5102.  This final post in the API series will discuss a deployment model for an API Management Platform.

Background..

The first two posts in this series discussed the business background to API Management and the need for an Enterprise API Strategy. While details will vary across vendor platforms, the intention of this post is to discuss the key runtime components of an API management platform, the overall developer workflow for creating APIs, and the runtime workflow that enables client applications to access them.

Architectural Components of an API Management Platform..

The important runtime components of an API management platform are depicted in the below illustration. Note that we have abstracted out network components (firewalls, reverse proxies, VLANs, switches etc) as well as the internal details of application architecture which would normally be impacted by an API Platform.

The major components of an API Management Platform and the request flow across the architecture.

Let us cover the core components of the above:

  1. API Gateway -The API Gateway has emerged as the dominant deployment artifact in API Architectures. As the name suggests, Gateways are based on a facade design pattern (a minimal facade sketch follows this list). The Gateway (or typically a set of highly available Gateways) acts as a proxy to traffic between client applications (used by customers, partners and employees) and back end services (ranging from mainframes to microservices). The Gateway is essentially an appliance or a software process that abstracts all API traffic into an organization and exposes business capabilities typically via a REST interface. Clients are exposed to different views of the same API – coarse grained or granular – depending on the kind of client application (thick/thin) and access control permissions. Gateways include protocol translation and request routing as their core functionality. It is also not uncommon to deploy multiple Gateways – in an internal and external fashion – depending on business requirements in terms of partner interactions etc. Gateways also include functionality such as caching requests for performance, load balancing, authentication, serving static content etc. The API Gateway can thus be managed using a set of policy controls. Performance characteristics such as throughput, scalability, caching, load balancing and failover are managed using a cluster of API Gateways. The introduction of an API Gateway also ensures that application design is impacted going forward. API Gateways can be implemented in many forms – as a software platform or as an appliance. Public cloud providers have also begun offering mature API Gateways that integrate well with a range of backend services that they provide both from an IaaS and a PaaS standpoint. For instance, Amazon’s API Gateway integrates natively with AWS Lambda and EC2 Container Service for microservice deployments on AWS.
  2. Security -Though it is not a standalone runtime artifact, Security tends to be called out as one of the most important logical requirements of an API Management platform. APIs have to follow the same access control mechanisms and security constraints for different user roles as their underlying datasources. This is key, as backend applications and organizational data need to be protected from a variety of threats – denial of service attacks, malware, access control violations etc. Accordingly, policy based protection using API keys, JSON/XML signature scanning & threat protection, encryption for data in motion and at rest, OAuth support etc – all need to be provided as standard features.
  3. Developer portal -A Developer portal is the entry point for developers and can also serve as a developer onboarding tool. Thus, typically it is a web based portal integrated with the API Gateway. Developers use the portal to study API specs, download SDKs for different programming languages, register their APIs and to monitor their API performance. It also provides a visual interface to help developers build/test their APIs and also provides support for a high degree of automation using a continuous delivery model. For internal developers, the ability to provide self service consumption of API developer stacks (Node.js/ JavaScript frameworks/Java runtimes/ PaaS integration etc) is a highly desirable capability.
  4. Management and Monitoring -Ensuring that the exposed APIs are maintaining their QOS (Quality of Service), as well as helping admins monitor their quota of resource consumption, is key from an Operations standpoint. Further, the M&M functionality should also aid operators in resolving complex systems issues and ensuring a high degree of availability during upgrades etc.
  5. Billing and Chargeback -Here we refer to the ability to tie in the usage of APIs to back office applications that can charge users based on their metered usage of the backend applications. This is typically provided through logging and auditing capability.
  6. Governance -From a Governance standpoint, the platform should provide the ability to track APIs across their lifecycle, a handy catalog of available APIs, an ability to audit their usage and the underlying assets they expose, and the ability for the business to set policies on their usage etc.
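
To give a feel for the facade pattern mentioned in the API Gateway item above, here is a deliberately minimal sketch in Python/Flask that authenticates a caller by API key and routes the request to a backend service; the key store, backend URL, and headers are assumptions, and a real gateway adds far more (rate limiting, caching, protocol translation, etc.).

```python
# A toy facade: authenticate the caller at the edge, then proxy to a backend.
import requests
from flask import Flask, Response, abort, request

app = Flask(__name__)
API_KEYS = {"demo-key-123": "partner-app"}           # hypothetical key store
BACKEND = "http://orders-service.internal:8080"      # hypothetical backend

@app.route("/api/v1/orders/<order_id>", methods=["GET"])
def get_order(order_id):
    if request.headers.get("X-Api-Key") not in API_KEYS:
        abort(401)                                    # access control at the edge
    backend_resp = requests.get(f"{BACKEND}/orders/{order_id}", timeout=3)
    return Response(backend_resp.content,
                    status=backend_resp.status_code,
                    content_type=backend_resp.headers.get("Content-Type", "application/json"))
```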

API Design Process..

Most API Platforms provide a developer toolkit with varying degrees of integration with a runtime platform. Handy SDKs for iOS, Android and JavaScript development are provided.

An internal developer uses the developer toolkit (e.g. Eclipse with an offline plugin) and/or an API Designer tool included with a vendor platform to create the API based on organizational policies. An extensive CLI (Command Line Interface) is also provided to perform all functions that can be done using the GUI. These include local unit & system test capabilities and an ability to publish the tested APIs to a repository from where the runtime can access, deploy and update them.

From a data standpoint, multiple databases including RDBMS and NoSQL are supported for data access. During the creation of the API, depending on whether the developer already has an existing data model in mind, the business logic is mapped closely to the data schema; alternatively, one can work top-down and create the backend once the API interface has been defined, using a model-driven approach. This also includes settings for security permissions, with support for OAuth and any other third-party authentication dependencies.

Once defined and tested, the API is published onto the runtime. During this process access control privileges, access policies and the endpoint itself are defined. The API is then ready for external consumption and discovery.

Runtime Flow Across the Architecture..

In the simplest case – once the API has been deployed and tested, it is made available for public discovery and consumption. Client Applications then begin to leverage the API and this can be done in a variety of ways. For example – user interactions on mobile applications, webpages and B2B services trigger calls to the API Gateway. The Gateway performs a range of functions to process the request – from security authorization to load-balancing – before accessing policies set up for that particular API. The Gateway then invokes the API by calling the backend system, typically via message oriented middleware such as an ESB or a Message Broker. Once the backend responds with the appropriate payload, the data is sent to the requesting application. Systems and Administration teams can view detailed operational metrics and logs to monitor API performance.
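
A minimal sketch of this runtime flow from the client application's side might look as follows, assuming an OAuth2 client-credentials grant; the URLs, credentials, and header names are placeholders, not any specific vendor's API.

```python
# Obtain an OAuth2 access token, then call a protected API via the gateway.
import requests

token_resp = requests.post(
    "https://gateway.example.com/oauth2/token",
    data={"grant_type": "client_credentials",
          "client_id": "<client id>", "client_secret": "<client secret>"},
    timeout=5,
)
access_token = token_resp.json()["access_token"]

api_resp = requests.get(
    "https://gateway.example.com/api/v1/accounts/12345/balance",
    headers={"Authorization": f"Bearer {access_token}",
             "X-Api-Key": "<application key>"},
    timeout=5,
)
print(api_resp.status_code, api_resp.json())
```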

A Note on Security..

It should come as no surprise that the security aspect of an API Management Platform is one of the most critical aspects of the implementation. While API Security is a good subject for a followup post and too exhaustive to be covered in a short blurb – several standards such as OAuth2, OpenID Connect, JSON Security & Policy languages are all topics that need to be explored by both organizational developers and administrators.  Extensive flow mapping and scenario testing are mandated here. Also, endpoint security from a client application standpoint is key. Your Servers, Desktops, Supported Mobile devices need to be updated and secured with the latest antivirus & other standard IT Security/access control policies.

Conclusion..

In this post, we tried to highlight the major components of an API Management Platform from a technology standpoint. While there are a range of commercial & open source platforms, it is important to evaluate them from a feature standpoint as well as from an ecosystem capability perspective as developers begin implementing microservices based Digital Architectures.

Risk Management – Industry Insights & Reference Architectures…

Financial Risk Management as it pertains to different industries – Banking, Capital Markets and Insurance – has been one of the most discussed topics in this blog. The business issues and technology architecture of systems dedicated to aggregating, measuring & visualizing Risk are probably one of the more complex tasks in the worlds of finance & insurance. This post summarizes ten key blogs on the topic of Financial Risk published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing Risk projects.

Image Credit – ShutterStock

The twin effects of the global financial crisis & the FinTech boom have caused Financial Services, Insurance and allied companies to become laser-focused on risk management. What was once a concern primarily of senior executives in the financial services sector has now become a top-management priority in nearly every industry.

Whatever the kind of Risk, certain themes are common from a regulatory intention standpoint –

  1. Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
  2. Requiring that banks increase the amount and quality of capital held in reserve to back their assets, and requiring higher liquidity positions
  3. Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
  4. Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
  5. Tackling the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.

With this background in mind, the complete list of Risk use case blogs on VamsiTalksTech is included below.

# 1 – Why Banks and Other Financial Institutions Should Digitize Risk Management –

Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC); offensive in revenue producing areas of the business like Customer 360 (whether Institutional or Retail), Digital Marketing, Mobile Payments, Omnichannel Wealth Management etc. If one really thinks about it – the biggest activity that banks do is manipulate and deal in information, whether customer, transaction or general ledger data.

Why Banks, Payment Providers and Insurers Should Digitize Their Risk Management..

# 2 – Case Study of a Big Data Enabled IT Architecture for Risk Data Measurement – Volcker Rule/Dodd Frank –

While industry analysts can discuss the implications of a certain Risk mandate, it is certainly most helpful for Business & IT audiences to find CIOs discussing overall strategy & specific technology tools. This blogpost discusses how two co-CIOs charged with an enterprise technology mandate are focused on growing and improving a global banking leader’s internal systems, platforms and applications, especially from a Risk standpoint.

How a Pioneering Bank leverages Hadoop for Enterprise Risk Data Aggregation & Reporting..

# 3 – A POV on Bank Stress Testing – CCAR and DFast

An in-depth discussion of Bank Stress Testing from both a business and technology standpoint.

A POV on Bank Stress Testing – CCAR & DFAST..

# 4 – Capital Markets – Architectural Approaches to the practice of Risk Management

In Capital Markets, large infrastructures process millions of derivative trades on a typical day. The main implication is that there are a large number of data inserts and updates to handle. Once the data is loaded into the infrastructure, complex mathematical calculations need to be performed in near real time to calculate intraday positions. Most banks use techniques like Monte Carlo modeling and other computational simulations to build & calculate these exposures. Hitherto, these techniques were extremely expensive from the standpoint of both the hardware and the software needed to run them. Nor were tools & projects available that supported a wide variety of data processing paradigms – batch, interactive, realtime and streaming. This post examines a detailed reference architecture applicable to areas such as Market, Credit & Liquidity Risk measurement.
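
Purely to illustrate the computational pattern (and not any bank's actual methodology), here is a minimal Monte Carlo sketch in Python/NumPy that simulates portfolio P&L and reads off a 99% one-day VaR; the positions, volatilities, and correlations are invented for the example.

```python
# A toy Monte Carlo exposure calculation; all inputs are illustrative.
import numpy as np

rng = np.random.default_rng(seed=7)
n_scenarios = 100_000

positions = np.array([5e6, 3e6, 2e6])     # notional per position
vols = np.array([0.012, 0.02, 0.015])     # daily volatilities
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
cov = np.outer(vols, vols) * corr

# Simulate correlated daily returns and aggregate to portfolio P&L.
returns = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=n_scenarios)
pnl = returns @ positions

# 99% one-day Value at Risk is the loss at the 1st percentile of simulated P&L.
var_99 = -np.percentile(pnl, 1)
print(f"Simulated 99% 1-day VaR: {var_99:,.0f}")
```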

Big Data architectural approaches to Financial Risk Mgmt..

# 5 – Risk Management in the Insurance Industry  – Solvency II 

A discussion of Solvency II – the Insurance industry’s equivalent of Basel III – from both a business and technology standpoint.

Why the Insurance Industry Needs to Learn from Banking’s Risk Management Nightmares..

# 6 – FRTB (Fundamental Review of the Trading Book)

An in-depth business and technology discussion of the highlights and key implications of the FRTB (Fundamental Review of the Trading Book).

A POV on the FRTB (Fundamental Review of the Trading Book)…

# 7 – Architecture and Data Management Antipatterns

How not to architect Financial Service IT platforms using Risk Applications as an example.

The Five Deadly Sins of Financial Services IT..

# 8 – The Intelligent Banker Needs Better Risk Management –

The Intelligent Banker needs better Risk Management

# 9 – Implications of Basel III

This blogpost discusses the key implications of Basel III.

Towards better Risk Management..Basel III

#10 The Implications of BCBS 239

This blogpost discusses the data management and governance implications of BCBS 239, which provides guidelines to overhaul an organization’s risk data aggregation capabilities and internal risk reporting practices.

BCBS 239 and the need for smart data management

Conclusion..

The industry clearly requires a fresh way of thinking about Risk Management. Leading firms will approach Risk as a way to create customer value and a board-level conversation around such themes, rather than as a purely defensive and regulatory challenge. Surely, this will mean that budgets for innovation-related spending in areas such as Digital Transformation will also slowly percolate over to Risk. As firms either digitize or deal with gradually eroding market share, business systems that work with and leverage risk will emerge as a strong enterprise capability over the upcoming 3-5 year horizon.

Why APIs Are a Day One Capability In Digital Platforms..

As enterprises embark or continue on their Digital Journey, APIs are starting to emerge as a key business capability and one that we need to discuss. Regular readers of this blog will remember that APIs are one of the common threads across the range of architectures we have discussed in Banking, Insurance and IoT et al. In this blogpost, we will discuss the five key imperatives or business drivers for enterprises embarking on a centralized API Strategy. 

Digital Platforms are composed of an interconnected range of enterprise services exposed as APIs across the Internet.

API Management as a Native Digital Capability..

The use of application programming interfaces (APIs) has been well documented across web scale companies such as Facebook, Amazon and Google et al. Over the last decade, APIs have begun emerging as the primary means for B2B and B2C companies to interact with their customers, partners and employees. The leading enterprises already have Digital Platform efforts underway as opposed to creating standalone Digital applications. Digital Platforms aim to increase the number of product and client channels of interaction so that enterprises can reach customer audiences that were hitherto untapped. The primary mode of interaction with a variety of target audiences in such digital settings is via APIs.

APIs enable the creation of new business models that can deliver differentiated experiences (source – IBM)

APIs are widely interoperable, relatively easy to create and form the front end of many internet scale platforms. APIs are leveraged to essentially access the core services provided by these platforms and can be used to create partner and customer ecosystems. Leaders such as PayPal, Amazon & FinTechs such as Square, Mint etc have overwhelmingly used APIs as a way to not only open their platforms to millions of developers but also to offer innovative services.

As of 2015, programmableweb.com estimated that over 12,000 APIs were already being offered by enterprise firms. Leaders such as Salesforce.com were generating about 50% of their revenue through APIs. Salesforce.com created a thriving marketplace – AppExchange – for apps created by its partners that work on its platform, which numbered around 300 at the time of writing. APIs were contributing 60% of revenues at eBay and a staggering 90% for Expedia.com. eBay uses APIs to create additional exposure for its products – list auctions on other websites, get bidder information about sold items, collect feedback on transactions, and list new items for sale. Expedia’s APIs allowed customers to use third party websites to book flights, cars, and hotels. [2]

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

While most of the Fortune 500 have already begun experimenting with the value that APIs can deliver, the conversation around these capabilities needs to be elevated from the IT level to the lines of business and the CIO/Head of Marketing. APIs help generate significant revenue upside while enabling rapid experimentation in business projects. Examples of API usage abound in industries like Financial Services, Telecom, Retail and Healthcare.

The Main Kinds of APIs

While the categories of APIs will vary across industry, some types of APIs have been widely accepted. The three most popular from a high level are described below, with a small illustrative sketch after the list –

  1. Private APIs – These are APIs defined for use by employees and internal systems within an organization or across a global company. By their very nature, they’re created for sensitive internal functions and have access to privileged functions that external actors cannot perform.
  2. Customer APIs – Customer APIs enable global customers to conduct business through product/service distribution channels – examples include placing product orders, viewing catalogs etc. These carry a very limited set of privileges, restricted to customer facing actions in a B2C context.
  3. Partner APIs – Partner APIs allow business partners of varying kinds to perform business functions in the context of a B2B relationship. Examples include affiliate programs in Retail, inventory management and supply orders in Manufacturing, & billing functions in Financial Services. The API provider hosts marketplaces that enable partner developers to create software that leverages these APIs.
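
To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical category and scope names – not any particular gateway product) of how an API gateway might enforce different privilege levels for the three categories before routing a request.

```python
# A minimal sketch of mapping the three API categories to different privilege scopes.
# Categories and scope names are illustrative placeholders.
API_SCOPES = {
    "private":  {"read:internal", "write:internal", "admin:ops"},          # employees / internal systems
    "customer": {"read:catalog", "write:order"},                           # B2C facing actions only
    "partner":  {"read:inventory", "write:supply_order", "read:billing"},  # B2B functions
}

def is_authorized(api_category: str, requested_scope: str) -> bool:
    """Allow a call only if the scope is permitted for the calling API category."""
    return requested_scope in API_SCOPES.get(api_category, set())

assert is_authorized("customer", "write:order")
assert not is_authorized("customer", "admin:ops")   # customers cannot reach internal functions
```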

The Five Business Drivers for an Enterprise API Strategy..

The question for enterprise executives then becomes: when should they begin to invest in a central API Management Platform? Should such a decision be based on the API sprawl in the organization, or on the sheer number of APIs being managed manually?

While the exact trigger point may vary for every enterprise, let us consider the five key value drivers..

Driver #1 APIs enable Digital Platforms to evolve into ecosystems

In my mind, the first and most important reason to move to a centralized API strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as a SaaS (Software as a Service). The end state of Digital Platforms is to operate business systems at massive scale in terms of customers, partners and employees.

The two central ideas at the heart of a platform based approach are as follows –

  1. Create new customer revenue streams by reaching out to new customer segments across the globe or in new (and non traditional) markets. Examples of these platforms abound in the business world. In financial services, Banks & Credit reporting agencies are able to monetize years of customer & product data by reselling it to interested third parties, which use it either for new product creation or to offer services that simplify a pressing industry issue – customer onboarding.
  2. Reduce costs in current business models by extending core processes to business partners and by automating manual communication steps (which are almost always higher cost and inefficient). For instance, Amazon has built its retail business using partner APIs to extend retail provisioning, entitlement, enablement and order fulfillment processes.

Driver #2 Impact the Customer experience

We have seen how mobile systems are a key source of customer engagement. Offering the customer a seamless experience while they transact with an organization is a key way of disarming competition. Accordingly, Digital projects emphasize the importance of capabilities such as Customer Journey Mapping (CJM) and Single View of Customer (SVC) as the minimum table stakes that they need to provide. For instance, in Retail Banking, players are feeling the pressure to move beyond the traditional transactional banking model to a true customer centric model by offering value added services on the customer data that they already possess. APIs are leveraged across such projects to enrich the views of the customer (typically with data from external systems) as well as to expose these views to customers themselves, business partners and employees.

Demystifying Digital – Why Customer 360 is the Foundational Digital Capability – ..(1/3)

Driver #3 Cloud Computing & DevOps

This one is all too familiar to anyone working in technology. We have seen how both Cloud Computing & DevOps are the foundation of agile technology implementations across a range of back end resources. These include but are not limited to Compute, NAS/SAN storage, Big Data, Application platforms, and other middleware. Extending that idea, Cloud (IaaS/PaaS) is a set of APIs.

APIs are used to abstract out the internals of these underlying platform services. Application Developers and other infrastructure services use well defined APIs to interact with the platforms. These APIs enable the provisioning, deployment and management of platform services.
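
As a small illustration of this idea, the sketch below uses the AWS SDK for Python (boto3) to provision a compute instance entirely through API calls; the region, AMI ID and instance type are placeholders rather than recommendations.

```python
# A minimal sketch of "the cloud as a set of APIs": provisioning a compute instance
# programmatically via the AWS EC2 API using boto3. All identifiers are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)

print("Provisioned instance:", response["Instances"][0]["InstanceId"])
```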

APIs have become the de facto model that provides developers and administrators with the ability to assemble Digital applications, such as microservices, from complicated componentry. Thus, there is a strong case to be made for adopting an API centric strategy when evolving to a Software Defined Datacenter.

A huge trend on the developer side has been the evolution of continuous build, integration and deployment processes. The integration of APIs into the DevOps process has begun, with use cases ranging from using publicly available APIs to trigger CI jobs, to running CI/CD pipelines on a cloud based provider.
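
As a simple example of the first pattern, the sketch below triggers a parameterized Jenkins job over its REST API; the server URL, job name, parameter and credentials are all placeholders.

```python
# A minimal sketch of using an API inside a CI/CD flow: remotely triggering a
# parameterized Jenkins build. URL, job name and credentials are placeholders.
import requests

JENKINS_URL = "https://jenkins.example.com"
JOB_NAME = "digital-platform-build"

response = requests.post(
    f"{JENKINS_URL}/job/{JOB_NAME}/buildWithParameters",
    params={"GIT_BRANCH": "main"},
    auth=("ci-user", "api-token"),   # placeholder credentials / API token
    timeout=30,
)
response.raise_for_status()
print("Build queued at:", response.headers.get("Location"))
```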

Why Digital Disruption is the Cure for the Common Data Center..

Driver #4 APIs enable Business & Product Line Experimentation

APIs thus enable companies to constantly churn out innovative offerings while continuously adapting & learning from customer feedback. Internet scale companies such as Facebook provide edge APIs that enable thousands of companies to write applications that drive greater customer volumes to the Facebook platform. The term API Economy is increasingly in vogue, and it connotes a loosely federated ecosystem of companies, consumers, business models and channels.

The API economy is a set of business models and channels — based on secure access of functionality and the exchange of data to an ecosystem of developers and the users of the app constructs they build — through an API, either within a company or via the internet, with business partners and customers.

The Three Habits of Highly Effective Real Time Enterprises…

Driver #5 Increasingly, APIs are needed to comply with Regulatory Mandates

We have already seen how, in key industries such as Banking and Financial Services, regulatory authorities are at the forefront of forcing incumbents to support Open APIs. APIs thus become a mechanism for increasing competition to the benefit of consumer choice. Regulators are changing the rules of participation in the banking & payments industry, and APIs are a key enabling factor in this transformation.

Under the PSD2, Banks and Payment Providers in the EU will need to unlock access to their customer data via Open APIs

Why the PSD2 will Spark Digital Innovation in European Banking and Payments….

Financial Services, Healthcare, Telecom and Retail.. a case in point for why APIs present an Enormous Opportunity for the Fortune 500..

Banking – At various times, we have highlighted business & innovation issues facing Banking providers in the areas of Retail Banking, Payment Providers and Capital Markets. Regimes such as the Payment Services Directive (PSD2) in the EU will compel staid industry players to innovate faster than they otherwise would. FinTechs across the industry offer APIs to enable third party services to use their offerings.

Healthcare – there is broad support in the industry for Open APIs to drive improved patient care & highly efficient billing processes as well as to ensure realtime engagement across stakeholders.

APIs across the Healthcare value chain can ensure more aligned care plans and business processes. (Image Credit – Chilmark)

In the Telecom industry, nearly every large operator has developed APIs which are offered to customers and the developer community. Companies such as AT&T and Telefonica are using their anonymized access to hundreds of millions of subscribers to grant large global brands access to nonsensitive customer data. Federated platforms such as the GSM Association’s oneAPI are already promoting the usage of industry APIs.[1]

Retailers are building new business models based on functionality such as Product Catalogs, Product Search, Online Customer Orders, Inventory Management and Advanced Analytics (such as Recommendation Engines). APIs enable retailers to expand their footprint beyond the brick and mortar store and a basic online presence.

Ranking Your API Maturity..

Is there a maturity model for APIs? We can group enterprises into three strategic postures, taking Banks as the example. Readers can extrapolate these to their specific industry segment.

  1. Minimally Compliant Enterprises – Here we categorize companies that seek to provide compliance with a minimal Open API. Taking the example of Banking, while this may be the starting point for several organizations, staying too long in this segment will mean gradual market share erosion as well as a loss of customer lifetime value (CLV) over time. The reason is that FinTechs and other startups will offer a range of services such as instant mortgages, personal financial management tools, paperless approval processes for a range of consumer accounts etc. It is also anticipated that such organizations will treat their API strategy as a localized effort and will allocate personnel to the project mainly around the front office and marketing.
  2. Digital Starters – Players that have begun opening up customer data and support the core Open API but also introduce their own proprietary APIs. While this approach may work in the short to medium term, it will impose integration headaches on the banks as time goes on.
  3. Digital Innovators – The Digital Innovators will lead the way in adopting APIs. These companies will fund dedicated teams in lines of business serving their particular customer segments, either organically or through partnerships with third party service providers. They will not only adhere to the industry standard APIs but also extend these specifications to create their own services with a focus on data monetization.

Conclusion..

Increasingly, a company’s APIs represent a business development tool and a new go-to-market channel that can generate substantial revenues from referrals and usage fees. Given the strategic importance and revenue potential of this resource, the C-suite must integrate APIs into its corporate decision making.

The next post will take a technical look into the core (desired) features of an API Management Platform.

References..

[1] Forrester Research 2016 – “Sizing the Market for API Management Solutions” http://resources.idgenterprise.com/original/AST-0165452_Forrester_Sizing_the_market_for_api_management_solutions.pdf 

[2]  Harvard Business Review 2016 – “The Strategic Value of APIs” – https://hbr.org/2015/01/the-strategic-value-of-apis

What Your Data Science Team Needs From IT..

“Data matures like wine, applications like fish.” – James Governor, Principal Analyst & Founder of RedMonk, circa 2007

I would like to begin a series of posts on Data Science jointly authored with my friend, ex-colleague, & collaborator, Maleeha Qazi – Data Scientist (https://www.linkedin.com/in/maleehaqazi/). In these posts, we intend to bring to light several technology themes around the industrial use of Data Science and Deep Learning spanning Industrial Applications, Big Data, Cyber Security, Cognitive Applications, Business Process Management, and Cloud Computing. Our goal for this first post is to discuss the typical issues that bedevil every Data Science initiative at the beginning – namely, the top technical and cultural concerns to communicate to the IT Department every time a new project is begun.

Introduction

With Data Science emerging as a key enabler of Digital, customer-focused Applications, renewed focus is being placed on how the lifecycle of these newer applications runs alongside traditional IT development. This blogpost aims to highlight some of the key concerns involved when Data Science groups work with IT departments. Currently there is no one-size-fits-all model for how advanced models are developed and deployed so that they can be accessed and used at scale by customers. It is our wager that almost every large enterprise working on these projects encounters these issues. We wanted to share our experience with the enterprise community over a series of blog posts.

It is clear that Data Science teams, product teams and IT need to collaborate to create business applications that learn from customer needs.

So what are the top asks that Data Science teams have for their IT groups? There are at least nine important focus areas:

#1 Understanding of the business challenge and agreeing on a common vocabulary 

It is a generally accepted fact that most IT/Data Science interactions focus on the technology portion, which includes elements such as: the data sources within the organization, acquisition of and access to external data sources, the availability of tools & infrastructure to support the data science development process (cloud or on-prem), data ingestion engines (e.g. Kafka, Flume, Sqoop) to ingest and process the data, etc. While this is certainly part of the process, a distinct anti-pattern has emerged when this interaction is driven by technology alone. The Data Science team creates models that typically reflect customer needs and drive business value for an organization’s customers, partners, regulators & employees. In that rather important context, technology at its core is just an engine and does not exist in a vacuum. The most vibrant enterprises understand this ground reality and always ensure that business needs drive both Data Scientists & IT, and not the other way around. It is thus highly important for the Data Science team and the IT team to agree on the business challenge at hand, so that their interactions (long and short term) are driven with business & competitive outcomes in mind. Examples of such goals are a common organization-wide business language (so that definitions agree semantically) across products, customers, logistics, supply chains & business domains. The shared emphasis of both teams should be on overall goals such as increased customer profitability, enhanced customer segmentation, customer service productivity, etc. Setting this tone upfront will not only ensure that outcomes for both teams are aligned but will also ensure that critical gaps in knowledge and capabilities are filled. One approach that is working well is increased cross-pollination across both teams: collapsing artificial organizational barriers by adopting DevOps & ensuring that Data Science teams have a “slim IT” presence (e.g. an embedded data engineer and datacenter person) to rapidly fill gaps in IT’s business knowledge or capability.

#2 IT needs to help Data Scientists acquire a deep understanding of the overall Data Architecture

Once business requirements have been identified, Data Scientists get right to work understanding the different data sources that will serve as inputs to their models. In large enterprises, it is common to find many varied data sources from which data needs to be extracted. For instance, in Banking there are a range of Book of Record Transaction (BORT) systems from which data needs to be pulled. It is also key to supplement this data with external data sets. Models are only as good as the data they are given to work with – Garbage In, Garbage Out (GIGO) is the principle that bad input data guarantees poorly performing models. A lot of the time, business groups have a hard time explaining what they would like to see, both in terms of data and visualization; in such cases, a prototype makes things easier from a requirements gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) needed to address the business challenge. They spend a lot of time collating this data (from Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup/data-wrangling process includes fixing and standardizing missing value representations, identifying potentially corrupted data elements, formatting time and date fields in a consistent manner, etc.
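
To ground those wrangling steps, here is a minimal pandas sketch with hypothetical file and column names: it standardizes missing-value markers, flags suspect records and normalizes dates.

```python
# A minimal data-wrangling sketch (hypothetical file/column names): standardize
# missing-value markers, flag potentially corrupt records, and normalize dates.
import pandas as pd

df = pd.read_csv("transactions.csv")   # placeholder extract from a source system

# Source systems represent "missing" in many ways - standardize them
df = df.replace({"N/A": pd.NA, "": pd.NA, "-9999": pd.NA})

# Coerce amounts to numeric and flag potentially corrupted records
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df["suspect"] = df["amount"] < 0

# Normalize date fields to a single consistent representation
df["txn_date"] = pd.to_datetime(df["txn_date"], errors="coerce")

print(df.isna().sum())   # quick profile of the remaining gaps
```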

#3 Infrastructure & IT Self Service Across Environments, Platforms and Tools 

This one is huge. The traditional IT model of hardware acquisition and vetting is typically a drawn out process. Even with the public cloud, onerous security controls are sometimes added to infrastructure, which delays the Data Science team’s ability to develop their models in an agile manner. The dreaded term Shadow IT (where business & data science teams go around the IT team to procure compute and storage on the public cloud) is not just an issue with infrastructure software but is slowly creeping up to business intelligence and advanced analytics apps. The delays associated with provisioning legacy data silos, combined with tools that are neither intuitive nor able to scale to the increasing data deluge, make timely business analysis almost impossible to perform. Insights delivered too late are not very valuable. Data Scientists dearly desire that the environments they need for development and testing are made available as soon as possible, and ideally via a self service user interface. This calls for IT investments in Cloud computing platforms that enable agility and speedy provisioning of dev/test environments across compute, network and storage.

#4 – Collaboration with IT around the DS development lifecycle

Organizations typically have well established development methodologies and processes. Currently, most data science development and traditional application development happen in two distinct tracks. Software development typically follows an Agile/DevOps process (a combination of Scrum/XP). The development lifecycle is divided into several stages, with each producing a working deliverable at the end. The deliverables are incrementally updated to arrive at an acceptable product which is then deployed for customer use. In this model, team members typically follow defined roles.

The Data Science development cycle is different. Data scientists/modelers are given a certain business problem to solve. They proceed to find the appropriate data they need, pull it into Hadoop or a Data Warehouse, wrangle it, try various algorithms to create the best possible models, test the models, and ensure that they perform well for the problem at hand. If they get more data during the process, they go back and repeat the whole cycle. The point is that IT needs to partner and collaborate with the Data Science team: first to strategize, and then to help provision the different environments (dev, test, prod) that enable data scientists to do iterative model development. IT then needs to help the Data Science team deploy these models in the appropriate deployment architecture.

#5 – Help Improve the Data Science User Experience

Using traditional app dev methodologies, it can take months to design, test and deploy software – which is simply unsustainable. One of the chief goals of the DevOps model is to close the long-standing gap between the engineers who develop and test IT capabilities and the business requirements for those capabilities. Accordingly, data science teams need best practice recommendations on using IDEs that support iterative model development & debugging. It is important that these development tools support programming languages such as R and Python – the most common go-to languages for data science – to rapidly develop code. It is critical that the IT group partner with the Data Scientists to enable these capabilities both from a development and a deployment standpoint.

#6 – Model Deployment

The data wrangling phase involves writing code to join various data sets so that a single complete dataset can be created from a raw features standpoint. If more data is obtained while the development cycle is underway, the Data Science team has no option but to go back and redo the whole process. Once the raw features are gathered, feature engineering can begin to create predictive features from the raw data, taking into account business concepts. The modeling phase is where the choice of algorithms comes into play. A Data Scientist takes the raw & engineered features and creates models using the most appropriate algorithms for the task. After the models have been repeatedly tested for accuracy and performance, the best one is typically deployed for use. Once the models have been developed, it is critical to ensure that they can be deployed rapidly, run automatically, and be changed as per business requirements and performance. How and where these models get deployed depends on the business case; ideally, they should be deployed as a service. Models as a Service (MaaS) is the Data Science counterpart to Software as a Service. A MaaS takes in business variables (often hundreds of inputs) and provides as output business decisions/intelligence, measurements, and visualizations that augment decision support systems. IT help is needed to ensure that the models can scale as customer usage of these Digital Platforms increases.
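
A minimal sketch of the MaaS idea, assuming a model already serialized with joblib and hypothetical endpoint/feature names, might look as follows.

```python
# A minimal Models-as-a-Service sketch: a trained model exposed behind an HTTP
# endpoint so business applications can call it like any other service.
# The model file and payload format are placeholders.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("churn_model.pkl")   # placeholder serialized model

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()                     # e.g. {"features": [0.3, 12, 1]}
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(port=8080)
```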

#7 Model Governance and Management

There need to be appropriate checks in place to allow for the monitoring and maintenance of models once they are in production. Model versioning must be handled so that customers aren’t affected during a maintenance cycle – old models must still function while the new ones are being put into place. And by keeping a check on the performance of models in production, the IT team can tell when a model stops performing optimally & can call on the Data Science team to investigate why.
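
One lightweight way to picture this (a sketch with hypothetical names, not a prescribed tool) is a small registry in which consumers always resolve the “active” version, so an old model keeps serving until the new one is explicitly promoted.

```python
# A minimal model-versioning sketch: the active pointer is only flipped once the
# new version has been validated, so consumers are never left without a model.
registry = {
    "churn_model": {
        "active": "v3",
        "versions": {"v3": "models/churn_v3.pkl", "v4": "models/churn_v4.pkl"},
    }
}

def active_model_path(name: str) -> str:
    entry = registry[name]
    return entry["versions"][entry["active"]]

def promote(name: str, version: str) -> None:
    # Atomic switch after the Data Science team signs off on the new version
    registry[name]["active"] = version

print(active_model_path("churn_model"))   # still v3 while v4 is being validated
promote("churn_model", "v4")
print(active_model_path("churn_model"))   # now v4
```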

#8 Security and Compliance  

How are security constraints around different environments managed? Though IT maintains control over the vast domain of tools and environments in any organization, the Data Science team must maintain control of the models. Any random person updating the models could lead to performance degradations. This separation of concerns is akin to DB security over schemas/tables/columns – only certain individuals should be granted access to perform certain operations for the most optimal results.
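
The separation of concerns described above can be pictured with a small sketch (hypothetical roles and actions) in which only the Data Science role may modify models while IT retains control of the environments.

```python
# A minimal role-based sketch of the separation of concerns: IT controls the
# environments, the Data Science team controls the models. Roles and actions
# are placeholders.
ROLE_PERMISSIONS = {
    "data_scientist": {"model:update", "model:read"},
    "it_operations":  {"env:provision", "env:monitor", "model:read"},
}

def can(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())

assert can("data_scientist", "model:update")
assert not can("it_operations", "model:update")   # IT can observe but not modify models
```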

#9 – Delivering Results to Business Users –

Once the model has been deployed the results need to be made available to business users. Depending on the application, model results might need to be served up in near-real-time, every day/week/month/year, ad-hoc on demand, or any other time frame in-between. Organizations need to deal with providing appropriate tools (e.g. apps, sandboxes, etc.) to enable end users to explore the results of the analysis, and to perform intelligent visualization of the data.  Visualizations include trend analysis over time, KPIs, list of interesting customers/accounts, etc.
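
As one small illustration of these delivery formats, the sketch below (hypothetical file and column names) aggregates model output into a monthly KPI trend chart.

```python
# A minimal result-delivery sketch: aggregate model output into a monthly KPI trend.
# File and column names are placeholders.
import matplotlib.pyplot as plt
import pandas as pd

results = pd.read_csv("model_results.csv", parse_dates=["date"])
monthly = results.set_index("date")["predicted_churn_rate"].resample("M").mean()

monthly.plot(title="Predicted churn rate - monthly trend")
plt.ylabel("churn rate")
plt.tight_layout()
plt.savefig("churn_trend.png")
```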

Conclusion

Digital applications will continue to incorporate Data Science at an increasing scale. However, traditional IT Departments need to collaborate in the above specific areas to ensure that the algorithms developed for specific business issues are effective, forward looking and scalable.

Apache Mesos: Cluster Manager for the Software Defined Data Center ..(3/7)

The second and previous blog in this series (@ http://www.vamsitalkstech.com/?p=4670) discussed the technical challenges of running large scale Digital Applications on traditional datacenter architectures. In this third blog, we will deep dive into another important ecosystem platform – Apache Mesos, a project that aims to abstract away various system resources – CPU, memory, network and disk – to provide consuming digital applications with a giant cluster from which they can utilize capacity, a key requirement of the Software Defined Datacenter (SDDC). The next blogpost will deep dive into Linux Containers & Docker.

Introduction and the need for Apache Mesos..

This blog has from time to time discussed how Digital applications are a diverse blend of several different and broad technology paradigms – Big Data, Intelligent Middleware, Messaging, Business Process Management, Data Science et al.

To that end almost every Enterprise Datacenter supporting Digital workloads typically has clusters of multi-varied applications installed. Most traditional datacenters have used either physical or virtual machines (VMs) as the primary runtime unit to run such applications. These VMs are typically provisioned based on application asks and have applications deployed onto them. These VMs then are formed into logical clusters which are essentially a series of machines serving a given business application in an n-tier architecture.

As load increases on these servers, more VMs are provisioned into the cluster and so on. The challenge with this traditional model is that it is fairly static in nature, in the sense that machines are preallocated to run certain kinds of workloads (databases, webservers, developer servers etc). The challenge with Digital and Cloud Native applications is that scaling needs to happen dynamically, and these applications treat the infrastructure as if it were infinite. They present various challenges and headaches that call for the Datacenter to be software defined, as we discussed in the last blog below. We will continue our look at the SDDC by considering one of the important projects in this landscape – Apache Mesos.

Why Digital Platforms Need A Software Defined Datacenter..(2/6)

Apache Mesos is a project that was developed at the University of California at Berkeley circa 2009. While it was initially created to solve the challenge of provisioning and scaling Spark clusters, the Mesos project evolved to become a centralized cluster manager. The central idea of Mesos is to pool together all the physical resources of the cluster and make them available as a single reservoir of highly available resources for different applications (or frameworks) to consume. Over time, Mesos has begun supporting complex n-tier application platforms that leverage capabilities such as Hadoop, Middleware, Jenkins, Kafka, Spark, Machine Learning etc.

As with almost all innovative Cloud & Big Data projects, the adoption of Apache Mesos has primarily been in the web scale arena. Prominent users include highly technical engineering shops such as Twitter, Netflix, Airbnb, Uber, eBay, Yelp and Apple. However, there seems to be early adopter activity with increased acceptance in the Fortune 100. For instance, Verizon signed on in 2015 to use a Mesosphere DC/OS (based on Apache Mesos) for datacenter orchestration.

The Many Definitions of Mesos..

At its simplest, Mesos is an Open Source Cluster Manager. What does that mean? Mesos can be described as a cluster manager because it ensures that datacenter hardware resources are managed and advantageously shared among multiple distributed technologies – Big Data, Message Oriented Middleware, Application Servers, Mobile apps etc. Mesos also enables applications to scale with a high degree of resiliency, without having to bother about the details of the underlying infrastructure.

The model of resource allocation followed by Mesos allows a range of constituents – sys-admins, developers & DevOps teams – to request resources (CPU, RAM, Storage) much as they would from a cloud provider.

Mesos has alternatively been described as a Datacenter Kernel as it provides a single unified view of node resources to software frameworks that wish to consume them via APIs. Mesos performs the role of an Intelligent global level scheduler that can match a massive pool of hardware resources to distributed applications that want to consume these resources. Mesos aggregates all the resources into a large virtual pool using not just virtual machines and containers but primitives such as CPU, I/O and RAM. It also breaks applications into small units that can be assigned across this pool. Mesos also provides APIs in multiple languages to allow applications to be built for it. Apache Spark, the most popular data processing engine, was built originally as a Mesos framework.

It is also called a Data Center Operating System (DCOS) as it performs a similar role to the operating system. Any application that can run on Linux runs on Mesos.


To illustrate how Mesos works, consider two clusters in a datacenter – Cluster A and Cluster B. Cluster A has 8 nodes, with each node/server possessing 4 CPUs and 64 GB RAM; Cluster B has 5 nodes, with each node/server having 4 CPUs and 64 GB RAM. Mesos can essentially combine both these clusters into one virtual cluster of 52 CPUs and 832 GB RAM. The advantage of this approach is that cluster usage is greatly improved because applications share resources much more efficiently.
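
The arithmetic behind that pooled view is simple enough to sketch.

```python
# The pooled-capacity arithmetic for the two clusters described above.
clusters = {
    "A": {"nodes": 8, "cpus_per_node": 4, "ram_gb_per_node": 64},
    "B": {"nodes": 5, "cpus_per_node": 4, "ram_gb_per_node": 64},
}

total_cpus = sum(c["nodes"] * c["cpus_per_node"] for c in clusters.values())    # 32 + 20 = 52
total_ram = sum(c["nodes"] * c["ram_gb_per_node"] for c in clusters.values())   # 512 + 320 = 832

print(f"Virtual cluster: {total_cpus} CPUs, {total_ram} GB RAM")
```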

Mesos and Cloud Native Applications..

We discussed the differences between Cloud Native and legacy applications in the previous post @ http://www.vamsitalkstech.com/?p=4670 . Mesos has been most impactful when running stateless Cloud Native applications, as opposed to traditional applications built on a stateful/vertical scaling paradigm. While the defining features of Cloud Native applications are worthy of a dedicated blogpost, these applications can scale to handle massive & increasing amounts of load while tolerating any failure without impacting service. These applications are also intrinsically distributed in nature and are typically composed of loosely coupled microservices. Examples include stateless web applications running on a Platform as a Service (PaaS), CI/CD applications working on Jenkins, and NoSQL databases like HBase, Cassandra, Couchbase and MongoDB. Stateful applications that persist data to disk using an RDBMS aren’t good workloads for Mesos as yet.

When Cloud Native Digital applications are run on Mesos, several of the headaches encountered in running them on legacy datacenters are ameliorated, namely –

  1. Clusters can be dynamically provisioned by Mesos  based on demand spikes
  2. Location independence for microservices
  3. Fault tolerance

As it matures, Mesos has also begun supporting multi-datacenter deployments, with web scale shops like Uber running Cassandra as a framework across datacenters at scale. In the case of Uber, each datacenter has its own Mesos cluster with independent frameworks that exchange information periodically. The Cassandra database includes a seed node that bootstraps the gossip process for new nodes joining the cluster. A custom seed provider was created to launch Cassandra nodes, which allows new nodes to be rolled out automatically into the Mesos cluster in each datacenter. (Credit – Abhishek Verma – Uber)

Mesos Architecture..

There are three main architectural primitives in Mesos – Master, Slave, Frameworks. The central orchestrator in the Mesos system is called a Master and the worker processes are called Slaves.

As depicted below, the Master process manages the overall cluster and delegates tasks to the slaves based on the resources requested by Frameworks.

The core Mesos process is installed on all nodes and each node’s personality (Master or Slave) is assigned at runtime. The Slaves run application workloads that are requested by the appropriate frameworks. This overall setup of Master and Slave daemons makes up a Mesos cluster.

Frameworks, commonly called Mesos applications, are composed of two main components: a scheduler, which registers with the Master to receive resource offers, and executors, which launch workloads or tasks on the slaves. Resource offers are a simple list of a slave’s available capacity – CPU and Memory. The Master receives these offers from the slaves and then provides them to the frameworks. A task can be anything really – a simple script or command, a MapReduce job, or an initialization of a Jetty/Tomcat/JBoss AS etc.

The Mesos executor is a program or command on the Slave that runs the tasks. No matter which isolation module is used, the executor packages all resources and runs the task on the slave node. When the task is complete, the containers are destroyed and the Slave’s resources are released back to the Master.

For Master HA, you can run multiple Masters with only one active at a given point in time communicating with the slave nodes. Once the active Master fails, Apache ZooKeeper is used to manage leader election to a standby Master as depicted. Master quorum requires a minimum of 3 nodes, but most production deployments are recommended to have 5 Master nodes. Once a new Master is elected, all of the cluster/slave and framework information is submitted to the new Master by the frameworks so that the state before the failure can be reconstructed. Mesos has elaborate recovery processes for the frameworks, the schedulers and the Slave nodes.

Apache Mesos Architecture comprises Master Nodes, Slave Nodes and Frameworks.

By some measures, Mesos is a very straightforward concept. Frameworks need to run tasks, and these are traffic-managed by Masters, which coordinate tasks on worker machines called Slaves.

From a production deployment standpoint, the following components are required – an odd number of Mesos Masters, many Slave machines to run applications, a ZooKeeper ensemble for HA configurations, and an optional Docker engine running on each slave.

The Mesos Resource Allocation Process..

Mesos follows a default resource scheduling model known as two-tier scheduling. This model may seem a little convoluted but it is important to keep in mind that it was designed to satisfy the requirements & constraints of many different frameworks without having to know details of each.

The Master’s allocation module receives resource offers from slaves and then forwards them on to the framework schedulers. These offers describe not just which resources are available but also how much of each resource to offer. The framework schedulers can accept or reject the Master’s offers based on their current capacity requirements. The Master’s allocation module is customizable based on specific requirements that implementing enterprises may have. The default allocation algorithm is known as Dominant Resource Fairness (DRF) and is based on fair sharing of cluster resources among requesting applications. For instance, DRF seeks to equalize each framework’s dominant share, i.e. a CPU-hungry application is allocated a larger share of CPU while a memory-intensive application receives an equivalent fractional share of RAM.
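
To make the DRF idea concrete, here is a plain-Python sketch (not the Mesos allocator itself, and with made-up numbers) that computes each framework’s dominant share and picks the framework with the smallest share to receive the next offer.

```python
# A minimal Dominant Resource Fairness sketch: a framework's dominant share is its
# largest fractional usage of any single resource; the next offer goes to the
# framework with the smallest dominant share. Totals and allocations are illustrative.
TOTAL = {"cpus": 52, "ram_gb": 832}

allocations = {
    "spark":     {"cpus": 20, "ram_gb": 100},   # CPU-hungry framework
    "cassandra": {"cpus": 6,  "ram_gb": 400},   # memory-hungry framework
}

def dominant_share(alloc):
    return max(alloc[resource] / TOTAL[resource] for resource in TOTAL)

next_framework = min(allocations, key=lambda name: dominant_share(allocations[name]))
print(next_framework, "receives the next resource offer")
```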

Mesos follows a two level resource allocation policy (Image Credit – Apache Mesos Project Documentation)

To better illustrate the resource allocation method in Mesos, let us discuss the sequence of events in the above figure from the Apache Mesos documentation [1]; a small sketch after the list mirrors this cycle.

  1. The Slave Node – as depicted, Agent 1 has 4 CPUs and 4 GB of memory free for allocation to any framework that can use them. It reports this available capacity to the Master, whose allocation policy module decides to offer these resources to framework 1.
  2. The Master sends a resource offer describing what is available on agent 1 to framework 1.
  3. The Framework’s scheduler then replies to the Master with information on two tasks to run on the agent, using <2 CPUs, 1 GB RAM> for the first task and <1 CPU, 2 GB RAM> for the second task.
  4. The master sends the tasks to the agent, which allocates appropriate resources to the framework’s executor, which in turn launches the two tasks (depicted with dotted-line borders in the figure). Because 1 CPU and 1 GB of RAM are still unallocated, the allocation module may now offer them to framework 2.
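
The sketch below (plain Python, not the actual Mesos API) mirrors that cycle: the agent’s reported capacity is consumed by the two tasks from framework 1, and the remainder is what can be offered to framework 2.

```python
# A plain-Python mirror of the offer cycle above (not the Mesos API): Agent 1 reports
# <4 CPUs, 4 GB>, framework 1 launches two tasks against it, and the unallocated
# remainder becomes available to framework 2.
agent_capacity = {"cpus": 4, "mem_gb": 4}

tasks_from_framework_1 = [
    {"cpus": 2, "mem_gb": 1},   # task 1
    {"cpus": 1, "mem_gb": 2},   # task 2
]

for task in tasks_from_framework_1:       # launched by the framework's executor
    for resource, amount in task.items():
        agent_capacity[resource] -= amount

print("Unallocated, offerable to framework 2:", agent_capacity)   # {'cpus': 1, 'mem_gb': 1}
```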

Mesos integration with other SDDC components – Linux Containers, Docker, OpenStack, Kubernetes etc

The Mesosphere stack (Credit – Alexander Rukletsov)

As with other platforms we are discussing in this series, Mesos does not stand alone in the SDDC and leverages other technologies as needed and as discussed in the last post (@ http://www.vamsitalkstech.com/?p=4670). However it needs to be stated that Mesos does have overlapping functionality at times with technologies such as Kubernetes and OpenStack.

However, let us consider the integration points between these technologies –

  1. Linux Containers – Over the last few years, Linux containers have emerged as a viable and lightweight alternative to hypervisors as a way of running multiple applications on a given OS. Different containers share one underlying OS and perform with less overhead than virtual machines. Given that one of the chief goals of Mesos is to run multiple frameworks on the same set of hardware, Mesos implements what are called isolation modules and isolation mechanisms to achieve its goal of multi-tenancy for different applications running on the same hardware. Mesos supports popular technologies for process isolation – cgroups, Solaris Zones, and Docker containers. The first two are the default, but the Mesos project has more recently added Docker as an isolation mechanism.
  2. Schedulers – There is no single widely accepted definition of what constitutes a Container Orchestration technology. The tooling to achieve it has become one of the trickiest parts of the launching-containers-at-scale discussion, with multiple projects attempting to capture this market. The requirement in the case of Mesos is straightforward – frameworks constitute applications which need to make the most efficient use of hardware. This means avoiding the overhead of VMs and leveraging containers – cgroups, Docker, Rocket etc. Hence Mesos needs to support container orchestration as a core feature. Mesos follows a pluggable model for container orchestration by supporting schedulers like Kubernetes, YARN, Marathon or Docker Swarm. All these tools provide services that organize containers into clusters, run them on specified servers, and handle the overall lifecycle management and scheduling of applications running as containers. At large webscale properties, massive container-oriented environments running hundreds of microservices are being managed with this combination of tools and Mesos. Mesos also needs to be able to start and stop services in response to failure conditions etc.
  3. Private and Public Cloud Infrastructure as a Service (IaaS) Providers – Mesos works at a different layer of abstraction than an IaaS provider such as OpenStack and aims to solve different problems. While OpenStack provides provisioned infrastructure across OS, Storage, Networking etc., Mesos intends to achieve better cloud instance utilization. Mesos integrates well with OpenStack and runs on top of resources offered up by OpenStack to run frameworks on them. Mesos itself runs on a Linux instance on an existing OpenStack deployment, though it can also simply run on bare metal. It only requires a small Linux process to run on each of the nodes. Mesos is also significantly simpler than OpenStack, and it takes only a few hours, if that, to get it up and running.
    Mesos has also been deployed on public cloud technology with both Microsoft Azure and Amazon AWS. Azure’s container services are built on Mesos. Netflix leverages Mesos extensively on their EC2 cloud and have also written an advanced scheduling library called Fenzo. Fenzo ensures that a first-fit kind of assignment is followed, where tasks are ‘bin packed’ onto Agents by their requested use of CPU, memory and network bandwidth. Fenzo also autoscales cluster usage based on demand and spreads the tasks of a given job across EC2 availability zones for high availability. [2] With the stage set from a technology standpoint, let us look at a few real world use cases where Mesos has been deployed in mission critical applications at Netflix.

Mesos Deployment @ Netflix..

Netflix is one of the largest adopters of and contributors to Mesos, and uses it across a wide variety of business capabilities. These use cases include real time anomaly detection, the data science lifecycle (training and model building batch jobs, machine learning orchestration), and other business applications. These workloads span a range of technical architectures – batch processing, stream processing and microservices based applications.

Netflix runs their business applications as a collection of microservices deployed on Amazon EC2 and their first use of Mesos was to perform fine grained resource allocation for compute tasks to gain greater unit efficiency on EC2. The first use case for Mesos at large enterprises is typically around increasing the usage and efficiency of elastic cloud services. In Netflix’s case, they needed the cluster scheduler to increase both agent ephemerality as well as autoscale agents based on demand.

Major Application Use Cases –

  • Mantis – Netflix deals with a lot of operational data that is constantly streaming into their environment. They have a range of use cases on streaming data such as real-time dashboarding, alerting, anomaly detection, metric generation, and ad-hoc interactive exploration of streaming data. To address this, Mantis is a reactive stream processing platform deployed as a cloud native service which focuses on operational data streams. The other goal of Mantis is to make it easy for different development teams to obtain access to real time events and then to build applications on them. The current throughput of Mantis is around 8 million events per second, and Apache Mesos runs hundreds of stream-processing jobs around the clock. For certain kinds of streaming applications, this amounts to tracking millions of unique combinations of data all the time.

    Mantis Architecture is based on Apache Mesos ..
  •  As mentioned above, Netflix runs their Application services stack on Amazon EC2 and most workloads run on Linux containers. Netflix created Titus as a container management platform to provision Docker containers on EC2. Netflix had to do this as Amazon ECS was not yet up to par as a container orchestration solution for EC2. The use cases supported by Titus include serving batch jobs which help with algorithm training (similar titles for recommendations, A/B test cell analysis, etc.) as well as hourly ad-hoc reporting and analysis jobs. Titus recently added support for service-style invocation for Netflix resources that are used to provide consistent development environments and more fine grained resource management.

  • Titus is a Container management platform that provisions Docker containers on EC2.

    Meson – One of the most important capabilities that Netflix possesses is its uncanny ability to predict what movies and shows its subscribers want to watch based on their previous viewing history and similar segmentation data. Netflix excels at personalizing video recommendations, and this capability is powered by machine learning algorithms. To ensure that a very large number of machine learning workflow pipelines can be efficiently created, scheduled and managed, Netflix created Meson on top of Apache Mesos. For this system to scale, and for the algorithms themselves to be fast, reliable and efficient, these pipelines are run over a large cluster of Amazon AWS instances. As depicted below, Meson manages a large number of jobs with differing CPU, Memory and Disk requirements. Once the slaves/agents are chosen, Spark jobs are run on these shared clusters. Meson uses Linux cgroups-based isolation. All of the resource scheduling is handled via Fenzo (described above).

    Meson is a platform used to create high velocity Data Science pipelines that power much of Netflix’s intelligent applications.

Conclusion..

Apache Mesos is a promising new technology which attempts to solve scaling and clustering challenges encountered in the Software Defined Datacenter (SDDC). The biggest benefits of using Mesos are more efficient use of infrastructure across complex applications with native support for multitenant applications. Mesos can ensure that multiple kinds of applications or frameworks can share a given set of nodes. This ensures not just more efficient sharing of hardware but also fault tolerance and load balancing for complex Cloud Native applications.

While Mesos has had a good degree of adoption in the webscale properties where it was first created (Twitter, Netflix, Uber, Airbnb etc to name the most prominent), it still needs to be proven as a dependable and robust platform in the mainstream enterprise datacenter.

The next post in this series will explore another exciting technology Docker, the emerging standard in the Linux container space.

References

[1] Apache Mesos Documentation – http://mesos.apache.org/documentation/latest/architecture/

[2] Distributed Resource Scheduling with Apache Mesos at Netflix – Medium.com


A Framework for Digital Transformation in the Retail Industry..(2/2)

“Our environment embraces a lot of change — we have to, because the internet is changing and the technologies we use are changing… for somebody who hated change, I imagine high tech would be a pretty bad career. It would be very tough. There are much more stable industries and they should probably choose one of those more stable industries with less change. They’ll probably be happier there.” – Jeff Bezos, Chairman and CEO – Amazon, May 2016

The retail carnage continued over the last month with more household names such as Macy’s, Michael Kors, JC Penney, and Abercrombie & Fitch announcing store closures. Long-term management teams are also departing at struggling Retailers who are unable to make the digital cut. It is clear now that players that are primarily brick and mortar need to urgently reinvent themselves via Digital Innovation. This is easier said than done, as delivering a transformative customer experience in such a highly competitive industry requires a cultural ability to embrace change and to thrive in it. From a core technology standpoint, the industry’s digital divide between leaders and laggards manifests itself across four high level dimensions – Cloud Computing, Big Data, Predictive Analytics & Business Culture. Investments in these areas are needed to improve customer value drivers – increased consumer choice, better pricing, frictionless shopping & checkout, ease of payments, speedy order fulfillment and operations. In this blogpost, we will discuss a transformation framework for legacy retailers across both the business and technology dimensions.

Digital Reinvention in Retail..

We’re taking a look at the reasons for the storefront pullback in the Retail industry. For those catching up on the business background, please read the first post below.

Here Is What Is Causing The Great Brick-And-Mortar Retail Meltdown of 2017..(1/2)


Amazon, the Gold Standard of Retail..

We have seen how an ever increasing percentage of global retail sales are gradually moving online. It then follows naturally that a business model primarily focused on Digital e-commerce capabilities is a must have in the industry. Amazon is setting new records for online sales – selling an ever wider online catalog of products (including furniture) and generating record revenues from other seemingly unrelated areas of its business (e.g. AWS, Alexa, Echo). The ability of Amazon to continue generating cash is critically important as it increases its financial ability to compete with incumbent retailers such as Walmart. According to a research report by the ILSR [4], as of the end of 2016, Amazon had a market cap twice that of its biggest competitor – Walmart – even though it (Amazon) only reported around $1 billion in profits over a financial reporting period. Walmart generated about $80 billion around the same five year timeframe. Amazon plows everything back into growing its diverse businesses and aims to grow market share at the expense of quarterly profits.

Amazon’s market cap is worth more than all major brick and mortar retailers put together. (Credit – Equitykicker)

Amazon is the envy of every retailer out there with its mammoth online sales of almost $80 billion in 2016. Walmart and Apple are a distant #2 and #3 with online sales of $13.5 billion and $12 billion respectively [1].

Why has Amazon been and will continue to be successful?

I should confess that I am an Amazon fan going back several years. Please find a backgrounder on their business strategy written over two years ago.

Amazon declares results..and stuns!

Amazon has largely been successful for a few important reasons. The biggest is that, using technology, it has completely rewritten the rules of Retail across the key processes – Consumer Choice, Frictionless Shopping & Checkout, Ease of Payments, Fulfillment and Innovation.

Consider the following –

  1. Platforms and Ecosystems – Amazon has built platforms that serve a host of areas in retail – ranging from books to online video to groceries to video games to virtually any kind of e-commerce. Across all of these platforms, it has constantly invested in business strategies that offer a superior customer experience from choice to fulfillment – be it free shipping (Amazon Prime) or innovating on drone based delivery.
  2. Constant Platform Innovation – Amazon has given its customers the largest (and ever growing) catalog of products to choose from, instant 1-click ordering, rapid delivery via Amazon Prime, online marketplaces for sellers etc. By diversifying into areas like streaming video (via Amazon Prime Video), it is also turning into a content producer. With the launch of storefronts such as Amazon Go, it is striving to provide a seamless multichannel (digital and physical) experience so consumers can move effortlessly from one channel to another. For example, many shoppers use smartphones to reserve a product online and pick it up in a store.
  3. Always Push the Envelope on Advanced Technology – Digital product innovation implies an ability to create new products and services that meet changing customer demands. Such capability implies an ability to bring products to market faster, and then to refine such approaches based on customer feedback. Amazon supplements every platform it builds with ecosystems based on advanced technology. For instance, customers can now try any of the hundreds of voice control apps being built on Amazon Alexa to order products across Amazon’s huge catalog. In April 2017, Amazon launched Echo Look, a Digital Assistant which can perform various functions – order a ride via Uber, order pizza from Domino’s etc. But in addition to obeying commands and reading out the news, it also comes with a camera that can take pictures and perform advanced image recognition. Owners can even try on a few outfits, upload the pictures to the device, and the Style Check function will tell them which combination looks best. [5]
  4. Create a Data Driven Customer Experience – Using data in a way that improves the efficiencies in back end supply chains as well as creates micro opportunities to influence every customer interaction. Amazon leverages big data and advanced analytics to better understand customer behavior. For example, gaining insight into customers’ buying habits—with their consent, of course—can lead to an improved customer experience and increased sales through more effective bundling.
  5. Streamlined Operations – Finally, Amazon’s operations are the envy of the retail world. With Amazon Web Services (AWS), Amazon is the Public Cloud leader. Amazon has constantly proved its capabilities in automating operations and digitizing business processes using robotic process automation. This is important because it enables quicker shipping times to customers while cutting operating waste and costs. As an example, in an earnings call in 2015, their CFO touted Amazon’s use of robotics in its large warehouses to lower costs: “We’re using software and algorithms to make decisions rather than people, which we think is more efficient and scales better.”[3]

Again, according to the ILSR [4], “Today, half of all U.S. households are subscribed to the membership program Amazon Prime, half of all online shopping searches start directly on Amazon, and Amazon captures nearly one in every two dollars that Americans spend online. Amazon sells more books, toys, and by next year, apparel and consumer electronics than any retailer online or off, and is investing heavily in its grocery business.”

Retail Transformation Roadmap

Outlined below are the four prongs of a progressive strategy that Retailers can adopt to survive and thrive in today’s competitive marketplace. Needless to say, the theme across these strategies is Amazon-lite, i.e. leveraging Digital technologies to create an immersive & convenient cross channel customer experience.

How can Retailers transform themselves to better compete in the Digital Age

Step #1 Develop a (Customer Focused) Digital Strategy..

This is a two-pronged phase. Firstly, customers across age groups are using a variety of channels such as mobile phones, apps, in-store kiosks, tablets etc to purchase products. Secondly, despite all of the attractive features around convenient ordering, frictionless payments and ease of delivery, the primary factor driving purchase in certain channels is price. Thus, in defining the overall digital strategy it is key to identify the critical focus channels, the customer segments based on loyalty & other historical data, and their willingness to pay based on the channel and product mix. Once the strategy is defined, key metrics which drive top line growth need to be identified at the board level.

It is important to understand that brick and mortar sales will continue to lead for a long time. Thus, investing in highly efficient store layouts, customer traffic analysis, in-store mapping applications etc is highly called for. Brick and Mortar retailers have a significant ability to drive higher customer foot traffic based on their ability to deliver in-store pickup after online ordering etc. These are advantages that need to be leveraged.

There are a few key questions every Retailer needs to ask at the outset of this phase –

  • What customer focused business capabilities can be enabled across the organization?
  • What aspects of this transformation can best be enabled by digital technology? Where are the current organizational and technology gaps that inhibit innovation?
  • How do we measure ROI and Business success across these projects? How do we benchmark ourselves from a granular process standpoint against the leaders?

Step #2 Accelerate Investments in New Technology..

The need of the hour for legacy Retail IT is to implement flexible digital platforms that are built around efficient use of data, real time insights and predictive analytics. Leaders are driving platforms based not just on Big Data Analytics but are also adopting Deep Learning. Examples include Digital Assistants such as Chatbots, Mobile applications that can perform image recognition etc. What business capabilities can be driven from this? Tons.

The ability to offer quicker product tests and to modify them per customer feedback is fast becoming the norm. Depending on the segment of Retail you operate across (e.g. Apparel), the need is to make customer tryouts more convenient with the aid of both technology and humans – knowledgeable sales associates.

Further, an ability to mine customer, supplier and partner data implies the ability to offer customers relevant products, a wide ranging catalog of products, promotions & coupons and other complementary products such as store/private label credit cards.

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

Step #3 Use Data to Drive Operations

It is no longer sufficient to just perform brand studies that only understand historical customer transaction and behavior history. There is a strong degree of correlation between customer purchases and products recommended across their social networks – such as Facebook, Pinterest and Twitter etc.  Such advanced analytics are needed to drive product development, promotions and advertising.

Demystifying Digital – the importance of Customer Journey Mapping…(2/3)

The tremendous impact of AI (Artificial Intelligence) based approaches such as Deep Learning & Robotic Process Automation is beginning to be felt across early adopter industries like Banking.

Retailers playing catchup need to re-examine business and technology strategy across six critical prongs –

  1. Product Design
  2. Inventory Optimization
  3. Supply Chain Planning
  4. Transportation and Logistics
  5. IoT driven Store design
  6. Technology driven warehousing and order fulfillment

Step #4 Drive a Digital Customer Experience..

We have discussed the need to provide an immersive customer experience. Big Data Analytics drives business use cases in Digital in myriad ways – key examples include the following, with a small illustrative sketch after the list –

  1. Obtaining a realtime Single View of an entity (typically a customer across multiple channels, product silos & geographies) – this drives customer acquisition, cross-sell, pricing and promotion. 
  2. Customer Segmentation by helping retailers understand their customers down to the individual as well as at a segment level. This has applicability in marketing promotions and campaigns.
  3. Customer sentiment analysis by combining internal organizational data, clickstream data and social media sentiment with structured sales history to provide a clear view into consumer behavior.
  4. Product Recommendation engines which provide compelling personal product recommendations by mining realtime consumer sentiment and product affinity information along with historical data.
  5. Market Basket Analysis, observing consumer purchase history and enriching this data with social media, web activity, and community sentiment regarding past purchase and future buying trends.
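
As a small illustration of the last two use cases, the sketch below (toy data only) counts which products co-occur most often in purchase baskets – the simplest building block of both market basket analysis and cross-sell recommendations.

```python
# A toy market-basket sketch: count product pairs that are frequently bought together
# as a simple input to cross-sell recommendations. Baskets are illustrative only.
from collections import Counter
from itertools import combinations

baskets = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shoes", "belt"},
    {"shirt", "belt"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))   # most frequent co-purchases -> candidate recommendations
```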

Demystifying Digital – Why Customer 360 is the Foundational Digital Capability – ..(1/3)

Conclusion..

Retail as an industry continues to present interesting paradoxes in 2017. Traditional retailers continue to suffer with store closings while online entrants with their new approaches continue to thrive by taking market share away from the incumbents. The ability to adopt a Digital mindset and to offer technology platforms that enhance customer experiences will largely determine survival.

References..

[1] “Amazon and Walmart are the top e-commerce retailers” http://wwd.com/business-news/financial/amazon-walmart-top-ecommerce-retailers-10383750/

[2] “Four Reasons Why Amazon’s stock will keep doubling every three years ”  https://www.forbes.com/sites/petercohan/2017/04/28/four-reasons-amazon-stock-will-keep-doubling-every-three-years/#45a4b68923c8

[3] “Wal-Mart, others speed up deliveries to shoppers” http://www.chicagotribune.com/business/ct-faster-holiday-deliveries-20151016-story.html

[4] Report on Amazon by the Institute of Local Self Reliance (ILSR) – https://ilsr.org/wp-content/uploads/2016/11/ILSR_AmazonReport_final.pdf

[5] “How Amazon stays more agile than most startups” –https://www.forbes.com/sites/howardhyu/2017/05/02/how-amazon-stays-more-agile-than-most-startups/#76ab2b572103