Vamsi Chemitiganti's weekly musings on applying Big Data, Cloud, & Middleware technology to solving industry challenges. Published every Friday or Sunday (if I'm very busy). All opinions are entirely my own.I write this blog so you don't have to spend money on expensive consultants.
We discussed the emergence of Application Programming Interfaces (APIs) as providing a key business capability in Digital Platforms @ http://www.vamsitalkstech.com/?p=3834. The next post then discussed the foundational technology, integration & governance capabilities that any Enterprise API Platform must support @ http://www.vamsitalkstech.com/?p=5102. This final post in the API series will discuss a deployment model for an API Management Platform.
The first two posts in this series discussed the business background to API Management and the need for an Enterprise API Strategy. While details will vary across vendor platforms, the intention of this post is to discuss key runtime components of an API management platform and the overall developer workflow in creating APIs & runtime workflow to that enables client applications to access them.
Architectural Components of an API Management Platform..
The important runtime components of an API management platform are depicted in the below illustration. Note that we have abstracted out network components (firewalls, reverse proxies, VLANs, switches etc) as well as the internal details of application architecture which would normally be impacted by an API Platform.
Let us cover the core components of the above:
API Gateway -The API Gateway has emerged as the dominant deployment artifact in API Architectures. As the name suggests Gateways are based on a facade design pattern. The Gateway (or typically a set of highly available Gateways) acts as a proxy to traffic between client applications (used by customers, partners and employees) and back end services (ranging from mainframes to microservices). The Gateway is essentially an appliance or a software process that abstracts all API traffic into an organization and exposes business capabilities typically via a REST interface. Clients are exposed different views of the same API – coarse grained or granular – depending on the kind of client application (thick/thin) and access control permissions. Gateways include protocol translation and request routing as their core functionality. It is also not uncommon to deploy multiple Gateways – in an internal and external fashion – depending on business requirements in terms of partner interactions etc. Gateways also include functionality such as caching requests for performance, load balancing, authentication, serving static content etc. The API Gateway can thus be managed using a set of policy controls. Performance characteristics such as throughput, scalability, caching, load balancing and failover are managed using a cluster of API Gateways. The introduction of an API Gateway also ensures that application design is impacted going forward. API Gateways can be implemented in many forms – as a software platform or as an appliance. Public cloud providers have also begun offering mature API Gateways that integrate well with a range of backend services that they provide both from an IaaS and a PaaS standpoint. For instance, Amazon’s API Gateway integrates natively with AWS Lambda and EC2 Container Service for microservice deployments on AWS.
Security -Though it is not a standalone runtime artifact, Security ends to be called out as one of the most important logical requirements of an API Management platform. APIs have to follow the same access control mechanisms, security constrains for different user roles etc as their underlying datasources. This is key as backend applications and organizational data need to be protected from a variety of targets – denial of service attacks, malware, access control violations etc. Accordingly, policy based protection using API keys, JSON/XML signature scanning & threat protection, encryption for Data in motion and at rest, OAuth support etc – all need to be provided as standard features.
Management and Monitoring -Ensuring that the exposed APIs are maintaining their QOS (Quality of Service) as helping admins monitor their quota of resource consumption is key from a Operations standpoint. Further, the M&M functionality should also aid operators in resolving complex systems issues and ensuring a high degree of availability during upgrades etc.
Billing and Chargeback -Here we refer to the ability to tie in the usage of APIs to back office applications that can charge users based on their metered usage of the backend applications. This is typically provided through logging and auditing capability.
Governance -From a Governance standpoint, the ability to track APIs across their lifecycle, a handy catalog of available APIs, an ability to audit their usage and the underlying assets they expose and the ability for business to set policies on their usage etc.
API Design Process..
An internal developer uses the developer toolkit (e.g. Eclipse with an offline plugin) and/or an API Designer tool included with a vendor platform to create the API based on organizational policies. Extensive CLI (Command Line Interface) is also provided to perform all functions which can be done using the GUI. These include, local unit & system test capabilities and an ability to publish the tested APIs to a repository from where the runtime can access, deploy and update the APIs.
From a data standpoint, multiple databases including RDBMS, NoSQL are supported for data access. During the creation of the API, depending on whether the developer already has an existing data model in mind, the business logic is mapped closely with the data schema, or, one can also work top down to create the backend once the API interface has been defined using a model driven approach. These also include settings for security permissions with support for OAuth and any other third party authentication dependencies.
Once defined and tested, the API is published onto the runtime. During this process access control privileges, access policies and the endpoint itself are defined. The API is then ready for external consumption and discovery.
Runtime Flow Across the Architecture..
In the simplest case – once the API has been deployed and tested it is made available for public discovery and consumption. Client Applications then begin to leverage the API and this can be done in a variety of ways. For example – user interactions on mobile applications, webpages and B2B services trigger calls to the API Gateway. The Gateway performs a range of functions to process the request – from security authorization to load-balancing before accessing policies setup for that particular API. The Gateway then invokes the API by calling the backend system typically via message oriented middleware such as an ESB or a Message Broker. Once the backend responds with the appropriate payload ,the data is sent to the requesting application. Systems and Administration teams can view detailed operational metrics and logs to monitor API performance.
A Note on Security..
It should come as no surprise that the security aspect of an API Management Platform is one of the most critical aspects of the implementation. While API Security is a good subject for a followup post and too exhaustive to be covered in a short blurb – several standards such as OAuth2, OpenID Connect, JSON Security & Policy languages are all topics that need to be explored by both organizational developers and administrators. Extensive flow mapping and scenario testing are mandated here. Also, endpoint security from a client application standpoint is key. Your Servers, Desktops, Supported Mobile devices need to be updated and secured with the latest antivirus & other standard IT Security/access control policies.
In this post, we tried to highlight the major components of an API Management Platform from a technology standpoint. While there are a range of commercial & open source platforms, it is important to evaluate them from a feature standpoint as well as from an ecosystem capability perspective as developers began implementing microservices based Digital Architectures.
We discussed the emergence of Application Programming Interfaces (APIs) as a key business capability in Digital Platforms @ http://www.vamsitalkstech.com/?p=3834. We also saw how APIs can serve as a business interaction driven integration layer. APIs provide a layer that serves to connect backend business services across Digital applications across multiple channels. In this second post we will discuss the foundational business, technology, integration & governance capabilities that any Enterprise API Platform must support. The next and final post will discuss an API centric deployment architecture for a medium to large enterprise.
What is your API vision?
The first post in this series (http://www.vamsitalkstech.com/?p=3834) covered the need for industry players to treat APIs as a way of reinventing the many aspects of their business and their consumers. From a high level standpoint, this can be done in one of three ways –
Inculcating Digital Innovation both inside & inside out. Extending the boundaries of a large global or national enterprise or outside in, by enabling partners to build innovative applications.
Exposing Data Assets and combining them with advanced analytics to enable customers to consume enterprise business services across the globe.
Taking a Platform first approach to building new applications and enabling API nativity in such greenfield development.
Not every Borders Bookstores like company can turn into an Amazon but the ability to create new lines of revenue implies closer integration with business partners. The creation of APIs enables this integration as we saw in the previous post but it is really the treatment of APIs as an enterprise enabler that ensure the scalability of innovation. Hence the need for an enterprise API strategy which senior executives need to be able to devise based on both from a tactical standpoint as well as keeping the strategic vision in mind.
As with all things in digital technology, API Management is founded on strong business use cases. So let us begin by examining a smattering of these.
Industrial Use Cases for API Management Platforms..
Let us first discuss the major business use cases for APIs in a business enterprise.
The simplest use case for any API implementation is to provide Information Retrieval. This includes such use cases for a Free API (which typically accesses non private information) to a Paid API (which securely accesses business sensitive data stored in Book of Record Transaction (BORT) systems). E.g. Patient Medical Records, Supply Chain data, Bank Customer Account Information, Insurance Policies etc.
Other complementary use cases include supporting exposing functionality in Internal applications (that typically perform Document & File Management) across a range of business scenarios – typically via a Private API
Across Partner & Supplier Applications, support the invocation of business logic that typically performs a business process, using an internal or trusted partner API.
Support for Mobile applications and web front ends for applications ranging from field employee enablement to online payments etc using consumer facing public APIs
The most complex use case is support for Data Monetization using advanced analytics. The last post discussed how APIs need to help monetize business assets, this implies an ability to provide complex analytic support for select APIs that extend brands by connecting to a range of backend sources.
The technology and platform requirements for an API strategy will cascade from these use cases – all of which should fairly resonate across several industry verticals.
Business Requirements for API Strategy..
The goal of an enterprise API strategy should be to support the creation of a centralized API platform which appeals to various audiences – Customers, Internal & External Developers, Lines of business and Operations teams.
There are ten distinct business challenges that an enterprise API strategy needs to account for.
First and foremost, an API strategy needs to support the ability of existing business systems to expose their business assets for consumption in Digital scenarios. This implies not just supporting a cloud native/micro-service model of application development but also a range of legacy systems such as RDBMS’s, ERP, CRM systems etc. The ability to front these systems with RESTful APIs, at a minimum, will ensure that these can participate in a digital business process without a lot of upfront rewriting. Adapters that can provide deep integration with these sources that allow for efficient API performance using techniques such as query optimization, pagination, support for business policies etc. The API platform also needs to support easy ways of composing APIs and orchestrating them across backend applications which are not always cloud native. The capability of API Composition where backend APIs are orchestrated to perform a higher business function is highly desirable.
The API Management Platform needs to support a High Performance Architecture capable of supporting high volumes of client applications – at a high end potentially millions of API calls per minute.
The Platform needs to provide five nine’s of Infrastructure and Application reliability. Lost API messages mean missed revenue – it is as simple as that. Thus, API’s need to be highly available and support a high degree of redundancy.
APIs increase the attack surface of an enterprise. Accordingly, the strategy needs to account for the provision of bulletproof Security against a range of threat vectors – malicious API client applications, Malware, Denial of Service (DOS) attacks etc. Also ensuring strong Identity Management capabilities for client applications across complex backend services
The ability to Monitor the APIs for performance, throttling etc to guarantee SLA (Support Level Agreements). It is also important to provide the ability to generate granular business & IT reporting on API usage across a range of metrics etc.
As discussed in the last blog @ http://www.vamsitalkstech.com/?p=3834, an API ecosystem provides support for multiple players – customers, partners, employees etc. Accordingly needs to support multiple versions of underlying APIs that expose different views of business assets. This is key so that consumers can obtain value around the capabilities that are aligned with their interests.
An ability to support Data Monetization via Rich Analytics than has been possible before that provide a great degree of context. This ability to reason around context is what provides an ability to design new business models which cannot be currently imagined due to lack of agility in the data and analytics space.This integration helps these systems leverage the digital intelligence and insights across (potentially) millions of devices across complex areas of operation.
Application developers access to APIs with a view to including them in their business applications. Accordingly, an API Management strategy should provide strong capabilities for Developers via a Portal. The Portal helps them right from on-boarding, help around exploring organization backend capabilities, API documentation, Quickstart Guides, Online videos, API Testing capabilities, API version history, search & discovery tools for API discovery etc.It should be noted that multiple developer portal views must be supported – both for internal and external communities of developers. Internal developers will want to do a range of tasks that create support lines of business, business automation tasks, supporting workforce related IT access applications etc. They will create, package and upload APIs to the portal. External API developers range from Partners to Customer communities. They typically access these APIs, subscribe to them and run a range of dev-test tasks using the Portal.
Supporting Governance across potentially hundreds of API definitions. The topic of Governance is the most critical area and tools need to help right from the definition of business case to assigning actors (who may already be defined in business directories) to managing deployment schedules to change management etc. Business policies need to be supported to enable business and IT stakeholders to retire APIs.
Finally, an API strategy cannot be divorced from the Industry Vertical that the enterprise operates in. This implies that starter set APIs, templates, SDKs etc be provided as modules for verticals like Financial Services, Insurance, Telecom, Healthcare, Manufacturing and Connected Cars etc.
APIs are a product line and should be treated as such which implies an ability to manage them across their lifecycle. Developers create API client applications, the corporation makes these API definitions available for communities of developers consume in their applications. Sys admins secure, deploy & manage these APIs.
The end goal of an API strategy is to ensure that the process of creating, securing, orchestrating & monitoring these API interfaces is intuitive, consistent and scalable across a large organization. We will round off this three part series on APIs by define a technical deployment architecture in the next & final post.
Financial Risk Management as it pertains to different industries – Banking, Capital Markets and Insurance – has been one of the most discussed topics in this blog. The business issues and technology architecture of systems dedicated to aggregating, measuring & visualizing Risk are probably one of the more complex tasks in the worlds of finance & insurance. This post summarizes ten key blogs on the topic of Financial Risk published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing Risk projects.
The twin effects of the global financial crisis & the FinTech boom has caused Financial Services, Insurance and allied companies to become laser focused on on risk management. What was once a concern primarily of senior executives in the financial services sector has now become a top-management priority in nearly every industry.
Whatever be the kind of Risk, certain themes are common from a regulatory intention standpoint-
Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
Requiring that banks increase the amount of and quality of capital held on reserve to back their assets and by requiring higher liquidity positions
Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
Tackle the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.
With this background in mind, complete list of Risk use case blogs on VamsiTalksTech is included below.
# 1 – Why Banks and Other Financial Institutions Should Digitize Risk Management –
Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC) ; Offensive as in revenue producing areas of the business like Customer 360 (whether Institutional or Retail), Digital Marketing, Mobile Payments, Omni channel Wealth Management etc. If one really thinks about it – the biggest activity that banks do is manipulate and deal in information whether customer or transaction or general ledger etc.
# 2 – Case Study of a Big Data Enabled IT Architecture for Risk Data Measurement – Volcker Rule/Dodd Frank –
While industry analysts can discuss the implications of a certain Risk mandate, it is certainly most help for Business & IT audiences to find CIOs discussing overall strategy & specific technology tools. This blogpost discusses how two co-CIOs charged with an enterprise technology mandate are focused on growing and improving a Global Banking leaders internal systems, platforms and applications especially from a Risk standpoint.
# 4 – Capital Markets – Architectural Approaches to the practice of Risk Management –
In Capital Markets, large infrastructures ,on a typical day, process millions of derivative trades. The main implication is that there are a large number of data inserts and updates to handle. Once the data is loaded into the infrastructure there needs to be complex mathematical calculations that need to be done in near real time to calculate intraday positions. Most banks use techniques like Monte Carlo modeling and other computational simulations to build & calculate these exposures. Hitherto, these techniques were extremely expensive from both the cost of hardware and software needed to run them. Neither were tools & projects available that supported a wide variety of data processing paradigms – batch, interactive, realtime and streaming. This post examines a detailed reference architecture applicable to areas such as Market, Credit & Liquidity Risk Measurement.
This blogpost discusses the data management and governance implications of BCBS 239. The BCBS 239 provides guidelines to overhaul an organization’s risk data aggregation capabilities and internal risk reporting practices.
Industry clearly requires a fresh way of thinking about Risk Management. Leader firms will approach Risk as a way to create customer value and a board level conversation around such themes rather than as a purely defensive and regulatory challenge. Surely, this will mean that budgets for innovation related spending in areas such as Digital Transformation will also slowly percolate over to Risk. As firms either digitize or deal with gradually eroding market share, business systems that work with and leverage risk will emerge as a strong enterprise capability over the upcoming 3-5 year horizon.
As enterprises embark or continue on their Digital Journey, APIs are starting to emerge as a key business capability and one that we need to discuss. Regular readers of this blog will remember that APIs are one of the common threads across the range of architectures we have discussed in Banking, Insurance and IoT et al. In this blogpost, we will discuss the five key imperatives or business drivers for enterprises embarking on a centralized API Strategy.
API Management as a Native Digital Capability..
The use of application programming interfaces (APIs) has been well documented across web scale companies such as Facebook, Amazon and Google et al.Over the last decade, APIs have begun emerging as the primary means for B2B and B2C companies to interact with their customers, partners and employees. The leader enterprises already have Digital Platform efforts underway as opposed to creating standalone Digital applications. Digital Platforms aim to increase the number of product and client channels of interaction so that enterprises can reach customer audiences that were hitherto untapped. The primary mode of interaction with a variety of target audiences in such digital settings are via APIs.
APIs are widely interoperable, relatively easy to create and form the front end of many internet scale platforms. APIs are leveraged to essentially access the core services provided by these platforms and can be used to create partner and customer ecosystems. Leaders such as PayPal, Amazon & FinTechs such as Square, Mint etc have overwhelmingly used APIs as a way to not only open their platforms to millions of developers but also to offer innovative services.
As of 2015, programmableweb.com estimated that over 12,000 APIs were already being offered by enterprise firms. Leaders such as Salesforce.com were generating about 50% of their revenue through APIs. Salesforce.com created a thriving marketplace – AppExchange – for apps created by its partners that work on its platform which numbered around 300 at the time of writing. APIs were contributing 60% of revenues at eBay and a staggering 90% for Expedia.com. eBay uses APIs to create additional exposure for it’s products – list auctions on other websites, get bidder information about sold items, collect feedback on transactions, and list new items for sale. Expedia’s APIs allowed customers to use third party websites to book flights, cars, and hotels. 
While most of the Fortune 500 have already begun experimenting with the value that APIs can deliver, the conversation around these capabilities needs to be elevated from an IT level to a line of business to a CIO/Head of Marketing. APIs help generate significant revenue upside while enabling rapid experimentation in business projects. Examples of API usage abound in industries like Financial Services, Telecom, Retail and Healthcare.
The Main Kinds of APIs
While the categories of APIs will vary across industry, some types of APIs have been widely accepted. The three most popular from a high level are described below –
Private APIs – These are APIs defined for use by employees and internal systems within an organization or across a global company. By their very nature, they’re created for sensitive internal functions and have access to privileged functions that external actors cannot perform.
Customer APIs – Customer APIs are provided as a way of enabling used by global customers to conduct business using product/service distribution channels – examples include product orders, view catalogs etc. These carry a very limited set of privileges limited to customer facing actions in a B2C context.
Partner APIs – Partner APIs are used for varying levels of business to be able to perform business functions in the context of a B2B relationship. Examples include Affiliate programs in Retail, inventory management, Supply Orders in Manufacturing & Billing functions in Financial Services etc.The API provider hosts marketplaces that enable partner developers to create software that leverages these APIs.
The Five Business Drivers for an Enterprise API Strategy..
The question for enterprise executives then becomes, when do they begin to invest in a central API Management Platform? Is such a decision based on the API sprawl in the organization or the sheer number of APIs being manually managed etc?
While the exact trigger point may vary for every enterprise, Let us consider the five key value drivers..
Driver #1 APIs enable Digital Platforms to evolve into ecosystems
In my mind, the first and most important reason to move to a centralized API strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as a SaaS (Software as a Service). The end state of Digital Platforms are to operate business systems at massive scale in terms of customers, partners and employees.
The two central ideas at the heart of a platform based approach are as follows –
Create new customer revenue streams by reaching out to new customer segments across the globe or in new (and non traditional) markets. Examples of these platforms abound in the business world. In financial services, Banks & Credit reporting agencies are able to monetize their assets of years of customer & product data by reselling them to interested third parties which use them either for new product creation or to offer services that simplify a pressing industry issue – Customer Onboarding.
Reduce cost in current business models by extending core processes to business partners and also by automating manual communication steps (which are almost always higher cost and inefficient). For instance, Amazon has built their retail business using partner APIs to extend retailing provisioning, entitlement, enablement and order fulfillment processes.
Driver #2 Impact the Customer experience
We have seen how mobile systems are a key source of customer engagement. Offering the customer a seamless experience while they transact with an organization is a key way of disarming competition. Accordingly, Digital projects emphasize the importance of capabilities such as Customer Journey Mapping (CJM) and Single View of Customer (SVC) as the minimum table stakes that they need to provide. For instance, in Retail Banking, players are feeling the pressure to move beyond the traditional transactional banking model to a true customer centric model by offering value added services on the customer data that they already possess. APIs are leveraged across such projects to enrich the views of the customer (typically with data from external systems) as well as to expose these views to customers themselves, business partners and employees.
This one is all too familiar to anyone working in technology. We have seen how both Cloud Computing & DevOps are the foundation of agile technology implementations across a range of back end resources. These include but are not limited to Compute, NAS/SAN storage, Big Data, Application platforms, and other middleware. Extending that idea, Cloud (IaaS/PaaS) is a set of APIs.
APIs are used to abstract out the internals of these underlying platform services. Application Developers and other infrastructure services use well defined APIs to interact with the platforms. These APIs enable the provisioning, deployment and management of platform services.
APIs have become the de facto model that provide developers and administrators with the ability to assemble Digital applications such as microservices using complicated componentry. Thus, there is a strong case to be made for adopting an API centric strategy when evolving to a Software Defined Datacenter.
A huge trend on the developer side has been the evolution of Continuous build, integration and deployment processes. The integration of APIs into the DevOps process has begun with use cases ranging from using publicly available APIs being used to trigger CI jobs to running CI/CD jobs using a cloud based provider.
Driver #4 APIs enable Business & Product Line Experimentation
APIs thus enable companies to constantly churn out innovative offerings while still continuously adapting & learning from customer feedback. Internet scale companies such as Facebook provide edge APIs that enable thousands of companies to write applications that driver greater customer volumes to the Facebook platform. The term API Economy is increasingly in vogue and it connotes a loosely federated ecosystem of companies, consumers, business models and channels
The API economy is a set of business models and channels — based on secure access of functionality and the exchange of data to an ecosystem of developers and the users of the app constructs they build — through an API, either within a company or via the internet, with business partners and customers.
Driver #5 Increasingly, APIs are needed to comply with Regulatory Mandates
We have already seen how key industries such as Banking and Financial Services, regulatory authorities are at the forefront of forcing incumbents to support Open APIs. APIs thus become a mechanism for increasing competition to benefit consumer choice. The Regulators are changing the rules of participation in the banking & payments industry and APIs are a key enabling factor in this transformation.
Financial Services, Healthcare, Telecom and Retail.. a case in point for why APIs present an Enormous Opportunity for the Fortune 500..
Banking – At various times, we have highlighted various business & innovation issues with Banking providers in the areas of Retail Banking, Payment Providers and Capital Markets. Regimes such as Payment Systems Directive (PSD2) in the EU will compel staid industry players to innovate faster than they otherwise would. FinTechs across the industry offer APIs to enable third party services to use their offerings.
Healthcare – there is broad support in the industry for Open APIs to drive improved patient care & highly efficient billing processes as well as to ensure realtime engagement across stakeholders.
In the Telecom industry, nearly every large operator has developed APIs which are offered to customers and the developer community. Companies such as AT&T and Telefonica are using their anonymized access to hundreds of millions of subscribers to grant large global brands access to nonsensitive customer data. Federated platforms such as the GSM Association’s oneAPI are already promoting the usage of industry APIs.
Retailers are building new business models based on functionality such as Product Catalogs, Product Search, Online Customer Orders, Inventory Management and Advanced Analytics (such as Recommendation Engines). APIs enable retailers to expand their footprints beyond the brick and mortar store & an online presence.
Ranking Your API Maturity..
Is there a maturity model for APIs? We can try listing those into three different strategic options for Banks. Readers can extrapolate these into for their specific industry segment.
Minimally Compliant Enterprises – Here we should categorize Companies that seek to provide compliance with a minimal Open API. Taking the example of Banking, while this may be the starting point for several organizations, staying too long in this segment will mean gradual market share erosion as well as a loss of customer lifetime value (CLV) over time. The reason for this is that FinTechs and other startups will offer a range of services such as Instant mortgages, personal financial management tools, paperless approval processes for a range of consumer accounts etc. It is also anticipated that such organizations will treat their API strategy as a localized effort and will allocate personnel to the project mainly around the front office and marketing.
Digital Starters -Players that have begun exploring opening up customer data but are looking to support the core Open API but also introduce their own proprietary APIs. While this approach may work in the short to medium term, it will only impose integration headaches on the banks as time goes on.
Digital Innovators – The Digital Innovators will lead the way in adopting APIs. These companies will fund dedicated teams in lines of business serving their particular customer segments either organically or using partnerships with third party service providers. They will not only adhere to the industry standard APIs but also extend these specs to create own services with a focus on data monetization.
Increasingly, a company’s APIs represent a business development tool and a new go-to-market channel that can generate substantial revenues from referrals and usage fees. Given the strategic importance and revenue potential of this resource, the C-suite must integrate APIs into its corporate decision making.
The next post will take a technical look into the core (desired) features of an API Management Platform.
“Data matures like wine, applications like fish.” – James Governor, Principal Analyst & Founder of RedMonk, circa 2007
I would like to begin a series of posts on Data Science jointly authored with my friend, ex-colleague, & collaborator, Maleeha Qazi – Data Scientist (https://www.linkedin.com/in/maleehaqazi/). In these posts, we are intending to bring to light several technology themes around industrial use of Data Science and Deep Learning around Industrial Applications, Big Data , Cyber Security, Cognitive Applications, Business Process Management, and Cloud Computing. Our goal for this first post is to discuss typical issues that bedevil every Data Science initiative at the beginning. Namely, the top technical and cultural concerns to communicate to the IT Department every time a new project is begun.
With Data Science emerging as a key enabler in Digital Customer focused Applications, renewed focus is being placed on how the lifecycle of these new fangled applications happens alongside traditional IT development. This blogpost aims to highlight some of the key concerns involved when Data Science groups work with IT departments. Currently there is no “one size fits all model” in terms of how advanced models are developed and deployed so that they can be accessed and used at scale by customers.It is our wager that almost every large enterprise working on these projects encounters these issues. We wanted to share our experience with the enterprise community over a series of blog posts.
It is clear that Data Science teams, product teams and IT need to collaborate to create business applications that learn from customer needs.
So what are the top asks that Data Science has for their IT groups? There are at least nine important focus areas:
#1 Understanding of the business challenge and agreeing on a common vocabulary
It is a generally accepted fact that most IT/Data Science interactions are focused on the technology portion which include some of the following elements : the data sources within the organization, acquisition and access to external data sources, the availability of tools & infrastructure to begin supporting the data science development process, cloud or on-prem, data ingestion engines (e.g. Kafka, Flume, Sqoop) to ingest and process the data, etc. While this is certainly part of the process, there has begun to be a distinct anti pattern in how this interaction is working when solely driven by technology alone. The Data Science team is involved in creating models that typically reflect customer needs that drive business value for an organization’s customers, partners, regulators & employees. In that rather important context, technology at it’s core is just an engine and does not exist in a vacuum. The most vibrant enterprises understand this ground reality and always ensure that business needs drive both Data Scientists & IT and not the other way around. It is thus highly important for both the Data Science team and IT team to agree on the business challenge at hand to ensure that their interactions (long and short term) are being driven with business & competitive outcomes in mind. Examples of such goals are a common organization wide business language (so that definitions agree semantically) across products, customers, logistics, supply chains & business domains. The shared emphasis on both teams should be on overall goals such as increased customer profitability, enhanced customer segmentation, customer service productivity, etc. Setting this tone upfront will not only ensure that outcomes for both teams are aligned but will also ensure that critical gaps in knowledge and capabilities are filled. One of the approaches that is working well is increased cross pollination across both teams, collapsing artificial organizational barriers by adopting DevOps & ensuring that Data Science teams have a “slim IT” presence (e.g. an embedded data engineer and datacenter person) to rapidly be able to fill in gaps in IT’s business knowledge or capability.
#2 IT needs to help Data Scientists acquire a deep understanding of the overall Data Architecture
Once business requirements have been identified, Data Scientists get right to work in understanding the different data sources that will comprise inputs to their models. In large enterprises, it is not inconceivable to find out that there are many varied data sources from which data needs to be sourced. For instance, in Banking there are a range of Book of Record Transaction (BORT) systems from which data needs to be extracted. It is also key to supplement this data with external data sets. Models are only as good as the data they are given to work with. Garbage In, Garbage Out (GIGO) is the moniker given to bad data that ensures that models perform poorly. A lot of times, business groups have a hard time explaining what they would like to see – both in terms of data and visualization. In such cases, a prototype makes things easier from a requirements gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) which are needed for the execution of the business challenge. They spend a lot of time in the process of collating the data (from Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup/data-wrangling process includes fixing and standardizing missing value representations, identifying potentially corrupted data elements, formatting fields that indicate time and date in a consistent manner, etc.
#3 Infrastructure & IT Self Service Across Environments, Platforms and Tools
This one is huge. The traditional IT model of hardware acquisition and vetting is typically drawn out as a process. Even with public cloud, onerous security controls are sometime added to infrastructure which delay the Data Science team’s ability to develop their models in an agile manner. The dreaded term Shadow IT (where business & data science teams go around the IT team to procure compute and storage on the public cloud) is not just an issue with infrastructure software but is slowly creeping up to business intelligence and advanced analytics apps. The delays associated with provisioning legacy data silos combined with using tools that are neither intuitive nor able to scale to deal with the increasing data deluge are making timely business analysis almost impossible to perform. Insights delivered too late are not very valuable. Data Scientists dearly desire that the environments that they need for development and testing are made available as soon as possible and ideally via a self service user interface. This calls for IT investments in Cloud computing platforms that enable agility and speedy provisioning of dev/test environments across compute, network and storage.
#4 – Collaboration with IT around the DS development lifecycle
Organizations typically have well established development methodologies and processes. Currently most data science development and traditional application development happen in two distinct tracks. Software development typically follows a Agile/DevOps process (a combination of Scrum/XP). The development lifecycle is divided into several stages with each producing a working deliverable at the end. The deliverables are incrementally updated to arrive at an acceptable product at the end which is then deployed for customer use. In this model, team members typically follow a defined role.
The Data Science development cycle is different. Data scientists/modelers are given a certain business problem to solve. They proceed to find the appropriate data they need, pull it into Hadoop or a Data Warehouse, wrangle it, try various algorithms to create the best possible models, test the models, and ensure that they perform well for the problem at hand. If they get more data during the process, they will go back and retest the whole process. The issue is that IT needs to partner with and collaborate with the Data Science team to first strategize and then help provision different environments (dev, test, prod) to enable data scientists to do iterative model development. They then need to help the Data Science team deploy these models in the appropriate deployment architecture.
#5 – Help Improve the Data Science User Experience
Using traditional app dev methodologies, it can take months to design, test and deploy software – which is simply unsustainable. One of chief goals of the DevOps model is to close the long-standing gap between the engineers who develop and test IT capability and business requirements for such capabilities. Accordingly the data science teams need best practice recommendations on using IDEs that support iterative model development & debugging. It is important that these development tools support programming languages such as R and Python – the most common go-to languages for data science – to rapidly develop code. It is critical that the IT group partner with the Data Scientists to enable these capabilities both from a development and a deployment standpoint.
#6 – Model Deployment
The data wrangling phase involves writing code to be able to join various data sets so that a single complete dataset can be created from a raw features standpoint. If more data is obtained as the development cycle is underway, the Data Science team has no option but to go back and redo the whole process. Once the raw features are gathered, feature engineering can begin to create predictive features from the raw data, taking into account business concepts. The modeling phase is where the choice of algorithms comes into play. A Data Scientist takes the raw & engineered features and creates models using the most appropriate algorithms for the task. After the models have been repeatedly tested for accuracy and performance, the best one is typically deployed for use. Once the models have been developed it is critical to ensure that these can be deployed rapidly, run automatically, and changed as per business requirements and performance. How and where these models will get deployed depends on the business case, ideally they should be deployed as a service. Models as a Service (MaaS) is the Data Science counterpart to Software as a Service. The MaaS takes in business variables (often hundreds of inputs) and provides as output business decisions/intelligence, measurements, and visualizations that augment decision support systems. IT help is needed to ensure that the models can scale as customer usage of these Digital Platforms increases.
#7 Model Governance and Management
There needs to be appropriate checks put into place to allow for the monitoring and maintenance of the models once in production. Model versioning must be handled so that customers aren’t affected during a maintenance cycle – old models must still function while the new ones are being put into place. And by keeping a check on the performance of models in production, the IT team can tell when a model stops performing optimally & to call on the Data Science team to check on why.
#8 Security and Compliance
How are security constraints around different environments managed? Though IT maintains control over the vast domain of tools and environments in any organization, the Data Science team must maintain control of the models. Any random person updating the models could lead to performance degradations. This separation of concerns is akin to DB security over schemas/tables/columns – only certain individuals should be granted access to perform certain operations for the most optimal results.
#9 – Delivering Results to Business Users –
Once the model has been deployed the results need to be made available to business users. Depending on the application, model results might need to be served up in near-real-time, every day/week/month/year, ad-hoc on demand, or any other time frame in-between. Organizations need to deal with providing appropriate tools (e.g. apps, sandboxes, etc.) to enable end users to explore the results of the analysis, and to perform intelligent visualization of the data. Visualizations include trend analysis over time, KPIs, list of interesting customers/accounts, etc.
Digital applications will continue to incorporate Data Science at an increasing scale. However, traditional IT Departments need to collaborate in the above specific areas to ensure that the algorithms developed for specific business issues are effective, forward looking and scalable.
“When models turn on, brains turn off.” – Dr. Til Schuermann, Formerly Research Officer in the Banking Studies function at the Federal Reserve Bank of New York.Currently Partner at Oliver Wyman & Company.
There exist two primary reasons for Enterprises such as Banks, Insurers, Payment Providers and FinTechs to pursue best in class Risk Management Processes and Platforms. The first need is compliance driven by various regulatory reporting mandates such as the Basel Reporting Requirements, the FRTB, the Dodd‐Frank Act, Solvency II, CCAR and CAT/MiFiD II in the United States & the EU. The second reason is the need to drive top-line sales growth leveraging using Digital technology. This post advocates the implementation of Digital Technology on Risk Management across both the areas.
Recapping the Goals of Regulatory Reform..
There are many kinds of Risk, ranging from the three keystone kinds – Credit, Market and Operational to the Basel-II.5/III accords, FRTB, Dodd Frank etc. The best enterprises not only manage Risk well but they also turn it into a source of competitive advantage. Leading banks have recognized this and according to McKinsey forecasts, while risk-operational processes such as credit administration today account for the majority of the some (50 percent) of the Risk function’s staff, and analytics just 15 percent, by 2025 those figures will be around 25 percent and 40 percent respectively. 
Whatever be the kind of Risk, certain themes are common from a regulatory intention standpoint-
Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
Requiring that banks increase the amount of and quality of capital held on reserve to back their assets and by requiring higher liquidity positions
Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
Tackle the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.
Beyond the standard models used for Risk regulatory reporting, Banks & FinTechs are pushing the uses of risk modeling to new areas such as retail lending, SME lending. Since the crisis of 2008, new entrants have begun offering alternatives to traditional financial services in areas such as payments, mortgage loans, cryptocurrency, crowdfunding, alternative lending, and Investment management. The innovative use of Risk analytics lies at the core of the FinTechs success.
Across these areas, risk models are being leveraged in diverse areas such as marketing analytics to gain customers, defend against competition etc. For instance, realtime analytic tools are also being used to improve the credit granting processes. The intention is to gain increased acceptance by pre-approving qualified customers quickly without the manual intervention that can cause weeks of delays. Again, according to McKinsey, the goals of leading Banks are to approve up to 90 percent of consumer loans in seconds, generate efficiencies of 50 percent leading to revenue increases of 5 to 10 percent. Thus, leading institutions are using Risk Analytics to rethink their business models and to expand their product portfolios. 
Over the last two years, this blog has extensively covered areas such as cyber security, fraud detection, anti money laundering (AML) etc from a data analytics standpoint. The industry has treated Risk as yet another defensive function but over the next 10 years, it is expected that the Risk function will be an integral part of all of these above areas thus driving business revenue growth & detecting financial fraud, crimes. There is no doubt that Risk is a true cross cutting concern across a range of business functions & not just the traditional Credit, Market, Liquidity and Operational silos. Risk strategy needs to be a priority at the highest levels of an organization.
The Challenges with Current Industry Risk Architectures..
Almost an year ago, we discussed these technology issues in the below blogpost. To recap – most industry players have a mishmash of organically developed & shrink wrapped IT systems. These platforms run critical Core Banking Applications to Trade Lifecycle to Securities Settlement to Financial Reporting etc. Each of these systems operates in an application, workflow, data silo with it’s own view of the enterprise. These are all kept in sync largely via data replication & stove piped process integration. Further siloed risk functions ensure that different risk reporting applications are developed using duplicative technology paradigms causing massive IT spend. Further, the preponderance of complex vendor supplied systems ensures lengthy release cycles and complex data center deployment requirements.
A Roadmap for Digitization of Risk Architectures..
The end state or how a Digital Risk function will look like will vary for every institution embarking on this journey. There are six foundational elements we can still point out a few guideposts based on the .
#1 Automate Back & Mid Office Processes Across Risk and Compliance –
As discussed, Many business processes across the front, mid and back office involve risk management. These processes range from risk data aggregation, customer on boarding, loan approvals, regulatory compliance (AML,KYC, CRS & FATCA), enterprise financial reporting& Cyber Security.It is critical to move all and any manual steps from these business functions to a highly automated model. Doing so will not only reduce operational costs in a huge way but also demonstrate substantial auditability capabilities to regulatory authorities.
#2 Design Risk Architectures to handle Real time Data Feeds –
A critical component of Digital Risk is the need to incorporate real time data feeds across Risk applications. While Risk algorithms have traditionally dealt with historical data, new regulations such as FRTB explicitly call for various time horizons. These imply that Banksto run a full spectrum of analytics across many buckets on data seeded from real time interactions. While the focus has been on the overall quality and auditability of data, the real time requirement is critical as one moves from front office applications such as customer on boarding, loan qualifications & pre-approvals tokey areas such as market, credit and liquidity risks. Why is this critical? We have discussed the need for real time decision making insights for business leaders. Understanding risk exposures and performing root cause analysis in real time is a huge business capability for any Digital Enterprise.
#3 Experiment with Advanced Analytics and Machine Learning–
In response to real time risk reporting, the analytics themselves will be begin to get considerably more complex. This technology complexity will only be made more difficult with multiple teams working on all of these areas. This calls out for standardization of the calculations themselves across the firm. This also implies that from an analytics standpoint, a large number of scenarios on a large volume of data.For Risk to become truly a digital practice, the innovative uses of Data Science across areas such as customer segmentation, fraud detection, social graph analysis must all make their way into risk management. Insurance companies and Banks are already deploying self learning algorithms in applications that deal with credit underwriting, employee surveillance and fraud detection. Wealth Managers are deploying these in automated investment advisory.Thus, machine learning will support critical risk influenced areas such as Loan Underwriting, Credit Analytics, Single view of risk etc. All of these areas will need to leverage predictive modeling leading to better business decisions across the board.
#4 Technology Led Cross Organization Collaboration –
McKinsey predicts  that in the coming five to ten years, different regulatory ratios such as capital, funding, leverage, total loss-absorbing capacity etc will drivethe composition of the balance sheet to support profitability. Thus the risk function will work with finance and strategy functions to help optimize the enterprise balance sheet across various economic scenarios and then provide executives with strategic choices (e.g. increase or shrink a loan portfolio, for example), and likely regulatory impacts across these scenarios. Leveraging analytical optimization tools, an improvement on return on equity (ROE) by anywhere between 50 and 400 basis points has been forecasted.
The Value Drivers in Digitization of Risk Architectures..
McKinsey contends that the automation of credit processes and the digitization of the key steps in the credit value chain can yield cost savings of up to 50 percent. The benefits of digitizing credit risk go well beyond even these improvements. Digitization can also protect bank revenue, potentially reducing leakage by 5 to 10 percent. 
To give an example, by putting in place real-time credit decision making in the front line, banks reduce the risk of losing creditworthy clients to competitors as a result of slow approval processes. Additionally, banks can generate credit leads by integrating into their suite of products new digital offerings from third parties and Fintech’s, such as unsecured lending platforms for business. Finally, credit risk costs can be further reduced through the integration of new data sources and the application of advanced-analytics techniques. These improvements generate richer insights for better risk decisions and ensure more effective and forward-looking credit risk monitoring. The use of machine-learning techniques, for example, can help banks improve the predictability of credit early-warning systems by up to 25 percent .
The Questions to Ask at the Start of Risk Transformation..
There are three questions at this phase every Enterprise needs to ask at the outset –
What customer focused business capabilities can be enabled across the organization by incorporating an understanding of the various kinds of Risk ?
What aspects of this Risk transformation can be enabled by digital technology? Where are the current organizational and technology gaps that inhibit innovation?
How do we measure ROI and Business success across these projects before and after the introduction of ? How do we benchmark ourselves from a granular process standpoint against the leaders?
As the above makes it clear, traditional legacy based approaches to risk data management reporting do not lend themselves well to managing your business effectively. When things are going well it has become very difficult for executives and regulators to get a good handle on how the business is functioning. In the worst of times, the risk function can fail to function well as models do not perform effectively. It is not enough to take an incremental approach to improving current analytics approaches. The need of the hour is to incorporate the state of the art data management and analytic approaches based on Big Data, Machine Learning and Artificial Intelligence.
The second and previous blog in this six part series (@ http://www.vamsitalkstech.com/?p=4670) discussed technical challenges with running large scale Digital Applications on traditional datacenter architectures. In this third blog, we will deep dive into another important ecosystem platform – Apache Mesos, a project that aims to abstract away various system resources – CPU, memory, network and disk resources to provide consuming digital applications with a giant cluster from which they can utilize capacity – a key requirement of the Software Defined Datacenter (SDDC). The next blogpost will deep dive into Linux Containers & Docker.
Introduction and the need for Apache Mesos..
This blog has from time to time discussed how Digital applications are a diverse blend of several different and broad technology paradigms – Big Data, Intelligent Middleware, Messaging, Business Process Management, Data Science et al.
To that end almost every Enterprise Datacenter supporting Digital workloads typically has clusters of multi-varied applications installed. Most traditional datacenters have used either physical or virtual machines (VMs) as the primary runtime unit to run such applications. These VMs are typically provisioned based on application asks and have applications deployed onto them. These VMs then are formed into logical clusters which are essentially a series of machines serving a given business application in an n-tier architecture.
As load increases on these servers, more VMs are provisioned into the cluster and so on. The challenge with this traditional model is that it is fairly static in nature in the sense that machines are preallocated to run certain kinds of workloads (databases, webservers, developer servers etc). The challenge with Digital and Cloud Native applications are that scaling needs to happen dynamically and applications think of the infrastructure as being infinite. These applications present various challenges and headaches that call for the Datacenter to be software defined as we discussed in the last blog below. We will continue our look at the SDDC by considering one of the important projects in this landscape – Apache Mesos.
Apache Mesos is a project that was developed at the University of California at Berkeley circa 2009. While it was initially created to solve the challenge of provisioning and scaling Spark clusters, the Mesos project evolved to become a centralized cluster manager. The central idea of Mesos is to pool together all the physical resources of the cluster and making it available as a single reservoir of highly available resources for different applications (or frameworks) to consume. Over time, Mesos has begun supporting complex n-tier application platforms that leverage capabilities such as Hadoop, Middleware, Jenkins, Kafka, Spark, Machine Learning etc.
As with almost all innovative Cloud & Big Data projects, the adoption of Apache Mesos has primarily been in the web scale arena. Prominent users include highly technical engineering shops such as Twitter, Netflix, Airbnb, Uber, eBay, Yelp and Apple. However, there seems to be early adopter activity with increased acceptance in the Fortune 100. For instance, Verizon signed on in 2015 to use a Mesosphere DC/OS (based on Apache Mesos) for datacenter orchestration.
The Many Definitions of Mesos..
At it’s simplest, Mesos is an Open Source Cluster Manager. What does that mean? Mesos can be described as a cluster manger because it ensures that datacenter hardware resources are managed and advantageously shared among multiple distributed technologies – Big Data, Message Oriented Middleware, Application Servers, Mobile apps etc. Mesos also enables applications to scale with a high degree of resiliency, without having to bother about details of the underlying infrastructure.
The model of resource allocation followed by Mesos allows a range of constituents sys-admins, developers & DevOps teams to request resources (CPU, RAM, Storage) from a cloud provider.
Mesos has alternatively been described as a Datacenter Kernel as it provides a single unified view of node resources to software frameworks that wish to consume them via APIs. Mesos performs the role of an Intelligent global level scheduler that can match a massive pool of hardware resources to distributed applications that want to consume these resources. Mesos aggregates all the resources into a large virtual pool using not just virtual machines and containers but primitives such as CPU, I/O and RAM. It also breaks applications into small units that can be assigned across this pool. Mesos also provides APIs in multiple languages to allow applications to be built for it. Apache Spark, the most popular data processing engine, was built originally as a Mesos framework.
It is also called a Data Center Operating System (DCOS) as it performs a similar role to the operating system. Any application that can run on Linux runs on Mesos.
To illustrate how Mesos works. Consider two clusters in a datacenter – Cluster A and Cluster B. Cluster A has 8 nodes with each node/server possessing 4 CPUs and 64 GB RAM; Cluster B has 5 nodes with each node/server having with 4 CPUs and 64 GB RAM. Mesos can essentially combine both these clusters into one virtual cluster of 52 CPUs and 832 GB RAM. The advantage of this approach is that cluster usage is greatly improved because applications share resources much more efficiently.
Mesos and Cloud Native Applications..
We discussed the differences between Cloud Native and legacy applications in the previous post @ http://www.vamsitalkstech.com/?p=4670. Mesos has been impactful when running stateless Cloud Native applications as opposed to running traditional applications which are built on a stateful/ vertical scaling paradigm. While the defining features of Cloud Native applications are worthy of a dedicated blogpost, these applications can scale to handle massive & increasing amounts of load while tolerating any failure without impacting service. These applications are also intrinsically distributed in nature and are typically composed of loosely coupled microservices. Examples include – stateless web applications running on a Platform as a Service (PaaS), CI/CD applications working on Jenkins, NoSQL databases like HBase, Cassandra, Couchbase and MongoDB. Stateful applications that persist data using a RDBMS to disk aren’t good workloads for Mesos as yet.
When Cloud Native Digital applications are run on Mesos, several of the headaches encountered in running these on legacy datacenters are ameliorated, namely –
Clusters can be dynamically provisioned by Mesos based on demand spikes
Location independence for microservices
As it matures, Mesos has also began supporting multi datacenter deployments with web scale shops like Uber running Cassandra as a framework across datacenters at scale. In the case of Uber, each datacenter has it’s own Mesos cluster with independent frameworks that exchange information periodically. The Cassandra database includes a seed node that bootstraps the gossip process for new nodes joining the cluster. A custom seed provider was created to launch Cassandra nodes which allows new nodes to be rolled out automatically into the Mesos cluster in each datacenter. (Credit – Abhishek Verma – Uber)
There are three main architectural primitives in Mesos – Master, Slave, Frameworks. The central orchestrator in the Mesos system is called a Master and the worker processes are called Slaves.
As depicted below, the Master process manages the overall cluster and delegates tasks to the slaves based on the resources requested by Frameworks.
The core Mesos process is installed on all nodes and their personality is given at runtime. The Slaves run application workloads that are requested by appropriate frameworks. This overall setup of Master and Slave daemons makes up a Mesos cluster.
Frameworks which are commonly called Mesos applications and are composed of three main components. First off, they have a scheduler which registers with the Master to receive resource offers and then executors which launch workloads or tasks on the slaves. The Resource offers are a simple list of a slave’s available capacity – CPU and Memory. The Master receives these offers from the slaves and then provides them to the frameworks. A task can be anything really – a simple script or a command, or a MapReduce job or an initialization of a Jetty/Tomcat/JBOSS AS etc.
The Mesos executor is a process on the Slave that runs tasks. The executor is a program or command on the slaves which runs the tasks. No matter which isolation module is used, the executor packages all resources and runs the task on the slave node. When the task is complete, the containers are destroyed and the Slaves resources are released back to the Master.
For Master HA, you can run multiple masters with only one Active at a given point communicating with the slave nodes. Once the Hot Master fails, Apache Zookeeper is used to manage leader election to a standby Master as depicted. Master quorum is a minimum of 3 nodes but most production deployments are recommended to have 5 Master nodes. Once a new Master is elected, all of the cluster/slave and framework information is submitted to the new Master by the frameworks so that state before failure can be reconstructed. Mesos has elaborate recovery processes for the frameworks, the schedulers and the Slave nodes.
By some measures, Mesos is a very straightforward concept. Frameworks need to run tasks and they are traffic managed by Masters which coordinate tasks on worker machines called – Slaves.
From a production deployment standpoint, the following components are required – An odd number of Mesos Masters, Many Slave machines needed to run applications, a Zookeeper ensemble for HA configurations and an optional Docker engine running on each slave.
The Mesos Resource Allocation Process..
Mesos follows a default resource scheduling model known as two-tier scheduling. This model may seem a little convoluted but it is important to keep in mind that it was designed to satisfy the requirements & constraints of many different frameworks without having to know details of each.
The Master’s allocation module receives resource offers from slaves which then forwards them on to the framework schedulers. These offers are not just high level in terms of the resources but also how much of these resources to offer. The framework schedulers can accept or reject the Master’s offers based on their current capacity requirements. The Master’s allocation module is customizable based on specific requirements that implementing enterprises may have. The default allocation algorithm is known as Dominant Resource Fairness (DRF) and is based on fair sharing of cluster resources among requesting applications. For instance, DRF ensures that requests are equalized i.e CPU hungry applications are provided a higher share of CPU heavy resources & Memory intensive applications are provided the same fractional amount of RAM.
To better illustrate the resource allocation method in Mesos, let us discuss the sequence of events in the above figure from the Apache Mesos documentation
The Slave Node – as depicted, Agent 1 can offer reports to4 CPUs and 4 GB of memory for allocation to any framework that can use it. It reports this available capacity to the master. The allocation policy module offers framework 1 these resources.
The Master sends a resource offer describing what is available on agent 1 to framework 1.
The Framework’s scheduler then provides the master withmore information on the two tasks to run on the agent, using <2 CPUs, 1 GB RAM> for the first task, and <1 CPUs, 2 GB RAM> for the second task.
The master sends the tasks to the agent, which allocates appropriate resources to the framework’s executor, which in turn launches the two tasks (depicted with dotted-line borders in the figure). Because 1 CPU and 1 GB of RAM are still unallocated, the allocation module may now offer them to framework 2.
Mesos integration with other SDDC components – Linux Containers, Docker, OpenStack, Kubernetes etc
As with other platforms we are discussing in this series, Mesos does not stand alone in the SDDC and leverages other technologies as needed and as discussed in the last post (@ http://www.vamsitalkstech.com/?p=4670). However it needs to be stated that Mesos does have overlapping functionality at times with technologies such as Kubernetes and OpenStack.
However, let us consider the integration points between these technologies –
Linux Containers-Over the last few years, linux containers have emerged as a viable and lightweight alternative to hypervisors as way of running multiple applications on a given OS. Different containers share one underlying OS and perform with less overhead than virtual machines. Given that one of the chief goals of Mesos is to run multiple frameworks on the same set of hardware, Mesos implements what are called isolation modules and isolation mechanisms to achieve its goal of multi-tenency for different applications running on the same hardware. Mesos supports popular technologies for process isolation – cgroups, Solaris Zones, Docker containers. The first two are the default but the Mesos project has recently added Docker as an isolation mechanism.
Schedulers – There is no single widely accepted definition as to what constitutes a Container Orchestration technology. The tooling to achieve this has become one of the trickiest parts of launching containers at scale discussion with multiple projects attempting to capture this market. The requirement in the case of Mesos is straightforward – frameworks constitute applications which need to make the the most efficient use of hardware. This means avoiding the overhead of VMs and leveraging containers – cgroups or Docker or Rocket etc. Hence Mesos needs to be able to support container orchestration as a core feature. Mesos follows a pluggable model for container orchestration by supporting schedulers like Kubernetes or YARN or Marathon or Docker Swarm. All these tools provide service that organize containers into a clusters and running them on specified servers & overall lifecycle management and scheduling of applications running as containers. At large webscale properties, massive container oriented environments running hundreds of microservices are all being managed with this combination of tools using Mesos.Mesos needs to be able to start and stop services in response to failure conditions etc.
Private and Public Cloud Infrastructure as a Service (IaaS) Providers– Mesos works at a different layer of abstraction than a IaaS provider such as Openstack and aims to solve different problems. While OpenStack provides provisioned infrastructure across OS, Storage, Networking et, Mesos intends to achieve better cloud instance utilization. Mesos integrates well with Openstack and runs on top of resources offered up by Openstack to run frameworks on them. Mesos itself runs on a Linux instance on an existing OpenStack deployments though it also can simply run on bare metal as well. It simply requires to run a small Linux process on each of the nodes. Mesos is also significantly simpler than OpenStack and it only takes a few hrs if even to get it up and running.
Mesos has also been deployed on public cloud technology with both Microsoft Azure and Amazon AWS. Azure’s container services are built on Mesos. Netflix leverages Mesos extensively on their EC2 cloud and have also written an advanced scheduling library called Fenzo. Fenzo ensures that a first fit kind of assignment is followed where tasks are ‘bin packed’ onto Agents by the requested use of CPU, memory and network bandwidth. Fenzo also autoscales cluster usage based on demand and also spreads tasks of a given job across EC2 availability zones for high availability. With the stage set from a technology standpoint, let us look over at a few real world use cases where Mesos has been deployed in mission critical applications at various Netflix.
Mesos Deployment @ Netflix..
Netflix are one of the largest adopters and contributors to Mesos and they use it across a wide variety of business capabilities. These use cases include real time anomaly detection, data science lifecycle (training and model building batch jobs, machine learning orchestration), and other business applications. These workloads span a range of technical architectures- batch processing, stream processing and running microservices based applications.
Netflix runs their business applications as a collection of microservices deployed on Amazon EC2 and their first use of Mesos was to perform fine grained resource allocation for compute tasks to gain greater unit efficiency on EC2. The first use case for Mesos at large enterprises is typically around increasing the usage and efficiency of elastic cloud services. In Netflix’s case, they needed the cluster scheduler to increase both agent ephemerality as well as autoscale agents based on demand.
Major Application Use Cases –
Mantis – Netflix deals with a lot of operational data that is constantly streaming in to their environment. They have a range of use cases on streaming data such as real-time dashboarding, alerting, anomaly detection, metric generation, and ad-hoc interactive exploration of streaming data. With this Mantis is a reactive stream processing platform that is deployed as a cloud native service which focuses on operational data streams. The other goal of Mantis is to make it easy for different development teams to obtain access to real time events and then to build applications on them. The current throughput of Mantis is around 8 million events per second and Apache Mesos is running hundreds of stream-processing jobs around the clock. For certain kinds of streaming applications, this amounts to tracking millions of unique combinations of data all the time.
As mentioned above, Netflix runs their Application services stack on Amazon EC2 and most workloads run on linux containers. Netflix created Titus to create a container management platform and to provision Docker containers on EC2. Netflix had to do this as Amazon ECS was not upto par yet as a container orchestration solution for EC2. The use cases supported by Titus include serving batch jobs which help with algorithm training (similar titles for recommendations, A/B test cell analysis, etc.) as well as hourly ad-hoc reporting and analysis jobs. Titus recently added support for service style invocation for Netflix resources that are used to provide consistent development environments and more fine grained resource management.
Meson – One of the most important capabilities that Netflix possess is its uncanny ability to predict what movies and shows that its subscribers want to watch based on their previous watching history and similar segmentation data. Netflix excels at personalizing video recommendations and this capability is powered by machine learning algorithms. To ensure that a very large number of machine learning workflow pipelines can be efficiently created, scheduled and managed – Netflix created Meson on top of Apache Mesos. It is critical that for this system to scale and for the algorithms themselves to be fast, reliable and efficient, these pipelines are run over a large cluster of Amazon AWS instances. As depicted below, Meson manages a large number of jobs with differing CPU, Memory and Disk requirements. Once the slaves/agents are chosen, Spark jobs are run on these shared clusters. Meson uses Linux cgroups based isolation. All of the resource scheduling is handled via Fenzo (described above)
Apache Mesos is a promising new technology which attempts to solve scaling and clustering challenges encountered in the Software Defined Datacenter (SDDC). The biggest benefits of using Mesos are more efficient use of infrastructure across complex applications with native support for multitenant applications. Mesos can ensure that multiple kinds of applications or frameworks can share a given set of nodes. This ensures not just more efficient sharing of hardware but also fault tolerance and load balancing for complex Cloud Native applications.
While, Mesos has had a good degree of adoption in the webscale properties where it was first created (Twitter, Netflix, Uber, Airbnb etc to name the most prominent), it still needs to be proven as a dependable and robust platform in the datacenter.
The next post in this series will explore another exciting technology Docker, the emerging standard in the Linux container space.
“Our environment embraces a lot of change — we have to, because the internet is changing and the technologies we use are changing… for somebody who hated change, I imagine high tech would be a pretty bad career. It would be very tough. There are much more stable industries and they should probably choose one of those more stable industries with less change. They’ll probably be happier there.” – Jeff Bezos Chairman and CEO – Amazon, May-2016
The retail carnage continued over the last month with more household names such as Macy’s, Michael Kors, JC Penney, Abercrombie & Fitch et al announcing store closures. Long term management teams are also departing at struggling Retailers – who are unable to make the digital cut. It is clear now that players that are primarily brick and mortar need to urgently reinvent themselves via Digital Innovation. This is easier said than effectively done as delivering an transformative customer experience in such a highly competitive industry requires a cultural ability embrace change and to thrive in it. From a core technology standpoint, the industry’s digital divide between leaders and laggards manifests itself across four high level dimensions – Cloud Computing, Big Data, Predictive Analytics & Business Culture. Investments in these areas are needed to improve customer value drivers – Increased Consumer Choice, Better Pricing, Frictionless Shopping & Checkout, Ease of Payments, Speedy Order Fulfillment and Operations. In this blogpost, we will discuss a transformation framework for legacy retailers across both of these dimensions – business and technology.
Digital Reinvention in Retail..
We’re taking a look at the reasons for the Storefront pullback in Retail industry. For those catching up on the business background, please read the first post below.
We have seen how an ever increasing percentage of global retail sales are gradually moving online. It then follows naturally that a business model primarily focused onDigital e-commerce capabilities is a must have in the industry. Amazon is setting new records for online sales – selling an increased online catalog of products (including furniture), generating record revenues from other seemingly unrelated areas of it’s business (e.g AWS, Alexa, Echo). The ability of Amazon to continue generating cash is critically important as it increases it’s financial ability to compete with incumbent retailers such as Walmart. According to a research report by the ISLR , as of end of 2016, Amazon had a market cap twice that of it’s biggest competitor – Walmart – even though it (Amazon) only reported around $1 billion in profits over a financial reporting period. Walmart generated about $80 billion around the same five year timeframe. Amazon plows everything back into growing it’s diverse businesses and aims to grow market share at the expense of quarterly profits.
Amazon is the envy of every retailer out there with it’s mammoth sales of almost $80 billion in 2016. Walmart and Apple are a distant #2 and #3 with sales of $13.5 billion and $12 billion respectively .
Why has Amazon been and will continue to be successful?
I should confess that I am an Amazon fan going back several years. Please find a backgrounder on their business strategy written over two years ago.
Amazon has largely been successful for a few important reasons. The biggest reason being that using technology, it has completely rewritten the rules of Retail across the key processes – Consumer Choice, Frictionless Shopping & Checkout, Ease of Payments, Fulfillment and Innovation.
Consider the following –
Platforms and Ecosystems – Amazon has built platforms that serve a host of areas in retail – ranging from books to online video to groceries to video games to virtually any kind of e-commerce. Across all of these platforms, it has constantly invested in business strategies that offer a superior customer experience from choice to fulfillment. Be it in free shipping (Amazon Prime) to innovating on drone based delivery.
Constant Platform Innovation -Amazon has given its customers the largest (and ever growing catalog) of products to choose from, instant 1 – click ordering, rapid delivery via Amazon Prime and online marketplaces for sellers etc. By diversifying into areas like streaming video (via Amazon Prime Video), it is also turning into a content producer.With the launch of storefronts such as Amazon Go, it is striving to provide a seamless multichannel (digital and physical) experience so consumers can move effortlessly from one channel to another. For example, many shoppers use smartphones to reserve a product online and pick it up in a store.
Always Push the Envelope on Advanced Technology -Digital product innovation implies an ability to create new products and services that meet changing customer demands. Such capability implies an ability to bring products to market faster, and then to refine such approaches based on customer feedback. Amazon supplements every platform it builds with ecosystems based on advanced technology. For instance, customers can now try any of the hundreds of voice control apps being built on Amazon Alexa to order products across Amazon’s huge catalog. In April 2017, Amazon launched Echo Look which is a Digital Assistant which can perform various functions – order a ride via Uber, order pizza from Domino’s etc. But in addition to just obeying commands and reading out news etc, it also comes with a camera that can take pictures and perform advanced image recognition. Owners can even try a few outfits, upload them to the device and the Style Check function will tell you which combination looks best. 
Create a Data Driven Customer Experience -Using data in a way that improves the efficiencies in back end supply chains as well as creating micro opportunities to influence every customer interaction. Amazon leverages big data and advanced analytics to better understand customer behavior. For example, gaining insight into customers’ buying habits—with their consent, of course—can lead to an improved customer experience and increased sales through more effective bundling.
Streamlined Operations – Finally, Amazon’s operations are the envy of the retail world.With Amazon Web Services (AWS), Amazon is the Public Cloud leader. Amazon has constantly proved it’s capabilities in automating operations and digitizing business processes using robotic process automation. This is important because it enables quicker shipping times to customers while cutting operating waste and costs. As an example, in the earnings call in 2015, their CFO touted Amazon’s use of robotics in its large warehouses to lower costs. “We’re using software and algorithms to make decisions rather than people, which we think is more efficient and scales better,”
Again, according to the ILSR , “Today, half of all U.S. households are subscribed to the membership program Amazon Prime, half of all online shopping searches start directly on Amazon, and Amazon captures nearly one in every two dollars that Americans spend online. Amazon sells more books, toys, and by next year, apparel and consumer electronics than any retailer online or off, and is investing heavily in its grocery business.”
Retail Transformation Roadmap
Outlined below are the four prongs of a progressive strategy that Retailers can adopt to survive and thrive in today’s competitive marketplace. Needless to say, the theme across these strategies is Amazon-lite i.e leverage Digital technologies to create an immersive & convenient cross channel customer experience.
Step #1 Develop a (Customer Focused) Digital Strategy..
This is a two pronged phase. Firstly, Customers across age groups are using a variety of channels such as mobile phones, apps, in-store kiosks, tablets etc to purchase products. Secondly, despite all of the attractive features around convenient ordering, frictionless payments and ease of delivery – the primary factor driving purchase in certain channels is price. Thus, it is key to identify the critical focus channels, customer segments based on loyalty & other historical data, their willingness to pay based on the channel and product mix in defining the overall digital strategy. Once defined, at the board level, key metrics which drive top line growth need to be identified.
It is important to understand that brick and mortar sales will continue to lead for a long time. Thus, investing in highly efficient store layouts, performing customer traffic analysis and in-store mapping of applications etc is highly called for. Brick and Mortar retailers have a significant ability to drive higher customer foot traffic based on their ability to deliver in-store pickup after online ordering etc. These are advantages that need to be leveraged.
There are four questions at this phase every Retailer needs to ask at the outset –
What customer focused business capabilities can be enabled across the organization?
What aspects of these transformation can be enabled best by digital technology? Where are the current organizational and technology gaps that inhibit innovation?
How do we measure ROI and Business success across these projects? How do we benchmark ourselves from a granular process standpoint against the leaders?
Step #2 Accelerate Investments in New Technology..
The need of the hour for legacy Retail IT is to implement such flexible digital platforms that are built around efficient use of data , real time insights and predictive analytics. Leaders are driving platforms based on not just Big Data Analytics but are also adopting Deep Learning. Examples include Digital Assistants such as Chatbots, Mobile applications that can perform image recognition etc. What business capabilities can be driven from this? Tons.
The ability to offer quicker product tests and to modify them per customer feedback is fast becoming the norm. Depending on the segment of Retail you operate across (e.g. Apparel), the need is to make customer tryouts more convenient with the aid of both technology and humans – knowledgeable sales associates.
Further, an ability to mine customer, supplier and partner data implies the ability to offer customers relevant products, a wide ranging catalog of products, promotions & coupons and other complementary products such as store/private label credit cards.
It is no longer sufficient to just perform brand studies that only understand historical customer transaction and behavior history. There is a strong degree of correlation between customer purchases and products recommended across their social networks – such as Facebook, Pinterest and Twitter etc. Such advanced analytics are needed to drive product development, promotions and advertising.
Retail as an industry continues to present interesting paradoxes in 2017. Traditional retailers continue to suffer with store closings while online entrants with their new approaches continue to thrive by taking market share away from the incumbents. The ability to adopt a Digital mindset and to offer technology platforms that enhance customer experiences will largely determine survival.
The first blog in this seven part series (@ http://www.vamsitalkstech.com/?p=1833) introduced and discussed a reference architecture for Software Defined Data Centers (SDDC).The key runtime technology paradigm that enables Digital applications is the agility in the underlying datacenter infrastructure. Using a SDDC approach, complex underlying infrastructure (primarily Compute, Storage and Network) is abstracted away from the applications running on them. This second blog post will discuss traditional datacenter challenges with running large scale Digital Applications.
Every Enterprise in the middle of Digital reinvention realizes that the transformation component is critically based on technology – a mix of Big Data, Cloud, IoT, Predictive Analytics etc. It is stark from the above that the traditional IT assets & the enterprise datacenter is in need of a substantial refresh. Systems that dominate the legacy landscape such as mainframes, midrange servers, proprietary storage systems are slowly being phased out in favor of Cloud platforms running commodity x86 servers with the Linux OS, Big Data workloads, Predictive Analytics etc.
Traditional datacenters were built with application specific workloads in mind with silos of monitoring tools whereas Digital implies a move to fluid applications with changing workload requirements and more unified monitoring across the different layers.
We have dwelt on how the Digital platforms are underpinned by Cloud, Big Data and Intelligent Middleware.
It comes as no surprise that according to Gartner Research, by 2020, the Software-Defined Datacenter (SDDC) will become the dominant architecture in at least 75 percent of global data centers. With the increasing adoption of APIs across the board and rapid increase in development of cloud-native digital applications using DevOps methodologies, the need for SDDC is only forecast to increase.
For those new to the concept of SDDC, attached is a link to the first blog in this series below where we discussed the overall technical concept along with a reference architecture.
For the last two decades, the vast majority of enterprise software applications created were based on monolithic architectures. These were typically created by dispersed teams who modeled the designs around organizational silos and the resulting inchoate patterns of communication. These applications were created by globally siloed developer teams who would then pass the deployment artifacts over to the operations team. These applications were then deployed in datacenters typically on high end servers using Vertical Scaling where multiple instances of an application are run on a few high end servers. As load increases on the application, adding more CPU, RAM etc to these server increased it’s ability scale.These applications were typically deployed, managed and updated in silos.
Thus, much of what exists in the data centers across enterprise are antiquated technology stacks. These range from proprietary hardware platforms to network devices & switches to monolithic applications running on them. Other challenges surrounding these systems include inflexible, proprietary integration & data architectures.
The vast majority of current workloads are focused around systems such as ERP and other back office applications. They are unsuited to running cloud native applications such as Digital Platforms which support large scale users and need real time insights around customer engagement.
Quite often these legacy applications have business & process logic tightly coupled with infrastructure code. This results in complex manual processes, monolithic applications, out of compliance systems with out of date on patch levels and tightly coupled systems integration. Some of these challenges have been termed – Technical Debt.
While it is critical for Datacenters to operate in a manner that maximizes their efficiency. They also need to manage costs from an infrastructure, power and cooling standpoint while ultimately delivering the right business outcomes for the organization.
IDC forecasts that by 2018, 50% of new datacenter infrastructure investments will be for systems of engagement, insight, and action rather than maintaining existing systems of record.
A great part of this transformation is also cultural. It is clear and apparent to me that the relationship lines of business (LOBs) have with their IT teams – typically central & shared – is completely broken at a majority of large organizations. Each side cannot seem to view either the perspective or the passions of the other. This dangerous dysfunction usually leads to multiple complaints from the business. Examples of which include –
IT is perceived to be glacially slow in providing infrastructure needed to launch new business initiatives or to amend existing ones. This leads to the phenomenon of ‘Shadow IT’ where business applications are run on public clouds bypassing internal IT
Something seems to be lost in translation while conveying requirements to different teams within IT
IT is too focused on technological capabilities – Virtualization, Middleware, Cloud, Containers, Hadoop et al without much emphasis on business value drivers
Rapid provisioning of IT resources is a huge bottleneck which frequently leads to lines of business adopting the public cloud to run their workloads. According to Rakesh Kumar, managing vice president at Gartner – “For over 40 years, data centers have pretty much been a staple of the IT ecosystem,Despite changes in technology for power and cooling, and changes in the design and build of these structures, their basic function and core requirements have, by and large, remained constant. These are centered on high levels of availability and redundancy, strong, well-documented processes to manage change, traditional vendor management and segmented organizational structures. This approach, however, is no longer appropriate for the digital world.” 
Further, Cloud-native applications are evolving into enterprise architectures built on granular microservices. Each microservice runs its own linux container. Thus, Digital architectures are evolving to highly standardized stacks that can scale “horizontally”. Horizontal Scaling refers to increasing the overall footprint of an application’s architecture by quickly adding more servers as opposed to increasing the capacity of existing servers.
The below illustration depicts the needs of a Digital datacenter as opposed to the traditional model.
The SDDC with it’s focus on software controlling commodity hardware enables a range of flexibility and cost savings that were simply not possible before. In the next section, we will consider what requirements Digital Applications impose on a traditional datacenter.
What Do Digital Applications Require From Data Center Infrastructure..
As one can see from the above, traditional approaches to architecting data centers do not scale well from both a technology and from a cost standpoint as far as Digital Applications are concerned. As the diagram below captures, there are five main datacenter challenges that are encountered while architecting and deploying large or medium scale digital applications.
#1– Digital Applications Need Fast Delivery of Complex, Multivaried Application Stacks –
Digital applications are a combination of several different technology disciplines – Big Data, Intelligent Middleware, Mobile applications etc. Thus, data centers will need to run clusters of multi-varied applications at scale. Depending on the scope – a given application will consist of web servers, application servers, Big Data processing clusters, message queues, business rules and process management engines et al.
In the typical datacenter configuration, servers follow a vertical scaling model which limits their ability to host multi tenant applications. This is so as they are not inherently multi tenant in that they cannot natively separate workloads of different kinds when they’re running on the same underlying hardware. The typical traditional approach to ameliorate this has been to invest in multiple sets of hardware (servers, storage arrays) to physically separate applications which resulting in increases in running cost, a higher personnel requirement and manual processes around system patch and maintenance etc.
#2– Digital Applications Need Real Time Monitoring & Capacity Management of complex Architectures–
Digital Applications also call for the highest degrees of Infrastructure and Application reliability across the stack. This implies not only a high level of monitoring but also seamless deployment of large scale applications across large clusters. Digital Applications are data intensive. Data flows into them from various sources in realtime for processing. These applications are subject to spikes in usage and as a result the underlying infrastructure hosting these can display issues with poor response times and availability.
Further, these applications are owned by combined teams of Developers and Operations. Owing to microservice architectures, the number of failure points also increase. Thus, Datacenter infrastructure is also shared between both teams with each area expected to understand the other discipline and even participate in it.
Traditional datacenters suffered from high capacity and low utilization rates. Capacity Management is critical across compute, network and storage. Sizing these resources (vCPU, vRAM, virtual Network etc) and dynamically managing their placement is a key requirement for digital application elasticity.
The other angle to this is the fact that Digital applications typically work on a chargeback model where Central IT needs to only charge the line of business for IT services consumed. This implies that IT can smartly manage capacity consumption on a real time basis using APIs. Thus, monitoring, capacity management and chargeback all need to be an integrated capability.
#3– Digital Applications Call for Dynamic Workload Scheduling –
The ability to provide policy driven application & workload scheduling is a key criteria for Digital Applications. These applications work best on a self service paradigm. The capability of leveraging APIs to reconfigure & re provision infrastructure resources dynamically based on application workload needs. For instance, most Digital applications leverage linux containers which need to be dynamically scheduled and migrated across different hosts. Digital Applications thus need to be fluid in terms of how they scale across multiple hosts.
#4– Digital Applications Need Speedy Automation Across the Layers –
We discussed how one of the critical differentiators for Digital Enterprise applications is the standardization of architectural stacks. Depending on the scale, size and complexity of applications – choices of web development frameworks, libraries, application servers, databases and Big Data stacks need to be whittled down to a manageable few. This increases dependencies for applications across the infrastructure. From a horizontal scalability perspective, thousands of instances of popular applications will need to run on large scale infrastructure. What is key is ensuring that a high degree of automation from a cloud system administration standpoint. Automation spans a variety of topics- lines of business self service, server automation, dynamic allocation of infrastructure, intelligent deployments, configuration of runtime elements using a template based approach, patching and workflow management.
#5– Seamless Operations and Deployment Management at Scale –
Traditional datacenters typically take weeks to months to deliver new applications. Digital Applications call for multiple weekly deployments and an ability to roll up or go down versions quickly. Application deployment and security patch management needs to include a range of use cases such as rolling deployments which ensure zero downtime, canary deployments to test functionality with a subset of users, sharded deployments et al. From an application maintenance standpoint, understanding where performance issues are occurring, such as delayed response times is of critical importance in ensuring customer satisfaction.
For instance, in the Retail industry, online shopping cart abandonment is as high as 70% when website response times are slow.
The lack of support for any of these operational features in Digital Applications can be fatal to user acceptance. And this can ultimately result a range of issues – increased CapEx and OpEx, high server to sysadmin ratios and unacceptably high downtimes.
In summary, the traditional datacenter is not a good fit for the new age Digital Platform.
The SDDC Technology Ecosystem
It is evident from the first post (@ http://www.vamsitalkstech.com/?p=1833) that Software Defined Datacenters have evolved into large & complex ecosystems dominated by open source technology.
It has become increasingly difficult for enterprise CXOs and IT leadership to identify which projects do what and how they all fit together.
I believe, the current SDDDC technology ecosystem could be broken down into four complementary categories –
Cloud Infrastructure – Includes IaaS providers (AWS, Azure, OpenStack etc) and Service Management Platforms such as ManageIQ
Provisioning & Configuration Management – Tools like Puppet, Ansible and Chef.
Serverless Infrastructure & DevOps – Includes a range of technologies but primarily PaaS providers such as OpenShift and CloudFoundry who use Linux containers (such as Docker, Rocket) as the basic runtime unit
Cloud Orchestration & Monitoring- Includes a range of projects such as ApacheMesos, Kubernetes
Readers will detect a distinct tilt in my thinking towards open source but it is generally accepted that open technology communities are the ones leading most of the innovation in this space – along with meaty contributions from public cloud providers especially Amazon and Google.
The Roadmap for the rest of the blogs in this series..
In this blog series, we use these highlight specific cloud projects that are leading market adoption in the above categories.
The third and next post in this series will deep dive into Apache Mesos.
Subsequent posts in this series will cover best of breed projects – Docker & Kubernetes, ManageIQ, OpenStack, OpenShift in that order. The final post will round it all together with a sample real-world flow bringing all these projects together using a sample application provisioning flow.
Progressive enterprise IT teams have begun learning from the practices of the web-scale players and have adopting agile ways of developing applications. They have also begun slowly introducing disruptive technologies around Cloud Computing (IaaS & PaaS), Big Data, Agile developer toolsets, DevOps style development pipelines & Deployment Automation etc. Traditional datacenters are siloed in the sense that the core foundational components servers, networking and storage are deployed, managed and monitored by separate teams. This is the antithesis of Digital where all these areas converge in a highly fluid manner.
The next post in this series will discuss Apache Mesos, an exciting new technology project that strives to provide a global cluster manager for the vast diversity of applications found in Digital projects.
“Banking may be on the cusp of an industrial revolution. This is being propelled by technology on the supply side and the financial crisis on the demand side. The upshot could be the most radical reconfiguration of banking in centuries.” – Andrew Haldane, Chief Economist, Bank of England, 2013 
This blog has discussed Banking and Payments industry trends quite extensively over the last year. Perhaps the most groundbreaking directive from a supranational regulatory standpoint has been the promulgation of the Payment Services Directive revision 2 (PSD2) in the European Union (EU). The PSD2 is a technology driven directive that aims to foster competition, digital innovation and security across Retail Banking & Internet Payments.
Banking and Payments Innovation in the EU..
The first Payment Services Directive (PSD1) came into vogue in the EU in 2009. With the stated goal of creating a Single Euro Payments Area (SEPA), the PSD1 provided created rules and frameworks for the creation of modern payment services as well as opened up payments to new entrants. The goal of the Single Euro Payments Area is to standardize the way euro payments are made across the EU and to make all cross border payments in euro as seamless as making domestic payments within a given member state. SEPA covers the whole of the EU and other non-EU European countries such as Iceland, Finland, Norway, Switzerland. Monaco etc.
A revised PSD (PSD2) was proposed in 2013 –PSD2 – EU Directive 2015/2366 . The PSD2 carries huge and monumental consequences in two lines of Global Banking – Retail Lines of Business (which typically include consumer savings & checking accounts, auto loans, mortgages and Small & Medium Enterprise Lending) and Payments (card payments, corporate payments, credit transfers, direct debits etc).
Many leading European Banks were propped by the EU Central Bank during the financial crisis. However, most have not innovated in any meaningful manner as their market shares have largely stayed intact with consumers still facing difficulties in cross border transactions. The EU clearly wants to use PSD2 as a vehicle to drive banking & payments innovation. Added to this is the Digital trend driven largely by global companies in the US and Asia. The intent of the PSD2 is to jumpstart the slow pace of innovation in the EU.
The PSD2 aims to foster a single market for consumer payments and eventually banking services. It intends to provide a framework for EU companies to respond to competitive changes in the payments landscape which have largely been driven by technology. The PSD2 also aims to drive further improvements in payment services across Europe by providing a number of enhancements to the PSD1 around the areas of mobile & online payments. It also harmonizes pricing and security among all member states. EU member state companies have until January 2018 to implement the PSD2.
It needs to be stated that all transactions that are ‘one leg out’ where at-least one party is located inside the EU – are within scope of the PSD2.
Open Banking, GDPR and PSD2…
The core themes of PSD2 may not be all that new for the UK banks for Her Majesty’s Treasury is putting finishing touches on the Open Bank Standard (OBS). While the topic has been covered quite exhaustively before in this blog, the themes are very similar as compared with the PSD2 –
While the General Data Protection Regulation (GDPR) deserves it’s own blogpost, it certainly seems to impose an opposite effect on the industry as compared with PSD2. Let me explain, while the PSD2 forces banks to unlock customer data via APIs, GDPR imposes stringent requirements on them to protect customer data. It becomes effective on May 2018 (a few months after PSD2). Given the scope of both PSD2 and GDPR, banks will need to carefully assess and calibrate changes to a range of areas across the organization – security, lines of business communication, data management, partner ecosystems, outsourcing agreements etc.
So what does the PSD2 entail..
As mentioned above, the PSD2 moves the EU towards a single payment zone by creating explicit new institutional roles in the banking landscape. The goal is to clearly increase competition by changing the rules of participation in the banking & payments industry.
First off, Banks need to begin opening up their customer data to third party providers of financial services, under the XS2A (Access to account) rule. They need to begin offering Open APIs (Application Programming Interface) to the TPPs (Third Party Providers).
This change creates three new types of roles for account and payment service providers –
PISPs (Payment Initiation Service Providers) – who will be initiate online payments on the behalf of consumers which do not need to use existing payment networks and schemes. These will clearly provide new payment options to consumers in areas such as account to account payment transfers and bill pay. Example of a scenario. When a EU customer purchases a product from a retailer sometime in 2018, the retailer can initiate a payment request directly to the consumers Bank (via a secure API call) without going through any intermediaries.
AISPs – (Account Information Service Providers) – who will be able to access customer core banking data and be able to provide value added personal financial management tools such as account aggregation etc.Example of a scenario – An AISP will offer a consumer with multiple banking accounts, a single aggregated view of all those accounts plus value added services such as personal financial management tools using all the transaction & historical data.
ASPSPs (Account Servicing Payment Service Providers) – these are Credit Institutions (Banks that offer multiple services) and Payment Institutions( payment services providers) which are required to offer open APIs to the PSPs and the AISPs. These providers can charge a small price per transaction for the PISPs but not charge differently for payments initiated through their own products.
The PISPs, AISPs and ASPSPs will all be registered, licensed and regulated by an EU agency – the European Banking Authority (EBA). They will also need be required to negotiate contracts with the individual banks. They will all need to use Strong Customer Authentication (SCA) mechanisms to access customer data thus reducing fraud in PSD2 transactions.
Open Banking via Open APIs..
The use of application programming interfaces (APIs) has been well documented across web scale companies such as Facebook, Amazon and Google. APIs are widely interoperable, relatively easy to create and form the front end of many Digital Platforms. APIs are leveraged to essentially access the core services provided by these platforms and can be used to create partner and customer ecosystems. Leader firms such as PayPal, Amazon & FinTechs such as Square, Mint etc have overwhelmingly used APIs as a way to not only open their platforms to millions of developers but also to offer innovative services. It is anticipated that the high margin services created as a result of PSD2, will include consumer & SME lending, financial advisory, peer to peer payments, crowdfunding, comparison shopping, chatbots etc to creating Banking ‘App Stores’ for widespread download and use. The AISPs and PISPs will definitely target high end margins such as financial advisory and lending.
It is expected that the EBA will define standards for the PSD2 Open API encompassing areas such as API definitions for standard banking operations to check account balances, perform transfers, view transaction histories, process payments. Vendors in the API space have already begun offering models for specific banking workflows. Security models for PSD2 should include support for two factor authentication, consent management etc using standards such as OpenID Connect.
Strategic Implications for Banks & Payment Providers..
With PSD2, the European Parliament has adopted the legal foundation of the creation of a EU-wide single payments area (SEPA). While the goal of the PSD is to establish a set of modern, digital industry rules for all payment services in the European Union; it has significant ramifications for the financial services industry as it will surely current business models & foster new areas of competition. The key message from a regulatory standpoint is that consumer data can be opened up to other players in the payment value chain. This will lead to a clamor by players to own more of the customers data with a view to selling business services (e.g. accurate credit scoring, access to mortgage & other consumer loans and mutual funds etc) on that information.
The top five implications of the PSD2 for Banks will be –
Increased competition for revenues in their existing customer base – It is expected that a whole range of nimble competitors such as FinTechs and other financial institutions will jockey to sell products to bank customers.
Banks that are unable to respond to PSD2 in a nimbler manner will be commodified into utilities – Banks will lose their monopoly on being their customers primary front end. As FinTechs take over areas such as mortgage loans (an area where they’re much faster than banks in granting loans), Banks that cannot change their distribution and product models will be commodified. The challenges start with inflexible core banking systems that maintain customer demographics, balances, product information and other BORT (Book Of Record Transaction) Systems that store a range of loan, payment and risk data. These architectures will slowly need to transition from their current (largely) monolithic architectures to compose-able units. There are various strategies that Banks can follow to ‘modernize the core’. That may be the subject of a followup post.
Lost Revenues – Over time, under PSD2, Banks and Payment providers will lose substantial revenues to the PISPs. The veritable elimination of card surcharges and Interchange Fee Regulation (IFR) for payment transactions using credit cards will not only dis-intermediate but also negatively impact card schemes such as Visa and MasterCard.
A High Degree of IT Spend – To comply with the PSD2, Banks will spend tens to hundreds of millions of dollars implementing Open APIs, retrofitting these on legacy systems and complying with increased security requirements mandated by the PSD2.
Implications for Regulatory Reporting and Risk Management – Clearly the Banks are a disadvantage here compared to the new entrants. The Banks still have to adhere to the Basel frameworks and AML (Anti Money Laundering) controls. The AISPs on the other hand are not subject to any of these restrictions nor do they need to hold capital in reserve.PISPs on the other hand will need to prove access to minimal capital reserves. Both AISPs and PISPs will need to explain their business plans and models clearly to regulators. They will also need to prove that their access to consumer data does not violate the intended use.
Why PSD2 is an Enormous Opportunity for Banks and Payment Providers..
At various times, we have highlighted various business & innovation issues with Banking providers in the areas of Retail Banking, Payment Providers and Capital Markets. Regimes such as PSD2 will compel staid industry players to innovate faster than they otherwise would.
After the PSD2 takes effect, banks face various choices. We can list those into three different strategic options.
Minimally Compliant Banks – Here we should categorize Banks that seek to provide bare bones compliance with the Open API. While this may be the starting point for several banks, staying too long in this segment will mean gradual market share erosion as well as a loss of customer lifetime value (CLV) over time. The reason for this is that FinTechs and other startups will offer a range of services such as Instant mortgages, personal financial management tools, paperless approval processes for a range of consumer accounts etc. It is also anticipated that such organizations will treat PSD2 as a localized effort and will allocate personnel to the project mainly around the front office and marketing.
Digital Starters -Banks that have begun exploring opening up customer data but are looking to support the core Open API but also introduce their own proprietary APIs. While this approach may work in the short to medium term, it will only impose integration headaches on the banks as time goes on.
Digital Innovators – The Digital Innovators will lead the way in adopting open APIs. These banks will fund dedicated teams in lines of business serving their particular customer segments either organically or using partnerships with TPPs. They will not only adhere to the PSD2 APIs but also extend the spec to create own with a focus on data monetization. Examples of such products and services will include Robo-advisors and Chatbots.
Recommendations for Banks on how to be a Digital Innovator….
In the PSD2 age, financial institutions need to embrace digital technology as a way of disarming competition and increasing their wallet share of customer business. They need to move beyond transactional banking to a customer centric model by offering value added services on the customer data that they already provide. Capabilities such as Customer Journey Mapping (CJM) and Single View of Customer (SVC) are the minimum table stakes that they need to provide.
So, the four strategic business goals that Innovators PSD2 compliant need to drive towards in the long run –
Digitize The Customer Journey – Bank clients who use services like Uber, Zillow, Amazon etc in their daily lives are now very vocal in demanding a seamless experience across all of their banking services using digital channels. The vast majority of Bank applications still lag the innovation cycle, are archaic & are separately managed. The net issue with this is that the client is faced with distinct user experiences ranging from client on-boarding to servicing to transaction management. Such applications need to provide anticipatory or predictive capabilities at scale while understand the specific customers lifestyles, financial needs & behavioral preferences.
Provide Improved Access to Personal Financial Management (PFM) Tools & Improved Lending Processes – Provide consumers with a single aggregated picture of all their accounts without customers needing to engage a TPP (Third Party Provider). Also improve lending systems by providing more efficient access to loans by incorporating a large amount of contextual data in the process.
Automate Back & Mid Office Processes Across Lending, Risk, Compliance & Fraud – PSD2 will force substantial compliance costs on the regulatory arena. The needs to forge a closer banker/client experience is not just driving demand around data silos & streams themselves but also forcing players to move away from paper based models to more of a seamless, digital & highly automated model to rework a ton of existing back & front office processes. These processes range from risk data aggregation, supranational compliance (AML,KYC, CRS & FATCA), financial reporting across a range of global regions & Cyber Security. Can the Data architectures & the IT systems that leverage them be created in such a way that they permit agility while constantly learning & optimizing their behaviors across national regulations, InfoSec & compliance requirements? Can every piece of actionable data be aggregated, secured, transformed and reported on in such a way that it’s quality across the entire lifecycle is guaranteed?
Tune Existing Business Models Based on Client Tastes and Feedback – While the initial build out of the core architecture may seem to focus on digitizing interactions and exposing data via APIs. What follows fast is strong predictive modeling capabilities working at large scale where systems need to constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences.
Recommendations for Payment Service Providers on how to be a Digital Innovator….
Banks must revise their Payments Strategy and adopt six components to be successful as an Everyday Payments provider in the new regulatory environment:
Frictionless and integrated payments – working with interested 3rd parties in facilitating multimode payments through a variety of front ends
Payments Ecosystems – Payment providers should work on creating smart ecosystems with TPPs that not only offer payment services but also leverage their knowledge of customers to offer value added tools for personal financial planning
Real time Payments innovation – driving realtime cross border payments that are seamless, reliable, cost effective for both corporates and individuals
Customer Data Monetization, Payment providers have been sitting on petabytes of customer data and have only now began waking up to the possibilities of monetizing this data. An area of increasing interest is to provide sophisticated analytics to merchants as a way of driving merchant rewards programs. Retailers, Airlines and other online merchants need to understand what segments their customers fall into as well as what the best avenues are to market to each of them. E.g. Webapp, desktop or tablet etc. Using all of the Payment Data available to them, Payment providers can help Merchant Retailers understand their customers better as well as improve their loyalty programs.
Enhancing the Digital experience in corporate payments – Using the learnings from the more dynamic consumer payments spectrum, payment providers should offer their business clients the same experience in a range of areas such as wire transfers, cash management services using mobile devicesThe below blogpost provide more reading around the capabilities payment providers need to develop in the Digital arena.
With the PSD2, EU Banks and Payment service providers will need to accelerate the transition to a customer oriented mindset. They will being pushed to share data through open standards, become highly digitized in interacting with consumers and will need to begin leveraging their first mover advantage. They need to use the vast internal data (about customers, their transaction histories, financial preferences, operational insights etc) to create new products or services or to enhance the product experience.