Want to go Cloud or Digital Native? You’ll Need to Make These Six Key Investments…

The ability for an enterprise to become a Cloud Native (CN) or Digitally Native (DN) business implies the need to develop a host of technology capabilities and cultural practices in support of two goals. First, IT becomes aligned with & responsive to the business. Second, IT leads the charge on inculcating a culture of constant business innovation. Given these realities, large & complex enterprises that have invested into DN capabilities often struggle to identify the highest priority areas to target across lines of business or in shared services. In this post, I want to argue that there are six fundamental capabilities large enterprises need to adopt housewide in order to revamp legacy systems. 

Introduction..

The blog has discussed a range of digital applications and platforms at depth. We have covered a range of line of business use cases & architectures – ranging from Customer Journeys, Customer 360, Fraud Detection, Compliance, Risk Management, CRM systems etc. While the specific details will vary from industry to industry, the common themes to all these implementations include a seamless ability to work across multiple channels, to predictively anticipate client needs and support business models in real-time. In short, these are all Digital requirements which have been proven in the webscale world with Google, Facebook, Amazon and Netflix et al.  Most traditional companies are realizing that the adopting the practices of these pioneering enterprises are a must for them to survive and thrive.

However, the vast majority of Fortune 500 enterprises need to overcome significant challenges in their migrating their legacy architecture stacks to a Cloud Native mode.  While it is very easy to slap mobile UIs via static HTML on existing legacy systems, without a re-engineering of their core, they can never realize the true value of digital projects. The end goal of such initiatives is to ensure that underlying systems are agile and able to be responsive to business requirements. The key question then becomes how to develop and scale these capabilities across massive organizations.

Legacy Monolithic IT as a Digital Disabler…

From a top-down direction, business leadership is requesting agiler IT delivery and faster development mechanisms to deal with competitive pressures such as social media streams, a growing number of channels, disruptive competitors and demanding millennial consumers. When one compares the Cloud Native (CN) model (@ http://www.vamsitalkstech.com/?p=5632) to the earlier monolithic deployment stack (@ http://www.vamsitalkstech.com/?p=5617), it is easily noticeable that there are a sheer number of technical elements and trends that enterprise IT is being forced to devise strategies for.

This pressure is being applied on Enterprise IT from both directions.

Let me explain…

In most organizations, the process of identifying the correct set of IT capabilities needed for line of business projects looks like the below –

  1. Lines of business leadership works with product management teams to request IT for new projects to satisfy business needs either in support of new business initiatives or to revamp existing offerings
  2. IT teams follow a structured process to identify the appropriate (siloed) technology elements to create the solution
  3. Development teams follow a mix of agile and waterfall models to stand up the solution which then gets deployed and managed by an operations team
  4. Customer needs and update requests get slowly reflected causing customer dissatisfaction
    Given this reality, how can legacy systems and architectures reinvent themselves to become Cloud Native?

Complexity is inevitable & Enterprises that master complexity will win…

The correct way to creating a CN/DN architecture is that certain technology investments need to be made by complex organizations to speed up each step of the above process. The key challenge in the CN process is to help incumbent enterprises kickstart their digital products to disarm competition.

The sheer number of offerings of the digital IT challenge is due in large part to a large number of technology trends and developments that have begun to have a demonstrable impact on IT architectures today. There are no fewer than nine—including social media and mobile technology, the Internet of Things (IoT), open ecosystems, big data and advanced analytics, and cloud computing et al.

Thus, the CN movement is a complex mishmash of technologies that straddle infrastructure, storage, compute and management. This is an obstacle that must be surmounted by enterprise architects and IT leadership to be able to best position their enterprise for the transformation that must occur.

Six Foundational Technology Investments to go Cloud Native…

There are six foundational technology investments that predicate the creation of a Cloud Native Application Architecture – IaaS, PaaS & Containers, Container Orchestration, Data Analytics & BPM, API Management & DevOps.

There are six layers that large enterprises will need to focus on to improve their systems, processes, and applications in order to achieve a Digital Native architecture. These investments can proceed in parallel.

#1 First and foremost, you will need an IaaS platform

An agile IaaS is an organization-wide foundational layer which provides unlimited capacity across a range of infrastructure services – compute, network, storage, and management. IaaS provides an agile but scalable foundation to deploy everything else on it without incurring undue complexity in development, deployment & management. Key tenets of the private cloud approach include better resource utilization, self-service provisioning and a high degree of automation. Core IT processes such as the lifecycle of resource provisioning, deployment management, change management and monitoring will need to be redone for an enterprise-grade IaaS platform such as OpenStack.

#2 You will need to adopt a PaaS layer with Containers at its heart  –

Containers are possibly the first infrastructure software category created by developers in mind. The prominence of Linux Containers has Docker coincided with the onset of agile development practices under the DevOps umbrella – CI/CD etc. Containers are an excellent choice to create agile delivery pipelines and continuous deployment. It is a very safe bet to make that in a few years, the majority of digital applications (or mundane applications for that matter) will transition to hundreds of services deployed on and running on containers.

Adopting a market leading Platform As A Service (PaaS) platform such as Red Hat’s OpenShift or CloudFoundry can provide a range of benefits from helping with container adoption, tools to help with CI/CD process, reliable rollout with A/B testing, Green-Blue deployments. A PaaS such as OpenShift adds auto-scaling, failover & other kinds of infrastructure management.

Why Linux Containers and Docker are the Runtime for the Software Defined Data Center (SDDC)..(4/7)

#3 You will need an Orchestration layer for Containers –

At their core, Containers enable the creation of multiple self-contained execution environments over the same operating system. However, containers are not enough in and of themselves – to drive large-scale DN applications. An Orchestration layer at a minimum, organizes groups of containers into applications, schedules them on servers that match their resource requirements, places the containers on complex network topology etc. It also helps with complex tasks such as release management, Canary releases and administration. The actual tipping point for large-scale container adoption will vary from enterprise to enterprise. However, the common precursor to supporting containerized applications at scale has to be an enterprise-grade management and orchestration platform. Again, a PaaS technology such as OpenShift provides two benefits in one – a native container model and orchestration using Kubernetes.

Kubernetes – Container Orchestration for the Software Defined Data Center (SDDC)..(5/7)

#4 Accelerate investments in and combine Big Data Analytics and BPM engines –

In the end, the ability to drive business processes is what makes an agile enterprise. Automation in terms of both Business Processes (BPM) and Data Driven decision making are proven approaches used at webscale,  data-driven organizations. This makes all the difference in terms of what is perceived to be a digital enterprise. Accordingly, the ability to tie in a range of front, mid and back-office processes such as Customer Onboarding, Claims Management & Fraud Detection to a BPM-based system and allowing applications to access these via a loosely coupled architecture based on microservices is key. Additionally leveraging Big Data architectures to process data streams in near real-time is another key capability to possess.

Why Big Data Analytics is the Future of CRM..

#5 Invest in APIs –

APIs enable companies to constantly churn out innovative offerings while still continuously adapting & learning from customer feedback. Internet-scale companies such as Facebook provide edge APIs that enable thousands of companies to write applications that drives greater customer volumes to the Facebook platform. The term API Economy is increasingly in vogue and it connotes a loosely federated ecosystem of companies, consumers, business models and channels

APIs are used to abstract out the internals of complex underlying platform services. Application Developers and other infrastructure services can be leveraged well defined APIs to interact with Digital platforms. These APIs enable the provisioning, deployment, and management of platform services.

Applications developed for a Digital infrastructure will be developed as small, nimble processes that communicate via APIs and over traditional infrastructure such as service mediation components (e.g Apache Camel). These microservices based applications will offer huge operational and development advantages over legacy applications. While one does not expect legacy but critical applications that still run on mainframes (e.g. Core Banking, Customer Order Processing etc) to move over to a microservices model anytime soon, customer-facing applications that need responsive digital UIs will definitely move.

Why APIs Are a Day One Capability In Digital Platforms..

#6 Be prepared, your development methodologies will gradually evolve to DevOps – 

The key non-technology component that is involved in delivering error-free and adaptive software is DevOps.  Currently, most traditional application development and IT operations happen in silos. DevOps with its focus on CI/CD practices requires engineers to communicate more closely, release more frequently, deploy & automate daily, reduce deployment failures and mean time to recover from failures.

Typical software development life cycles that require lengthy validations and quality control testing prior to deployment can stifle innovation. Agile software process, which is adaptive and is rooted in evolutionary development and continuous improvement, can be combined with DevOps. DevOps focuses on tight integration between developers and teams who deploy and run IT operations. DevOps is the only development methodology to drive large-scale Digital application development.

Conclusion..

By following a transformation roughly outlined as above, the vast majority of enterprises can derive a tremendous amount of value in their Digital initiatives. However, the current industry approach as in vogue – to treat Digital projects as a one-off, tactical project investments – does not simply work or scale anymore. There are various organizational models that one could employ from the standpoint of developing analytical maturity. These ranging from a shared service to a line of business led approach. An approach that I have seen work very well is to build a Digital Center of Excellence (COE) to create contextual capabilities, best practices and rollout strategies across the larger organization. The COE should be at the forefront of pushing the above technology boundaries within the larger framework of the organization.

Blockchain and Bitcoin – Industry Insights & Reference Architectures…

Distributed Ledger technology (DLT) and applications built for DLT’s – such as cryptocurrencies – are arguably the hottest topics in tech. This post summarizes seven key blogs on the topic of Blockchain and Bitcoin published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with understanding and implementing this groundbreaking technology.

Image Credit – DCEBrief

Introduction…

We have been discussing the capabilities of Blockchain and Bitcoin for quite some time on this blog. The impact of Blockchain on many industries is now clearly apparent. But can the DLT movement enable business efficiency and profitability?

# 1 – Introduction to Bitcoin –

Bitcoin (BTC) is truly the first decentralized, peer to peer, high secure and purely digital currency. Bitcoin & it’s other cousins such as Ether and AltCoins now regularly get widespread (& mostly positive) notice by a range of industry actors- ranging from Consumers, Banking Institutions, Retailers & Regulators. Riding on the real pathbreaker – the Blockchain, the cryptocurrency movement will help drive democratization in the financial industry and society at large in the years to come. This blog post discusses BTC from a business standpoint.

Bitcoin (BTC) Ushers in the Future of Finance..(1/5)

# 2 – The Architecture of Bitcoin –

This post discusses the technical architecture of Bitcoin.

The Architecture of Bitcoin..(2/5)

# 3 – Introduction to Blockchain –

The term Blockchain is derived from a design pattern that describes a chain of data blocks that map to individual transactions. Each transaction that is conducted in the real world (e.g a Bitcoin wire transfer) results in the creation of new blocks in the chain. The new blocks so created are done so by calculating a cryptographic hash function of its previous block thus constructing a chain of blocks – hence the name. This post introduces the business potential of Blockchain to the reader.

The immense potential of the Blockchain..(3/5)

# 4 – The Reference Architecture of the Blockchain –

This post discusses the technical architecture of Blockchain.

The Architecture of Blockchain..(4/5)

# 5 – How Blockchain will lead to Industry disruption –

Blockchain lies at the heart of the Bitcoin implementation & is easily the most influential part of the BTC platform ecosystem. Blockchain is thus both a technology platform and a design pattern for building global scale industry applications that make all of the above possible. Its design makes it possible to be used as a platform for digital currency also enable it to indelibly record any kind of transaction – be it a currency, or, a medical record, or, supply chain data, or, a document etc into it.

How the Blockchain will lead disruption across industry..(5/5)

# 6 – What Blockchain can do for the Internet of Things (IoT) –

Blockchain can enable & augment a variety of application scenarios and use-cases for the IoT. No longer are such possibilities too futuristic – as we discuss in this post.

What Blockchain can do for The Internet Of Things..

# 7 – Key Considerations in Adapting the Blockchain for the Enterprise –

With advances in various Blockchain based DLTs (distributed ledger technology) platforms such as HyperLedger & Etherium et al, enterprises have begun to take baby steps to adapt the Blockchain (BC) to industrial scale applications. This post discusses some of the stumbling blocks the author has witnessed enterprises are running into as they look to adopt Blockchain based Distributed Ledger Technology in real-world applications.

Blockchain For the Enterprise: Key Considerations..

Conclusion..

The true disruption of Blockchain based distributed ledgers will in moving companies to an operating model where they leave behind siloed and stovepiped business processes, to the next generation of distributed business processes predicated on a seamless global platform. The DLT based platform will enable the easy transaction, exchange, and contraction of digital assets. However, before enterprises rush in, they need to perform an adequate degree of due diligence to avoid some of the pitfalls we have highlighted above.

Why Legacy Monolithic Architectures Won’t Work For Digital Platforms..

As times change, so do architectural paradigms in software development. For the more than fifteen years the industry has been developing large scale JEE/.NET applications, the three-tier architecture has been the dominant design pattern. However, as enterprises embark or continue on their Digital Journey, they are facing a new set of business challenges which demand fresh technology approaches. We have looked into transformative data architectures at a great degree of depth in this blog, now let us now consider a rethink in the Applications themselves. Applications that were earlier deemed to be sufficiently well-architected are now termed as being monolithic.  This post solely focuses on the underpinnings of why legacy architectures will not work in the new software-defined world. My intention is not to merely criticize a model (the three-tier monolith) that has worked well in the past but merely to reason why it may be time for a generally well accepted newer paradigm.

Traditional Software Platform Architectures… 

Digital applications support a wider variety of frontends & channels, they need to accommodate larger volumes of users, they need wider support for a range of business actors  – partners, suppliers et al via APIs. Finally, these new age applications need to work with unstructured data formats (as opposed to the strictly structured relational format). From an operations standpoint, there is a strong need for a higher degree of automation in the datacenter. All of these requirements call for agility as the most important construct in the enterprise architecture.

As we will discuss, legacy applications (typically defined as created more than 5+ years ago) are beginning to emerge as one of the key obstacles in doing Digital. The issue is not just in the underlying architectures themselves but also in the development culture involved building and maintaining such applications.

Consider the vast majority of applications deployed in enterprise data centers. These applications deliver collections of very specific business functions – e.g. onboarding new customers, provisioning services, processing payments etc. Whatever be the choice of vendor application platform, the vast majority of existing enterprise applications & platforms essentially follows a traditional three-tier software architecture with specific separation of concerns at each tier (as the vastly simplified illustration depicts below).

Traditional three-tier Monolithic Application Architecture

The first tier is the Presentation tier which is depicted at the top of the diagram. The job of the presentation tier is to present the user experience. This includes the user interface components that present various clients with the overall web application flow and also renders UI components. A variety of UI frameworks that provide both flow and UI rendering is typically used here. These include Spring MVC, Apache Struts, HTML5, AngularJS et al.

The middle tier is the Business logic tier where all the business logic for the application is centralized while separating it from the user interface layer. The business logic is usually a mix of objects and business rules written in Java using frameworks such EJB3, Spring etc. The business logic is housed in an application server such as JBoss AS or Oracle WebLogic AS or IBM WebSphere AS – which provides enterprise services (such as caching, resource pooling, naming and identity services et al) to the business components run on these servers. This layer also contains data access logic and also initiates transactions to a range of supporting systems – message queues, transaction monitors, rules and workflow engines, ESB (Enterprise Service Bus) based integration, accessing partner systems using web services, identity, and access management systems et al.

The Data tier is where traditional databases and enterprise integration systems logically reside. The RDBMS rules this area in three-tier architectures & the data access code is typically written using an ORM (Object Relational Mapping) framework such as Hibernate or iBatis or plain JDBC code.

Across all of these layers, common utilities & agents are provided to address cross-cutting concerns such as logging, monitoring, security, single sign-on etc.

The application is packaged as an enterprise archive (EAR) which can be composed of a single or multiple WAR/JAR files. While most enterprise-grade applications are neatly packaged, the total package is typically compiled as a single collection of various modules and then shipped as one single artifact. It should bear mentioning that dependency & version management can be a painstaking exercise for complex applications.

Let us consider the typical deployment process and setup for a thee tier application.

From a deployment standpoint, static content is typically served from an Apache webserver which fronts a Java-based webserver (mostly Tomcat) and then a cluster of backend Java-based application servers running multiple instances of the application for High Availability. The application is Stateful (and Stateless in some cases) in most implementations. The rest of the setup with firewalls and other supporting systems is fairly standard.

While the above architectural template is fairly standard across industry applications built on Java EE, there are some very valid reasons why it has begun to emerge as an anti-pattern when applied to digital applications.

Challenges involved in developing and maintaining Monolithic Applications …

Let us consider what Digital business usecases demand of application architecture and where the monolith is inadequate at satisfying.

  1. The entire application is typically packaged as a single enterprise archive (EAR file), which is a combination of various WAR and JAR files. While this certainly makes the deployment easier given that there is only one executable to copy over, it makes the development lifecycle a nightmare. The reason being that even a simple change in the user interface can cause a rebuild of the entire executable. This results in not just long cycles but makes it extremely hard on teams that span various disciplines from the business to QA.
  2. What follows from such long “code-test & deploy” cycles are that the architecture becomes change resistant, the code very complex over time and as a whole the system subsequently becomes not agile at all in responding to rapidly changing business requirements.
  3. Developers are constrained in multiple ways. Firstly the architecture becomes very complex over a period of time which inhibits quick new developer onboarding. Secondly,  the architecture force-fits developers from different teams into working in lockstep thus forgoing their autonomy in terms of their planning and release cycles. Services across tiers are not independently deployable which leads to big bang releases in short windows of time. Thus it is no surprise that failures and rollbacks happen at an alarming rate.
  4. From an infrastructure standpoint, the application is tightly coupled to the underlying hardware. From a software clustering standpoint, the application scales better vertically while also supporting limited horizontal scale-out. As volumes of customer traffic increase, performance across clusters can degrade.
  5. The Applications are neither designed nor tested to operate gracefully under failure conditions. This is a key point which does not really get that much attention during design time but causes performance headaches later on.
  6. An important point is that Digital applications & their parts are beginning to be created using different languages such as Java, Scala, and Groovy etc. The Monolith essentially limits such a choice of languages, frameworks, platforms and even databases.
  7. The Architecture does not natively support the notion of API externalization or Continuous Integration and Delivery (CI/CD).
  8. As highlighted above, the architecture primarily supports the relational model. If you need to accommodate alternative data approaches such as NoSQL or Hadoop, you are largely out of luck.

Operational challenges involved in running a Monolithic Application…

The difficulties in running a range of monolithic applications across an operational infrastructure have already been summed up in the other posts on this blog.

The primary issues include –

  1. The Monolithic architecture typically dictates a vertical scaling model which ensures limits on its scalability as users increase. The typical traditional approach to ameliorate this has been to invest in multiple sets of hardware (servers, storage arrays) to physically separate applications which results in increases in running cost, a higher personnel requirement and manual processes around system patch and maintenance etc.
  2. Capacity management tends to be a bit of challenge as there are many fine-grained resources competing for compute, network and storage resources (vCPU, vRAM, virtual Network etc) that are essentially running on a single JVM. Lots of JVM tuning is needed from a test and pre-production standpoint.
  3. A range of functions needed to be performed around monolithic Applications lack any kind of policy-driven workload and scheduling capability. This is because the Application does very little to drive the infrastructure.
  4. The vast majority of the work needed to be done to provision, schedule and patch these applications is done by system administrators and consequently, automation is minimal at best.
  5. The same is true in Operations Management. Functions like log administration, other housekeeping, monitoring, auditing, app deployment, and rollback are vastly manual with some scripting.

Conclusion…

It deserves mention that the above Monolithic design pattern will work well for Departmental (low user volume) applications which have limited business impact and for applications serving a well-defined user base with well delineated workstreams. The next blog post will consider the microservices way of building new age architectures. We will introduce and discuss Cloud Native Application development which has been popularized across web-scale enterprises esp Netflix. We will also discuss how this new paradigm overcomes many of the above-discussed limitations from both a development and operations standpoint.

Anti Money Laundering (AML) – Industry Insights & Reference Architectures…

This blog has from time to time discussed issues around the defensive portion of financial services industry  (Banking, Payment Processing, and Insurance etc). Anti Money Laundering (AML) is a critical area where institutions need to protect themselves and their customers from malicious activity. This post summarizes eight key blogs on the topic of AML published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing complex AML projects.

Image Credit – FIBA Anti-Money Laundering Compliance Conference

Introduction

Money laundering has emerged as an umbrella crime which facilitates public corruption, drug trafficking, tax evasion, terrorism financing etc. Banks and other financial institutions are expected to conduct business in a manner that protects their countries of operations and consumers from security risks such as laundering, terrorist financing, and corruption (the ML/TF risks). Given the global reach of financial products, a variety of regulatory authorities is concerned about money laundering.  Technology has become key to meeting the regulatory expectations as well as reducing costs in these onerous programs. As the below graphic from PwC [1] demonstrates this is one of the most pressing issues facing financial services industry.

The above infographic from PwC provides a handy visual guide to the state of global AML programs.

The Six Critical Gaps in Global AML Programs…

From an industry standpoint, the highest priority issues that are being pointed out by regulators include the following –

  1. Institutions failing to develop AML frameworks that are unique to the risks run by organizations given their product and geographic mix
  2. Failure to develop real-time insights into business transactions and assigning them elevated risks based on their elements
  3. Developing AML models that draw from the widest possible sources of data – both internal and external – to understand a true picture of the business
  4. Demonstrating a consistent approach across geographies
  5. Leveraging the latest developments in analytics including Machine Learning to enable the automation of AML programs
  6. Lack of appropriate business governance & change management in setting, monitoring and managing AML compliance programs, policies and procedures

With this background in mind, the complete list of AML blogs on VamsiTalksTech is included below.

# 1 – Why Banks should Digitize their Operations and how this will help their AML programs –

Digitization implies a mix of business models predicated on agile systems, rapid & iterative development and more importantly – a Data First strategy. These have significant impacts on AML programs as well in addition to helping increase market share.

A Digital Bank is a Data Centric Bank..

# 2 – Why Data Silos are a huge challenge in many cross organization projects such as AML –

Organizational Data Silos inhibit the effectiveness of AML programs as compliance officers cannot gain a single view of a customer or single view of a suspicious transaction or view the social graph in critical areas such as trade finance. This blog discusses the Silo anti-pattern and ways to mitigate silos from proliferating.

Why Data Silos Are Your Biggest Source of Technical Debt..

# 3 – The Major Workstreams Around AML Programs

The headline is self-explanatory but we discuss the five major work streams on global AML projects – Customer Due Diligence, Entity Analysis, Downstream Analytics, Ongoing Monitoring and Investigation Lifecycle.

Deter Financial Crime by Creating an Effective Anti Money Laundering (AML) Program…(1/2)

# 4 – Predictive Analytics Across the AML workstreams –

Here we examine how Predictive Analytics can be applied across all of the five work streams.

How Big Data & Predictive Analytics transform AML Compliance in Banking & Payments..(2/2)

# 5 – The Business Need for Big Data in AML programs  –

This post discusses the most important developments in building AML systems using Big Data Technology-

Building AML Regulatory Platforms For The Big Data Era

# 6 – A Detailed Look at how Enterprises can use Big Data and Advanced Analytics to reduce AML costs –

How to leverage Big Data and Advanced Analytics to detect a range of suspicious transactions and actors.

Big Data – Banking’s New Weapon In War Against Financial Crime..(1/2)

# 7 – Reference Architecture for AML  –

We discuss a Big Data enabled Reference Architecture of an enterprise-wide AML program.

Big Data – Banking’s New Weapon In War Against Financial Crime..(2/2)

Conclusion

According to Pricewaterhouse Coopers, the estimates of global money laundering flows were between 2-5% of global GDP [1] in 2016 – however, only 1% of these transactions were caught. Certainly, the global financial industry has a long way to go before they effectively stop these nefarious actors but there should be no mistaking that technology is a huge part of the answer.

References –

  1. Pricewaterhouse Coopers 2016 AML Survey  http://www.pwc.com/gx/en/services/advisory/forensics/economic-crime-survey/anti-money-laundering.html

Why Linux Containers and Docker are the Runtime for the Software Defined Data Center (SDDC)..(4/7)

The third and previous blog in this seven part series (@ http://www.vamsitalkstech.com/?p=4659)  discussed Apache Mesos, a project that aims to abstract away various system resources – CPU, memory, network and disk resources to provide consuming digital applications with a giant cluster from which they can utilize capacity – a key requirement of the Software Defined Datacenter (SDDC). In this fourth blog, we will discuss another important ecosystem technology & project – Linux Containers and Docker – which forms the foundational runtime component in the SDDC. The next blog will discuss Kubernetes – Google’s container orchestration platform.

Much like shipping goods in Containers over Oceans, Linux Containers offer a portable, lightweight & convenient way to ship business applications. (Image Credit – WallPapers 13)

Executive Summary…

We can agree that the Digital application is inherently a distributed application. Such applications have historically been extremely hard to develop, setup and manage across a large fleet of data center servers that are a mix of platforms and technologies. Thus it is no surprise that one of the most disruptive developments in the last five years has been the innovation in the Linux container space. Containers now enable the running distributed applications at scale. 

Due to business reasons, Digital applications demand constant updates, changes and incremental revisions in response to changing customer needs. The Software Defined Datacenter (SDDC) thus needs a runtime paradigm that enables not just efficient hardware usage but also supports standardized application environments that are portable simplified and consistent across hybrid clouds and hypervisors.  Containers fill this need and are thus emerging to be the natural unit of deployment across the SDDC. Much has been written on the topic of Docker and Linux Container technology. My goal for this blog post is to distill key insights in the container ecosystem.

The Technologies of Linux Containers & Docker

Unlike Virtual Machines, Container Engines such as Docker share a common OS (Image Credit – MSFT Azure)

Linux Containers are alike and yet different from virtual machines. They are alike in the sense that each Container shares system resources on the underlying hardware platform – CPU, RAM, and Network – as with VMs. However, while each VM maintains its separate copy of the Operating System (OS), containers share the same OS kernel while keeping themselves separate from other containers running on the same OS.  How do they do that?

Though the terms ‘Docker’ and ‘Container’ have become almost synonymous – it needs to be noted that Docker is a company focused on developing technology enablement around containers in areas such as orchestration, networking, and management. Docker was an open source project (now renamed to Moby [1]) that provided capabilities such as a standard description of container formats, utilities for application packaging, deployment & lifecycle management of applications inside Linux Containers. It provides a Docker CLI command line tool for the lifecycle management of image-based containers.

Prior to the explosion of interest in Linux containers & the founding of Docker, traditional Linux distributions (with a minimum kernel level of 3.8) supported two foundational paradigms – control groups (cgroups) and kernel namespaces.  Linux containers use both these features to achieve their goal of isolation and portability. Cgroups enables the host to limit the resources each container process can use from a CPU, Memory, Filesystem, User ID components and Network standpoint. This ensures that containers running on a host cannot starve others of resources thus avoiding the “Noisy Neighbor” problem that bedeviled a lot of cloud deployments.

Kernel Namespaces ensure another kind of isolation for process interactions within the OS. Containers can only view and modify resources in the same namespace. This ensures a security mechanism where other containers and processes on the host cannot launch attacks on a given application running on a tenant container or on the host itself. Thus the combination of both these technologies ensures that multiple applications running within their individual containers can share CPU and Memory without needing the overhead of virtualization. Docker also grants each container its own networking implementation thus ensuring that resources such as socket and interfaces can also be protected.

Companies including Red Hat, IBM, Google, Cisco, VMware, and CoreOS have greatly aided with the development of and accessibility of containers in their platforms and products.

Layered Filesystems..

Various Image Layers in Docker. Each layer in the file system is mounted on the previous.The topmost is the Writable Container. (Image Credit – Docker)

We discussed how Container Images are Immutable. This is the key advantage of using container technology such as Docker & is made possible by the notion of a Union filesystem. What are Union filesystems and how do they enforce immutability? Much like the image in a Virtual Machine sense, Containers also run from an image, which typically are a snapshot of a filesystem but tend to be much smaller than VM images since the Container is installed on a host kernel.

Union filesystems are best described as a layered architecture – in that each layer is created independently and then added atop of the previous layer.  An example of a Union filesystem is a Linux Kernel – an OS – then a data base like Oracle – then Tomcat – and a web application on it. The top layer is always the Writable layer. The real advantage in using a union filesystem is that using these images becomes super efficient from a storage and execution standpoint. Union filesystems also help in sharing portions of the OS across containers. Simply put, an image contains everything an application needs – from it’s dependencies and external libraries. When an Image is run, it is called a Container. In the case of Docker, it uses a layered copy on write filesystem called AUFS (Another Union Filesystem).

Containers and Developers..

Containers are possibly the first infrastructure software category created by developers in mind. The prominence of Linux Containers has Docker coincided with the onset of agile development practices under the DevOps umbrella – CI/CD etc. Containers are an excellent choice to create agile delivery pipelines and continuous deployment. At their core, Containers enable the creation of multiple self-contained execution environments over the same operating system.

Developers are naturally excited about Linux Containers for five specific reasons –

  1. Containers allow for image consistency across OS environments. This is a huge help in accelerating the development process from development to debugging to production. Developers can just focus on building their applications (in dev environments that match the test and prod) and packaging them in containers. This just takes a lot of the inefficiency around environment dissimilarities out of the equation.
  2. Containers are treated as a standard linux process by the kernel & thus are orders of magnitude quicker from a startup time when compared to VMs. This means that developers can start their applications in a manner of seconds as long as they run them in a container.
  3. Containers also provide development organizations the ability to standardize application development workflows and update processes. This solves the scalability problem that digital applications have caused large organizations.
  4. Digital applications are leading the move to adopt microservices. Microservices offer a way to build applications as a collection of discrete services as opposed to a monolithic architecture. By there very nature, microservices can be built and managed by different teams. Containerization affords a lightweight way of building and deploying microservices.
  5. Containers offer a portable way of delivering applications (across Operating Systems) as well as provide horizontal scalability.

    Digital Application development using Containers..

Digital Application Development and Deployment Workflow using Containers.

There are a few key runtime components involved in operationalizing a small to medium to large scale container infrastructure as the above illustration depicts.

  1. Firstly, developers create container images. These images describe an application and it’s dependencies. An easy way to conceptualize an image is to think of it as a basic deployment template. Image are also immutable in that they are read only and any changes happen in the top most layer which is writable. Modifying an image is to create a new one. Images thus have a Parent Child relationship. Developers create images by building their applications on their developer environments, performing unit tests and then pushing to a repository. Once the container is built with the necessary dependencies, these tools run a battery of tests to validate business functionality. A large part of this process is usually best automated using CI/CD tools like Jenkins, CruiseControl or Buildbot etc.
  2. The built images are then made available in a Container Registry. This is either maintained internally or sourced from a trusted external source. As the name suggests, Registries maintain a catalog of container images of frequently used software – e.g. Custom applications and other software packages such as WordPress, Relational databases, Web Servers, Big Data technologies and Application Servers etc
  3. The next step is to create and deploy (runtime) containers from these images on a set of servers. Once images are released as a result of application development, sys admins work on the provisioning of the servers to run these images. Once a Container engine is installed on the server, images are loaded on and they take the runtime shape of containers. The mode of getting these images on these servers follows either a push/pull mechanism.
  4. Scheduling of containers on servers is also a process that usually done by Sys Admins. This involves running containers of certain kinds on servers that match up to certain CPU, I/O and Network capacity requirements
  5. To create complex real world deployments, not only do the servers and networking have to be created but these containers are also interconnected (e.g. a web server container to an application server) using Discovery mechanisms. These containers then need to also connect to a host of enterprise services. Customer traffic is then routed to the clustered containers running on these servers. Monitor the logs and performance of these containers and the microservices running on them.
  6. The process repeats from step #1 above.

Industry Adoption of Containers.

In a few years, containers will deliver the bulk of compute workloads across public cloud providers such as Amazon AWS, Google Compute Engine and Microsoft Azure. Given that the VM options on these clouds can run multiple containers which can scale on demand, the industry will begin to gravitate to higher utilization density. The SDDC has already begun incorporating hybrid architectures that run both containers and VMs in a complementary fashion.The Software Defined Datacenter will incorporate a hybrid model consisting of applications running on both Linux Containers and Virtual Machines.

Customers also have choices of traditional enterprise operating systems such as Red Hat Enterprise Linux or Microsoft Windows or can also run containers on OS’s developed for the purpose of hosting containers at hyper scale. These OS’s just provide tools to manage containers and nothing else. Examples include Red Hat Atomic Platform and CoreOS. Moving up the stack, pioneers such as Google and Red Hat have added core support for containers in projects such as OpenStack, Kubernetes, Mesos, OpenShift & CloudFoundry by helping with networking and persistent storage. Kubernetes (which we will cover in the next post) also handles provisioning on multiple public cloud platforms. Config Mgmt platforms such as Ansible, Chef and Puppet now support containerized deployments.

Technical Considerations for Container Adoption

Some key considerations that industry players are tackling from the standpoint of running containers at scale –

  1. Container Orchestration –  Organize groups of containers into compassable applications, scheduling them on servers that match their resource requirements, placement of containers based on network topology etc
  2. Container Networking – Containers follow a pluggable model and the network is no different. Key considerations – an enterprise network connectivity stack is needed to not only provide the interconnect between different containers but also to integrate them with existing Layer 2/3 networks. Additionally, network isolation needs to be provided for microservices running on these containers using either a dedicated IP address for each or an overlay network.
  3. Management and Monitoring -Life cycle processes ranging from Management and Monitoring encompass a range of questions – application patching with low downtime, graceful failures in cloud native applications, container scale up & scale down based on traffic patterns etc.

Containers and your Enterprise…

So what is the best way to adopt containers across a large enterprise?

  • Develop your container strategy in the context of the Nexus of Forces (i.e., information, mobile, social and cloud) initiatives in your organization — Containers are at the junction of these technologies.
  • Institute an organizational process to examine the business value of any initiative to adopt Containers. Understand what tools and platforms to adopt that will abstract away the complexities of using containers.
  • Understanding skills required to leverage containers. Containers are a new way for both developers and SysOps. Dependency management moves to the developers but they realize tremendous benefits in adopting these for high-velocity Digital applications
  • Identifying, measuring and benchmarking key success metrics that measure the ROI of the overall container investments.

Conclusion..

To sum up, the Linux (and Windows) container space is exploding both from a mindshare as well as an adoption standpoint. What is hugely encouraging is that a host of next generation platform technologies (ranging from IaaS to PaaS) are not just choosing to support containers as their basic runtime unit but are also focusing on becoming the defacto solution supporting a host of container ecosystem usecases – provisioning, orchestration, management, CI/CD et al. The next two blogs will respectively discuss how Google Kubernetes and Red Hat OpenShift overcome these challenges and abstract away much of the complexity around container deployments.

The next blog post in this series will discuss Google Kubernetes, the dominant project in the container orchestration space.

References

[1] Introducing Moby Project –  https://blog.docker.com/2017/04/introducing-the-moby-project/

Risk Management – Industry Insights & Reference Architectures…

Financial Risk Management as it pertains to different industries – Banking, Capital Markets and Insurance – has been one of the most discussed topics in this blog. The business issues and technology architecture of systems dedicated to aggregating, measuring & visualizing Risk are probably one of the more complex tasks in the worlds of finance & insurance. This post summarizes ten key blogs on the topic of Financial Risk published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing Risk projects.

Image Credit – ShutterStock

The twin effects of the global financial crisis & the FinTech boom has caused Financial Services, Insurance and allied companies to become laser focused on on risk management.  What was once a concern primarily of senior executives in the financial services sector has now become a top-management priority in nearly every industry.

Whatever be the kind of Risk, certain themes are common from a regulatory intention standpoint-

  1. Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
  2. Requiring that banks increase the amount of and quality of capital held on reserve to back their assets and by requiring higher liquidity positions
  3. Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
  4. Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
  5.  Tackle the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.

With this background in mind,  complete list of Risk use case blogs on VamsiTalksTech is included below.

# 1 – Why Banks and Other Financial Institutions Should Digitize Risk Management –

Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC) ; Offensive as in revenue producing areas of the business like Customer 360 (whether Institutional or Retail), Digital Marketing, Mobile Payments, Omni channel Wealth Management etc. If one really thinks about it – the biggest activity that banks do is manipulate and deal in information whether customer or transaction or general ledger etc.

Why Banks, Payment Providers and Insurers Should Digitize Their Risk Management..

# 2 – Case Study of a Big Data Enabled IT Architecture for Risk Data Measurement – Volcker Rule/Dodd Frank –

While industry analysts can discuss the implications of a certain Risk mandate, it is certainly most help for Business & IT audiences to find CIOs discussing overall strategy & specific technology tools. This blogpost discusses how two co-CIOs charged with an enterprise technology mandate are focused on growing and improving a Global Banking leaders internal systems, platforms and applications especially from a Risk standpoint.

How a Pioneering Bank leverages Hadoop for Enterprise Risk Data Aggregation & Reporting..



# 3 – A POV on Bank Stress Testing – CCAR and DFast

An indepth discussion of Bank Stress Testing from both a business and technology standpoint.

A POV on Bank Stress Testing – CCAR & DFAST..

# 4 – Capital Markets – Architectural Approaches to the practice of Risk Management

In Capital Markets, large infrastructures ,on a typical day, process millions of derivative trades. The main implication is that there are a large number of data inserts and updates to handle. Once the data is loaded into the infrastructure there needs to be complex mathematical calculations that need to be done in near real time to calculate intraday positions. Most banks use techniques like Monte Carlo modeling and other computational simulations to build & calculate these exposures. Hitherto, these techniques were extremely expensive from both the cost of hardware and software needed to run them. Neither were tools & projects available that supported a wide variety of data processing paradigms – batch, interactive, realtime and streaming. This post examines a detailed reference architecture applicable to areas such as Market, Credit & Liquidity Risk Measurement.

Big Data architectural approaches to Financial Risk Mgmt..

# 5 – Risk Management in the Insurance Industry  – Solvency II 

A discussion of Solvency II – the Insurance industry’s equivalent of Basel III – from both a business and technology standpoint.

Why the Insurance Industry Needs to Learn from Banking’s Risk Management Nightmares..

# 6 – FRTB (Fundamental Review of the Trading Book)

An in-depth business and technology discussion of the highlights and key implications of the FRTB (Fundamental Review of the Trading Book).

A POV on the FRTB (Fundamental Review of the Trading Book)…

# 7 – Architecture and Data Management Antipatterns

How not to architect Financial Service IT platforms using Risk Applications as an example.

The Five Deadly Sins of Financial Services IT..

# 8 – The Intelligent Banker Needs Better Risk Management –

The Intelligent Banker needs better Risk Management

# 9 – Implications of Basel III

This blogpost discusses the key implications of Basel III.

Towards better Risk Management..Basel III

#10 The Implications of BCBS 239

This blogpost discusses the data management and governance implications of BCBS 239. The BCBS 239 provides guidelines to overhaul an organization’s risk data aggregation capabilities and internal risk reporting practices.

BCBS 239 and the need for smart data management

Conclusion..

Industry clearly requires a fresh way of thinking about Risk Management. Leader firms will approach Risk as a way to create customer value and a board level conversation around such themes rather than as a purely defensive and regulatory challenge. Surely, this will mean that budgets for innovation related spending in areas such as Digital Transformation will also slowly percolate over to Risk. As firms either digitize or deal with gradually eroding market share, business systems that work with and leverage risk will emerge as a strong enterprise capability over the upcoming 3-5 year horizon.

Cybersecurity and the Next Generation Datacenter..(2/4)

The first blog of this four part series introduced a key business issue in the Digital Age – Cybersecurity. We also briefly touched upon responses that are being put in place by Corporate Boards. This part two focuses on technology strategies for enterprises to achieve resilience in the face of these attacks. The next post – part three – will focus on advances in Big Data Analytics that provide advanced security analytics capabilities. The final post of the series will focus on the business steps that Corporate boards, Executive & IT leadership need to adopt from a governance & strategy standpoint to protect & insulate their businesses from the constant firehose of cyber attacks.

Growing reliance on IT breeds Cyber Insecurity – 

The increased reliance on on information technology to help run businesses, their supply chains and consumer facing applications has led to a massive increase cyber risk. Given that most organizations are increasingly allowing employees to remotely access critical systems, the need to provide highly secure computing capabilities has become more pronounced. This IT infrastructure ranges from systems that store sensitive customer information, financial data etc has lead to an entire industry segment for network and computer security. This also has led to the creation of a burgeoning market of security experts across a range of cyber segments to tailor solutions to fit the operating requirements of respective client organizations.

 The core domains of Cyber Defense –

A fact of life facing the CIO & CISO (Chief Information Security Officer) or an IT manager is that every enterprise corporate datacenter is currently a mishmash of existing legacy technology investments. These range from antiquated proprietary software, some open source investments, proprietary server, rack & networking architectures. The people & software process piece is then added on all of these by incorporating custom architecture tools and governance processes.

Layered across & within these are the typical security tools, frameworks and approaches that are employed commonly.

The important functional areas of Cybersecurity are listed below –

  • Intrusion detection systems (IDS)
  • Firewalls
  • Application Security leveraging Cryptography
  • Data Security
  • System administration controls,
  • Server & Workstation security
  • Server Management Procedures (Patching, Updating etc),
  • Incident Response,
  • Data Protection
  • Identity and Access Management (IAAM) etc. These tools are also commonly extended to endpoint devices like laptops and mobile clients etc.

While these are all valid & necessary investments, security as a theme in IT is almost always an afterthought across the four primary technology domains in the datacenter – Infrastructure, Platforms, Data & Management.

Why is that?

From an IT philosophy & culture standpoint, security has historically been thought of as a Non Functional Requirement (or an “illity”) or a desirable or additive feature. As a result most high level executives as well as IT personnel & end users have come to regard security as a process of running through checklists by installing cumbersome client tools, malware scanners as well as conforming with periodic audits.

However recent hack attacks at major financial institutions as discussed in the first post in this series – http://www.vamsitalkstech.com/?p=1265, have brought to fore the need to view Cybersecurity & defense as an integral factor in the IT lifecycle. Not just an integral factor but a strategic component while building out applications & datacenter architectures that host them.

Datacenter complexity breeds Cyber Insecurity – 

Information Technology as an industry is really only 30+ years old as compared to architecture or banking or manufacturing or healthcare which have existed as codified bodies of knowledge for hundreds of years. Consequently the body of work on IT still evolves and continues to do so at a rapid clip. Over the last couple of decades, computing architectures have evolved from being purely mainframe based in the 1960s & 70s to a mishmash of few hundred Unix servers running the entire application portfolio of a major enterprise like a financial institution in the 1980s.

Fast forward to 2016, a typical Fortune 500 enterprise now runs multiple data centers with each hosting hundreds of thousands of Linux & Windows based servers either bare metal or virtualized, high end mainframes, legacy Unix systems etc. The sum total of the system images can run into tens of operating systems alone. When one factors in complex n-tier applications themselves along with packaged software (databases, application servers, message oriented middleware,business process management systems, ISV applications and systems utilities etc), the number of unique systems runs into 100,000 or more instances.

This complexity adds significantly to management tasks (maintaining, updating, patching servers) as well as the automation factor needed to derive business value at scale.

Security challenges thus rise manifold in the typical legacy technology dominated data center.

The top five day to day challenges from a security standpoint include –

  • Obtaining a ‘single pane of glass‘ view from a security standpoint across the zones in the infrastructure
  • Understanding and gaining visibility in realtime across this complex infrastructure
  • Staying ahead of rapidly moving exploits like the Heartbleed virus, the Shellshock bash vulnerability etc. The key point is that ensuring that all vulnerable systems are instantly patched
  • Understanding what platforms and systems are hosting applications that have been designated as “Non Compliant” for various reasons – legacy applications that are no longer maintained, out of support or unsupported software stacks which are way behind on patch levels etc
  • Proactively enforcing policies around security compliance and governance. These run the gamut from server patch policies, hardware configurations tailored to security zones e.g. a server with too many NICs in a DMZ or applications that did not have the correct & certified version of an application.

Datacenter Architecture built for Cybersecurity – 

Can there be a data center architecture that is optimized for Cybersecurity from the ground up?

I contend that at a high level, four critical tiers and planes underlie every typical corporate information technology architecture.

These are  –

  1. Infrastructure tier – where Storage, Compute & Network provisioning reside
  2. Data tier – the sum total of all data assets including OLTP systems, Big Data
  3. Application/Services tier – applications composed of services or microservices
  4. Management plane – which maintains the operator and admins view

Open source and Cybersecurity – Like two peas in a pod – 

Open source technology choices across the above layers provide the highest security benefits. Open source platforms are maintained by the highest number of varied contributors that removes the dependence on any one organization as a source of security support. The open development model ensures that hordes of developers – both corporate & hobbyists, agree on standards while constantly testing and improving platforms. For example, platforms ranging from Red Hat Linux to Open Middleware to Hadoop to OpenStack have received the highest security ratings in their respective categories. All of the above platforms have the highest release velocity, rate of product updates & security fixes.

There has been a perception across the industry that while open source frameworks and platforms are invaluable for developers, they are probably not a good fit for IT operations teams who need a mix of highly usable management consoles as well as scripting facilities & monitoring capabilities. However, open source projects have largely closed this feature gap and then some over the last five years. Robust and mature open source management platforms now span the gamut across all the above disciplines as enumerated below.

  • OS Management – Systems Management Consoles
  • Application Middleware – end to end application deployment, provisioning & monitoring toolsets
  • Big Data & Open Source RDBMS – Mature consoles for provisioning, managing, and monitoring clusters
  • Cloud Computing – Cloud Management Platforms

The below illustration captures these tiers along with specifics on security; lets examine each of the tiers starting from the lowest –

CyberSecurity_DC_Arc

         Illustration: Next generation Data Center with different technology layers 

Infrastructure Tier

The next generation way of architecting infrastructure is largely centered around Cloud Computing. A range of forward looking institutions are either deploying or testing cloud-based solutions that span the full range of cloud delivery models – whether private or public or a hybrid mode.

Security and transparency are best enabled by a cloud based infrastructure due to the below reasons.

  • highly standardized application & OS stacks with enables seamless patching across tiers
  • Workload isolation by leveraging virtualization or containers
  • Highest levels of deployment automation
  • The ability of cloud based stacks to scale up at an instant to handle massive amounts of streaming data

Cloud computing provides three main delivery models (IaaS, PaaS & SaaS).

  • IaaS (infrastructure-as-a-service) to provision compute, network & storage,
  • PaaS (platform-as-a-service) to develop applications &
  • exposing their business services as  SaaS (software-as-a-service) via APIs.

There are three broad options while choosing Cloud a based infrastructure –

  1. Leveraging a secure public cloud  (Amazon AWS or Microsoft Azure) or
  2. An internal private cloud (built on OpenStack etc)
  3. A combination of the two i.e a hybrid approach is a safe and sound bet for any new or greenfield applications.

In fact many vendors now offer cloud based security products which offer a range of services from malware detection to monitoring cloud based applications like Google’s suite of office applications, Salesforce etc.

Data Tier – 

While enterprise data tiers are usually composed of different technologies like RDBMS, EDW (Enterprise Data Warehouses), CMS (Content Management Systems) & Big Data etc. My recommendation for the target state is largely dominated by appropriate technologies for the appropriate usecase. For example a Big Data Platform powered by Hadoop is a great fit for data ingest, processing & long term storage. EDW’s shine at reporting use cases & RDBMS’s at online transaction processing. Document Management Systems are fantastic at providing business document storage, retrieval etc. All of these technologies can be secured for both data in motion and at rest.

Given the focus of the digital wave in leveraging algorithmic & predictive analytics capabilities in create tailored & managed consumer products  – Hadoop is a natural fit as it is fast emerging as the platform of choice for analytic applications.  

Big Data and Hadoop make security comparatively easy to bake in as compared to a silo’ed approach due to the below reasons –  

  1. Hadoop’s ability to ingest and work with all the above kinds of data & more (using the schema on read method) has been proven at massive scale. Operational data stores are being built on Hadoop at a fraction of the cost & effort involved with older types of data technology (RDBMS & EDW). Since the data is all available in one place, it makes it much more easier to perform data governance & auditing
  2. The ability to perform multiple types of security processing on a given data set. This processing varies across batch, streaming, in memory and realtime which greatly opens up the ability to create, test & deploy closed loop analytics quicker than ever before. In areas like security telemetry where streams of data are constantly being generated,  ingesting high volume data at high speeds and sending it to various processing applications for computation and analytics – is key
  3. The DAS (Direct Attached Storage) model that Hadoop provides fits neatly in with the horizontal scale out model that the services, UX and business process tier leverage in a cloud based architecture.
  4. The ability to retain data for long periods of time thus providing security oriented applications with predictive models that can reason on historical data
  5. Hadoop provides the ability to run a massive volumes of models in a very short amount of time helps with modeling automation

Techniques like Machine Learning, Data Science & AI feed into core business processes thus improving them. For instance, Machine Learning techniques support the creation of self improving algorithms which get better with data thus making accurate cyber security & other business predictions. Thus, the overarching goal of the analytics tier should be to support a higher degree of automation by working with the business process and the services tier. Predictive Analytics can be leveraged across the value chain of Cybersecurity & have begun to find increased rates of adoption with usecases ranging from behavior detection to telemetry data processing.

Services Tier

A highly scalable, open source & industry leading platform as a service (PaaS) is recommended as the way of building out and hosting this tier. A leading PaaS technology , (e.g. Red Hat’s OpenShift), is hardened constantly for process, network, and storage separation for each of the tenets running on a private or public cloud. In addition, there is focus on providing intrusion detection capabilities across files, ports & potential back doors etc.

An enterpise PaaS provides the right level of abstraction for both developers and deployers to encapsulate business functinlaity as microservices. This capability is provided via it’s native support for a linux container standard like Docker that can  be hosted on either bare metal or any virtualization platform. This also has the concomitant advantage of standardizing application stacks, streamlining deployment pipelines thus leading the charge to a DevOps style of building applications which can constantly protect against new security exploits. Microservices have moved from the webscale world to fast becoming the standard for building mission critical applications in many industries. Leveraging a PaaS such as OpenShift provides a way to help cut the “technical debt” [1] that has plagued both developers and IT Ops.

Further I recommend that service designers design their micro services so that they can be deployed in a SaaS paradigm – which usually implies taking an API based approach. APIs promotes security from the get-go due to their ability to expose business oriented functionality depending on the end users permission levels.

Further, APIs enable one to natively build or to integrate security features into the applications themselves  – via simple REST/SOAP calls. These include APIs for data encryption, throttling traffic from suspect consumers, systems behavior monitoring &  integration with Identity & Access Management Systems etc.

A DevOps oriented methodology is recommended in building applications in the following ways –

  • Ensuring that security tooling is incorporated into development environments
  • Leveraging resiliency & recoverability tools like ChaosMonkey (which is part of the Netflix Simian Army project) etc to constantly test systems for different kinds of vulnerabilities (e.g abnormal conditions, random errors, massive amounts of traffic etc) from Day 1
  • Promoting horizontal scaling and resilience by testing live application updates, rollbacks etc
  • Leveraging a Cloud, OS & development language agnostic style of application development

 

User Experience Tier – 

The UX (User Experience) tier fronts humans – clients. partners, regulators, management and other business users across all touch points. The UX tier interacts closely APIs  provided for partner applications and other non-human actors to interact with business service tier. Data and information security are key priorities at this layer offers secure connectivity to backend systems.  Data is transmitted over a secure pipe from device to backend systems across business applications.

The UX tier has the following global security responsibilities  – 

  1. Provide a consistent security across all channels (mobile, eBanking, tablet etc) in a way that is a seamless and non-siloed. The implication is that clients should be able to begin a business transaction in channel A and be able to continue them in channel B where it makes business sense where security is carried forth across both channels.
  2. Understand client personas and integrate with the business & predictive analytic tier in such a way that the UX is deeply integrated with the overall security architecture
  3. Provide advanced visualization (wireframes, process control, social media collaboration) that integrates with single sign on(SSO) & cross partner authentication
  4. The UX should also be designed is such a manner that it’s design, development & ongoing enhancement follow an Agile & DevOps methodology

The other recommendation for remote clients is to leverage desktop virtualization. In this model, the user essentially uses a device (a laptop or a terminal or a smartphone) that performs zero processing in that it just displays a user interface or application (ranging from the simple to the complex – a financial application, or office tools, document management user interface etc) delivered from a secure server over a secure connection. These clients known as zero clients are highly secure as they run a golden uncompromisable image run from a highly protected central server. These also have a smaller attack surface.

How to embed Cybersecurity into the infrastructure –  

How do all of the above foundational technologies (Big Data, UX,Cloud, BPM & Predictive Analytics) help encourage a virtuous cycle?

This cycle needs to be accelerated helping the creation of a learning organization which can outlast competition by means of a culture of unafraid experimentation and innovation.

  1.  The Architecture shall support small, incremental changes to business services & data elements based on changing business requirements which include Cybersecurity
  2. The Architecture shall support standardization across application stacks, toolsets for development & data technology to a high degree
  3. The Architecture shall support the creation of a user interface that is highly visual and feature rich from a content standpoint when accessed across any device
  4. The Architecture shall support an API based model to invoke any interaction – by a client or an advisor or a business partner
  5. The Architecture shall support the development and deployment of an application that encourages a DevOps based approach
  6. The Architecture shall support the easy creation of scalable business processes that natively emit security metrics from the time they’re instantiated to throughout their lifecycle

Summary

My goal in this post was to convince enterprise practitioners to shed their conservatism in adopting new approaches in building out applications & data center architectures. The inherent advantage in using Cloud, Big Data & Realtime analytics is that security can been intrinsically built into infrastructure.

This post makes no apologies about being forward looking. Fresh challenges call for fresh approaches and a new mindset.

References

[1] https://en.wikipedia.org/wiki/Technical_debt

[2] http://venturebeat.com/2014/01/30/top-ten-saas-security-tools/

Hadoop counters Credit Card Fraud..(2/3)

This article is the second installment in a three part series that covers one of the most critical issues facing the financial industry – Payment Card Fraud. While the first (and previous) post discussed the global scope of the problem & the business ramifications –  this post will discuss a candidate Big Data Architecture that can help financial institutions turn the tables on Fraudster Networks. The final post will cover the evolving business landscape in this sector – in the context of disruptive technology innovation (predictive & streaming analytics) and will make specific recommendations from a thought leadership standpoint.

Traditional Approach to Fraud Monitoring & Detection – 

Traditional Fraud detection systems have been focused on looking for factors such as known bad IP addresses or unusual login times based on Business Rules and Events. Advanced fraud detection systems augment the above approach with building models of customer behavior at the macro level. Then they would use these models to detect anomalous transactions and flag them as potentially being fraudulent. However, the scammers have also learnt to stay ahead of the scammed and are leveraging computing advances to come up with ever new ways of cheating the banks.

Case in point [1] –

In 2008 and 2009, PayPal tested several fraud detection packages, finding that none could provide correct analysis fast enough, Dr. Wang (head of Fraud Risk Sciences – PayPal) said. She declined to name the packages but said that the sheer amount of data PayPal must analyze slowed those systems down.

Why Big Data and Hadoop for Fraud Detection?

Big Data is dramatically changing that approach with advanced analytic solutions that are powerful and fast enough to detect fraud in real time but also build models based on historical data (and deep learning) to proactively identify risks.

The business reasons why Hadoop is emerging as the best choice for fraud detection are –

  1. Real time insights –  Hadoop can be used to generate insights at a latency of a few milliseconds  that can assist Banks in detecting fraud as soon as it happens
  2. A Single View of Customer/Transaction & Fraud enabled by Hadoop
  3. Loosely coupled yet Cloud Ready Architecture
  4. Highly Scalable yet Cost effective

The technology reasons why Hadoop is emerging as the best choice for fraud detection are:

  1. Hadoop (Gen 2) is not just a data processing platform. It has multiple personas – a real time, streaming data, interactive platform for any kind of data processing (batch, analytical, in memory & graph based) along with search, messaging & governance capabilities built in – all of which support fraud detection architecture patterns
  2. Hadoop provides not just massive data storage capabilities but also provides multiple frameworks to process the data resulting in response times of milliseconds with the outmost reliability whether that be realtime data or historical processing of backend data
  3. Hadoop can ingest billions of events at scale thus supporting the most mission critical analytics irrespective of data size
  4. From a component perspective Hadoop supports multiple ways of running models and algorithms that are used to find patterns of fraud and anomalies in the data to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java,Python,R), Storm etc and SAS to name a few – to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop
  5. Hadoop is not all about highly scalable filesystems and processing engines. It also provides native integration with highly scalable NoSQL options including a database called HBase. HBase has been proven to support near real-time ingest of billions of data streams. HBase provides near real-time, random read and write access to tables containing billions of rows and millions of columns

Again, from [1] –

PayPal processes more than 1.1 petabytes of data for 169 million active customer accounts, according to James Barrese, PayPal’s chief technology officer. During customer transactions, subsets of the data are analyzed in real-time.

Since 2009, PayPal has been building and modifying its fraud analytics systems, incorporating new open-source technologies as they have evolved. For example, the company uses Hadoop to store data, and related analytics tools, such as the Kraken. A data warehouse from Teradata Corp. stores structured data. The fraud analysis systems run on both grid and cloud computing infrastructures.

Several kinds of algorithms analyze thousands of data points in real-time, such as IP address, buying history, recent activity at the merchant’s site or at PayPal’s site and information stored in cookies. Results are compared with external data from identity authentication providers. Each transaction is scored for likely fraud, with suspicious activity flagged for further automated and human scrutiny, Mr. Barrese said.

After implementing multiple large real time data processing applications using Big Data related technologies in financial services, we present a proven architectural pattern & technology stack that has been proven in very large production deployments. The key goal is to process millions of events per second, tens of billions of events per day and tens of terabytes of financial data per day – as is to be expected in a large Payment Processor or Bank.

Business Requirements

  1. Ingest (& cleanse) real time Card usage data to get complete view of every transaction with a view to detecting potential fraud
  2. Support multiple ways of ingest across a pub-sub messaging paradigm,clickstreams, logfile aggregation and batch data – at a minimum
  3. Allow business users to specify 1000’s of rules that signal fraud e.g. when the same credit card is used from multiple IP addresses within a very short span of time
  4. Support batch oriented analytics that provide predictive and historical models of performance
  5. As much as possible, eliminate false positives as these cause inconvenience to customers and also inhibit transaction volumes
  6. Support a very high degree of scalability – 10’s of millions of transactions a day and 100’s of TB of historical information
  7. Predict cardholder behavior (using a 360 degree view) to provide better customer service
  8. Help target customer transactions for personalized communications on transactions that raise security flags
  9. Deliver alerts the ways customers want — web, text, email and mail etc
  10. Track these events end to end from a strategic perspective across dashboards and predictive models
  11. Help provide a complete picture of high value customers to help drive loyalty programs

Design and Architecture

The architecture thus needs to consider two broad data paradigms — data in motion and data at rest.

Data in motion is defined as streaming data that is being sent into an information architecture in real time. Examples of data in motion include credit card swipes, e-commerce tickets, web-based interactions and social media feeds that are a result of purchases or feedback about services. The challenge in this area is to assimilate a huge volume of data and filter it, gather reason from it and to send it to downstream systems such as a business process management (BPM) or a Partner System(s). Managing the event data to make sure changing business rules/regulations are consistently integrated with the data is another key facet in this area.

Data at rest is defined as data that has been collected and ingested in a form that conforms to enterprise data architecture and governance specifications. This data needs to be assimilated or federated with pre-existing sources so that the business can query it in a read/write manner from a strategic and long-term perspective.

A Quick Note on Data Science and it’s applicability to Fraud Monitoring & Detection – 

Various posts in this blog have discussed the augmented capability of financial organizations to acquire, store and process large volumes of data using commodity (x86) hardware.  At the same time, technologies such as Hadoop and Spark have enabled the collection, organization and analysis of Big Data at scale. The convergence of cost effective storage and scalable processing allows us to extract richer insights from data. These insights can then be operationalized to provide commercial and social value.   Data science is a term that refers to the process of extracting meaningful insights from large volumes of structured and unstructured data. Data science is about scientific exploration of data to extract meaning or insight, and the construction of software systems to utilize such insights in a business context.   This involves the art of discovering data insights combined with the science of operationalizing them.  A data scientist uses a combination of machine learning, statistics, visualization, and computer science to extract valuable business insights hiding in data and builds operational systems to deliver that value. Data Science based approaches are core to the design and architecture of a Fraud Detection System. Data Mining techniques range from clustering and classification to find patterns and associations among a large group of data. The machine learning components are classified into two categories: ‘supervised’ and ‘unsupervised’ learning. These methods seek for accounts, customers, suppliers, etc. that behave ‘unusually’ in order to output suspicion scores, rules or visual anomalies, depending on the method. (Ref – Wikipedia).

It needs to be kept in mind that Data science is a cross-functional discipline. A data scientist is part statistician, part developer and part business strategist. The Data Science team collaborates with an extended umbrella team which includes visualization specialists, developers, business analysts, data engineers, applied scientists, architects, LOB owners and DevOps (ref – Hortonworks). The success of data science projects often relies on the communication, collaboration, and interaction that takes place with the extended team, both internally and possibly externally to their organizations.

Reference Architecture

FP1 

Illustration 1:  Candidate Architecture Pattern for a Fraud Detection Application 

 

The key technology components of the above reference architecture stack include:

  1. Information sources are depicted at the left. These encompass a variety of machine and human actors either transmitting potentially thousands of real time messages per second. These are your typical Credit Card Swipes, Online transactions, Fraud databases and other core Banking data.
  2. A highly scalable messaging system to help bring these feeds into the architecture as well as normalize them and send them in for further processing. Apache Kafka is chosen for this tier.Realtime data is published by Payment Processing systems over Kafka queues. Each of the transactions has 100s of attributes that can be analyzed in real time to  detect patterns of usage.  We leverage Kafka integration with Apache Storm to read one value at a time and perform some kind of storage like persist the data into a HBase cluster.In a modern data architecture built on Apache Hadoop, Kafka ( a fast, scalable and durable message broker) works in combination with Storm, HBase (and Spark) for real-time analysis and rendering of streaming data. Kafka has been used to message geospatial data from a fleet of long-haul trucks to financial data to sensor data from HVAC systems in office buildings.
  3. A Complex Event Processing tier that can process these feeds at scale to understand relationships among them; where the relationships among these events are defined by business owners in a non technical or by developers in a technical language. Apache Storm integrates with Kafka to process incoming data. Storm architecture is covered briefly in the below section.
  4. Once the machine learning models are defined, incoming data received from the Storm/Spark tier will be ingested into the models to predict outlier transactions or potential fraud. As a result of specific patterns being met that indicate potential fraud, business process workflows are created that follow a well defined process that is predefined and modeled by the business.
    • Credit card transaction data comes as stream (typically through Kafka)
    • An external system has information about the credit card holder’s recent location (collected from GPS on mobile device and/or from mobile towers)
    • Each credit card transaction is looked up against user’s current location
    • If the geographic distance between the credit card transaction location and user’s recent known location is significant (say 100 miles), the credit card transaction is flagged as potential fraudScreen Shot 2015-10-27 at 9.52.34 PM

Illustration 2 :External Lookup Pattern for a Fraud Detection Application (Sheetal Dolas – Hortonworks)

  1. Data that has business relevance and needs to be kept for offline or batch processing can be handled using the  storage platform based on Hadoop Distributed Filesystem (HDFS). The idea to deploy Hadoop oriented workloads (MapReduce, or, Machine Learning) to understand fraud patterns as they occur over a period of time.Historical data can be fed into Machine Learning models created in Step 1 and commingled with streaming data as discussed in step 2.
  2. Horizontal scale-out is preferred as a deployment approach as this helps the architecture scale linearly as the loads placed on the system increase over time
  3. Output data elements can be written out to HDFS, and managed by HBase. From here, reports and visualizations can easily be constructed.
  4. One can optionally layer in search and/or workflow engines to present the right data to the right business user at the right time.  


Messaging Broker Tier

The messaging broker tier (based on Apache Kafka) is the first point of entry in a system. It fundamentally hosts a set of message queues. The broker tier needs to be highly scalable while supporting a variety of cross language clients and protocols from Java, C, C++, C#, Ruby, Perl, Python and PHP. Using various messaging patterns to support real-time messaging, this tier integrates application, endpoints and devices quickly and efficiently. The architecture of this tier needs to be flexible so as to allow it to be deployed in various configurations to connect to customized solutions at every endpoint, payment outlet, partner, or device.

Pipeline

Illustration 3: Multistage Data Refinery Pipeline for a Fraud Detection Application

Apache Storm is an Open Source distributed, reliable, fault – tolerant system for real time processing of large volume of data. Spout and Bolt are the two main components in Storm, which work together to process streams of data.

  • Spout: Works on the source of data streams. In this use case, Spout will read realtime transaction data from Kafka topics.
  • Bolt: Spout passes streams of data to Bolt which processes and passes it to either a data store or another Bolt.

Storm-Kafka

                                                        Illustration 3:  Kafka-Storm integration

Storage Tier

There are broad needs for two distinct data tiers that can be identified based on business requirements.

  1. Some data needs to be pulled in near realtime, accessed in a low latency pattern as well as have calculations performed on this data. The design principle here needs to be “Write Many and Read Many” with an ability to scale out tiers of servers
  2. In memory technology based on Spark is very suitable for this use case as it not only supports a very high write rate but also gives users the ability to store, access, modify and transfer extremely large amounts of distributed data. A key advantage here is that Hadoop based architectures can pool memory and can scaleout across a cluster of servers in a horizontal manner. Further, computation can be pushed into the tiers of servers running the datagrid as opposed to pulling data into the computation tier.
  3. As the data volumes increase in size, compute can scale linearly to accommodate them. The standard means of doing so is through techniques such as data distribution and replication. Replicas are nothing but copies of the same segment or piece of data that are stored across (aka distributed) a cluster of servers for purposes of fault tolerance and speedy access. Smart clients can retrieve data from a subset of servers by understanding the topology of the grid. This speeds up query performance for tools like business intelligence dashboards and web portals that serve the business community.
  4. The second data access pattern that needs to be supported is storage for data that is older. This is typically large scale historical data. The primary data access principle here is “Write Once, Read Many.” This layer contains the immutable, constantly growing master dataset stored on a distributed file system like HDFS. Besides being a storage mechanism, the data stored in this layer can be formatted in a manner suitable for consumption from any tool within the Apache Hadoop ecosystem like Hive or Pig or Mahout.

The final word [1] – 

Since 2009, PayPal has been building and modifying its fraud analytics systems, incorporating new open-source technologies as they have evolved. For example, the company uses Hadoop to store data, and related analytics tools, such as the Kraken. A data warehouse from Teradata Corp. stores structured data. The fraud analysis systems run on both grid and cloud computing infrastructures.

Several kinds of algorithms analyze thousands of data points in real-time, such as IP address, buying history, recent activity at the merchant’s site or at PayPal’s site and information stored in cookies. Results are compared with external data from identity authentication providers. Each transaction is scored for likely fraud, with suspicious activity flagged for further automated and human scrutiny, Mr. Barrese said.

For example, “a very bad sign” is when one account shows IP addresses from 10 parts of the world, Dr. Wang said, because it suggests the account might have been hacked.

The system tags the account for review by human experts, she said. “They might discover that the IP addresses are at airports and this guy is a pilot,” she said. Once verified, that intelligence is fed back into PayPal’s systems. Humans don’t make the system faster, but they make real-time decisions as a check against, and supplement to, the algorithms, she said.

The combination of open-source technology, online caching, algorithms and “human detectives,” she said, “gives us the best analytical advantage.”

References – 

[1] “PayPal fights Fraud With Machine Learning and Human Detectives” – From WSJ.com

http://blogs.wsj.com/cio/2015/08/25/paypal-fights-fraud-with-machine-learning-and-human-detectives/