The Emerging Role for Big Data and Machine Learning on the Buy Side in Financial Services..

The Buy Side is perhaps the biggest segment of Wall Street and the financial markets: there are more than 7,000 mutual funds and thousands of hedge funds, which invest across more than 40,000 instruments (stocks, bonds and other securities). Thus, one of the most important business functions at Buy Side institutions such as Mutual Funds, Hedge Funds, Trusts, Asset Managers, Pension Funds and Private Equity firms is to constantly analyze a range of information about the companies underlying these instruments to determine their investment worthiness.

The Changing Nature of the Buy Side circa 2018…

Compared with the rest of the financial services industry, the investment and asset management sector has lagged behind on many of the business and technology shifts of recent decades, as we have cataloged in the below series of blogs.

The State of Global Wealth Management..(1/3)

Given the competitive nature of the market, commodified investment strategies will need to change rapidly to incorporate more advanced technology into the decision-making process. Combined with substandard performance over the last couple of years in a crucial sector of the Buy Side, Hedge Funds, this creates a pressing need for innovative approaches to enhancing Alpha.

This is even more important in this age of real-time information. Market trends, shifts in sentiment, operational risk issues and negative news seem to crop up virtually every day.

All of the above information sources can dramatically change the quality of an underlying financial instrument. At some point, a human portfolio manager can no longer keep up with the information onslaught, which calls for techniques built around advanced intelligence and automation.

In this blog post, I will discuss key recommendations across the spectrum of Big Data and Artificial Intelligence techniques to help store, process and analyze hundreds of data points across a universe of millions of potential investments.

Recommendation #1 Focus on Non-Traditional Datasets…

Traditional investment management has tended to focus on financial analysis, that is, rigorous fundamental analysis of the investment worthiness of the underlying company. At larger Buy Side firms, especially the big mutual funds, tens of Portfolio Managers and Analysts constantly analyze a range of data, both quantitative (e.g. financial statements such as balance sheets and cash flow statements) and qualitative (e.g. industry trends, supply chain information etc.). The trend analysis is typically broken up into three broad areas: Momentum, Value (relative to other players in the same segment) and Future Profitability. It is also not uncommon for large mutual funds to add and remove companies from their investment portfolio constantly, almost on a weekly basis. I propose that firms expand the underlying data beyond the traditional sources identified above to include some of the newer kinds depicted in the below illustration. The information asymmetry advantage conferred by using a wider range of data sources has the potential to produce outsize investment performance.

Recommendation #2 Leverage advances in Big Data storage and processing…

We now move into the technology. First, a range of non-traditional data has to be identified and then ingested into a set of commodity servers, either in an on-premise data center or at a cloud provider such as Amazon Web Services (AWS) or Microsoft Azure. It then needs to be curated by applying business-level processing. This could include identifying businesses using fundamental analysis, or applying algorithms that spot patterns in the data that pertain to attractiveness based on certain trending themes.

As the below table captures, the advent of Big Data collection, storage, and processing techniques now enables a range of information-led capabilities that were simply not possible with older technology.

All of these non-traditional data streams shown above and depicted below can be stored on commodity hardware clusters. This can be done at a fraction of the cost of traditional SAN storage. The combined data can then be analyzed effectively in near real-time thus providing support for advanced business capabilities.

Driver | Business Value | Example
Data Volumes | Larger data sets allow analysts to query and conduct experiments with fewer iterations to understand which businesses fit certain investment strength criteria | Omnichannel data, customer engagement data, ticker data, pricing data, sales volumes across longer time horizons. Social media and third-party datasets etc.
Data Variety | New data types spanning text, images, time series data and video | Business process data, audio data, images, sensor & device data. Publicly available statements and OTC contracts
Analytics and visualization | More powerful analytics and visualization tools to explain and explore investment themes and patterns | Complex Event Processing (CEP), predictive analytics. Portfolio and risk management dashboards
Data Velocity | Open source software tools. Lower server and enterprise storage costs | Hadoop, NoSQL. Commodity hardware. Elastic compute capacity.

Recommendation #3 For certain key areas of the process, especially Portfolio Backtesting and Risk Management, adopt Parallel Processing Techniques…

We have covered how the rapidly flowing information across markets creates opportunities for buy-side firms that can exploit this data. In this context, a key capability is to backtest key algorithmic strategies against years' worth of historical data. These strategies can range from deciding when to trade away exposure to capital optimization. The scale of the analysis problem is immense, with tens of thousands of investment prospects (read: companies) operating in 30+ countries across 6 continents. Every time an algorithm is tweaked, extensive backtesting must be performed on a few quarters or years of historical data to assess its performance.

Big Data has a huge inherent architectural advantage here: because it minimizes data movement and brings the processing to the data, it can cut the time taken to run these kinds of backtesting and risk analyses across terabytes of data to hours, as opposed to the day or two taken by older technology.
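
The idea of fanning a backtest out across many instruments can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a production backtester: the moving-average crossover rule, the synthetic random-walk prices, and every name used (synthetic_prices, backtest_sma, run_backtests) are hypothetical and chosen only for illustration. A real deployment would run on a cluster framework next to the stored data rather than in a local thread pool.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def synthetic_prices(ticker, days=500):
    """Generate a reproducible random-walk price series (a stand-in for real history)."""
    rng = random.Random(ticker)  # seeding by ticker makes reruns identical
    price, series = 100.0, []
    for _ in range(days):
        price *= 1.0 + rng.gauss(0.0003, 0.01)
        series.append(price)
    return series


def backtest_sma(prices, fast=10, slow=50):
    """Return the cumulative return of a long-only moving-average crossover rule."""
    equity = 1.0
    for t in range(slow, len(prices) - 1):
        fast_avg = sum(prices[t - fast + 1:t + 1]) / fast
        slow_avg = sum(prices[t - slow + 1:t + 1]) / slow
        if fast_avg > slow_avg:  # signal uses data up to day t, return is earned on day t+1
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def run_one(ticker):
    """Backtest a single instrument; needs no data from any other instrument."""
    return ticker, backtest_sma(synthetic_prices(ticker))


def run_backtests(tickers, workers=4):
    """Run the independent per-ticker backtests concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, tickers))


results = run_backtests(["AAA", "BBB", "CCC", "DDD"])
for ticker, ret in sorted(results.items()):
    print(f"{ticker}: {ret:+.2%}")
```

The point of the sketch is the shape of the work: each instrument is an independent task, so the job scales out by adding workers. In CPython, threads do not speed up CPU-bound loops like this one, so a process pool or a distributed engine would be the natural substitute for real workloads.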

Recommendation #4 Adopt & Accelerate Adoption of Machine Learning Techniques…

Given that the process of investment research is rapidly becoming a data and analytics challenge, what are the new techniques in the analytics space that can help?

Big Data & Advanced Analytics drive profits in Financial Services..(1/3)

  • Classification & Class Probability Estimation – For a given set of data, predict for each individual in a population which of a discrete set of classes that individual belongs to. An example classification is: “For all wealth management clients in a given population, who are most likely to respond to an offer to move to a higher segment?” Common techniques used in classification include decision trees, Bayesian models, k-nearest neighbors, induction rules etc. Class Probability Estimation (CPE) is a closely related concept in which a scoring model is created to predict the likelihood that an individual belongs to a given class. Buy Side firms can employ classical machine learning techniques such as clustering, segmentation, and classification to create models that automatically segment investment prospects into key categories. These could be based on certain key investment criteria or factors.
  • Testing investment hypotheses by understanding hitherto hidden relationships among the underlying data
  • Constantly learning from the underlying data and then ranking companies based on investment metrics and criteria
  • Adopting Natural Language Processing (NLP) techniques to read and analyze thousands of text documents such as regulatory filings, research reports etc. A key use case is to understand what kinds of geopolitical events can cause movements in location-sensitive instruments such as heavy metals and commodities such as oil. This is very important as markets move in concert. Such events can be analyzed on the fly to rebalance not just exposures but also client portfolios. The use cases for NLP are myriad.
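
As a concrete illustration of classification and class probability estimation, here is a minimal k-nearest-neighbors sketch in plain Python. The two features (a momentum score and a value score), the training points, and the labels "buy" and "pass" are all made up for illustration and do not come from any real dataset.

```python
import math
from collections import Counter

# Hypothetical training set: (momentum, value_score) -> label.
TRAINING = [
    ((0.9, 0.8), "buy"), ((0.8, 0.7), "buy"), ((0.7, 0.9), "buy"),
    ((0.2, 0.3), "pass"), ((0.1, 0.2), "pass"), ((0.3, 0.1), "pass"),
]


def class_probabilities(point, training=TRAINING, k=3):
    """Estimate P(class | point) as each class's share of the k nearest neighbors."""
    nearest = sorted(training, key=lambda row: math.dist(point, row[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}


def classify(point, training=TRAINING, k=3):
    """Pick the most probable class for the point."""
    probs = class_probabilities(point, training, k)
    return max(probs, key=probs.get)


print(class_probabilities((0.85, 0.75)))
print(classify((0.15, 0.2)))
```

The same function serves both purposes described above: classify returns a discrete label, while class_probabilities returns the score that a ranking or segmentation step could consume.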

Recommendation #5 Leverage Partnerships…

We are aware of the fact that the above investments in technology may be a huge ask of small and mid-level buy-side firms which have viewed technology as a supporting function. However, there now exist service providers that provide the infrastructure, curated data feeds, and custom analytics as a SaaS (Software-as-a-service) to interested clients. Let not size and potential upfront CapEx investment deter these firms from driving their investment methodology to a data-driven process.

Recommendation #6 Increase Automation via Analytics but Human still stays in the loop…

None of the above technology recommendations is intended to displace a portfolio manager who has years of rich industry experience and expertise. The above technology stack can enable these expensive resources to focus their valuable time on activities that add meaningful business value, e.g. interviewing key investment prospects, real-time analysis and portfolio rebalancing, trade execution, and management/strategic reporting. Technology is just an aid in that sense and serves as an assistant to the portfolio manager.


Leading actively managed funds are all about selection, allocation and risk/return assessments. The business goal is to ultimately generate insights that can drive higher investment returns or to shield from investment risk. As Buy Side firms across the board evolve in 2018, one of the key themes from a business and technological standpoint is leveraging AI & Big Data technologies to transform their internal research process from a resource-intensive process to a data-driven investment process.

OpenShift v3: PaaS for the Software Defined Data Center ..(6/7)

Why OpenShift…

The container and container orchestration landscape comprises quite a few players and projects jockeying for market share. However, what makes OpenShift a unique platform is how it helps enterprises surmount these challenges in five different areas:

  1. Firstly, the provenance of OpenShift is RHEL (Red Hat Enterprise Linux), possibly the industry’s dominant Operating System. Linux is indeed the foundation of containers especially with work done around cgroups, kernel namespaces and SELinux.  OpenShift v3 is a result of about five years of extensive engineering work and learning from live customers which makes it a highly robust platform.
  2. Secondly, by leveraging Red Hat’s JBoss middleware portfolio, OpenShift offers a multifaceted PaaS for any kind of application architecture, spanning application servers, messaging brokers and lightweight integration platforms among others. OpenShift is also mindful of supporting legacy applications that are stateful in nature.
  3. As we will see, OpenShift largely abstracts operations teams from the complexity involved in deploying & managing containerized workloads at scale.
  4. OpenShift is a true container platform, which means that containers are the basic dev, build and runtime units. The platform is made to natively support containers across the build, deploy and manage continuum. Accordingly, it also provides developers with an integrated toolchain to develop containerized applications.
  5. By leveraging best of breed technologies such as Kubernetes & Docker, OpenShift avoids reinventing the wheel. Using both these foundational blocks, containerized applications deployed in OpenShift can be designed to be high-performance, fault-tolerant & provide a high degree of scalability.

With OpenShift v3, Red Hat offers a few groundbreaking architectural improvements over the older v2.

#1 OpenShift – A Container Management Platform

As mentioned above, OpenShift v3 (OpenShift Enterprise, or OSE) is based on Linux container technology and leverages Docker containers as the standard runtime artifact. Accordingly, everything in OSE v3 runs in a Docker container, in terms of how applications are built, deployed, exposed to developers and orchestrated by administrators on the underlying hardware. Docker is the runtime and packaging standard with OSE. For those new to containers, a reread of my Docker post below is highly recommended. Red Hat also provides a default Docker Registry called the Atomic Registry, with a full-fledged UI, that gets installed with OSE by default. This is a Certified Docker Registry which provides secure images for a range of Open Source technologies.

Why Linux Containers and Docker are the Runtime for the Software Defined Data Center (SDDC)..(4/7)

#2 OpenShift – Container Orchestration with Kubernetes –

Once Docker Engine is used to provide formatted application images, OSE uses Kubernetes primitives to provision and deploy the overall cluster and to orchestrate deployments across multiple hosts.

Kubernetes is the container orchestration engine & cluster services manager of the PaaS. Red Hat has been making significant improvements to underlying services such as networking and storage. Open vSwitch is used for underlying networking and a Docker Registry is added by the OSE team. 

Kubernetes – Container Orchestration for the Software Defined Data Center (SDDC)..(5/7)

#3 OpenShift- Enable Easier Use of Containers across the DevOps Application lifecycle.

OpenShift provides a range of capabilities and tools that enable developers to perform source code management, build and deploy processes.

OpenShift provides facilities for Continuous Integration (CI). It does this in several ways. Firstly, code from multiple team members is checked in (by pushing code and merging pull requests) to a common source control repository based on Git. This supports constant check-ins, and automated checks/gates are added to run various kinds of tests. OpenShift includes a Git workflow where a push event causes a Docker image build.

OpenShift can then automate all steps required to promote the work product from a CI standpoint to delivery using CD. These involve automated testing, code dependency checks etc and promoting images from one environment to the other. OpenShift provides a web console, command line tools & an Eclipse-based IDE plugin.

For management, Red Hat provides ManageIQ/CloudForms for management of OSE clusters. This tool does an excellent job of showing what containers are running on the platform, which hosts they’re running on, and a range of usage statistics such as the memory and CPU footprint on each node.

OpenShift Architecture…

OpenShift v3 (OSE) architecture mirrors underlying Kubernetes and Docker concepts as faithfully as it can. Accordingly, it has a Master-Slave architecture as depicted below.

OpenShift Architecture (courtesy – Red Hat documentation [1])
The following other key concepts are important to note –

  1. OpenShift is an application platform targeted towards both developers and cloud administrators who are developing and deploying Cloud Native microservices based applications.
  2. The OpenShift Master is the control plane of the overall architecture. It is responsible for scheduling deployments, acting as the gateway for the API, and for overall cluster management. As depicted in the below illustration, it consists of several components: the Master API, which handles client authentication and authorization; the Controller, which hosts the scheduler and the replication controller; and the Client tools. It’s important to note that the management functionality only accesses the master to initiate changes in the cluster and does not access the nodes directly. The Master node also runs several processes that manage the cluster, including the etcd datastore, which stores all state data.
  3. The second primitive in the architecture is the concept of a Node. A node refers to a host which may be virtual or physical. The node is the worker in the architecture and runs application stack components on what are called Pods.
  4. The basic application runtime unit is a Docker container (Docker provides container management and packaging), and OSE uses this paradigm across the lifecycle functions of container-based application development and packaging: build, provision, schedule, orchestrate and manage.
  5. All of the container & cluster management functionality is provided by Kubernetes, with some changes made by Red Hat, primarily to persistent storage and networking capabilities. Open vSwitch (OVS) provides the SDN networking implementation for communication between Pods and Services. We will discuss networking in a section below.
  6. OpenShift software installed on a RHEL 7/RHEL Atomic server (supports OSTree based updates) gives the host an OSE node personality – either a master or a slave. Instantiating a Docker image causes an application to be deployed in a container.
  7. Containers within OSE are grouped into Kubernetes Pods. Pods wrap around a group of Containers, thus application containers run inside Kubernetes pods. Kubernetes Pods are a first-class citizen in an OSE architecture.
  8. OSE also provides a private internal docker registry which can serve out container images to a range of consuming applications. The registry itself runs inside OSE as a Pod and stores all images. Red Hat Software Collections provides a range of certified images.
  9. Fluentd provides log management functionality in OSE. Fluentd runs on each OSE node and collects both application level and system logs. These logs are pushed into ElasticSearch for storage and Kibana is used for visualization. All of these packages are themselves in containers.
  10. Red Hat CloudForms is provided as a way to manage containers running on OSE. Using CloudForms, a deep view of the entire OSE cluster is provided – Inventory, Discovery, Monitoring etc. for hosts & pods in them.
  11. OpenShift v3 also introduces the concept of a project, which is built on Kubernetes namespaces. Projects are a way of separating functional applications from one another in an OSE cluster. Users and applications that pertain to one project (and namespace)  can only view resources within that project. Authorization is provided by Groups which are a collection of users.
  12. Given that containers run inside pods, Kubernetes assigns an IP address to each Pod. For example, consider a classical 3-tier architecture: a Web layer, an Application Server and a Database. Each of the three images, once instantiated, becomes a Docker container, and each of these can be scaled independently of the others. Thus, it is a better design for each of these to run inside its own Pod. All containers that run inside the same pod share the same IP address; however, they are required to use non-conflicting ports. Services are a higher level of abstraction in OpenShift. A service (e.g. an application server, or database) is a collection of pods. The service abstraction is important to note as it enables a given runtime component (e.g. a database, or a message queue etc.) to be reused among various applications.
  13. To reiterate, pods are the true first-class citizens inside OSE. A pod runs on a given host. However, if a service consists of 10 pods, they can all be distributed across hosts. Thus, scaling an application implies scaling its pods. Pods overall provide a clean architecture by abstracting Docker images from underlying storage and networking.
  14. Real world applications are typically composed of multiple tiers, with containers in each, so OSE leverages the concept of a Kubernetes Service. Access to the application is managed using the Service abstraction. A service is a proxy that connects multiple pods and exposes them to the outside world. Services also work with the notion of Labels, e.g. a JBoss application server pod can be labeled “Tier=Middle Tier”. A service can group pods based on labels, which enables a range of interesting use cases and flexibility around pod access based on tags. Important examples are A/B deployments, Rolling deployments etc.
  15. All underlying networking is handled by an SDN layer based on Open vSwitch (OVS). This enables cloud and network admins to assign IP address ranges for Services/Pods. These IPs are only reachable from the internal network. Open vSwitch enables admins to design the network in a way that best suits their environment. In addition, traffic can be encrypted to enable the highest degree of security.
  16. OpenShift also provides an integrated Routing layer to expose applications running inside pods to the outside world. The routing layer maps to the Kubernetes Ingress and Ingress Controller concepts. To this end, OSE v3 includes HAProxy as a reverse proxy. Once an application is deployed into OSE, a DNS entry is automatically created inside the load balancer. All the pods behind a service are added as endpoints behind the application.
  17. All load balancing (across front-end pods) for external client application requests is done by the Router, which is the entry point for any external requests coming in as shown in the above illustration. OpenShift enables administrators to deploy routers to nodes in a cluster. These routes can be used by developers to expose applications running inside pods to external clients and services.  The routing layer is pluggable and two plugins are provided and supported by default.
  18. OpenShift provides extensive build and deployment tools for developers. An example in this regard is the set of Builder images provided by OpenShift. A builder image is combined with source code to create a deployable artifact, which is a logical application with all its binaries and dependencies. Once the developers commit the source code to their Git repo, this triggers a build by the Master server. The application source is combined with the relevant builder image to create a custom image, which is then stored in the OSE registry. Using WebHooks, OSE integrates with Git to automate the entire build and change process. Once an application container image is available, the deployment process takes over and deploys it on a given node, within a pod. Once deployed, a service is created, along with a DNS route in the routing layer for external users to access.
  19. As mentioned above, HAProxy runs on a server with a static IP address. When MyApp1 and MyApp2 are deployed, corresponding entries are added in the DNS. For example –

Users are able to access the newly created application through the routing layer as shown above. Admins can set runtime resource utilization quotas for projects using the GUI, a major improvement over v2.  

Changes and upgrades to applications follow the same process as outlined above.
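
The relationship between pods, labels and services described in the numbered list above can be modeled in a few lines of Python. This is a toy in-memory model, not the Kubernetes or OpenShift API: the Pod and Service classes, the pod names, the IP addresses and the label values are all hypothetical, and round-robin is simply one balancing policy picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str       # one IP per pod, shared by all containers inside it
    labels: dict  # e.g. {"tier": "middle"}


@dataclass
class Service:
    name: str
    selector: dict  # label key/value pairs a pod must carry to be selected
    _next: int = 0  # round-robin position (illustrative policy only)

    def endpoints(self, pods):
        """Return the pods whose labels contain every key/value in the selector."""
        return [p for p in pods
                if all(p.labels.get(k) == v for k, v in self.selector.items())]

    def route(self, pods):
        """Pick the next matching pod in round-robin order."""
        targets = self.endpoints(pods)
        if not targets:
            raise LookupError(f"no pods match {self.selector}")
        pod = targets[self._next % len(targets)]
        self._next += 1
        return pod


# Hypothetical 3-tier application: each tier runs in its own pods.
PODS = [
    Pod("web-1", "10.1.0.5", {"tier": "web"}),
    Pod("jboss-1", "10.1.0.6", {"tier": "middle"}),
    Pod("jboss-2", "10.1.1.4", {"tier": "middle"}),
    Pod("db-1", "10.1.2.9", {"tier": "db"}),
]

middle = Service("app-server", {"tier": "middle"})
for _ in range(3):
    print(middle.route(PODS).name)
```

Because the service finds its pods by label rather than by name or address, scaling the middle tier only means adding another pod with the same label; callers of the service do not change.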

OpenShift SDN…

OpenShift as a platform provides a unified cluster network for interpod communication. [2] The implementation of the pod network is maintained by OpenShift SDN. The SDN provides an overlay network using Open vSwitch (OVS).  There are three SDN plugins provided for configuring the pod network.

  • The ovs-subnet plug-in which provides a flat pod network for interpod and service communication.
  • The ovs-multitenant plug-in which provides project level isolation for pods and services. Each project within the cluster is assigned a unique virtual network ID (VNID). This ensures that traffic originating from pods can be identified easily. Pods in one project cannot send packets to or receive packets from pods and services in other projects.
  • The ovs-networkpolicy plug-in which allows custom isolation policies using NetworkPolicy objects.
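
The project-level isolation rule of the ovs-multitenant plug-in can be illustrated with a small model. This sketches only the rule as described above, not how Open vSwitch enforces it: the MultitenantNetwork class, the project names and the VNID numbering are hypothetical.

```python
from itertools import count


class MultitenantNetwork:
    """Toy model: one unique VNID per project, traffic allowed only within a VNID."""

    def __init__(self):
        self._vnids = {}
        self._next_vnid = count(1)  # assumed numbering scheme, for illustration only

    def add_project(self, project):
        """Assign the project a unique VNID the first time it is seen."""
        if project not in self._vnids:
            self._vnids[project] = next(self._next_vnid)
        return self._vnids[project]

    def can_communicate(self, src_project, dst_project):
        """Packets pass only when source and destination carry the same VNID."""
        return self._vnids[src_project] == self._vnids[dst_project]


net = MultitenantNetwork()
net.add_project("payments")
net.add_project("research")
print(net.can_communicate("payments", "payments"))
print(net.can_communicate("payments", "research"))
```

Under the flat ovs-subnet plug-in, the equivalent check would always succeed, since every pod can reach every other pod and service.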


With OpenShift v3, Red Hat has built a robust application platform that combines Docker and Kubernetes primitives with custom build services and third-party integrations. Expect to see Fortune 500 companies build Cloud Native applications leveraging this platform in the years to come.


[1] OpenShift v3 Documentation –

[2] OpenShift v3 Networking –



The Big Data Landscape – My Predictions for 2018…

In 2018 we are rapidly entering what I would like to call ‘Big Data 3.0’. This is the age of ‘Converged Big Data’, where its various complementary technologies (Data Science, DevOps, Business Automation) all begin to come together to solve complex industry challenges in areas as diverse as Manufacturing, Insurance, IoT, Smart Cities and Banking.

(Image Credit – Simplilearn)

First, we had Big Data 1.0…

In the first pass of the Big Data era, Hadoop was the low-cost storage solution. Companies saved tens of billions of dollars that would otherwise have gone to costly and inflexible enterprise data warehouse (EDW) projects. Nearly every large organization has begun deploying Hadoop as an Enterprise Landing Zone (ELZ) to augment an EDW. The early corporate movers working with the leading vendors more or less figured out the kinks in the technology as applied to their business challenges.

We just passed Big Data 2.0…

As adoption patterns matured and the Big Data ecosystem grew to include projects such as YARN, Spark, and Hive, customers began applying Big Data to business challenges such as Fraud Detection, Customer Journey et al and began to realize business value from it. Adoption has indeed begun skyrocketing in verticals like Banking, Telecom, Manufacturing & Insurance. The monolithic Big Data market has begun segmenting into well-defined categories: Infrastructure providers, Streaming Data companies, Data Analysis providers, SQL on Hadoop solutions, full-fledged machine learning toolsets etc.

With that said, let us look at my Big Data predictions for 2018.

Trend #1 Big Data 3.0 – where Data fuels Digital Transformation…

Fortune 5000 companies process large amounts of customer information daily. This is especially true in areas touched by IoT: power and utilities, manufacturing and the connected car. However, they have been sorely lacking in their capacity to interpret this information in a form that is meaningful to their customers and their business. In areas such as Banking & Insurance, this capacity can greatly help in arriving at a real-time understanding of not just the risks posed by a customer/partner relationship (from a credit risk/AML standpoint) but also the potential to increase the returns per client relationship. Digital Transformation can only be fueled by data assets. In 2018, more companies will tie these trends together, moving projects from POC to production.

The Six Strategic Questions Every Bank Should Answer with Big Data & AI in 2018…

Trend #2 ‘Predictive Analytics on Hadoop’ projects begin to proliferate…

I have written extensively about efforts to infuse business processes with machine learning. Predictive analytics efforts have typically taken the form of a line-of-business project or initiative. The learning from such localized application initiatives is largely lost to the larger organization if multiple applications and business initiatives are not allowed to access the models built. In 2018, machine learning expands across more use cases, from the mundane (fraud detection, customer churn prediction, customer journey) to the new age (virtual reality, conversational interfaces, chatbots, customer behavior analysis, video/facial recognition etc.). Demand for data scientists will increase.

In areas around Industrie 4.0, Oil & Energy, Utilities – billions of endpoints will send data over to edge nodes and backend AI services which will lead to better business planning, real-time decisions and a higher degree of automation & efficiency across a range of processes. The underpinning data capability around these will be a Data Lake.

This is an area both Big Data and AI have begun to influence in a huge way. 2018 will be the year in which every large and medium-sized company will have an AI strategy built on Big Data techniques. Companies will begin exposing their AI models over the cloud using APIs as shown above using a Models as a Service architecture.

Trend #3 Big Data begins to take baby steps towards replacing the Enterprise Data Warehouse

Infrastructure vendors have been aiming to first augment and then replace EDW systems. As projects that provide SQL-on-Hadoop, data governance and audit capabilities mature, Hadoop will slowly begin replacing the EDW footprint. The key capabilities that Data Lakes usually lack from an EDW standpoint, around OLAP and performance reporting, will be supplied by niche technology partners. While this is a change that will easily take years, 2018 is when it begins. Expect the first candidates for migration to be clients who have not really been using the full power of EDWs beyond simple relational schemas, log data etc.

Trend #4 Cybersecurity pivots into Big Data…

Big Data is now the standard by which forward-looking companies will perform their Cybersecurity and threat modeling. Let us take an example to understand what this means from an industry standpoint. In Banking, in addition to general network level security, we can categorize business level security considerations into four specific buckets: general fraud, credit card fraud, AML compliance, and cybersecurity. The current best practice in the banking industry is to encourage a certain amount of convergence in the back-end data silos/infrastructure across all of the fraud types, of which there are literally tens. Forward-looking enterprises are now building cybersecurity data lakes to aggregate and consolidate all digital banking information, wire data, payment data, credit card swipes, other telemetry data (ATM & POS) etc. in one place to do security analytics. This pivot to a Data Lake & Big Data can pay off in a big way.

The reason this convergence is helpful is that across all of these different fraud types, the common thread is that the fraud is increasingly digital (or internet based) and the fraudster rings are becoming more sophisticated every day. To detect these subtle patterns, an analytic approach that goes beyond the existing rules-based approach is key. Examples include location-based patterns in terms of where transactions took place, Social Graph-based patterns, and patterns that commingle real-time and historical data to derive insights. This capability is only possible via a Big Data-enabled stack.
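
One example of a location-based pattern is "impossible travel": two card-present transactions on the same card that are too far apart to be reached in the elapsed time. The sketch below is illustrative only. The 900 km/h speed ceiling, the sample transactions and the function names are assumptions, not part of any specific fraud product.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # assumed ceiling, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(transactions, max_speed_kmh=MAX_SPEED_KMH):
    """Yield (time1, time2, km) for consecutive transactions that imply too high a speed."""
    ordered = sorted(transactions, key=lambda t: t[0])
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        distance = haversine_km(lat1, lon1, lat2, lon2)
        if distance > max_speed_kmh * hours:  # farther than could be traveled in the gap
            yield t1, t2, distance


# Hypothetical history for one card: (timestamp, latitude, longitude).
card_history = [
    (datetime(2018, 1, 5, 9, 0), 40.7128, -74.0060),   # New York
    (datetime(2018, 1, 5, 9, 40), 40.7306, -73.9352),  # New York, 40 minutes later
    (datetime(2018, 1, 5, 11, 0), 51.5074, -0.1278),   # London, 80 minutes after that
]

flags = list(impossible_travel(card_history))
for t1, t2, km in flags:
    print(f"suspicious: {km:.0f} km between {t1:%H:%M} and {t2:%H:%M}")
```

A per-transaction rule sees three ordinary purchases here; only the comparison against the card's recent history exposes the problem, which is why combining real-time and historical data matters.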

Trend #5 Regulators Demand Big Data – PSD2, GDPR et al…

The common thread across a wide range of business processes in verticals such as Banking, Insurance, and Retail is the fact that they are regulated by a national or supranational authority. In Banking, across the front, mid and back office, processes ranging from risk data aggregation/reporting, customer onboarding, loan approvals and financial crimes compliance (AML, KYC, CRS & FATCA) to enterprise financial reporting and Cyber Security all need to produce verifiable, high-fidelity and auditable reports. Regulators have woken up to the fact that all of these areas can benefit from universal access to accurate, cleansed and well-governed cross-organization data from a range of Book Of Record systems.

A POV on Bank Stress Testing – CCAR & DFAST..

Further, by applying data processing techniques such as in-memory processing, the process of scenario analysis, computation and reporting on this data (reg reports, risk scorecards, dashboards etc.) can be vastly enhanced. These reports can be made closer to real time, using data about market movements to understand granular risk concentrations. Finally, model management techniques can be clearly defined and standardized across a large organization. RegTechs, or startups focused on the risk and compliance space, are already leveraging these techniques across a host of the areas identified above.

Trend #6 Data Monetization begins to take off…

The simplest and easiest way to monetize data is to begin collecting disparate data generated during the course of regular operations. An example in Retail Banking is to collect data on customer branch visits, online banking usage logs, clickstreams etc. Once collected, the newer data needs to be fused with existing Book of Record Transaction (BORT) data to obtain added intelligence on branch utilization, branch design & optimization, customer service improvements etc. It is very important to ensure that the right business metrics are agreed upon and tracked across the monetization journey. Expect Data Monetization projects to take off in 2018, with verticals like Telecom, Banking, and Insurance taking the lead on these initiatives.

The Tao of Data Monetization in Banking and Insurance & Strategies to Achieve the Same…

Trend #7 Data Native Architectures converge with Cloud Native Architectures…

Most Cloud Native Architectures are designed in response to Digital Business initiatives – where it is important to personalize and to track minute customer interactions. The main components of a Cloud Native Platform are shown below and the vast majority of these leverage a microservices based design. Given all this, it is important to note that a Big Data stack based on Hadoop (Gen 2) is not just a data processing platform. It has multiple personas – a real-time, streaming data, interactive platform that can perform any kind of data processing (batch, analytical, in memory & graph based) while providing search, messaging & governance capabilities. Thus, Hadoop provides not just massive data storage capabilities but also provides multiple frameworks to process the data resulting in response times of milliseconds with the utmost reliability whether that be real-time data or historical processing of backend data. My bet on 2018 is that these capabilities will increasingly be harnessed as part of a DevOps process to develop a microservices based deployment.


Big Data will continue to expand exponentially across global businesses in 2018. As with most disruptive innovation, it will also create layers of complexity and opportunity for Enterprise IT. Whatever the business model – tracking user behavior, location sensitive pricing or business process automation etc – the end goal of IT architecture should be to create enterprise business applications that are heavily driven by data insight and analytics.

Why Kubernetes Will Be A Transformational Cloud Technology..

It is 2018 and Enterprise IT does not question the value of Containerized applications anymore. Given the move to adopting DevOps and Cloud Native Architectures, it is critical to leverage container oriented capabilities to bring together development and operations teams to solve Digital business challenges. However, the lack of a standard control plane for these containerized deployments was always going to be a challenge. Google’s Kubernetes (kube or k8s), an open source container orchestration platform, is rapidly becoming the de facto standard for how Cloud Native applications are architected, composed, deployed, and managed.

Kubernetes outshines competition…

First off, a deep dive on Kubernetes is provided below for those who are beginning their evaluation of the platform.

Kubernetes – Container Orchestration for the Software Defined Data Center (SDDC)..(5/7)

With its Google pedigree, K8s is the only container orchestration platform that is proven at scale in the web-scale, cloud-native world. K8s’ predecessors, Omega and Borg, manage vast containerized deployments that deliver services such as Google Search, Gmail, and YouTube.

Let us enumerate both the technology and business advantages that are captured in the below illustration.

Technical Advantages…

With its focus on grouping containers together into logical units called pods, K8s enables lightweight deployment of microservice based multi-tier applications. The Service abstraction then gives a set of logical pods an external facing IP address. A Service can be discovered by other services as well as scaled and load balanced independently. Labels (key/value pairs) can be attached to any of the above resources. K8s is designed for both stateless and stateful apps as it supports mounting both ephemeral and persistent storage volumes.

The Service as an architectural construct (a group of pods exposed to the external world via an IP address) enables a high-level focus on the deployment, performance, and behavior of an application rather than its underlying infrastructure.
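The relationship between pods, labels and Services can be sketched in a few lines of Python. This is a toy model and not the Kubernetes API: the Pod and Service classes and the endpoints method are hypothetical names, and only simple equality-based label matching is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)  # key/value pairs


@dataclass
class Service:
    name: str
    selector: dict  # equality-based label selector

    def endpoints(self, pods):
        """Return the pods whose labels contain every selector pair."""
        return [
            pod for pod in pods
            if all(pod.labels.get(k) == v for k, v in self.selector.items())
        ]


pods = [
    Pod("web-1", {"app": "web", "tier": "frontend"}),
    Pod("web-2", {"app": "web", "tier": "frontend"}),
    Pod("db-1", {"app": "db", "tier": "backend"}),
]

# The Service fronts whichever pods currently carry the label app=web.
web = Service("web-svc", {"app": "web"})
names = [pod.name for pod in web.endpoints(pods)]
```

Because the Service refers to pods only through labels, pods can be added, replaced or scaled without the consumers of the Service needing to change.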

Kubernetes also provides autoscaling (both up and down) to accommodate usage spikes. It also provides load balancing to ensure that usage across hosts is evenly balanced. The Controller also supports rolling updates/canary deployments etc to ensure that applications can be seamlessly and incrementally upgraded.
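As a rough illustration of how scaling up and down can be decided, the Kubernetes documentation describes the Horizontal Pod Autoscaler as computing a desired replica count from the ratio of the observed metric to its target. The Python sketch below applies that ratio and clamps the result to configured bounds. It is a simplification: the function and parameter names are my own, and details such as the tolerance band and stabilization windows are left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count by the observed/target metric ratio."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Usage spike: 4 replicas at 90% average CPU against a 60% target.
scaled_up = desired_replicas(4, 90, 60)

# Quiet period: 4 replicas at 30% average CPU against a 60% target.
scaled_down = desired_replicas(4, 30, 60)
```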

Developers and Operations can dictate whether the application works on a single container or a group of containers without any impact to the application.

These straightforward concepts enable a range of architectures from the legacy stateful to the microservices to IoT land – data-intensive applications & serverless apps – to be built on k8s.

A Robust Roadmap…

With Google and Red Hat leading a healthy community of contributors, the just-released Kubernetes 1.9 added many useful features. First, it provides a higher degree of control over clusters, adds detailed storage metrics and makes the architecture more extensible. It also improves many aspects of the API and moves Windows support into beta. Coupled with ongoing work on the Open Service Broker API, this moves support for hybrid architectures one step closer. Just to provide an idea of the robustness of development, this release is expected to include 38 features spanning security, cluster lifecycle management, APIs, networking, storage and additional functionality. [1]

Business & Ecosystem Advantages…

K8s as an open source orchestrator is now a foundational component of market-leading platforms such as Red Hat’s OpenShift and (IaaS Clouds such as) AWS ECS Container Service/Azure/VMware Pivotal CloudFoundry. There is no fear of lock-in around this container standard. 2017 saw a shakeout in this technology segment as competition to K8s essentially folded and announced plans to support the orchestrator. Platforms such as Docker, Mesos, CoreOS now integrate with & support Kubernetes at different levels.

Over the last three years, more than 50 Kubernetes powered platforms and distributions have emerged. The Cloud Native Computing Foundation’s (CNCF) Kubernetes Conformance model includes API standards for networking and storage. The key benefit to developers is that applications coded for k8s are largely free of lock-in from both an orchestration and storage standpoint.

Credit – CNCF

In the last year, k8s has made tremendous strides in project documentation, developer help & quickstarts, and on improving the overall operator experience. The 2017 KubeCon held in Austin, TX drew 4200 attendees and had multiple tracks covering everything from CI/CD Pipelines and Operational experience to Special Interest Groups (SIG) covering a range of non-functional areas such as performance and security.

The Road Ahead…

The Cloud Native landscape sees an amazing amount of change every year, but it is a safe bet that Kubernetes, given its massive open source ecosystem and modular architecture and design, will emerge as the de facto standard in container orchestration.

Four strategic areas of advances for Kubernetes in 2018 include –

  1. Playing the container factotum for a range of cloud architectures
  2. Refinement of k8s deployments around cloud native microservices based architectures. These include operating in an architecture with Service Meshes, Serverless Computing & Chaos Engineering concepts
  3. Increased vertical industry adoption especially around OpenStack NFV and Telco
  4. Adoption in hybrid cloud use cases


[1] Kubernetes 1.9 –

The 12 Software Architectures That Will Matter in Financial Services in 2018 & Beyond…

Over the last three years, we have examined a succession of business issues in the various sectors of financial services on this blog. These have ranged from the mundane (Trading, Risk management, Market Surveillance, Fraud detection, AML et al) to the transformative (Robo advisors, Customer Journeys, Blockchain, Bitcoin etc). We have also examined the changing paradigms in enterprise architecture – moving from siloed monolithic applications to cloud-native software. This blog summarizes the 12 most important technical posts on innovative application architectures.


Having spent the majority of my career working in Banking and Financial Services, I have had a fascinating time. It is amazing to witness business transformation begin to occur across the landscape. However, this transformation is occurring on repeatedly discussed themes. A key challenge that CXOs and Enterprise Architecture teams face is how to deploy much-discussed technologies such as Cloud platforms, Big Data, Enterprise Middleware and AI in real-world architectures. This blog post sums up twelve real-world application architectures that industry leaders can use as a good reference point for their own implementations.

The common theme to all of the below architectures –

  1. A focus on Cloud native concepts including microservices, lightweight backends, containers
  2. Design Patterns that encourage new age Data Management techniques including Hadoop and Spark
  3. Cloud-agnostic – whether that is public cloud or private cloud
  4. Integrating business process management and business rules engines as first-class citizens
  5. 100% Open Source

#1 Design and Architecture of a Real World Trading Platform…

Design and Architecture of a Real World Trading Platform.. (2/3)

#2 Big Data driven Architecture for Credit and Market Risk Management…

How a Pioneering Bank leverages Hadoop for Enterprise Risk Data Aggregation & Reporting..

#3 Reference Architecture for Big Data-enabled CyberSecurity…

Cybersecurity and the Next Generation Datacenter..(2/4)

#4 Reference Architecture for Payment Card Fraud Detection…

Hadoop counters Credit Card Fraud..(2/3)

#5 Design and Architecture of a Robo-Advisor Platform…

Design and Architecture of A Robo-Advisor Platform..(3/3)

#6 Reference Architecture for Customer Journeys and Single View of a Customer…

Demystifying Digital – Reference Architecture for Single View of Customer / Customer 360..(3/3)

#7 A Reference Architecture for the Open Banking Standard…

A Reference Architecture for The Open Banking Standard..

#8 The Architecture of Blockchain…

The Architecture of Blockchain..(4/5)

#9 The Architecture of Bitcoin…

The Architecture of Bitcoin..(2/5)

#10 How to Re-Architect a Wealth Management Office…

Next Gen Wealth Management Architecture..(3/3)

#11 Reference Architecture for Market Surveillance – CAT, MAR, MiFID II et al…

The Definitive Reference Architecture for Market Surveillance (CAT, UMIR and MiFiD II) in Capital Markets..

#12 Logical Architecture for Operational Risk Management…

Infographic: Logical Architecture for Operational Risk Management


With each passing quarter, financial services is a business that looks increasingly in danger of disintermediation. CXOs have no alternative but to digitize their businesses. IT will be forced to support cloud-native technologies in both key areas – applications and infrastructure – in pursuit of business goals. Developers will also be at the forefront of this change. Eventually, the quality of Enterprise Architecture decides business destiny.

My Final Post for 2017: How an Enterprise PaaS enables Enterprise Architecture…

With DevOps and Container based automation rapidly gaining industry mindshare in 2017, PaaS is emerging as a “fit for purpose” technology for Digital Projects. With the PaaS market beginning to mature, different product subcategories within the main umbrella are being proposed – Structured PaaS, Containers as a Service, Unstructured PaaS etc. For now, these subcategory definitions look largely academic as technology follows business challenges & any such segmentation should largely follow from the challenges being solved. PaaS is no different. My goal for this post then is to approach the market from the standpoint of the key (business) capabilities in an Enterprise Architecture that an industrial grade PaaS should enable, no matter where it falls on the spectrum of PaaS platforms.

Enterprise Architecture based on a PaaS…

Enterprise Architecture typically spans four different areas – 1) Business Architecture, 2) Data Architecture, 3) Application Design & 4) Deployment Architecture. Given the rapidly maturing cloud-based delivery models (IaaS and SaaS) – many EA standards now include compulsory cloud-native awareness and design across the four domains.

We posit that in 2018, PaaS has emerged as the most important driver of an enterprise architecture. PaaS technology can accomplish a majority of the goals of an EA in a variety of ways, as we will cover below.

The definition of what constitutes a Platform As a Service (PaaS) continues to vary. However, there is no disagreement that PaaS enables the easy but robust buildout of a range of Cloud Native architectures. The vision of a PaaS is to ultimately enable massive gains in productivity for application developers who intend to leverage a cloud-based IaaS. At the same time, advances in open source technology in 2017 are ensuring management seamlessness & simplicity for Cloud Admins.

The below graphic illustrates the core building blocks of an enterprise architecture based on a PaaS.

The Foundational Services a PaaS provides Enterprise Architects cover a range of areas as depicted above..

Core Benefits of Adopting an Industrial Strength PaaS…

PaaS technology was originally developed as a way of giving developers a smooth experience in developing polyglot applications. With the advent of Docker and Kubernetes, the focus has also shifted to enabling CI/CD pipelines and achieving seamless deployment on a cloud-based infrastructure. The following areas confer significant PaaS capabilities that EA (Enterprise Architecture) teams would otherwise have to cobble together themselves:

  • Cloud Native via Containers – An industrial grade PaaS abstracts away any & all underlying Hardware/IaaS concerns by leveraging containers. However, it also ensures that the PaaS can leverage the services of the underlying IaaS whether that is Amazon AWS, Microsoft Azure, OpenStack or VMware. At a minimum, as long as the cloud supports de facto standards such as Linux and Docker, the PaaS can host any platform or application or package as well as support migrations across the underlying Clouds across Dev/Test/QA/Prod environments. Enterprise IT should be able to easily split workloads across these different clouds based on business needs. The key to all of this is to agree on the Container as the standard contract between the PaaS and the IaaS layers. Thus, the few leading PaaS vendors such as OpenShift have adopted standards-based container technology for developing, packaging and deploying applications. Further, the availability of a Container registry is also very important to guarantee the provenance and safety of commonly used Docker Images.
  • Developer Services – A PaaS includes development tools that can vastly reduce the amount of time to develop complex n-tier applications. The developer experience needs to be smooth. These tools should include at a minimum either Docker images or an easy plugin-based integration that covers a range of enterprise runtimes such as workflow, Big Data libraries, Identity Management, API Management, Broker based messaging integration, Search and Security services. Based on the architectural requirements of a given business project, the PaaS should be able to offer a natural stack of default options for the above services, typically using a template such as a simple Dockerfile that calls out the default OS, JVM version & the other runtime dependencies of the application. The PaaS then generates a barebones application in which the developer just fills in the blanks with their source code. This is typically done using a command line or web interface, or by invoking an API. This unified experience then carries over across the CI/CD pipelines, deployment and then management. This way, everyone in the organization speaks & adheres to a common development vocabulary.
  • Mobile Application Development – For developers, a PaaS should encompass the easy provisioning of cloud resources through the application lifecycle while enabling application development using microservices. However, leading PaaS providers also include toolkits for cross-platform development capabilities for mobile devices and a range of browsers.
  • CI – A robust PaaS provides facilities for Continuous Integration (CI). It does this in several ways. Firstly, code from multiple team members is checked in (pushed and merged via pull requests) to a common source control repository (typically based on Git). This supports constant check-ins, and automated checks/gates are added to run various kinds of tests. Also included are capabilities such as a Git-based developer workflow in which a push event triggers a Docker image build.
  • Continuous Delivery – The PaaS can then automate all steps required to deliver the application binaries from a CI standpoint to delivery using CD. These involve supporting automated testing, code dependency checks etc and seamlessly promoting images from one environment to the other.
  • Continuous Deployment – Once the PaaS has containerized the workloads, the next step is to deploy and orchestrate them. The PaaS includes capabilities that can then deploy the application on a family of containers & load balance/manage their runtime footprint. This capability is typically provided by a container orchestration layer such as Kubernetes or Mesos. A range of services around HA, service discovery etc are provided by this layer.
  • Runtime Characteristics – The PaaS finally simplifies how complex n-tier applications are scheduled and then deployed across tiers, how these groups of containers that constitute an application leverage the network & the underlying storage, how they’re exposed to consuming applications via request routing, how the health of various groups of containers (called Pods in the case of Kubernetes) is managed, ensuring high availability and finally, zero downtime deployments.
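The CI and Continuous Delivery flow described in the list above (a push triggers an image build, automated gates run, and the image is promoted from one environment to the next) can be sketched as follows. This is a toy Python model under stated assumptions: the function names, the image tag format and the gate structure are hypothetical and do not correspond to any particular PaaS or CI product.

```python
# Promotion order, following the Dev/Test/QA/Prod environments named above.
ENVIRONMENTS = ["dev", "test", "qa", "prod"]


def build_image(commit):
    """Stand-in for a Docker image build triggered by a Git push event."""
    return f"app:{commit[:7]}"


def run_pipeline(commit, gates):
    """Build once, then promote the same image until a gate fails.

    gates maps an environment name to a list of (name, check) pairs,
    where check(image) returns True when the gate passes.
    """
    image = build_image(commit)
    promoted = []
    for env in ENVIRONMENTS:
        checks = gates.get(env, [])
        if not all(check(image) for _, check in checks):
            break  # stop promoting at the first failed gate
        promoted.append(env)
    return image, promoted


# Hypothetical gates: unit tests pass, the QA performance check fails.
gates = {
    "dev": [("unit-tests", lambda image: True)],
    "qa": [("performance", lambda image: False)],
}
image, promoted = run_pipeline("abcdef1234", gates)
```

The design point worth noting is that the image is built once and the same artifact moves through every environment, so what reaches production is exactly what was tested.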


PaaS provides enterprise architecture teams with a range of capabilities that enable Cloud Native application development and delivery. These range from i) enabling CI/CD capabilities for developers via application automation to ii) providing a range of container orchestration capabilities. These enable rapid deployment, version control, rolling updates etc. All of these ultimately enable rapid digital application development. From 2018 onwards, Enterprise Architects neglect a serious look at PaaS at their peril.

The Six Strategic Questions Every Bank Should Answer with Big Data & AI in 2018…

After a decade of focusing on compliance with regulatory mandates, Banks are back to focusing on technology innovation. The reason is obvious – over the last five years, Silicon Valley majors and FinTechs have begun to rapidly encroach on the highest profit areas of the banking business. The race is on to create next-generation financial services ecosystems in a variety of areas ranging from Retail Banking to Capital Markets and Wealth Management. The common thread to all these is massive volumes of Data & Advanced analytics on the data. Given that almost every large and small bank has a Big Data & AI strategy in place, it makes sense for us to highlight six key areas where they should all first direct and then benchmark their efforts from an innovation standpoint.

Global Banking in 2016-17…

As 2017 draws to a close, the days of growth and sky-high stock market valuations seem to be largely back. McKinsey Research posits that while the global banking industry appears quite healthy outwardly, profits are at best flat or even falling across geographies[1]. For the seventh year in a row, the industry’s ROE (Return on Equity) was between 8-10%. For 2016, the industry’s ROE was down a full percentage point from 2015, raising concerns about profitability across the board. There are however innovators that are doing well due to their strong focus on execution.

Banks have overall been very slow to respond to the onslaught of the digital business led by Amazon, Google/Alphabet, PayPal and the horde of FinTechs. What all of these disruptors do better than Banks is to harness customer data to drive offerings that appeal to neglected banking consumers who are already used to using these services every waking hour in their lives.

As technology continues to advance and data becomes more available, the twin forces of competition & regulation are driving overall innovation across banking. Capital Markets players are using AI in a range of areas from optimising trading execution, contract pricing and strategy backtesting to risk & compliance.

In the staider Retail Banking & Asset Management areas, profitable areas such as customer lending, consumer payments & wealth management are slowly being disrupted at the expense of established banks. What also lies behind this disruption is the FinTechs’ ability to pick and choose the (profitable) areas they want to compete in, their minimal overhead as opposed to established banks & an advanced ability to work with data generated constantly by customer interactions by deploying algorithms that mine historical data & combine it in ways that reveal new insights.

I posit that there are six strategic questions that Banking institutions of all stripes need to answer with their Big Data (& AI) projects, with a view to attaining sustainable growth for the foreseeable future –

    • How do we know more about our customers?
    • How do we manage regulation and turn it into a source of lasting competitive advantage?
    • How can we increase our digital quotient in a way that enables us to enter new businesses?
    • How can this deluge of information drive business insight?
    • How can we drive Business transformation both within the Bank and disarm competition?
    • How can this information drive agility in customer responsiveness?
Every Bank has to aim to answer these six questions using Big Data & AI.

Question #1 How much do we know about our customers..really?

Financial institutions, including retail banks, capital markets, payment networks etc process large amounts of customer information daily. However, they have been sorely lacking in their capability to understand their customer profiles as one whole and to interpret this in a form that is meaningful to their business. The ability to do this can result in an understanding of not just the risks posed by this relationship (from a credit risk/AML standpoint) but also an ability to increase the returns per client relationship. This is an area Big Data and AI can influence in a huge way.

Demystifying Digital – Why Customer 360 is the Foundational Digital Capability – ..(1/3)

Question #2 How do we manage the Regulatory Onslaught and Turn it into Competitive Advantage?

There exist two primary reasons for Enterprises such as Banks, Insurers, Payment Providers and FinTechs to pursue best in class Risk Management Processes and Platforms. The first need in Capital Markets is compliance driven by various regulatory reporting mandates such as the Basel Reporting Requirements, the FRTB, the Dodd‐Frank Act, Solvency II, CCAR and CAT/MiFID II in the United States & the EU. The second reason (common to all kinds of Banking) is the need to drive top-line sales growth for both individual and institutional clients.

We have long advocated for the implementation of Big Data across both the areas. The common thread across virtually every business process across the front, mid and back office is risk management. Processes ranging from risk data aggregation/reporting, customer onboarding, loan approvals, financial crimes compliance (AML, KYC, CRS & FATCA), enterprise financial reporting & Cyber Security etc can benefit from universal access to accurate, cleansed and well-governed cross-organization data from a range of Book Of Record systems. Further, by applying data processing techniques such as in-memory processing, the process of scenario analysis, computation & reporting on this data (reg reports/risk scorecards/dashboards etc) can be vastly enhanced. These processes can be made closer to real time, responding to market movements to reveal granular risk concentrations. Finally, model management techniques can be clearly defined and standardized across a large organization. RegTechs, or startups focused on the risk and compliance space, are already leveraging these techniques across a host of the areas identified above.

Risk Management – Industry Insights & Reference Architectures…

Question #3 Increase your Digital Quotient…

For decades, Banks have had a monopoly on the financial business. The last few years have seen both FinTechs and other players such as Amazon, Alibaba, Facebook etc enter lucrative areas in banking. These areas include Consumer lending, financial advisory etc. The keyword in all of this is ‘Digital Disintermediation’ and regulators have also begun to take note. In the EU and the UK, regulators are at the forefront of pushing mandates such as SEPA (Single Euro Payments Area), Open Banking Standard, and PSD-2. All of these regulations will ensure that Banks are forced to unlock their customer data in a way that encourages consumer choice. The hope is that agile players can then use this data to exploit inefficiencies in the banks’ business models using technology. Services such as account aggregation, consumer loans, credit scoring services, personal financial management tools, and other financial advisory become easy to provide via Open APIs.

If incumbent Banks don’t respond, they will lose their monopoly on being their customers’ primary front end. As new players take over areas such as mortgage loans (an area where they’re much faster than banks in granting loans), Banks that cannot change their distribution and product models will be commodified. The challenges start with reworking inflexible core banking systems. These maintain customer demographics, balances, product information and other BORT (Book Of Record Transaction) data covering a range of loan, payment and risk information. These systems will slowly need to transition from their current (largely) monolithic architectures to composable units. There are various strategies that Banks can follow to ‘modernize the core’ but adopting Big Data native mindset is. Banks will also seek to work with FinTechs to create islands of cooperation where they can learn from the new players.

Question #4 Drive Business Insight…

There are two primary areas from which business insights need to be derived. The first is internal operations and the second is customer service. This category encompasses a wide range of strategic choices that drive an operating model – product ideation, creation, distribution strategies across channels/geographies etc. Whatever the right product and strategy focus, the ability to play in select areas of the value chain depends upon feedback received from day to day operations. Much like in a manufacturing company, this data needs to be harnessed and analyzed with a view to ultimately monetizing it.

Question #5 Business Transformation…

There is no question that FinTechs are able to take ideas from nothing to delivery in a matter of months. This is the key reason banks need to transform their business. This is critical in key areas such as sales, wealth management, and origination. There is surely a lot of confusion around how to drive such initiatives but no one questions the need for centralizing data assets.

In my mind, the first and most important reason to move to a unified strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as a SaaS (Software as a Service). The end state of Digital Platforms is to operate business systems at massive scale in terms of customers, partners, and employees.

Question #6 Enhance Customer Service…

Customer Service is clearly an area of differentiation for nimbler players as compared to Banks. Banks are still largely dealing with ensuring that consistent views of customer accounts & balances can be maintained across channels. On the other hand, FinTechs have moved on to Chatbots and Robo-advisors, all built around Big Data & AI. A Chatbot is a virtual assistant that helps clients perform simple transactions using mediums such as text or voice. Chatbots are based on Natural Language Processing and Machine Learning and are being deployed in simple scenarios such as balance checks and other simpler customer service processes. However, as time goes by they will inevitably get more sophisticated and will eventually supplant human service for the vast majority of the service lifecycle.

Big Data Driven Disruption – The Robo-Advisor..(1/3)

Surely, areas such as automated customer service and investment management are still in early stages of maturity. However, they are unmistakably the next big trend in the financial industry and one that players should begin developing capabilities around. 


Increasingly, a Bank’s technology platform(s) centered around Big Data represents a significant competitive differentiator that can generate substantial revenues from existing customers and help acquire new ones. Given the strategic importance and revenue potential of this resource, the C-suite must integrate Big Data & AI into their strategic planning in 2018.


[1] McKinsey – “Remaking the Bank for an Ecosystem World” –