A Digital Reference Architecture for the Industrial Internet Of Things (IIoT)..

A few weeks ago on the invitation of DZone Magazine, I jointly authored a Big Data Reference Architecture along with my friend & collaborator, Tim Spann (https://www.linkedin.com/in/timothyspann/). Tim & I distilled our experience working on IIoT projects to propose an industrial strength digital architecture. It brings together several technology themes – Big Data , Cyber Security, Cognitive Applications, Business Process Management and Data Science. Our goal is to discuss a best in class architecture that enables flexible deployment for new IIoT capabilities allowing enterprises to build digital applications. The abridged article was featured in the new DZone Guide to Big Data: Data Science & Advanced Analytics which can be downloaded at  https://dzone.com/guides/big-data-data-science-and-advanced-analytics

How the Internet Of Things (IoT) leads to the Digital Mesh..

The Internet of Things (IoT) has become one of the four top hyped up technology paradigms affecting the world of business. The other usual suspects being Big Data, AI/Machine Learning & Blockchain. Cisco predicts that the IOT is expected to impact about 25 billion connected things by 2020 and affect about $2 trillion of economic value globally across a diverse range of verticals. These devices are not just consumer oriented devices such as smartphones and home monitoring systems but dedicated industry objects such as sensors, actuators, engines etc.

The interesting angle to all this is the fact that autonomous devices are already beginning to communicate with one another using IP based protocols. They largely exchanging state & control information around various variables. With the growth of computational power on these devices, we are not far off from their sending over more granular and interesting streaming data – about their environment, performance and business operations – all of which will enable a higher degree of insightful analytics to be performed on the data. Gartner Research has termed this interconnected world where decision making & manufacturing optimization can occur via IoT as the “Digital Mesh“.

The evolution of technological innovation in areas such as Big Data, Predictive Analytics and Cloud Computing now enables the integration and analysis of massive amounts of device data at scale while performing a range of analytics and business process workflows on the data.

Image Credit – Sparkling Logic

According to Gartner, the Digital Mesh will thus lead to an interconnected data information deluge powered by the continuous data from these streams. These streams will encompasses classical IoT endpoints (sensors, field devices, actuators etc) sending data in a variety of formats –  text, audio, video & social data streams – along with new endpoints in areas as diverse as Industrial Automation, Remote Healthcare, Public Transportation, Connected Cars, Home Automation etc. These intelligent devices will increasingly begin communicating with their environments in a manner that will encourage collaboration in a range of business scenarios. The industrial cousin of IoT is the Industrial Internet of Things (IIIoT).

Defining the Industrial Internet Of Things (IIoT)

The Industrial Internet of Things (IIoT) can be defined as a ecosystem of capabilities that interconnects machines, personnel and processes to optimize the industrial lifecycle.  The foundational technologies that IIoT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.

The primary industries impacted the most by the IIoT will include Industrial Manufacturing, the Utility industry, Energy, Automotive, Transportation, Telecom & Insurance.

According to Markets and Markets, the annual worldwide Industrial IoT market is projected to exceed $319 billion in 2020, which represents an 8% a compound annual growth rate (CAGR). The top four segments are projected to be manufacturing, energy and utilities, auto & transportation and healthcare.[1]

Architectural Challenges for Industrial IoT versus Consumer IoT..

Consumer based IoT applications generally receive the lion’s share of media attention. However the ability of industrial devices (such as sensors) to send ever more richer data about their operating environment and performance characteristics is driving a move to Digitization and Automation across a range of industrial manufacturing.

Thus, there are four distinct challenges that we need to account for in an Industrial IOT scenario as compared to Consumer IoT.

  1. The IIoT needs Robust Architectures that are able to handle millions of device telemetry messages per second. The architecture needs to take into account that all kinds of devices operating in environments ranging from the constrained to
  2. IIoT also calls for the highest degrees of Infrastructure and Application reliability across the stack. For instance, a lost message or dropped messages in a healthcare or a connected car scenario may mean life or death for a patient, or, an accident.
  3. An ability to integrate seamlessly with existing Information Systems. Lets be clear, these new age IIOT architectures need to augment existing systems such as Manufacturing Execution Systems (MES) or Traffic Management Systems. In Manufacturing, MES systems continually improve the product lifecycle and perform better resource scheduling and utilization. This integration helps these systems leverage the digital intelligence and insights across (potentially) millions of devices across complex areas of operation.
  4. An ability to incorporate richer kinds of analytics than has been possible before that provide a great degree of context. This ability to reason around context is what provides an ability to design new business models which cannot be currently imagined due to lack of agility in the data and analytics space.

What will IIoT based Digital Applications look like..

Digital Applications are being designed for specific device endpoints across industries. While the underlying mechanisms and business models differ from industry to industry, all of these use predictive analytics based on a combination of real time data processing & data science algorithms. These techniques extract insights from streaming data to provide digital services on existing toolchains, provide value added customer service, predict device performance & failures, improve operational metrics etc.

Examples abound. For instance, a great example in manufacturing is the notion of a Digital Twin which Gartner called out last year. A Digital twin is a software personification of an Intelligent device or system.  It forms a bridge between the real world and the digital world. In the manufacturing industry, digital twins can be setup to function as proxies of Things like sensors and gauges, coordinate measuring machines, vision systems, and white light scanning. This data is sent over a cloud based system where it is combined with historical data to better maintain the physical system.

The wealth of data being gathered on the shop floor will ensure that Digital twins will be used to reduce costs and increase innovation. Thus, in global manufacturing – Data science will soon make it’s way into the shop floor to enable the collection of insights from these software proxies. We covered the phenomenon of Servitization in manufacturing in a previous blogpost.

In the Retail industry, an ability to detect a customer’s location in realtime and combining that information with their historical buying patterns can drive real time promotions and an ability to dynamically price retail goods.

Solution Requirements for an IIoT Architecture..

At a high level, the IIoT reference architecture should support six broad solution areas-

  1. Device Discovery – Discovering a range of devices (and their details)  on the Digital Mesh for an organization within and outside the firewall perimeter
  2. Performing Remote Lifecycle Configuration of these devices ranging from startup to modification to monitoring to shut down
  3. Performing Deep Security level introspection to ensure the patch levels etc are adequate
  4. Creating Business workflows on the Digital Mesh. We will do this by marrying these devices to enterprise information systems (EISs)
  5. Performing Business oriented Predictive Analytics on these devices, this is critical to 
  6. On a futuristic basis, support optional integration with the Blockchain to support a distributed organizational ledger that can coordinate activity across all global areas that an enterprise operates in.

Building Blocks of the Architecture

Listed below are the foundational blocks of our reference architecture. Though the requirements will vary across industries, an organization can reasonably standardize on a number of foundational components as depicted below and then incrementally augment them as the interactions between different components increase based on business requirements.

Our reference architecture includes the following major building blocks –

  • Device Layer
  • Device Integration Layer
  • Data & Middleware Tier
  • Digital Application Layer

It also includes the following cross cutting concerns which span across the above layers –

  • Device and Data Security
  • Business Process Management
  • Service Management
  • UX Design
  • Data Governance – Provenance, Auditing, Logging

The next section provides a brief overview of the reference architecture’s components at a logical level.

A Big Data Reference Architecture for the Industrial Internet depicting multiple functional layers

Device Layer – 

The first requirement of IIIoT implementations is to support connectivity from the Things themselves or the Device layer depicted at the bottom. The Device layer includes a whole range of sensors, actuators, smartphones, gateways and industrial equipment etc. The ability to connect with devices and edge devices like routers, smart gateways using a variety of protocols is key. These network protocols include Ethernet, WiFi, and Cellular which can all directly connect to the internet. Other protocols that need a gateway device to connect include Bluetooth, RFID, NFC, Zigbee et al. Devices can connect directly with the data ingest layer shown above but it is preferred that they connect via a gateway which can perform a range of edge processing.

This is important from a business standpoint for instance, in certain verticals like healthcare and financial services, there exist stringent regulations that govern when certain identifying data elements (e.g. video feeds) can leave the premises of a hospital or bank etc. A gateway cannot just perform intelligent edge processing but also can connect thousands of device endpoints and facilitate bidirectional communication with the core IIoT architecture. 

The ideal tool for these constantly evolving devices, metadata, protocols, data formats and types is Apache NiFi.  These agents will send the data to an Apache NiFi gateway or directly into an enterprise Apache NiFi cluster in the cloud or on-premise.

Apache NiFi Eases Dataflow Management & Accelerates Time to Analytics In Banking (2/3)..

 A subproject of Apache NiFi – MiNiFi provides a complementary data collection approach that supplements the core tenets of NiFi in dataflow management. However due to its small footprint and low resource consumption, is well suited to handle dataflow from sensors and other IOT devices. It provides central management of agents while providing full chain of custody information on the flows themselves.

For remote locations, more powerful devices like the Arrow BeagleBone Black Industrial and MyPi Industrial, it is very simple to run a tiny Java or C++ MiNiFi agent for secure connectivity needs.

The data sent by the device endpoints are then modeled into an appropriate domain representation based on the actual content of the messages. The data sent over also includes metadata around the message. A canonical model can optionally be developed (based on the actual business domain) which can support a variety of applications from a business intelligence standpoint.

 Apache NiFi supports the flexibility of ingesting changing file formats, sizes, data types and schemas. The devices themselves can send a range of feeds in different formats. E.g. XML now and based on upgraded capabilities – richer JSON tomorrow. NiFi supports ingesting any file type that the devices or the gateways may send.  Once the messages are received by Apache NiFi, they are enveloped in security with every touch to each flow file controlled, secured and audited.   NiFi flows also provide full data provenance for each file, packet or chunk of data sent through the system.  NiFi can work with specific schemas if there are special requirements for file types, but it can also work with unstructured or semi structured data just as well.  From a scalability standpoint, NiFi can ingest 50,000 streams concurrently on a zero-master shared nothing cluster that horizontally scales via easy administration with Apache Ambari.

Data and Middleware Layer – 

The IIIoT Architecture recommends a Big Data platform with native message oriented middleware (MOM) capabilities to ingest device mesh data. This layer will also process device data in such a fashion – batch or real-time – as the business needs demand.

Application protocols such as AMQP, MQTT, CoAP, WebSockets etc are all deployed by many device gateways to communicate application specific messages.  The reason for recommending a Big Data/NoSQL dominated data architecture for IIOT is quite simple. These systems provide Schema on Read which is an innovative data handling technique. In this model, a format or schema is applied to data as it is accessed from a storage location as opposed to doing the same while it is ingested. From an IIOT standpoint, one must not just deal with the data itself but also metadata such as timestamps, device id, other firmware data such as software version, device manufactured data etc. The data sent from the device layer will consist of time series data and individual measurements.

The IIoT data stream can thus be visualized as a constantly running data pump which is handled by a Big Data pipeline takes the raw telemetry data from the gateways, decides which ones are of interest and discards the ones not deemed significant from a business standpoint.  Apache NiFi is your gateway and gate keeper.   It ingests the raw data, manages the flow of thousands of producers and consumers, does basic data enrichment, sentiment analysis in stream, aggregation, splitting, schema translation, format conversion and other initial steps to prepare the data. It does that all with a user-friendly web UI and easily extendible architecture.  It will then send raw or processed data to Kafka for further processing by Apache Storm, Apache Spark or other consumers.  Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data.  Storm excels at handling complex streams of data that require windowing and other complex event processing. While Storm processes stream data at scale, Apache Kafka distributes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees. NiFi, Storm and Kafka naturally complement each other, and their powerful cooperation enables real-time streaming analytics for fast-moving big data. All the stream processing is handled by NiFi-Storm-Kafka combination.  

Apache Nifi, Storm and Kafka integrate very closely to manage streaming dataflows.


Appropriate logic is built into the higher layers to support device identification, ID lookup, secure authentication and transformation of the data. This layer will process data (cleanse, transform, apply a canonical representation) to support Business Automation (BPM), BI (business intelligence) and visualization for a variety of consumers. The data ingest layer will also providing notification and alerts via Apache NiFi.

Here are some typical uses for this event processing pipeline:

a. Real-time data filtering and pattern matching

b. Enrichment based on business context

c. Real-time analytics such as KPIs, complex event processing etc

d. Predictive Analytics

e. Business workflow with decision nodes and human task nodes

Digital Application Tier – 

Once IIoT knowledge has become part of the Hadoop based Data Lake, all the rich analytics, machine learning and deep learning frameworks, tools and libraries now become available to Data Scientists and Analysts.   They can easily produce insights, dashboards, reports and real-time analytics with IIoT data joined with existing data in the lake including social media data, EDW data, log data.   All your data can be queried with familiar SQL through a variety of interfaces such as Apache Phoenix on HBase, Apache Hive LLAP and Apache Spark SQL.   Using your existing BI tools or the open sourced Apache Zeppelin, you can produce and share live reports.   You can run TensorFlow in containers on YARN for deep learning insights on your images, videos and text data; while running YARN clustered Spark ML pipelines fed by Kafka and NiFi to run streaming machine learning algorithms on trained models.

A range of predictive applications are suitable for this tier. The models themselves should seek to answer business questions around things like -Asset failure, the key performance indicators in a manufacturing process and how they’re trending, insurance policy pricing etc. 

Once the device data has been ingested into a modern data lake, key functions that need to be performed include data aggregation, transformation, enriching, filtering, sorting etc.

As one can see, this can get very complex very quick – both from a data storage and processing standpoint. A Cloud based infrastructure with its ability to provide highly scalable compute, network and storage resources is a natural fit to handle bursty IIoT applications. However, IIoT applications add their own diverse requirements of computing infrastructure, namely the ability to accommodate hundreds of kinds of devices and network gateways – which means that IT must be prepared to support a large diversity of operating systems and storage types

The tier is also responsible for the integration of the IIoT environment into the business processes of an enterprise. The IIoT solution ties into existing line-of-business applications and standard software solutions through adapters or Enterprise Application Integration (EAI) and business-to-business (B2B) gateway capabilities. End users in business-to-business or business-to-consumer scenarios will interact with the IIOT solution and the special- purpose IIoT devices through this layer. They may use the IIoT solution or line-of-business system UIs, including apps on personal mobile devices, such as smartphones and tablets.

Security Implementation

The topic of Security is perhaps the most important cross cutting concern across all layers of the IIoT architecture stack. Needless to say, each of the layers must support the strongest data encryption, authentication and authentication capabilities for devices, users and partner applications. Accordingly, capabilities must be provided to ingest and store security feeds, IDS logs for advanced behavioral analytics, server logs, device telemetry. These feeds must be constantly analyzed across three domains – the Device domain, the Business domain and the IT domain. The below blogpost delves into some of these themes and is a good read to get a deeper handle on this issue from a SOC (security operations center) standpoint.

An Enterprise Wide Framework for Digital Cybersecurity..(4/4)


It is evident from the above that IIoT will enormous opportunity for businesses globally. It will also create layers of complexity and opportunity for Enterprise IT. The creation of smart digital services on the data served up will further depend on the vertical industries. Whatever be the kind of business model – whether tracking behavior, location sensitive pricing, business process automation etc – the end goal of IT architecture should be to create enterprise business applications that are ultimately data native and analytics driven.


Demystifying Digital – Reference Architecture for Single View of Customer / Customer 360..(3/3)

The first post in this three part series on Digital Foundations @ http://www.vamsitalkstech.com/?p=2517 introduced the concept of Customer 360 or Single View of Customer (SVC).  This second post in the series discussed the concept of Customer Journey Mapping (CJM) – http://www.vamsitalkstech.com/?p=3099 . We discussed specific benefits from both a business & operational standpoint that are enabled by SVC & CJM. The third & final post will focus on a technical design & architecture needed to achieve both these capabilities.

Business Requirements for Single View of Customer & Customer Journey Mapping…

The following key business requirements need to be supported for three key personas- Customer, Marketing & Customer Service – from a SVC and CJM standpoint.

  1. Provide an Integrated Experience: A fully integrated omnichannel experience for both the customer and internal stakeholder (marketing, customer service, regulatory, managerial etc) roles. This means a few important elements – consistent information across all touchpoints, the right information to the right user at the right time, an ability to view the CJM graph with realtime metrics on Customer Lifetime Value (CLV) etc.
  2. Continuously Learning Customer Facing System: An ability for the customer facing portion of the architecture to learn constantly to fine-tune it’s understanding of the customers real time picture. This includes an ability to understand the customer’s journey.
  3. Contextual yet Seamless Movement across Channels: The ability for customers to transition seamlessly from one channel to the other while conducting business transactions.
  4. Ability to introduce Marketing Programs for existing Customers: An ability to introduce marketing and customer retention and other loyalty programs in a dynamic manner. These include and ability to combine historical data with real time data about customer interactions and other responses like clickstreams – to provide product recommendations and real time offers.
  5. Customer Acquisition: An ability to perform low cost customer acquisition and to be able to run customized offers for segments of customers from a back-office standpoint.

Key Gaps in existing Single View (SVC) Architectures ..

It needs to be kept in mind that every organization is different from an IT legacy investment and operational standpoint. As such, a “one-size-fits-all” architecture is impossible to create. However, highlighted below are some common key data and application architecture gaps that I have observed from a data standpoint while driving to a SVC (Single View of Customer) with multiple leading enterprises.

  1. The lack of a single, unique & global customer identifier – The need to create a single universal customer identifier (based on various departmental or line of business identifiers) and to use it as a primary key in the customer master list
  2. Once the identifier is created in either the source system or in the datalake, organizations need to figure out a way to cascade that identifier into the Book of Record systems (CRM systems, webapps and ERP systems) so that the architecture can begin knitting together a single view of the customer. This may also involve periodically go out across the BOR systems, link all the customers data and pull the data into the lake;
  3. Many companies deal with multiple customer on-boarding systems. At some point, these  on-boarding processes need  to be centralized. For instance in Banking esp In Capital markets, customer on-boarding done in six or seven different areas; all of these ideally need to be consolidated into one.
  4. Graph Data Semantics – Once created, the Master Customer identifier should be mapped to all the other identifiers lines of business use to uniquely identify their customer; the ability to use simple or more complex matching techniques (Rule based matching, machine learning based matching & search based matching) is highly called for.
  5. MDM (Master Data Management) systems have traditionally automated some of this process by creating & owning that unique customer identifier. However Big Data capabilities help by linking that unique customer identifier to all the other ways the customer may be mapped across the organization. To this end,  data may be exported into an MDM system backed by a traditional RDBMS; or; the computation of the unique identifier can be done in a data lake and then exported into an MDM system.

Let us discuss the generic design of the architecture (depicted above) with a focus on the following subsystems –

A Reference Architecture for Single View of Customer/ Customer 360
  1. At the very top, different channels depict with different touch points In today’s connected world, the customer experience spans multiple different touch points throughout the customer lifecycle. A customer should be able to move through multiple different touch points during the buying process. Customers should be able to start, pause transactions (e.g. An Auto Loan application) from one channel and restart/complete them from another.
  2. A Big Data enabled application architecture is chosen. This needs to account for two different data processing paradigms. The first is a realtime component. The architecture must be capable of handling events within a few milliseconds. The second is an ability to handle massive scale data analysis in a retrospective manner. Both these components are provided by a Hadoop stack. The real time component leverages – Apache NiFi, Apache HBase, HDFS, Kafka, Storm and Spark. The batch component leverages  HBase, Apache Titan, Apache Hive, Spark and MapReduce.
  3. The range of Book of Record and external systems send data into the central datalake. Both realtime and batch components highlighted above send the data into the lake. The design of the lake itself will be covered in more detail in the below section.
  4. Starting from the upper-left side, we have the Book of Record Systems sending across transactions. These are ingested into the lake using any of the different ingestion frameworks provided in Hadoop. E.g. Flume, Kafka, Sqoop, HDFS API for batch transfers etc.  The ingestion layer depicted is based on Apache NiFi and is used to load data into the data lake.  Functionally, it is made up of real time data loaders and end of day data loaders. The real time loaders load the data as it is created in the feeder systems, the EOD data loaders will adjust the data end of the day based on the P&L sign off and the end of day close processes.  The main data feeds for the system will be from the book of record transaction systems (BORTS) but there may also be multiple data feeds from transaction data providers and customer information systems.
  5. The UI Framework is standardized across all kinds of clients. For instance this could be an HTML 5 GUI Framework that contains reusable widgets that can be used for mobile and browser based applications.  The framework also need to deal with common mobile issues such as bandwidth and be able to automatically throttle the data back where bandwidth is limited.It also needs to facilitate the construction of large user defined pivot tables for ad hoc reporting. It utilizes UI framework components for its GUI construction and communicates with the application server via the web services layer.
  6. API access is also provided by Web Services for partner applications to leverage: This is the application layer that that provides a set of RESTful web services that control the GUI behavior and that control access to the persistent data and the data that is cached on the data fabric.
  7. The transactions are taken through the pipeline of enrichment and the profiles of customers are stored in HBase. .
  8. The core data processing platform is then based on a datalake pattern which has been covered in this blog before. It includes the following pattern of processing.
    1. Data is ingested real time into a HBase database (which uses HDFS as the underlying storage layer). Tables are designed in HBase to store the profile of a trade and it’s lifecycle.
    2. Producers are authenticated at the point of ingest.
    3. Once the data has been ingested into HDFS, it is taken through a pipeline of processing (L0 to L3) as depicted in the below blogpost.


  9. Speed Layer: The computational grid that makes up the Speed layer can be a distributed in memory data fabric like Infinispan or GemFire, or a computation process can be overlaid directly onto a stateful data fabric technology like Spark or GemFire. The choice is dependent of the language choices that have been made in building the other key analytic libraries. If multiple language bindings are required (e.g. C# & Java) then the data fabric will typically be a different product than the Grid.

Data Science for Customer 360

 Consider the following usecases that are all covered under Customer 360 –

  1. The ability to segment customers into categories based on granular data attributes
  2. Improve customer targeting for new promotions & increasing acquisition rate
  3. Increasing cross sell and upsell rates
  4. Understanding influencers among customer segments & helping these net promoters recommend products to other customers
  5. Performing market basket analysis of what products/services are typically purchased together
  6. Understanding customer risk profiles
  7. Creating realtime views of customer lifetime value (CLV)
  8. Reducing customer attrition

The obvious capability that underlies all of these is Data Science. Thus, Predictive Analytics is the key compelling paradigm that enables the buildout of the dynamic Customer 360.

The Predictive Analytics workflow always starts with a business problem in mind. Examples of these would be “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns – which are denoted by x,y or z characteristics” or “Detect realtime fraud in credit card transactions.” or “Perform certain algorithms based on the predictions”. In usecases like these, the goal of the data science process is to be able to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business is involved to setup easy and intuitive visualization to present the results. In the machine learning process, an entire spectrum of algorithms can be tried to solve such business problems.

A lot of times, business groups working on Customer 360 projects have a hard time explaining what they would like to see – both data and the visualization. In such cases, a prototype makes things way more easy from a requirements gathering standpoint.  Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) which comprise the execution of the business challenge.  They spend a lot of time in the process of collating the data (from Oracle, DB2, Mainframe, Greenplum, Excel sheets, External datasets etc). The cleanup process involves fixing a lot of missing values, corrupted data elements, formatting fields that indicate time and date etc.

The Data Scientist working with the business needs to determine how much of this raw data is useful and how much of it needs to be massaged to create a Customer 360 view. Some of this data needs to be extrapolated to form the features using formulas – so that a model can be created. The models created often involve using languages such as R and Python.

Feature engineering takes in business features in the form of feature vectors and creates predictive features from them. The Data Scientist takes the raw features and creates a model using a mix of various algorithms. Once the model has been repeatedly tested for accuracy and performance, it is typically deployed as a service.

The transformation phase involves writing code to be able to to join up like elements so that a single client’s complete dataset is gathered in the Data Lake from a raw features standpoint.  If more data is obtained as the development cycle is underway,  the Data Science team has no option but to go back & redo the whole process.

Models as a Service (MaaS) is the Data Science counterpart to Software as a Service.The MaaS takes in business variables (often hundreds of them as inputs) and provides as output business decisions/intelligence, measurements, visualizations that augment decision support systems.

Once these models are deployed and updated nightly based on their performance – the serving layer takes advantage of them to drive real time 360 decisioning.

To Sum Up…

In this short series we have discussed that customers and data about their history, preferences, patterns of behavior, aspirations etc are the most important corporate asset. Big Data technology and advances made in data storage, processing and analytics can help architect a dynamic Single View that can help maximize competitive advantage across every industry vertical.

The Definitive Reference Architecture for Market Surveillance (CAT, UMIR and MiFiD II) in Capital Markets..

We have discussed the topic of market surveillance reporting to some depth in previous blogs. e.g.http://www.vamsitalkstech.com/?p=2984.  Over the last decade, Global Financial Markets have embraced the high speed of electronic trading. This trend has only accelerated with the concomitant explosion in trading volumes. The diverse range of instruments & the proliferation of trading venues pose massive regulatory challenges in the area of market conduct supervision and abuse prevention. Banks, Broker dealers, Exchanges and other market participants across the globe are now shelling out millions of dollars in fines for failure to accurately report on market abuse violations. In response to this complex world of high volume & low touch electronic trading, global capital markets regulators have been hard at work across different jurisdictions & global hubs e.g. the FINRA in the US, the IROC in Canada and the ESMA in the European Union. Regulators have created extensive reporting regimes for surveillance with a view to detecting suspicious patterns of trade behavior (e.g, dumping, quote stuffing & non bonafide fake orders etc). The intent to increase market transparency on both the buy and the sell side. Based on the scrutiny Capital Markets players are under, a Big Data Analytics based architecture has become a “must-have” to ensure timely & accurate compliance with these mandates. This blog attempts to discuss such a reference architecture.

Business Technology Requirements for Market Surveillance..

The business requirements for the Surveillance architecture are covered at the below link in more detail but are reproduced below in a concise fashion.

A POV on European Banking Regulation.. MAR, MiFiD II et al

Some of the key business requirements that can be distilled from regulatory mandates include the below:

  • Store heterogeneous data – Both MiFiD II and MAR mandate the need to perform trade monitoring & analysis on not just real time data but also historical data spanning a few years. Among others this will include data feeds from a range of business systems – trade data, eComms, aComms, valuation & position data, order management systems, position management systems, reference data, rates, market data, client data, front, middle & back office, data, voice, chat & other internal communications etc. To sum up, the ability to store a range of cross asset (almost all kinds of instruments), cross format (structured & unstructured including voice), cross venue (exchange, OTC etc) trading data with a higher degree of granularity – is key.
  • Data Auditing – Such stored data needs to be fully auditable for 5 years. This implies not just being able to store it but also putting in place capabilities in place to ensure  strict governance & audit trail capabilities.
  • Manage a huge volume increase in data storage requirements (5+ years) due to extensive Record keeping requirements
  • Perform Realtime Surveillance & Monitoring of data – Once data is collected,  normalized & segmented, it will need to support realtime monitoring of data (around 5 seconds) to ensure that every trade can be tracked through it’s lifecycle. Detecting patterns that could perform surveillance for market abuse and monitor for best execution are key.
  • Business Rules  – Core logic that deals with identifying some of the above trade patterns are created using business rules. Business Rules have been covered in various areas in the blog but they primarily work based on an IF..THEN..ELSE construct.
  • Machine Learning & Predictive Analytics – A variety of supervised ad unsupervised learning approaches can be used to perform extensive Behavioral modeling & Segmentation to discover transactions behavior with a view to identifying behavioral patterns of traders & any outlier behaviors that connote potential regulatory violations.
  • A Single View of an Institutional Client- From the firm’s standpoint, it would be very useful to have a single view capability for clients that shows all of their positions across multiple desks, risk position, KYC score etc.

A Reference Architecture for Market Surveillance ..

This reference architecture aims to provide generic guidance to banking Business IT Architects building solutions in the realm of Market & Trade Surveillance. This supports a host of hugely important global reg reporting mandates – CAT, MiFiD II, MAR etc that Capital Markets need to comply with. While the concepts discussed in this solution architecture discussed are definitely Big Data oriented, they are largely agnostic to any cloud implementation – private, public or hybrid.

A Market Surveillance system needs to include both real time surveillance of trading activity as well as a retrospective (batch oriented) analysis component. The real time component includes the ability to perform realtime calculations (concerning thresholds, breached limits etc). real time queries with the goal of triggering alerts. Both these kinds of analytics span structured and unstructured data sources. For the batch component, the analytics involve data queries, simple to advanced statistics (min, max, avg, std deviation, sorting, binning, segmentation) to running data science models involving text analysis & search etc.

The system needs to process tens of millions to billions of events in a trading window while providing highest uptime guarantees. Batch analysis is always running in the background.

A Hadoop distribution that includes components such as Kafka, HBase and near real time components such as Storm & Spark Streaming provide a good fit for a responsive architecture. Apache NiFi with its ability to ingest data from a range of sources is preferred for it’s ability to support complex data routing, transformation, and system mediation logic in a complex event processing architecture. The capabilities of Hortonworks Data Flow (the enterprise version of Apache NiFi) is covered in the below blogpost in much detail.

Use Hortonworks Data Flow (HDF) To Connect The Dots In Financial Services..(3/3)

A Quick Note on Data Ingestion..

Data volumes in the area of Regulatory reporting can be huge to insanely massive. For instance, at large banks, they can go up to 100s of millions of transactions a day. At market venues such as stock exchanges, they easily enter into the hundreds of billions of messages every trading day. However the data itself is extremely powerful & is really business gold in terms of allowing banks to not just file mundane reg reports but also to perform critical line of business processes such as Single View of  Customer, Order Book Analysis, TCA (Transaction Cost Analysis), Algo Backtesting, Price Creation Analysis etc. The architecture thus needs to support multiple ways of storage, analysis and reporting ranging from compliance reporting to data scientists to business intelligence.

Real time processing in this proposed architecture are powered by Apache NiFi. There are five important reasons for this decision – 

  • First of all, complex rules can be defined in NiFi in a very flexible manner. As an example, one can execute SQL queries in processor A against incoming data from any source (data that isnt from a relational databases but JSON, Avro etc.) and  then route different results to different downstream processors based on the needs for processing while enriching it. E.g. Processor A could be event driven and if any data is being routed there, a field can be added, or an alert sent to XYZ. Essentially this can be very complex, equivalent to a nested rules engine so to speak. 
  • From a Throughput standpoint, a single NiFi node can typically handle somewhere between 50MB/s to 150MB/s depending on your hardware spec and data structure. Assuming 100-500 kbytes of average messages, for a throughput of 600MB/s, the architecture can be sized to about 5-10 NiFi nodes. It is important to note that performance latency of inbound message processing depends on the network, could be extremely small. Under the hood, you are sending data from source to NIfi node (disk), extract some attributes in memory to process, and deliver to the target system.
  • Data quality can be handled via the aforementioned “nested rules engine” approach, consisting of multiple NiFi processors. One can even embed an entire rules engine into a single processor. Similarly, you can define simple authentication rules at the event level. For instance, if Field A = English, route the message to an “authenticated” relationship; otherwise send it to an “unauthenticated” relationship.

  • One of the corner stones in NiFi is called “Data Provenance“, allowing you to have end to end traceability. Not only can the event lifecycle of trade data be traced but you can also track the time at which it happened & the user role who made the change and metadata around why did it happen.

  • Security – NiFi enables authentication at ingest. One can authenticate data via the rules defined in NiFi, or leverage target system authentication which is implemented at processor level. For example, the PutHDFS processor supports kerberized HDFS, the same applies for Solr and so on.

Overall Processing flow..

The below illustration shows the high-level conceptual architecture. The architecture is composed of core platform services and application-level components to facilitate the processing needs across three major areas of a typical surveillance reporting solution:

  • Connectivity to a range of trade data sources
  • Data processing, transformation & analytics
  • Visualization and business connectivity
Reference Architecture for Market Surveillance Reg Reporting – CAT, MAR,MiFiD II et al

The overall processing of data follows the order shown below and depicted in the diagram below –

  1. Data Production – Data related to Trades and their lifecycle is produced from a range of business systems. These data feeds from a range of business systems (including but not limited to) – trade data, valuation & position data, order management systems, position management systems, reference data, rates, market data, client data, front, middle & back office, data, voice, chat & other internal communications etc.
  2. Data Ingestion – Data produced from the the above layer is ingested using Apache NiFi from a range of sources described above. Data can also be filtered and alerts can be setup based on complex event logic. For time series data support HBase can be leveraged along with OpenTSDB. For CEP requirements, such as sliding windows and complex operators, NiFi can be leveraged along with Kafka and Storm pipeline.  Using NiFi will make the process easier to load data into the data lake while applying guarantees around the delivery itself.  Data can be streamed in real time as it is created in the feeder systems. Data is also loaded at end of the trading day based on the P&L sign off and the end of day close processes.  The majority of the data will be fed in from Book of Record Trading systems as well as from market data providers.
  3. As trade and other data is ingested into the data lake, it is important to note that the route in which certain streams are processed will differ from how other streams are processed. Thus the ingest architecture needs to support multiple types of processing ranging from in memory processing, intermediate transformation processing on certain data streams to produce a different representation of the stream. This is where NiFi adds critical support in not just handling a huge transaction throughput but also enabling “on the fly processing” of data in pipelines. As mentioned, NiFi does this via the concept of “processors”.
  4. The core data processing platform is then based on a datalake pattern which has been covered in this blog before. It includes the following pattern of processing.
    1. Data is ingested real time into a HBase database (which uses HDFS as the underlying storage layer). Tables are designed in HBase to store the profile of a trade and it’s lifecycle.
    2. Producers are authenticated at the point of ingest.
    3. Once the data has been ingested into HDFS, it is taken through a pipeline of processing (L0 to L3) as depicted in the below blogpost.


    4. Historical data (defined as T+1) once in the HDFS tier is taken through layers of processing as discussed above. One of the key areas of processing is to run machine learning on the data to discover any hidden patterns in the trades themselves. Patterns that can connote a range of suspicious behavior. Most surveillance applications are based on a search for data that breaches thresholds and seek to match sell & buy orders. The idea is that when these rules are breached, alerts are then generated for compliance officers to conduct further investigation. However this method falls short with complex types of market abuse.A range of supervised learning techniques can then be applied on data such as creating a behavioral profile of different kinds of traders (for instance junior and senior) by classifying & then scoring them based on their likelihood to commit fraud. Thus a range of Surveillance Analytics can be performed on the data. Apache Spark, is highly recommended for near realtime processing not only due to its high performance characteristics but also due to its native support for graph analytics and machine learning – both of which are critical to surveillance reporting.For a deeper look at data science, I recommend the below post.


    5. The other important value driver in deploying Data Science is to perform Advanced Transaction Monitoring Intelligence.  The core idea is to get years worth of Trade data in one location (i.e the datalake) & then applying  unsupervised learning to glean patterns in those transactions. The goal is then to identify profiles of actors with the intent of feeding it into existing downstream surveillance & TM systems.
    6. This knowledge can then be used to constantly learn transaction behavior for similar traders. This can be a very important capability in detecting fraud in traders, customer accounts and instruments.Some of the usecases are –
      • Profile trading activity of individuals with similar traits (types of customers, trading desks & instruments, geographical areas of operations etc.) to perform Know Your Trader
      • Segment traders by similar experience levels and behavior
      • Understand common fraudulent behavior typologies (e.g. spoofing) and clustering such (malicious) trading activities by trader, instrument and volume etc. The goal being to raise appropriate downstream investigation case management system
      • Using advanced data processing techniques like Natural Language Processing, constantly analyze electronic communications and join them up with trade data sources to both detect under the radar activity but also to keep the false positive rate low.
    7. Graph Database – Given that most kinds of trading fraud happens in groups of actors – traders acting in collusion with  verification & compliance – the ability to view complex relationships of interactions and the strength of those interactions can be a significant monitoring capability
    8. Grid Layer – To improve performance, I propose the usage of a distributed in memory data fabric like JBOSS DataGrid or Pivotal GemFire. This can aid in two ways –

      a. Help with fast lookup of data elements by the visualization layer
      b. Help perform fast computation process by overlaying a framework like Spark or MapReduce directly onto a stateful data fabric.

      The choice of tools here is dependent of the language choices that have been made in building the pricing and risk analytic libraries across the Bank. If multiple language bindings are required (e.g. C# & Java) then the data fabric will typically be a different product than the Grid.

      Data Visualization…

      The visualization solution chose shouldI enable the quick creation of interactive dashboards that provide KPIs and other important business metrics from a process monitoring standpoint. Various levels of dashboard need to be created ranging from compliance officer toolboxes, executive dashboard to help identify trends and discover valuable insights.

      Compliance Officer Toolbox (Courtesy: Arcadia Data)

      Additionally, the visualization layer shall provide

      a) A single view of Trader or Trade or Instrument or Entity

      b) Investigative workbench with Case Management capability

      c) The ability follow the lifecycle of a trade

      d) The ability to perform ad hoc queries over multiple attributes

      e) Activity correlation across historical and current data sets

      f) Alerting on specific metrics and KPIs

      To Sum Up…

      The solution architecture described in this blogpost is designed with peaceful enterprise co-existence in mind. In the sense, it interacts and is also integrated with a range of BORT systems and other enterprise systems such as ERP, CRM, legacy surveillance systems. This includes all and any other line of business solutions that typically exist as shared enterprise resources (such as CRM or ERP systems or other line-of-business solutions).

Design and Architecture of A Robo-Advisor Platform..(3/3)

This three part series explores the automated investment management or the “Robo-advisor” (RA) movement. The first post in this series @- http://www.vamsitalkstech.com/?p=2329 – discussed how Wealth Management has been an area largely untouched by automation as far as the front office is concerned. As a result, automated investment vehicles have largely begun changing that trend and they helping create a variety of business models in the industry esp those catering to the Millenial Mass Affluent Segment. The second post @- http://www.vamsitalkstech.com/?p=2418  focused on the overall business model & main functions of a Robo-Advisor (RA). This third and final post covers a generic technology architecture for a RA platform.

Business Requirements for a Robo-Advisor (RA) Platform…

Some of the key business requirements of a RA platform that confer it advantages as compared to the manual/human driven style of investing are:

  • Collect Individual Client Data – RA Platforms need to offer a high degree of customization from the standpoint of an individual investor. This means an ability to provide a preferably mobile and web interface to capture detailed customer financial background, existing investments as well as any historical data regarding customer segments etc.
  • Client Segmentation – Clients are to be segmented  across granular segments as opposed to the traditional asset based methodology (e.g mass affluent, high net worth, ultra high net worth etc).
  • Algorithm Based Investment Allocation – Once the client data is collected,  normalized & segmented –  a variety of algorithms are applied to the data to classify the client’s overall risk profile and an investment portfolio is allocated based on those requirements. Appropriate securities are purchased as we will discuss in the below sections.
  • Portfolio Rebalancing  – The client’s portfolio is rebalanced appropriately depending on life event changes and market movements.
  • Tax Loss Harvesting – Tax-loss harvesting is the mechanism of selling securities that have a loss associated with them. By doing so or by taking  a loss, the idea is that that client can offset taxes on both gains and income. The sold securities are replaced by similar securities by the RA platform thus maintaining the optimal investment mix.
  • A Single View of a Client’s Financial History- From the WM firm’s standpoint, it would be very useful to have a single view capability for a RA client that shows all of their accounts, interactions & preferences in one view.

User Interface Requirements for a Robo-Advisor (RA) Platform…

Once a customer logs in using any of the digital channels supported (e.g. Mobile, eBanking, Phone etc)  – they are presented with a single view of all their accounts. This view has a few critical areas – Summary View (showing an aggregated view of their financial picture), the Transfer View (allowing one to transfer funds across accounts with other providers).

The Summary View lists the below

  • Demographic info: Customer name, address, age
  • Relationships: customer rating influence, connections, associations across client groups
  • Current activity: financial products, account interactions, any burning customer issues, payments missed etc
  • Customer Journey Graph: which products or services they are associated with since the time they became a customer first etc,

Depending on the clients risk tolerance and investment horizon, the weighted allocation of investments across these categories will vary. To illustrate this, a Model Portfolio and an example are shown below.

Algorithms for a Robo-Advisor (RA) Platform…

There are a variety of algorithmic approaches that could be taken to building out an RA platform. However the common feature of all of these is to –

  • Leverage data science & statistical modeling to automatically allocate client wealth across different asset classes (such as domestic/foreign stocks, bonds & real estate related securities) to automatically rebalance portfolio positions based on changing market conditions or client preferences. These investment decisions are also made based on detailed behavioral understanding of a client’s financial journey metrics – Age, Risk Appetite & other related information. 
  • A mixture of different algorithms can be used such as Modern Portfolio Theory (MPT), Capital Asset Pricing Model (CAPM), the Black Litterman Model, the Fama-French etc. These are used to allocate assets as well as to adjust positions based on market movements and conditions.
  • RA platforms also provide 24×7 tracking of market movements to use that to track rebalancing decisions from not just a portfolio standpoint but also from a taxation standpoint.

Model Portfolios…

  1. Equity  

             A) US Domestic Stock – Large Cap, Medium Cap , Small Cap, Dividend Stocks 

             B) Foreign Stock – Emerging Markets, Developed Markets

       2. Fixed Income

             A) Developed Market Bonds 

             B) US Bonds

             C) International Bonds

             D) Emerging Markets Bonds

      3. Other 

             A) Real Estate  

             B) Currencies

             C) Gold and Precious Metals

             D) Commodities

       4. Cash

Sample Portfolios – for an aggressive investor…

  1. Equity  – 85%

             A) US Domestic Stock (50%) – Large Cap – 30%, Medium Cap – 10% , Small Cap – 10%, Dividend Stocks – 0%

             B) Foreign Stock – (35%) –  Emerging Markets – 18%, Developed Markets – 17% 

       2. Fixed Income – 5%

             A) Developed Market Bonds  – 2%

             B) US Bonds – 1%

             C) International Bonds – 1%

             D) Emerging Markets Bonds – 1%

      3. Other – 5%

             A) Real Estate  – 3%

             B) Currencies – 0%

             C) Gold and Precious Metals – 0%

             D) Commodities – 2%

       4. Cash – 5%

Technology Requirements for a Robo-Advisor (RA) Platform…

An intelligent RA platform has a few core technology requirements (based on the above business requirements).

  1. A Single Data Repository – A shared data repository called a Data Lake is created, that can capture every bit of client data (explained in more detail below) as well as external data. The RA datalake provides more visibility into all data to a variety of different stakeholders. Wealth Advisors access processed data to view client accounts etc. Clients can access their own detailed positions,account balances etc. The Risk group accesses this shared data lake to processes more position, execution and balance data.  Data Scientists (or Quants) who develop models for the RA platform also access this data to perform analysis on fresh data (from the current workday) or on historical data. All historical data is available for at least five years—much longer than before. Moreover, the Hadoop platform enables ingest of data across a range of systems despite their having disparate data definitions and infrastructures. All the data that pertains to trade decisions and lifecycle needs to be made resident in a general enterprise storage pool that is run on the HDFS (Hadoop Distributed Filesystem) or similar Cloud based filesystem. This repository is augmented by incremental feeds with intra-day trading activity data that will be streamed in using technologies like Sqoop, Kafka and Storm.
  2. Customer Data Collection – Existing Financial Data across the below categories is collected & aggregated into the data lake. This data ranges from Customer Data, Reference Data, Market Data & other Client communications. All of this data, can be ingested using a API or pulled into the lake from a relational system using connectors supplied in the RA Data Platform. Examples of data collected include – Customer’s existing Brokerage accounts, Customer’s Savings Accounts, Behavioral Finance Suveys and Questionnaires etc etc. The RA Data Lake stores all internal & external data.
  3. Algorithms – The core of the RA Platform are data science algos. Whatever algorithms are used – a few critical workflows are common to them. The first is Asset Allocation is to take the customers input in the “ADVICE” tab for each type of account and to tailor the portfolio based on the input. The others include Portfolio Rebalancing and Tax Loss Harvesting.
  4. The RA platform should be able to store market data across years both from a macro and from an individual portfolio standpoint so that several key risk measures such as volatility (e.g. position risk, any residual risk and market risk), Beta, and R-Squared – can be calculated at multiple levels.  This for individual securities, a specified index, and for the client portfolio as a whole.


                      Illustration: Architecture of a Robo-Advisor (RA) Platform 

The overall logical flow of data in the system –

  • Information sources are depicted at the left. These encompass a variety of institutional, system and human actors potentially sending thousands of real time messages per hour or by sending over batch feeds.
  • A highly scalable messaging system to help bring these feeds into the RA Platform architecture as well as normalize them and send them in for further processing. Apache Kafka is a good choice for this tier. Realtime data is published by a range of systems over Kafka queues. Each of the transactions could potentially include 100s of attributes that can be analyzed in real time to detect business patterns.  We leverage Kafka integration with Apache Storm to read one value at a time and perform some kind of storage like persist the data into a HBase cluster.In a modern data architecture built on Apache Hadoop, Kafka ( a fast, scalable and durable message broker) works in combination with Storm, HBase (and Spark) for real-time analysis and rendering of streaming data. 
  • Trade data is thus streamed into the platform (on a T+1 basis), which thus ingests, collects, transforms and analyzes core information in real time. The analysis can be both simple and complex event processing & based on pre-existing rules that can be defined in a rules engine, which is invoked with Apache Storm. A Complex Event Processing (CEP) tier can process these feeds at scale to understand relationships among them; where the relationships among these events are defined by business owners in a non technical or by developers in a technical language. Apache Storm integrates with Kafka to process incoming data. 
  • For Real time or Batch Analytics, Apache HBase provides near real-time, random read and write access to tables (or ‘maps’) storing billions of rows and millions of columns. In this case once we store this rapidly and continuously growing dataset from the information producers, we are able  to do perform super fast lookup for analytics irrespective of the data size.
  • Data that has analytic relevance and needs to be kept for offline or batch processing can be stored using the Hadoop Distributed Filesystem (HDFS) or an equivalent filesystem such as Amazon S3 or EMC Isilon or Red Hat Gluster. The idea to deploy Hadoop oriented workloads (MapReduce, or, Machine Learning) directly on the data layer. This is done to perform analytics on small, medium or massive data volumes over a period of time. Historical data can be fed into Machine Learning models created above and commingled with streaming data as discussed in step 1.
  • Horizontal scale-out (read Cloud based IaaS) is preferred as a deployment approach as this helps the architecture scale linearly as the loads placed on the system increase over time. This approach enables the Market Surveillance engine to distribute the load dynamically across a cluster of cloud based servers based on trade data volumes.
  • It is recommended to take an incremental approach to building the RA platform, once all data resides in a general enterprise storage pool and makes the data accessible to many analytical workloads including Trade Surveillance, Risk, Compliance, etc. A shared data repository across multiple lines of business provides more visibility into all intra-day trading activities. Data can be also fed into downstream systems in a seamless manner using technologies like SQOOP, Kafka and Storm. The results of the processing and queries can be exported in various data formats, a simple CSV/txt format or more optimized binary formats, json formats, or you can plug in custom SERDE for custom formats. Additionally, with HIVE or HBASE, data within HDFS can be queried via standard SQL using JDBC or ODBC. The results will be in the form of standard relational DB data types (e.g. String, Date, Numeric, Boolean). Finally, REST APIs in HDP natively support both JSON and XML output by default.
  • Operational data across a bunch of asset classes, risk types and geographies is thus available to investment analysts during the entire trading window when markets are still open, enabling them to reduce risk of that day’s trading activities. The specific advantages to this approach are two-fold: Existing architectures typically are only able to hold a limited set of asset classes within a given system. This means that the data is only assembled for risk processing at the end of the day. In addition, historical data is often not available in sufficient detail. Hadoop accelerates a firm’s speed-to-analytics and also extends its data retention timeline
  • Apache Atlas is used to provide Data Governance capabilities in the platform that use both prescriptive and forensic models, which are enriched by a given businesses data taxonomy and metadata.  This allows for tagging of trade data  between the different businesses data views, which is a key requirement for good data governance and reporting. Atlas also provides audit trail management as data is processed in a pipeline in the lake
  • Another important capability that Big Data/Hadoop can provide is the establishment and adoption of a Lightweight Entity ID service – which aids dramatically in the holistic viewing & audit tracking of trades. The service will consist of entity assignment for both institutional and individual traders. The goal here is to get each target institution to propagate the Entity ID back into their trade booking and execution systems, then transaction data will flow into the lake with this ID attached providing a way to do Client 360.
  • Output data elements can be written out to HDFS, and managed by HBase. From here, reports and visualizations can easily be constructed. One can optionally layer in search and/or workflow engines to present the right data to the right business user at the right time.  


As one can see clearly, though automated investing methods are still in early stages of maturity – they hold out a tremendous amount of promise. As they are unmistakably the next big trend in the WM industry industry players should begin developing such capabilities.

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

Ultimately, the cloud is the latest example of Schumpeterian creative destruction: creating wealth for those who exploit it; and leading to the demise of those that don’t.” – Joe Weiman author of Cloudonomics: The Business Value of Cloud Computing


The  Cloud As a Venue for Digital Workloads…

As 2016 draws to a close, it can safely be said that no industry leader questions the existence of the new Digital Economy and the fact that every firm out there needs to create a digital strategy. Myriad organizations are taking serious business steps to making their platforms highly customer-centric via a renewed operational metrics focus. They are also working on creating new business models using their Analytics investments. Examples of these verticals include Banking, Insurance, Telecom, Healthcare, Energy etc.

As a general trend, the Digital Economy brings immense opportunities while exposing firms to risks as well. Customers now demanding highly contextual products, services and experiences – all accessible via an easy API (Application Programming Interfaces).

Big Data Analytics (BDA) software revenues will grow from nearly $122B in 2015 to more than $187B in 2019 – according to Forbes [1].  At the same time, it is clear that exploding data generation across the global economy has become a clear & present business phenomenon. Data volumes are rapidly expanding across industries. However, while the production of data itself that has increased but it is also driving the need for organizations to derive business value from it. As IT leaders know well, digital capabilities need low cost yet massively scalable & agile information delivery platforms – which only Cloud Computing can provide.

For a more detailed technical overview- please visit below link.


Big Data & Big Data Analytics drive consumer interactions.. 

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just provide engaging visualization but also to personalize services clients care about across multiple channels of interaction. The only way to attain digital success is to understand your customers at a micro level while constantly making strategic decisions on your offerings to the market. Big Data has become the catalyst in this massive disruption as it can help business in any vertical solve their need to understand their customers better & perceive trends before the competition does. Big Data thus provides the foundational  platform for successful business platforms.

The three key areas where Big Data & Cloud Computing intersect are – 

  • Data Science and Exploration
  • ETL, Data Backups and Data Preparation
  • Analytics and Reporting

Big Data drives business usecases in Digital in myriad ways – key examples include  –  

  1. Obtaining a realtime Single View of an entity (typically a customer across multiple channels, product silos & geographies)
  2. Customer Segmentation by helping businesses understand their customers down to the individual micro level as well as at a segment level
  3. Customer sentiment analysis by combining internal organizational data, clickstream data, sentiment analysis with structured sales history to provide a clear view into consumer behavior.
  4. Product Recommendation engines which provide compelling personal product recommendations by mining realtime consumer sentiment, product affinity information with historical data.
  5. Market Basket Analysis, observing consumer purchase history and enriching this data with social media, web activity, and community sentiment regarding past purchase and future buying trends.

Further, Digital implies the need for sophisticated, multifactor business analytics that need to be performed in near real time on gigantic data volumes. The only deployment paradigm capable of handling such needs is Cloud Computing – whether public or private. Cloud was initially touted as a platform to rapidly provision compute resources. Now with the advent of Digital technologies, the Cloud & Big Data will combine to process & store all this information.  According to the IDC , by 2020 spending on Cloud based Big Data Analytics will outpace on-premise by a factor of 4.5. [2]

Intelligent Middleware provides Digital Agility.. 

Digital Applications are applications modular, flexible and responsive to a variety of access methods – mobile & non mobile. These applications are also highly process driven and support the highest degree of automation. The need of the hour is to provide enterprise architecture capabilities around designing flexible digital platforms that are built around efficient use of data, speed, agility and a service oriented architecture. The choice of open source is key as it allows for a modular and flexible architecture that can be modified and adopted in a phased manner – as you will shortly see.

The intention in adopting a SOA (or even a microservices) architecture for Digital capabilities is to allow lines of business an ability to incrementally plug in lightweight business services like customer on-boarding, electronic patient records, performance measurement, trade surveillance, risk analytics, claims management etc.

Intelligent Middleware adds significant value in six specific areas –

  1. Supports a high degree of Process Automation & Orchestration thus enabling the rapid conversion of paper based business processes to a true digital form in a manner that lends itself to continuous improvement & optimization
  2. Business Rules help by adding a high degree of business flexibility & responsiveness
  3. Native Mobile Applications  enables platforms to support a range of devices & consumer behavior across those front ends
  4. Platforms As a Service engines which enable rapid application & business capability development across a range of runtimes and container paradigms
  5. Business Process Integration engines which enable rapid application & business capability development
  6. Middleware brings the notion of DevOps into the equation. Digital projects bring several technology & culture challenges which can be solved by a greater degree of collaboration, continuous development cycles & new toolchains without giving up proven integration with existing (or legacy)systems.

Intelligent Middleware not only enables Automation & Orchestration but also provides an assembly environment to string different (micro)services together. Finally, it also enables less technical analysts to drive application lifecycle as much as possible.

Further, Digital business projects call out for mobile native applications – which a forward looking middleware stack will support.Middleware is a key component for driving innovation and improving operational efficiency.

Five Key Business Drivers for combining Big Data, Intelligent Middleware & the Cloud…

The key benefits of combining the above paradigms to create new Digital Applications are –

  • Enable Elastic Scalability Across the Digital Stack
    Cloud computing can handle the storage and processing of any amount of data & any kind of data.This calls for the collection & curation of data from dynamic and highly distributed sources such as consumer transactions, B2B interactions, machines such as ATM’s & geo location devices, click streams, social media feeds, server & application log files and multimedia content such as videos etc. It needs to be noted that data volumes here consist of multi-varied formats, differing schemas, transport protocols and velocities. Cloud computing provides the underlying elastic foundation to analyze these datasets.
  • Support Polyglot Development, Data Science & Visualization
    Cloud technologies are polyglot in nature. Developers can choose from a range of programming languages (Java, Python, R, Scala and C# etc) and development frameworks (such as Spark and Storm). Cloud offerings also enable data visualization using a range of tools from Excel to BI Platforms.
  • Reduce Time to Market for Digital Business Capabilities
    Enterprises can avoid time consuming installation, setup & other upfront procedures. consuming  can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. On the same vein, even big data analytics should be able to support self service across the lifecycle – from data acquisition, preparation, analysis & visualization.
  • Support a multitude of Deployment Options – Private/Public/Hybrid Cloud 
    A range of scenarios for product development, testing, deployment, backup or cloudbursting are efficiently supported in pursuit of cost & flexibility goals.
  • Fill the Talent Gap
    Open Source technology is the common thread across Cloud, Big Data and Middleware. The hope is that the ubiquity of open source will serve as a critical level in enabling the filling up of the IT-Business skills scarcity gap.

As opposed to building standalone or one-off business applications, a ‘Digital Platform Mindset’ is a more holistic approach capable of producing higher rates of adoption & thus revenues. Platforms abound in the web-scale world at shops like Apple, Facebook & Google etc. Digital Applications are constructed like lego blocks  and they reuse customer & interaction data to drive cross sell and up sell among different product lines. The key components here are to ensure that one starts off with products with high customer attachment & retention. While increasing brand value, it is key to ensure that customers & partners can also collaborate in the improvements in the various applications hosted on top of the platform.


[1] Forbes Roundup of Big Data Analytics (BDA) Report


[2] IDC FutureScape: Worldwide Big Data and Analytics 2016 Predictions

Can Your CIO Do Digital?

Business model innovation is the new contribution of IT”  — Werner Boeing, CIO, Roche Diagnostics

Digital Is Changing the Role of the Industry CIO…

A Motley crew of some what interrelated technologies – Cloud Computing, Big Data Platforms, Predictive Analytics & Mobile Applications are changing the enterprise IT landscape. The common paradigm that captures all of them is Digital. The immense business value of Digital technology no longer in question both from a customer as well as an enterprise standpoint. However, the Digital space calls for strong and visionary leadership both from a business & IT standpoint.

Business Boards and CXOs are now concerned about their organization’s overall level and maturity of digital investments. And the tangible business value in existing business operations– (e.g increasing sales & customer satisfaction, detecting fraud, driving down business & IT costs etc)-but also in helping finetune or create new business models by leveraging Digital paradigms. It is thus an increasingly accurate argument that smart applications & ecosystems built around Digitization will dictate enterprise success.

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous micro level interactions with global consumers/customers/clients/stockholders or patients depending on the vertical you operate in. Initially enterprises viewed Digital as a bolt-on or a fresh color of paint on an existing IT operation.

How did that change over the last five years?

Mobile applications first begun forcing the need for enterprise to begin supporting multiple channels of interaction with their consumers. We have seen how how exploding data generation across the global economy has become a clear & present business & IT phenomenon. Data volumes are rapidly expanding across industries. However, while the production of data by Mobile Applications that has increased but it is also driving the need for organizations to derive business value from it, using advanced techniques such as Data Science and Machine Learning. As a first step, this calls for the collection & curation of data from dynamic,  and highly distributed sources such as consumer transactions, B2B interactions, machines such as ATM’s & geo location devices, click streams, social media feeds, server & application log files and multimedia content such as videos etc – using Big Data. Often these workloads are run on servers hosted on an agile infrastructure such as a Public or Private Cloud.

As one can understand from the above paragraph, the Digital Age calls for a diverse set of fresh skills – both from IT leadership and the rank & file. The role of the Chief Information Officer (CIO) is thus metamorphosing from being an infrastructure service provider to being the overall organizational thought leader in the Digital Age.

The question is – Can Industry CIOs adapt?

The Classic CIO is a provider of IT Infrastructure services.. 

what_cios_think                                                Illustration: The Concerns of a CIO..

So what do CIOs typically think about nowadays?

  1. Keep the core stable and running so IT delivers minimal services to the business and disarm external competition
  2. Are parts of my business really startups and should they be treated as such and should they be kept away from the shackles of inflexible legacy IT? Do I need a digital strategy?
  3. What does the emergence of the 3rd platform (Cloud, Mobility,Social and Big Data) imply?
  4. Where can I show the value of expertise and IT to the money making lines of business?
  5. How can one do all the above while keeping track of Corporate and IT security?

 CIO’s who do not adapt are on the road to Irrelevance…

Where CIOs are being perceived as managing complex legacy systems, the new role of Chief Digital Officer (CDO) has gained currency. The idea that a parallel & more agile IT organization can be created and run to create an ecosystem of innovation & that the office of the CDO is the right place to drive these innovative applications.

Why is that?

  1. CIOs that cannot or that seem dis-engaged with creating innovation through IT are headed the way of the dodo. At the enterprise officer – CIO/CTO level, it becomes very obvious that more than ever “IT is not just a complementary function or a supplementary service but IT is the Business”. If that was merely something that we all paid lip-service to in the past, it is hard reality now. So it is not a case of which company can make the best widgets or has the fastest trading platforms or efficient electronic health records. It is whose enterprise IT can provide the best possible results within a given cost that will win. Its up to the CIOs to deliver and deliver in such a way that large established organizations can compete with upstarts who do not have the same kind of enterprise constraints & shackles.
  2. Innovation & information now follow an “outside in” model. As opposed to data and value being generated by internal functions (sales,engineering, customer fulfillment, core business processes etc) . Enterprise customers are beginning to now operate in what I like to think of as the new normal: entropy.  It’s these conditions that make it imperative for IT Leadership to reconsider their core business applications at the enterprise level. Does internal IT infrastructure need to look more like those of the internet giants?
  3. As a result of the above trends, CIOs are clearly now business level stakeholders more than ever. This means that they need to engage & understand their business at a deep level from an ecosystem and competitive standpoint. Those that cannot do it are neither very effective nor in those positions for long.
  4. Also,it is not merely enough to be a passive stakeholder, CIOs have to deliver on two very broad fronts. The first is to deliver core services (aka standardized functions) on time and at a reasonable cost. These are things like core banking systems, email, data backups etc. Ensuring smooth operation running transactional systems like ERP/business processing systems in manufacturing, decision support systems, classic IT infrastructure, claims management systems in Insurance and Billing systems in Healthcare. The systems that need to run to keep the business operations.The focus here is to deliver on these on time and within SLAs to increasingly demanding internal customers. Like running the NYC subway – no one praises you for keeping things humming day in and out but all hell breaks loose when the trains are nonoperational for any reason. A thankless task but one essentially needed to win the credibility with lines of business.
  5. The advent of public cloud means that internal IT no longer has a monopoly and a captive internal customer base even with core services. If one cannot compete with the likes of Amazon AWS or any of the SaaS based clouds that are mushrooming on a quarterly basis, you will find that soon enough you have to co-exist with Not-So-Shadow IT. The industry has seen enough back-office CIOs who are not perceived by their organizations as having a largely irrelevant role in the evolution of the larger enterprise.
  6. Despite the continued focus on running a strong core as the price of CIO admission to the internal strategic dances, transformation is starting to emerge as a key business driver and is making its way into the larger industry. It is no longer the province of Wall St trading shops or a Google or a Facebook. Innovation as in “adopt this strategy and reinvent your IT and change the business”. The operative word here is incremental rather than disruptive innovation. More on this key point later.
  7. Most rank and file IT personnel in general cannot really keep up with all the nomenclature of technology. For instance, a majority do not really understand umbrella concepts like Cloud, Mobility and Big Data. They know what these mean at a high level but the complex technology underpinnings, various projects & the finer nuances are largely lost on enterprise customers. There are two stark choices from a time perspective that face overworked IT personnel – a) Do you want to increase your value to your corporation by learning to speak the lingua franca of your business and by investing in those skills away from a traditional IT employee mindset? b) do you want to increase your IT depth in your area of expertise.The first makes one a valued collaborator and paves your way up within the chain, the second may definitely increase your marketability in the industry but it is not that easy to keep up. We find that an increasing number of employees choose the first path which creates interesting openings and arbitrage opportunities for other groups in the organization. The CIO needs to step up and be the internal change agent.


Enterprise wide business innovation will continue to be designed around the four key technologies  (Big Data, Cloud Computing, Technology & Platforms). Business Platforms created leveraging these technologies will create immense operational efficiency, better business models, increased relevance to customers and ultimately drive revenues. Such platforms will separate the visionaries, leaders from the laggards in the years to come. As often noticed, the keyword accompanying transformation is often digital. This means a renewed focus on making IT services appealing to millennial or the self service generation – be they customers or employees or partners. This really touches all areas of enterprise IT while leaving behind a significant impact on organizational culture.

This is the age of IT with no boundaries – the question is whether the role of the CIO will largely remain unscathed in the years to come.

A POV on the FRTB (Fundamental Review of the Trading Book)…

Regulatory Risk Management evolves…

The Basel Committee of supranational supervision was put in place to ensure the stability of the financial system. The Basel Accords are the frameworks that essentially govern the risk taking actions of a bank. To that end, minimum regulatory capital standards are introduced that banks must adhere to. The Bank of International Settlements (BIS) established  1930, is the world’s oldest international financial consortium. with 60+ member central banks, representing countries from around the world that together make up about 95% of world GDP. BIS stewards and maintains the Basel standards in conjunction with member banks.

The goal of Basel Committee and the Financial Stability Board (FSB) guidelines are to strengthen the regulation, supervision and risk management of the banking sector by improving risk management and governance. These have taken on an increased focus to ensure that a repeat of financial crisis 2008 comes to pass again. Basel III (building upon Basel I and Basel II) also sets new criteria for financial transparency and disclosure by banking institutions.

Basel III – the last prominent version of the Basel standards published in 2012 (named for the town of Basel in Switzerland where the committee meets) prescribes enhanced measures for capital & liquidity adequacy and were developed by the Basel Committee on Banking Supervision with voluntary worldwide applicability.  Basel III covers credit, market, and operational risks as well as liquidity risks. As this is known, BCBS 239 –  guidelines do not just apply to the G-SIBs (the Globally Systemically Important Banks) but also to the D-SIBs (Domestic Systemically Important Banks).Any important financial institution deemed ‘too big to fail” needs to work with the regulators to develop a “set of supervisory expectations” that would guide risk data aggregation and reporting.

Basel III & other Risk Management topics were covered in these previous posts – http://www.vamsitalkstech.com/?p=191 && http://www.vamsitalkstech.com/?p=667

Enter the FTRB (Fundamental Review of the Trading Book)…

In May 2012, the Basel Committee on Banking Supervision (BCBS) again issued a consultative document with an intention of revising the way capital was calculated for the trading book. These guidelines which can be found here in their final form [1] were repeatedly refined based on comments from various stakeholders & quantitative studies. In Jan 2016, a final version of this paper was released. These guidelines are now termed  the Fundamental Review of the Trading Book (FRTB) or unofficially as some industry watchers have termed – Basel IV. 

What is new with the FTRB …

The main changes the BCBS has made with the FRTB are – 

  1. Changed Measure of Market Risk – The FRTB proposes a fundamental change to the measure of market risk. Market Risk will now be calculated and reported via Expected Shortfall (ES) as the new standard measure as opposed to the venerated (& long standing) Value At Risk (VaR).  As opposed to the older method of VaR with a 99% confidence level, expected shortfall (ES) with a 97.5% confidence level is proposed. It is to be noted that for normal distributions, the two metrics should be the same however the ES is much superior at measuring the long tail. This is a recognition that in times of extreme economic stress, there is a tendency for multiple asset classes to move in unison. Consequently, under the ES method capital requirements are anticipated to be much higher.
  2. Model Creation & Approval – The FRTB also changes how models are approved & governed.  Banks that want to use the IMA (Internal Model Approach) need to pass  a set of rigorous tests so that they are not forced to used the Standard Rules approach (SA) for capital calculations. The fear is that the SA will increase capital requirements. The old IMA approach has now been revised and made more rigorous in a way that it enables supervisors to remove internal modeling permission for individual trading desks. This approach now enforces more consistent identification of material risk factors across banks, and constraints on hedging and diversification. All of this is now going to be done at a desk level instead of the entity level. FRTB moves the responsibility of showing compliant models, significant backtesting & PnL attribution to the desk level.
  3. Boundaries between the Regulatory Books – The FRTB also assigns explicit boundaries between the trading book (the instruments the bank intends to trade) and the bank book (the instruments held to maturity). These rules have been redefined in such a way that banks now have to contend with stringent rules for internal transfers between both. The regulatory motivation is to eliminate a given bank’s ability to designate individual positions as belonging to either book. Given the different accounting treatment for both, there is a feeling that bank’s were resorting to capital arbitrage with the goal of minimizing regulatory capital reserves. The FRTB also introduces more stringent reporting and data governance requirements for both which in conjunction with the well defined boundary between books. All of these changes should lead to a much better regulatory framework & also a revaluation of the structure of trading desks. 
  4. Increased Data Sufficiency and Quality – The FRTB regulation also introduces Non-Modellable risk factors (NMRF). Risk factors are non modellabe if certain aspects that pertain to the availability and sufficiency of the data are an issue . Thus with the NMRF, Banks now need increased data sufficiency and quality requirements that go into the model itself. This is a key point, the ramifications of which we will discuss in the next section.
  5. The FRTB also upgrades its standardized approach to data structuring – with a new standardized approach (SBA) which is more sensitive to various risk factors across different asset classes as compared to the Basel II SA. Regulators now determine the sensitivities in the data. Approvals will also be granted at the desk level rather than at the entity level.  The revised SA should provide a consistent way to measure risk across geographies and regions, giving regulatory a better way to compare and aggregate systemic risk. The sensitivities based approach should also allow banks to share a common infrastructure between the IMA approach and the SA approach. Thera are a set of buckets and risk factors that are prescribed by the regulator which instruments can then be mapped to.
  6. Models must be seeded with real and live transaction data – Fresh & current transactions will now need to be entered into the calculation of capital requirements as of the date on which they were conducted. Not only that, though reporting will take place at regular intervals, banks are now expected to manage market risks on a continuous basis -almost daily.
  7. Time Horizons for Calculation – There are also enhanced requirements for data granularity depending on the kind of asset. The FRTB does away with the generic 10 day time horizon for market variables in Basel II to time periods based on liquidity of these assets. It propose five different time horizons – 10 day, 20 day, 60 day, 120 day and 250 days.


                                 Illustration: FRTB designated horizons for market variables (src – [1])

To Sum Up the FRTB… 

The FRTB rules are now clear and they will have a profound effect on how market risk exposures are calculated. The FRTB clearly calls out the specific instruments in the trading book vs the banking book. With the new switch over to Expected Shortfall (ES) @ 97.5% over VaR @ 99% confidence levels – it should cause increased reserve requirements. Furthermore, the ES calculations will be done keeping liquidity considerations of the underlying instruments with a historical simulation approach ranging from 10 days to 250 days of stressed market conditions. Banks that use a pure IMA approach will now have to move to IMA plus the SA method.

The FRTB compels Banks to create unified teams from various departments – especially Risk, Finance, the Front Office (where trading desks sit) and Technology to address all of the above significant challenges of the regulation.

From a technology capabilities standpoint, the FRTB now presents banks with both a data volume, velocity and analysis challenge. Let us now examine the technology ramifications.

Technology Ramifications around the FRTB… 

The FRTB rules herald a clear shift in how IT architectures work across the Risk area and the Back office in general.

  1. The FRTB calls for a single source of data that pulls data across silos of the front office, trade data repositories, a range of BORT (Book of Record Transaction Systems) etc. With the FRTB, source data needs to be centralized and available in one location where every feeding application can trust it’s quality.
  2. With both the IMA and the SBA in the FRTB, many more detailed & granular data inputs (across desks & departments) need to be fed into the ES (Expected Shortfall) calculations from varying asset classes (Equity, Fixed Income, Forex, Commodities etc) across multiple scenarios. The calculator frameworks developed or enhanced for FRTB will need ready & easy access to realtime data feeds in addition to historical data. At the firm level, the data requirements and the calculation complexity will be even more higher as it needs to include the entire position book.

  3. The various time horizons called out also increase the need to run a full spectrum of analytics across many buckets. The analytics themselves will be more complex than before with multiple teams working on all of these areas. This calls out for standardization of the calculations themselves across the firm.

  4. Banks will have to also provide complete audit trails both for the data and the processes that worked on the data to provide these risk exposures. Data lineage, audit and tagging will be critical.

  5. The number of runs required for regulatory risk exposure calculations will dramatically go up under the new regime. The FRTB requires that each risk class be calculated separately from the whole set. Couple this with increased windows of calculations as discussed  in #3 above- means that more compute processing power and vectorization.

  6. FRTB also implies that from an analytics standpoint, a large number of scenarios on a large volume of data. Most Banks will need to standardize their libraries across the house. If Banks do not look to move to a Big Data Architecture, they will incur tens of millions of dollars in hardware spend.

The FRTB is the most pressing in a long list of Data Challenges facing Banks… 

The FRTB is yet another regulatory mandate that lays bare the data challenges facing every Bank. Current Regulatory Risk Architectures are based on traditional relational databases (RDBMS) architectures with 10’s of feeds from Core Banking Systems, Loan Data, Book Of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc),  Wire Data, Payment Data, Transaction Data etc. 

These data feeds are then tactically placed in memory caches or in enterprise data warehouses (EDW). Once the data has been extracted, it is transformed using a series of batch jobs which then prepare the data for Calculator Frameworks to which run the risk models on them. 

All of the above applications need access to medium to large amounts of data at the individual transaction Level. The Corporate Finance function within the Bank then makes end of day adjustments to reconcile all of this data up and these adjustments need to be cascaded back to the source systems down to the individual transaction or classes of transaction levels. 

These applications are then typically deployed on clusters of bare metal servers that are not particularly suited to portability, automated provisioning, patching & management. In short, nothing that can automatically be moved over at a moment’s notice. These applications also work on legacy proprietary technology platforms that do not lend themselves to flexible & a DevOps style of development.

Finally, there is always need for statistical frameworks to make adjustments to customer transactions that somehow need to get reflected back in the source systems. All of these frameworks need to have access to and an ability to work with terabtyes (TBs) of data.

Each of above mentioned risk work streams has corresponding data sets, schemas & event flows that they need to work with, with different temporal needs for reporting as some need to be run a few times in a day (e.g. Traded Credit Risk), some daily (e.g. Market Risk) and some end of the week (e.g Enterprise Credit Risk). 

One of the chief areas of concern is that the FRTB may require a complete rewrite of analytics libraries. Under the FRTB, front office libraries will need to do Enterprise Risk –  a large number of analytics on a vast amount of data. Front office models cannot make all the assumptions that enterprise risk can to price a portfolio accurately. Front office systems run a limited number of scenarios thus trading off timeliness for accuracy – as opposed to enterprise risk.

Most banks have stringent vetting processes in place and all the rewritten analytic assets will need to be passed through that. Every aspect of the math of the analytics needs to be passed through this vigorous process. All of this will add to compliance costs as vetting process costs typically cost multiples of the rewrite process. The FRTB has put in place stringent model validation standards along with hypothetical portfolios to benchmark these.

The FRTB also requires data lineage and audit capabilities for the data. Banks will need to establish visual representation of the overall process as data flows from the BORT systems to the reporting application. All data assets have to be catalogued and a thorough metadata management process instituted.

What Must Bank IT Do… 

Given all of the above data complexity and the need to adopt agile analytical methods  – what is the first step that enterprises must adopt?

There is a need for Banks to build a unified data architecture – one which can serve as a cross organizational repository of all desk level, department level and firm level data.

The Data Lake is an overarching data architecture pattern. Lets define the term first. A data lake is two things – a small or massive data storage repository and a data processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs“. Data Lake are created to ingest, transform, process, analyze & finally archive large amounts of any kind of data – structured, semistructured and unstructured data.

The Data Lake is not just a data storage layer but one that can allow different users (traders, risk managers, compliance etc) plug in calculators that work on data that spans intra day activity as well as data across years. Calculators can then be designed that can work on data with multiple runs to calculate Risk Weighted Assets (RWAs) across multiple calibration windows.

The below illustration is a depiction of goal is to create a cross company data-lake containing all asset data and compute applied to the data.


                              Illustration – Data Lake Architecture for FRTB Calculations

1) Data Ingestion: This encompasses creation of the L1 loaders to take in Trade, Position, Market, Loan, Securities Master, Netting  and Wire Transfer data etc across trading desks. Developing the ingestion portion will be the first step to realizing the overall architecture as timely data ingestion is a large part of the problem at most institutions. Part of this process includes understanding examples of a) data ingestion from the highest priority of systems b) apply the correct governance rules to the data. The goal is to create these loaders for versions of different systems (e.g Calypso 9.x) and to maintain it as part of the platform moving forward. The first step is to understand the range of Book of Record transaction systems (lending, payments and transactions) and the feeds they send out. The goal would be to create the mapping to a release of an enterprise grade Open Source Big Data Platform e.g HDP (Hortonworks Data Platform) to the loaders so these can be maintained going forward.

2) Data Governance: These are the L2 loaders that apply the rules to the critical fields for Risk and Compliance. The goal here is to look for gaps in the data and any obvious quality problems involving range or table driven data. The purpose is to facilitate data governance reporting.

3) Entity Identification: This step is the establishment and adoption of a lightweight entity ID service. The service will consist of entity assignment and batch reconciliation.

4) Developing L3 loaders: This phase will involve defining the transformation rules that are required in each risk, finance and compliance area to prep the data for their specific processing.

5) Analytic Definition: Running the analytics that are to be used for FRTB.

6) Report Definition: Defining the reports that are to be issued for each risk and compliance area.


[1] https://www.bis.org/bcbs/publ/d352.pdf

A Reference Architecture for The Open Banking Standard..

This is the second in a series of four posts on the Open Banking Standard (OBS) in the UK. This second post will briefly look at the strategic drivers for banks while proposing an architectural style or approach for incumbents to drive change in their platforms to achieve OBS Compliance. We will examine the overall data layer implications in the next post. The final post will look at key strategic levers and possible business models that the standard could help banks to drive innovation towards. 


The Open Banking Standard will steward the development of layers of guidelines (API interoperability standards, data security & privacy and governance) which primarily deal with data sharing in banking. The belief is that this regulation will ultimately spur open competition and unlock innovation. For years, the industry has grappled with fundamental platform issues that are native to every domain of banking. Some of these include systems are siloe-d by function, platforms that are inflexible in responding to rapidly changing market conditions & consumer tastes. Bank IT is perceived by the business to be glacially slow in responding to their needs.

The Open Banking Standard (OBS) represents a vast opportunity for banking organizations in multiple ways. First off, Bank IT has the luxury of using the regulatory mandate to slowly re-architect hitherto inflexible and siloe-d business systems. Secondly, doing so will enable Banks to significantly monetize their vast data resources in several key business areas.  

This will need to change with the introduction of Open Banking Standard. Banks that do not change will not be able to derive and sustain a competitive advantage. PSD2 Compliance (Payment Systems Directive – 2) – which will be mandated by the EU is one of the first layers in the OBS. Further layers will include API standards definitions for business processes (e.g View Account, Transfer Funds, Chargebacks, Dispute Handling etc). 

The OBWG (Open Banking Working Group) standards include the following key constituencies & their requirements [1] – 

 1. Customers: defined as account holders & businesses who agree to sharing their data & any publishers who share open datasets 

2. Data attribute providers: defined as banks & other financial services providers whose customers produce data as part of daily banking activities 

3. Third parties: Interested developers, financial services startups aka FinTechs, and any organisations (e.g  Retail Merchants) who can leverage the data to provide new views & products  

It naturally follows from the above, that the key technical requirements of the framework will include:

1. A set of Data elements, API definitions and Security Standards to provide both data security and a set of access restrictions 

2. A Governance model, a body which will develop & oversee the standards 

3. Developer resources, which will enable third parties to discover, educate and experiment.

The Four Strategic Drivers in the Open Bank Standard …

Clearly the more intelligently a firm harness technology (in pursuit of OBS compliance goals) will determine it’s overall competitive advantage.  This important to note since a range of players across the value chain (the above Third Parties as designated by the standard) can now obtain seamless access to a variety of data. Once obtained the data can help the Third Parties reimagine it in manifold ways. For example they can help consumers make better personal financial decisions for their clients at the expense of the Banks owning the data. For instance, FinTechs have generally been able to make more productive use of client data. They do this by providing clients with intuitive access to cross asset data, tailoring algorithms based on behavioral characteristics  and by providing clients with a more engaging and unified experience.

So, the four strategic business goals that OBS compliant architectures need to solve in the long run – 

  1. Digitize The Customer Journey –  Bank clients who use services like Uber, Zillow, Amazon etc in their daily lives are now very vocal in demanding a seamless experience across all of their banking ervices using digital channels.  The vast majority of Bank applications still lag the innovation cycle, are archaic & are separately managed. The net issue with this is that the client is faced with distinct user experiences ranging from client onboarding to servicing to transaction management. Such applications need to provide anticipatory or predictive capabilities at scale while understand the specific customers lifestyles, financial needs & behavioral preferences. 
  2. Provide Improved Access to Personal Financial Management & Improved Lending Processes  –  Provide consumers with a single aggregated picture of all their accounts. Also improve lending systems by providing more efficient access to loans by incorporating a large amount of contextual data in the process.
  3. Automate Back & Mid Office Processes Across Lending, Risk, Compliance & Fraud – The needs to forge a closer banker/client experience is not just driving demand around data silos & streams themselves but also forcing players to move away from paper based models to more of a seamless, digital & highly automated model to rework a ton of existing back & front office processes. These processes range from risk data aggregation, supranational compliance (AML,KYC, CRS & FATCA), financial reporting across a range of global regions & Cyber Security. Can the Data architectures & the IT systems  that leverage them be created in such a way that they permit agility while constantly learning & optimizing their behaviors across national regulations, InfoSec & compliance requirements? Can every piece of actionable data be aggregated,secured, transformed and reported on in such a way that it’s quality across the entire lifecycle is guaranteed? 
  4. Tune Existing Business Models Based on Client Tastes and Feedback – While the initial build out of the core architecture may seem to focus on digitizing interactions and exposing data via APIs. What follows fast is strong predictive modeling capabilities working at large scale where systems need to constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. 

The Key System Architecture Tenets…

The design and architecture of a solution as large and complex as a reference architecture for Open Banking is a multidimensional challenge and it will vary at every institution based on their existing investments, vendor products & overall culture. 

The OBS calls out the following areas of data as being in scope – Customer transaction data, customer reference data, aggregated data and sensitive commercial data. A thorough review of the OBWSG standard leads one to suggest a logical reference architecture as called out below.

Based on all the above, the Open Bank Architecture shall – 

  • Support an API based model to invoke any business process or data elements based on appropriate security  by a third party . E.g client or an advisor or a business partner
  • Support the development and deployment of an application that encourages a DevOps based approach
  • Support the easy creation of scalable business processes (e.g. client on boarding, KYC, Payment dispute check etc) that natively emit business metrics from the time they’re instantiated to throughout their lifecycle
  • Support automated application delivery, configuration management & deployment
  • Support a high degree of data agility and data intelligence. The end goal being that that every customer click, discussion & preference shall drive an analytics infused interaction between the Bank and the client
  • Support algorithmic capabilities that enable the creation of new services like automated (or Robo) advisors
  • Support a very high degree of scale across many numbers of users, interactions & omni-channel transactions while working across global infrastructure
  • Shall support deployment across cost efficient platforms like a public or private cloud. In short, the design of the application shall not constrain the available deployment options – which may vary because of cost considerations. The infrastructure options supported shall range from virtual machines to docker based containers – whether running on a public cloud, private cloud or in a hybrid cloud
  • Support small, incremental changes to business services & data elements based on changing business requirements 
  • Support standardization across application stacks, toolsets for development & data technology to a high degree
  • Shall support the creation of a user interface that is highly visual and feature rich from a content standpoint when accessed across any device


Reference Architecture…

Now that we have covered the business bases, what foundational technology choices comprise the satisfaction of the above? Lets examine that first at a higher level and then in more detail.

Given the above list of requirements – the application architecture that is a “best fit” is shown below.

Open Banking Architecture Diagram

                   Illustration – Candidate Reference Architecture for the Open Bank Standard

Lets examine each of the tiers starting from the lowest –

Infrastructure Layer…

Cloud Computing across it’s three main delivery models (IaaS, PaaS & SaaS) is largely a mainstream endeavor in financial services and no longer an esoteric adventure only for brave innovators. A range of institutions are either deploying or testing cloud-based solutions that span the full range of cloud delivery models. These capabilities include –

IaaS (infrastructure-as-a-service) to provision compute, network & storage, PaaS (platform-as-a-service) to develop applications & exposing their business services as  SaaS (software-as-a-service) via APIs.

Choosing Cloud based infrastructure – whether that is secure public cloud  (Amazon AWS or Microsoft Azure) or an internal private cloud (OpenStack etc) or even a hybrid approach is a safe and sound bet for these applications. Business innovation and transformation are best enabled by a cloud based infrastructure – whether public or private.


Data Layer…

While banking data tiers are usually composed of different technologies like RDBMS, EDW (Enterprise Data Warehouses), CMS (Content Management Systems) & Big Data etc. My recommendation for the OBSWG target state is largely dominated by a Big Data Platform powered by Hadoop technology. The vast majority of initial applications recommended by the OBSWG call out for predictive analytics to create tailored Customer Journeys. Big Data is a natural fit as it is fast emerging as the platform of choice for analytic applications.

Financial services firms specifically deal with manifold data types ranging from Customer Account data, Transaction Data, Wire Data, Trade Data, Customer Relationship Management (CRM), General Ledger and other systems supporting core banking functions. When one factors in social media feeds, mobile clients & other non traditional data types, the challenge is not just one of data volumes but also variety and the need to draw conclusions from fast moving data streams by commingling them with years of historical data.

The reasons for choosing Big Data as the dominant technology in the data tier are the below – 

  1. Hadoop’s ability to ingest and work with all the above kinds of data & more (using the schema on read method) has been proven at massive scale. Operational data stores are being built on Hadoop at a fraction of the cost & effort involved with older types of data technology (RDBMS & EDW)
  2. The ability to perform multiple types of processing on a given data set. This processing varies across batch, streaming, in memory and realtime which greatly opens up the ability to create, test & deploy closed loop analytics quicker than ever before
  3. The DAS (Direct Attached Storage) model that Hadoop provides fits neatly in with the horizontal scale out model that the services, UX and business process tier leverage. This keeps cuts Capital Expenditure  to a bare minimum.
  4. The ability to retain data for long periods of time thus providing WM applications with predictive models that can reason on historical data
  5. Hadoop provides the ability to run a massive volumes of models in a very short amount of time helps with modeling automation
  6. Due to it’s parallel processing nature, Hadoop can run calculations (pricing, risk, portfolio, reporting etc) in minutes versus the hours it took using older technology
  7. Hadoop has to work with existing data investments and to augment them with data ingestion & transformation leaving EDW’s to perform complex analytics that they excel at – a huge bonus.

Services Layer…

The overall goals of the OBSWG services tier are to help design, develop,modify and deploy business components in such a way that overall WM application delivery follows a continuous delivery/deployment (CI/CD) paradigm.Given that WM Platforms are some of the most complex financial applications out there, this also has the ancillary benefit of leaving different teams – digital channels, client on boarding, bill pay, transaction management & mid/back office teams to develop and update their components largely independent of other teams. Thus a large monolithic WM enterprise platform is decomposed into its constituent services which are loosely coupled and each is focused on one independent & autonomous business task only. The word ‘task’ here referring to a business capability that has tangible business value.

A highly scalable, open source & industry leading platform as a service (PaaS) is recommended as the way of building out and hosting banking business applications at this layer.  Microservices have moved from the webscale world to fast becoming the standard for building mission critical applications in many industries. Leveraging a PaaS such as OpenShift provides a way to help cut the “technical debt” that has plagued both developers and IT Ops. OpenShift provides the right level of abstraction to encapsulate microservices via it’s native support for Docker Containers. This also has the concomitant advantage of standardizing application stacks, streamlining deployment pipelines thus leading the charge to a DevOps style of building applications. 

Further I recommend that service designer adopt such an approach that the applications are microservices native. This implies a deployment approach similar to a SaaS model where capabilities can be exposed via  APIs.

Now, the services tier has the following global responsibilities – 

  1. Promote a Microservices/SOA style of application development
  2. Support component endpoint invocation via standards based REST APIs
  3. Promote a Cloud, OS & ,development language agnostic style of application development
  4. Promote Horizontal scaling and resilience

Predictive Analytics & Business Process Layer…

Though segments of the banking industry have historically been early adopters of analytics, areas being targeted by the OBSWG – Retail lines of business &Payments have generally been laggards. However, the large datasets that are prevalent in Open Bank Standard world as well as the need to drive customer interaction & journeys, risk & compliance reporting, detecting fraud etc calls for a strategic relook at this space. 

Techniques like Machine Learning, Data Science & AI feed into core business processes thus improving them. For instance, Machine Learning techniques support the creation of self improving algorithms which get better with data thus making accurate business predictions. Thus, the overarching goal of the analytics tier should be to support a higher degree of automation by working with the business process and the services tier. Predictive Analytics can be leveraged across the value chain of the Open Bank Standard – ranging from new customer acquisition to customer journey to the back office. More recently these techniques have found increased rates of adoption with enterprise concerns from cyber security to telemetry data processing.

Another area is improved automation via light weight business process management (BPM). Though most large banks do have pockets of BPM implementations that are adding or beginning to add significant business value, an enterprise-wide re-look at the core revenue-producing activities is called for, as is a deeper examination of driving competitive advantage. BPM now has evolved into more than just pure process management. Meanwhile, other disciplines have been added to BPM — which has now become an umbrella term. These include business rules management, event processing, and business resource planning.

Financial Services firms general are fertile ground for business process automation, since most managers across their various lines of business are simply a collection of core and differentiated processes. Examples are private banking (with processes including onboarding customers, collecting deposits, conducting business via multiple channels, and compliance with regulatory mandates such as KYC and AML); investment banking (including straight-through-processing, trading platforms, prime brokerage, and compliance with regulation); payment services; and portfolio management (including modeling model portfolio positions and providing complete transparency across the end-to-end life cycle). The key takeaway is that driving automation can result not just in better business visibility and accountability on behalf of various actors. It can also drive revenue and contribute significantly to the bottom line.

A business process system should allow an IT analyst or customer or advisor to convey their business process by describing the steps that need to be executed in order to achieve the goal (and explain the order of those steps, typically using a flow chart). This greatly improves the visibility of business logic, resulting in higher-level and domain-specific representations (tailored to finance) that can be understood by business users and should be easier to monitor by management. Again , leveraging a PaaS such as OpenShift in conjunction with an industry leading open source BPMS (Business Process Management System) such as JBOSS BPMS provides an integrated BPM capability that can create cloud ready and horizontally scalable business processes.

API & UX Layer…

The API & UX (User Experience) tier fronts humans – clients. business partners, regulators, internal management and other business users across omnichannel touchpoints. A standards based API tier is provided for partner applications and other non-human actors to interact with business service tier. Once the OBSWG defines the exact protocols, data standards & formats – this should be straightforward to implement.

The API/UX tier has the following global responsibilities  – 

  1. Provide a seamless experience across all channels (mobile, eBanking, tablet etc) in a way that is a continuous and non-siloed. The implication is that clients should be able to begin a business transaction in channel A and be able to continue them in channel B where it makes business sense.
  2. Understand client personas and integrate with the business & predictive analytic tier in such a way that the API is loosely yet logically integrated with the overall information architecture
  3. Provide advanced visualization (wireframes, process control, social media collaboration) and cross partner authentication & single sign on
  4. Both the API & UX shall also be designed is such a manner that their design, development & ongoing enhancement lends themselves to an Agile & DevOps methodology.

It can all come together…

In most existing Banking systems, siloed functions have led to brittle data architectures operating on custom built legacy applications. This problem is firstly compounded by inflexible core banking systems and secondly exacerbated by a gross lack of standardization in application stacks underlying capabilities like customer journey, improved lending & fraud detection. These factors inhibit deployment flexibility across a range of platforms thus leading to extremely high IT costs and technical debut. The consequence is that these inhibit client facing applications from using data in a manner that constantly & positively impacts the client experience. There is clearly a need to provide an integrated digital experience across a global customer base. And then to offer more intelligent functions based on existing data assets. Current players do possess a huge first mover advantage as they offer highly established financial products across their large (and largely loyal & sticky) customer bases, a wide networks of physical locations, rich troves of data that pertain to customer accounts & demographic information. However, it is not enough to just possess the data. They must be able to drive change through legacy thinking and infrastructures as things change around the entire industry as it struggles to adapt to a major new segment – the millenials – who increasingly use mobile devices and demand more contextual services as well as a seamless and highly analytic driven & unified banking experience – akin to what they commonly experience via the internet – at web properties like Facebook, Amazon, Google or Yahoo etc


Platforms designed technology platforms designed around the four key business needs   will create immense operational efficiency, better business models, increased relevance and ultimately drive revenues. These will separate the visionaries, leaders from the laggards in the years to come. The Open Bank Standard will be a catalyst in this immense disruption. 


[1] The Open Banking Standard –

The Five Deadly Sins of Financial Services IT..


This blog has time & again discussed how Global, Domestic and Regional banks need to be innovative with their IT platform to constantly evolve their product offerings & services. This is imperative due to various business realities –  the increased competition by way of the FinTechs, web scale players delivering exciting services & sharply increasing regulatory compliance pressures. However, systems and software architecture has been a huge issue at nearly every large bank across the globe.

Regulation is also afoot in parts of the globe which will give non traditional banks access to hitherto locked customer data. E.g PSD-2 in the European Union. Further, banking licenses have been more easily granted to non-banks which are primarily technology pioneers. e.g. Such as a Paypal, Square etc

In 2016, Banks are waking up to the fact that IT Architecture is a critical strategic differentiator. Players that have agile & efficient architecture platforms and practices can not only add new service offerings but also are able to experiment across a range of analytic led offerings that create & support multi-channel products. These digital services and usecases can now be found abundantly areas ranging from Retail Banking, Capital Markets. FinTechs have innovated in areas such as Payments & Wealth Management.

So, How did we get here…

The Financial Services IT landscape – no matter the segment – one picks across the spectrum – Capital Markets, Retail & Consumer Banking, Payment Networks & Cards, Asset Management etc are all largely predicated on a few legacy technology anti-patterns. These anti-patterns have evolved over the years from a systems architecture, data architecture & middleware standpoint.

These have resulted in a mishmash of organically developed & shrink wrapped systems that do everything from running critical Core Banking Applications to Trade Lifecycle to Securities Settlement to Financial Reporting etc.  Each of these systems operates in an application, workflow, data silo with it’s own view of the enterprise. These are all kept in sync largely via data replication & stove piped process integration.

If this sounds too abstract, let us take an example &  a rather topical one at that. One of the most critical back office functions every financial services organization needs to perform is Risk Data Aggregation & Regulatory Reporting (RDARR). This spans areas from Credit Risk, Market Risk, Operational Risk , Basel III, Solvency II etc..the list goes on.

The basic idea in any risk calculation is to gather a whole range of quality data in one place and to run computations to generate risk measures for reporting.

So, how are various risk measures calculated currently? 

Current Risk Architectures are based on traditional relational databases (RDBMS) architectures with 10’s of feeds from Core Banking Systems, Loan Data, Book Of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc),  Wire Data, Payment Data, Transaction Data etc. 

These data feeds are then tactically placed in memory caches or in enterprise data warehouses (EDW). Once relevant data has been extracted, it is transformed using a series of batch jobs. These jobs which then prepare the data for Calculator Frameworks to which run their risk models  across hundreds of scenarios. 

All of the above need access to large amounts of data at the individual transaction Level. The Corporate Finance function within the Bank then makes end of day adjustments to reconcile all of this data up and these adjustments need to be cascaded back to the source systems down to the individual transaction or classes of transaction levels. 

These applications are then typically deployed on clusters of bare metal servers that are not particularly suited to portability, automated provisioning, patching & management. In short, nothing that can automatically be moved over at a moment’s notice. These applications also work on legacy proprietary technology platforms that do not lend themselves to flexible & a DevOps style of development.

Finally, there is always need for statistical frameworks to make adjustments to customer transactions that somehow need to get reflected back in the source systems. All of these frameworks need to have access to and an ability to work with terabtyes (TBs) of data.

Each of above mentioned risk work streams has corresponding data sets, schemas & event flows that they need to work across. They also have different temporal needs for reporting. Some need to be run a few times in a day (e.g. Traded Credit Risk), some daily (e.g. Market Risk) and some end of the week (e.g Enterprise Credit Risk)


                          Illustration – The Five Deadly Sins of Financial IT Architectures

Let us examine why this is in the context of these anti-patterns as proposed below –


The key challenges with current architectures –

  1. Utter, total and complete lack of centralized Data leading to repeated data duplication  – In the typical Risk Data Aggregation application – a massive degree of Data is duplicated from system to system leading to multiple inconsistencies at the summary as well as transaction levels. Because different groups perform different risk reporting functions (e.g Credit and Market Risk) – the feeds, the ingestion, the calculators end up being duplicated as well. A huge mess, any way one looks at it. 
  2. Analytic applications which are not designed for throughput – Traditional Risk algorithms cannot scale with this explosion of data as well as the heterogeneity inherent in reporting across multiple kinds of risks. E.g Certain kinds of Credit Risk need access to around 200 days of historical data where one is looking at the probability of the counter-party defaulting & to obtain a statistical measure of the same. These models are highly computationally intensive and can run for days if the data architecture cannot scale in providing efficient compute on massive volumes of data. 
  3. Lack of Application Blueprint, Analytic Model & Data Standardization – There is nothing that is either SOA or microservices-like in most RDA applications and that precludes best practice development & deployment. All of this only leads to maintenance headaches. The reason that Cloud Computing based frameworks such a a PaaS (Platform as a Service) are highly elegant are that they enforce standardization of systems software components across the stack. Areas like Risk Model and Analytic development needs to be standardized to reflect realities post BCBS 239 (and the upcoming FRTB). With the Volcker Rule reporting that bans prop trading activity on part of the Banks, they must now report on seven key metrics across 10s of different data feeds across PB’s of data. Most existing Risk applications cannot do that without undertaking a large development and change management headache.
  4. Lack of Scalability –  It must be possible to operate it as a central system that can scale to carry the full load of the organization and operate with hundreds of applications built by disparate teams all plugged into the same central nervous system.One other factor to consider is the role of cloud computing in customer retention efforts. The analytical computational power required to understand insights from gigantic data sets is costly to maintain on an individual basis. The traditional owned data center will probably not disappear, but banks need to be able to leverage the power of the cloud to perform big data analysis in a cost-effective manner.
  5. A Lack of Deployment Flexibility –  The application & data requirements dictate the deployment platforms. This massive anti pattern leads to silos and legacy OS’s that can not easily be moved to Containers like Docker & instantiated by a modular Cloud OS like OpenStack.


Doing IT Architecture right and in a responsive manner to the business results in critical value drivers that that are met & exceeded this transformation are – 

  1. Effective Compliance with increased Regulatory Risk mandates ranging from Basel – III, FTRB, Liquidity Risk – which demand flexibility of all the different traditional IT tiers.
  2. An ability to detect and deter fraud – Anti Money Laundering (AML) and Retail/Payment Card Fraud etc
  3. Fendoff competition from the FinTechs
  4. Exist & evolve in a multichannel world dominated by the millennial generation
  5. Reduced costs to satisfy pressure on the Cost to Income Ratio (CIR)
  6. The ability to open up data & services that operate on the customer data to other institutions

 A uniform architecture that works across of all these various types would seem a commonsense requirement. However, this is a major problem for most banks. Forward looking approaches that draw heavily from microservices based application development, Big Data enabled data & processing layers, the adoption of Message Oriented Middleware (MOM) & a cloud native approach to developing applications (PaaS) & deployment (IaaS) are the solution to the vexing problem of inflexible IT.

The question is if banks can change before they see a perceptible drop in revenues over the years?  

How Robo-Advisors work..(2/3)

Millennials want “finance at their fingertips”..they want to be able to email and text the financial advisors and talk to them on a real-time basis,” – Greg Fleming, Ex-Morgan Stanley
The first post in this series on Robo-advisors touched on the fact that Wealth Management has been an area largely untouched by automation as far as the front office is concerned. Automated investment vehicles have largely begun changing that trend and they helping create a variety of business models in the industry. This three part series explored the automated “Robo-advisor” movement in the first post.This second post will focus on the overall business model & main functions of a Robo-advisor.
FinTechs led by Wealthfront and Betterment have pioneered the somewhat revolutionary concept of Robo-advisors. To define the term – a Robo-advisor is an algorithm based automated investment advisor that can provide a range of Wealth Management services tailored to a variety of life situations.
Robo-advisors offer completely automated financial planning services. We have seen how the engine of the Wealth Management business is new customer acquisition. The industry is focused on acquiring the millennial or post millennial HNWI (High Net Worth Investor) generation. The technology friendliness of this group ensures that are the primary target market for automated investment advice. Not just the millenials, anyone who is comfortable with using technology and wants lower cost services can benefit from automated investment planning. However,  leaders in the space such as – Wealthfront & Betterment – have disclosed that their average investor age is around 35 years. [1]
Robo-advisors provide algorithm-based portfolio management methods around investment creation, provide automatic portfolio rebalancing and value added services like tax-loss harvesting as we will see. The chief investment vehicle of choice seems to be low-cost, passive exchange-traded funds (ETFs).

What are the main WM business models

Currently there are a few different business models that are being adopted by WM firms.

  1. Full service online Robo-advisor that is a 100% automated without any human element
  2. Hybrid Robo-advisor model being pioneered by firms like Vanguard & Charles Schwab
  3. Pure online advisor that is primarily human in nature

What do Robo-advisors typically do?

The Robo-advisor can be optionally augmented & supervised by a human adviser. At the moment, owing to the popularity of Robo-advisors among the younger high networth investors (HNWI), a range of established players like Vanguard, Charles Schwab as well as a number of FinTech start-ups have developed these automated online investment tools or have acquired FinTech’s in this space.e.g Blackrock. The Robo-advisor is typically offered as  a free service (below a certain minumum) and typically invests in low cost ETFs.  built using digital techniques – such as data science & Big Data.


                                  Illustration: Essential functions of a Robo-advisor

The major business areas & client offerings in the Wealth & Asset Management space have been covered in the first post in this series at http://www.vamsitalkstech.com/?p=2329

Automated advisors only cover a subset of all of the above at the moment. The major usecases are as below –

  1. Determine individual Client profiles & preferences—e.g. For a given client profile- determine financial goals, expectations of investment return, diversification etc
  2. Identify appropriate financial products that can be offered either as pre-packaged portfolios or custom investments based on the client profile identified in the first step
  3. Establish correct Investment Mix for the client’s profile – these can included but are not ,limited to equities, bonds, ETFs & other securities in the firm’s portfolios . For instance, placing  tax-inefficient assets in retirement accounts like IRA’s as well as  tax efficient municipal bonds in taxable accounts etc.
  4. Using a algorithmic approach, choose the appropriate securities for each client account
  5. Continuously monitor the portfolio & transactions within it to tune performance , lower transaction costs, tax impacts etc based on how the markets are doing. Also ensure that a client’s preferences are being incorporated so that appropriate diversification and risk mitigation is being performed
  6. Provide value added services like Tax loss harvesting to ensure that the client is taking tax benefits into account as they rebalance portfolios or accrue dividends.
  7. Finally ,ensure the best user experience by handling a whole range of financial services – trading, account administration, loans,bill pay, cash transfers, tax reporting, statements in one intuitive user interface.


                             Illustration: Betterment user interface. Source – Joe Jansen

To illustrate these concepts in action, leaders like Wealthfront & Betterment are increasingly adding features where  highly relevant, data-driven advice is being provided based on existing data as well as aggregated data from other providers. Wealthfront now provides recommendations on diversification, taxes and fees that are personalized not only to the specific investments in client’s account, but also tailored to their specific financial profile and risk tolerance. For instance, is enough cash being set aside in the emergency fund ? Is a customer holding too much stock in your employer? [1]

The final post will look at a technology & architectural approach to building out a Robo-advisor. We will also discuss best practices from a WM & industry standpoint in the context of Robo-advisors.


  1. Wealthfront Blog – “Introducing the new Dashboard”