
Use Hortonworks Data Flow (HDF) To Connect The Dots In Financial Services..(3/3)

by vamsi_cz5cgo

This is the final post in our three part series on Enterprise Dataflow Management and its applicability in Financial Services. It discusses common business drivers and use cases in depth.

As 2015 draws to a close, the Financial Services Industry seems to be in the midst of a perfect storm – one that began gathering on the horizon a few years ago. For an industry that has always enjoyed relatively high barriers to entry & safe incumbency – thanks to factors like highly diversified operations and access to a huge deposit base – new age Fintechs and other nimble competitors have begun upending business dynamics across many of the domains that make up the industry: Retail & Consumer Banking, Capital Markets, Wealth Management et al.

Fintechs are capturing market share using a mix of innovative technology, crowdfunding, digital wallets & currencies to create new products & services – all aimed at disintermediating & disrupting existing value chains. It is also telling that incumbent firms continue to spend billions of dollars on technology projects to maintain legacy platforms as well as to create lateral (& tactical) innovation.

However, large to medium sized Banks (defined here as those with more than 5 million customer accounts), which have built up massive economies of scale over the years across a large geographical footprint, do hold a massive first mover advantage. This is due to a range of factors: well defined operating models, highly established financial products across their large (and largely loyal & sticky) customer bases, wide networks of Branches & ATMs, and rich troves of data covering customer transactions & demographics. However, it is not enough to just possess the data. Banks must be able to drive change through legacy thinking and infrastructures as the entire industry struggles to adapt to a major new segment – millennial customers – who increasingly use mobile devices and demand more contextual services as well as a seamless, unified banking experience, akin to what they commonly experience at web properties like Facebook, Amazon, Google or Yahoo.

What are some of the exciting new business models that the new entrants in Fintech are pioneering at the expense of the traditional Bank?

  • Offering targeted banking services to technology-savvy customers at a fraction of the cost, e.g. Mint.com in retirement planning in the USA
  • Lowering the cost of access to discrete financial services for business customers in areas like foreign exchange transactions & payments, e.g. Currency Cloud
  • Differentiating services like peer to peer lending among small businesses and individuals, e.g. Lending Club
  • Providing convenience through the use of handheld devices like iPhones, e.g. Square Payments

The core capability needed is the ability to deliver realtime services via Predictive Analytics at low latency & massive scale. Predictive Analytics facilitates speedier deposits, payments, risk mitigation, compliance transaction monitoring and fraud detection. This ability to improve customer engagement and retention while providing both realtime and deeper insight across myriad business scenarios is what separates industry leaders from the laggards.

The four common themes to becoming a Data driven & Predictive Bank are –

  1. Constant product innovation based on an incremental approach – are we building the right products for our dynamic clientele?
  2. A unified & seamless channel experience, as an ever higher share of transactions is performed over Digital mediums
  3. A relentless drive to Automation – replacing obsolete manual processes with automated operating processes across both business & IT
  4. A constant push to innovate across the BiModal world – the stable core as well as the new edge – by incorporating the latest advances. Please visit http://www.vamsitalkstech.com/?p=1244 for more detail.

So what do Banking CXOs & Architects need to do to drive an effective program of Predictive Analytics Enabled Transformation?

  • Drive innovation to provide personalized services and a seamless customer experience across multiple diffused channels
  • Eliminate data silos that have built up over the years and that inhibit the ability to cross sell services that clients are interested in
  • Offer data driven capabilities that can detect customer preferences on the fly, match them with existing history and provide value added services – services that not only provide a better experience but also help build a longer term customer relationship
  • On the defensive side, provide next generation Data Ingestion & Processing capabilities in the highly regulated (and hugely vexing) areas of Risk Data Aggregation & Reporting, various types of Fraud Monitoring & Detection, and AML (Anti Money Laundering) Compliance

So how can Hortonworks DataFlow (HDF) create business value –

What are some of the key requirements & desired capabilities that such a technology category can provide?

  1. Provides strong lifecycle management capabilities for Data In Motion. This includes not just data flowing in from traditional sources – Customer Account data, Transaction Data, Wire Data, Trade Data, Customer Relationship Management (CRM) and General Ledger data – but also streaming data sources like Market Data Systems (which feed instrument data), machine generated data (ATMs, POS terminals) and systems supporting Digital functions like Social Media and Customer Feedback. It is format and protocol agnostic
  2. Helps maintain a strong Chain of Custody capability for this data. Essentially, know where every piece of data originated, how it was modified, and its lineage. This is key from both a regulatory and a market lineage perspective (see the provenance-query sketch after this list)
  3. Provides Data and System Administrators the ability to visualize enterprise wide or departmental flows in one place and to modify them on the fly
  4. Provides Stream based Processing for data in motion, as HDF is primarily a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
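To make the Chain of Custody point (item 2) concrete, here is a minimal sketch of querying NiFi's provenance repository over its REST API from Python. The endpoint shape follows NiFi's standard REST API, but the host/port, lack of security and the search terms are assumptions that will vary by deployment and NiFi version.

```python
import time
import requests

NIFI = "http://localhost:8080/nifi-api"  # assumed local, unsecured NiFi instance

# Submit an asynchronous provenance query: "show me events where data
# entered the flow" -- the starting point of any chain-of-custody question.
query = {"provenance": {"request": {"maxResults": 100,
                                    "searchTerms": {"EventType": "RECEIVE"}}}}
resp = requests.post(f"{NIFI}/provenance", json=query).json()
query_id = resp["provenance"]["id"]

# Poll until NiFi has finished assembling the result set.
while True:
    entity = requests.get(f"{NIFI}/provenance/{query_id}").json()
    if entity["provenance"].get("finished"):
        break
    time.sleep(1)

# Each provenance event records what happened to a piece of data, when and where.
for e in entity["provenance"]["results"]["provenanceEvents"]:
    print(e["eventTime"], e["eventType"], e["componentName"], e["flowFileUuid"])

requests.delete(f"{NIFI}/provenance/{query_id}")  # clean up the server-side query
```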

Hortonworks DataFlow – HDF (based on Apache NiFi) is built on technology originally created by the NSA, which encountered big data collection and processing issues at a scale and stage beyond most enterprise implementations today. HDF was designed from the outset to meet timely decision making needs by collecting and analyzing data from a wide range of disparate sources – securely, efficiently and across the geographically dispersed and often fragmented data silos that are commonplace in financial services.

Banking organizations are beginning to leverage HDF and HDP to create a common cross-company data lake for data from different LOBs: mortgage, consumer banking, personal credit, wholesale and treasury banking. Internal managers, Business Analysts, Data Scientists and, finally, consumers are all able to derive immense value from the data. A single point of data management allows the bank to operationalize security and privacy measures such as de-identification, masking, encryption, and user authentication. From a data processing perspective, Hadoop supports multiple ways of running the models and algorithms used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm and SAS, to name a few, to create these models.
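As a simple illustration of the clustering approach named above, the sketch below uses Spark via Python (PySpark) to cluster transaction features so that outlying transactions can be routed for fraud review. The table path and column names are hypothetical – this is a sketch of the technique, not a production fraud model.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("txn-anomaly-clustering").getOrCreate()

# Hypothetical transactions table landed in the data lake by HDF;
# the path and column names are illustrative, not a real bank schema.
txns = spark.read.parquet("/data/lake/transactions")

features = VectorAssembler(
    inputCols=["amount", "merchant_risk_score", "txns_last_24h"],
    outputCol="features",
).transform(txns)

# Cluster the transactions; sparsely populated clusters (and points far from
# every centroid) become candidates for fraud review.
model = KMeans(k=8, seed=42, featuresCol="features").fit(features)
scored = model.transform(features)  # adds a 'prediction' (cluster id) column
scored.groupBy("prediction").count().orderBy("count").show()
```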

With the stage thus set, let us now examine some concrete use cases and business drivers that are a great fit for HDF in Finance, across four core domains: Retail & Consumer Banking, Capital Markets & Wealth Management, RFC (Risk, Fraud & Compliance) & Bank IT Ops.

Illustration – Enterprise Dataflow Management Business Drivers In Banking

Retail and Consumer Banking – 

According to McKinsey[1], there are a few ways to approach digital banking but, for leading banks, there are typically four interconnected, mutually reinforcing elements: a) Improving Connectivity, b) Increased Automation, c) Accelerating Innovation, and d) Improved Decisioning.

Connectivity refers to harnessing newer data sources along with internal and 3rd party data to build loyalty and competition-disrupting offerings. Automation deals with optimizing internal and external value chains. Innovation refers to how banks should continue to renew themselves, given the rapid pace of change in the industry. Decisioning refers to how big data can be used to make better, faster, and more accurate decisions regarding customer purchase choices as well as banks’ decisions on issues such as risk.

The challenge in retail banking is to seamlessly combine data from hundreds of internal databases and external systems (like 3rd party services & data providers). For instance, Core Banking data needs to be combined with Payments (made or missed) data, along with any notes from the retail banker, and with behavior data – to segment customers, predict loan defaults, optimize portfolios etc.

HDF can be used to ingest data from multiple channel endpoints like ATMs, POS terminals, Mobile Banking clients etc. As the data is in motion, predictive models can analyze transaction data and combine it with historical data on individuals to produce metrics known as scores. A given score indicates a range of metrics around business considerations like fraud detection, risk detection, segmenting & ranking customers based on their likelihood of purchasing a certain product, the creditworthiness of a CDO (Collateralized Debt Obligation) etc.

Traditional banking algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across areas such as Risk management. E.g., certain kinds of Credit Risk calculations need access to around 200 days of historical data, where one examines the probability of the counterparty defaulting in order to obtain a statistical measure of that risk, as sketched below.
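A toy sketch of that 200 day lookback using pandas: compute an empirical default frequency per counterparty over a rolling 200 business day window. The file and column names are hypothetical, and a real credit risk model would be considerably more involved.

```python
import pandas as pd

# Hypothetical daily counterparty history exported from the risk data lake;
# 'defaulted' is a 0/1 daily default indicator.
hist = pd.read_parquet("counterparty_history.parquet")
hist = hist.sort_values(["counterparty_id", "as_of_date"])

# Empirical default frequency over a rolling 200-business-day window --
# a simple statistical measure of the kind described above.
hist["pd_200d"] = (
    hist.groupby("counterparty_id")["defaulted"]
        .transform(lambda s: s.rolling(window=200, min_periods=200).mean())
)
print(hist[["counterparty_id", "as_of_date", "pd_200d"]].tail())
```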

  1. Customer Profitability & Lifetime Value –
    Banking is typically a long term relationship, with the provider realizing immense benefits from highly engaged customers. Understanding the journey of a single customer across multiple financial products, and mapping that journey from a BI standpoint, is a key capability. Accordingly, the business requirements are –

    • Integrate realtime transactions with core financial systems
    • Increase revenue & net yield per customer by showing them the right long term plan – financial products (e.g. a more cost effective mortgage or auto loan) or portfolio optimization
    • Understand CLV (Customer Lifetime Value) across all products (savings, checking accounts, mortgages, auto loans & credit cards) with a view to understanding P&L (Profit and Loss) on an individual customer and segment basis


  2. Customer 360 & Segmentation –
    Currently, most Retail and Consumer Banks lack a comprehensive view of their customers. Each department has a limited view of the customer, due to which offers and interactions across multiple channels are typically inconsistent and vary widely. This also results in limited collaboration within the bank when servicing customer needs. Leveraging the ingestion and predictive capabilities of a Hadoop based platform, Banks can provide a user experience that rivals Facebook, Twitter or Google, with a full picture of the customer across all touch points
  3. Sentiment & Social Media Analysis
    Leveraging HDF’s ability to ingest data from various Social Media and Digital Marketing data sources, one can mine Twitter, Facebook and other social media conversations for sentiment data about products, services and competition, and use it to make targeted, real-time decisions that increase market share. HDF comes out of the box with around 90 data processors including encoders, encrypters, compressors and converters, plus processors for creating Hadoop sequence files from data flows, interacting with AWS, sending messages to Kafka, getting messages from Twitter, and others. One can configure the data processors through a drag-and-drop visual UI, chaining them and using back-pressure between them to control the data flow.

  4. Realtime Payments
    The real time data processing capabilities of HDF allow it to process data in a continual, bursty, streaming or micro-batching fashion. Once ingested, payment data must be processed within a very small time window, typically termed near real time (NRT). When combined with predictive capabilities via behavioral modeling & transaction profiling – as in the sketch below – HDF can provide significant operational, time & cost savings.
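To illustrate the NRT idea in item 4, here is a minimal sketch in which HDF routes payment events to a Kafka topic (using its out-of-the-box Kafka processor) and a downstream Python consumer scores each event against a precomputed behavioral profile. The topic name, profile shape and threshold are all assumptions.

```python
import json
from kafka import KafkaConsumer  # assumes the kafka-python client

# Hypothetical per-customer behavioral profiles, precomputed in batch.
profiles = {"c-001": {"avg_amount": 120.0, "std_amount": 45.0}}

def score(event, profile):
    """Toy z-score style check: how unusual is this payment for this customer?"""
    z = abs(event["amount"] - profile["avg_amount"]) / max(profile["std_amount"], 1.0)
    return min(z / 5.0, 1.0)  # squash into a 0..1 risk score

consumer = KafkaConsumer("payments",                        # topic fed by HDF
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b))
for msg in consumer:
    event = msg.value
    profile = profiles.get(event["customer_id"])
    if profile and score(event, profile) > 0.8:             # illustrative threshold
        print("flag for review:", event["payment_id"])
```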

According to McKinsey, the benefits of digital banking amount to much more than just the provision of financial services through the new facades of mobile and Internet channels. HDF can augment Digital Banking with more powerful capabilities – greater governance, strong data security, and privacy protection – all of which enable the creation & fine tuning of new products and services.

Capital Markets & Wealth Management –

Within large bulge bracket firms, Capital Markets groups engineer custom derivative trades that hedge exposure for their clients as well as for their own internal treasury groups. They may also do proprietary trading (on the bank’s behalf) for a profit – though this is the type of trading that the Volcker Rule seeks to eliminate. These groups typically lead the way in being forward looking from a technology perspective.

Most compute intensive problems are generated by either this group or the enterprise risk group. They typically own the systems that interface with the exchange facing order management systems, the trade booking systems, the pricing libraries for the products the bank trades, as well as the tactical systems used to manage their market and credit risks, customer profitability, compliance and collateral. As a result, they usually get a large chunk of a Bank’s total IT budget and see technology as a key competitive advantage. The above business drivers are already being tackled in many areas within the Capital Markets spectrum.

  1. Simulations & Cross LOB Analytics –
    Currently, most Cross Line Of Business analytics and simulations use limited data: high storage costs mean only a few months of data can be kept, so simulations use only limited signals (data sources), which affects model accuracy. With a Hortonworks Data Platform (HDP) based operational store –

    • Data can now be kept indefinitely
    • Data can be augmented with data from other LoBs
    • It provides an ability to simulate things like consumer demand and macro-trends

    HDF can constantly ingest, store and process market data, social media data, reference data and position data, and constantly precompute views that can be persisted into the batch layer.

  2. Algorithmic Trading – HDF can augment trading infrastructures in several ways –

    1. Re-tooling existing trading infrastructures so that they are more integrated yet loosely coupled and efficient
    2. Helping plug in algorithm based complex trading strategies that are quantitative in nature, across a range of asset classes like equities, forex, ETFs and commodities
    3. Incorporating newer & faster sources of data (social media, sensor data, clickstream data) and not just the conventional sources (market data, position data, M&A data, transaction data etc.)
    4. Retrofitting existing trade systems to accommodate a range of mobile clients who have a vested interest in deriving analytics, e.g. marrying tick data with market structure information to understand why certain securities dip or spike at certain points (e.g. institutional selling or equity linked trades with derivatives)
    5. Helping traders integrate and customize algorithms to generate constant competitive advantage

  3. Wealth Management Lifecycle – The Wealth Management lifecycle, ranging from investment strategy development to portfolio optimization to digital marketing, depends on ingesting, analyzing and acting on complex data feeds. HDF augments Hadoop based capabilities by providing Data In Motion insights across this spectrum.
  4. Market & Trade Surveillance 

    An intelligent surveillance system needs to store trade data, reference data, order data, and market data, as well as all of the relevant communications from all the disparate systems, both internally and externally, and then match these things appropriately. The system needs to account for multiple levels of detection capabilities starting with a) configuring business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive). Such a system also needs to parallelize execution at scale to be able to meet demanding latency requirements for a market surveillance platform.

    HDF can augment existing systems ranging from CEP Engines to Trade Lifecycle Systems by –

    1. Supporting end to end monitoring across a variety of financial instruments and multiple venues of trading, including the wide variety of analytics that enable the discovery of interrelationships between customers, traders & trades – the next major advance in surveillance technology
    2. Providing a platform that can ingest tens of millions to billions of market events daily (spanning a range of financial instruments – Equities, Bonds, Forex, Commodities and Derivatives) from thousands of institutional market participants
    3. Providing the ability to add new business rules (via a business rules engine and/or a model based system that supports machine learning) – a key requirement since, as we saw in the first post, market manipulation constantly pushes the boundaries in new and unforeseen ways
    4. Providing advanced visualization techniques, thus helping Compliance and Surveillance officers manage the information overload
    5. Supporting deep cross-market analysis, i.e. the ability to look at financial instruments & securities trading across multiple geographies and exchanges
    6. Supporting the ability to create views and correlate data that are both wide and deep. A wide view will look at related securities across multiple venues; a deep view will look for a range of illegal behaviors that threaten market integrity such as market manipulation, insider trading, watch/restricted list trading and unusual pricing
    7. Supporting in-memory caches of data (based on Apache Spark) for rapid pre-trade compliance checks – see the sketch after this list
    8. Supporting prebuilt analytical models and algorithms that pertain to trading strategy (pre-trade models, e.g. best execution and analysis). The most popular way to link R and Hadoop is to use HDFS as the long-term store for all data, and use MapReduce jobs (potentially submitted from Hive or Pig) to encode, enrich, and sample data sets from HDFS into SAS/R
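As a sketch of the in-memory pre-trade compliance check from point 7, the snippet below caches a restricted list in Spark executor memory so that each incoming order is checked with a fast join rather than a round trip to storage. Paths and column names are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pretrade-compliance").getOrCreate()

# Hypothetical restricted/watch list; small enough to pin in memory.
restricted = spark.read.parquet("/data/compliance/restricted_list")
restricted.cache()   # keep the reference set in executor memory
restricted.count()   # materialize the cache eagerly

def pre_trade_check(orders_df):
    """Flag orders whose instrument appears on the cached restricted list."""
    return orders_df.join(restricted, on="instrument_id", how="inner")

orders = spark.read.json("/data/oms/incoming_orders")   # hypothetical order feed
violations = pre_trade_check(orders)
violations.select("order_id", "instrument_id", "trader_id").show()
```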

 

Risk, Fraud & Compliance –

Risk management is not just a defensive business imperative; the best managed banks use it to deploy their capital to obtain the best possible business outcomes. The last few posts have more than set the stage from a business and regulatory perspective. This one takes a bit of a deeper dive into the technology.

Existing data architectures are siloed, with bank IT creating or replicating data marts or warehouses to feed internal lines of business. These data marts are then accessed by custom reporting applications, thus replicating/copying data many times over, which leads to massive data management & governance challenges.

Furthermore, the explosion of new types of data in recent years has put tremendous pressure on the financial services datacenter, both technically and financially, and an architectural shift is underway in which multiple LOBs can consolidate their data into a unified data lake.

All of the below areas have been covered exhaustively in this blog. By providing a scalable platform that enables simple, fast data acquisition, secure data transport, prioritized data flow and clear traceability of data from BORT (Book Of Record Transaction) systems, HDF is the perfect complement to HDP in bringing together historical and perishable insights in the classic RFC areas.

  1. Risk Data Aggregation & Reporting – In depth discussion after the jump – http://www.vamsitalkstech.com/?p=667
  2. AML (Anti Money Laundering) Compliance – http://www.vamsitalkstech.com/?p=833
  3. Cyber Security – A cursory study of the significant data breaches in 2015 reads like a comprehensive list of enterprises across both the Banking and Retail verticals. The world of Banking now understands that a comprehensive & strategic approach to Cybersecurity has moved from being a mere IT challenge a few years ago to a “must have”. As Digital and IoT ecosystems evolve into loose federations of API accessible and cloud native applications, more and more assets are in danger of being targeted by extremely well funded and sophisticated adversaries. In conjunction with frameworks like OpenSOC, HDF can provide a unified data ingestion platform that can onboard & combine SIEM data, advanced threat intelligence, geolocation and DNS information, and network packet capture to automate security threat detection while merging it all with telemetry data (a toy enrichment sketch follows this list). More on this in the next set of blogposts.
  4. Fraud Monitoring & Detection – http://www.vamsitalkstech.com/?p=1098
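To make the Cyber Security ingestion point a little more tangible, here is the toy enrichment sketch mentioned above: join DNS query logs against a threat intelligence indicator feed to surface suspicious hosts. In practice HDF would handle the ingestion and routing; the file names, columns and severity scale here are hypothetical.

```python
import pandas as pd

# Hypothetical feeds, e.g. landed as files by HDF.
dns = pd.read_csv("dns_queries.csv")          # columns: ts, src_ip, domain
intel = pd.read_csv("threat_indicators.csv")  # columns: domain, severity

# Enrich DNS logs with threat-intel severity; the inner join keeps only
# queries that touched a known-bad domain.
hits = dns.merge(intel, on="domain", how="inner")
alerts = hits[hits["severity"] >= 7].sort_values("ts")
print(alerts[["ts", "src_ip", "domain", "severity"]].head())
```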

IT Operations –

In addition to the business areas above, HDF shines in the below popular systems oriented use cases as well.

  1. Log Data Processing – The ability to process log data coming in from application endpoints and telemetry devices (e.g. ATMs, Point Of Sale terminals & IoT devices) is tremendously useful across use cases ranging from Customer Journey analysis to Fraud Detection to Digital Marketing. HDF excels at log data aggregation and visualization at massive scale (a minimal forwarding sketch follows this list)
  2. Digitizing Endpoints – As larger Banks gradually adopt IoT technologies across Retail Banking, HDF can help reduce the complexity of managing such large-scale systems, which encompass a variety of endpoints and platforms – e.g. IP cameras for security, wireless access points, industrial grade routers and HVAC equipment commonly seen across physical locations
  3. Cross Bank Report Generation – The ability to set up data pipelines that enable secure data sharing among different bank branches as well as data centers – for instance when personal data is shared between offices or branches of a bank – is a key capability that HDF provides, in addition to helping produce fresh reports
  4. Cross LOB (Line Of Business) Analytics – When combined with the Hortonworks Data Platform, HDF accelerates the firm’s speed-to-analytics and also extends its data retention timeline. A shared data repository across multiple LOBs provides more visibility into all trading activities. The trading risk group accesses this shared data lake to process more position, execution and balance data. They can do this analysis on data from the current workday, and it remains highly available for at least five years – much longer than before
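As a footnote to the log data processing use case (item 1), here is the minimal log forwarding sketch promised above. HDF's TailFile and Kafka publishing processors do this without any custom code, so treat this purely as an illustration of the dataflow; the log path and topic name are assumptions.

```python
import time
from kafka import KafkaProducer  # assumes the kafka-python client

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Tail a (hypothetical) ATM endpoint log and forward each new line as an event.
with open("/var/log/atm/endpoint.log", "rb") as f:
    f.seek(0, 2)                 # start at the end of the file
    while True:
        line = f.readline()
        if not line:             # no new data yet -- wait and retry
            time.sleep(0.5)
            continue
        producer.send("atm-logs", line.rstrip())  # one Kafka event per log line
```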

Summary

Financial Services is clearly a data intensive business. Forward looking Banks, insurance companies and securities firms have already begun to store and process huge amounts of data in Apache Hadoop, and they have better insight into both their risks and opportunities. However, a significant inhibitor to enabling Predictive Analytics is the lack of strong enterprise-wide dataflow management capabilities. Deployed at scale for almost a decade before being contributed to the open source community, Hortonworks DataFlow (HDF) has proven to be an excellent and effective tool that addresses the most common current and future needs of big data acquisition and ingestion for accurately informed, on-time decision making.

References

[1] McKinsey & Company – “Digital Banking in Asia: Winning Approaches in a New Generation of Financial Services” – http://docplayer.net/1202013-Asia-financial-institutions-digital-banking-in-asia-winning-approaches-in-a-new-generation-of-financial-services.html
