Banking on Big Data..

Every day one hears more about how Big Data ecosystem technologies are helping create incremental innovation & disruption in any given industry vertical – be it in exciting new cross-industry areas like the Internet of Things (IoT) or in reasonably staid areas like Banking, Manufacturing & Healthcare.

Big Data platforms, powered by open source Hadoop, can economically store large volumes of structured, unstructured or semi-structured data & help process it at scale, thus enabling predictive and actionable intelligence.

Corporate IT organizations in the financial industry have been tackling data challenges at scale for many years now.

Traditional sources of data in banking include

  1. Customer Account data e.g. Names, Demographics, Linked Accounts etc.,
  2. Transaction Data which captures the low level details of every transaction (e.g. debit, credit, transfer, credit card usage etc.),
  3. Wire & Payment Data,
  4. Trade & Position Data,
  5. General Ledger Data and Data from other systems supporting core banking functions.

Shortly after these “systems of record” became established, enterprise data warehouse (EDW) based architectures began to proliferate, with the intention of mining the trove of real world data that Banks possess to provide Business Intelligence (BI) capabilities across a range of use cases – Risk Reporting, Customer Behavior, Trade Lifecycle, Compliance Reporting etc. Added to all of this, data architecture groups are responsible for maintaining an ever growing hodgepodge of business systems for customer metrics, ad hoc analysis, and massive scale log processing across a variety of business functions. All of the above data types have to be extensively processed before being adapted for analytic reasoning.

You also have a proliferation of data providers who now want to provide financial data as a product. These offerings range from Market Data (e.g. Bloomberg, Thomson Reuters) to Corporate Data to Macroeconomic Data (e.g. Credit Bureaus) to Credit Risk Data. Providers in this business typically construct models (e.g. credit risk) on top of these sources and sell the models as well as the raw data to interested parties. Thus architectures need to adapt in an agile manner to be able to scale, ingest and process these feeds in a manner that the business can leverage to react to rapidly changing business conditions.

Thus, the Bank IT world was a world of silos until the Hadoop-led disruption happened.

Where pre-Hadoop systems fall short- 

The key challenges with current architectures in ingesting & processing the above kinds of data –

  1. A high degree of data is duplicated from system to system, leading to multiple inconsistencies at the summary as well as transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion and the calculators end up being duplicated as well.
  2. Traditional banking algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across areas such as Risk management. E.g. certain kinds of Credit Risk calculations need access to around 200 days of historical data to estimate the probability of a counterparty defaulting & to obtain a statistical measure of that risk; such calculations are highly computationally intensive.
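
To make the scale point concrete, the counterparty default estimate over ~200 days of history can be sketched as a bootstrap simulation. This is a minimal illustration in plain Python – the spread-change history, default threshold and trial count are all assumed for the example, not taken from any real model:

```python
import random

def default_probability(daily_changes, horizon_days=200,
                        threshold=-0.30, trials=10_000, seed=42):
    """Bootstrap ~200 days of historical credit-spread changes: resample
    a path over the horizon and count it as a default if the cumulative
    move breaches the threshold."""
    rng = random.Random(seed)
    defaults = 0
    for _ in range(trials):
        cumulative = sum(rng.choice(daily_changes) for _ in range(horizon_days))
        if cumulative <= threshold:
            defaults += 1
    return defaults / trials

# Illustrative history of daily spread moves (assumed, roughly symmetric).
history = [x * 0.01 for x in (-2, -1, -1, 0, 0, 0, 0, 1, 1, 2)]
pd_estimate = default_probability(history)
```

Run for a single counterparty this is trivial; run daily across millions of counterparties it becomes exactly the kind of embarrassingly parallel workload that Hadoop-style clusters handle well.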

Circa 2015, open source software offerings have immensely matured, with compelling functionality in terms of data processing, deployment scalability, much lower cost & support for enterprise data governance. Hadoop, which is now really a platform ecosystem of 30+ projects – as opposed to a standalone technology – has been reimagined twice and now forms the backbone of any enterprise grade innovative data management project.


I hold that the catalyst for this disruption is Predictive Analytics – which provides both realtime and deeper insight across a myriad of scenarios –

  1. Predicting customer behavior in realtime,
  2. Creating models of customer personas (micro and macro) to track their journey across a Bank’s financial product offerings,
  3. Defining 360 degree views of a customer so as to market to them as one entity,
  4. Fraud detection,
  5. Risk Data Aggregation (e.g. Volcker Rule),
  6. Compliance etc.

The net result is that Hadoop is no longer an unknown term in the world of high finance.


Banks, insurance companies and securities firms that have begun to store and process huge amounts of data in Apache Hadoop have better insight into both their risks and opportunities.

So what capabilities does Hadoop add to existing RDBMS based technology that did not exist before?

The answer is that using Hadoop, a vast amount of information can be stored at a much lower price point. Thus, Banks can not only generate insights using a traditional ad-hoc querying model but also build statistical models & leverage Data Mining techniques (like classification, clustering, regression analysis, neural networks etc.) to perform highly robust predictive modeling. Such models encompass the Behavioral and Realtime paradigms in addition to the traditional Historical mode.
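
As a flavor of the clustering technique mentioned above, here is a deliberately tiny one-dimensional k-means in plain Python that segments customers by monthly card spend. The spend figures and the two-segment choice are purely illustrative:

```python
def kmeans_1d(values, k=2, iters=20):
    """Toy one-dimensional k-means: cluster customers by, say, monthly
    card spend to form coarse behavioral segments."""
    # Seed centroids by spreading picks across the sorted values.
    centroids = sorted(values)[::max(1, len(values) // k)][:k]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

spend = [120, 130, 110, 2500, 2600, 2400]  # illustrative monthly spend
centroids, segments = kmeans_1d(spend)
```

In practice this would of course run over the full dataset on the cluster (e.g. via Spark or Mahout) rather than in a single process – the point is only that such techniques become feasible once all the data sits in one place.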

However, the story around Big Data adoption in your average Bank is not all that revolutionary – it typically follows a more evolutionary cycle, where a rigorous engineering approach is applied to gain small business wins before scaling up to more transformative projects.

Now, from a technology perspective, Hadoop helps Bank IT in five major ways –

  1. enables more agile business & data development projects
  2. enables exploratory data analysis to be performed on full datasets or samples within those datasets
  3. reduces time to market for business capabilities
  4. helps store raw historical data at very short notice at very low cost
  5. helps store data for months and years at a much lower cost per TB compared to tape drives and other archival solutions

To sum up, why should Banks look at Hadoop and Big Data?

  1. Realize enormous Business Value in a range of areas as diverse as –  Defensive (Risk, Fraud and Compliance  – RFC ) to Competitive Parity (e.g Single View of Customer) to the Offensive (Digital Transformation across their Retail Banking business)
  2. Drastically Reduced Time to Market for new business projects
  3. Hugely Improved Quality & Access to information & realtime analytics for customers, analysts and other stakeholders
  4. Huge Reduction in CapEx & OpEx spend on data management projects  (Big Data augments and even helps supplant legacy investments in MPP systems, Data Warehouses, RDBMS’s etc)
  5. Becoming the Employer of Choice for talent due to their vocal thought leadership in this field – in areas as diverse as Hadoop, Data Science and Machine Learning

How a Pioneering Bank leverages Hadoop for Enterprise Risk Data Aggregation & Reporting..

“We honestly believe technology is going to define banking for the next five years, so it’s incredibly and strategically important that putting two individuals in charge now allows us to diversify our areas of focus.” – Kyle McNamara, Co-Head IT & CIO, Scotiabank

We’ve already explored Risk Data Aggregation & the applications of Big Data techniques to it in depth in other posts on this blog. This post covers many of the themes I’ve written about here before, but from the perspective of an actual banking major (Scotiabank) that has gone public about implementing a Hadoop Data Lake and is beginning to derive massive business value from it.

The above article at Waters Technology highlights the two co-CIOs of Scotiabank discussing the usage of Hadoop in their IT to solve Volcker Rule & BCBS 239 related challenges. It is certainly enlightening for a Bank IT audience to find CIOs discussing overall strategy & specific technology tools. The co-CIOs are charged with taking on the enterprise technology, focusing on growing and improving Scotiabank’s internal systems, platforms and applications.

Business Background, or, Why is Data Management Such a Massive Challenge in Banking-

Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC); offensive as in revenue producing areas of the business like Customer 360 (whether Institutional or Retail), Digital Marketing, Mobile Payments, Omni-channel Wealth Management etc. If one really thinks about it, the biggest activity banks perform is manipulating and dealing in information – whether customer, transaction or general ledger data.

Looking at it as technologists, advances in Big Data Architectures and paradigms are causing tectonic shifts in enterprise data management (EDM). The Hadoop ecosystem is spurring consolidation and standardization activity across a hitherto expensive, inflexible and proprietary data landscape. Massive investments in data products to “keep the lights on” are being discounted to free up budgets for innovation related spending – whether that is Defensive  (Risk, Fraud and Compliance) or Offensive (Digital Transformation) areas. Technology like Oracle databases, MPP systems and EDW’s are an ill fit for the new democratized reality where consumers now have access to multiple touch points – cellphones, tablets, PCs etc.

The ecosystem of spending around high end hardware, hosting and service fees around these technologies is just too high to maintain and occupies a huge (and unnecessary) portion of Bank IT spend.

The implications of all of this from a data management perspective, for any typical Bank –

  1. More regulatory pressures are driving up mandatory Risk and Compliance expenditures to unprecedented levels. The Basel Committee guidelines on risk data aggregation & reporting (RDA), Dodd-Frank, the Volcker Rule as well as regulatory capital adequacy legislation such as CCAR are causing a retooling of existing data architectures that are hobbled by all of the problems mentioned above. The impact of the Volcker Rule has been to shrink margins in the Capital Markets space as business moves to a flow based trading model that relies less on proprietary trading and more on managing trading for clients. At the same time, more intraday data needs to be available for the intraday management of market, credit and liquidity risks.
  2. T+0 reconciliation is also required for Cash and P&L (Profit & Loss) to align the Enterprise Risk reporting and limit management functions with the Front Office risk management and trading initiatives.
  3. Reporting & even Reconciliation are not just end-of-month events anymore.
  4. Daily enforcement of the enterprise's data rules and governance procedures now needs to be fully auditable and explainable to the regulators, the CEO and the Board of Directors.
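
A T+0 reconciliation of the kind described in point 2 boils down to comparing per-trade figures across two systems and surfacing the breaks. A minimal sketch, with made-up trade IDs and P&L numbers:

```python
def reconcile(front_office, risk_system, tolerance=0.01):
    """T+0 cash/P&L reconciliation sketch: compare per-trade P&L from the
    front-office system against the enterprise risk store, flagging both
    value breaks and trades missing from the risk side."""
    breaks = []
    for trade_id, fo_pnl in front_office.items():
        risk_pnl = risk_system.get(trade_id)
        if risk_pnl is None:
            breaks.append((trade_id, "missing in risk system"))
        elif abs(fo_pnl - risk_pnl) > tolerance:
            breaks.append((trade_id, f"P&L break: {fo_pnl} vs {risk_pnl}"))
    return breaks

# Illustrative end-of-day snapshots keyed by trade ID.
fo = {"T1": 1050.00, "T2": -320.50, "T3": 75.25}
risk = {"T1": 1050.00, "T2": -310.50}
issues = reconcile(fo, risk)
```

The production version differs mainly in scale – tens of millions of trades joined across many feeds – which is what pushes the workload off a single database and onto the lake.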

Risk data aggregation, analytic & reporting practices are very closely intertwined with IT and data architectures.  Current industry-wide Risk IT practices span the spectrum from the archaic to the prosaic.

Areas like Risk and Compliance however provide unique and compelling opportunities for competitive advantage  for those Banks that can build agile data architectures that can help them navigate regulatory changes faster and better than others.

Enter BCBS 239-

The Basel Committee and the Financial Stability Board (FSB) have published an addendum to Basel III widely known as BCBS 239 (BCBS = Basel Committee on Banking Supervision) to provide guidance to enhance banks’ ability to identify and manage bank-wide risks. BCBS 239 guidelines do not just apply to the G-SIBs (the Globally Systemically Important Banks) but also to the D-SIBs (Domestic Systemically Important Banks). Any important financial institution deemed “too big to fail” needs to work with the regulators to develop a “set of supervisory expectations” that would guide risk data aggregation and reporting.

The document can be read below in its entirety and covers four broad areas – a) Improved risk aggregation b) Governance and management c) Enhanced risk reporting d) Regular supervisory review

The business ramifications of BCBS 239 (banks are expected to comply by early 2016) –

1. Banks shall measure risk across the enterprise, i.e. across all lines of business and across what I like to call “internal” (finance, compliance, GL & risk) and “external” domains (Capital Mkts, Retail, Consumer, Cards etc).

2. All key risk measurements need to be consistent & accurate across the above internal and external domains, across multiple geographies & regulatory jurisdictions. A 360 degree view of every risk type is needed, and this shall be consistent without discrepancies.

3. Delivery of these reports needs to be flexible and timely, on an on-demand basis as needed.

4. Banks need to have strong data governance and ownership functions in place to measure this data across a complex organizational structure.


The Banking IT landscape (whatever segment one picks across the spectrum – Capital Markets, Retail & Consumer Banking, Cards etc.) is largely predicated on a legacy pattern – a mishmash of organically developed & shrink wrapped vendor systems that do everything from Core Banking to Trade Lifecycle to Settling Securities. Each of these systems operates in a data silo with its own view of the enterprise. These are all kept in sync largely via data replication.

Current Risk Architectures are based on traditional RDBMS architectures with 10s of feeds from Book Of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc.), Wire Data, Payment Data, Transaction Data etc.

These data feeds are then tactically placed in memory caches or in enterprise data warehouses. Once the data has been extracted, it is transformed using a series of batch jobs which prepare the data for the Calculator Frameworks that run the risk models on it.

All of the above need access to large amounts of data at the individual transaction level. Finance makes end of day adjustments to tie all of this up, and these adjustments need to be cascaded back to the source systems down to the individual transaction or classes of transaction levels. This is a major problem for most banks.
Finally, there is always a need for a statistical framework to make adjustments to transactions that somehow need to get reflected in the source systems. All of these frameworks need to have access to, and an ability to work with, TBs of data.

Where current systems fall short- 

The key challenges with current architectures –

  1. A high degree of data is duplicated from system to system, leading to multiple inconsistencies at the summary as well as transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion and the calculators end up being duplicated as well.
  2. Traditional Risk algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across multiple kinds of risks. E.g. certain kinds of Credit Risk calculations need access to around 200 days of historical data to estimate the probability of a counterparty defaulting & to obtain a statistical measure of that risk; such calculations are highly computationally intensive.
  3. Risk Model and Analytic development needs to be standardized to reflect realities post BCBS 239.
  4. The Volcker Rule aims to ban prop trading activity on the part of Banks. Banks must now report on seven key metrics across 10s of different data feeds spanning PBs of data.
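
As an illustration of the metrics-over-feeds problem in point 4, Volcker-style measures such as the share of customer-facing trades and gross notional turnover can each be computed with a single pass over a trade feed. The field names and sample trades below are assumptions for the sketch; the actual rule defines its own precise metric set:

```python
def customer_facing_ratio(trades):
    """Fraction of trades done against clients rather than other dealers –
    one kind of signal used to separate market-making from prop trading."""
    client_trades = sum(1 for t in trades if t["counterparty"] == "client")
    return client_trades / len(trades)

# Illustrative trade feed (field names are assumed for this sketch).
trades = [
    {"id": 1, "counterparty": "client", "notional": 1_000_000},
    {"id": 2, "counterparty": "client", "notional": 500_000},
    {"id": 3, "counterparty": "dealer", "notional": 2_000_000},
    {"id": 4, "counterparty": "client", "notional": 750_000},
]
ratio = customer_facing_ratio(trades)
turnover = sum(abs(t["notional"]) for t in trades)  # gross notional traded
```

The hard part at PB scale is not the arithmetic but joining 10s of heterogeneous feeds consistently – which is precisely where a consolidated data platform earns its keep.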

Hadoop to the Rescue –

Since the financial crisis of 2008, open source software offerings have immensely matured, with compelling functionality in terms of scalability & governance. Hadoop, which is really an ecosystem of 30+ projects, has been reimagined twice and now forms the backbone of any enterprise grade innovative data management project.

Hadoop Based Target State Architecture of a Risk Data Aggregation Project –

The overall goal is to create a cross company data-lake containing all cross asset data in one place as depicted in the below graphic.


1) Data Ingestion: This encompasses creation of the L1 loaders to take in Trade, Loan, Payment and Wire Transfer data. Developing this portion will be the first step to realizing the overall architecture, as timely data ingestion is a large part of the problem at most institutions. Part of this process includes a) understanding data ingestion from the highest priority systems and b) applying the correct governance rules to the data. The goal is to create these loaders for versions of different systems (e.g. Calypso 9.x) and to maintain them as part of the platform moving forward. The first step is to understand the range of Book of Record transaction systems (lending, payments and transactions) and the feeds they send out. The goal would be to create the mapping from these feeds to the loaders on a release of an enterprise grade Open Source Big Data Platform, e.g. HDP (Hortonworks Data Platform), so these can be maintained going forward.
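
An L1 loader is, at its core, a mapping from a source system's feed layout onto a canonical schema. A minimal sketch in Python – the canonical field names and the sample feed layout below are invented for illustration and do not reflect Calypso's actual export format:

```python
import csv
import io

def l1_load(feed_text, field_map):
    """Map a source system's CSV column names onto canonical field names.
    field_map: {canonical_name: source_column_name}."""
    records = []
    for row in csv.DictReader(io.StringIO(feed_text)):
        records.append({canon: row[src] for canon, src in field_map.items()})
    return records

# Hypothetical feed from one booking system, and its mapping.
feed = "TradeRef,Desk,Product,Amt,TDate\nTR-9,FX-NY,Forex,1000000,2015-07-21\n"
mapping = {"trade_id": "TradeRef", "book": "Desk", "asset_class": "Product",
           "notional": "Amt", "trade_date": "TDate"}
loaded = l1_load(feed, mapping)
```

One such field map would be maintained per source system and version, which is what makes the loaders maintainable as feeds evolve.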

2) Data Governance: These are the L2 loaders that apply the rules to the critical fields for Risk and Compliance. The goal here is to look for gaps in the data and any obvious quality problems involving range or table driven data. The purpose is to facilitate data governance reporting.
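
The L2 governance checks described here are essentially rule evaluations over the critical fields – gaps, ranges, and table-driven lookups. A hedged sketch, with an assumed record shape and reference table:

```python
# Table-driven reference data (illustrative).
VALID_CURRENCIES = {"USD", "CAD", "EUR", "GBP"}

def l2_validate(record):
    """Flag gaps and range problems on critical Risk & Compliance fields;
    returns a list of issues for governance reporting."""
    issues = []
    if not record.get("trade_id"):
        issues.append("missing trade_id")
    if record.get("currency") not in VALID_CURRENCIES:
        issues.append(f"unknown currency: {record.get('currency')}")
    notional = record.get("notional")
    if notional is None or notional <= 0:
        issues.append("notional must be positive")
    return issues

problems = l2_validate({"trade_id": "TR-9", "currency": "XXX", "notional": -5})
```

The per-record issue list is exactly what rolls up into the data governance reporting this step is meant to facilitate.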

3) Entity Identification: This is the establishment and adoption of a lightweight entity ID service. The service will consist of entity assignment and batch reconciliation. The goal here is to get each target bank to propagate the Entity ID back into their booking and payment systems; transaction data will then flow into the lake with this ID attached, providing a way to do Customer 360.
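
A lightweight entity ID service of this sort needs two properties: deterministic assignment (the same legal entity always maps to the same ID) and an internal registry to reconcile against. A toy sketch – the normalization rules and ID format here are assumptions:

```python
import hashlib

class EntityIdService:
    """Deterministic entity-ID assignment keyed on normalized legal name
    plus jurisdiction, with a registry for batch reconciliation."""

    def __init__(self):
        self.registry = {}

    def assign(self, legal_name, jurisdiction):
        # Normalize so trivially different spellings map to one entity.
        key = (legal_name.strip().upper(), jurisdiction.strip().upper())
        if key not in self.registry:
            digest = hashlib.sha1("|".join(key).encode()).hexdigest()[:12]
            self.registry[key] = f"ENT-{digest}"
        return self.registry[key]

svc = EntityIdService()
a = svc.assign("Acme Corp", "us")
b = svc.assign("  acme corp ", "US")  # same entity after normalization
```

Batch reconciliation then becomes a matter of re-running assignment over incoming records and diffing against the registry.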

4) Developing L3 loaders: This will involve defining the transformation rules that are required in each risk, finance and compliance area to prep the data for their specific processing.

5) Analytic Definition: Defining the analytics that are to be used for each risk and compliance area.

6) Report Definition: Defining the reports that are to be issued for each risk and compliance area.

How has ScotiaBank’s experience been?  (Reproduced from the Article)

As at many banks, Zerbs (co-CIO) says BCBS 239 triggered a fundamental rethink at Scotiabank about how to bring data together to meet the requirements of the regulation. The main goal was to develop a solution that would be scalable and repeatable in other aspects of the organization.

What Zerbs says Scotiabank wasn’t interested in was developing a solution that would need to be overhauled over and over again.

“You start a project where you extract the data in exactly the form that was required. Time goes by. By the time you’re done, the business says, ‘By the way, now that I think about it, I need another five data attributes.’ And you’re saying, ‘Well you should have told me earlier, because now it’s a six-month change project and it’s going to cost you more, on the magnitude of hundreds of thousands of dollars,'” says Zerbs. “That’s the kind of vicious circle that repeats itself too often. So we thought: How do we avoid that?”

Apache Hadoop seemed like the obvious choice, according to Zerbs. At the time, though, the bank didn’t have a lot of experience with the big data application. Zerbs says Scotia decided to use another pressing regulation ─ Volcker Rule compliance ─ as a test case for its first product application on the Hadoop platform.

The Volcker Rule, part of the US Dodd-Frank Act, requires banks to delineate client-related trading from proprietary trading. Regulators look at several metrics to decipher whether something is considered proprietary or client-related. Client metrics are also looked at, including how much inventory needs to be held to satisfy client demands in areas the firm is considered a market-maker.

Because Scotia’s prop-trading activity was already fairly small relative to its overall size, the project was deemed more manageable than jumping in with BCBS 239. And due to Volcker having what Zerbs describes as a “fuzzy set of requirements,” the scalability and reusability of a big data solution seemed like the perfect option and chance to test the waters in an area that Scotiabank lacked experience.

The bank had less than six months to go from conceptualization to initial production to meet the first deadline, which passed on July 21. Zerbs says it’s an example of bringing together the customer, risk, and technology focus all at once to create a solution that has multiple benefits to the entire organization.


Adopting fresh & new age approaches that leverage the best in Big Data can result in –

1. Improved insight and a higher degree of transparency in business operations and capital allocation

2. Better governance procedures and policies that can help track risks down to the individual transaction level & not just at the summary level

3. Streamlined processes across the enterprise and across different banking domains – Investment Banking, Retail & Consumer, Private Banking etc.

Indeed, Lines of Business (LOBs) can drive more profitable products & services once they understand their risk exposures better. Capital can be allocated more efficiently and better road-maps created in support of business operations, instead of constant fire-fighting, regulatory heartburn and concomitant bad press. Another trend, as is evident now, is the creation of consortiums in banking to solve the most common challenges – Risk, Compliance, Fraud Detection – which in and of themselves result in no tangible topline growth.

Let us examine that in a followup post.

Leveraging Big Data to Revolutionize Mortgage Banking..


“Perhaps more than anything else, failure to recognize the precariousness and fickleness of confidence – especially in cases in which large short-term debts need to be rolled over continuously – is the key factor that gives rise to the this-time-is-different syndrome. Highly indebted governments, banks, or corporations can seem to be merrily rolling along for an extended period, when bang! – confidence collapses, lenders disappear, and a crisis hits.” – This Time is Different (Carmen M. Reinhart and Kenneth Rogoff)

Tomes have been written about the financial crisis of 2008 (the GFC, as it’s affectionately called in financial circles). Whatever the mechanics of the financial instruments that caused the meltdown, one thing everyone broadly agrees on is that easy standards with respect to granting credit (and specifically consumer mortgages in the US, with the goal of securitizing & reselling them in tranches – the infamous CDOs) were the main cause of the crisis.

Banks essentially granted easy mortgages (in part to huge numbers of high risk, unqualified customers) with the goal of securitizing, marketing and selling them into the financial markets dressed up as low risk & high return investments. AIG Insurance’s financial products (FP) division created & marketed another complex instrument – credit default swaps – which effectively insured the buyer against losses in case any of the original derivative instruments made a loss.

For a few years after the crisis, the Mortgage Market had largely been transformed into a risk averse operation; however, it has been rebounding in recent times with the economic recovery. Higher loan production efficiencies and favorable hedging outcomes helped drive an increase in mortgage banking profits during the second quarter of 2015.

The Mortgage Bankers Association reported that average net pretax income jumped 55.7 percent from the first quarter to $3.50 million in the second. That was the best pretax income figure since the first quarter of 2013. (Source – Inside Mortgage Banking).

However, internet based lenders and new age FinTechs are encroaching on this established space by creating agile applications that are internet enabled by default & Digital by Design across the front, back and mid offices. These services vastly ease the loan application & qualification processes (sometimes processing loans in a day, as compared to the weeks taken by traditional lenders), while offering a suite of other integrated services like financial planning & advisory, online brokerage and bill payments. All of these services are primarily underpinned by advanced data analytics that provide – 1) a seamless Single View of Customer (SVC), 2) advanced Digital Marketing capabilities and 3) the ability to capture a Customer Journey across a slew of financial products.

However, there is a significant need for existing players to gain such efficiencies, which are missing from their IT capabilities due to antiquated technology & data architectures. It will no longer be possible for them to remain profitable in the coming years unless innovation is adopted at the core of their IT infrastructures.

If Mortgage lenders are to take a Big Data approach augmenting complementary investments in other Digital technology – Mobile, Web Scale, DevOps, Automation and Cloud Computing – then what are the highest value business use-cases to apply this to?

Big Data can be applied to the Mortgage Market business spanning six broad buckets as delineated below –

  1. Account Origination & Underwriting – Qualifying borrowers for mortgages based on not just the historical data used as part of the origination & underwriting process (credit reports, employment & income history etc.) but also data that was not mined hitherto (social media data, financial purchasing patterns). It is a well known fact that there are huge segments of the population (especially Millennials) who are broadly eligible but underbanked, as they do not satisfy some of the classical business rules needed to obtain approvals on mortgages.
  2. Account Servicing – Servicing is a highly commodified, low margin, high transaction volume business, and it serves an industry that has shrunk over 12% since 2008 (from $11.3 trillion to $9.9 trillion) – Ref Todd Fischer in National Mortgage News. Innovation here will largely be driven by players who apply sophisticated analytics to make better-informed decisions that result in enhanced risk mitigation, improved loan quality, higher per transaction margin, and increased profitability. Examples include combining real time consumer data (household spending, credit card usage, income changes) with historical data to assess eligibility for approvals of Home Equity Lines of Credit (HELOC) or increases in mortgage borrowing, and predicting when a young family will want to move out of a starter home to a larger home based on data such as childbirth. On the flip side, being able to detect patterns that could indicate financial distress & subsequent delinquency on the part of borrowers (based on macro indicators like large numbers of defaults in a specific county, or micro indicators like loss or break in employment) – across a range of timelines – is an excellent example of this capability.
  3. Cross Product Selling – Mortgages have historically been a highly sticky financial product that entails a Bank-Customer relationship spanning 10+ years. Such considerable timelines ensure that Banks can build relationships with a customer that enable them to sell bundled products like Auto Loans, Private Banking Services, Credit Cards, Student and Consumer Loans over the lifespan of the account. Underpinning all of this are the rich troves of data that pertain to customer transactions & demographic information.
  4. Risk & Regulatory Reporting – Post the financial crisis, the US Government, via the Federal Housing Administration (FHA), has put in place a stringent regulatory mandate with a series of housing loan programs that aim to protect the consumer against predatory lending. These range from FHA-HARP to FHA-HAMP to the Short Refinance Program to HEAP to the FHA-HAPA. Banks need to understand their existing customer data to predict and modify mortgages as appropriate for borrowers in financial distress. Predictive modeling using Big Data techniques is a huge help in this analysis.
  5. Fraud Detection – Mortgage fraud is a huge economic challenge and spans areas like foreclosure fraud, subprime fraud, property valuation fraud etc. Law enforcement organizations including the FBI are constantly developing and fine-tuning new techniques to analyze, detect and combat mortgage fraud. A large portion of this involves collecting and analyzing data to spot emerging trends and patterns, and using the full array of investigative techniques to find and stop criminals before the fact, rather than after the damage has been done.
  6. Business Actions – One of the facts of life in the fast moving mortgage market is business actions, ranging from wholesale acquisitions of lenders to selling tranches of loans for sub-servicing. The ability to analyze a vast amount of data (ranging into Petabytes) with multiple structures to determine an acquisition target’s risk profile and portfolio worthiness is key to due diligence. The lack of such diligence has (famously) led to suboptimal acquisitions (e.g. BofA – Countrywide & JP Morgan – Washington Mutual, to name a couple). These in turn have led to executive churn, negative press, massive destruction of shareholder value & the distraction of multiple lawsuits.

How can Big Data Help? 

Existing data architectures in the mortgage sector are largely siloed, with IT creating or replicating data marts or warehouses to feed internal lines of business. These data marts are then accessed by custom reporting applications, thus replicating/copying data many times over, which leads to massive data management headaches & governance challenges.

Furthermore, the explosion of new types of data (e.g. social media, clickstream, housing price indices, demographic migration data etc.) in recent years has put tremendous pressure on the financial services datacenter, both technically and financially, and an architectural shift is underway in which multiple LOBs can consolidate their data into a unified data lake.

It is also interesting that the industry is moving to an approach of integration & augmentation given that all of the leading databases, ETL products and BI tools provide robust and certified Hadoop plugins. Hadoop can thus integrate all cross company data (mortgages, clickstreams, transaction data, payment data, account data etc) to create one scalable and low cost data repository that can be mined differently based on differing line of business requirements.

The Financial Services Data Lake (as shown below) supports multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set which is the unified repository of all financial data. It also enables users to transform and view data in multiple ways (across various schemas) and deploy closed-loop analytics applications that bring time-to-insight closer to real time than ever before.


Finally, a Hadoop cluster with tools like Hive, Storm, and Spark is not limited to one purpose, like older dedicated-computing platforms. The same cluster you use for running Risk models can also be used for text mining, predictive analytics, compliance, fraud detection, customer sentiment analysis, and many other purposes. This is a key point: once you bring siloed data into a data lake, it is available for running multiple business scenarios – limited only by the overall business scope.

Analysts and Data Scientists can use a variety of tools to glean insights and build & backtest models. Plenty of organizations have already purchased BI tools like Tableau, Spotfire or Qlikview. Data Scientists can use platforms like SAS or R. A series of MapReduce jobs (potentially submitted from Hive, Pig, or Oozie) is typically run to ingest, transform, encode and sort the data sets from HDFS into R. Data analysts can then perform complex modeling exercises, such as linear or logistic regression, directly on the cleansed & enriched data in R.
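
The ingest-transform-sort pipeline described above follows the classic map/shuffle/reduce shape, which can be mimicked in a few lines of plain Python (the per-customer aggregation here is a stand-in for whatever transformation a given model actually needs):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(transactions):
    """Emit (customer_id, amount) pairs, mirroring a MapReduce mapper."""
    for txn in transactions:
        yield txn["customer"], txn["amount"]

def reduce_phase(pairs):
    """Shuffle-sort by key, then sum per customer, mirroring a reducer."""
    totals = {}
    ordered = sorted(pairs, key=itemgetter(0))
    for customer, group in groupby(ordered, key=itemgetter(0)):
        totals[customer] = sum(amount for _, amount in group)
    return totals

txns = [{"customer": "C1", "amount": 100.0},
        {"customer": "C2", "amount": 40.0},
        {"customer": "C1", "amount": 60.0}]
totals = reduce_phase(map_phase(txns))
```

On a real cluster the same two functions would be distributed across nodes and the shuffle handled by the framework; the logic itself does not change.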

To conclude – we are still in the early days of understanding how Big Data can impact the mortgage business. Over-regulating data management & architecture and discouraging innovation among data & business teams, whether through an overly conservative approach or long budget cycles, is a recipe for suboptimal business results.

Digital Transformation in Financial Services..(1/4)

This article is the first installment in a four part series that talks about the ongoing digital disruption across financial services. This first post sets the stage for the entire market. Subsequent posts will cover each of the mini worlds – Capital Markets, Retail & Consumer Banking & Wealth Management – that comprise the banking industry. Each post will cover the evolving landscape in the given sector and the impact of digitization. We also cover the general business direction in the context of disruptive technology innovation.

The Financial Services Industry is in the midst of a perfect storm. For an industry that has always enjoyed relatively high barriers to entry & safe incumbency, a wave of Fintechs and agile competitors is upending business dynamics across many Banking domains. It is also interesting that while incumbent firms continue to pour billions of dollars into technology projects to maintain legacy platforms as well as create lateral innovation – the Fintechs are capturing market share using a mix of innovative technology and new products & services – all aimed at disintermediating & disrupting existing value chains via Digital Transformation.

Most banks run digital front ends as opposed to a true digital enterprise. They have simply wrapped old BORT systems (Book Of Record Transaction systems like Wire, Payment & Transaction) for Internet and Mobile access. Digital is not just another channel but a way of life. This is what the Fintechs and companies like PayPal and Google get and banks don’t. Banks need to tighten up their processes to run at Internet speeds. The backend itself needs to be a scalable platform, not a ragtag assembly of software components that were designed and built 20 or 30 years ago.
Much industry research has shown that banks are not making very good progress towards becoming digital organizations, and that in fact they are waiting for the regulators to push them into re-engineering their BORT systems.

Thus digital disruption has arrived late in financial services compared to sectors like Retail & Healthcare. In Retail, for instance, it has caused tectonic shifts for years, forcing existing brick and mortar players to build out substantial online storefronts that are integrated into backoffice operations and across their supply chains.

So what are the major technology prongs driving this transformation, and how does such an approach boil down in terms of technology principles? And pray, what are the technology ingredients that make up a successful Digital Strategy or, more importantly, how do the principles of webscale apply at a large organization? I would wager that there are five or six major factors, chief among them – an intelligent approach to leveraging data (ingesting, mining & linking microfeeds to existing data – thus a deep analytical approach based on predictive analytics and machine learning), an agile infrastructure based on cloud computing principles, a microservice based approach to building out software architectures, mobile platforms that accelerate customers’ ability to Bank Anywhere, an increased focus on automation from business processes through to software delivery, and finally a culture that encourages risk taking & a “fail fast” approach.

Thus, the six trends are –

1. Big Data,
2. Cloud Computing,
3. Mobile Computing & Platforms,
4. Social Media,
5. DevOps,
6. Microservices


Digital Banking is the age of the hyper-connected consumer. Customers expect to be able to Bank from anywhere, be it on a mobile device or via internet banking from their personal computer. The business areas shown in the above picture are a mix of legacy capabilities (Risk, Fraud and Compliance) and new value added areas (Mobile Banking, Payments, Omni-channel Wealth Management etc).

Accenture reports that global investment in Fintech ventures tripled to $12.21 billion in 2014, clearly signifying that the digital revolution has well and truly arrived in the financial services sector.


What are some of the exciting new business models that the new entrants in Fintech are pioneering at the expense of the traditional Bank ?

  • Offering targeted banking services to technology savvy customers at a fraction of the cost e.g. in retirement planning in the USA
  • Lowering cost of access to discrete financial services for business customers in areas like foreign exchange transactions & payments e.g. Currency Cloud
  • Differentiating services like peer to peer lending among small businesses and individuals e.g. Lending Club
  • Providing convenience through use of handheld devices like iPhones e.g. Square Payments

Large Banks, which have built up massive economies of scale over the years, do hold a large first mover advantage. This is due to a range of established services across their large (and loyal) customer bases, and rich troves of data on customer transactions & demographics. However, it is not enough to just possess the data. They must be able to drive change past legacy thinking and infrastructures.

So what do Banking CXOs need to do to drive an effective program of Digital Transformation?

  • Change employee mindset and culture that are largely set on ‘business as usual’ to a mindset of unafraid experimentation across the business areas shown on the right side of the above pictorial
  • Create the roles of Digital and Technology entrepreneurs as change agents across these complex technology areas. These leaders should help seed digital thinking into these lines of business
  • Drive the culture to adapt to new ways of doing business with an increasingly digital population by offering multiple channels and avenues
  • Offer data driven capabilities that can detect customer preferences on the fly, match them with existing history and provide value added services. Services that not only provide a better experience but also help in building a longer term customer relationship
  • Help the business & IT rapidly develop, prototype, test new business capabilities

To their credit, the large Banks are not sitting still. Bank of America, as one example, has been in the news for bringing in 60% of all its sales from Digital Channels in its last quarter – Q2 of 2015.

Bank Of America might want to change its name to Digital Bank of America.

The Charlotte, N.C., megabank is more digital bank than conventional financial institution today. That’s because 60% of the bank’s “sales” are “all digital now,” Brian T. Moynihan, Chairman and CEO of Bank of America, told investors yesterday.

Moynihan also disclosed that about 6% of the bank’s digital “sales” – it is difficult to identify exactly what he means by “sales,” unfortunately – are via mobile device, “and that’s growing at 300%,” he said.

Moynihan’s disclosures yesterday were the most publicly detailed on digital banking at a major bank to date.

Ref –

The definition of Digital is somewhat nebulous, so I would like to define the key areas where its impact and capabilities will need to be felt for this gradual transformation to occur.

A true Digital Bank needs to –

  • Offer a seamless customer experience much like the one provided by the likes of Facebook & Amazon i.e highly interactive & intelligent applications that can detect a single customer’s journey across multiple channels
  • Offer data driven interactive services and products that can detect customer preferences on the fly, match them with existing history and provide value added services. Services that not only provide a better experience but also foster a longer term customer relationship
  • Help the business rapidly prototype, test, refine and develop new business capabilities
  • Above all, treat Digital as a Constant Capability and not as an ‘off the shelf’ product or a one off way of doing things

Though some of the above facts & figures may seem startling, it is how individual banks put both data and technology to work across their internal value chains that will define their standing in the rapidly growing data economy.

If you really think about it – all that banks do is manipulate and deal in information. If that is not primed for an Über type of revolution, I do not know what is.
Now that we have defined the Digital Challenge for Banking, the next post will examine transformation in more depth across Capital Markets.

How Facebook scales to a billion users in 24 hrs..(1/2)


The first in this two post series examines Facebook’s corporate philosophy in architecting systems for web scale. Part two will look at their technology stack in more depth. 

Last week, buried in the news of all the tumult in the stock markets, we had what I believe is a major milestone for web scale architectures and practices. A billion people logged into Facebook on a single day (Monday, Aug 24, 2015), marking the maximum number of users (one-seventh of the planet’s population) on any platform ever in a 24 hr period. An incredible number any way you want to slice and dice it.

Sociologists will no doubt examine the tremendous impact the defining Social Network is having on humans & its emergence as an absolute necessity in the daily lives of billions. As technologists, however, we are left with the more mundane job of understanding how any one platform can scale to support such an astounding number of users & do so with panache.

The idea ultimately is to understand how one can bring such innovation (albeit incrementally) into the more run of the mill enterprise.

Lets parse this in context of the bigger picture –

Firstly, Facebook (among an elite grouping of other luminaries – Google, Amazon et al) has figured out how to build platforms (that host an ecosystem of services – social in FB’s case) that provide massively scalable applications while providing an engaging user experience. It is no surprise that the average Facebook user spends around 40+ minutes a day on the site in the USA and 20+ minutes outside the USA (source – Wikipedia).


(Source – Facebook)

Secondly, Facebook’s achievements are all the more amazing since they did not have the early mover advantage that MySpace did. Their recent earnings report for Q2 2015 shows continued strength across the raft of metrics their business is gauged by – growth in users, revenue per user, advertising revenue as well as revenue from mobile users.

So, what makes Facebook different in overall strategy & DNA? In my opinion, four important factors –

  1. Like the other web scale giants, Facebook (FB) is a realtime data driven enterprise. Right from the projects they leverage to the algorithms they create that drive interaction on their site – it all comes down to data and its ingestion, processing, logging, analytics & the insights driven from it. Facebook’s IT Architecture and Technology stack is built around managing an entire data lifecycle, as we will see in the next post. They also leverage Data Science in a big way. FB’s Data Science team regularly posts on all their latest research & insights. Areas where they spend a lot of time include Identity Research (creating and testing models of people’s online identity to power next-generation products and to gain deeper insights into how people interact with digital technology), Economic Research (pricing, forecasting etc), in-house analytics, Product Science & Statistical Research. An outstanding approach to digitizing FB’s offerings in a way that supports superior customer interactions and constantly creates new value. The Data Science team even maintains a Facebook page where they publish their recent research and insights at –
  2. Facebook is one of the world’s largest open source companies. Everything they do, from their web application development to their back end systems, uses Open Source. Further, they have created projects like Hive, Cassandra, Scribe and Thrift, to name a few popular ones. They also contribute millions of lines of code to other open source ecosystems (e.g. Apache Hadoop). Thus FB adopts a model of constant incubation & invention of open technologies.
  3. In fact, FB takes their open source philosophy to somewhat extreme lengths – to creating an open source hardware platform. They incubated the Open Compute Project, which is a highly credible attempt to create a lower cost, highly efficient and highly scalable datacenter. Open Compute aims to create an open source standard that spans server designs, rack specifications, power systems, high end networking equipment like switches, and storage & cooling equipment. Open Compute is rapidly evolving into a formidable ecosystem with participation from Apple, Microsoft, Cisco, Goldman Sachs, Fidelity and Bank of America. In fact, BofA has been vocal about their vision of running commodity white box servers in their datacenters and dynamically reconfiguring these via an OpenStack controller to perform compute, storage and networking functionalities as dictated by business workloads. The charter for the Open Compute project lays out very ambitious goals, and over time the industry will move to a model where a range of bespoke datacenter equipment can be assembled, at much lower cost, using a LEGO style approach as long as the components themselves are OCF (Open Compute Foundation) spec compliant.

  4. Building for the future by inculcating innovation into the organizational DNA. This is done by generating new ideas, being unafraid to cannibalize older (and even profitable) ideas and constantly experimenting across new businesses. FB is famous for not having a review board that designers and engineers present to with PowerPoint slides – prototypes and pilot projects are presented directly to executives – even CEO Mark Zuckerberg. Facebook pivoted in a couple of years from weak mobile offerings to becoming the #1 mobile app company (more users access their FB pages using mobile devices running iOS & Android than using laptops). FB’s culture is almost Steve Jobs-ian in this respect.

A good chunk of FB’s success is owed to a contrarian mindset in terms of creating a technology platform & generating a huge competitive advantage from it via continuous improvement.

The next post will be technical in nature; we will examine Facebook’s generic technology stack (from publicly available information) and take a brief look at the individual projects that are brought together to support such massive horizontal scalability.