How a Pioneering Bank leverages Hadoop for Enterprise Risk Data Aggregation & Reporting..

“We honestly believe technology is going to define banking for the next five years, so it’s incredibly and strategically important that putting two individuals in charge now allows us to diversify our areas of focus.” – Kyle McNamara, Co-Head IT & CIO, Scotiabank

Source – Waters Technology

We have already explored Risk Data Aggregation and the application of Big Data techniques to it in depth in other posts on this blog.  This post revisits many of those themes, but from the perspective of an actual major bank (Scotiabank) that has gone public about implementing a Hadoop Data Lake and is beginning to derive massive business value from it.

The above article at Waters Technology highlights the two co-CIOs of Scotiabank discussing the use of Hadoop in their IT estate to solve Volcker Rule and BCBS 239 related challenges.  It is certainly enlightening for a Bank IT audience to find CIOs discussing both overall strategy and specific technology tools. The co-CIOs are charged with overseeing enterprise technology, focusing on growing and improving Scotiabank’s internal systems, platforms and applications.

Business Background, or Why Data Management Is Such a Massive Challenge in Banking –

Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC); offensive in revenue-producing areas of the business like Customer 360 (whether institutional or retail), Digital Marketing, Mobile Payments, Omnichannel Wealth Management, etc. If one really thinks about it, the core activity of a bank is to manipulate and deal in information – whether customer, transaction or general ledger data.

Looking at it as technologists, advances in Big Data architectures and paradigms are causing tectonic shifts in enterprise data management (EDM). The Hadoop ecosystem is spurring consolidation and standardization across a hitherto expensive, inflexible and proprietary data landscape. Massive investments in data products that merely “keep the lights on” are being discounted to free up budgets for innovation-related spending – whether in Defensive (Risk, Fraud and Compliance) or Offensive (Digital Transformation) areas. Technologies like Oracle databases, MPP systems and EDWs are an ill fit for the new democratized reality in which consumers have access to multiple touch points – cellphones, tablets, PCs, etc.

The ecosystem of spending on high-end hardware, hosting and service fees around these technologies is simply too expensive to maintain and occupies a huge (and unnecessary) portion of Bank IT spend.

The implications of all of this from a data management perspective, for any typical Bank –

  1. More regulatory pressure is driving mandatory Risk and Compliance expenditures to unprecedented levels. The Basel Committee guidelines on risk data aggregation and reporting (RDA), Dodd-Frank, and the Volcker Rule, as well as regulatory capital adequacy exercises such as CCAR, are forcing a retooling of existing data architectures that are hobbled by all of the problems mentioned above. The impact of the Volcker Rule has been to shrink margins in the Capital Markets space as business moves to a flow-based trading model that relies less on proprietary trading and more on managing trading for clients. At the same time, more intraday data needs to be available for the intraday management of market, credit and liquidity risks.
  2. T+0 reconciliation is also required for Cash and P&L (Profit & Loss) to align the Enterprise Risk reporting and limit management functions with the Front Office risk management and trading initiatives.
  3. Reporting and even reconciliation are no longer just end-of-month events.
  4. Daily enforcement of the enterprise’s data rules and governance procedures now needs to be fully auditable and explainable to the regulators, the CEO and the Board of Directors.
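The T+0 reconciliation requirement in point 2 can be illustrated with a minimal sketch: match per-book P&L between a front-office view and the enterprise risk view and flag any breaks. The field names, book codes and tolerance below are assumptions for illustration, not any particular bank's schema.

```python
from collections import defaultdict

def reconcile_pnl(front_office, enterprise_risk, tolerance=0.01):
    """Compare per-book P&L between the front-office and enterprise risk
    views; return the books whose totals differ by more than `tolerance`."""
    fo_totals = defaultdict(float)
    er_totals = defaultdict(float)
    for rec in front_office:
        fo_totals[rec["book"]] += rec["pnl"]
    for rec in enterprise_risk:
        er_totals[rec["book"]] += rec["pnl"]

    breaks = []
    for book in set(fo_totals) | set(er_totals):
        diff = fo_totals[book] - er_totals[book]
        if abs(diff) > tolerance:
            breaks.append({"book": book, "diff": round(diff, 2)})
    return breaks

fo = [{"book": "RATES", "pnl": 120.0}, {"book": "FX", "pnl": -30.0}]
er = [{"book": "RATES", "pnl": 120.0}, {"book": "FX", "pnl": -25.0}]
print(reconcile_pnl(fo, er))  # the FX book shows a -5.00 break
```

In a real T+0 process this match runs intraday against the lake rather than at month-end, and the breaks feed an adjustment workflow rather than a print statement.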

Risk data aggregation, analytics and reporting practices are very closely intertwined with IT and data architectures.  Current industry-wide Risk IT practices span the spectrum from the archaic to the prosaic.

Areas like Risk and Compliance, however, provide unique and compelling opportunities for competitive advantage for those Banks that can build agile data architectures, helping them navigate regulatory changes faster and better than others.

Enter BCBS 239-

The Basel Committee on Banking Supervision (BCBS) and the Financial Stability Board (FSB) have published an addendum to Basel III widely known as BCBS 239, which provides guidance to enhance banks’ ability to identify and manage bank-wide risks. The BCBS 239 guidelines apply not just to the G-SIBs (Globally Systemically Important Banks) but also to the D-SIBs (Domestic Systemically Important Banks). Any important financial institution deemed “too big to fail” needs to work with the regulators to develop a “set of supervisory expectations” that guides risk data aggregation and reporting.

The document can be read below in its entirety and covers four broad areas – a) Improved risk aggregation b) Governance and management c) Enhanced risk reporting d) Regular supervisory review

The business ramifications of BCBS 239 (banks are expected to comply by early 2016) –

1. Banks shall measure risk across the enterprise, i.e. across all lines of business and across what I like to call “internal” (finance, compliance, GL & risk) and “external” domains (Capital Markets, Retail, Consumer, Cards, etc.).

2. All key risk measurements need to be consistent and accurate across the above internal and external domains, and across multiple geographies and regulatory jurisdictions. A 360-degree view of every risk type is needed, and it must be consistent and free of discrepancies.

3. Delivery of these reports needs to be flexible and timely, on an on-demand basis as needed.

4. Banks need to have strong data governance and ownership functions in place to govern this data across a complex organizational structure.
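A minimal sketch of the consistency requirement in point 2: verify that the same risk measure, as computed in each internal and external domain, agrees within a tolerance. The measure names, domain names and values below are assumptions for illustration.

```python
def check_consistency(measures, tolerance=1e-6):
    """`measures` maps risk type -> {domain: value}. Return the risk
    types whose values disagree across domains beyond `tolerance`."""
    inconsistent = {}
    for risk_type, by_domain in measures.items():
        values = list(by_domain.values())
        if max(values) - min(values) > tolerance:
            inconsistent[risk_type] = by_domain
    return inconsistent

measures = {
    # Same VaR number reported by finance and by capital markets: OK.
    "credit_var": {"finance": 1.25e6, "capital_markets": 1.25e6},
    # The two domains disagree: this must be flagged and reconciled.
    "market_var": {"finance": 2.10e6, "capital_markets": 2.35e6},
}
print(check_consistency(measures))  # flags market_var only
```

In practice this check would run per geography and jurisdiction as well, with the discrepancies routed into the governance workflow of point 4.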


The Banking IT landscape (whatever segment one picks across the spectrum – Capital Markets, Retail & Consumer Banking, Cards, etc.) is largely predicated on a legacy pattern – a mishmash of organically developed and shrink-wrapped vendor systems that do everything from Core Banking to Trade Lifecycle management to Securities Settlement.  Each of these systems operates in a data silo with its own view of the enterprise, and they are all kept in sync largely via data replication.

Current Risk architectures are based on traditional RDBMS technology, with tens of feeds from Book Of Record Transaction Systems (BORTS) covering Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options), Wire Data, Payment Data, Transaction Data, etc.

These data feeds are then tactically placed in memory caches or in enterprise data warehouses. Once extracted, the data is transformed by a series of batch jobs that prepare it for the Calculator Frameworks, which run the risk models on it.

All of the above need access to large amounts of data at the individual transaction level. Finance makes end-of-day adjustments to tie all of this up, and these adjustments need to be cascaded back to the source systems, down to individual transactions or classes of transactions. This is a major problem for most banks.

Finally, there is always a need for a statistical framework to make adjustments to transactions that somehow need to be reflected in the source systems. All of these frameworks need access to, and an ability to work with, terabytes of data.

Where current systems fall short- 

The key challenges with current architectures –

  1.  A high degree of data duplication from system to system leads to multiple inconsistencies at both the summary and transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion and the calculators end up being duplicated as well.
  2. Traditional Risk algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across multiple kinds of risk. For example, certain kinds of Credit Risk calculations need access to around 200 days of historical data to obtain a statistical measure of the probability of a counterparty defaulting, and such measures are highly computationally intensive.
  3. Risk Model and Analytic development needs to be standardized to reflect realities post BCBS 239.

  4. The Volcker Rule aims to ban proprietary trading activity on the part of Banks. Banks must now report on seven key metrics across tens of different data feeds spanning petabytes of data.
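The 200-day historical Credit Risk measure mentioned in point 2 can be caricatured as estimating a counterparty's default probability from a trailing window of daily observations. Real models are far more involved; this toy sketch only shows why long transaction-level history must be retained and queryable.

```python
def historical_default_rate(observations, window=200):
    """Fraction of days in the trailing `window` on which a default-type
    credit event was observed (1 = event, 0 = no event) -- a toy
    estimate of a counterparty's default probability."""
    recent = observations[-window:]
    return sum(recent) / len(recent) if recent else 0.0

# 200 days of history for one counterparty, with 2 observed credit events:
obs = [0] * 198 + [1, 1]
print(historical_default_rate(obs))  # 0.01
```

Run across every counterparty and every risk type, this is exactly the kind of embarrassingly parallel, history-hungry workload that the RDBMS-based architectures above struggle with.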

Hadoop to the Rescue –

Since the financial crisis of 2008, open source software offerings have matured immensely, with compelling functionality in terms of scalability and governance. Hadoop, which is really an ecosystem of 30+ projects, has been reimagined twice and now forms the backbone of any enterprise-grade, innovative data management project.

Hadoop Based Target State Architecture of a Risk Data Aggregation Project –

The overall goal is to create a cross-company data lake containing all cross-asset data in one place, as depicted in the below graphic.


1) Data Ingestion: This encompasses creation of the L1 loaders to take in Trade, Loan, Payment and Wire Transfer data. Developing this portion is the first step in realizing the overall architecture, as timely data ingestion is a large part of the problem at most institutions. Part of this process includes understanding a) data ingestion from the highest-priority systems and b) how to apply the correct governance rules to the data. The first step is to understand the range of Book of Record Transaction Systems (lending, payments and transactions) and the feeds they send out. The goal is to create these loaders for specific versions of different systems (e.g. Calypso 9.x), map them to a release of an enterprise-grade open source Big Data platform such as HDP (Hortonworks Data Platform), and maintain them as part of the platform going forward.
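A minimal sketch of what an L1 loader might do, assuming a hypothetical CSV trade feed and an invented field mapping; in reality the mapping is specific to each source system and version (Calypso 9.x vs. others), which is why the loaders must be versioned and maintained.

```python
import csv
import io

# Hypothetical mapping from one source system's columns to lake field names.
FIELD_MAP = {"TradeID": "trade_id", "Ccy": "currency", "Notional": "notional"}

def l1_load(raw_csv, source_system):
    """Parse a raw feed and emit normalized records tagged with lineage."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rec = {lake: row[src] for src, lake in FIELD_MAP.items()}
        rec["notional"] = float(rec["notional"])  # type normalization
        rec["source_system"] = source_system      # retain lineage for audit
        records.append(rec)
    return records

feed = "TradeID,Ccy,Notional\nT1,USD,1000000\n"
print(l1_load(feed, "calypso_9x"))
```

Tagging every record with its source system is what later makes governance reporting and cascading Finance adjustments back to the books of record tractable.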

2) Data Governance: These are the L2 loaders that apply rules to the critical fields for Risk and Compliance. The goal here is to look for gaps in the data and any obvious quality problems involving range-based or table-driven values, in order to facilitate data governance reporting.
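The L2 governance checks can be sketched as a registry of range and table-driven rules over the critical fields; the field names and reference values below are assumptions for illustration.

```python
# Hypothetical governance rules: presence, range and table-driven checks.
RULES = {
    "trade_id": lambda v: bool(v),                      # must be present
    "notional": lambda v: v is not None and v > 0,      # range check
    "currency": lambda v: v in {"USD", "EUR", "CAD"},   # table-driven check
}

def l2_validate(record):
    """Return the list of fields on which the record violates a rule."""
    return [f for f, rule in RULES.items() if not rule(record.get(f))]

print(l2_validate({"trade_id": "T1", "notional": 1e6, "currency": "USD"}))  # []
print(l2_validate({"trade_id": "T1", "notional": -5}))  # ['notional', 'currency']
```

Aggregating these violation lists per feed and per day gives exactly the auditable data-quality reporting that BCBS 239 governance expects.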

3) Entity Identification: This is the establishment and adoption of a lightweight entity ID service. The service will consist of entity assignment and batch reconciliation. The goal here is to get each target bank to propagate the Entity ID back into their booking and payment systems, then transaction data will flow into the lake with this ID attached providing a way to do customer 360.
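One way such an entity ID service might be sketched: assign a stable ID keyed on a normalized counterparty name, so that variant spellings arriving from different booking systems reconcile to the same entity. The normalization and ID scheme here are assumptions, not the bank's actual design.

```python
import hashlib

class EntityIDService:
    """Assigns a stable, deterministic entity ID per normalized name."""

    def __init__(self):
        self.registry = {}

    @staticmethod
    def _normalize(name):
        # Collapse case and whitespace so source-system variants match.
        return " ".join(name.upper().split())

    def assign(self, name):
        key = self._normalize(name)
        if key not in self.registry:
            digest = hashlib.sha1(key.encode()).hexdigest()[:8]
            self.registry[key] = "E-" + digest
        return self.registry[key]

svc = EntityIDService()
a = svc.assign("Acme Corp")
b = svc.assign("  ACME   corp ")
print(a == b)  # True -- both spellings resolve to the same entity ID
```

Batch reconciliation would then sweep the lake for records lacking an ID, assign one, and push it back to the source booking and payment systems.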

4) Developing L3 Loaders: This involves defining the transformation rules required in each risk, finance and compliance area to prepare the data for its specific processing.
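These per-domain transformation rules might be registered and applied along the following lines; the domains and field derivations are purely hypothetical.

```python
# Hypothetical per-domain L3 transformation rules over normalized records.
L3_RULES = {
    # Market risk wants notionals converted to a common currency:
    "market_risk": lambda r: {**r, "notional_usd": r["notional"] * r["fx_rate"]},
    # Compliance wants only a restricted projection of fields:
    "compliance": lambda r: {k: r[k] for k in ("trade_id", "desk")},
}

def l3_transform(record, domain):
    """Prepare one record for a specific risk/finance/compliance area."""
    return L3_RULES[domain](record)

rec = {"trade_id": "T1", "desk": "RATES", "notional": 500.0, "fx_rate": 1.25}
print(l3_transform(rec, "market_risk")["notional_usd"])  # 625.0
```

Keeping the rules as data (a registry keyed by domain) rather than hard-coded pipelines is what lets new attributes be added without the six-month change projects Zerbs describes below.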

5) Analytic Definition: Defining the analytics that are to be used for each risk and compliance area.

6) Report Definition: Defining the reports that are to be issued for each risk and compliance area.

How has ScotiaBank’s experience been?  (Reproduced from the Article)

As at many banks, Zerbs (co-CIO) says BCBS 239 triggered a fundamental rethink at Scotiabank about how to bring data together to meet the requirements of the regulation. The main goal was to develop a solution that would be scalable and repeatable in other aspects of the organization.

What Zerbs says Scotiabank wasn’t interested in was developing a solution that would need to be overhauled over and over again.

“You start a project where you extract the data in exactly the form that was required. Time goes by. By the time you’re done, the business says, ‘By the way, now that I think about it, I need another five data attributes.’ And you’re saying, ‘Well you should have told me earlier, because now it’s a six-month change project and it’s going to cost you more, on the magnitude of hundreds of thousands of dollars,'” says Zerbs. “That’s the kind of vicious circle that repeats itself too often. So we thought: How do we avoid that?”

Apache Hadoop seemed like the obvious choice, according to Zerbs. At the time, though, the bank didn’t have a lot of experience with the big data platform. Zerbs says Scotia decided to use another pressing regulation ─ Volcker Rule compliance ─ as a test case for its first production application on the Hadoop platform.

The Volcker Rule, part of the US Dodd-Frank Act, requires banks to delineate client-related trading from proprietary trading. Regulators look at several metrics to decipher whether something is considered proprietary or client-related. Client metrics are also looked at, including how much inventory needs to be held to satisfy client demands in areas the firm is considered a market-maker.

Because Scotia’s prop-trading activity was already fairly small relative to its overall size, the project was deemed more manageable than jumping in with BCBS 239. And because Volcker had what Zerbs describes as a “fuzzy set of requirements,” the scalability and reusability of a big data solution seemed like the perfect option, and a chance to test the waters in an area in which Scotiabank lacked experience.

The bank had less than six months to go from conceptualization to initial production to meet the first deadline, which passed on July 21. Zerbs says it’s an example of bringing together the customer, risk, and technology focus all at once to create a solution that has multiple benefits to the entire organization.


Adopting fresh, new-age approaches that leverage the best of Big Data can result in –

1. Improved insight and a higher degree of transparency in business operations and capital allocation

2. Better governance procedures and policies that can help track risks down to the individual transaction level & not just at the summary level

3.  Streamlined processes across the enterprise and across different banking domains – investment banking, retail & consumer, private banking, etc.

Indeed, Lines of Business (LOBs) can drive more profitable products and services once they understand their risk exposures better. Capital can be allocated more efficiently and better road-maps created in support of business operations, instead of constant fire-fighting, regulatory heartburn and the concomitant bad press. Another trend now evident is the creation of consortiums in banking to solve the most common challenges – Risk, Compliance and Fraud Detection – which in and of themselves result in no tangible top-line growth.

Let us examine that in a followup post.
