Why Banks, Payment Providers and Insurers Should Digitize Their Risk Management..

“When models turn on, brains turn off.” – Dr. Til Schuermann, formerly Research Officer in the Banking Studies function at the Federal Reserve Bank of New York, currently Partner at Oliver Wyman & Company.

There are two primary reasons for enterprises such as Banks, Insurers, Payment Providers and FinTechs to pursue best-in-class Risk Management processes and platforms. The first is compliance, driven by various regulatory reporting mandates such as the Basel reporting requirements, the FRTB, the Dodd-Frank Act, Solvency II, CCAR and CAT/MiFID II in the United States & the EU. The second is the need to drive top-line sales growth by leveraging Digital technology. This post advocates the application of Digital Technology to Risk Management across both areas.

Image Credit – Digital Enterprise

Recapping the Goals of Regulatory Reform..

There are many kinds of Risk, ranging from the three keystone kinds – Credit, Market and Operational Risk – to those addressed by the Basel II.5/III accords, the FRTB, Dodd-Frank etc. The best enterprises not only manage Risk well but also turn it into a source of competitive advantage. Leading banks have recognized this: according to McKinsey forecasts, while risk-operational processes such as credit administration today account for about half (50 percent) of the Risk function’s staff and analytics for just 15 percent, by 2025 those figures will be around 25 percent and 40 percent respectively. [1]

Whatever the kind of Risk, certain themes are common from a regulatory intent standpoint –

  1. Limiting risks that may cause wider harm to the economy by restricting certain activities – for example, preventing banks with retail operations from engaging in proprietary trading
  2. Requiring that banks increase the amount and quality of capital held in reserve to back their assets, and that they maintain higher liquidity positions
  3. Ensuring that banks put in place appropriate governance standards so that boards and management interact not just internally but also with regulators and their clients
  4. Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post-crisis rules and approaches.
  5. Tackling the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.

Beyond the standard models used for regulatory Risk reporting, Banks & FinTechs are pushing risk modeling into new areas such as retail and SME lending. Since the crisis of 2008, new entrants have begun offering alternatives to traditional financial services in areas such as payments, mortgage loans, cryptocurrency, crowdfunding, alternative lending and investment management. The innovative use of Risk analytics lies at the core of the FinTechs’ success.

Across these areas, risk models are being leveraged in diverse ways – in marketing analytics to gain customers, to defend against competition, and so on. For instance, real-time analytic tools are being used to improve the credit granting process. The intention is to gain increased acceptance by pre-approving qualified customers quickly, without the manual intervention that can cause weeks of delay. Again, according to McKinsey, leading Banks aim to approve up to 90 percent of consumer loans in seconds and to generate efficiencies of 50 percent, leading to revenue increases of 5 to 10 percent. Thus, leading institutions are using Risk Analytics to rethink their business models and to expand their product portfolios. [2]

Over the last two years, this blog has extensively covered areas such as cyber security, fraud detection and anti-money laundering (AML) from a data analytics standpoint. The industry has treated Risk as yet another defensive function, but over the next 10 years the Risk function is expected to become an integral part of all of these areas, driving business revenue growth as well as detecting financial fraud and crime. There is no doubt that Risk is a true cross-cutting concern across a range of business functions & not just the traditional Credit, Market, Liquidity and Operational silos. Risk strategy needs to be a priority at the highest levels of an organization.

The Challenges with Current Industry Risk Architectures..

Almost a year ago, we discussed these technology issues in the blogpost below. To recap – most industry players have a mishmash of organically developed & shrink-wrapped IT systems. These platforms run everything from critical Core Banking Applications to Trade Lifecycle, Securities Settlement and Financial Reporting. Each of these systems operates in an application, workflow and data silo with its own view of the enterprise. They are kept in sync largely via data replication & stovepiped process integration. Siloed risk functions further ensure that different risk reporting applications are developed using duplicative technology paradigms, causing massive IT spend. Finally, the preponderance of complex vendor-supplied systems means lengthy release cycles and complex data center deployment requirements.

The Five Deadly Sins of Financial Services IT..

Industry Risk Architectures Suffer From Five Limitations

A Roadmap for Digitization of Risk Architectures..

The end state – what a Digital Risk function will ultimately look like – will vary for every institution embarking on this journey. That said, there are a few foundational guideposts that can be pointed out, discussed below.

#1 Automate Back & Mid Office Processes Across Risk and Compliance  –

As discussed, many business processes across the front, mid and back office involve risk management. These processes range from risk data aggregation, customer onboarding, loan approvals and regulatory compliance (AML, KYC, CRS & FATCA) to enterprise financial reporting & Cyber Security. It is critical to move any and all manual steps in these business functions to a highly automated model. Doing so will not only reduce operational costs in a big way but also demonstrate substantial auditability to regulatory authorities.

#2 Design Risk Architectures to handle Real time Data Feeds –

A critical component of Digital Risk is the need to incorporate real-time data feeds across Risk applications. While Risk algorithms have traditionally dealt with historical data, new regulations such as the FRTB explicitly call for calculations across various time horizons. This implies that Banks must run a full spectrum of analytics across many buckets on data seeded from real-time interactions. While the focus has been on the overall quality and auditability of data, the real-time requirement is critical as one moves from front office applications such as customer onboarding, loan qualification & pre-approval to key areas such as market, credit and liquidity risk. Why is this critical? We have discussed the need for real-time decision-making insights for business leaders. Understanding risk exposures and performing root cause analysis in real time is a huge business capability for any Digital Enterprise.
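To make the idea concrete, here is a minimal sketch of what such a real-time feed might look like, using Spark Structured Streaming to read trades from a Kafka topic and maintain a rolling per-desk exposure. The topic name, broker address, message schema and the five-minute window are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch: ingest a real-time trade feed and keep a rolling per-desk
# exposure. Topic name, broker, schema and window size are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("realtime-risk-feed").getOrCreate()

trade_schema = (StructType()
    .add("trade_id", StringType())
    .add("desk", StringType())
    .add("notional", DoubleType())
    .add("event_time", TimestampType()))

trades = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "trades")                      # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), trade_schema).alias("t"))
    .select("t.*"))

# Rolling per-desk notional over 5-minute windows - a stand-in for a real
# intraday exposure aggregation.
exposure = (trades
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("desk"))
    .sum("notional"))

query = (exposure.writeStream
    .outputMode("update")
    .format("console")          # in practice this would land in the data lake
    .start())
query.awaitTermination()        # blocks; stop with query.stop()
```

The same streaming pattern extends naturally from exposure monitoring to the onboarding and pre-approval use cases mentioned above.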

#3 Experiment with Advanced Analytics and Machine Learning 

As risk reporting moves to real time, the analytics themselves will begin to get considerably more complex. This technology complexity is only compounded by having multiple teams working across these areas, which calls for standardization of the calculations themselves across the firm. It also implies, from an analytics standpoint, running a large number of scenarios on a large volume of data. For Risk to become a truly digital practice, the innovative uses of Data Science across areas such as customer segmentation, fraud detection and social graph analysis must all make their way into risk management. Insurance companies and Banks are already deploying self-learning algorithms in applications that deal with credit underwriting, employee surveillance and fraud detection, and Wealth Managers are deploying them in automated investment advisory. Thus, machine learning will support critical risk-influenced areas such as Loan Underwriting, Credit Analytics and a Single View of Risk. All of these areas will need to leverage predictive modeling, leading to better business decisions across the board.
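As a purely illustrative example of the kind of self-learning credit model referenced above, the sketch below trains a gradient-boosted classifier on a synthetic dataset. The feature names, the data-generating assumptions and the model choice are mine, not a production underwriting scorecard.

```python
# Illustrative only: a self-learning credit-underwriting model trained on a
# synthetic dataset; features and coefficients are made up for the sketch.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    "utilization": rng.uniform(0, 1, n),        # revolving credit utilization
    "dti": rng.uniform(0, 0.6, n),              # debt-to-income ratio
    "delinquencies": rng.poisson(0.3, n),       # past delinquencies
    "tenure_months": rng.integers(1, 240, n),   # relationship length
})
# Synthetic default flag: higher utilization/DTI/delinquencies -> higher risk
logit = -3 + 2 * df.utilization + 3 * df.dti + 0.8 * df.delinquencies
df["default"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(
    df.drop(columns="default"), df["default"], test_size=0.25, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```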

#4 Technology Led Cross Organization Collaboration –

McKinsey predicts [1] that in the coming five to ten years, different regulatory ratios – capital, funding, leverage, total loss-absorbing capacity etc – will drive the composition of the balance sheet to support profitability. The risk function will therefore work with the finance and strategy functions to help optimize the enterprise balance sheet across various economic scenarios and then provide executives with strategic choices (e.g. increase or shrink a loan portfolio), along with the likely regulatory impacts across those scenarios. Leveraging analytical optimization tools, an improvement in return on equity (ROE) of anywhere between 50 and 400 basis points has been forecast.
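A toy version of that balance-sheet optimization can be expressed as a linear program: choose an asset mix that maximizes expected yield subject to a risk-weighted-asset (capital) budget. The asset classes, yields, risk weights and ratios below are invented purely for illustration.

```python
# Toy balance-sheet optimization: maximize expected yield subject to a
# capital (RWA) budget. All numbers are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

assets       = ["retail_loans", "corporate_loans", "sovereign_bonds", "cash"]
yields       = np.array([0.045, 0.060, 0.020, 0.005])   # expected yields
risk_weights = np.array([0.75, 1.00, 0.00, 0.00])       # Basel-style RWA weights

capital       = 8.0     # available capital (arbitrary units)
capital_ratio = 0.105   # target capital / RWA ratio
balance_sheet = 100.0   # total assets to allocate

# linprog minimizes, so negate the yields to maximize them.
res = linprog(
    c=-yields,
    A_ub=[risk_weights],              # RWA constraint: rw @ x <= capital / ratio
    b_ub=[capital / capital_ratio],
    A_eq=[np.ones(4)],                # fully allocate the balance sheet
    b_eq=[balance_sheet],
    bounds=[(0, None)] * 4,
)
print(dict(zip(assets, res.x.round(1))), "expected return:", round(-res.fun, 2))
```

In practice the risk function would re-run such an optimization under each economic scenario and present the resulting strategic choices to executives.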

The Value Drivers in Digitization of Risk Architectures..

McKinsey contends that the automation of credit processes and the digitization of the key steps in the credit value chain can yield cost savings of up to 50 percent. The benefits of digitizing credit risk go well beyond even these improvements. Digitization can also protect bank revenue, potentially reducing leakage by 5 to 10 percent. [2]

To give an example, by putting real-time credit decision making in place in the front line, banks reduce the risk of losing creditworthy clients to competitors as a result of slow approval processes. Additionally, banks can generate credit leads by integrating into their suite of products new digital offerings from third parties and FinTechs, such as unsecured lending platforms for businesses. Finally, credit risk costs can be further reduced through the integration of new data sources and the application of advanced-analytics techniques. These improvements generate richer insights for better risk decisions and ensure more effective and forward-looking credit risk monitoring. The use of machine-learning techniques, for example, can help banks improve the predictability of credit early-warning systems by up to 25 percent [2].

The Questions to Ask at the Start of Risk Transformation..

There are three questions every Enterprise needs to ask at the outset of this phase –

  • What customer-focused business capabilities can be enabled across the organization by incorporating an understanding of the various kinds of Risk?
  • What aspects of this Risk transformation can be enabled by digital technology? Where are the current organizational and technology gaps that inhibit innovation?
  • How do we measure ROI and business success across these projects before and after the introduction of digital technology? How do we benchmark ourselves, from a granular process standpoint, against the leaders?

Conclusion..

As the above makes clear, traditional legacy-based approaches to risk data management and reporting do not lend themselves well to managing the business effectively. Even when things are going well, it has become very difficult for executives and regulators to get a good handle on how the business is functioning. In the worst of times, the risk function can fail to function well as models do not perform effectively. It is not enough to take an incremental approach to improving current analytics approaches. The need of the hour is to adopt state-of-the-art data management and analytic approaches based on Big Data, Machine Learning and Artificial Intelligence.

References

How Big Data & Advanced Analytics can help Real Estate Investment Trusts (REITS)

                                                         Image Credit – Kiplinger’s

Introduction…

Real Estate Investment Trusts (REITS) are financial companies that own various forms of commercial and residential real estate. These assets include office buildings, retail shopping centers, hospitals, warehouses, timberland and hotels. Real estate is growing quite nicely as a component of the global financial business. Given their focus on real estate investments, REITS have always occupied a specialized position in global finance.

Fundamentally, there are three types of REITS –

  1. Equity REITS which exclusively deal in acquiring, improving and selling properties with the aim of higher returns for their investors
  2. Mortgage REITS only buy and sell mortgages
  3. Hybrid REITS which do both #1 and #2 above

REITS have a reasonably straightforward business model – you take the yields from the properties you own and reinvest the funds to be able to pay your investors (REITS are required to distribute at least 90% of taxable income as dividends). Most of the traditional REIT business processes are well handled by conventional types of technology. However, more and more REITS are being challenged to develop a compelling Big Data strategy that leverages their tremendous data assets.

The Five Key Big Data Applications for REITS… 

Let us consider the five key areas where advanced analytics built on a Big Data foundation can immensely help REITS.

#1 Property Acquisition Modeling 

REITS owners can leverage the rich datasets available around renter demographics, preferences, seasonality and economic conditions in specific markets to better guide capital decisions on acquiring property. This modeling needs to take into account land costs, development costs, fixture costs & any other sales and marketing costs needed to appeal to tenants – call this the macro business perspective. From a micro business perspective, being able to better study individual properties using a variety of widely available data – MLS listings for similar properties, foreclosures, closeness to retail establishments and work sites, building profiles, parking spaces, energy footprint etc – can help them match tenants to their property holdings. All this is critical to getting the investment mix right to meet profitability targets.
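As a hedged illustration of the micro perspective, the sketch below scores candidate properties on a handful of observable features against comparables. The features, numbers and the simple linear model are stand-ins for the far richer datasets and models a REIT would actually use.

```python
# Minimal sketch: estimate achievable rent per sqft for candidate properties
# from comparables. All features and values are synthetic illustrations.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Historical comparables with achieved rent per sqft (invented numbers)
comps = pd.DataFrame({
    "sqft":              [40000, 55000, 70000, 32000],
    "dist_to_transit":   [0.5, 1.5, 2.5, 0.8],     # miles
    "retail_within_1mi": [10, 6, 2, 9],
    "energy_score":      [75, 60, 50, 80],
    "rent_psf":          [38.0, 31.5, 26.0, 36.5],
})
candidates = pd.DataFrame({
    "sqft":              [42000, 65000, 30000],
    "dist_to_transit":   [0.4, 2.1, 0.9],
    "retail_within_1mi": [12, 3, 7],
    "energy_score":      [78, 55, 82],
})

model = LinearRegression().fit(comps.drop(columns="rent_psf"), comps["rent_psf"])
candidates["predicted_rent_psf"] = model.predict(candidates)
print(candidates)
```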


#2 Portfolio Modeling 

REITS can leverage Big Data to perform more granular modeling of their MBS portfolios. For example, they can feed much more data into their existing models, as discussed above – demographic data, macroeconomic factors et al.

A simple scenario: if interest rates go up by X basis points, what does that mean for my portfolio exposure, default rate, cost picture, and the optimal times to buy certain MBS? REITS can then use that information to enter hedges to protect against any downside. Big Data can also help with a range of predictive modeling across all of the above areas, as discussed below. An example is building a 360 degree view of a given investment portfolio.
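The rate-shock scenario above can be sketched with a simple duration/convexity approximation; the holdings and risk measures below are invented, and a real MBS model would also capture prepayment behavior.

```python
# Sketch of a "rates up by X bps" scenario using duration/convexity.
# Holdings and risk measures are illustrative, not real positions.
import pandas as pd

portfolio = pd.DataFrame({
    "security":      ["MBS_A", "MBS_B", "MBS_C"],
    "market_value":  [50e6, 80e6, 30e6],
    "eff_duration":  [4.2, 6.1, 2.8],      # effective duration (years)
    "eff_convexity": [-1.5, -2.0, -0.8],   # negative convexity typical of MBS
})

def shock_pnl(df, bps):
    """Approximate P&L for a parallel rate shock of `bps` basis points."""
    dy = bps / 10_000.0
    pct_change = -df.eff_duration * dy + 0.5 * df.eff_convexity * dy ** 2
    return (df.market_value * pct_change).sum()

for bps in (25, 50, 100):
    print(f"+{bps} bps shock: P&L {shock_pnl(portfolio, bps):,.0f}")
```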


#3 Risk Data Aggregation & Calculations 

The instruments underlying the portfolios themselves carry large amounts of credit & interest rate risk. Big Data is a fantastic platform for aggregating and calculating many kinds of risk exposures, as the linked posts discuss in detail.
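A minimal flavor of such aggregation on a data lake: roll up position-level mark-to-market values into net and gross exposures by counterparty and asset class. The lake path and column names are assumptions about how positions might be stored.

```python
# Minimal risk-data-aggregation sketch over a hypothetical data-lake table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("risk-aggregation").getOrCreate()

positions = spark.read.parquet("/datalake/positions/")   # hypothetical path

exposure = (positions
    .groupBy("counterparty", "asset_class")
    .agg(F.sum("mtm_value").alias("net_exposure"),
         F.sum(F.abs(F.col("mtm_value"))).alias("gross_exposure")))

exposure.orderBy(F.desc("gross_exposure")).show(20)
```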

  


 

#4 Detect and Prevent Money Laundering (AML)

Due to the global nature of investment funds flowing into real estate, REITS are highly exposed to money laundering and sanctions risks. Whether or not REITS operate in high-risk geographies (India, China, South America, Russia etc) or have complex holding structures, they need to file SARs (Suspicious Activity Reports) with FinCEN. There has always been a strong case to be made that shady foreign entities and individuals launder ill-gotten proceeds to buy US real estate. In early 2016, FinCEN began implementing Geographic Targeting Orders (GTOs): title companies based in the United States are now required to clearly identify the real owners of limited liability companies (LLCs), partnerships and other legal entities being used to purchase high-end residential real estate with cash.
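A simple, purely illustrative screen in the spirit of the GTOs might flag all-cash purchases of high-end residential property by opaque legal entities for enhanced due diligence. The thresholds, entity types and field names below are assumptions, not FinCEN's actual criteria.

```python
# Hedged illustration of a GTO-style AML screen; thresholds and fields are
# invented for the sketch, not regulatory criteria.
import pandas as pd

GTO_THRESHOLD = 1_000_000                  # illustrative per-jurisdiction threshold
OPAQUE_ENTITIES = {"LLC", "LP", "TRUST"}   # entity types treated as opaque here

deals = pd.DataFrame({
    "deal_id":      ["D1", "D2", "D3"],
    "buyer_type":   ["LLC", "INDIVIDUAL", "TRUST"],
    "payment_type": ["CASH", "MORTGAGE", "CASH"],
    "price":        [2_500_000, 900_000, 1_200_000],
    "beneficial_owner_disclosed": [False, True, False],
})

flags = deals[
    (deals.payment_type == "CASH")
    & (deals.price >= GTO_THRESHOLD)
    & (deals.buyer_type.isin(OPAQUE_ENTITIES))
    & (~deals.beneficial_owner_disclosed)
]
print(flags)   # candidates for enhanced due diligence / SAR review
```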

AML as a topic is covered exhaustively in an earlier series of blogposts on this blog.


#5 Smart Cities, Net New Investments and Property Management

In the future, REITS would want to invest in Smart Cities, which are positioned to be leading urban centers offering mobility, green technology, personalized medicine, safe services, clean water, traffic management and other forward-looking urban amenities. These Smart Cities target a new kind of client – upwardly mobile, technologically savvy, environmentally conscious millennials. According to RBC Capital Markets, Smart Cities present a massive investment opportunity for REITS. Such investments could offer REITS income yields of around 10-20%. (Source – Ben Forster @ Schroders)

Smart Cities will be created using a number of high end technologies such as IoT, AI, Virtual Reality, Device Meshes etc. By 2020, it is estimated that these buildings will be generating an enormous amount of data that needs to be stored and analyzed by landlords.

As the below graphic from Cisco attests, the ability to work with IoT data to analyze a range of these micro investment opportunities is a Big Data challenge.

The ongoing maintenance and continuous refurbishment of rental properties is a large portion of a REIT’s business operation. The availability of smart sensors and other IoT devices that can track air quality, home appliance malfunction etc can help greatly with preventive maintenance.
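As a small sketch of that preventive-maintenance idea, the code below flags sensor readings that drift well outside a rolling baseline; the sensor name, sampling frequency and three-sigma threshold are illustrative choices.

```python
# Toy preventive-maintenance check: flag IoT sensor readings that deviate
# sharply from a rolling baseline. Sensor data here is simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
readings = pd.DataFrame({
    "timestamp": pd.date_range("2017-01-01", periods=500, freq="H"),
    "hvac_vibration": rng.normal(1.0, 0.05, 500),
})
readings.loc[450:, "hvac_vibration"] += 0.4   # simulated developing fault

roll = readings.hvac_vibration.rolling(48)    # 48-hour baseline window
z = (readings.hvac_vibration - roll.mean()) / roll.std()
alerts = readings[z > 3]                      # 3-sigma deviation from baseline
print(alerts.head())                          # would trigger a maintenance ticket
```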

Conclusion..

As can be seen from the business areas above, most REITS’ data needs require a holistic approach across the value chain (capital sourcing, investment decisions, portfolio management & operations). This approach spans horizontal functions like Customer Segmentation, Property Acquisition, Risk, Finance and Business Operations.
The need of the hour for larger REITS is to move to a common model for data storage, model building and testing. It is becoming increasingly obvious that Big Data can provide massive business opportunities for REITS.

A POV on the FRTB (Fundamental Review of the Trading Book)…

Regulatory Risk Management evolves…

The Basel Committee on Banking Supervision was put in place to ensure the stability of the financial system. The Basel Accords are the frameworks that essentially govern the risk-taking actions of a bank; to that end, they introduce minimum regulatory capital standards that banks must adhere to. The Bank for International Settlements (BIS), established in 1930, is the world’s oldest international financial consortium, with 60+ member central banks representing countries from around the world that together make up about 95% of world GDP. The BIS stewards and maintains the Basel standards in conjunction with member banks.

The goal of the Basel Committee and the Financial Stability Board (FSB) guidelines is to strengthen the regulation, supervision and risk management of the banking sector by improving risk management and governance. These have taken on an increased focus to ensure that a repeat of the 2008 financial crisis does not come to pass again. Basel III (building upon Basel I and Basel II) also sets new criteria for financial transparency and disclosure by banking institutions.

Basel III – the most recent prominent version of the Basel standards, published in 2010 and revised in 2011 (and named for the town of Basel in Switzerland where the committee meets) – prescribes enhanced measures for capital & liquidity adequacy and was developed by the Basel Committee on Banking Supervision with voluntary worldwide applicability. Basel III covers credit, market and operational risks as well as liquidity risks. As is well known, the BCBS 239 guidelines do not just apply to the G-SIBs (Global Systemically Important Banks) but also to the D-SIBs (Domestic Systemically Important Banks). Any important financial institution deemed “too big to fail” needs to work with the regulators to develop a “set of supervisory expectations” that guide risk data aggregation and reporting.

Basel III & other Risk Management topics were covered in these previous posts – http://www.vamsitalkstech.com/?p=191 and http://www.vamsitalkstech.com/?p=667

Enter the FRTB (Fundamental Review of the Trading Book)…

In May 2012, the Basel Committee on Banking Supervision (BCBS) issued a consultative document with the intention of revising the way capital is calculated for the trading book. These guidelines, which can be found here in their final form [1], were repeatedly refined based on comments from various stakeholders & quantitative studies. In January 2016, the final version of the paper was released. These guidelines are now termed the Fundamental Review of the Trading Book (FRTB) or, unofficially, as some industry watchers have termed it, Basel IV.

What is new with the FRTB…

The main changes the BCBS has made with the FRTB are – 

  1. Changed Measure of Market Risk – The FRTB proposes a fundamental change to the measure of market risk. Market Risk will now be calculated and reported via Expected Shortfall (ES) as the new standard measure, as opposed to the venerated (& long-standing) Value at Risk (VaR). Instead of the older VaR with a 99% confidence level, Expected Shortfall with a 97.5% confidence level is proposed. It is to be noted that for normal distributions the two metrics are roughly equivalent; however, ES is far superior at measuring the long tail. This is a recognition that in times of extreme economic stress there is a tendency for multiple asset classes to move in unison. Consequently, under the ES method capital requirements are anticipated to be much higher (a minimal code sketch contrasting the two measures follows this list).
  2. Model Creation & Approval – The FRTB also changes how models are approved & governed. Banks that want to use the IMA (Internal Models Approach) need to pass a set of rigorous tests so that they are not forced to use the Standardized Approach (SA) for capital calculations. The fear is that the SA will increase capital requirements. The old IMA approach has now been revised and made more rigorous in a way that enables supervisors to remove internal modeling permission for individual trading desks. This approach enforces more consistent identification of material risk factors across banks, and constraints on hedging and diversification. All of this is now done at the desk level instead of the entity level. The FRTB moves the responsibility of demonstrating compliant models, significant backtesting & P&L attribution to the desk level.
  3. Boundaries between the Regulatory Books – The FRTB also assigns explicit boundaries between the trading book (the instruments the bank intends to trade) and the banking book (the instruments held to maturity). These rules have been redefined in such a way that banks now have to contend with stringent rules for internal transfers between the two. The regulatory motivation is to eliminate a given bank’s ability to arbitrarily designate individual positions as belonging to either book. Given the different accounting treatment of each, there is a feeling that banks were resorting to capital arbitrage with the goal of minimizing regulatory capital reserves. The FRTB also introduces more stringent reporting and data governance requirements for both books, in conjunction with the well-defined boundary between them. All of these changes should lead to a much better regulatory framework & also a re-evaluation of the structure of trading desks.
  4. Increased Data Sufficiency and Quality – The FRTB regulation also introduces Non-Modellable Risk Factors (NMRF). Risk factors are non-modellable if the availability and sufficiency of the underlying data are an issue. Thus, with the NMRF, Banks now face increased data sufficiency and quality requirements for the data that feeds the model itself. This is a key point, the ramifications of which we will discuss in the next section.
  5. The FRTB also upgrades the standardized approach itself – with a new sensitivities-based approach (SBA) which is more sensitive to various risk factors across different asset classes as compared to the Basel II SA. Regulators now prescribe the sensitivities in the data. Approvals will also be granted at the desk level rather than at the entity level. The revised SA should provide a consistent way to measure risk across geographies and regions, giving regulators a better way to compare and aggregate systemic risk. The sensitivities-based approach should also allow banks to share a common infrastructure between the IMA and the SA. There are a set of buckets and risk factors prescribed by the regulator to which instruments can then be mapped.
  6. Models must be seeded with real and live transaction data – Fresh & current transactions will now need to be entered into the calculation of capital requirements as of the date on which they were conducted. Moreover, though reporting will take place at regular intervals, banks are now expected to manage market risks on a continuous, almost daily basis.
  7. Time Horizons for Calculation – There are also enhanced requirements for data granularity depending on the kind of asset. The FRTB replaces the generic 10-day time horizon for market variables in Basel II with time periods based on the liquidity of these assets. It proposes five different liquidity horizons – 10 days, 20 days, 60 days, 120 days and 250 days.
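The sketch below contrasts the two tail measures on a simulated, heavy-tailed P&L series: for such distributions the 97.5% ES comes out noticeably above the 99% VaR, which is the intuition behind the switch. The P&L series is randomly generated; only the confidence levels are taken from the FRTB text.

```python
# Minimal sketch: historical 99% VaR vs 97.5% Expected Shortfall on a
# simulated, heavy-tailed daily P&L vector (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(0)
pnl = rng.standard_t(df=4, size=10_000) * 1e6   # heavy-tailed daily P&L in USD

def var(pnl, level=0.99):
    """Historical VaR: loss exceeded with probability (1 - level)."""
    return -np.quantile(pnl, 1 - level)

def expected_shortfall(pnl, level=0.975):
    """Average loss in the tail beyond the quantile at the given level."""
    cutoff = np.quantile(pnl, 1 - level)
    return -pnl[pnl <= cutoff].mean()

print(f"VaR 99%  : {var(pnl, 0.99):,.0f}")
print(f"ES 97.5% : {expected_shortfall(pnl, 0.975):,.0f}")
```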


                                 Illustration: FRTB designated horizons for market variables (src – [1])

To Sum Up the FRTB… 

The FRTB rules are now clear and they will have a profound effect on how market risk exposures are calculated. The FRTB clearly calls out which instruments belong in the trading book vs the banking book. The switch from VaR at a 99% confidence level to Expected Shortfall (ES) at 97.5% should lead to increased reserve requirements. Furthermore, the ES calculations will be done keeping in mind the liquidity of the underlying instruments, with a historical simulation approach ranging from 10 days to 250 days of stressed market conditions. Banks that use a pure IMA approach will now have to run the IMA alongside the SA method.

The FRTB compels Banks to create unified teams from various departments – especially Risk, Finance, the Front Office (where the trading desks sit) and Technology – to address the significant challenges of the regulation.

From a technology capabilities standpoint, the FRTB presents banks with a data volume, velocity and analysis challenge. Let us now examine the technology ramifications.

Technology Ramifications around the FRTB… 

The FRTB rules herald a clear shift in how IT architectures work across the Risk area and the Back office in general.

  1. The FRTB calls for a single source of data that pulls data across silos of the front office, trade data repositories, a range of BORT (Book of Record Transaction) systems etc. With the FRTB, source data needs to be centralized and available in one location where every feeding application can trust its quality.
  2. With both the IMA and the SBA in the FRTB, many more detailed & granular data inputs (across desks & departments) need to be fed into the ES (Expected Shortfall) calculations from varying asset classes (Equity, Fixed Income, Forex, Commodities etc) across multiple scenarios. The calculator frameworks developed or enhanced for the FRTB will need ready & easy access to real-time data feeds in addition to historical data. At the firm level, the data requirements and the calculation complexity will be even higher as the calculations need to include the entire position book.

  3. The various time horizons called out also increase the need to run a full spectrum of analytics across many buckets. The analytics themselves will be more complex than before with multiple teams working on all of these areas. This calls out for standardization of the calculations themselves across the firm.

  4. Banks will have to also provide complete audit trails both for the data and the processes that worked on the data to provide these risk exposures. Data lineage, audit and tagging will be critical.

  5. The number of runs required for regulatory risk exposure calculations will go up dramatically under the new regime. The FRTB requires that each risk class be calculated separately from the whole set. Coupled with the increased calculation windows discussed in #3 above, this means more compute processing power and vectorization are required.

  6. The FRTB also implies, from an analytics standpoint, running a large number of scenarios on a large volume of data. Most Banks will need to standardize their analytics libraries across the house. If Banks do not move to a Big Data architecture, they will incur tens of millions of dollars in hardware spend.

The FRTB is the most pressing in a long list of Data Challenges facing Banks… 

The FRTB is yet another regulatory mandate that lays bare the data challenges facing every Bank. Current regulatory risk architectures are based on traditional relational database (RDBMS) architectures with tens of feeds from Core Banking Systems, Loan Data, Book of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc), Wire Data, Payment Data, Transaction Data etc.

These data feeds are then tactically placed in memory caches or in enterprise data warehouses (EDW). Once the data has been extracted, it is transformed using a series of batch jobs which prepare it for the Calculator Frameworks that run the risk models on it.

All of the above applications need access to medium to large amounts of data at the individual transaction level. The Corporate Finance function within the Bank then makes end-of-day adjustments to reconcile all of this data, and these adjustments need to be cascaded back to the source systems, down to the individual transaction or transaction-class level.

These applications are then typically deployed on clusters of bare metal servers that are not particularly suited to portability, automated provisioning, patching & management. In short, nothing that can automatically be moved over at a moment’s notice. These applications also run on legacy proprietary technology platforms that do not lend themselves to flexible, DevOps-style development.

Finally, there is always a need for statistical frameworks to make adjustments to customer transactions that somehow need to be reflected back in the source systems. All of these frameworks need access to, and the ability to work with, terabytes (TBs) of data.

Each of the above-mentioned risk work streams has corresponding data sets, schemas & event flows that it needs to work with, along with different temporal needs for reporting: some need to be run a few times a day (e.g. Traded Credit Risk), some daily (e.g. Market Risk) and some at the end of the week (e.g. Enterprise Credit Risk).

One of the chief areas of concern is that the FRTB may require a complete rewrite of analytics libraries. Under the FRTB, front office libraries will need to do enterprise-level risk – a large number of analytics on a vast amount of data. Front office models cannot make all the assumptions that enterprise risk can to price a portfolio accurately; front office systems run a limited number of scenarios, trading off accuracy for timeliness – as opposed to enterprise risk.

Most banks have stringent model vetting processes in place, and all of the rewritten analytic assets will need to be passed through them. Every aspect of the mathematics of the analytics needs to go through this rigorous process. All of this will add to compliance costs, as the vetting process typically costs multiples of the rewrite itself. The FRTB has put in place stringent model validation standards along with hypothetical portfolios to benchmark against.

The FRTB also requires data lineage and audit capabilities for the data. Banks will need to establish a visual representation of the overall process as data flows from the BORT systems to the reporting applications. All data assets have to be catalogued and a thorough metadata management process instituted.

What Must Bank IT Do… 

Given all of the above data complexity and the need for agile analytical methods – what is the first step that enterprises must take?

There is a need for Banks to build a unified data architecture – one which can serve as a cross organizational repository of all desk level, department level and firm level data.

The Data Lake is an overarching data architecture pattern. Let’s define the term first. A data lake is two things – a data storage repository (small or massive) and a data processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs“. Data Lakes are created to ingest, transform, process, analyze & finally archive large amounts of any kind of data – structured, semi-structured and unstructured.

The Data Lake is not just a data storage layer but one that allows different users (traders, risk managers, compliance officers etc) to plug in calculators that work on data spanning intraday activity as well as data across years. Calculators can then be designed to work on this data across multiple runs to calculate Risk Weighted Assets (RWAs) across multiple calibration windows.

The illustration below depicts the goal: a cross-company data lake containing all asset data, with compute applied to that data.


                              Illustration – Data Lake Architecture for FRTB Calculations

1) Data Ingestion: This encompasses creation of the L1 loaders to take in Trade, Position, Market, Loan, Securities Master, Netting and Wire Transfer data etc across trading desks. Developing the ingestion portion will be the first step in realizing the overall architecture, as timely data ingestion is a large part of the problem at most institutions. Part of this process includes a) ingesting data from the highest priority systems and b) applying the correct governance rules to the data. The goal is to create these loaders for versions of the different source systems (e.g. Calypso 9.x) and to maintain them as part of the platform moving forward. The first step is to understand the range of Book of Record transaction systems (lending, payments and transactions) and the feeds they send out. The goal would then be to map these feeds to loaders on a release of an enterprise-grade open source Big Data platform, e.g. HDP (Hortonworks Data Platform), so they can be maintained going forward.
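A hypothetical L1 loader might look like the sketch below: land a raw book-of-record extract in the lake as partitioned Parquet, stamped with its source system and business date. The paths, the source-system label and the CSV format are placeholders.

```python
# Hypothetical L1 loader: land a raw BORT extract in the data lake as
# partitioned Parquet. Paths, format and source-system name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("l1-trade-loader").getOrCreate()

raw = (spark.read
    .option("header", True)
    .csv("/landing/calypso/trades/2016-06-30/"))     # raw extract drop zone

(raw
    .withColumn("source_system", F.lit("calypso_9x"))
    .withColumn("business_date", F.lit("2016-06-30"))
    .withColumn("load_ts", F.current_timestamp())
    .write
    .mode("append")
    .partitionBy("business_date", "source_system")
    .parquet("/datalake/raw/trades/"))
```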

2) Data Governance: These are the L2 loaders that apply rules to the critical fields for Risk and Compliance. The goal here is to look for gaps in the data and any obvious quality problems involving range checks or table-driven data. The purpose is to facilitate data governance reporting.

3) Entity Identification: This step is the establishment and adoption of a lightweight entity ID service. The service will consist of entity assignment and batch reconciliation.

4) Developing L3 loaders: This phase will involve defining the transformation rules that are required in each risk, finance and compliance area to prep the data for their specific processing.

5) Analytic Definition: Running the analytics that are to be used for FRTB.

6) Report Definition: Defining the reports that are to be issued for each risk and compliance area.

References..

[1] https://www.bis.org/bcbs/publ/d352.pdf

The Five Deadly Sins of Financial Services IT..

THE STATE OF GLOBAL FINANCIAL SERVICES IT ARCHITECTURE…

This blog has time & again discussed how Global, Domestic and Regional banks need to be innovative with their IT platforms to constantly evolve their product offerings & services. This is imperative due to various business realities – increased competition from the FinTechs, web scale players delivering exciting services & sharply increasing regulatory compliance pressures. However, systems and software architecture has been a huge issue at nearly every large bank across the globe.

Regulation is also afoot in parts of the globe that will give non-traditional banks access to hitherto locked customer data – e.g. PSD2 in the European Union. Further, banking licenses have been granted more easily to non-banks that are primarily technology pioneers, such as PayPal, Square etc.

In 2016, Banks are waking up to the fact that IT Architecture is a critical strategic differentiator. Players that have agile & efficient architecture platforms and practices can not only add new service offerings but are also able to experiment across a range of analytics-led offerings that create & support multi-channel products. These digital services and usecases can now be found abundantly in areas ranging from Retail Banking to Capital Markets, and FinTechs have innovated in areas such as Payments & Wealth Management.

So, How did we get here…

The Financial Services IT landscape – no matter which segment one picks across the spectrum: Capital Markets, Retail & Consumer Banking, Payment Networks & Cards, Asset Management etc – is largely predicated on a few legacy technology anti-patterns. These anti-patterns have evolved over the years from a systems architecture, data architecture & middleware standpoint.

These have resulted in a mishmash of organically developed & shrink-wrapped systems that do everything from running critical Core Banking Applications to Trade Lifecycle, Securities Settlement and Financial Reporting. Each of these systems operates in an application, workflow and data silo with its own view of the enterprise. They are all kept in sync largely via data replication & stovepiped process integration.

If this sounds too abstract, let us take an example, & a rather topical one at that. One of the most critical back office functions every financial services organization needs to perform is Risk Data Aggregation & Regulatory Reporting (RDARR). This spans areas from Credit Risk, Market Risk and Operational Risk to Basel III, Solvency II etc. – the list goes on.

The basic idea in any risk calculation is to gather a whole range of quality data in one place and to run computations to generate risk measures for reporting.

So, how are various risk measures calculated currently? 

Current Risk Architectures are based on traditional relational database (RDBMS) architectures with tens of feeds from Core Banking Systems, Loan Data, Book of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc), Wire Data, Payment Data, Transaction Data etc.

These data feeds are then tactically placed in memory caches or in enterprise data warehouses (EDW). Once relevant data has been extracted, it is transformed using a series of batch jobs. These jobs then prepare the data for the Calculator Frameworks that run their risk models across hundreds of scenarios.

All of the above need access to large amounts of data at the individual transaction level. The Corporate Finance function within the Bank then makes end-of-day adjustments to reconcile all of this data, and these adjustments need to be cascaded back to the source systems, down to the individual transaction or transaction-class level.

These applications are then typically deployed on clusters of bare metal servers that are not particularly suited to portability, automated provisioning, patching & management. In short, nothing that can automatically be moved over at a moment’s notice. These applications also run on legacy proprietary technology platforms that do not lend themselves to flexible, DevOps-style development.

Finally, there is always a need for statistical frameworks to make adjustments to customer transactions that somehow need to be reflected back in the source systems. All of these frameworks need access to, and the ability to work with, terabytes (TBs) of data.

Each of the above-mentioned risk work streams has corresponding data sets, schemas & event flows that it needs to work across. They also have different temporal needs for reporting: some need to be run a few times a day (e.g. Traded Credit Risk), some daily (e.g. Market Risk) and some at the end of the week (e.g. Enterprise Credit Risk).


                          Illustration – The Five Deadly Sins of Financial IT Architectures

Let us examine why this is, in the context of the five anti-patterns proposed below –

THE FIVE DEADLY SINS…

The key challenges with current architectures –

  1. Utter, total and complete lack of centralized data, leading to repeated data duplication – In the typical Risk Data Aggregation application, a massive amount of data is duplicated from system to system, leading to multiple inconsistencies at both the summary and transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion and the calculators end up being duplicated as well. A huge mess, any way one looks at it.
  2. Analytic applications which are not designed for throughput – Traditional Risk algorithms cannot scale with this explosion of data as well as the heterogeneity inherent in reporting across multiple kinds of risk. For example, certain kinds of Credit Risk calculations need access to around 200 days of historical data to estimate the probability of a counterparty defaulting and to obtain a statistical measure of the same. These models are highly computationally intensive and can run for days if the data architecture cannot provide efficient compute on massive volumes of data.
  3. Lack of Application Blueprint, Analytic Model & Data Standardization – There is nothing that is either SOA- or microservices-like in most RDA applications, and that precludes best practice development & deployment; it only leads to maintenance headaches. The reason that Cloud Computing based frameworks such as a PaaS (Platform as a Service) are highly elegant is that they enforce standardization of systems and software components across the stack. Areas like Risk Model and Analytic development need to be standardized to reflect realities post BCBS 239 (and the upcoming FRTB). With the Volcker Rule banning prop trading activity on the part of Banks, they must now report on seven key metrics across tens of different data feeds spanning PBs of data. Most existing Risk applications cannot do that without undertaking a large development and change management effort.
  4. Lack of Scalability – It must be possible to operate the platform as a central system that can scale to carry the full load of the organization and operate with hundreds of applications built by disparate teams, all plugged into the same central nervous system. One other factor to consider is the role of cloud computing in customer retention efforts. The analytical computational power required to extract insights from gigantic data sets is costly to maintain on an individual basis. The traditional owned data center will probably not disappear, but banks need to be able to leverage the power of the cloud to perform big data analysis in a cost-effective manner.
  5. A Lack of Deployment Flexibility – The application & data requirements dictate the deployment platforms. This massive anti-pattern leads to silos and legacy OS’s that cannot easily be moved to containers like Docker or instantiated by a modular Cloud OS like OpenStack.

THE BUSINESS VALUE DRIVERS OF EFFICIENT ARCHITECTURES …

Doing IT Architecture right, and in a manner responsive to the business, results in critical value drivers being met & exceeded through this transformation –

  1. Effective compliance with increased regulatory risk mandates ranging from Basel III, the FRTB and Liquidity Risk – which demand flexibility across all the traditional IT tiers
  2. An ability to detect and deter fraud – Anti-Money Laundering (AML), Retail/Payment Card Fraud etc
  3. Fending off competition from the FinTechs
  4. Existing & evolving in a multichannel world dominated by the millennial generation
  5. Reduced costs to satisfy pressure on the Cost to Income Ratio (CIR)
  6. The ability to open up data & services that operate on the customer data to other institutions

A uniform architecture that works across all of these various types would seem a commonsense requirement. However, this is a major problem for most banks. Forward-looking approaches that draw heavily from microservices based application development, Big Data enabled data & processing layers, the adoption of Message Oriented Middleware (MOM) & a cloud native approach to developing (PaaS) and deploying (IaaS) applications are the solution to the vexing problem of inflexible IT.

The question is whether banks can change before they see a perceptible drop in revenues over the years.

Capital Markets Pivots to Big Data in 2016

Previous posts in this blog have discussed how Capital markets firms must create new business models and offer superior client relationships based on their vast data assets. Firms that can infuse a data driven culture in both existing & new areas of operation will enjoy superior returns and raise the bar for the rest of the industry in 2016 & beyond. 

Capital Markets are the face of the financial industry to the general public and contribute a large share of world GDP. Despite all the negative press they have garnered since the financial crisis of 2008, capital markets perform an important social function in that they contribute heavily to economic growth and are the primary vehicle for household savings. Firms in this space allow corporations to raise capital using the underwriting process. However, it is not just corporations that benefit from such money raising activity – municipal, local and national governments do the same as well; the overall mechanism just differs in that while business enterprises issue both equity and bonds, governments typically issue bonds. According to the Boston Consulting Group (BCG), the industry will grow to annual revenues of $661 billion in 2016 from $593 billion in 2015 – a healthy 12% increase. On the buy side, the asset base (AuM – Assets under Management) is expected to reach around $100 trillion by 2020, up from $74 trillion in 2014. [1]

Within large banks, the Capital Markets group and the Investment Banking group perform very different functions. Capital Markets (CM) is the face of the bank to the street from a trading perspective. The CM group engineers custom derivative trades that hedge exposure for its clients (typically Hedge Funds, Mutual Funds, Corporations, Governments, high net worth individuals and Trusts) as well as for its own treasury group. It may also do proprietary trading on the bank’s behalf for a profit – although it is this type of trading that the Volcker Rule seeks to eliminate.

If a Bank uses dark liquidity pools (DLP), it funnels its brokerage trades through the CM group to avoid the fees associated with executing an exchange trade on the street. Such activities can also be used to shield exchange-based trading activity from the Street. In the past, Banks made substantial revenues by profiting from their proprietary trading or by collecting fees for executing trades on behalf of their treasury group or other clients.

Banking, and within it capital markets, continues to generate enormous amounts of data. Producers range from news providers to electronic trading participants to stock exchanges, which are increasingly looking to monetize data. And it is not just the banks – regulatory authorities like FINRA in the US are processing peak volumes of 40-75 billion market events a day (http://www.vamsitalkstech.com/?p=1157) [2]. In addition to data volume, Capital Markets has always posed a variety challenge as well: there are tons of structured data around traditional banking data, market data, reference data & other economic data. Factor in semi-structured data around corporate filings, news, retailer data & other gauges of economic activity, plus newer data from social media, multimedia etc, and firms are presented with significant technology challenges and business opportunities.

Within larger financial supermarkets, the capital markets group typically leads the way in adopting cutting edge technology and in high-tech spend. Most of the compute-intensive problems are generated by either this group or the enterprise risk group. These groups own the exchange-facing order management systems, the trade booking systems, the pricing libraries for the products the bank trades, as well as the tactical systems used to manage their market and credit risks, customer profitability, compliance and collateral. They typically account for about one quarter of a Bank’s total IT budget. Capital Markets thus has the largest number of use cases for risk and compliance.

Players across the value chain – the buy side, the sell side, the intermediaries (stock exchanges & custodians) & technology firms such as market data providers – are all increasingly looking to leverage these new data sets, which can help unlock the value of data for business purposes beyond operational efficiency.

So what are the different categories of applications that are clearly leveraging Big Data in production deployments?


                      Illustration – How are Capital Markets leveraging Big Data In 2016

I have catalogued the major ones below, based on my work with the major players across the spectrum over the last year.

  1. Client Profitability Analysis or Customer 360 view:  With the passing of the Volcker Rule, the large firms are now moving over to a model based on flow based trading rather than relying on prop trading. Thus it is critical for capital market firms to better understand their clients (be they institutional or otherwise) from a 360-degree perspective so they can be marketed to as a single entity across different channels—a key to optimizing profits with cross selling in an increasingly competitive landscape. The 360 view encompasses defensive areas like Risk & Compliance but also the ability to get a single view of profitability by customer across all of their trading desks, the Investment Bank and Commercial Lending.
  2. Regulatory Reporting – Dodd-Frank/Volcker Rule Reporting: Banks have begun to leverage data lakes to capture every trade, intraday and end of day, across its lifecycle. They then validate that no proprietary trading is occurring on the bank’s behalf.
  3. CCAR & DFast Reporting: Big Data can substantially improve the quality of  raw data collected across multiple silos. This improves the understanding of a Bank’s stress test numbers.
  4. Timely and accurate risk management: Running historical VaR, statistical VaR (Value at Risk) or both to run the business and to compare with the enterprise risk VaR numbers.
  5. Timely and accurate liquidity management: Looking at tiered collateral and its liquidity profile on an intraday basis to manage the unit’s liquidity. Desks also need to look at credit and market stress scenarios and at the liquidity impact of those scenarios.
  6. Timely and accurate intraday Credit Risk Management: Understanding when & if a deal breaches a tenor-bucketed limit before it is booked. For FX trading this means there are about 9 milliseconds to determine if the trade can be done. This is a great place to use in-memory technology like Spark/Storm on a Hadoop based platform (a simple pre-deal check sketch follows this list). These usecases are key to increasing the capital that can be invested in the business; to do so, the desks need to convince upper management that they are managing their risks very tightly.
  7. Timely and accurate intraday Market Risk Management: Leveraging Big Data for market risk computations ensures that Banks have a real-time view of any breaches of their tenor-bucketed market limits.
  8. Reducing Market Data costs: Market Data providers like Bloomberg, Thomson Reuters and other smaller agencies typically charge a fee each time data is accessed.  With a large firm, both the front office and Risk access this data on an ad-hoc fairly uncontrolled basis. A popular way to save on cost is to  negotiate the rights to access the data once and read it many times.  The key is that you need a place to put it & that is the Data Lake.
  9. Trade Strategy Development & Backtesting: Big Data is being leveraged to constantly backtest trading strategies and algorithms on large volumes of historical and real-time data. The ability to scale up computations as well as to incorporate real-time streams is key to this capability.
  10. Sentiment Based Trading: Today, large scale trading groups and desks within them have begun monitoring economic, political news and social media data to identify arbitrage opportunities. For instance, looking for correlations between news in the middle east and using that to gauge the price of crude oil in the futures space.  Another example is using weather patterns to gauge demand for electricity in specific regional & local markets with a view to commodities trading. The realtime nature of these sources is information gold. Big Data provides the ability to bring all these sources into one central location and use the gleaned intelligence to drive various downstream activities in trading & private banking.
  11. Market & Trade Surveillance: Surveillance is an umbrella term that usually refers to the monitoring of a wide array of trading practices that serve to distort securities prices, enabling market manipulators to illicitly profit at the expense of other participants by creating information asymmetry. Market surveillance is generally carried out by Exchanges and Self-Regulatory Organizations (SROs) in the US – all of which have dedicated surveillance departments set up for this purpose. However, capital markets players on the buy and sell side also need to conduct extensive trade surveillance to report internally. Pursuant to this goal, the exchanges & the SROs monitor transaction data, including orders and executed trades, & perform deep analysis to look for any kind of abuse and fraud.
  12. Buy Side (e.g. Wealth Management) – A huge list of usecases I have catalogued here – https://dzone.com/articles/the-state-of-global-wealth-management-part-2-big-d 
  13. AML Compliance –  Covered in various blogs and webinars.
    http://www.vamsitalkstech.com/?s=AML
    https://www.boozallen.com/insights/2016/04/webinar-anti-money-laudering – 
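Returning to the intraday credit check in item 6 above, the sketch below shows the shape of a pre-deal limit check against a pre-aggregated, in-memory exposure cache so the decision fits a milliseconds-level budget. The counterparties, tenor buckets and limits are invented; in production the cache would be fed by a streaming layer (Spark/Storm) rather than a Python dictionary.

```python
# Toy pre-deal credit check against an in-memory, tenor-bucketed limit cache.
# All counterparties, buckets and limits are illustrative.
import time

# counterparty -> tenor bucket -> (current_exposure, limit), pre-aggregated upstream
exposure_cache = {
    ("CPTY_A", "0-1Y"): (42_000_000, 50_000_000),
    ("CPTY_A", "1-5Y"): (10_000_000, 25_000_000),
}

def pre_deal_check(counterparty, tenor_bucket, notional):
    """Return True if booking the deal keeps exposure within the limit."""
    current, limit = exposure_cache.get((counterparty, tenor_bucket),
                                        (0, float("inf")))
    return current + notional <= limit

t0 = time.perf_counter()
ok = pre_deal_check("CPTY_A", "0-1Y", 5_000_000)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"approved={ok} in {elapsed_ms:.3f} ms")
```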

The Final Word

A few tactical recommendations to industry CIOs:

  • Firstly, capital markets players should look to create centralized trade repositories for Operations, Traders and Risk Management. This would allow consolidation of systems and a reduction in costs by providing a single platform to replace operations systems, compliance systems and desk-centric risk systems. This would eliminate numerous redundant data & application silos, simplify operations, reduce redundant quant work, and improve the understanding of risk.
  • Secondly, it is important to put in place a model to create sources of funding for discretionary projects that can leverage Big Data.
  • Third, Capital Markets groups typically have to fund their portion of AML, Dodd-Frank, Volcker Rule, Trade Compliance, Enterprise Market Risk and Traded Credit Risk projects. These are all mandatory spends. Only after this do they typically get to tackle discretionary business projects, e.g. funding their liquidity risk, trade booking and tactical risk initiatives. These defensive efforts always get the short end of the stick and should not be neglected while planning out new initiatives.
  • Finally, an area in which a lot of current players are lacking is the ability to associate clients using a Legal Entity Identifier (LEI). Using a Big Data platform to assign logical and physical entity IDs to every human and business the bank interacts with can have salubrious benefits. Big Data can ensure that firms do this without having to redo all of their customer on-boarding systems. This is key to achieving customer 360 views, AML and FATCA compliance as well as accurate credit risk reporting.

It is no longer enough for CIOs in this space to think of tactical Big Data projects; they must think about creating platforms, and ecosystems around those platforms, that enable a variety of pathbreaking activities with a much higher rate of return.


Big Data architectural approaches to Financial Risk Mgmt..

Risk management is not just a defensive business imperative; the best-managed banks use it to deploy their capital where it obtains the best possible business outcomes. The last few posts have set the stage from a business and regulatory perspective. This one takes a deeper dive into the technology.

Existing data architectures are siloed, with bank IT creating or replicating data marts or warehouses to feed internal lines of business. These data marts are then accessed by custom reporting applications, copying data many times over, which leads to massive data management & governance challenges.

Furthermore, the explosion of new types of data in recent years has put tremendous pressure on the financial services datacenter, both technically and financially, and an architectural shift is underway in which multiple LOBs can consolidate their data into a unified data lake.

Banking data architectures and how Hadoop changes the game

Most large banking infrastructures, on a typical day, process millions of derivative trades. The main implication is that there are a large number of data inserts and updates to handle. Once the data is loaded into the infrastructure, complex mathematical calculations need to be run in near real time to calculate intraday positions. Most banks use techniques like Monte Carlo modeling and other computational simulations to build & calculate these exposures. Hitherto, these techniques were extremely expensive in terms of both the hardware and the software needed to run them. Nor were tools & products available that supported a wide variety of data processing paradigms – batch, interactive, realtime and streaming.
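To make that computation concrete, here is a minimal, self-contained sketch of a Monte Carlo exposure profile for a single hypothetical position, written in Python with NumPy. The model (geometric Brownian motion), the parameters and the notional are illustrative assumptions, not a production pricing model.

```python
import numpy as np

# Illustrative only: simulate a potential future exposure profile for one trade
# using geometric Brownian motion. Model, parameters and notional are hypothetical.
np.random.seed(42)

paths, steps, horizon = 3000, 80, 1.0      # market paths, future dates, years
s0, mu, sigma = 100.0, 0.02, 0.25          # spot, drift, volatility
notional = 1_000_000

dt = horizon / steps
log_returns = np.random.normal((mu - 0.5 * sigma**2) * dt,
                               sigma * np.sqrt(dt),
                               size=(paths, steps))
prices = s0 * np.exp(np.cumsum(log_returns, axis=1))

# Mark-to-market of a hypothetical forward-like position, floored at zero
# to obtain the positive exposure used in counterparty risk.
mtm = notional * (prices - s0) / s0
exposure = np.maximum(mtm, 0.0)

# Expected exposure and a 97.5th percentile (PFE-style) profile per future date
expected_exposure = exposure.mean(axis=0)
pfe_97_5 = np.percentile(exposure, 97.5, axis=0)
print(expected_exposure[:5], pfe_97_5[:5])
```

In a real deployment this per-trade calculation would be run across millions of trades and distributed over a cluster, which is exactly where the platforms discussed below come in.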

The Data Lake supports multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set, which is the unified repository of all financial data. It also enables users to transform and view data in multiple ways (across various schemas) and to deploy closed-loop analytics applications that bring time-to-insight closer to real time than ever before.


                                                 Figure 1 – From Data Silos to a Data Lake

Also, with the advent and widespread availability of open source software like Hadoop (by which I mean a full Hadoop platform ecosystem such as the Hortonworks Data Platform (HDP), with its support for multiple computing frameworks like Storm, Spark, Kafka, MapReduce and HBase), which can turn a cluster of commodity x86 servers into a virtual mainframe, cost is no longer a limiting factor. The application ecosystem of a financial institution can now be the deciding factor in how data is created, ingested, transformed and exposed to consuming applications.

Thus clusters of inexpensive x86 servers running Linux and Hortonworks Data Platform (HDP) provide an extremely cost-effective environment for deploying and running simulations and stress tests.


                                                 Figure 2 – Hadoop now supports multiple processing engines

Finally, an HDP cluster with tools like Hadoop, Storm, and Spark is not limited to one purpose, like older dedicated-computing platforms. The same cluster you use for running stress tests can also be used for text mining, predictive analytics, compliance, fraud detection, customer sentiment analysis, and many other purposes. This is a key point: once siloed data is brought into a data lake, it is available for running multiple business scenarios – limited only by the overall business scope.

Now, typical Risk Management calculations require that for each time point, and for each product line, separate simulations are run to derive higher-order results. Once this is done, the resulting intermediate data needs to be aligned to collateral valuations, derivative settlement agreements and any other relevant regulatory data to arrive at a final portfolio position. Further, there needs to be a mechanism to pull in reference data for a given set of clients and/or portfolios.
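As an illustration of that alignment step, the following PySpark sketch joins hypothetical simulation output with collateral and trade reference data to roll up a netted portfolio position. The HDFS paths, schemas and column names are assumptions made purely for the example.

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative sketch: align per-trade simulation output with collateral and
# reference data to derive a portfolio-level position. Paths/columns are hypothetical.
spark = SparkSession.builder.appName("risk-aggregation-sketch").getOrCreate()

simulated = spark.read.parquet("/data/risk/simulated_exposures")    # trade_id, time_point, exposure
collateral = spark.read.parquet("/data/ref/collateral_valuations")  # counterparty_id, collateral_value
trades = spark.read.parquet("/data/ref/trade_master")               # trade_id, counterparty_id, product_line

portfolio_position = (
    simulated
    .join(trades, "trade_id")
    .join(collateral, "counterparty_id", "left")
    .groupBy("counterparty_id", "product_line", "time_point")
    .agg(F.sum("exposure").alias("gross_exposure"),
         F.first("collateral_value").alias("collateral_value"))
    # Net the exposure against posted collateral, floored at zero.
    .withColumn("net_exposure",
                F.greatest(F.col("gross_exposure") - F.coalesce(F.col("collateral_value"), F.lit(0.0)),
                           F.lit(0.0)))
)

portfolio_position.write.mode("overwrite").parquet("/data/risk/portfolio_positions")
```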

The following are the broad architectural goals for any such implementation –

* Provide a centralized location for aggregating, at a house-wide level, and subsequently analyzing market data, counterparties, liabilities and exposures

* Support the execution of liquidity analysis on an intraday or multi-day basis while providing long-term data retention capabilities

* Provide strong but optional capabilities for layering in business workflow and rule-based decisioning as an outcome of analysis

At the same time, long-term positions need to be calculated for stress tests, typically using at least 12 months of data pertaining to a given product set. Finally, the two streams of data may be compared to produce a CVA (Credit Valuation Adjustment) value.

The average investment bank deals with potentially 50 to 80 future dates and up to 3,000 different market paths, so the computational resource demands are huge. Reports are produced daily, and under special conditions multiple times per day. What-if scenarios with strawman portfolios can also be run to assess regulatory impacts and to evaluate business options.
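To put those numbers in perspective, a quick back-of-the-envelope calculation (the portfolio size below is an assumed figure) shows why the resource demands are so large:

```python
# Back-of-the-envelope only: the portfolio size is an assumed figure.
instruments = 10_000          # hypothetical number of positions
future_dates = 80             # as noted above
market_paths = 3_000          # as noted above

valuations = instruments * future_dates * market_paths
print(f"{valuations:,} instrument valuations per full run")  # 2,400,000,000
```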



                                                Figure 3 – Overall Risk Mgmt Workflow

 

As can be seen from the above, computing arbitrary functions on a large and growing master dataset in real time is a daunting problem (to quote Nathan Marz). There is no single product or technology approach that satisfies all business requirements. Instead, one has to use a variety of tools and techniques to build a complete Big Data system. I present two approaches, both of which have been tried and tested in enterprise architectures.

 

Solution Patterns 

Pattern 1 – Integrate a Big Data Platform with an In-Memory Data Grid

Two distinct data tiers can be identified based on the business requirements above –

  • It is clear from the above that data needs to be pulled in near real time and accessed in a low-latency pattern, with calculations performed on that data. The design principle here is “Write Many and Read Many”, with the ability to scale out tiers of servers. In-memory data grids (IMDGs) are very suitable for this use case as they support a very high write rate. IMDGs like GemFire & JBoss Data Grid (JDG) are highly scalable and proven implementations of distributed data grids that give users the ability to store, access, modify and transfer extremely large amounts of distributed data. Further, these products offer a universal namespace for applications to pull in data from different sources for all of the above functionality. A key advantage is that data grids can pool memory and scale out horizontally across a cluster of servers. Further, computation can be pushed into the tiers of servers running the data grid, as opposed to pulling data into the computation tier.
    To meet the needs for scalability, fast access and user collaboration, data grids support replication of datasets to points within the distributed data architecture. The use of replicas gives multiple users faster access to datasets and preserves bandwidth, since replicas can often be placed strategically close to, or within, the sites where users need them. IMDGs support WAN replication, clustering and out-of-the-box replication, as well as multiple language clients.
  • The second data access pattern that needs to be supported is storage for data ranging from the next day to months or years – typically large-scale historical data. The primary data access principle here is “Write Once, Read Many”. This layer contains the immutable, constantly growing master dataset stored on a distributed file system like HDFS. The HDFS implementation in HDP 2.x offers all the benefits of a distributed filesystem while eliminating the SPOF (single point of failure) issue with the NameNode in an HDFS cluster. With batch processing (MapReduce), arbitrary views – so-called batch views – are computed from this raw dataset, so Hadoop (MapReduce on YARN) is a perfect fit for the concept of the batch layer (a minimal sketch of such a batch view follows this list). Besides being a storage mechanism, the data stored in HDFS is formatted in a manner suitable for consumption by any tool within the Apache Hadoop ecosystem, like Hive, Pig or Mahout.
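Here is a minimal sketch of the batch-view idea for this second tier. It is expressed with PySpark rather than raw MapReduce purely for brevity; the HDFS paths and the schema of the master dataset are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

# Sketch of the "Write Once, Read Many" batch layer: recompute a batch view from
# the immutable master dataset on HDFS. The same view could equally be built with
# MapReduce, Hive or Pig. Paths and schema below are hypothetical.
spark = SparkSession.builder.appName("batch-view-sketch").getOrCreate()

master = spark.read.parquet("hdfs:///data/master/trades")   # immutable, append-only master dataset

daily_exposure_view = (
    master
    .groupBy("business_date", "counterparty_id")
    .agg(F.sum("mark_to_market").alias("total_mtm"),
         F.count("trade_id").alias("trade_count"))
)

# Batch views are recomputed from scratch and overwritten, never edited in place.
daily_exposure_view.write.mode("overwrite").parquet("hdfs:///views/daily_exposure")
```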


                                      Figure 4 – System Architecture

The overall system workflow is as below –

  1. Data is injected into the architecture in either an event-based or a batch-based manner; a minimal ingestion sketch follows below. HDP supports multiple ways of achieving this. One could use a high-performance ingest layer like Kafka, an ESB like Mule for the batch updates, or directly insert data into the IMDG via a Storm layer. For financial data stored in RDBMSs, one can write a simple CacheLoader to prime the grid. Each of these approaches offers advantages to the business. For instance, using CEP one can derive realtime insights via predefined business rules and optionally spin up new workflows based on those rules. Once the data is inserted, the grid can automatically distribute it via consistent hashing. Once the data is all there, fast incremental algorithms are run in memory and the resulting data can be stored in an RDBMS for querying by analytics/visualisation applications.

Such intermediate data, or data suitable for modeling or simulation, can also be streamed into the long-term storage layer.

Data is loaded into different partitions in the HDFS layer in two different ways – a) from the data sources themselves directly; b) from the JDG layer via a connector.
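As a small illustration of the event-based ingestion path mentioned in step 1, the sketch below publishes a hypothetical trade event onto a Kafka topic using the kafka-python client; the broker address, topic name and message fields are all assumptions.

```python
import json
from kafka import KafkaProducer  # kafka-python client

# Illustrative event-based ingestion: publish a trade event onto a Kafka topic
# from which Storm (or another consumer) can populate the grid and HDFS.
# Broker address, topic name and fields are hypothetical.
producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

trade_event = {
    "trade_id": "T-123456",
    "counterparty_id": "CP-0042",
    "notional": 25_000_000,
    "currency": "USD",
    "event_type": "NEW",
}

producer.send("trade-events", value=trade_event)
producer.flush()
```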

Pattern 2 – Utilize the complete feature set present in a Big Data Platform like Hortonworks HDP 2.3
** This integration was demonstrated at Red Hat Summit by Hortonworks, Mammoth Data and Red Hat, and is well captured at *http://mammothdata.com/big-data-open-source-risk-managment/*
The headline is self-explanatory, but let’s briefly examine how you might perform a simple Monte Carlo calculation using Apache Spark. Spark is the ideal choice here due to the iterative nature of these calculations as well as the natural performance gains of running them in memory. Spark enables major performance gains – applications in Hadoop clusters running Spark tend to run up to 100 times faster in memory and up to 10 times faster on disk.

Apache Spark provides a comprehensive, unified framework to manage big data processing requirements with a variety of data sets that are diverse in nature (text data, graph data etc) as well as the source of data (batch v. real-time streaming data).

A major advantage of using Spark is that it allows programmers to develop complex, multi-step data pipelines using the directed acyclic graph (DAG) pattern while supporting in-memory data sharing across DAGs, so that data can be shared across jobs.
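A tiny PySpark example of that in-memory sharing: an intermediate dataset is cached once and then reused by two separate downstream jobs without re-reading it from disk. The path, filter and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal illustration: cache an intermediate dataset once, then reuse it across
# several downstream actions without re-reading from disk. Path/columns are hypothetical.
spark = SparkSession.builder.appName("dag-caching-sketch").getOrCreate()

positions = spark.read.parquet("hdfs:///data/positions").filter("business_date = '2016-06-30'")
positions.cache()                                          # keep the working set in memory

by_desk = positions.groupBy("desk").sum("mark_to_market")  # job 1 reuses the cached data
by_book = positions.groupBy("book").count()                # job 2 reuses the cached data

by_desk.show()
by_book.show()
```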

One important metric used in financial modeling is LVaR – Liquidity-Adjusted Value at Risk. As we discussed in earlier posts, an important form of risk is Liquidity Risk, and LVaR is one important metric used to represent it. “Value at Risk” or VaR is simply the threshold loss that a given portfolio will exceed only with a given (small) probability over a given period of time.

For mathematical details of the calculation, please see Extreme Value Methods with Applications to Finance by S.Y. Novak.

Now, liquidity risk is divided into two types: funding liquidity risk (can we make the payments on this position or liability?) and market liquidity risk (can we exit this position if the market suddenly turns illiquid?).

Incorporating external liquidity risk into a VaR calculation results in LVaR. This essentially means adjusting the time period used in the VaR calculation based on the expected length of time required to unwind the position.
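As a minimal numerical sketch of that adjustment – assuming a historical-simulation VaR and the common square-root-of-time scaling of the horizon to the expected unwind period, with a synthetic P&L series:

```python
import numpy as np

# Synthetic example: historical-simulation VaR, then a liquidity-adjusted VaR
# obtained by stretching the horizon to the expected unwind period using the
# square-root-of-time rule. All figures are illustrative.
np.random.seed(7)
daily_pnl = np.random.normal(0.0, 250_000, size=750)          # ~3 years of synthetic daily P&L

confidence = 0.99
var_1d = -np.percentile(daily_pnl, (1 - confidence) * 100)    # 1-day 99% VaR (positive loss figure)

unwind_days = 10                                              # expected days to exit the position
lvar = var_1d * np.sqrt(unwind_days)                          # liquidity-adjusted VaR

print(f"1-day 99% VaR  ~ {var_1d:,.0f}")
print(f"LVaR over a {unwind_days}-day unwind ~ {lvar:,.0f}")
```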

Given that we have a need to calculate LVaR for a portfolio, we can accomplish this in a distributed fashion using Spark by doing the following:

  1. Implementing the low-level LVaR calculation in Java, Scala, or Python. It is straightforward to work with code written in any of these three languages in Spark, which ships with a built-in set of over 80 high-level operators.
  2. Data Ingestion – all kinds of financial data – position data, market data, existing risk data, General Ledger, etc. – are batched in, i.e. read from flat files stored in HDFS, or the initial values can be read from a relational database or other persistent store via Sqoop.
  3. Spark code written in Scala, Java or Python can leverage the database support provided by those languages. Once the data is read in, it resides in what Spark calls an RDD – a Resilient Distributed Dataset. A convenient representation of the input data, which leverages Spark’s fundamental processing model, would include in each input record the portfolio item details along with the input range and probability distribution information needed for the Monte Carlo simulation.
  4. If you have streaming data requirements, you can optionally leverage Kafka integration with Apache Storm to read one value at a time and persist the data into an HBase cluster. In a modern data architecture built on Apache Hadoop, Kafka (a fast, scalable and durable message broker) works in combination with Storm, HBase and Spark for real-time analysis and rendering of streaming data. Kafka has been used to message everything from geospatial data from a fleet of long-haul trucks, to financial data, to sensor data from HVAC systems in office buildings.
  5. The next step is to perform a transformation on each input record (representing one portfolio item) which runs the Monte Carlo simulation for that item. The distributed nature of Spark will result in each simulation running in a unique worker process somewhere on one node in the overall cluster.
  6. After each individual simulation has run, another transformation is run over the RDD to perform any aggregate calculations, such as summing the portfolio threshold risk across all instruments in the portfolio at each given probability threshold.
  7. Output data elements can be written out to HDFS, or stored to a database like Oracle, HBase, or Postgres. From here, reports and visualizations can easily be constructed.
  8. Optionally, workflow engines can be layered in to present the right data to the right business user at the right time. A minimal PySpark sketch of steps 1 through 6 follows this list.
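Below is that minimal end-to-end sketch. The portfolio records, the per-item simulation model and the aggregation are simplified, illustrative assumptions rather than a production LVaR engine.

```python
import numpy as np
from pyspark.sql import SparkSession

# Minimal sketch of steps 1-6: distribute portfolio items as an RDD, run a
# per-item Monte Carlo simulation, then aggregate. Portfolio data, the pricing
# model and the confidence level are all hypothetical.
spark = SparkSession.builder.appName("lvar-monte-carlo-sketch").getOrCreate()
sc = spark.sparkContext

# Steps 2/3: in practice this would be read from HDFS or via Sqoop; here it is inlined.
portfolio = [
    {"item": "bond_A",  "value": 5_000_000, "vol": 0.08, "unwind_days": 5},
    {"item": "swap_B",  "value": 2_500_000, "vol": 0.15, "unwind_days": 10},
    {"item": "stock_C", "value": 1_000_000, "vol": 0.30, "unwind_days": 2},
]
rdd = sc.parallelize(portfolio)

def simulate_lvar(item, paths=3000, confidence=0.99):
    """Step 5: per-item Monte Carlo, scaled to the item's unwind horizon."""
    rng = np.random.default_rng(hash(item["item"]) % (2**32))
    horizon = item["unwind_days"] / 252.0
    pnl = item["value"] * rng.normal(0.0, item["vol"] * np.sqrt(horizon), size=paths)
    var = -np.percentile(pnl, (1 - confidence) * 100)
    return (item["item"], float(var))

# Step 6: aggregate the per-item figures (a simple sum is used purely for
# illustration; it ignores diversification between items).
per_item = rdd.map(simulate_lvar)
portfolio_lvar = per_item.values().sum()

print(per_item.collect())
print(f"Illustrative portfolio LVaR ~ {portfolio_lvar:,.0f}")

# Step 7 would write per_item / portfolio_lvar out to HDFS, HBase or an RDBMS
# instead of printing them.
```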

Whether you choose one solution pattern over the other, or mix both of them, depends on your business requirements and other characteristics, including –

  • The existing data architecture and the formats of the data (structured, semi structured or unstructured) stored in those systems
  • The governance process around the data
  • The speed at which the data flows into the application and the velocity at which insights need to be gleaned
  • The data consumers who need to access the final risk data whether they use a BI tool or a web portal etc
  • The frequency of processing of this data to produce risk reports, i.e. hourly, near real time (dare I say?), ad hoc or intraday