A POV on Bank Stress Testing – CCAR & DFAST..

“The recession of 2007 to 2009 was still the most painful since the Depression. At its depths, $15 trillion in household wealth had disappeared, ravaging the pensions and college funds of Americans who had thought their money was in good hands. Nearly 9 million workers lost jobs; 9 million people slipped below the poverty line; 5 million homeowners lost homes.”
― Timothy F. Geithner, Former Secretary of the US Treasury – “Reflections on Financial crises – 2014”

A Quick Introduction to Macroeconomic Stress Testing..

The concept of stress testing in banking is not entirely new. It has been practiced for years in global banks, across specific business functions that deal with risk. The goal of these internal tests has been to assess firm-wide capital adequacy in periods of economic stress. However, the 2008 financial crisis clearly exposed how unprepared the Bank Holding Companies (BHCs) were for the systemic risk brought on by severe macroeconomic distress. The current raft of regulator-driven stress tests is thus motivated by the taxpayer-funded bailouts of 2008. Back then, banks were neither adequately capitalized to cope with stressed economic conditions, nor were their market and credit risk losses across portfolios sustainable.

In 2009, SCAP (the Supervisory Capital Assessment Program) was enacted as a stress testing framework in the US that only 19 leading financial institutions (banks, insurers etc.) had to adhere to. The exercise focused not only on the quantity of capital available but also on its quality – Tier 1 common capital. The emphasis on Tier 1 common capital is important, as it provides an institution with a higher loss absorption capacity while minimizing losses to higher capital tiers. Tier 1 common capital can also be managed better during economic stress by adjusting dividends, share buybacks and related activities.

Though it was a one-off, the SCAP was a stringent and rigorous test. The Fed performed SCAP audits on the results of all 19 BHCs – some of whom failed the test.

Following this, in 2010 the Dodd-Frank Act was enacted by the Obama Administration. The Dodd-Frank Act also introduced its own stress test – DFAST (Dodd-Frank Act Stress Testing). DFAST requires BHCs with assets of $10 billion & above to run annual stress tests and to make the results public. The goals of these stress tests are multifold, but they are conducted primarily to assure the public and the regulators that BHCs have adequately capitalized their portfolios. BHCs are required to present detailed capital plans to the Fed.

The SCAP’s successor, CCAR (Comprehensive Capital Analysis and Review), was also enacted around that time. Depending on the overall risk profile of the institution, CCAR mandates several qualitative & quantitative metrics that BHCs need to report on and make public for several stressed macroeconomic scenarios.


Comprehensive Capital Analysis and Review (CCAR) is a regulatory framework introduced by the Federal Reserve in order to assess, regulate, and supervise large banks and financial institutions – collectively referred to in the framework as Bank Holding Companies (BHCs).
– (Wikipedia)

  • Every year, an increasing number of Tier 2 banks come under the CCAR mandate. CCAR requires specific BHCs to develop a set of internal macroeconomic scenarios or use those developed by the regulators. Regulators then receive the individual results of these scenario runs from firms across a nine-quarter time horizon. Regulators also develop their own systemic stress tests to verify whether a given BHC can withstand negative economic scenarios and continue to operate its lending operations. CCAR coverage primarily includes retail banking operations, auto & home lending, trading, counterparty credit risk, AFS (Available For Sale)/HTM (Hold To Maturity) securities etc. CCAR covers all major kinds of risk – market, credit, liquidity and operational risk.
CCAR kicked off moves by regulators across the globe to enforce similar requirements on banks in their respective jurisdictions. In Europe, the EBA conducts EU-wide stress testing, while in the UK the Prudential Regulation Authority requires its own stress tests. Emerging markets such as India and China are also following this trend.

Similarities & Differences between CCAR and DFAST..

To restate – CCAR is an annual exercise by the Federal Reserve to assess whether the largest bank holding companies operating in the United States have sufficient capital to continue operations throughout times of economic and financial stress, and whether they have robust, forward-looking capital-planning processes that account for their unique risks. As part of this exercise, the Federal Reserve evaluates institutions’ capital adequacy, internal capital adequacy assessment processes, and their individual plans to make capital distributions, such as dividend payments or stock repurchases. Dodd-Frank Act stress testing (DFAST) – an exercise similar to CCAR – is a forward-looking stress test for smaller financial institutions as well, supervised by the Federal Reserve to help assess whether institutions have sufficient capital to absorb losses and support operations during adverse economic conditions.

As part of the CCAR reporting guidelines, BHCs have to explicitly call out

  1. their sources of capital given their risk profile & breadth of operations,
  2. the internal policies & controls for measuring capital adequacy &
  3. any upcoming business decisions (share buybacks, dividends etc) that may impact their capital adequacy plans.

While CCAR and DFAST look very similar at a high level – both mandate that banks conduct stress tests – they differ in the details. DFAST is applicable to banks with assets between $10 billion and $50 billion. During the planning horizon phase, CCAR allows BHCs to use their own capital action assessments, while DFAST enforces a standardized set of capital actions. The DFAST scenarios represent baseline, adverse and severely adverse conditions. DFAST is supervised by the Fed, the OCC (Office of the Comptroller of the Currency) and the FDIC.

                                                Summary of DFAST and CCAR (Source: E&Y) 

As can be seen from the above table, while DFAST is complementary to CCAR, both efforts are distinct testing exercises that rely on similar processes, data, supervisory exercises, and requirements. The Federal Reserve coordinates these processes to reduce duplicative requirements and to minimize regulatory burden. CCAR results are reported twice a year, and BHCs are also required to incorporate Basel III capital ratios in their reports, with Tier 1 capital ratios calculated using existing rules. DFAST is reported annually and also includes Basel III reporting.

In a Nutshell…

In CCAR (and DFAST), the Fed is essentially asking the BHCs the following questions –

(1) For your defined risk profile, please define a process for understanding your risks and map the key stakeholders who will carry out this process.

(2) Please ensure that you use clean internal data to compute your exposures in the event of economic stress. The entire process of data sourcing, cleaning, computation, analytics & reporting needs to be auditable.

(3) What macroeconomic stress scenarios did you develop in working with your key lines of business? What are the key historical assumptions behind them? What are the key what-if scenarios that you have developed based on the stressed scenarios? The scenarios need to be auditable as well.

(4) We are then going to run our own macroeconomic numbers & our own scenarios, using our own exposure generators, on your raw data.

(5) We want to see how close both sets of numbers are.

Both CCAR and DFAST scenarios are expressed in stressed macroeconomic factors and financial indicators. The regulators typically provide these figures on a quarterly basis a few reporting periods in advance.

What are some examples of these scenarios?
  • Measures of Index Turbulence – e.g. in a certain quarter, regulators might stipulate that the S&P 500 falls 30%, along with decreases in key indices such as home prices, commercial property & other asset prices.
  • Measures of Economic Activity – e.g. a spike in the US unemployment rate, higher interest rates, increased inflation. What if unemployment ran to 14%? What does that do to a mortgage portfolio – default rates increase, and the bank has to show what that looks like.
  • Measures of Interest Rate Turbulence – e.g. US treasury yields, interest rates on US mortgages etc.

Based on this information, banks then assess the impact of these economic scenarios, as reflected in market and credit losses to their portfolios. This helps them estimate how their capital base would behave in such a situation. These internal CCAR metrics are then sent over to the regulators. Every bank has its own models based on its own understanding, which the Fed also needs to review for completeness and quality.
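
To make this concrete, here is a minimal Python sketch of how a stressed unemployment scenario might be translated into expected credit losses on a mortgage portfolio. The PD sensitivity, LGD and exposure figures are purely hypothetical illustrations, not a supervisory model.

```python
# Hypothetical sketch: translate a stressed macro scenario into expected credit
# losses on a mortgage portfolio. All parameters below are illustrative only.

def stressed_expected_loss(exposures, base_pds, lgd,
                           unemployment_base, unemployment_stressed,
                           pd_beta=0.3):
    """Scale each loan's probability of default (PD) with the rise in
    unemployment (in percentage points), then sum expected loss = PD * LGD * EAD."""
    shock_pp = (unemployment_stressed - unemployment_base) * 100
    total_el = 0.0
    for ead, pd in zip(exposures, base_pds):
        stressed_pd = min(1.0, pd * (1.0 + pd_beta * shock_pp))  # crude linear sensitivity
        total_el += stressed_pd * lgd * ead
    return total_el

# Toy three-loan portfolio under a 5% -> 14% unemployment scenario
exposures = [250_000, 400_000, 150_000]   # exposure at default (EAD), in dollars
base_pds  = [0.02, 0.05, 0.03]            # baseline one-year PDs
print(stressed_expected_loss(exposures, base_pds, lgd=0.4,
                             unemployment_base=0.05, unemployment_stressed=0.14))
```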

The Fed uses the CCAR and DFAST results to evaluate capital adequacy and the quality of the capital adequacy assessment process, and then evaluates the BHCs’ plans to make capital distributions (dividends, share repurchases etc.) in the context of those results. The BHCs’ boards of directors are required to approve and sign off on these plans.

What do CCAR & DFAST require of Banks?

Well, six important things –

    1. CCAR is fundamentally very different from other umbrella risk types in that it has a strong external component in terms of reporting internal bank data to the regulatory authorities. CCAR reporting is done by sending internal bank Book of Record Transaction (BORT) data from lending systems (with hundreds of manual adjustments) to the regulators, who run their own models on it to assess capital adequacy. Currently, most banks do some model reporting internally, based on canned CCAR algorithms in tools like SAS or Spark, computed for a few macroeconomic stress scenarios.
    2. Both CCAR and DFAST stress the same business processes, data resources and governance mechanisms. They are both a significant ask on the BHCs from the standpoint of planning, execution and governance. BHCs have found them daunting, and the new D-SIBs that enter the mandate are faced with implementing programs that need significant organizational and IT spend.
    3. Both CCAR and DFAST challenge the banks on data collection, quality, lineage and reporting. The Fed requires that data be accurate, comprehensive and clean. Data quality is the single biggest challenge to stress test compliance. Banks need to work across a range of BORT (Book of Record Transaction) systems – core banking, lending portfolios, position data and any other data needed to accurately reflect the business. There is also a reconciliation process that is typically used to reconcile risk data with the GL (General Ledger). For instance, a BHC’s lending portfolio may be $4 billion based on the raw summary data, but come in at around $3 billion once reconciliation and adjustments are performed. If the regulator runs the aforesaid macroeconomic scenarios at $4 billion, the exposures are naturally off (a simple reconciliation sketch follows this list).
    4. Contrary to popular perception, the heavy lifting is typically not in creating and running the exposure calculations for stress testing. The creation of these is relatively straightforward. Banks have historically had their own analytics groups produce these macroeconomic models, and they already have tens of libraries in place that can be modified to create the supervisory scenarios for CCAR/DFAST – baseline, adverse & severely adverse. The critical difference with stress testing is that siloed models and scenarios need to be unified along with the data.
    5. Model development in banks usually follows a well defined lifecycle. Most liquidity assessment groups within banks currently have a good base of quants with a clean separation of job duties. For instance, while one group produces scenarios, others work on the exposures that feed into the liquidity engines. The teams running these liquidity assessments are good candidates to run the CCAR/DFAST models as well. The calculators themselves will need to be rewritten for Big Data using something like SAS or Spark.
    6. Transparency must be demonstrated down to the source data level. And banks need to be able to document all capital classification and computation rules to a sufficient degree to meet regulatory requirements during the auditing and review process.
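
As referenced in point 3 above, here is a simple sketch of a risk-data-to-GL reconciliation check. The field names and the 1% tolerance are assumptions made for illustration; actual reconciliation processes are far more elaborate.

```python
# Hypothetical sketch: reconcile aggregated risk-system balances against the
# General Ledger (GL) before scenarios are run. Field names and the 1%
# tolerance are illustrative assumptions, not a regulatory standard.
from collections import defaultdict

def reconcile(risk_records, gl_balances, tolerance=0.01):
    """risk_records: iterable of (portfolio_id, amount) from BORT systems.
       gl_balances:  dict of portfolio_id -> GL balance after adjustments.
       Returns a list of (portfolio_id, risk_total, gl_total, pct_break)."""
    totals = defaultdict(float)
    for portfolio_id, amount in risk_records:
        totals[portfolio_id] += amount

    breaks = []
    for portfolio_id, gl_total in gl_balances.items():
        risk_total = totals.get(portfolio_id, 0.0)
        pct_break = abs(risk_total - gl_total) / gl_total if gl_total else float("inf")
        if pct_break > tolerance:
            breaks.append((portfolio_id, risk_total, gl_total, pct_break))
    return breaks

# Example: the lending portfolio sums to $4bn in the raw feeds but $3bn in the GL
print(reconcile([("lending", 4.0e9)], {"lending": 3.0e9}))
```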

The Technology Implications of  CCAR/DFAST..

It can clearly be seen that regulatory stress testing derives inputs from virtually every banking function. It should then come as no surprise that, from a technology point of view, there are several implications:

    • CCAR and DFAST impact a range of systems, processes and controls. The challenges that most banks have in integrating front office trading desk data (position data, pricing data and reporting) with back-office risk & finance systems make the job of accurately reporting stress numbers all the more difficult. This forces most BHCs to resort to manual data operations, analytics and complicated reconciliation processes across the front, middle and back offices.
    • Beyond standardizing computation & reporting libraries, banks need to be able to provide common data storage for data from a range of BORT systems.
    • Banks also need to standardize on data taxonomies across all of these systems.
    • To that end, banks need to stop creating more data silos across the Risk and Finance functions; as I have often advocated in this blog, a move to a Data Lake enabled architecture is appropriate as a way of eliminating silos and the problem of unclean data, which is sure to invite regulatory sanction.
    • Banks need to focus on Data Cleanliness by setting appropriate governance and audit-ability policies
    • Move to a paradigm of bringing compute to large datasets instead of the other way around
    • Move towards in-memory analytics to transform, aggregate and analyze data in real time across many dimensions, to obtain an understanding of the bank’s risk profile at any given point in time (a small aggregation sketch follows below)
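
As a small illustration of the in-memory aggregation point above, the pandas sketch below rolls up stressed exposures across a few dimensions. The column names and figures are assumptions chosen purely for illustration.

```python
# Illustrative only: an in-memory roll-up of stressed exposures across several
# dimensions using pandas. Column names and figures are assumptions.
import pandas as pd

positions = pd.DataFrame({
    "desk":          ["rates", "rates", "credit", "mortgages"],
    "asset_class":   ["IRS", "Treasury", "CDS", "RMBS"],
    "scenario":      ["severely_adverse"] * 4,
    "exposure":      [120.0, 80.0, 60.0, 200.0],   # $mm
    "stressed_loss": [18.0, 5.0, 12.0, 55.0],      # $mm
})

# Aggregate stressed losses by scenario, desk and asset class for reporting
summary = (positions
           .groupby(["scenario", "desk", "asset_class"], as_index=False)
           [["exposure", "stressed_loss"]]
           .sum())
print(summary)
```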

A Reference Architecture for CCAR and DFAST..

 I recommend readers review the below post on FRTB Architecture as it contains core architectural and IT themes that are broadly applicable to CCAR and DFAST as well.

A Reference Architecture for the FRTB (Fundamental Review of the Trading Book)

Conclusion..

As can be seen from the above, both CCAR & DFAST require a holistic approach across the value chain (model development, data sourcing, reporting) spanning the Risk, Finance and Treasury functions. Further, regulators are increasingly demanding an automated process across risk & capital calculations under various scenarios, using accurate and consistent data. The need of the hour for BHCs is to move to a common model for data storage, stress modeling and testing. Doing this will ensure that the metrics and outputs of capital adequacy can be produced accurately and in a timely manner, thus satisfying the regulatory mandate.

References –

[1] Federal Reserve CCAR Summary Instructions 2016

https://www.federalreserve.gov/newsevents/press/bcreg/bcreg20160128a1.pdf

Why the Insurance Industry Needs to Learn from Banking’s Risk Management Nightmares..


(Image Credit – ENC Consulting)

Why Systemic Financial Crises Are a Broad Failure of Risk Management…

Various posts in this blog have catalogued the practice of  risk management in the financial services industry. To recap briefly, the Great Financial Crisis (GFC) of 2008 was a systemic failure that brought about large scale banking losses across the globe. Considered by many economists to be the worst economic crisis since the Great Depression [1], it not only precipitated the collapse of large financial institutions across the globe but also triggered the onset of sovereign debt crises across Greece, Iceland et al.

Years of deregulation & securitization (a form of risk transfer) combined with expansionary monetary policy during the Greenspan years, in the United States, led to the unprecedented availability of easy consumer credit in lines such as mortgages, credit cards and auto. The loosening of lending standards led to the rise of Subprime Mortgages which were often underwritten using fraudulent practices. Investment Banks were only too happy to create mortgage backed securities (MBS) which were repackaged and sold across the globe to willing institutional investors. Misplaced financial incentives in banking were also a key cause of this mindless financial innovation.

The health of the entire global financial system thus rested on the ability of the US consumer to make regular payments on their debt obligations – especially on their mortgages. However, once artificially inflated housing prices began to decline and the rate of refinancing dropped, foreclosures assumed mammoth proportions. Global investors thus began to suffer significant losses. The crisis took the form of a severe liquidity crunch, leading to a crisis of confidence among counterparties in the financial system.

Global & national regulatory authorities had to step in to conduct massive bailouts of banks. Yet stock markets suffered severe losses as housing markets collapsed, causing a large crisis of confidence. Central banks & federal governments responded with massive monetary & fiscal policy stimulus, thus yet again crossing the line of moral hazard. Risk management practices in 2008 were clearly inadequate at multiple levels, ranging from the department to the firm to the regulators. The point is well made that while the risks individual banks ran were seemingly rational at an individual level, taken as a whole the collective position was irrational & unsustainable. This failure to account for the complex global financial system was reflected across the chain of risk data aggregation, modeling & measurement.

 The Experience Shows That Risk Management Is A Complex Business & Technology Undertaking…

What makes Risk Management a complex job? The nature of Global Banking circa 2016.

Banks today are complex entities engaged in many kinds of activities. The major ones include –

  • Retail Banking – Providing cookie cutter financial services ranging from collecting customer deposits, providing consumer loans, issuing credit cards etc. A POV on Retail Banking at – http://www.vamsitalkstech.com/?p=2323 
  • Commercial Banking –  Banks provide companies with a range of products ranging from business loans, depository services to other financial investments.
  • Capital Markets – Capital Markets groups provide underwriting services & trading services that engineer custom derivative trades for institutional clients (typically hedge funds, mutual funds, corporations, governments, high net worth individuals and trusts) as well as for their own treasury group. They may also do proprietary trading on the bank’s behalf for a profit – although it is this type of trading that the Volcker Rule seeks to eliminate. A POV on Capital Markets at – http://www.vamsitalkstech.com/?p=2175
  • Wealth Management – Wealth Management groups provide personal investment management, financial advisory, and planning disciplines directly for the benefit of high-net-worth (HNWI) clients. A POV on Wealth Management at – http://www.vamsitalkstech.com/?p=1447

Firstly, banks have huge loan portfolios across all of the above areas (each with varying default rates), such as home mortgages, consumer credit and commercial loans. In the Capital Markets space, a bank’s book of financial assets gets more complex due to the web of counterparties across the globe and a range of complex assets such as derivatives. Complex assets mean complex mathematical models that calculate risk exposures across many kinds of risk. For the most part, these models did not take tail risk and wider systemic risk into account.

Secondly, markets turn in unison during periods of (downward) volatility, which ends up endangering the entire system. Finally, complex and poorly understood financial instruments in the derivatives market made it easy for banks to take on highly leveraged positions which placed their own firms & counterparties at downside risk. These models were entirely dependent on predictable historical data and never modeled “black swan” events. That means that while the math may have been complex, it never took sophisticated scenario analysis into account.

Regulatory guidelines ranging from Basel III to Dodd-Frank to MiFID II to the FRTB (the new kid on the regulatory block) have been put in place by international and national regulators post 2008. The overarching goal is to prevent a repeat of the GFC, where taxpayers funded bailouts for the managers of firms who profit immensely on the upside.

These regulatory mandates & pressures have begun driving Risk and Compliance expenditures up to unprecedented levels. The Basel Committee guidelines on risk data aggregation & reporting (RDA), Dodd-Frank, the Volcker Rule, as well as regulatory capital adequacy legislation such as CCAR, are causing a retooling of existing risk regimes. The Volcker Rule prohibits banks from trading on their own account (proprietary trading) & greatly curtails their investments in hedge funds. The regulatory intent is to avoid banker speculation with retail funds which are insured by the FDIC. Banks thus have to certify, across their large portfolios of positions, which trades have been entered for speculative purposes versus hedging purposes.

The impact of the Volcker Rule has been to shrink margins in the Capital Markets space as business moves to a flow-based trading model that relies less on proprietary trading and more on managing trading for clients. At the same time, risk management is becoming more real-time in key areas such as market, credit and liquidity risk.

A POV on FRTB is at the below link.

A POV on the FRTB (Fundamental Review of the Trading Book)…

Interestingly enough, one of the key players in the GFC was AIG – an insurance company with a division, FP (Financial Products), that really operated like a hedge fund, looking to insure downside risk it never thought it would need to pay out on.

Which Leads Us to the Insurance Industry…

For most of their long existence, insurance companies were relatively boring – they essentially provided protection against adverse events such as loss of property, life & health risks. The consumer of insurance products is a policyholder who makes regular payments, called premiums, to cover themselves. The major lines of insurance business can be classified into life insurance, non-life insurance and health insurance. Non-life insurance is also termed P&C (Property and Casualty) insurance. While insurers collect premiums, they invest these funds in relatively safer areas such as corporate bonds.

Risks In the Insurance Industry & Solvency II…

While the business model in insurance is essentially inverted & more predictable as compared to banking, insurers have to grapple with the risk of ensuring that enough reserves have been set aside for payouts to policyholder claims.  It is very important for them to have a diversified investment portfolio as well as ensure that profitability does not suffer due to defaults on these investments. Thus firms need to ensure that their investments are diverse – both from a sector as well as from a geographical exposure standpoint.

Firms thus need to constantly calculate and monitor their liquidity positions & risks. Further, insurers are constantly entering into agreements with banks and reinsurance companies – which also exposes them to counterparty credit risk.

From a global standpoint, it is interesting that US based insurance firms are largely regulated at the state level, while non-US firms are typically regulated at the national level. The point is well made that insurance firms have had a culture of running a range of departmentalized analytics, as compared to the larger scale analytics that the banks described above need to run.

In the European Union, all 28 member countries (including the United Kingdom) are expected to adhere to Solvency II [2] from 2016. Solvency II replaced the long standing Solvency I, which only calculated capital for underwriting risk.

Whereas Solvency I calculated capital only for underwriting risks, Solvency II is quite similar to Basel II and imposes guidelines for insurers to calculate investment as well as operational risks.

Towards better Risk Management – The Three Pillars of Solvency II..

There are three pillars to Solvency II [2].

  • Pillar 1 sets out quantitative rules and is concerned with the calculation of capital requirements and the types of capital that are eligible.
  • Pillar 2 is concerned with the requirements for the overall insurer supervisory review process &  governance.
  • Pillar 3 focuses on disclosure and transparency requirements.

The three pillars are therefore analogous to the three pillars of Basel II.

Why Bad Data Practices will mean Poor Risk Management & higher Capital Requirements under Solvency II..

While a detailed discussion of Solvency II will follow in a later post, it imposes new  data aggregation, governance and measurement criteria on insurers –

  1. The need to identify, measure and offset risks across the enterprise and often in realtime
  2. Better governance of risks across not just historical data but also fresh data
  3. Running simulations that take in a wider scope of measures as opposed to a narrow spectrum of risks
  4. Timely and accurate Data Reporting

The same issues that hobble banks in the Data Landscape are sadly to be found in insurance as well.

The key challenges with current architectures –

  1. A high degree of data is duplicated from system to system, leading to multiple inconsistencies at the summary as well as transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion and the calculators end up being duplicated as well.
  2. Traditional risk algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across the multiple kinds of risk needed for Solvency II. E.g. certain kinds of credit risk calculations need access to years of historical data to estimate the probability of a counterparty defaulting and to obtain a statistical measure of the same. All of these analytics are highly computationally intensive.
  3. Risk model and analytic development needs to be standardized to reflect realities post Solvency II. Solvency II also implies that, from an analytics standpoint, a large number of scenarios must be run on a large volume of data (a toy sketch of this workload follows this list). Most insurers will need to standardize their analytic libraries across their various LOBs. If insurers do not look to move to an optimized data architecture, they will incur tens of millions of dollars in additional hardware spend.
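
As referenced in point 3 above, the NumPy sketch below gives a feel for the scenario workload by revaluing a toy bond portfolio under thousands of parallel yield shocks in one vectorised pass. The portfolio size, durations and scenario counts are invented, and a simple duration approximation stands in for full revaluation.

```python
# Toy sketch of a Solvency II style scenario workload: approximate the P&L of a
# bond portfolio under many stressed yield scenarios in one vectorised pass.
# Sizes and parameters are illustrative assumptions; real runs are far larger.
import numpy as np

rng = np.random.default_rng(42)
n_bonds, n_scenarios = 5_000, 2_000

market_value = rng.uniform(0.5, 5.0, n_bonds)      # $mm per bond
duration     = rng.uniform(1.0, 15.0, n_bonds)     # modified duration
rate_shocks  = rng.normal(0.0, 0.01, n_scenarios)  # parallel yield shocks

# First-order (duration-based) P&L per scenario, summed over the portfolio
dollar_duration = duration * market_value
pnl = -(rate_shocks[:, None] * dollar_duration[None, :]).sum(axis=1)

print("worst 0.5% scenario P&L ($mm):", np.quantile(pnl, 0.005))
```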

Summary

We have briefly covered the origins of regulatory risk management in both banking and insurance. Though the respective business models vary across both verticals, there is a good degree of harmonization in the regulatory progression. The question is whether insurers can learn from the bumpy experiences of their banking counterparts in the areas of risk data aggregation and measurement.

References..

[1] https://en.wikipedia.org/wiki/Financial_crisis_of_2007%E2%80%9308

[2] https://en.wikipedia.org/wiki/Solvency_II_Directive_2009

A POV on the FRTB (Fundamental Review of the Trading Book)…

Regulatory Risk Management evolves…

The Basel Committee on Banking Supervision, a supranational supervisory body, was put in place to ensure the stability of the financial system. The Basel Accords are the frameworks that essentially govern the risk taking actions of a bank. To that end, minimum regulatory capital standards are introduced that banks must adhere to. The Bank for International Settlements (BIS), established in 1930, is the world’s oldest international financial consortium, with 60+ member central banks representing countries from around the world that together make up about 95% of world GDP. The BIS stewards and maintains the Basel standards in conjunction with member banks.

The goal of the Basel Committee and the Financial Stability Board (FSB) guidelines is to strengthen the regulation, supervision and risk management of the banking sector by improving risk management and governance. These have taken on an increased focus to ensure that a repeat of the 2008 financial crisis does not come to pass again. Basel III (building upon Basel I and Basel II) also sets new criteria for financial transparency and disclosure by banking institutions.

Basel III – the latest prominent version of the Basel standards (named for the town of Basel in Switzerland where the committee meets) – prescribes enhanced measures for capital & liquidity adequacy and was developed by the Basel Committee on Banking Supervision with voluntary worldwide applicability. Basel III covers credit, market, and operational risks as well as liquidity risks. As is well known, the BCBS 239 guidelines do not just apply to the G-SIBs (the Globally Systemically Important Banks) but also to the D-SIBs (Domestic Systemically Important Banks). Any important financial institution deemed ‘too big to fail’ needs to work with the regulators to develop a “set of supervisory expectations” that would guide risk data aggregation and reporting.

Basel III & other Risk Management topics were covered in these previous posts – http://www.vamsitalkstech.com/?p=191 & http://www.vamsitalkstech.com/?p=667

Enter the FRTB (Fundamental Review of the Trading Book)…

In May 2012, the Basel Committee on Banking Supervision (BCBS) issued a consultative document with the intention of revising the way capital was calculated for the trading book. These guidelines, which can be found here in their final form [1], were repeatedly refined based on comments from various stakeholders & quantitative studies. In January 2016, the final version of this paper was released. These guidelines are now termed the Fundamental Review of the Trading Book (FRTB) or, unofficially, as some industry watchers have termed it – Basel IV.

What is new with the FRTB …

The main changes the BCBS has made with the FRTB are – 

  1. Changed Measure of Market Risk – The FRTB proposes a fundamental change to the measure of market risk. Market risk will now be calculated and reported via Expected Shortfall (ES) as the new standard measure, as opposed to the venerated (& long standing) Value at Risk (VaR). As opposed to the older method of VaR with a 99% confidence level, expected shortfall (ES) with a 97.5% confidence level is proposed. It is to be noted that for normal distributions the two metrics are broadly equivalent; however, ES is much superior at measuring the long tail (a small sketch after this list illustrates the difference). This is a recognition that in times of extreme economic stress, there is a tendency for multiple asset classes to move in unison. Consequently, under the ES method capital requirements are anticipated to be much higher.
  2. Model Creation & Approval – The FRTB also changes how models are approved & governed. Banks that want to use the IMA (Internal Model Approach) need to pass a set of rigorous tests, failing which they are forced to use the Standardised Approach (SA) for capital calculations. The fear is that the SA will increase capital requirements. The old IMA approach has now been revised and made more rigorous in a way that enables supervisors to remove internal modeling permission for individual trading desks. This approach now enforces more consistent identification of material risk factors across banks, and constraints on hedging and diversification. All of this is now going to be done at the desk level instead of the entity level. The FRTB moves the responsibility of demonstrating compliant models, significant backtesting & P&L attribution to the desk level.
  3. Boundaries between the Regulatory Books – The FRTB also assigns explicit boundaries between the trading book (the instruments the bank intends to trade) and the banking book (the instruments held to maturity). These rules have been redefined in such a way that banks now have to contend with stringent rules for internal transfers between the two. The regulatory motivation is to eliminate a given bank’s ability to arbitrarily designate individual positions as belonging to either book. Given the different accounting treatment for each, there was a feeling that banks were resorting to capital arbitrage with the goal of minimizing regulatory capital reserves. The FRTB also introduces more stringent reporting and data governance requirements for both books which, in conjunction with the well defined boundary between them, should lead to a much better regulatory framework & also a revaluation of the structure of trading desks.
  4. Increased Data Sufficiency and Quality – The FRTB regulation also introduces Non-Modellable Risk Factors (NMRF). Risk factors are non-modellable if certain aspects pertaining to the availability and sufficiency of the data are an issue. Thus, with the NMRF, banks now face increased data sufficiency and quality requirements for the data that goes into the model itself. This is a key point, the ramifications of which we will discuss in the next section.
  5. A Revised Standardized Approach – The FRTB also upgrades the standardized approach with a new sensitivities-based approach (SBA) which is more sensitive to the various risk factors across different asset classes as compared to the Basel II SA. The sensitivities to be computed are now prescribed by the regulator. Approvals will also be granted at the desk level rather than at the entity level. The revised SA should provide a consistent way to measure risk across geographies and regions, giving regulators a better way to compare and aggregate systemic risk. The sensitivities based approach should also allow banks to share a common infrastructure between the IMA approach and the SA approach. There are a set of buckets and risk factors prescribed by the regulator to which instruments can then be mapped.
  6. Models must be seeded with real and live transaction data – Fresh & current transactions will now need to be entered into the calculation of capital requirements as of the date on which they were conducted. Not only that: though reporting will take place at regular intervals, banks are now expected to manage market risks on a continuous basis – almost daily.
  7. Time Horizons for Calculation – There are also enhanced requirements for data granularity depending on the kind of asset. The FRTB does away with the generic 10-day time horizon for market variables in Basel II, in favor of time periods based on the liquidity of these assets. It proposes five different liquidity horizons – 10 days, 20 days, 60 days, 120 days and 250 days.
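
As referenced in point 1 above, the short sketch below contrasts 99% VaR with 97.5% Expected Shortfall on a simulated P&L distribution with a fat tail. The distribution and its parameters are assumptions for illustration only, not a regulatory calculation.

```python
# Illustrative comparison of 99% VaR and 97.5% Expected Shortfall (ES) on a
# simulated P&L distribution with a fat tail. All parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Daily P&L: mostly ordinary days, with occasional large "stress day" losses
pnl = np.where(rng.random(n) < 0.01,
               rng.normal(-8.0, 3.0, n),    # rare stress days
               rng.normal(0.0, 1.0, n))     # ordinary days

var_99 = -np.quantile(pnl, 0.01)            # 99% VaR: a loss threshold
tail   = pnl[pnl <= np.quantile(pnl, 0.025)]
es_975 = -tail.mean()                       # 97.5% ES: the average tail loss

print(f"99% VaR  : {var_99:.2f}")
print(f"97.5% ES : {es_975:.2f}")
# For a pure normal distribution the two come out close; with a fat tail the
# ES is materially larger, which is the motivation behind the FRTB's switch.
```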

Illustration: FRTB designated horizons for market variables (src – [1])

To Sum Up the FRTB… 

The FRTB rules are now clear and they will have a profound effect on how market risk exposures are calculated. The FRTB clearly calls out which specific instruments sit in the trading book vs the banking book. The switch from VaR at a 99% confidence level to Expected Shortfall (ES) at 97.5% should cause increased reserve requirements. Furthermore, the ES calculations will be done keeping in mind the liquidity of the underlying instruments, with a historical simulation approach ranging from 10 days to 250 days of stressed market conditions. Banks that use a pure IMA approach will now have to move to the IMA plus the SA method.

The FRTB compels Banks to create unified teams from various departments – especially Risk, Finance, the Front Office (where trading desks sit) and Technology to address all of the above significant challenges of the regulation.

From a technology capabilities standpoint, the FRTB presents banks with a data volume, velocity and analysis challenge all at once. Let us now examine the technology ramifications.

Technology Ramifications around the FRTB… 

The FRTB rules herald a clear shift in how IT architectures work across the Risk area and the Back office in general.

  1. The FRTB calls for a single source of data that pulls data across silos of the front office, trade data repositories, a range of BORT (Book of Record Transaction) systems etc. With the FRTB, source data needs to be centralized and available in one location where every feeding application can trust its quality.
  2. With both the IMA and the SBA in the FRTB, many more detailed & granular data inputs (across desks & departments) need to be fed into the ES (Expected Shortfall) calculations from varying asset classes (Equity, Fixed Income, Forex, Commodities etc.) across multiple scenarios. The calculator frameworks developed or enhanced for the FRTB will need ready & easy access to real-time data feeds in addition to historical data. At the firm level, the data requirements and the calculation complexity will be even higher, as they need to include the entire position book (a sketch of a desk-level ES aggregation follows this list).

  3. The various time horizons called out also increase the need to run a full spectrum of analytics across many buckets. The analytics themselves will be more complex than before, with multiple teams working on all of these areas. This calls for standardization of the calculations themselves across the firm.

  4. Banks will have to also provide complete audit trails both for the data and the processes that worked on the data to provide these risk exposures. Data lineage, audit and tagging will be critical.

  5. The number of runs required for regulatory risk exposure calculations will go up dramatically under the new regime. The FRTB requires that each risk class be calculated separately from the whole set. Coupled with the increased calculation windows discussed in #3 above, this means more compute processing power and vectorization are needed.

  6. The FRTB also implies that, from an analytics standpoint, a large number of scenarios must be run on a large volume of data. Most banks will need to standardize their libraries across the house. If banks do not look to move to a Big Data architecture, they will incur tens of millions of dollars in hardware spend.
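
As referenced in point 2 above, here is a rough PySpark sketch of a standardized, desk-level Expected Shortfall aggregation over simulated scenario P&L held in the data lake. The path and column names are assumptions, and this is a plain empirical ES per group, not the full FRTB liquidity-horizon methodology.

```python
# Hypothetical PySpark sketch: compute a 97.5% Expected Shortfall per trading
# desk and risk class from simulated scenario P&L stored in the data lake.
# Paths and column names are assumptions; this is not the full FRTB cascade.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("frtb-es-sketch").getOrCreate()

# One row per (desk, risk_class, scenario_id) with the simulated P&L
scenario_pnl = spark.read.parquet("/lake/risk/scenario_pnl")

# The 2.5% quantile of P&L per group marks the tail cut-off for a 97.5% ES
cutoffs = (scenario_pnl
           .groupBy("desk", "risk_class")
           .agg(F.expr("percentile_approx(pnl, 0.025)").alias("pnl_cutoff")))

es = (scenario_pnl.join(cutoffs, ["desk", "risk_class"])
      .where(F.col("pnl") <= F.col("pnl_cutoff"))
      .groupBy("desk", "risk_class")
      .agg((-F.avg("pnl")).alias("es_97_5")))

es.show()
```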

The FRTB is the most pressing in a long list of Data Challenges facing Banks… 

The FRTB is yet another regulatory mandate that lays bare the data challenges facing every Bank. Current Regulatory Risk Architectures are based on traditional relational databases (RDBMS) architectures with 10’s of feeds from Core Banking Systems, Loan Data, Book Of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc),  Wire Data, Payment Data, Transaction Data etc. 

These data feeds are then tactically placed in memory caches or in enterprise data warehouses (EDW). Once the data has been extracted, it is transformed using a series of batch jobs which then prepare the data for the calculator frameworks that run the risk models on it.

All of the above applications need access to medium to large amounts of data at the individual transaction Level. The Corporate Finance function within the Bank then makes end of day adjustments to reconcile all of this data up and these adjustments need to be cascaded back to the source systems down to the individual transaction or classes of transaction levels. 

These applications are then typically deployed on clusters of bare metal servers that are not particularly suited to portability, automated provisioning, patching & management. In short, nothing that can automatically be moved over at a moment’s notice. These applications also work on legacy proprietary technology platforms that do not lend themselves to a flexible, DevOps style of development.

Finally, there is always a need for statistical frameworks to make adjustments to customer transactions that somehow need to get reflected back in the source systems. All of these frameworks need to have access to, and an ability to work with, terabytes (TBs) of data.

Each of above mentioned risk work streams has corresponding data sets, schemas & event flows that they need to work with, with different temporal needs for reporting as some need to be run a few times in a day (e.g. Traded Credit Risk), some daily (e.g. Market Risk) and some end of the week (e.g Enterprise Credit Risk). 

One of the chief areas of concern is that the FRTB may require a complete rewrite of analytics libraries. Under the FRTB, front office libraries will need to do enterprise risk – a large number of analytics on a vast amount of data. Front office models cannot make all the assumptions that enterprise risk can to price a portfolio accurately. Front office systems run a limited number of scenarios, thus trading accuracy for timeliness – as opposed to enterprise risk.

Most banks have stringent model vetting processes in place, and all of the rewritten analytic assets will need to be passed through them. Every aspect of the math of the analytics needs to go through this rigorous process. All of this will add to compliance costs, as the vetting process typically costs multiples of the rewrite itself. The FRTB has put in place stringent model validation standards along with hypothetical portfolios to benchmark against.

The FRTB also requires data lineage and audit capabilities for the data. Banks will need to establish visual representation of the overall process as data flows from the BORT systems to the reporting application. All data assets have to be catalogued and a thorough metadata management process instituted.

What Must Bank IT Do… 

Given all of the above data complexity and the need to adopt agile analytical methods – what is the first step that enterprises must take?

There is a need for Banks to build a unified data architecture – one which can serve as a cross organizational repository of all desk level, department level and firm level data.

The Data Lake is an overarching data architecture pattern. Let’s define the term first. A data lake is two things – a data storage repository (small or massive) and a data processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs”. Data lakes are created to ingest, transform, process, analyze & finally archive large amounts of any kind of data – structured, semi-structured and unstructured.

The Data Lake is not just a data storage layer but one that allows different users (traders, risk managers, compliance etc.) to plug in calculators that work on data spanning intra-day activity as well as data across years. Calculators can then be designed to work on the data with multiple runs to calculate Risk Weighted Assets (RWAs) across multiple calibration windows.

The illustration below depicts the goal: a cross-company data lake containing all asset data, with compute applied to that data.

Illustration – Data Lake Architecture for FRTB Calculations

1) Data Ingestion: This encompasses creation of the L1 loaders to take in Trade, Position, Market, Loan, Securities Master, Netting and Wire Transfer data etc. across trading desks. Developing the ingestion portion will be the first step to realizing the overall architecture, as timely data ingestion is a large part of the problem at most institutions. Part of this process includes a) data ingestion from the highest priority systems and b) applying the correct governance rules to the data. The goal is to create these loaders for versions of different systems (e.g. Calypso 9.x) and to maintain them as part of the platform moving forward. The first step is to understand the range of Book of Record Transaction systems (lending, payments and transactions) and the feeds they send out. The goal would be to map these feeds onto loaders built on a release of an enterprise grade open source Big Data platform, e.g. HDP (Hortonworks Data Platform), so that they can be maintained going forward (a small loader sketch follows this walkthrough).

2) Data Governance: These are the L2 loaders that apply the rules to the critical fields for Risk and Compliance. The goal here is to look for gaps in the data and any obvious quality problems involving range or table driven data. The purpose is to facilitate data governance reporting (example rules are included in the sketch at the end of this walkthrough).

3) Entity Identification: This step is the establishment and adoption of a lightweight entity ID service. The service will consist of entity assignment and batch reconciliation.

4) Developing L3 loaders: This phase will involve defining the transformation rules that are required in each risk, finance and compliance area to prep the data for their specific processing.

5) Analytic Definition: Running the analytics that are to be used for FRTB.

6) Report Definition: Defining the reports that are to be issued for each risk and compliance area.
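
As referenced in steps 1) and 2) above, here is a minimal sketch of what an L1 loader plus a handful of L2 governance rules might look like on Spark/HDP. The paths, schema, field names and data quality rules are all illustrative assumptions rather than a prescribed design.

```python
# Hypothetical L1/L2 loader sketch for the ingestion and governance steps above.
# Paths, schema, field names and data-quality rules are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, DateType)

spark = SparkSession.builder.appName("l1-l2-loader-sketch").getOrCreate()

# --- L1: ingest a raw position extract with an explicit schema --------------
position_schema = StructType([
    StructField("trade_id",   StringType(), False),
    StructField("desk",       StringType(), True),
    StructField("instrument", StringType(), True),
    StructField("notional",   DoubleType(), True),
    StructField("trade_date", DateType(),   True),
])

raw = (spark.read
       .option("header", "true")
       .schema(position_schema)
       .csv("/landing/positions/2016-01-29/"))

l1 = (raw.withColumn("load_ts", F.current_timestamp())
         .withColumn("source_system", F.lit("calypso_9x")))
l1.write.mode("append").partitionBy("trade_date").parquet("/lake/l1/positions")

# --- L2: apply governance rules to the critical fields ----------------------
l2 = l1.withColumn(
    "dq_issue",
    F.when(F.col("trade_id").isNull(), "missing_trade_id")
     .when(F.col("notional").isNull() | (F.col("notional") <= 0), "bad_notional")
     .when(~F.col("desk").isin("rates", "credit", "equities", "fx"), "unknown_desk")
     .otherwise(None))

# Clean records flow on to the L3 transformations; breaks feed the DQ report
l2.where(F.col("dq_issue").isNull()).write.mode("append").parquet("/lake/l2/positions")
l2.where(F.col("dq_issue").isNotNull()).groupBy("dq_issue").count().show()
```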

References..

[1] https://www.bis.org/bcbs/publ/d352.pdf

The Five Deadly Sins of Financial Services IT..

THE STATE OF GLOBAL FINANCIAL SERVICES IT ARCHITECTURE…

This blog has time & again discussed how Global, Domestic and Regional banks need to be innovative with their IT platform to constantly evolve their product offerings & services. This is imperative due to various business realities –  the increased competition by way of the FinTechs, web scale players delivering exciting services & sharply increasing regulatory compliance pressures. However, systems and software architecture has been a huge issue at nearly every large bank across the globe.

Regulation is also afoot in parts of the globe which will give non traditional banks access to hitherto locked customer data, e.g. PSD-2 in the European Union. Further, banking licenses have been more easily granted to non-banks which are primarily technology pioneers, e.g. PayPal.

It’s 2016 and banks are waking up to the fact that IT architecture is a critical strategic differentiator. Players that have agile & efficient architecture platforms and practices can not only add new service offerings but are also able to experiment across a range of analytics-led offerings that create & support multi-channel experiences. These digital services can now be found abundantly in areas ranging from Retail Banking, Capital Markets, Payments & Wealth Management, especially at the FinTechs.

So, How did we get here…

The Financial Services IT landscape – no matter which segment one picks across the spectrum (Capital Markets, Retail & Consumer Banking, Payment Networks & Cards, Asset Management etc.) – is largely predicated on a few legacy anti-patterns. These anti-patterns have evolved over the years from a systems architecture, data architecture & middleware standpoint.

These anti-patterns have resulted in a mishmash of organically developed & shrink-wrapped systems that do everything from running critical Core Banking Applications to Trade Lifecycle to Securities Settlement to Financial Reporting. Each of these systems operates in an application, workflow and data silo with its own view of the enterprise. These are all kept in sync largely via data replication & stove-piped process integration.

If this sounds too abstract, let us take an example – and a rather topical one at that. One of the most critical back office functions every financial services organization needs to perform is Risk Data Aggregation & Regulatory Reporting (RDARR). This spans areas from Credit Risk, Market Risk, Operational Risk, Basel III, Solvency II etc. – the list goes on.

The basic idea in any risk calculation is to gather a whole range of quality data in one place and to run computations to generate risk measures for reporting.

So, how are various risk measures calculated currently? 

Current Risk Architectures are based on traditional relational databases (RDBMS) architectures with 10’s of feeds from Core Banking Systems, Loan Data, Book Of Record Transaction Systems (BORTS) like Trade & Position Data (e.g. Equities, Fixed Income, Forex, Commodities, Options etc),  Wire Data, Payment Data, Transaction Data etc. 

These data feeds are then tactically placed in memory caches or in enterprise data warehouses (EDW). Once the data has been extracted, it is transformed using a series of batch jobs which then prepare the data for the calculator frameworks that run the risk models on it.

All of the above need access to large amounts of data at the individual transaction Level. The Corporate Finance function within the Bank then makes end of day adjustments to reconcile all of this data up and these adjustments need to be cascaded back to the source systems down to the individual transaction or classes of transaction levels. 

These applications are then typically deployed on clusters of bare metal servers that are not particularly suited to portability, automated provisioning, patching & management. In short, nothing that can automatically be moved over at a moment’s notice. These applications also work on legacy proprietary technology platforms that do not lend themselves to a flexible, DevOps style of development.

Finally, there is always a need for statistical frameworks to make adjustments to customer transactions that somehow need to get reflected back in the source systems. All of these frameworks need to have access to, and an ability to work with, terabytes (TBs) of data.

Each of above mentioned risk work streams has corresponding data sets, schemas & event flows that they need to work with, with different temporal needs for reporting as some need to be run a few times in a day (e.g. Traded Credit Risk), some daily (e.g. Market Risk) and some end of the week (e.g Enterprise Credit Risk)

Illustration – The Five Deadly Sins of Financial IT Architectures

Let us examine why this is, in the context of the anti-patterns proposed below –

THE FIVE DEADLY SINS…

The key challenges with current architectures –

  1. Utter, total and complete lack of centralized Data leading to repeated data duplication  – In the typical Risk Data Aggregation application – a massive degree of Data is duplicated from system to system leading to multiple inconsistencies at the summary as well as transaction levels. Because different groups perform different risk reporting functions (e.g Credit and Market Risk) – the feeds, the ingestion, the calculators end up being duplicated as well. A huge mess, any way one looks at it. 
  2. Analytic applications which are not designed for throughput – Traditional risk algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across multiple kinds of risk. E.g. certain kinds of credit risk calculations need access to around 200 days of historical data to estimate the probability of a counterparty defaulting and to obtain a statistical measure of the same (a toy sketch follows this list). These calculations are highly computationally intensive and can run for days.
  3. Lack of Application Blueprint, Analytic Model & Data Standardization – There is nothing that is either SOA or microservices-like and that precludes best practice development & deployment. This only leads to maintenance headaches. Cloud Computing enforces standards across the stack. Areas like Risk Model and Analytic development needs to be standardized to reflect realities post BCBS 239. The Volcker Rule aims to ban prop trading activity on part of the Banks. Banks must now report on seven key metrics across 10s of different data feeds across PB’s of data. Most cannot do that without undertaking a large development and change management headache.
  4. Lack of Scalability – It must be possible to operate it as a central system that can scale to carry the full load of the organization and operate with hundreds of applications built by disparate teams all plugged into the same central nervous system. One other factor to consider is the role of cloud computing in customer retention efforts. The analytical computational power required to understand insights from gigantic data sets is costly to maintain on an individual basis. The traditional owned data center will probably not disappear, but banks need to be able to leverage the power of the cloud to perform big data analysis in a cost-effective manner.
  5. A Lack of Deployment Flexibility – The application & data requirements dictate the deployment platforms. This massive anti-pattern leads to silos and legacy OS’s that cannot easily be moved to containers like Docker & instantiated by a modular cloud OS like OpenStack.
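
As referenced in point 2 above, the toy sketch below estimates an empirical probability of default per rating bucket from historical counterparty observations. The data and bucket names are invented; the point is simply that the estimate improves with the depth of history, which is what drives the data volumes involved.

```python
# Toy sketch: estimate an empirical probability of default (PD) per rating
# bucket from historical counterparty observations. Data is invented; real
# calculations use far deeper history and far richer models.
from collections import defaultdict

# (date, counterparty_rating, defaulted_flag) observations over the lookback window
observations = [
    ("2016-01-04", "BB", False), ("2016-01-04", "B", False),
    ("2016-01-05", "BB", False), ("2016-01-05", "B", True),
    ("2016-01-06", "BB", False), ("2016-01-06", "B", False),
]

counts = defaultdict(lambda: [0, 0])        # rating -> [defaults, observations]
for _, rating, defaulted in observations:
    counts[rating][1] += 1
    if defaulted:
        counts[rating][0] += 1

for rating, (defaults, total) in sorted(counts.items()):
    print(f"{rating}: empirical PD = {defaults / total:.2%} over {total} observations")
```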

THE BUSINESS VALUE DRIVERS OF EFFICIENT ARCHITECTURES …

Doing IT architecture right, and in a manner responsive to the business, results in critical value drivers that are met & exceeded by this transformation –

  1. Effective compliance with increased regulatory risk mandates ranging from Basel III, the FRTB and Liquidity Risk – which demand flexibility across all the traditional IT tiers.
  2. An ability to detect and deter fraud – Anti Money Laundering (AML) and Retail/Payment Card Fraud etc
  3. Fend off competition from the FinTechs
  4. Exist & evolve in a multichannel world dominated by the millennial generation
  5. Reduced costs to satisfy pressure on the Cost to Income Ratio (CIR)
  6. The ability to open up data & services that operate on the customer data to other institutions

A uniform architecture that works across all of these various workloads would seem a commonsense requirement. However, this is a major problem for most banks. Forward looking approaches that draw heavily from microservices based application development, Big Data enabled data & processing layers, the adoption of Message Oriented Middleware (MOM) & a cloud native approach to developing applications (PaaS) & deployment (IaaS) are the solution to the vexing problem of inflexible IT.

The question is whether banks can change before they see a perceptible drop in revenues over the years.

Big Data architectural approaches to Financial Risk Mgmt..

Risk management is not just a defensive business imperative; the best managed banks deploy their capital to obtain the best possible business outcomes. The last few posts have more than set the stage from a business and regulatory perspective. This one will take a bit of a deep dive into the technology.

Existing data architectures are siloed with bank IT creating or replicating data marts or warehouses to feed internal lines of business. These data marts are then accessed by custom reporting applications thus replicating/copying data many times over which leads to massive data management & governance challenges.

Furthermore, the explosion of new types of data in recent years has put tremendous pressure on the financial services datacenter, both technically and financially, and an architectural shift is underway in which multiple LOBs can consolidate their data into a unified data lake.

Banking data architectures and how Hadoop changes the game

Most large banking infrastructures, on a typical day, process millions of derivative trades. The main implication is that there are a large number of data inserts and updates to handle. Once the data is loaded into the infrastructure, complex mathematical calculations need to be performed in near real time to calculate intraday positions. Most banks use techniques like Monte Carlo modeling and other computational simulations to build & calculate these exposures. Hitherto, these techniques were extremely expensive from the standpoint of both the hardware and the software needed to run them. Nor were tools & projects available that supported a wide variety of data processing paradigms – batch, interactive, realtime and streaming.

The Data Lake supports multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set which is the unified repository of all financial data. It also enables users to transform and view data in multiple ways (across various schemas) and to deploy closed-loop analytics applications that bring time-to-insight closer to real time than ever before.

Figure 1 – From Data Silos to a Data Lake

Also, with the advent and widespread availability of Open Source software like Hadoop (by which I mean a full Hadoop platform ecosystem such as the Hortonworks Data Platform (HDP), with its support for multiple computing frameworks like Storm, Spark, Kafka, MapReduce and HBase), which can turn a cluster of commodity x86 based servers into a virtual mainframe, cost is no longer a limiting factor. The application ecosystem of a financial institution can now be the deciding factor in how data is created, ingested, transformed and exposed to consuming applications.

Thus clusters of inexpensive x86 servers running Linux and Hortonworks Data Platform (HDP) provide an extremely cost-effective environment for deploying and running simulations and stress tests.


                                                 Figure 2 – Hadoop now supports multiple processing engines

Finally, an HDP cluster with tools like Hadoop, Storm, and Spark is not limited to one purpose, unlike older dedicated computing platforms. The same cluster you use for running stress tests can also be used for text mining, predictive analytics, compliance, fraud detection, customer sentiment analysis and many other purposes. This is a key point: once siloed data is brought into a data lake, it is available for running multiple business scenarios – limited only by the overall business scope.

Now, typical risk management calculations require that for each time point, and for each product line, separate simulations are run to derive higher order results. Once this is done, the resulting intermediate data needs to be aligned to collateral valuations, derivative settlement agreements and any other relevant regulatory data to arrive at a final portfolio position. Further, there needs to be a mechanism to pull in data that needs to be available from a reference perspective for a given set of clients and/or portfolios.
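As an illustration of this kind of alignment step, the sketch below uses Spark's DataFrame API to aggregate simulated exposures by time point, product line and counterparty, and then net them against collateral values. The HDFS paths and column names are hypothetical assumptions for the example only and stand in for whatever the bank's own data model dictates.

```
# Minimal PySpark sketch: aggregate simulated exposures per (time point, product line,
# counterparty) and net them against collateral reference data.
# Paths and column names (e.g. /data/exposures, collateral_value) are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ExposureAggregation").getOrCreate()

exposures = spark.read.parquet("hdfs:///data/exposures")            # one row per simulation result
collateral = spark.read.parquet("hdfs:///data/reference/collateral")

portfolio_position = (
    exposures
    .groupBy("time_point", "product_line", "counterparty_id")
    .agg(F.sum("simulated_exposure").alias("gross_exposure"))
    .join(collateral, on="counterparty_id", how="left")
    .withColumn("net_exposure",
                F.col("gross_exposure") - F.coalesce(F.col("collateral_value"), F.lit(0.0)))
)

# Persist the netted positions for downstream regulatory alignment and reporting
portfolio_position.write.mode("overwrite").parquet("hdfs:///data/positions/netted")
```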

The following are the broad architectural goals for any such implementation –

* Provide a centralized location for aggregating market data, counterparties, liabilities and exposures at a house-wide level, and for their subsequent analysis

* Support the execution of liquidity analysis on an intraday or multi-day basis while providing long term data retention capabilities

* Provide strong but optional capabilities for layering in business workflow and rule based decisioning as an outcome of the analysis


At the same time, long term positions need to be calculated for stress tests, typically using at least 12 months of data pertaining to a given product set. Finally, the two streams of data may be compared to produce a CVA (Credit Valuation Adjustment) value.

The average investment bank deals with potentially 50 to 80 future dates and up to 3,000 different market paths, so the demands on computational resources are huge. Reports are produced daily, and under special conditions multiple times per day. What-if scenarios with strawman portfolios can also be run to assess regulatory impacts and to evaluate business options.



                                                Figure 3 – Overall Risk Mgmt Workflow

 

As can be seen from the above, computing arbitrary functions on a large and growing master dataset in real time is a daunting problem (to quote Nathan Marz). There is no single product or technology approach that satisfies all business requirements. Instead, one has to use a variety of tools and techniques to build a complete Big Data system. I present two approaches, both of which have been tried and tested in enterprise architecture.

 

Solution Patterns 

Pattern 1 – Integrate a Big Data Platform with an In-Memory Datagrid

Based on the business requirements above, two distinct data tiers can be identified –

  • It is very clear from the above that data needs to be pulled in near realtime and accessed in a low latency pattern, with calculations performed on that data. The design principle here needs to be “Write Many and Read Many”, with an ability to scale out tiers of servers. In-memory datagrids (IMDGs) are very suitable for this use case as they support a very high write rate. IMDGs like GemFire & JBoss Data Grid (JDG) are highly scalable and proven implementations of distributed datagrids that give users the ability to store, access, modify and transfer extremely large amounts of distributed data. Further, these products offer a universal namespace for applications to pull in data from different sources for all the above functionality. A key advantage here is that datagrids can pool memory and can scale out across a cluster of servers in a horizontal manner. Further, computation can be pushed into the tiers of servers running the datagrid, as opposed to pulling data into the computation tier.
    To meet the needs for scalability, fast access and user collaboration, data grids support replication of datasets to points within the distributed data architecture. The use of replicas gives multiple users faster access to datasets and preserves bandwidth, since replicas can often be placed strategically close to, or within, the sites where users need them. IMDGs support WAN replication, clustering and out of the box replication, as well as clients for multiple languages.
  • The second data access pattern that needs to be supported is storage for data ranging from the next day out to months and years. This is typically large scale historical data. The primary data access principle here is “Write Once, Read Many”. This layer contains the immutable, constantly growing master dataset stored on a distributed file system like HDFS. The HDFS implementation in HDP 2.x offers all the benefits of a distributed filesystem while eliminating the SPOF (single point of failure) issue with the NameNode in an HDFS cluster. With batch processing (MapReduce), arbitrary views – so called batch views – are computed from this raw dataset, so Hadoop (MapReduce on YARN) is a perfect fit for the concept of the batch layer; a minimal sketch of computing such a batch view with Spark follows below. Besides being a storage mechanism, the data stored in HDFS is formatted in a manner suitable for consumption by any tool within the Apache Hadoop ecosystem, like Hive, Pig or Mahout.
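To make the batch layer concrete, here is a minimal sketch of computing one such batch view with Spark on YARN over raw trade records held in HDFS. The file layout and the CSV field order (trade date, product, counterparty, notional) are illustrative assumptions, not a prescribed schema.

```
# A minimal sketch of computing a "batch view" from the immutable master dataset in HDFS.
# The path (/data/raw/trades) and CSV layout are illustrative assumptions.
from pyspark import SparkContext

sc = SparkContext(appName="BatchViewTradesByProduct")

raw = sc.textFile("hdfs:///data/raw/trades/*.csv")

def parse(line):
    fields = line.split(",")
    # key the record by (trade_date, product); value is the notional amount
    return ((fields[0], fields[1]), float(fields[3]))

batch_view = (raw
              .map(parse)
              .reduceByKey(lambda a, b: a + b))   # total notional per day and product

# Persist the precomputed view back to HDFS for downstream consumers (Hive, BI tools, ...)
batch_view.map(lambda kv: "%s,%s,%f" % (kv[0][0], kv[0][1], kv[1])) \
          .saveAsTextFile("hdfs:///data/views/notional_by_day_product")
```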


                                      Figure 4 – System Architecture  

The overall system workflow is as below –

  1. Data is injected into the architecture in either an event based or a batch based manner. HDP supports multiple ways of achieving this – one could use a high performance message broker like Kafka or an ESB like Mule for the batch updates, or insert data directly into the IMDG via a Storm layer (a minimal Kafka producer sketch follows this list). For financial data stored in RDBMSs, one can write a simple CacheLoader to prime the grid. Each of these approaches offers advantages to the business. For instance, using Complex Event Processing (CEP) one can derive realtime insights via predefined business rules and optionally spin up new workflows based on those rules. Once the data is inserted into the grid, the grid automatically distributes it via consistent hashing. Once the data is all there, fast incremental algorithms are run in memory and the resulting data can be stored in an RDBMS for querying by analytics/visualisation applications.

  2. Such intermediate data, or data suitable for modeling or simulation, can also be streamed into the long term storage layer.

  3. Data is loaded into different partitions in the HDFS layer in two different ways – a) directly from the datasources themselves; b) from the JDG layer via a connector.
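By way of illustration, here is a minimal sketch of the event based ingest path using the kafka-python client. The broker address, the trade-events topic name and the record fields are assumptions made purely for the example; a Storm or Spark Streaming consumer downstream would pick these events up and write them into the datagrid or HDFS.

```
# A minimal sketch of the event-based ingest path, assuming a Kafka broker at
# localhost:9092 and a hypothetical "trade-events" topic.
import json
from kafka import KafkaProducer   # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

trade_event = {
    "trade_id": "T-000123",        # illustrative fields only
    "product": "IR_SWAP",
    "notional": 25000000,
    "counterparty": "CP-042",
}

producer.send("trade-events", value=trade_event)
producer.flush()   # block until the event has been handed to the broker
```

On the consuming side, Storm's Kafka spout or Spark Streaming's Kafka integration (both shipped with HDP) would read from the same topic and feed the grid and the long term storage layer.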

Pattern 2 – Utilize the complete featureset present in a Big Data Platform like Hortonworks HDP 2.3
** this integration was demonstrated at Red Hat Summit by Hortonworks, Mammoth Data and Red Hat  and is well captured at *http://mammothdata.com/big-data-open-source-risk-managment/*
The headline is self explanatory, but let’s briefly examine how you might perform a simple Monte Carlo calculation using Apache Spark. Spark is the ideal choice here due to the iterative nature of these calculations as well as the natural increase in performance from doing them in memory. Spark enables major performance gains – applications in Hadoop clusters running Spark tend to run up to 100 times faster in memory and up to 10 times faster on disk.

Apache Spark provides a comprehensive, unified framework to manage big data processing requirements with a variety of data sets that are diverse in nature (text data, graph data etc) as well as the source of data (batch v. real-time streaming data).

A major advantage of using Spark is that it allows programmers to develop complex, multi-step data pipelines using the directed acyclic graph (DAG) pattern while supporting in-memory data sharing across DAGs, so that intermediate results can be reused across jobs.
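A small PySpark sketch of such a pipeline is shown below: each transformation adds a node to the DAG, and cache() keeps the filtered dataset in memory so that the two subsequent actions reuse it rather than re-reading HDFS. The input path and record layout are assumptions for illustration only.

```
# Multi-step pipeline sketch: transformations build the DAG lazily, cache() lets the
# two downstream jobs (count and sum) share the in-memory intermediate dataset.
from pyspark import SparkContext

sc = SparkContext(appName="DagPipelineSketch")

positions = (sc.textFile("hdfs:///data/raw/positions")
               .map(lambda line: line.split(","))
               .filter(lambda rec: rec[2] == "ACTIVE")   # keep only live positions
               .cache())                                  # share this RDD across jobs

active_count = positions.count()
gross_notional = positions.map(lambda rec: float(rec[3])).sum()

print("active positions: %d, gross notional: %.2f" % (active_count, gross_notional))
```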

One important metric used in financial modeling is LVaR – Liquidity Adjusted Value at Risk. As we discussed in earlier posts, an important form of risk is liquidity risk, and LVaR is one important metric used to represent it. “Value at Risk”, or VaR, is simply the threshold loss that a given portfolio is not expected to exceed, over a given period of time, at a given confidence level.

For mathematical details of the calculation, please see  Extreme value methods with applications to finance by S.Y. Novak.

Now, liquidity risk is divided into two types: funding liquidity risk (i.e. can we make the payments on this position or liability?) and market liquidity risk (where we ask – can we exit this position if the market suddenly turns illiquid?).

The incorporation of external liquidity risk into VaR results in LVaR. This essentially means adjusting the time period used in the VaR calculation, based on the expected length of time required to unwind the position.
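As a simple, purely illustrative example of this adjustment, the snippet below applies the common square-root-of-time scaling rule to stretch a 1-day VaR over an assumed 10 day unwind horizon. The figures are made up, and real LVaR models are considerably richer (bid-ask spreads, exogenous liquidity costs and so on).

```
# Illustrative square-root-of-time adjustment of VaR for the liquidation horizon.
# All numbers are made up for the example.
from math import sqrt

one_day_var = 1000000.0       # 1-day 99% VaR of the position, in USD
days_to_unwind = 10           # expected liquidation horizon in days

lvar = one_day_var * sqrt(days_to_unwind)
print("Liquidity-adjusted VaR: %.0f" % lvar)   # ~3,162,278
```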

Given that we have a need to calculate LVaR for a portfolio, we can accomplish this in a distributed fashion using Spark by doing the following:

  1. Implement the low-level LVaR calculation in Java, Scala or Python. Spark provides mature support for all three languages and ships with a built-in set of over 80 high-level operators, so it is straightforward to work with code written in any of them.
  2. Data ingestion – all kinds of financial data (position data, market data, existing risk data, General Ledger etc.) are batched in, i.e. read from flat files stored in HDFS, or the initial values can be read from a relational database or other persistent store via Sqoop.
  3. Spark code written in Scala, Java or Python can leverage the database support provided by those languages. Once the data is read in, it resides in what Spark calls an RDD – a Resilient Distributed Dataset. A convenient representation of the input data, one that leverages Spark’s fundamental processing model, would include in each input record the portfolio item details, along with the input range and the probability distribution information needed for the Monte Carlo simulation.
  4. If you have streaming data requirements, you can optionally leverage Kafka integration with Apache Storm to read one value at a time and persist the data into an HBase cluster. In a modern data architecture built on Apache Hadoop, Kafka (a fast, scalable and durable message broker) works in combination with Storm, HBase and Spark for real-time analysis and rendering of streaming data. Kafka has been used to message everything from geospatial data from a fleet of long-haul trucks, to financial data, to sensor data from HVAC systems in office buildings.
  5. The next step is to perform a transformation on each input record (representing one portfolio item) which runs the Monte Carlo simulation for that item. The distributed nature of Spark will result in each simulation running in a unique worker process somewhere on one node in the overall cluster.
  6. After each individual simulation has run, run another transform over the RDD to perform any aggregate calculations, such as summing the portfolio threshold risk across all instruments in the portfolio at each given probability threshold.
  7. Output data elements can be written out to HDFS, or stored to a database like Oracle, HBase, or Postgres. From here, reports and visualizations can easily be constructed.
  8. Optionally, workflow engines can be layered in to present the right data to the right business user at the right time.  
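Putting several of these steps together, here is a minimal end-to-end PySpark sketch that runs a toy normal-returns Monte Carlo per portfolio item and aggregates the result. The input path, the CSV layout (item id, value, volatility), the 99% confidence level and the naive summation across items are all simplifying assumptions for illustration, not the bank's actual model.

```
# Minimal end-to-end sketch of steps 1-7 with a toy normal-returns Monte Carlo.
# Input path, field layout and confidence level are illustrative assumptions.
import random
from pyspark import SparkContext

sc = SparkContext(appName="MonteCarloVaRSketch")

NUM_PATHS = 3000        # market paths per instrument, as in the discussion above
CONFIDENCE = 0.99

def simulate(record):
    """Run the Monte Carlo simulation for one portfolio item (step 5)."""
    item_id, value, vol = record[0], float(record[1]), float(record[2])
    losses = sorted(-value * random.gauss(0.0, vol) for _ in range(NUM_PATHS))
    item_var = losses[int(CONFIDENCE * NUM_PATHS) - 1]   # loss at the 99th percentile
    return (item_id, item_var)

portfolio = (sc.textFile("hdfs:///data/portfolio/items")    # step 2: ingest from HDFS
               .map(lambda line: line.split(",")))          # step 3: RDD of portfolio items

item_vars = portfolio.map(simulate)                         # step 5: per-item simulation

portfolio_var = item_vars.map(lambda kv: kv[1]).sum()       # step 6: naive aggregation

item_vars.saveAsTextFile("hdfs:///data/risk/item_var")      # step 7: persist results
print("Aggregate portfolio VaR (naive sum): %.2f" % portfolio_var)
```

In practice, the per-item simulation in step 5 would call out to the firm's own pricing library, and the aggregation in step 6 would account for diversification and netting rather than simply summing item level VaRs.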

Whether you choose one solution pattern over the other, or mix both of them, depends on your specific business requirements and other characteristics, including –

  • The existing data architecture and the formats of the data (structured, semi structured or unstructured) stored in those systems
  • The governance process around the data
  • The speed at which the data flows into the application and the velocity at which insights need to be gleaned
  • The data consumers who need to access the final risk data whether they use a BI tool or a web portal etc
  • The frequency at which this data is processed to produce risk reports – i.e. hourly, near real time (dare I say?), ad-hoc or intraday