Vamsi Chemitiganti's weekly musings on applying Big Data, Cloud, & Middleware technology to solving industry challenges. Published every Friday or Sunday (if I'm very busy). All opinions are entirely my own. I write this blog so my readers don't have to spend money on expensive consultants.
Over the last three years, we have examined a succession of business issues across the various sectors of financial services on this blog. These have ranged from the mundane (Trading, Risk Management, Market Surveillance, Fraud Detection, AML et al) to the transformative (Robo-advisors, Customer Journeys, Blockchain, Bitcoin etc). We have also examined the changing paradigms in enterprise architecture – moving from siloed monolithic applications to cloud-native software. This blog summarizes the 12 most important technical posts on innovative application architectures.
Having spent the majority of my career in Banking and Financial Services, it has been a fascinating time to witness business transformation begin to occur across the landscape. This transformation, however, revolves around a handful of repeatedly discussed themes. A key challenge that CXOs and Enterprise Architecture teams face is how to deploy much-discussed technologies such as Cloud platforms, Big Data, Enterprise Middleware and AI in real-world architectures. This blog post sums up eleven real-world application architectures that industry leaders can use as a reference point for their own implementations.
The common theme to all of the below architectures –
A focus on Cloud native concepts including microservices, lightweight backends, containers
Design Patterns that encourage new age Data Management techniques including Hadoop and Spark
Cloud-agnostic – whether that is public cloud or private cloud
Integrating business process management and business rules engines as first-class citizens
100% Open Source
#1 Design and Architecture of a Real World Trading Platform…
With each passing quarter, financial services looks increasingly in danger of disintermediation. CXOs have no alternative but to digitize their businesses, and IT will be forced to support cloud-native technologies in both key areas – applications and infrastructure – in pursuit of business goals. Developers will be at the forefront of this change. Ultimately, the quality of Enterprise Architecture decides business destiny.
After a decade of focusing on compliance with regulatory mandates, Banks are back to fixating on technology innovation. The reason is obvious – over the last five years, Silicon Valley majors and FinTechs have begun to rapidly encroach on the highest-profit areas of the banking business. The race is on to create next-generation financial services ecosystems in areas ranging from Retail Banking to Capital Markets and Wealth Management. The common thread to all of these is massive volumes of data and advanced analytics on that data. Given that almost every large and small bank has a Big Data & AI strategy in place, it makes sense to highlight six key areas where they should first direct and then benchmark their efforts from an innovation standpoint.
Global Banking in 2016-17…
As 2017 draws to a close, the days of growth and sky-high stock market valuations seem to be largely back. McKinsey Research posits that while the global banking industry appears quite healthy outwardly, profits are at best flat or even falling across geographies. For the seventh year in a row, the industry’s ROE (Return on Equity) was between 8-10%. For 2016, the industry’s ROE was down a full percentage point from 2015, raising concerns about profitability across the board. There are however innovators that are doing well due to their strong focus on execution.
Banks have overall been very slow to respond to the onslaught of digital business led by Amazon, Google/Alphabet, PayPal and the horde of FinTechs. What all of these disruptors do better than Banks is harness customer data to drive offerings that appeal to neglected banking consumers who are already using these services every waking hour of their lives.
As technology continues to advance and data becomes more available, the twin forces of competition and regulation are driving innovation across banking. Capital Markets players are using AI in a range of areas, from optimizing trade execution and contract pricing to strategy backtesting and risk & compliance.
In the staider Retail Banking & Asset Management areas, profitable segments such as customer lending, consumer payments and wealth management are slowly being disrupted at the expense of established banks. What lies behind this disruption is the FinTechs' ability to pick and choose the (profitable) areas in which they want to compete, their minimal overhead compared to banks, and an advanced ability to work with the data generated constantly by customer interactions – deploying algorithms that mine historical data and combine it in ways that reveal new insights.
I posit that there are six strategic questions that Banking institutions of all stripes need to answer through their Big Data (& AI) projects, with a view to attaining sustainable growth for the foreseeable future –
How do we know more about our customers?
How do we manage regulation and turn it into a source of lasting competitive advantage?
How can we increase our digital quotient in a way that enables us to enter new businesses?
How can this deluge of information drive business insight?
How can we drive business transformation within the Bank while disarming the competition?
How can this information drive agility in customer responsiveness?
Question #1 How much do we know about our customers..really?
Financial institutions, including retail banks, capital markets players and payment networks, process large amounts of customer information daily. However, they have been sorely lacking in the capability to understand each customer profile as a whole and to interpret it in a form that is meaningful to their business. The ability to do this yields an understanding of not just the risks posed by the relationship (from a credit risk/AML standpoint) but also the ability to increase the returns per client relationship. This is an area Big Data and AI can influence in a huge way.
Question #2 How do we manage the Regulatory Onslaught and Turn it into Competitive Advantage?
There exist two primary reasons for Enterprises such as Banks, Insurers, Payment Providers and FinTechs to pursue best in class Risk Management Processes and Platforms. The first need in Capital Markets is compliance driven by various regulatory reporting mandates such as the Basel Reporting Requirements, the FRTB, the Dodd‐Frank Act, Solvency II, CCAR and CAT/MiFID II in the United States & the EU. The second reason (common to all kinds of Banking) is the need to drive top-line sales growth for both individual and institutional clients.
We have long advocated the implementation of Big Data across both these areas. The common thread across virtually every business process in the front, middle and back office is risk management. Processes ranging from risk data aggregation/reporting, customer onboarding, loan approvals, financial crimes compliance (AML, KYC, CRS & FATCA), enterprise financial reporting and Cyber Security can benefit from universal access to accurate, cleansed and well-governed cross-organization data from a range of Book of Record systems. Further, by applying techniques such as in-memory processing, the scenario analysis, computation and reporting on this data (reg reports, risk scorecards, dashboards etc.) can be vastly enhanced – made closer to real time by using market-movement data to understand granular risk concentrations. Finally, model management techniques can be clearly defined and standardized across a large organization. RegTechs, or startups focused on the risk and compliance space, are already leveraging these techniques across the areas identified above.
Question #3 How Can We Increase Our Digital Quotient and Enter New Businesses…
For decades, Banks have had a monopoly on the financial business. The last few years have seen both FinTechs and players such as Amazon, Alibaba and Facebook enter lucrative areas of banking, including consumer lending and financial advisory. The keyword in all of this is 'Digital Disintermediation', and regulators have begun to take note. In the EU and the UK, regulators are at the forefront of pushing mandates such as SEPA (Single Euro Payments Area), the Open Banking Standard, and PSD2. All of these regulations will ensure that Banks are forced to unlock their customer data in a way that encourages consumer choice. The hope is that agile players can then use this data to exploit inefficiencies in the banks' business models using technology. Services such as account aggregation, consumer loans, credit scoring, personal financial management tools and other financial advisory services become easy to provide via Open APIs.
If incumbent Banks don't respond, they will lose their monopoly on being their customers' primary front end. As new players take over areas such as mortgage loans (where they're much faster than banks in granting loans), Banks that cannot change their distribution and product models will be commoditized. The challenges start with reworking inflexible core banking systems, which maintain customer demographics, balances, product information and other BORT (Book Of Record Transaction) data covering a range of loan, payment and risk information. These architectures will need to transition slowly from their current (largely) monolithic form to composable units. There are various strategies Banks can follow to 'modernize the core', but adopting a Big Data-native mindset is key. Banks will also seek to work with FinTechs to create islands of cooperation where they can learn from the new players.
Question #4 Drive Business Insight…
There are two primary areas that business insights need to be driven out of: internal operations and customer service. This category encompasses a wide range of strategic choices that drive an operating model – product ideation, creation, and distribution strategies across channels and geographies. Whatever the right product and strategy focus, the ability to play in select areas of the value chain depends on feedback received from day-to-day operations. Much like in a manufacturing company, this data needs to be harnessed and analyzed with a view to ultimately monetizing it.
Question #5 Business Transformation…
There is no question that FinTechs are able to take ideas from nothing to delivery in a matter of months. This is the key reason banks need to transform their business. This is critical in key areas such as sales, wealth management, and origination. There is surely a lot of confusion around how to drive such initiatives but no one questions the need for centralizing data assets.
In my mind, the first and most important reason to move to a unified strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as a SaaS (Software as a Service). The end state of Digital Platforms is to operate business systems at massive scale in terms of customers, partners, and employees.
Question #6 Enhance Customer Service…
Customer Service is clearly an area of differentiation for nimbler players as compared to Banks. Banks are still largely dealing with ensuring that consistent views of customer accounts & balances can be maintained across channels. On the other hand, FinTechs have moved onto Chatbots and Robo-advisors all built around Big Data & AI. A Chatbot is a virtual assistant that helps clients perform simple transactions using mediums such as text or voice. They’re based on Natural Language Processing and Machine Learning and are being deployed in simple scenarios such as balance checks and other simpler customer service processes. However, as time goes by they will inevitably get more sophisticated and will eventually supplant human service for the vast majority of the service lifecycle.
Surely, areas such as automated customer service and investment management are still in early stages of maturity. However, they are unmistakably the next big trend in the financial industry and one that players should begin developing capabilities around.
Increasingly, a Bank’s technology platform(s) centered around Big Data represents a significant competitive differentiator that can generate substantial revenues from existing customers and help acquire new ones. Given the strategic importance and revenue potential of this resource, the C-suite must integrate Big Data & AI into their strategic planning in 2018.
We have discussed the topic of market surveillance reporting in some depth in previous blogs, e.g. http://www.vamsitalkstech.com/?p=2984. Over the last decade, Global Financial Markets have embraced the high speed of electronic trading. This trend has only accelerated with the concomitant explosion in trading volumes. The diverse range of instruments and the proliferation of trading venues pose massive regulatory challenges in the area of market conduct supervision and abuse prevention. Banks, broker-dealers, exchanges and other market participants across the globe are now shelling out millions of dollars in fines for failure to accurately report on market abuse violations. In response to this complex world of high-volume, low-touch electronic trading, capital markets regulators have been hard at work across different jurisdictions and global hubs – e.g. FINRA in the US, IIROC in Canada and ESMA in the European Union. Regulators have created extensive reporting regimes for surveillance with a view to detecting suspicious patterns of trade behavior (e.g. dumping, quote stuffing and non-bonafide fake orders). The intent is to increase market transparency on both the buy and the sell side. Given the scrutiny Capital Markets players are under, a Big Data analytics based architecture has become a "must-have" to ensure timely and accurate compliance with these mandates. This blog attempts to discuss such a reference architecture.
Business Technology Requirements for Market Surveillance..
The business requirements for the Surveillance architecture are covered at the below link in more detail but are reproduced below in a concise fashion.
Some of the key business requirements that can be distilled from regulatory mandates include the below:
Store heterogeneous data – Both MiFID II and MAR mandate the need to perform trade monitoring and analysis not just on real-time data but also on historical data spanning a few years. Among others, this will include data feeds from a range of business systems – trade data, eComms, aComms, valuation & position data, order management systems, position management systems, reference data, rates, market data, client data, front/middle/back office data, voice, chat and other internal communications. To sum up, the key capability is the ability to store, at a high degree of granularity, trading data that is cross-asset (almost all kinds of instruments), cross-format (structured and unstructured, including voice) and cross-venue (exchange, OTC etc.).
Data Auditing – Such stored data needs to be fully auditable for 5 years. This implies not just being able to store it but also putting capabilities in place to ensure strict governance and audit trail support.
Manage a huge volume increase in data storage requirements (5+ years) due to extensive Record keeping requirements
Perform Realtime Surveillance & Monitoring of data – Once data is collected, normalized and segmented, the system will need to support realtime monitoring (within around 5 seconds) to ensure that every trade can be tracked through its lifecycle. Detecting patterns that indicate market abuse and monitoring for best execution are key.
Business Rules – The core logic that identifies some of the above trade patterns is created using business rules. Business rules have been covered in various places on this blog; they primarily work on an IF..THEN..ELSE construct.
Machine Learning & Predictive Analytics – A variety of supervised and unsupervised learning approaches can be used to perform extensive behavioral modeling and segmentation, discovering transaction behavior with a view to identifying the behavioral patterns of traders and any outlier behaviors that connote potential regulatory violations.
A Single View of an Institutional Client – From the firm's standpoint, it would be very useful to have a single-view capability for clients that shows all of their positions across multiple desks, their risk position, KYC score etc.
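The IF..THEN..ELSE style of business rule described in the requirements above can be sketched in a few lines. This is a minimal, hypothetical example; the thresholds and field names are my own illustrative assumptions, not drawn from any regulatory mandate or product.

```python
# Minimal sketch of a surveillance business rule: flag possible
# quote stuffing when a trader places and cancels an unusually high
# number of orders inside one monitoring window. All thresholds and
# field names below are illustrative assumptions.

ORDER_RATE_LIMIT = 500      # orders per window considered suspicious
CANCEL_RATIO_LIMIT = 0.95   # fraction of orders cancelled

def evaluate_quote_stuffing(window):
    """Apply an IF..THEN..ELSE rule to one trader's activity window."""
    orders = window["orders_placed"]
    cancels = window["orders_cancelled"]
    if orders > ORDER_RATE_LIMIT and cancels / orders > CANCEL_RATIO_LIMIT:
        return {"alert": True, "rule": "QUOTE_STUFFING",
                "trader": window["trader_id"]}
    else:
        return {"alert": False, "rule": None, "trader": window["trader_id"]}

window = {"trader_id": "T-1042", "orders_placed": 800, "orders_cancelled": 790}
print(evaluate_quote_stuffing(window))
```

In a production system this logic would live inside a dedicated rules engine rather than application code, so compliance analysts can change thresholds without a release.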
A Reference Architecture for Market Surveillance ..
This reference architecture aims to provide generic guidance to banking Business IT Architects building solutions in the realm of Market & Trade Surveillance. It supports a host of hugely important global reg reporting mandates – CAT, MiFID II, MAR etc. – that Capital Markets players need to comply with. While the concepts discussed in this solution architecture are definitely Big Data oriented, they are largely agnostic to any cloud implementation – private, public or hybrid.
A Market Surveillance system needs to include both real-time surveillance of trading activity and a retrospective (batch-oriented) analysis component. The real-time component includes the ability to perform realtime calculations (concerning thresholds, breached limits etc.) and real-time queries with the goal of triggering alerts. Both kinds of analytics span structured and unstructured data sources. For the batch component, the analytics range from data queries and simple-to-advanced statistics (min, max, avg, standard deviation, sorting, binning, segmentation) to running data science models involving text analysis, search etc.
The system needs to process tens of millions to billions of events in a trading window while providing highest uptime guarantees. Batch analysis is always running in the background.
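The real-time threshold calculations mentioned above amount to maintaining aggregates over a short sliding window of events. The sketch below shows the idea in plain Python; the five-second window, the notional limit and the event shape are illustrative assumptions.

```python
# Sketch of the real-time component: a sliding five-second window
# that flags a breach when total notional traded exceeds a limit.
# Window length, limit and event shape are illustrative assumptions.
from collections import deque

WINDOW_SECONDS = 5
NOTIONAL_LIMIT = 10_000_000

class SlidingWindowMonitor:
    def __init__(self):
        self.events = deque()   # (timestamp, notional) pairs
        self.total = 0.0

    def on_trade(self, ts, notional):
        # Evict events that have fallen out of the window.
        while self.events and ts - self.events[0][0] > WINDOW_SECONDS:
            _, old_notional = self.events.popleft()
            self.total -= old_notional
        self.events.append((ts, notional))
        self.total += notional
        return self.total > NOTIONAL_LIMIT   # True -> raise an alert

m = SlidingWindowMonitor()
print(m.on_trade(0, 4_000_000))   # False
print(m.on_trade(2, 5_000_000))   # False
print(m.on_trade(4, 3_000_000))   # True: 12m notional inside 5 seconds
```

At the volumes quoted above, this kind of state would of course be kept in a distributed stream processor (Storm/Spark Streaming) rather than a single process; the windowing logic is the same.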
A Hadoop distribution that includes components such as Kafka and HBase, plus near-real-time components such as Storm and Spark Streaming, provides a good fit for a responsive architecture. Apache NiFi, with its ability to ingest data from a range of sources, is preferred for its support for complex data routing, transformation, and system mediation logic in a complex event processing architecture. The capabilities of Hortonworks DataFlow (the enterprise version of Apache NiFi) are covered in the below blogpost in much detail.
Data volumes in the area of Regulatory reporting can be huge to insanely massive. For instance, at large banks, they can go up to 100s of millions of transactions a day. At market venues such as stock exchanges, they easily enter into the hundreds of billions of messages every trading day. However the data itself is extremely powerful & is really business gold in terms of allowing banks to not just file mundane reg reports but also to perform critical line of business processes such as Single View of Customer, Order Book Analysis, TCA (Transaction Cost Analysis), Algo Backtesting, Price Creation Analysis etc. The architecture thus needs to support multiple ways of storage, analysis and reporting ranging from compliance reporting to data scientists to business intelligence.
Real-time processing in this proposed architecture is powered by Apache NiFi. There are five important reasons for this decision –
First of all, complex rules can be defined in NiFi in a very flexible manner. As an example, one can execute SQL queries in processor A against incoming data from any source (data that isn't from a relational database but JSON, Avro etc.) and then route different results to different downstream processors based on the processing needs, while enriching the data. For instance, Processor A could be event driven, and if any data is routed there, a field can be added or an alert sent to XYZ. Essentially this can be very complex – equivalent to a nested rules engine, so to speak.
From a throughput standpoint, a single NiFi node can typically handle somewhere between 50 MB/s and 150 MB/s depending on the hardware spec and data structure. Assuming average messages of 100-500 KB and a target throughput of 600 MB/s, the architecture can be sized to about 5-10 NiFi nodes. It is important to note that the latency of inbound message processing depends on the network and can be extremely small: under the hood, data moves from the source to a NiFi node (disk), some attributes are extracted in memory for processing, and the data is delivered to the target system.
Data quality can be handled via the aforementioned “nested rules engine” approach, consisting of multiple NiFi processors. One can even embed an entire rules engine into a single processor. Similarly, you can define simple authentication rules at the event level. For instance, if Field A = English, route the message to an “authenticated” relationship; otherwise send it to an “unauthenticated” relationship.
One of the cornerstones of NiFi is "Data Provenance", which allows end-to-end traceability. Not only can the event lifecycle of trade data be traced, but you can also track the time at which a change happened, the user role that made it, and metadata around why it happened.
Security – NiFi enables authentication at ingest. One can authenticate data via rules defined in NiFi, or leverage target-system authentication, which is implemented at the processor level. For example, the PutHDFS processor supports kerberized HDFS; the same applies to Solr and so on.
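The node-count estimate in the throughput point above is simple division; a quick back-of-the-envelope check makes the 5-10 node range explicit (the per-node throughput figures are the ones quoted above, not a benchmark of mine):

```python
import math

# Back-of-the-envelope NiFi cluster sizing from the figures above.
target_mb_s = 600                        # target aggregate throughput
per_node_low, per_node_high = 50, 150    # per-node range cited in the text

nodes_worst = math.ceil(target_mb_s / per_node_low)    # slow hardware
nodes_best = math.ceil(target_mb_s / per_node_high)    # fast hardware
print(nodes_best, nodes_worst)   # 4 12 -> size for roughly 5-10 nodes with headroom
```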
Overall Processing flow..
The below illustration shows the high-level conceptual architecture. The architecture is composed of core platform services and application-level components to facilitate the processing needs across three major areas of a typical surveillance reporting solution:
Connectivity to a range of trade data sources
Data processing, transformation & analytics
Visualization and business connectivity
The overall processing of data follows the order shown below and depicted in the diagram below –
Data Production – Data related to trades and their lifecycle is produced from a range of business systems. These feeds include (but are not limited to) trade data, valuation & position data, order management systems, position management systems, reference data, rates, market data, client data, front/middle/back office data, voice, chat and other internal communications.
Data Ingestion – Data produced by the above layer is ingested using Apache NiFi from the range of sources described above. Data can also be filtered, and alerts can be set up based on complex event logic. For time-series data, HBase can be leveraged along with OpenTSDB. For CEP requirements such as sliding windows and complex operators, NiFi can be leveraged along with a Kafka and Storm pipeline. Using NiFi makes it easier to load data into the data lake while applying guarantees around delivery itself. Data can be streamed in real time as it is created in the feeder systems; it is also loaded at the end of the trading day based on the P&L sign-off and end-of-day close processes. The majority of the data will be fed in from Book of Record trading systems as well as from market data providers.
As trade and other data is ingested into the data lake, it is important to note that the route in which certain streams are processed will differ from how other streams are processed. Thus the ingest architecture needs to support multiple types of processing ranging from in memory processing, intermediate transformation processing on certain data streams to produce a different representation of the stream. This is where NiFi adds critical support in not just handling a huge transaction throughput but also enabling “on the fly processing” of data in pipelines. As mentioned, NiFi does this via the concept of “processors”.
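As a concrete (and entirely hypothetical) illustration of the ingestion step, the sketch below shows how a feeder system might publish trade lifecycle events to Kafka before the NiFi/Storm pipeline picks them up. It uses the kafka-python client; the broker address, the "trade-events" topic name and the event fields are all my own illustrative assumptions.

```python
# Sketch of streaming trade lifecycle events into the ingestion tier
# via Kafka (kafka-python client). Topic name and event shape are
# illustrative assumptions, not from the original architecture.
import json

def serialize(event):
    """Serialize a trade event to the JSON bytes Kafka will carry."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

def publish_trade_event(producer, event, topic="trade-events"):
    # Key by trade_id so every lifecycle event (NEW/AMEND/CANCEL) of a
    # given trade lands on the same partition, preserving per-trade order.
    producer.send(topic, key=event["trade_id"].encode(), value=serialize(event))

# Typical wiring (needs a reachable Kafka broker):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="kafka-broker:9092")
#   publish_trade_event(producer, {"trade_id": "TR-123", "event_type": "NEW",
#                                  "instrument": "EURUSD-1Y-SWAP",
#                                  "notional": 25_000_000})
#   producer.flush()
```

Keying by trade ID is the design choice that matters here: it lets downstream surveillance consumers reconstruct a trade's lifecycle without a global sort.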
The core data processing platform is then based on a datalake pattern which has been covered in this blog before. It includes the following pattern of processing.
Data is ingested in real time into an HBase database (which uses HDFS as the underlying storage layer). Tables are designed in HBase to store the profile of a trade and its lifecycle.
Producers are authenticated at the point of ingest.
Once the data has been ingested into HDFS, it is taken through a pipeline of processing (L0 to L3) as depicted in the below blogpost.
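One workable (though entirely illustrative) design for the trade-profile tables mentioned above keys each HBase row by trade ID plus a reversed timestamp, so a trade's lifecycle events sort together with the most recent event first. A sketch using the happybase client follows; the table and column-family names are my assumptions.

```python
# Sketch of an HBase row-key design for trade lifecycle storage.
# Row key = trade_id + reversed timestamp, so all lifecycle events of a
# trade sort together, newest first. Table/CF names are illustrative.

MAX_TS = 10**13  # larger than any epoch-millis timestamp we expect

def make_row_key(trade_id, event_ts_millis):
    return f"{trade_id}:{MAX_TS - event_ts_millis:013d}".encode()

def put_lifecycle_event(table, trade_id, event_ts_millis, event_type, payload):
    """Write one lifecycle event into the 'lc' column family."""
    table.put(make_row_key(trade_id, event_ts_millis), {
        b"lc:event_type": event_type.encode(),
        b"lc:payload": payload.encode(),
    })

# Typical wiring (needs a running HBase Thrift server):
#   import happybase
#   table = happybase.Connection("hbase-host").table("trade_lifecycle")
#   put_lifecycle_event(table, "TR-123", 1509552000000, "NEW", "{...}")
```

The reversed timestamp is a standard HBase trick: it turns a "latest event for this trade" lookup into a cheap prefix scan of one row range.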
Historical data (defined as T+1), once in the HDFS tier, is taken through layers of processing as discussed above. One of the key areas of processing is to run machine learning on the data to discover any hidden patterns in the trades themselves – patterns that can connote a range of suspicious behaviors. Most surveillance applications are based on a search for data that breaches thresholds and seek to match sell and buy orders; when these rules are breached, alerts are generated for compliance officers to conduct further investigation. However, this method falls short with complex types of market abuse. A range of supervised learning techniques can then be applied to the data, such as creating a behavioral profile of different kinds of traders (for instance junior and senior) by classifying and then scoring them based on their likelihood to commit fraud. Thus a range of surveillance analytics can be performed on the data. Apache Spark is highly recommended for near-realtime processing, not only for its high performance characteristics but also for its native support for graph analytics and machine learning – both of which are critical to surveillance reporting. For a deeper look at data science, I recommend the below post.
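The classify-and-score step described above might look like the following scikit-learn sketch. The behavioral features, the stubbed labels and the model choice are all illustrative assumptions on synthetic data, not a production surveillance model.

```python
# Hedged sketch of supervised scoring: estimate a trader's likelihood
# of abusive behavior from behavioral features. Features, labels and
# model choice are illustrative assumptions on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic behavioral profile per trader:
# [orders_per_day, cancel_ratio, after_hours_ratio, avg_ticket_size_mm]
n = 500
X = np.column_stack([
    rng.normal(200, 50, n),
    rng.uniform(0, 1, n),
    rng.uniform(0, 0.3, n),
    rng.lognormal(1, 0.5, n),
])
# Labels would come from past investigations; here they are stubbed
# with a simple rule (very high cancel ratios flagged as abusive).
y = (X[:, 1] > 0.9).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Score a new trader: probability of belonging to the "abusive" class.
new_trader = np.array([[240, 0.97, 0.05, 3.1]])
risk_score = model.predict_proba(new_trader)[0, 1]
print(f"fraud risk score: {risk_score:.2f}")
```

In practice the labels are the scarce resource: they come from closed investigation cases, which is why the unsupervised approaches discussed next matter so much.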
The other important value driver in deploying Data Science is to perform Advanced Transaction Monitoring Intelligence. The core idea is to get years worth of Trade data in one location (i.e the datalake) & then applying unsupervised learning to glean patterns in those transactions. The goal is then to identify profiles of actors with the intent of feeding it into existing downstream surveillance & TM systems.
This knowledge can then be used to constantly learn transaction behavior for similar traders. This can be a very important capability in detecting fraud across traders, customer accounts and instruments. Some of the use cases are –
Profile trading activity of individuals with similar traits (types of customers, trading desks & instruments, geographical areas of operations etc.) to perform Know Your Trader
Segment traders by similar experience levels and behavior
Understand common fraudulent behavior typologies (e.g. spoofing) and cluster such (malicious) trading activities by trader, instrument, volume etc., with the goal of raising cases in the appropriate downstream investigation & case management system
Using advanced data processing techniques like Natural Language Processing, constantly analyze electronic communications and join them with trade data sources, both to detect under-the-radar activity and to keep the false positive rate low.
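The segmentation use cases above can be sketched with standard clustering: group traders by behavioral features and treat the smallest cluster as the outlier population to investigate first. The features, cluster count and synthetic data below are illustrative assumptions.

```python
# Sketch of unsupervised trader segmentation (Know Your Trader).
# Synthetic data: a large "normal" population plus a small group
# with extreme activity. Features and cluster count are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# [years_experience, trades_per_day, cancel_ratio] per trader
normal = rng.normal([5, 150, 0.3], [2, 40, 0.1], size=(300, 3))
suspect = rng.normal([2, 600, 0.95], [1, 50, 0.02], size=(10, 3))
X = np.vstack([normal, suspect])

# Standardize so no single feature dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# The smallest cluster is the one to hand to investigators first.
sizes = np.bincount(labels)
outlier_cluster = int(np.argmin(sizes))
print("cluster sizes:", sizes, "-> investigate cluster", outlier_cluster)
```

Segmenting by experience level (one of the bullets above) works the same way: cluster on tenure and activity features, then compare each trader against peers in the same segment rather than the whole population.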
Graph Database – Given that most kinds of trading fraud happens in groups of actors – traders acting in collusion with verification & compliance – the ability to view complex relationships of interactions and the strength of those interactions can be a significant monitoring capability
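A minimal sketch of the graph capability: model traders as nodes and their trades as weighted edges, then surface tightly connected groups as candidate collusion rings. The networkx library stands in for a real graph database here, and the trade list and clique-size threshold are illustrative assumptions.

```python
# Sketch of graph-based collusion monitoring: traders as nodes,
# trades as weighted edges, maximal cliques of 3+ mutually trading
# parties as candidate rings. Data and threshold are assumptions.
import networkx as nx

trades = [  # (buyer, seller, notional_mm)
    ("trader_A", "trader_B", 12), ("trader_B", "trader_A", 11),
    ("trader_A", "trader_C", 10), ("trader_C", "trader_B", 9),
    ("trader_D", "trader_E", 1),
]

G = nx.Graph()
for buyer, seller, notional in trades:
    # Accumulate notional on the edge to capture interaction strength.
    w = G.get_edge_data(buyer, seller, {"weight": 0})["weight"]
    G.add_edge(buyer, seller, weight=w + notional)

# Cliques of 3+ mutually trading parties are candidate collusion rings.
rings = [sorted(c) for c in nx.find_cliques(G) if len(c) >= 3]
print(rings)   # [['trader_A', 'trader_B', 'trader_C']]
```

Edge weights matter as much as structure: a dense clique with large accumulated notional is a far stronger signal than one formed by a handful of small trades.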
Grid Layer – To improve performance, I propose the usage of a distributed in-memory data fabric like JBoss Data Grid or Pivotal GemFire. This can aid in two ways –
a. Help with fast lookup of data elements by the visualization layer
b. Help perform fast computation process by overlaying a framework like Spark or MapReduce directly onto a stateful data fabric.
The choice of tools here depends on the language choices made in building the pricing and risk analytics libraries across the Bank. If multiple language bindings are required (e.g. C# & Java), then the data fabric will typically be a different product than the Grid.
The visualization solution chosen should enable the quick creation of interactive dashboards that provide KPIs and other important business metrics from a process monitoring standpoint. Various levels of dashboards need to be created, ranging from compliance officer toolboxes to executive dashboards that help identify trends and discover valuable insights.
Additionally, the visualization layer shall provide –
a) A single view of Trader or Trade or Instrument or Entity
b) Investigative workbench with Case Management capability
c) The ability to follow the lifecycle of a trade
d) The ability to perform ad hoc queries over multiple attributes
e) Activity correlation across historical and current data sets
f) Alerting on specific metrics and KPIs
To Sum Up…
The solution architecture described in this blogpost is designed with peaceful enterprise co-existence in mind. That is, it interacts and integrates with a range of BORT systems and other enterprise systems such as ERP, CRM and legacy surveillance platforms, along with any other line-of-business solutions that typically exist as shared enterprise resources.
Previous posts in this blog have discussed how Capital markets firms must create new business models and offer superior client relationships based on their vast data assets. Firms that can infuse a data driven culture in both existing & new areas of operation will enjoy superior returns and raise the bar for the rest of the industry in 2016 & beyond.
Capital Markets are the face of the financial industry to the general public and generate a large percent of the GDP for the world economy. Despite all the negative press they have garnered since the financial crisis of 2008, capital markets perform an important social function in that they contribute heavily to economic growth and are the primary vehicle for household savings. Firms in this space allow corporations to raise capital using the underwriting process. However, it is not just corporations that benefit from such money raising activity – municipal, local and national governments do the same as well. Just that the overall mechanism differs – while business enterprises issue both equity and bonds, governments typically issue bonds. According to the Boston Consulting Group (BCG), the industry will grow to annual revenues of $661 billion in 2016 from $593 billion in 2015 – a healthy 12% increase. On the buy side, the asset base (AuM – Assets under Management) is expected to reach around $100 trillion by 2020 up from $74 trillion in 2014.
Within large banks, the Capital Markets group and the Investment Banking group perform very different functions. Capital Markets (CM) is the face of the bank to the street from a trading perspective. The CM group engineers custom derivative trades that hedge exposure for their clients (typically Hedge Funds, Mutual Funds, Corporations, Governments, high net worth individuals and Trusts) as well as for their own treasury group. They may also do proprietary trading on the bank's behalf for a profit – although it is this type of trading that the Volcker Rule seeks to eliminate.
If a Bank uses dark liquidity pools (DLPs), it funnels its brokerage trades through the CM group to avoid the fees associated with executing an exchange trade on the street. Such activities can also be used to hide exchange-based trading activity from the Street. In the past, Banks made substantial revenues by profiting from their proprietary trading or by collecting fees for executing trades on behalf of their treasury group or other clients.
Banking, and within it capital markets, continues to generate insane amounts of data. The producers range from news providers to electronic trading participants to stock exchanges, which are increasingly looking to monetize data. And it is not just the banks: regulatory authorities like FINRA in the US are processing peak volumes of 40-75 billion market events a day (http://www.vamsitalkstech.com/?p=1157). In addition to data volume, Capital Markets has always posed a data variety challenge as well. Firms have tons of structured data around traditional banking data, market data, reference data and other economic data. You can then factor in semi-structured data around corporate filings, news, retailer data and other gauges of economic activity. With the additional creation of data from social media, multimedia etc., firms are presented with significant technology challenges as well as business opportunities.
Within larger financial supermarkets, the capital markets group typically leads the way in adopting cutting-edge technology and in high-tech spending. Most compute-intensive problems originate in either this group or the enterprise risk group. These groups own the exchange-facing order management systems, the trade booking systems, the pricing libraries for the products the bank trades, and the tactical systems used to manage market and credit risk, customer profitability, compliance and collateral. They typically account for about a quarter of a bank's total IT budget. Capital Markets thus has the largest number of use cases for risk and compliance.
Players across the value chain – the buy side, the sell side, intermediaries (stock exchanges and custodians) and technology firms such as market data providers – are all increasingly looking to leverage these new data sets to unlock business value beyond operational efficiency.
So what are the different categories of applications that are leveraging Big Data in production deployments?
Illustration – How Capital Markets Are Leveraging Big Data in 2016
I have catalogued the major ones below, based on my work with the major players across the spectrum over the last year.
Client Profitability Analysis or Customer 360 View: With the passing of the Volcker Rule, large firms are moving to a model based on flow trading rather than proprietary trading. It is thus critical for capital markets firms to understand their clients (institutional or otherwise) from a 360-degree perspective so they can be marketed to as a single entity across different channels – key to optimizing profits through cross-selling in an increasingly competitive landscape. The 360 view encompasses defensive areas such as Risk & Compliance, but also the ability to get a single view of profitability by customer across all trading desks, the Investment Bank and Commercial Lending.
Regulatory Reporting – Dodd-Frank/Volcker Rule Reporting: Banks have begun leveraging data lakes to capture every trade, intraday and end of day, across its lifecycle. They then validate that no proprietary trading is occurring on the bank's behalf.
CCAR & DFAST Reporting: Big Data can substantially improve the quality of raw data collected across multiple silos, which in turn improves the understanding and defensibility of a bank's stress test numbers.
Timely and accurate risk management: Running historical VaR (Value at Risk), statistical VaR, or both – to run the business day to day and to reconcile against the enterprise risk group's VaR numbers.
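To make the historical VaR idea concrete, here is a minimal sketch in plain Python. A production system would run this per desk over millions of positions on a Spark cluster; the P&L numbers below are purely illustrative.

```python
# Historical VaR: the loss threshold not exceeded at a given confidence
# level, estimated directly from the empirical distribution of past P&L.

def historical_var(pnl_history, confidence=0.95):
    """Return VaR as a positive number: the loss at the (1 - confidence)
    quantile of the historical P&L distribution."""
    if not pnl_history:
        raise ValueError("need at least one P&L observation")
    losses = sorted(pnl_history)            # worst (most negative) first
    index = int((1 - confidence) * len(losses))
    return -losses[index]                   # flip sign: report loss as positive

# Example: 100 days of daily P&L ranging from -50 to +49 (illustrative)
pnl = list(range(-50, 50))
print(historical_var(pnl, 0.95))  # -> 45
```

The same quantile logic scales out naturally: sorting and quantile extraction map cleanly onto Spark's distributed `approxQuantile`-style operations.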
Timely and accurate liquidity management: Examining tiered collateral and its liquidity profile on an intraday basis to manage the unit's liquidity. Desks also need to model credit and market stress scenarios and assess the liquidity impact of each.
Timely and accurate intraday credit risk management: Understanding whether a deal breaches a tenor-bucketed limit before it is booked. For FX trading this means having about 9 milliseconds to determine whether the trade can be done – a natural fit for in-memory technologies such as Spark or Storm on a Hadoop-based platform. These use cases are key to increasing the capital that can be invested in the business; to win that capital, desks need to convince upper management that their risks are being managed very tightly.
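The core of a pre-deal check is simple: map the deal to a tenor bucket, then verify that current utilization plus the new exposure stays inside the counterparty's limit for that bucket. A hedged sketch in plain Python follows; the bucket boundaries, limit amounts and counterparty name are all illustrative, and a real system would hold these dictionaries in an in-memory grid for sub-10ms lookups.

```python
# Pre-deal credit check against tenor-bucketed counterparty limits.
# Bucket boundaries (in days) and all amounts are illustrative.

TENOR_BUCKETS = [(30, "0-1M"), (365, "1M-1Y"), (float("inf"), "1Y+")]

def tenor_bucket(days_to_maturity):
    """Map days-to-maturity onto a named tenor bucket."""
    for max_days, name in TENOR_BUCKETS:
        if days_to_maturity <= max_days:
            return name

def check_deal(limits, utilization, counterparty, exposure, days_to_maturity):
    """True if the deal's exposure fits within the remaining headroom
    of the (counterparty, tenor bucket) limit; dicts keyed by that pair."""
    key = (counterparty, tenor_bucket(days_to_maturity))
    return utilization.get(key, 0.0) + exposure <= limits.get(key, 0.0)

limits = {("HEDGE_FUND_A", "0-1M"): 1_000_000}
used = {("HEDGE_FUND_A", "0-1M"): 900_000}
print(check_deal(limits, used, "HEDGE_FUND_A", 50_000, 7))   # True: 950k <= 1M
print(check_deal(limits, used, "HEDGE_FUND_A", 200_000, 7))  # False: 1.1M > 1M
```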
Timely and accurate intraday market risk management: Applying Big Data to market risk computations gives banks a real-time view of any breaches of their tenor-bucketed market limits.
Reducing market data costs: Market data providers such as Bloomberg, Thomson Reuters and smaller agencies typically charge a fee each time data is accessed. At a large firm, both the front office and Risk access this data on an ad-hoc, fairly uncontrolled basis. A popular way to cut costs is to negotiate the right to access the data once and read it many times – which requires a place to put it, and that place is the data lake.
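The "buy once, read many" pattern is essentially a caching layer in front of a metered vendor API. A minimal sketch, assuming a hypothetical per-call-billed `fetch_from_vendor` function (the data lake is stood in for by an in-memory dict here):

```python
# "Buy once, read many": only the first request per (ticker, date) hits
# the billable vendor API; repeats are served from the lake/cache.

class MarketDataCache:
    def __init__(self, fetch_from_vendor):
        self._fetch = fetch_from_vendor
        self._store = {}           # stand-in for the data lake
        self.vendor_calls = 0      # count of billable vendor hits

    def get(self, ticker, date):
        key = (ticker, date)
        if key not in self._store:
            self._store[key] = self._fetch(ticker, date)
            self.vendor_calls += 1
        return self._store[key]

# Hypothetical vendor call, billed per invocation in real life
cache = MarketDataCache(lambda t, d: {"ticker": t, "date": d, "close": 101.5})
cache.get("IBM", "2016-06-01")
cache.get("IBM", "2016-06-01")   # served from the lake, no new vendor call
print(cache.vendor_calls)         # -> 1
```

In practice the cache is the Hadoop data lake itself (e.g. time-partitioned files), and both front office and Risk read from it rather than from the vendor.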
Trade strategy development & backtesting: Big Data is being leveraged to continually backtest trading strategies and algorithms against large volumes of historical and real-time data. The ability to scale up computation, and to incorporate real-time streams, is key to iterating on strategies quickly.
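As a toy illustration of the backtesting loop, here is a minimal moving-average crossover backtest over daily closes, in plain Python. Real platforms replay tick-level history across a cluster; the prices and parameters below are made up for illustration.

```python
# Minimal backtest: long 1 unit when the fast moving average is above
# the slow one, flat otherwise. Returns the strategy's cumulative P&L.

def backtest_ma_crossover(prices, fast=3, slow=5):
    pnl, position = 0.0, 0
    for i in range(slow, len(prices)):
        fast_ma = sum(prices[i - fast:i]) / fast
        slow_ma = sum(prices[i - slow:i]) / slow
        pnl += position * (prices[i] - prices[i - 1])  # P&L on held position
        position = 1 if fast_ma > slow_ma else 0       # then update signal
    return pnl

# Illustrative price series; this particular run loses money, which is
# exactly why strategies are backtested before capital is committed.
prices = [100, 101, 102, 103, 104, 105, 104, 103, 102, 101]
print(backtest_ma_crossover(prices))
```

Scaling this up means running the same loop over years of data, many instruments and many parameter combinations in parallel, which is where Spark earns its keep.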
Sentiment-based trading: Today, large trading groups, and desks within them, have begun monitoring economic and political news and social media data to identify arbitrage opportunities – for instance, looking for correlations between news from the Middle East and the price of crude oil in the futures space, or using weather patterns to gauge demand for electricity in specific regional and local markets with a view to commodities trading. The real-time nature of these sources is information gold. Big Data provides the ability to bring all these sources into one central location and use the gleaned intelligence to drive various downstream activities in trading and private banking.
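At its simplest, the signal extraction step is a correlation between a daily sentiment score and next-day returns. A sketch in plain Python; the sentiment scores and returns below are invented, and in practice the sentiment series would come from NLP over streaming news and social feeds.

```python
# Pearson correlation between a daily news-sentiment score and
# next-day returns of a traded instrument (all data illustrative).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sentiment = [0.2, -0.1, 0.5, 0.3, -0.4]              # daily sentiment scores
next_day_returns = [0.01, -0.005, 0.02, 0.015, -0.02]
signal = pearson(sentiment, next_day_returns)
print(round(signal, 3))  # near +1 on this toy series
```

A strong, stable correlation like this would justify deeper investigation; on real data the relationship is far noisier and must be tested out-of-sample.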
Market & Trade Surveillance: Surveillance is an umbrella term for detecting a wide array of trading practices that distort securities prices, enabling market manipulators to profit illicitly at the expense of other participants by creating information asymmetry. In the US, market surveillance is generally carried out by exchanges and Self-Regulatory Organizations (SROs), all of which have dedicated surveillance departments set up for this purpose. However, capital markets players on both the buy and sell side also need to conduct extensive trade surveillance for internal reporting. Pursuant to this goal, the exchanges and SROs monitor transaction data, including orders and executed trades, and perform deep analysis to look for any kind of abuse or fraud.
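A real surveillance stack runs hundreds of rules over billions of daily events, but the shape of an individual rule is simple. As a hedged illustration, here is one toy rule: flag any trade whose price deviates by more than a threshold from the median of the preceding trades – a crude screen for off-market prints. The window size, threshold and trade data are all invented.

```python
# Toy surveillance rule: flag trades printing far from the rolling
# median of the previous `window` trades (possible off-market prints).
from statistics import median

def flag_off_market(trades, window=5, threshold=0.05):
    """trades: list of (trade_id, price). Returns ids of suspect trades."""
    flagged = []
    for i in range(window, len(trades)):
        ref = median(p for _, p in trades[i - window:i])
        trade_id, price = trades[i]
        if abs(price - ref) / ref > threshold:
            flagged.append(trade_id)
    return flagged

# Five normal prints around 100, then one print 8% below the market
trades = [("t%d" % i, 100.0) for i in range(5)] + [("t5", 100.2), ("t6", 92.0)]
print(flag_off_market(trades))  # -> ['t6']
```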
Firstly, capital markets players should look to create centralized trade repositories serving Operations, Traders and Risk Management. This would allow consolidation of systems and a reduction in costs by providing a single platform to replace operations, compliance and desk-centric risk systems. It would eliminate numerous redundant data and application silos, simplify operations, reduce duplicated quant work, and improve the understanding of risk.
Secondly, it is important to put in place a funding model that creates sources of money for discretionary projects that can leverage Big Data.
Third, Capital Markets groups typically have to fund their portion of AML, Dodd-Frank, Volcker Rule, trade compliance, enterprise market risk and traded credit risk projects. These are all mandatory spends; only after them do they get to tackle discretionary business projects, e.g. liquidity risk, trade booking and tactical risk initiatives. These defensive efforts always get the short end of the stick, yet they must not be neglected when planning new initiatives.
Finally, an area in which many current players are lacking is the ability to associate clients using a Legal Entity Identifier (LEI). Using a Big Data platform to assign logical and physical entity IDs to every person and business the bank interacts with can have salubrious benefits, and it can be done without redoing all of the customer onboarding systems. This is key to achieving customer 360 views, AML and FATCA compliance, and accurate credit risk reporting.
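The mechanical core of such an entity-ID platform is deterministic: normalize the legal name, then derive a stable logical ID from the normalized attributes. A minimal sketch follows; the suffix list is illustrative, and real implementations layer fuzzy matching and human stewardship on top of this deterministic step.

```python
# Deterministic entity-ID assignment: normalize a legal name, then hash
# the normalized attributes into a stable logical identifier.
import hashlib
import re

SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}  # illustrative, not exhaustive

def normalize_name(name):
    """Lowercase, strip punctuation, drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def entity_id(name, country):
    """Stable 16-hex-char logical ID for (normalized name, country)."""
    key = normalize_name(name) + "|" + country.upper()
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Two differently-spelled source records resolve to one logical entity:
a = entity_id("ACME Holdings, Inc.", "us")
b = entity_id("Acme Holdings LLC", "US")
print(a == b)  # True
```

Because the ID is a pure function of the cleaned attributes, it can be recomputed in the lake for every legacy record without touching the onboarding systems that produced them.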
It is no longer enough for CIOs in this space to think in terms of tactical Big Data projects; they must think about creating platforms – and ecosystems around those platforms – that enable a variety of pathbreaking activities generating a much higher rate of return.
Reference: “The State of Capital Markets in 2016” – BCG Perspectives