“Silicon Valley is coming. There are hundreds of start-ups with a lot of brains and money working on various alternatives to traditional banking….the ones you read about most are in the lending business, whereby the firms can lend to individuals and small businesses very quickly and — these entities believe — effectively by using Big Data to enhance credit underwriting. They are very good at reducing the ‘pain points’ in that they can make loans in minutes, which might take banks weeks. ” Jamie Dimon – CEO JP Morgan Chase in Annual Letter to Shareholders Feb 2016.
If Jamie Dimon’s opinion is anything to go by, the Financial Services industry is undergoing a major transformation, and it is evident that Banking as we know it will change dramatically over the next few years. This blog has spent some time over the last year defining the Big Data landscape in Banking. However, the rules of the game are changing from mere data harnessing to leveraging data to drive profits. With that background, let us begin examining the popular applications of Data Science in the financial industry. This post covers the motivation for and need of data mining in Banking. The next post will introduce key use cases, and we will round off the discussion in the third & final post by covering key algorithms and other computational approaches.
The Banking industry produces the most data of any vertical out there, with well defined & long standing business processes that have stood the test of time. Banks possess rich troves of data that pertain to customer transactions & demographic information. However, it is not enough for Bank IT to just possess the data. They must be able to drive change through legacy thinking and infrastructures as the entire industry changes around them, and not just from a risk & compliance standpoint. For instance, a major new segment is millennial customers, who increasingly use mobile devices and demand more contextual services as well as a seamless, unified banking experience akin to what they commonly experience on the internet at web properties like Facebook, Amazon, Uber, Google or Yahoo.
How do Banks stay relevant in this race? A large part of the answer is to make Big Data a strategic & boardroom level discussion and to take an industrial approach to predictive analytics. The approach currently in vogue, treating these as one-off, tactical project investments, simply does not work or scale anymore. There are various organizational models that one could employ, ranging from a shared service to a line of business led approach. An approach that I have seen work very well is to build a Center of Excellence (COE) to create contextual capabilities, best practices and rollout strategies across the larger organization.
Banks need to lead with Business Strategy
A strategic approach to industrializing analytics in a Banking organization can add massive value and competitive differentiation in five distinct categories –
- Exponentially improve existing business processes, e.g. risk data aggregation and measurement, financial compliance, fraud detection
- Help create new business models and go to market strategies – by monetizing multiple data sources – both internal and external
- Vastly improve customer satisfaction by generating better insights across the customer journey
- Increase security while expanding access to relevant data throughout the enterprise to knowledge workers
- Help drive end to end digitization
Financial Services gradually evolves from Big Data 1.0 to 2.0
Predictive analytics & data mining have only been growing in popularity in recent years. However, when coupled with Big Data, they are on their way to attaining a higher degree of business capability & visibility.
Let’s take a quick walk down memory lane.
In Big Data 1.0 (2009-2015), a large technical area of focus was ingesting huge volumes of data and processing them in a batch oriented fashion to serve a limited number of business use cases. In the era of 2.0, the focus is on enabling applications to perform complex processing at high, medium or low latency.
In the age of 1.0, Banking organizations across the spectrum, ranging from the mega banks to smaller regional banks to asset managers, have used the capability to acquire, store and process large volumes of data using commodity hardware at a much lower price point. This has resulted in a huge reduction in CapEx & OpEx spend on data management projects (Big Data complements and helps augment legacy investments in MPP systems, Data Warehouses, RDBMSs etc.).
The age of Big Data 1.0 in financial services is almost over and the dawn of Big Data 2.0 is now upon the industry. One may ask, “What is the difference?” I would contend that while Big Data 1.0 largely dealt with the identification, on-boarding and broad governance of the data, 2.0 will begin the redefinition of business based on the ability to deploy advanced processing techniques across a plethora of new & existing sources of data. 2.0 will thus be about extracting richer insights from the onboarded data to serve customers better, stay compliant with regulation & create new businesses. The new role of the ‘Data Scientist’, an interdisciplinary expert (part business strategist, part programmer, part statistician, part data miner & part business analyst), has come to represent one of the most highly coveted job skills today.
Long before the term “Data Science” entered the technology lexicon, the Capital Markets employed advanced quantitative techniques. The emergence of Big Data has only opened up new avenues in machine learning, data mining and artificial intelligence.
Illustration: Data drives Banking
Why is that?
Hadoop, which is now really a platform ecosystem of 30+ projects as opposed to a standalone technology, has been reimagined twice and now forms the backbone of any financial services data initiative. Thus, Hadoop has now evolved into a dual persona: an Application platform in addition to being a platform for data storage & processing.
Why are Big Data and Hadoop the ideal platform for Predictive Analytics?
Big Data is dramatically changing the traditional approach to analytics, with advanced analytic solutions that are powerful and fast enough to detect fraud in real time and also to build models based on historical data (and deep learning) to proactively identify risks.
The reasons why Hadoop is emerging as the best choice for predictive analytics are:
- Access to advances in infrastructure & computing capabilities at a very low cost
- Monumental advances in the algorithmic techniques themselves, e.g. mathematical abilities, feature sets, performance etc.
- Low cost & efficient access to tremendous amounts of data & the ability to store it at scale
Technologies in the Hadoop ecosystem such as ingestion frameworks (Flume, Kafka, Sqoop etc.) and processing frameworks (MapReduce, Storm, Spark et al.) have enabled the collection, organization and analysis of Big Data at scale. Hadoop supports multiple ways of running models and algorithms that are used to find patterns of customer behavior, business risks, cyber security violations, fraud and compliance anomalies in the mountains of data. Examples of these models include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop.
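To make the idea of building a model on historical data and applying it to fresh data a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop or Spark code, and it is not any bank's actual fraud model. It uses a simple per-customer z-score as a stand-in for the richer model families named above. The customer IDs, amounts, function names and the threshold of 3.0 are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Learn each customer's mean and sample standard deviation of amounts.

    `history` maps a (hypothetical) customer id to a list of past amounts.
    Customers with fewer than two transactions are skipped.
    """
    baseline = {}
    for customer, amounts in history.items():
        if len(amounts) >= 2:
            baseline[customer] = (mean(amounts), stdev(amounts))
    return baseline

def score(baseline, customer, amount):
    """Return the z-score of a fresh transaction, or None if no baseline exists."""
    if customer not in baseline:
        return None
    mu, sigma = baseline[customer]
    if sigma == 0:
        # All historical amounts were identical; any deviation is extreme.
        return 0.0 if amount == mu else float("inf")
    return (amount - mu) / sigma

def flag_anomalies(baseline, transactions, threshold=3.0):
    """Return the fresh transactions whose |z-score| exceeds the threshold."""
    flagged = []
    for customer, amount in transactions:
        z = score(baseline, customer, amount)
        if z is not None and abs(z) > threshold:
            flagged.append((customer, amount))
    return flagged

# Illustrative (made-up) historical data, standing in for data at rest.
history = {
    "cust_a": [42.0, 38.5, 51.0, 45.2, 40.3],
    "cust_b": [1200.0, 1150.0, 1310.0, 1275.0],
}
# Illustrative fresh transactions, standing in for a real-time feed.
fresh = [("cust_a", 47.0), ("cust_a", 950.0), ("cust_b", 1290.0), ("cust_c", 10.0)]

baseline = fit_baseline(history)
flagged = flag_anomalies(baseline, fresh)
print(flagged)  # [('cust_a', 950.0)]
```

In a real deployment the baseline would presumably be computed as a batch job over the full transaction history and the scoring step applied to a stream of incoming events. The split between a "fit" step and a "score" step is the part that carries over.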
However, the story around Big Data adoption in your average Bank is typically not all that revolutionary. It follows a more evolutionary cycle, where a rigorous engineering approach is applied to gain small business wins before scaling up to more transformative projects. Leveraging an open enterprise Hadoop approach, Big Data centric business initiatives in financial services have begun realizing value in areas that range from the Defensive (Risk, Fraud and Compliance, or RFC) to achieving Competitive Parity (e.g. Single View of Customer) to the Offensive (Digital Transformation across their Retail Banking business, unified Trade Data repositories in Capital Markets).
With the stage thus set, the next post will describe compelling real world use cases for Predictive Analytics across the spectrum of 21st century banking.