The Emerging Role for Big Data and Machine Learning on the Buy Side in Financial Services

The Buy Side is perhaps the biggest segment of Wall Street and the financial markets: there are 7,000+ mutual funds and thousands of hedge funds, which invest across more than 40,000 instruments, including stocks, bonds and other securities. Thus, one of the important business functions at Buy Side institutions such as Mutual Funds, Hedge Funds, Trusts, Asset Managers, Pension Funds & Private Equity is to constantly analyze a range of information about the companies underlying these instruments to determine their investment worthiness.

The Changing Nature of the Buy Side circa 2018…

Compared with the rest of the financial services industry, the investment and asset management sector has lagged behind on many of the business and technology shifts of recent decades, as we have cataloged in the series of blogs below.

The State of Global Wealth Management..(1/3)

Given the competitive nature of the market, commoditized investment strategies will need to change rapidly to incorporate more advanced technology into the decision-making process. Combined with substandard performance over the last couple of years in Hedge Funds, a crucial sector of the Buy Side, there is suddenly a need to incorporate innovative approaches to enhancing Alpha.

This is even more important in this age of real-time information. Market trends, sentiment shifts, operational risk issues and negative news seem to crop up virtually every day.

All of the above information sources can dramatically change the quality of an underlying financial instrument. At some point, a human portfolio manager can no longer keep up with the information onslaught, which calls for techniques built around advanced intelligence and automation.

In this blog post, I will discuss key recommendations across the spectrum of Big Data and Artificial Intelligence techniques to help store, process and analyze hundreds of data points across a universe of millions of potential investments.

Recommendation #1 Focus on Non-Traditional Datasets…

Traditional investment management has tended to focus on financial analysis: rigorous fundamental analysis of the investment worthiness of the underlying company. At larger Buy Side firms, especially the big mutual funds, tens of Portfolio Managers & Analysts constantly analyze a range of data, both quantitative (e.g. financial statements such as balance sheets and cash flow statements) and qualitative (e.g. industry trends, supply chain information etc.). The trend analysis is typically broken up into three broad areas: Momentum, Value (relative to other players in the same segment) and Future Profitability. It is also not uncommon for large mutual funds to add and remove companies from their investment portfolio constantly, almost on a weekly basis. I propose that firms expand the underlying data beyond the traditional sources identified above to include some of the newer kinds depicted in the illustration below. The information asymmetry advantage conferred by using a wider range of data sources has the potential to produce outsize investment performance.

Recommendation #2 Leverage advances in Big Data storage and processing…

We start moving into the technology now. First, a range of non-traditional data has to be identified and then ingested into a set of commodity servers, either in an on-premise data center or with a cloud provider such as Amazon AWS or Microsoft Azure. It then needs to be curated by applying business-level processing. This could include identifying businesses using fundamental analysis, or applying algorithms that spot patterns in the data pertaining to attractiveness based on certain trending themes etc.
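As a rough illustration of the curation step, here is a minimal sketch that tags raw text records with trending themes by simple keyword matching. The theme names, keywords, company names and record layout are all hypothetical, invented purely for this example; a real pipeline would likely use far richer pattern-spotting algorithms.

```python
import re

# Hypothetical themes and keywords, chosen only for illustration.
THEMES = {
    "electric_vehicles": {"battery", "ev", "charging"},
    "cloud": {"cloud", "saas", "datacenter"},
}


def tag_themes(text):
    """Count keyword hits per theme in one raw text record."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = {
        theme: sum(1 for tok in tokens if tok in keywords)
        for theme, keywords in THEMES.items()
    }
    # Keep only the themes that actually appear in the text.
    return {theme: n for theme, n in hits.items() if n > 0}


# Hypothetical ingested records (company names and text are made up).
records = [
    {"company": "ACME", "text": "ACME opens a new battery plant and EV charging network."},
    {"company": "GLOBEX", "text": "Globex migrates its SaaS products to a new cloud datacenter."},
    {"company": "INITECH", "text": "Initech reports flat quarterly revenue."},
]

# Curated view: each record enriched with the themes it mentions.
curated = [dict(record, themes=tag_themes(record["text"])) for record in records]

for row in curated:
    print(row["company"], row["themes"])
```

Each curated record carries its theme counts alongside the raw text, so downstream analysis can filter the investment universe by theme without reprocessing the source documents.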

As the table below captures, the advent of Big Data collection, storage, and processing techniques now enables a range of information-led capabilities that were simply not possible with older technology.

All of the non-traditional data streams discussed above, and summarized in the table below, can be stored on commodity hardware clusters at a fraction of the cost of traditional SAN storage. The combined data can then be analyzed effectively in near real time, thus providing support for advanced business capabilities.

| Driver | Business Value | Example |
|---|---|---|
| Data Volumes | Larger data sets allow analysts to query and conduct experiments with fewer iterations to understand which businesses fit certain investment strength criteria | Omnichannel data, customer engagement data, ticker data, pricing data, sales volumes across longer time horizons. Social media and third-party datasets etc. |
| Data Variety | New data types spanning text, images, time series data and video | Business process data, audio data, images, sensor & device data. Publicly available statements and OTC contracts |
| Analytics and visualization | More powerful analytics and visualization tools to explain and explore investment themes and patterns | Complex Event Processing (CEP), predictive analytics. Portfolio and risk management dashboards |
| Data Velocity | Open source software tools. Lower server and enterprise storage costs | Hadoop, NoSQL. Commodity hardware. Elastic compute capacity. |

Recommendation #3 For certain key areas of the process, especially Portfolio Backtesting and Risk Management, adopt Parallel Processing Techniques…

We have covered how rapidly flowing information across markets creates opportunities for buy-side firms that can exploit this data. In this context, a key capability is backtesting key algorithmic strategies against years' worth of historical data. These strategies can range from deciding when to trade away exposure to capital optimization. The scale of the analysis problem is immense, with tens of thousands of investment prospects (read: companies) operating across 30+ countries on 6 continents. Every time an algorithm is tweaked, extensive backtesting must be performed on a few quarters or years of historical data to assess its performance.

Big Data has a huge inherent architectural advantage here: because it minimizes data movement and brings the processing to the data, it can cut the time taken to run these kinds of backtesting and risk analyses across terabytes of data to hours, as opposed to the day or two taken by older technology.
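To make the parallel backtesting idea concrete, here is a minimal sketch that fans out one independent backtest per (ticker, parameter set) combination. Everything in it is an assumption made for illustration: the prices are seeded random walks rather than real history, the strategy is a simple long-only moving-average crossover, and a local thread pool stands in for a cluster in which each worker would run next to its own partition of the data.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def synthetic_prices(n_days, seed):
    """Hypothetical stand-in for historical closes: a seeded random walk."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))
    return prices


def backtest_ma_crossover(prices, fast, slow):
    """Total return of a long-only moving-average crossover strategy."""
    equity = 1.0
    for t in range(slow, len(prices)):
        # The signal only uses prices up to day t-1, so there is no look-ahead.
        fast_ma = sum(prices[t - fast:t]) / fast
        slow_ma = sum(prices[t - slow:t]) / slow
        if fast_ma > slow_ma:
            equity *= prices[t] / prices[t - 1]
    return equity - 1.0


def run_grid(universe, grid, executor):
    """Submit one task per (ticker, fast, slow) and collect the total returns."""
    tasks = [(ticker, fast, slow) for ticker in universe for fast, slow in grid]
    futures = [
        executor.submit(backtest_ma_crossover, universe[ticker], fast, slow)
        for ticker, fast, slow in tasks
    ]
    return {task: future.result() for task, future in zip(tasks, futures)}


# Hypothetical universe of four tickers and three parameter sets.
universe = {f"TICKER{i}": synthetic_prices(750, seed=i) for i in range(4)}
grid = [(5, 20), (10, 50), (20, 100)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_grid(universe, grid, pool)

best = max(results, key=results.get)
print("best (ticker, fast, slow):", best)
```

Because each backtest depends only on its own ticker's price history, the tasks share no state and can be distributed freely. `run_grid` accepts any `concurrent.futures` executor, so the thread pool used here could be swapped for a process pool or a cluster-backed executor without changing the strategy code.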

Recommendation #4 Adopt & Accelerate Adoption of Machine Learning Techniques…

Given that the process of investment research is rapidly becoming a data and analytics challenge, what are the new techniques in the analytics space that can help?

Big Data & Advanced Analytics drive profits in Financial Services..(1/3)

  • Classification & Class Probability Estimation – For a given set of data, predict which of a discrete set of classes each individual in a population belongs to. An example classification question is: “Which wealth management clients in a given population are most likely to respond to an offer to move to a higher segment?” Common techniques used in classification include decision trees, Bayesian models, k-nearest neighbors, rule induction etc. Class Probability Estimation (CPE) is a closely related concept in which a scoring model is created to predict the likelihood that an individual belongs to a given class. On the Buy Side, classical machine learning techniques such as clustering, segmentation, and classification can be employed to create models that automatically segment investment prospects into key categories. These could be based on certain key investment criteria or factors.
  • Testing investment hypotheses by understanding hitherto hidden relationships among the underlying data
  • Constantly learning from the underlying data and then ranking companies based on investment metrics and criteria
  • Adopting Natural Language Processing (NLP) techniques to read and analyze thousands of text documents such as regulatory filings, research reports etc. A key use case is to understand what kinds of geopolitical events can cause movements in location-sensitive instruments such as heavy metals and commodities such as oil. This is very important as markets move in concert. Such events can be analyzed on the fly to rebalance not just exposures but also client portfolios. The use cases for NLP are myriad.
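The classification and ranking ideas in the list above can be sketched with a very small k-nearest-neighbors class probability estimator. This is only an illustration under stated assumptions: the three features (momentum, value and profitability scores scaled to 0..1), the labels "attractive" and "avoid", and every data point are hypothetical, and a production model would be trained and validated on real data.

```python
import math


def knn_class_probabilities(train, query, k=3):
    """Estimate P(class | query) as each class's share among the k nearest points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    counts = {}
    for _, label in nearest:
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(nearest) for label, n in counts.items()}


# Hypothetical labeled history: (momentum, value, profitability) -> label.
train = [
    ((0.9, 0.8, 0.7), "attractive"),
    ((0.8, 0.9, 0.8), "attractive"),
    ((0.7, 0.7, 0.9), "attractive"),
    ((0.2, 0.1, 0.3), "avoid"),
    ((0.1, 0.3, 0.2), "avoid"),
    ((0.3, 0.2, 0.1), "avoid"),
]

# Hypothetical prospects to score.
prospects = {
    "ALPHA": (0.85, 0.85, 0.80),
    "BETA": (0.15, 0.20, 0.25),
}

# Score each prospect by its estimated probability of being "attractive".
scores = {
    name: knn_class_probabilities(train, features).get("attractive", 0.0)
    for name, features in prospects.items()
}

# Rank prospects from most to least likely to be attractive.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

The probability score, rather than a hard class label, is what makes ranking possible: prospects can be ordered by likelihood and re-scored whenever new data arrives.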

Recommendation #5 Leverage Partnerships…

We are aware that the above investments in technology may be a huge ask for small and mid-sized buy-side firms, which have viewed technology as a supporting function. However, there now exist service providers that offer the infrastructure, curated data feeds, and custom analytics as SaaS (Software-as-a-Service) to interested clients. Size and potential upfront CapEx investment should not deter these firms from moving their investment methodology to a data-driven process.

Recommendation #6 Increase Automation via Analytics but Keep the Human in the Loop…

None of the above technology recommendations are intended to displace a portfolio manager who has years of rich industry experience and expertise. The above technology stack can enable these expensive resources to focus their valuable time on activities that add meaningful business value, e.g. interviewing key investment prospects, real-time analysis and portfolio rebalancing, trade execution, and management and strategic reporting. Technology is just an aid in that sense and serves as an assistant to the portfolio manager.


Leading actively managed funds are all about selection, allocation and risk/return assessments. The business goal is ultimately to generate insights that can drive higher investment returns or shield against investment risk. As Buy Side firms across the board evolve in 2018, one of the key themes from a business and technological standpoint is leveraging AI & Big Data technologies to transform their internal research from a resource-intensive process into a data-driven investment process.
