My take on Gartner’s Top 10 Strategic Technology Trends for 2016


“Dream no small dreams for they have no power to move the hearts of men.” — Goethe

It is that time of the year again when the mavens at Gartner make their annual predictions regarding the top strategic technology trends for the upcoming year. Gartner defines a ‘strategic’ trend as an emerging technology trend that will have a long term impact on business, thus influencing plans & budgets. As before, I will be offering my own take on these while grounding the discussion in terms of the Social, Mobile, Big Data Analytics & Cloud (SMAC) stack that is driving the ongoing industry revolution.
  1. The Digital Mesh
    The rise of the machines has been well documented, but enterprises are only now waking up to the possibilities. Massive data volumes are being reliably generated from diverse sources of telemetry as well as from endpoints at corporate offices (a consequence of BYOD). The former include sensors used in manufacturing, personal fitness devices like FitBit, home and office energy management sensors, smart cars, geo-location devices etc. Couple these with ever growing social media feeds, web clicks, server logs and more, and one sees a clear trend forming which Gartner terms the Digital Mesh. The Digital Mesh leads to an interconnected information deluge that encompasses classical IoT endpoints along with audio, video & social data streams. This creates huge security challenges as well as business opportunity for forward looking enterprises (including governments). Applications will need to combine these streams into one holistic picture of an entity – whether individual or institution.
  2. Information of Everything
    The IoT era brings an explosion of data that flows across organizational, system and application boundaries. Look for advances in technology, especially in Big Data and visualization, to help consumers harness this information in the right form, enriched with the right context. In the Information of Everything era, massive amounts of effort will thus be expended on data ingestion, quality and governance challenges.
  3. Ambient User Experiences
    Mobile applications first forced enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires the ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. The average enterprise user is also familiar with BYOD in the age of self service. The Digital Mesh only widens this gap in user experience as consumers navigate applications and services across a mesh that is both multi-channel and expected to provide a Customer 360 view across all engagement points. Applications developed in 2016 and beyond must take an approach that ensures a smooth experience across this spectrum of endpoints and the platforms that span them, from a data visualization standpoint.
  4. Autonomous Agents and Things

    Smart machines such as robots, personal assistants like Apple Siri and automated home equipment will rapidly evolve & become even smarter as their algorithms grow more capable and more aware of their environments. In addition, Big Data & Cloud computing will continue to mature and offer day to day capabilities around systems that employ machine learning to make predictions & decisions. We will see increased application of smart agents in diverse fields like financial services, healthcare, telecom and media.

  5. Advanced Machine Learning
    Most business problems are data challenges, and an approach centered on data analysis helps extract meaningful insights for the business. It is now common for enterprises to acquire, store and process large volumes of data at low cost by leveraging Big Data and Cloud Computing. At the same time, the rapid maturation of scalable processing techniques allows us to extract richer insights from that data. What we commonly refer to as Machine Learning – a combination of econometrics, statistics, visualization and computer science – extracts valuable business insights hiding in data and builds operational systems to deliver that value. Data Science has also evolved a newer branch, Deep Neural Nets (DNNs), which make it possible for smart machines and agents to learn from data flows and to make the products that use them even more automated & powerful. Deep learning involves discovering insights in data in a human-like pattern (a minimal code sketch illustrating the basic idea appears after this list). The web scale players (led by Google and Facebook) have been vocal about their use of these techniques and about the move of Data Science into Advanced Machine Learning.
  6. 3D Printing Materials

    3D printing continues to evolve and advance across a wide variety of industries. 2015 saw a wider range of materials – including carbon fiber, glass, nickel alloys and electronics – used in the 3D printing process. More and more industries are incorporating the print and assembly of composite parts constructed from such materials, prominent examples being Tesla and SpaceX. We are at the beginning of a 20 year revolution which will lead to sea changes in industrial automation.

  7. Adaptive Security
    A cursory study of the top data breaches in 2015 reads like a “Who’s Who” of actors across society – Governments, Banks, Retail establishments etc. The enterprise world now understands that a comprehensive & strategic approach to Cybersecurity has progressed from being an IT challenge a few years ago to a business imperative. As Digital and IoT ecosystems evolve into loose federations of API accessible and cloud native applications, more and more assets are at risk of being targeted by extremely well funded and sophisticated adversaries. For instance, data from millions of IoT endpoints requires ingest & processing at scale. The security challenge is multilayered and arises not just from malicious actors but also from the lack of a holistic approach that combines security with data governance, audit trails and quality attributes. Traditional solutions cannot handle this challenge, which is exacerbated by the expectation that in an IoT & Digital Mesh world, data flows will be multidirectional across a grid of application endpoints. Expect applications in 2016 and beyond to incorporate Deep Learning and Real Time Analytics into their core security design, with a view to analyzing large scale data at very low latency.
  8. Advanced System Architecture
    The advent of the Digital Mesh and ecosystem technologies like autonomous agents (powered by Deep Neural Nets) will make increasing demands on computing architectures from a power consumption, system intelligence as well as a form factor perspective. The key is to provide increased performance while mimicking neurobiological architectures – a style of building electronic circuits known as neuromorphic computing. System designers will have increased choice in terms of using field programmable gate arrays (FPGAs) or graphics processing units (GPUs). While both FPGAs and GPUs have their pros and cons, devices & computing architectures built on either are well suited to deep learning and the other pattern matching algorithms leveraged by advanced machine learning. Look for further reductions in form factor at lower power consumption, while allowing advanced intelligence in the IoT endpoint ecosystem.
  9. Mesh App and Service Architecture
    The micro services architecture approach – which combines the notion of autonomous, cooperative yet loosely coupled applications built as a conglomeration of business focused services – is a natural fit for the Digital Mesh. The most important addition and consideration for micro services based architectures in the age of the Digital Mesh is what I’d like to term Analytics Everywhere. Applications in 2016 and beyond will need to recognize that analytics are pervasive, relentless, realtime and embedded into our daily lives. Every interaction a user has with a micro services based application will need a predictive capability built into the application architecture itself. Thus, 2016 will be the year when Big Data techniques are no longer the preserve of classical Information Management teams but move to the umbrella Application Development area, which encompasses the DevOps and Continuous Integration & Delivery (CI/CD) spheres.

  10. IoT Architecture and Platforms
    There is no doubt in anyone’s mind that IoT (Internet of Things) is a technology megatrend that will reshape enterprises, governments and citizens for years to come. IoT platforms will complement Mesh Apps and Service Architectures with a common set of platform capabilities built around open communication, security, scalability & performance requirements. These will form the basic components of IoT infrastructure, including but not limited to machine to machine interfaces, location based technology, microcontrollers, sensors, actuators and communication protocols (based on an all IP standard).
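To make the Advanced Machine Learning discussion in item 5 a little more concrete, below is a minimal sketch of training a small feed-forward neural network with scikit-learn. The dataset is synthetic, and the feature count, layer sizes and library choice are illustrative assumptions rather than a recommendation of any particular stack.

```python
# Minimal sketch: fit a small neural network classifier on tabular data.
# The feature matrix and labels below are synthetic stand-ins for real business data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 20))              # 5,000 observations, 20 features
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # synthetic binary target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale inputs, then fit a multi-layer perceptron (a small feed-forward neural net).
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=42),
)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```

The same scale-fit-score pattern carries over to the deep learning frameworks used at web scale, just on far larger volumes of data.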


The Final Word –

One feels strongly that Open Source will drive the various layers that make up the Digital Mesh stack (Big Data, Operating Systems, Middleware, Advanced Machine Learning & BPM). IoT will be a key part of Digital Transformation initiatives.

However, the challenge in developing vertical capabilities on these IoT platforms is three fold, specifically in the following areas of augmenting micro services based Digital Mesh applications – capabilities which are largely lacking at the time of writing:

  • Data Ingest in batch or near realtime (NRT) or realtime from dynamically changing, disparate and physically distributed sensors, machines, geo location devices, clickstreams, files, and social feeds via highly secure lightweight agents
  • Provide secure data transfer using point-to-point and bidirectional data flows in real time
  • Curate these flows with Simple Event Processing (SEP) capabilities via tracing, parsing, filtering, joining, transforming, forking or cloning of data flows, while adding business context to them (a minimal ingest-and-curation sketch follows this list). As mobile clients, IoT applications, social media feeds etc. are brought onboard into existing applications from an analytics perspective, traditional IT operations face pressure from both business and development teams to provide new and innovative services
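As referenced in the last bullet, here is a rough sketch of what a lightweight ingest-and-curate step could look like using the kafka-python client. The topic names, the heartbeat filter and the enrichment fields are assumptions made for illustration; a production pipeline would more likely use a dedicated ingestion framework.

```python
# Sketch: consume raw telemetry, apply Simple Event Processing, republish curated events.
# Topic names, brokers and the enrichment values are illustrative assumptions.
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "raw-sensor-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    # Simple Event Processing: drop heartbeats, then attach business context.
    if event.get("type") == "heartbeat":
        continue
    event["site"] = "plant-07"              # enrichment: a business identifier
    event["ingest_channel"] = "iot-gateway"
    producer.send("curated-sensor-events", event)
```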

The creation of these smart services will further depend on the vertical industries that these products serve, as well as the requirements of the platforms that host them – e.g. industrial automation, remote healthcare, public transportation, connected cars, home automation etc.

Finally, 2016 also throws up some interesting questions around Cyber Security, namely –

a. Can an efficient Cybersecurity capability be a lasting source of competitive advantage?
b. Given that most breaches are long running in nature, with systems slowly compromised over months, how does one leverage Big Data and Predictive Modeling to rewire and re-architect creaky defenses?
c. Most importantly, how can applications implement security in a manner that constantly adapts and learns?

If Gartner’s forecast for 2016 had to be summed up in a couple of sentences, it would be: “The emergence of the Digital Mesh & the rapid maturation of IoT will serve to accelerate business transformation across industry verticals. The winning enterprises will begin to make smart technology investments in Big Data, DevOps & Cloud practices to harness these changes.”

Design & Architecture of a Next Gen Market Surveillance System..(2/2)

This article is the final installment in a two part series that covers one of the most critical issues facing the financial industry – Investor & Market Integrity Protection via Global Market Surveillance. While the first (and previous) post discussed the global scope of the problem across multiple jurisdictions, this post discusses a candidate Big Data & Cloud Computing architecture that can help market participants (especially the front line regulators – the stock exchanges themselves) & SROs (Self Regulatory Organizations) implement these capabilities in their applications & platforms.

Business Background –

The first article in this two part series laid out the five business trends that are causing a need to rethink existing Global & Cross Asset Surveillance based systems.

To recap them below –

  1. The rise of trade lifecycle automation across the Capital Markets value chain and the increasing use of technology across the lifecycle contribute to an environment where speeds and feeds result in a huge number of securities changing hands (in huge quantities) in milliseconds across 25+ global venues of trading; automation leads to increased trading volumes, which adds substantially to the risk of fraud
  2. The presence of multiple avenues of trading (ATF – alternative trading facilities and MTF – multilateral trading facilities) creates opportunities for information and price arbitrage that were never a huge problem before, across multiple markets and multiple products in multiple geographies with different regulatory requirements. This has been covered in a previous post on this blog at –
    http://www.vamsitalkstech.com/?p=412
  3. As a natural consequence of all of the above – the globalization of trading, with market participants spread across multiple geographies – it becomes all the more difficult to provide a consolidated audit trail (CAT) that views all activity under a single source of truth, as well as traceability of orders across those venues; this is extremely key as fraud is becoming increasingly sophisticated, e.g. the rise of insider trading rings
  4. Existing application architectures (e.g. ticker plants, surveillance systems, DevOps) are becoming brittle and underperforming as data and transaction volumes continue to go up & data storage requirements keep rising every year. This leads to massive gaps in compliance data. Another significant gap appears when performing a range of post trade analytics – many of which go beyond the simple business rules being leveraged right now and increasingly need to move into the machine learning & predictive domain. Surveillance now needs to include non traditional sources of data, e.g. trader email/chat/link analysis, that can point to under the radar rogue trading activity before it causes the financial system huge losses – e.g. the London Whale, the LIBOR fixing scandal etc.
  5. Again as a consequence of increased automation, backtesting of data has become a challenge, as has being able to replay data across historical intervals. This is key in mining for patterns of suspicious activity like bursty spikes in trading as well as certain patterns that could indicate illegal insider selling

The key issue becomes – how do antiquated surveillance systems move into the era of Cloud & Big Data enabled innovation as a way of overcoming these business challenges?

Technology Requirements –

An intelligent surveillance system needs to store trade data, reference data, order data, and market data, as well as all of the relevant communications from all the disparate systems, both internally and externally, and then match these things appropriately. The system needs to account for multiple levels of detection capabilities starting with a) configuring business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive). Such a system also needs to parallelize execution at scale to be able to meet demanding latency requirements for a market surveillance platform.

The most important technical essentials for such a system are –

  1. Support end to end monitoring across a variety of financial instruments across multiple venues of trading. Support a wide variety of analytics that enable the discovery of interrelationships between customers, traders & trades as the next major advance in surveillance technology.
  2. Provide a platform that can ingest from tens of millions to billions of market events (spanning a range of financial instruments – Equities, Bonds, Forex, Commodities and Derivatives etc) on a daily basis from thousands of institutional market participants
  3. The ability to add new business rules (via either a business rules engine and/or a model based system that supports machine learning) is a key requirement. As we can see from the first post, market manipulation is an activity that seems to constantly push the boundaries in new and unforeseen ways
  4. Provide advanced visualization techniques thus helping Compliance and Surveillance officers manage the information overload.
  5. The ability to perform deep cross-market analysis, i.e. to be able to look at financial instruments & securities trading across multiple geographies and exchanges
  6. The ability to create views and correlate data that are both wide and deep. A wide view will look at related securities across multiple venues; a deep view will look for a range of illegal behaviors that threaten market integrity such as market manipulation, insider trading, watch/restricted list trading and unusual pricing.
  7. The ability to provide in-memory caches of data  for rapid pre-trade compliance checks.
  8. Ability to create prebuilt analytical models and algorithms that pertain to trading strategy (pre-trade models, e.g. best execution and analysis). The most popular way to link R and Hadoop is to use HDFS as the long-term store for all data, and use MapReduce jobs (potentially submitted from Hive or Pig) to encode, enrich, and sample data sets from HDFS into R (an equivalent PySpark sketch appears after this list)
  9. Provide Data Scientists and Quants with development interfaces using tools like SAS and R.
  10. The results of the processing and queries need to be exported in various data formats, a simple CSV/txt format or more optimized binary formats, JSON formats, or even into custom formats.  The results will be in the form of standard relational DB data types (e.g. String, Date, Numeric, Boolean).
  11. Based on back testing and simulation, analysts should be able to tweak the model and also allow subscribers (typically compliance personnel) of the platform to customize their execution models.
  12. A wide range of Analytical tools need to be integrated that allow the best dashboards and visualizations.
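As a companion to requirement 8 above, the sketch below shows one way the enrich-and-sample step might look in PySpark rather than the Hive/Pig-to-R route. The HDFS paths and column names are hypothetical.

```python
# Sketch (PySpark alternative to the Hive/Pig route): enrich trade records held
# in HDFS with reference data, then publish a small sample for model development.
# Paths and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("trade-sampling").getOrCreate()

trades = spark.read.parquet("hdfs:///data/trades/2016/")
reference = spark.read.parquet("hdfs:///data/reference/instruments/")

enriched = (
    trades.join(reference, on="instrument_id", how="left")   # enrich with static data
          .withColumn("notional", F.col("price") * F.col("quantity"))
)

# Draw a 1% sample and write it back for R / notebook users to pick up.
enriched.sample(fraction=0.01, seed=42) \
        .write.mode("overwrite").parquet("hdfs:///analytics/trade_sample/")
```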

Application & Data Architecture –

The dramatic technology advances in Big Data & Cloud Computing enable the realization of the above requirements. Big Data is dramatically changing the traditional approach, with advanced analytic solutions that are powerful and fast enough to detect fraud in real time while also building models based on historical data (and deep learning) to proactively identify risks.

To enumerate the various advantages of using Big Data  –

a) Real time insights –  Generate insights at a latency of a few milliseconds
b) A Single View of Customer/Trade/Transaction 
c) Loosely coupled yet Cloud Ready Architecture
d) Highly Scalable yet Cost effective

The technology reasons why Hadoop is emerging as the best choice for fraud detection: from a component perspective, Hadoop supports multiple ways of running the models and algorithms that are used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm etc. and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop. The last few releases of enterprise Hadoop distributions (e.g. Hortonworks Data Platform) have seen huge advances from a Governance, Security and Monitoring perspective.
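As one illustration of the Spark route mentioned above, the sketch below fits a logistic regression surveillance model with the Spark ML pipeline API. The input path, the feature columns and the presence of an is_fraud label are assumptions.

```python
# Sketch: a Spark ML pipeline fitting a logistic regression model on labeled
# historical trade data. Input path, features and label column are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("surveillance-model").getOrCreate()
history = spark.read.parquet("hdfs:///data/labeled_trades/")   # assumed to carry is_fraud

assembler = VectorAssembler(
    inputCols=["order_size", "price_deviation", "orders_per_minute", "cancel_ratio"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="is_fraud")

model = Pipeline(stages=[assembler, lr]).fit(history)
scored = model.transform(history).select("trade_id", "probability", "prediction")
scored.show(5)
```

Other Spark ML estimators (clustering, tree ensembles) could be swapped in behind the same VectorAssembler without changing the rest of the flow.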

A shared data repository called a Data Lake is created that can capture every order creation, modification, cancellation and ultimate execution across all exchanges. This lake provides more visibility into all data related to intra-day trading activities. The trading risk group accesses this shared data lake to process more position, execution and balance data. This analysis can be performed on fresh data from the current workday or on historical data, and it is available for at least five years – much longer than before. Moreover, Hadoop enables the ingest of data from recent acquisitions despite disparate data definitions and infrastructures. All the data that pertains to trade decisions and the trade lifecycle needs to be made resident in a general enterprise storage pool that is run on HDFS (Hadoop Distributed Filesystem) or a similar cloud based filesystem. This repository is augmented by incremental feeds of intra-day trading activity data streamed in using technologies like Sqoop, Kafka and Storm.

The above business requirements can be accomplished by leveraging the many different technology paradigms in the Hadoop Data Platform. These include technologies such as an enterprise grade message broker (Kafka) and in-memory data processing via Spark & Storm.


Illustration: Candidate Architecture for a Market Surveillance Platform

The overall logical flow in the system –

  • Information sources are depicted at the left. These encompass a variety of institutional, system and human actors potentially sending thousands of real time messages per second or sending over batch feeds.
  • A highly scalable messaging system helps bring these feeds into the architecture, normalizes them and sends them on for further processing. Apache Kafka is chosen for this tier. Realtime data is published by the upstream trading systems over Kafka queues. Each of the transactions has hundreds of attributes that can be analyzed in real time to detect patterns of usage. We leverage Kafka’s integration with Apache Storm to read messages as they arrive and persist the data into an HBase cluster. In a modern data architecture built on Apache Hadoop, Kafka (a fast, scalable and durable message broker) works in combination with Storm, HBase (and Spark) for real-time analysis and rendering of streaming data (a compressed sketch of this flow appears after this list).
  • Trade data is thus streamed into the platform (on a T+1 basis), which ingests, collects, transforms and analyzes core information in real time. The analysis can involve both simple and complex event processing, based on pre-existing rules defined in a rules engine that is invoked from Storm. A Complex Event Processing (CEP) tier can process these feeds at scale to understand relationships among them, where the relationships among these events are defined by business owners in a non technical language or by developers in a technical one. Apache Storm integrates with Kafka to process incoming data; the Storm architecture is covered briefly in the section below.
  • HBase provides near real-time, random read and write access to tables (or ‘maps’) storing billions of rows and millions of columns. Once we store this rapidly and continuously growing dataset from the information producers, we are able to perform super fast lookups for analytics irrespective of the data size.
  • Data that has analytic relevance and needs to be kept for offline or batch processing can be handled using a storage platform based on the Hadoop Distributed Filesystem (HDFS) or Amazon S3. The idea is to deploy Hadoop oriented workloads (MapReduce or Machine Learning) to understand trading patterns as they occur over a period of time. Historical data can be fed into the Machine Learning models created above and commingled with streaming data as discussed in step 1.
  • Horizontal scale-out (read Cloud based IaaS) is preferred as a deployment approach as this helps the architecture scale linearly as the loads placed on the system increase over time. This approach enables the Market Surveillance engine to distribute the load dynamically across a cluster of cloud based servers based on trade data volumes.
  • To take an incremental approach to building the system, all data is first landed in a general enterprise storage pool, which makes it accessible to many analytical workloads including Trade Surveillance, Risk, Compliance, etc. A shared data repository across multiple lines of business provides more visibility into all intra-day trading activities. Data can also be fed into downstream systems in a seamless manner using technologies like Sqoop, Kafka and Storm. The results of the processing and queries can be exported in various data formats – a simple CSV/txt format, more optimized binary formats, JSON, or custom formats via a pluggable SerDe. Additionally, with Hive or HBase, data within HDFS can be queried via standard SQL using JDBC or ODBC. The results will be in the form of standard relational DB data types (e.g. String, Date, Numeric, Boolean). Finally, REST APIs in HDP natively support both JSON and XML output by default.
  • Operational data across a range of asset classes, risk types and geographies is thus available to risk analysts during the entire trading window while markets are still open, enabling them to reduce the risk of that day’s trading activities. The specific advantages of this approach are two-fold: existing architectures typically are only able to hold a limited set of asset classes within a given system, which means that data is only assembled for risk processing at the end of the day; in addition, historical data is often not available in sufficient detail. HDP accelerates a firm’s speed-to-analytics and also extends its data retention timeline.
  • Apache Atlas is used to provide governance capabilities in the platform, using both prescriptive and forensic models enriched by a given business’s data taxonomy and metadata. This allows for tagging of trade data across the different business data views, which is a key requirement for good data governance and reporting. Atlas also provides audit trail management as data is processed in a pipeline in the lake.
  • Another important capability that Hadoop can provide is the establishment and adoption of a lightweight entity ID service – which aids dramatically in the holistic viewing & audit tracking of trades. The service will consist of entity assignment for both institutional and individual traders. The goal is to get each target institution to propagate the Entity ID back into their trade booking and execution systems; transaction data will then flow into the lake with this ID attached, providing a way to do Customer & Trade 360.
  • Output data elements can be written out to HDFS and managed by HBase. From here, reports and visualizations can easily be constructed. One can optionally layer in search and/or workflow engines to present the right data to the right business user at the right time.
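Purely as an illustration of the flow in the bullets above, the sketch below compresses the Kafka-to-rules-to-HBase path into a small Python loop using the kafka-python and happybase clients. A production deployment would run this logic inside a Storm or Spark Streaming topology; the topic, table, column family and threshold below are all assumptions.

```python
# Illustrative only: consume trade events from Kafka, apply one surveillance rule,
# and persist any alerts to HBase. Names and thresholds are assumptions.
import json
import happybase
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "trade-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
hbase = happybase.Connection("localhost")
alerts = hbase.table("surveillance_alerts")

ORDER_RATE_LIMIT = 500  # illustrative threshold: orders per minute per trader

for message in consumer:
    trade = message.value
    # One simple rule check; a rules engine or CEP tier would evaluate many such patterns.
    if trade.get("orders_per_minute", 0) > ORDER_RATE_LIMIT:
        row_key = f"{trade['trader_id']}-{trade['timestamp']}".encode("utf-8")
        alerts.put(row_key, {
            b"alert:rule": b"excessive_order_rate",
            b"alert:payload": json.dumps(trade).encode("utf-8"),
        })
```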

The Final Word [1] –

We have discussed FINRA as an example of a forward looking organization that has been quite vocal about their usage of Big Data. So how successful has this approach been for them?

The benefits Finra has seen from big data and cloud technologies prompted the independent regulator to use those technologies as the basis for its proposal to build the Consolidated Audit Trail, the massive database project intended to enable the SEC to monitor markets in a high-frequency world. Over the summer, the number of bids to build the CAT was narrowed down to six in a second round of cuts. (The first round of cuts brought the number to 10 from more than 30.) The proposal that Finra has submitted together with the Depository Trust and Clearing Corporation (DTCC) is still in contention. Most of the bids to build and run the CAT for five years are in the range of $250 million, and Finra’s use of AWS and Hadoop makes its proposal the most cost-effective, Randich says.

References –

[1] http://www.fiercefinanceit.com/story/finra-leverages-cloud-and-hadoop-its-consolidated-audit-trail-proposal/2014-10-16

Ensuring Market Integrity, Investor Protection via Market Surveillance..(1/2)


“From coast to coast, the FBI and Securities and Exchange Commission have ensnared people not only at hedge funds, but at technology and pharmaceutical companies, consulting and law firms, government agencies, and even a major stock exchange.” – Preet Bharara, U.S. Attorney for the Southern District of New York, 2013; while announcing charges in a massive insider trading scandal

Banking & High Finance are enormously complicated industries, yet they have a significant impact on the daily lives of every denizen of the planet. Furthermore, firms in the riskiest part of Banking – Capital Markets – deal in complex financial products in a dynamic industry. Capital Markets have been undergoing a rapid transformation – at a higher rate perhaps than Retail Banking or Corporate Banking. This is being fueled by technology advances that produce ever lower latencies of trading, an array of financial products, differing (and newer) market participants, heavy quant based trading strategies and multiple venues (exchanges, dark pools etc) that compete for flow based on new products & services.

The Capital Markets value chain encompasses firms on the buy side (e.g. wealth managers), the sell side (e.g. broker dealers) & firms that provide custodial services, as well as technology providers who deliver platforms for post trade analytics support. The crucial link to all of these is the execution venues themselves as well as the clearing houses. With increased globalization driving the capital markets and an increasing number of issuers, one finds an ever increasing amount of complexity across a range of financial instruments (stocks, bonds, derivatives, commodities etc).

The primary exchanges for equity (stock) trading are majors like NYSE, NASDAQ and the LSE (the London Stock Exchange). Futures and Options are dominated by CME and EUREX. However, deregulation has also resulted in increased fragmentation, i.e. the traditional leaders above now have competition from non traditional exchange operators like Electronic Communication Networks (ECNs), crossing networks (e.g. investment banks developing their own internal crossing systems to match buyers & sellers) & dark liquidity pools.

Given the incredible amount of complexity & continuous fragmentation in venues, the industry has (in what some would term a natural consequence of the unchecked greed of a few) been beset by undesirable behavior – ranging from the suboptimal to the criminal – by market participants. Such actions have periodically threatened the entire financial system while also shaking investor confidence in it.

Despite the best efforts of regulatory authorities and legislation designed to tackle the problem of market manipulation and rigging – manipulators  always seem to find newer ways of profiting at the expense of regular investors and the vast majority of (honest) participants.

To recap some of the more prominent scandals over the last few years –

  • the Libor scandal of 2015 (where a rogue trader was found guilty of manipulating the global benchmark interest rate used as a basis for a range of financial deals)
  • the Flash Crash of 2010 (a trillion dollar market crash that started at 2:32 PM EST on May 6 and lasted for approximately 36 minutes, causing the various indices to drop precipitously)
  • Various Insider Trading scandals from 2008 till current day (the most prominent among these being the collapse of hedge fund Galleon Investments in 2009)
  • the collapse of Knight Capital in 2012 (due to a fat finger error caused by deploying test code into production), which caused a major disruption in the trading of some 140 securities while sending the firm into a tailspin. This single incident caused a loss of $400 million in a single trading day. The result was that the firm’s enterprise value eroded in the matter of a few minutes, forcing a sale almost overnight

The undesirable (read deleterious) consequence of High Frequency Trading (HFT)  –  

High-frequency trading (HFT) is essentially a high speed form of algorithmic trading that uses sophisticated networking technology and computer algorithms to rapidly trade securities. HFT uses proprietary trading strategies carried out by computers to move in and out of positions in seconds or fractions of a second.

A new trend in algorithmic trading is the use of Big Data in a range of use-cases for both high and low latency trading. This covers, and often blends, hardware solutions coupled with software algorithms. One example uses Big Data technologies, coupled with flash memory, to facilitate the development of new trading strategies that can use large volumes of tick data, weather data, social data, or geo-location data to make decisions in real time – in effect, predicting where the herd will move in a given market before it actually turns.

In fact, leading hedge funds are pioneering new types of algorithmic trading that rely on advanced self-learning analytics coupled with large, linearly scalable data stores and low latency hardware technology. All of these advances in automation have led to increased trading volumes and sophistication in the HFT space.

On the sell side, Banks have a strong need to provide the Head of Capital Markets & Risk Managers with a 360-degree view of the customers across their entire range of trading desks, not just to optimize enterprise profits, but to manage risk & fraudulent behavior.

Market Manipulation & the need for Surveillance –  

Market manipulation is an umbrella term that usually refers to a wide array of trading practices that serve to distort securities prices thus enabling market manipulators to illicitly profit at the expense of other participants, by creating information asymmetry.

Market manipulation covers practices like insider trading (where securities are sold or bought based on nonpublic information) or misleading auto trading practices like “spoofing” or “pumping and dumping” etc.

Why has surveillance emerged as a major challenge over the last five years or so? I contend that it is a nuanced issue, with five major business trends (with downstream technology ramifications) driving it –

  1. The rise of automation across the Capital Markets value chain and the increasing use of technology across the lifecycle contribute to an environment where speeds and feeds result in a huge number of securities changing hands (in huge quantities) in milliseconds; automation adds substantially to the risk of fraud
  2. The presence of multiple avenues of trading (ATF – alternative trading facilities and MTF – multilateral trading facilities) creates opportunities for information and price arbitrage that were never a huge problem before, across multiple markets and multiple products in multiple geographies with different regulatory requirements. This has been covered in a previous post on this blog at –
    http://www.vamsitalkstech.com/?p=412
  3. As a natural consequence of all of the above – the globalization of trading, with market participants spread across multiple geographies – it becomes all the more difficult to provide a consolidated audit trail (CAT) that views all activity under a single source of truth, as well as traceability of orders across those venues; this is extremely key as fraud is becoming increasingly sophisticated, e.g. the rise of insider trading rings
  4. Existing application architectures (e.g. ticker plants, backtesting, DevOps) are becoming brittle and underperforming as data and transaction volumes continue to go up & data storage requirements keep rising every year. This leads to massive gaps in compliance data. Another significant gap appears when performing a range of post trade analytics – many of which go beyond the simple business rules being leveraged right now and increasingly need to move into the machine learning & predictive domain
  5. As automation increases, backtesting of data has become a challenge, as has being able to replay data across historical intervals. This is key in mining for patterns of suspicious activity like bursty spikes in trading as well as certain patterns that could indicate illegal insider selling

Market Surveillance – 

Market surveillance can thus be defined as the activity of ingesting, curating and analyzing trade and position information on a real-time or T+1 basis, with the intention of detecting irregular activity that could denote market abuse (as defined by the local regulatory authorities).

Market surveillance is generally carried out by exchanges and Self Regulatory Organizations (SROs) like FINRA in the US – all of which have dedicated surveillance departments set up for this purpose. However, capital markets players on the buy and sell side also need to conduct extensive trade surveillance and report up internally. Pursuant to this goal, the exchanges & the SROs monitor transaction data, including orders and executed trades, & perform deep analysis to look for any kind of abuse and fraud.

To get a sense of the scale and the challenges involved in market surveillance, let us examine the leading US market regulator FINRA (Financial Industry Regulatory Authority). FINRA is a non-governmental organization that regulates member brokerages and exchange markets. Along with the Securities and Exchange Commission (SEC), it acts as a virtual market regulator.

FINRA has also been a pioneer in its usage of technology to monitor billions of market transactions (from an average of 40 billion to a peak volume of 75 billion a day). It applies a variety of data-gathering & processing techniques to uncover insider trading and any strategies firms or individuals use to gain an unfair advantage. FINRA extensively leverages technology to a) effectively oversee brokerage firms; b) accurately monitor the U.S. equities markets; and c) quickly detect potential fraud. [2]

FINRA handles more data in a single day than MasterCard processes in a year and Visa in six months. It uses innovative technology, such as parallel computing hardware to process its ever-increasing data volume and cloud computing to automate the process and store the massive amounts of data. [2]

Illustration: Transaction Volumes Supported by FINRA vs Payment Card Networks

FINRA has been vocal about its investments in innovative technology – namely Big Data & Cloud Computing – in order to build sophisticated surveillance systems that process extraordinary amounts of data while integrating with cutting-edge applications.

Role of Technology –

As can be seen from FINRA’s approach above, Market Surveillance essentially is a business problem that needs to be solved through the combination of large scale computing, cutting edge data management, business rules and predictive analytics.

The vast majority of current surveillance techniques, including the collection & processing of data, may be insufficient to capture in a timely manner all of the information necessary to efficiently and effectively monitor trading activity that occurs in such dispersed markets.

The next post will expand on the above themes from a technology standpoint. We will throw up some interesting ideas in the area of surveillance for players, in light of the substantial data & computing assets they possess. We will also examine how all of this can be leveraged to harness massive volumes of data across a large global marketplace.

Indeed, technologies like Big Data are being looked at to provide a cross asset trade, market and static data repository – a golden source of data that can be used by the front office, middle office, operations, compliance, risk and finance to provide a common view of the Capital Markets business. Such systems do not generally exist today, which makes it hard to spot undesirable activity, among other things.

References –

[1] Market Surveillance – Wikipedia

[2] FINRA Technology –
http://technology.finra.org/

A Holistic New Age Technology Approach To Countering Payment Card Fraud (3/3)…

This final post in the three part series on Payment Card Fraud covers the evolving business landscape in countering payment card fraud and makes specific recommendations from a best practices & thought leadership standpoint.

Business Background – 

As discussed previously, Consumers, Banks, Law Enforcement, Payment Processors, Merchants and Private Label Card Issuers are faced with payment card fraud on a massive scale. As the accessibility and modes of usage of credit, debit and other payment cards burgeon and transaction volumes increase, Banks are losing tens of billions of dollars on an annual basis to fraudsters – an annual loss estimated at about $189 billion by Meridian Research.

Fraud is a pernicious problem which can also lead to identity theft. The U.S. Department of Justice (DOJ) terms identity theft “one of the most insidious forms of white collar crime”. Identity theft typically results in multiple instances of fraud, which exact a heavy toll on the consumer, the merchant and the overall economy.

Improved consumer access to credit, complex financial products, faster electronic communications, a changing regulatory landscape and a high degree of competition among financial institutions make it easier than ever for perpetrators to steal identities and falsify information. Identity theft is estimated to claim 100,000 victims a year, according to the U.S. Federal Trade Commission (FTC) and the Consumer Data Industry Association. The economic and emotional loss to consumers is staggering. It may take as long as several years to restore a victim’s credit reputation, and in the meantime financial and job opportunities may be lost. According to FTC estimates, the average identity theft victim doesn’t discover the problem for 13 months and then invests an average of $1,173 and 175 hours attempting to repair his or her credit record. [1]

Added to all this, from a regulatory perspective, the Consumer Financial Protection Bureau, created as part of the Dodd-Frank Act, prohibits unfair & abusive acts or practices in the collection of consumer debts.

Thus, fraud has become such a complex problem that it can only be tackled via a strategic, meaningful & multi faceted operational IT model. This model should allow financial organizations a migration path from more traditional approaches in favor of innovation – all while balancing risk. Fraud is a dynamic problem where new modes & threats are constantly evolving – these need counter approaches that extend beyond the current capabilities of older fraud prevention software.

The regulatory authorities in the US (where the bulk of the problem has been centered) are now mandating the usage of Europay, MasterCard and Visa (EMV) chip technology, which should help reduce the risk of fraudulent point of sale (POS) transactions. To be fair, Visa and MasterCard have also developed technology that facilitates secure Card Not Present (CNP) transactions over the internet and phone.

However, we have also discussed the fact that fraudster rings now increasingly resemble sophisticated IT Operations. So this may not be able to put a large dent in the overall problem.

Business Requirements – 

  1. Provide capabilities to tackle the most complex types of fraud and to learn from fraud data & patterns to be able to stay ahead of criminal networks
  2. Provide comprehensive Omnichannel abilities to detect fraud across the multiple (and growing) avenues that payment cards are used by consumers
  3. Ability to view a customer as a single entity (or Customer 360) across all those channels & to be able to profile those
  4. Ability to segment those customers into populations based on their behavior patterns. This will vastly help improve anomaly detection capabilities while also helping reduce the false positive problem
  5. Incorporate many sources of data (both deep and wide) into the decisioning process when analyzing fraud. This includes not just the existing customer databases and data on historical spending patterns, but also credit reports, social media data and other datasets (e.g. Government watchlists of criminal activity)
  6. Provide multiple levels of detection capabilities starting with a) configuring business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive)
  7. Provide multiple levels of dashboards ranging from the Descriptive (Business Intelligence) to the Prescriptive (business simulation as well as optimization)
  8. Out of the box integration with watch list information provided by Governments (and peer Banks), and an ability to incorporate these into the decisioning engine in realtime

Technology Requirements – 

  1. Ingest (& cleanse) real time Card usage data to get complete view of every transaction with a view to detecting potential fraud
  2. Support multiple modes of ingest – a pub-sub messaging paradigm, clickstreams, logfile aggregation and batch data, at a minimum
  3. Allow business users to specify thousands of rules that signal fraud, e.g. when the same credit card is used from multiple IP addresses within a very short span of time (a sketch of this kind of rule appears after this list)
  4. Support batch oriented analytics that provide predictive and historical models of performance
  5. As much as possible, eliminate false positives as these cause inconvenience to customers and also inhibit transaction volumes
  6. Support a very high degree of scalability – tens of millions of transactions a day and hundreds of TB of historical information
  7. Predict cardholder behavior (using a 360 degree view) to provide better customer service
  8. Help target customer transactions for personalized communications on transactions that raise security flags
  9. Deliver alerts the ways customers want — web, text, email and mail etc
  10. Track these events end to end from a strategic perspective across dashboards and predictive models
  11. Help provide a complete picture of high value customers to help drive loyalty programs
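To illustrate requirement 3 above, here is a minimal sketch of one such rule – the same card used from several distinct IP addresses inside a short window. The window length, the threshold and the event shape are assumptions; a real deployment would evaluate thousands of such rules inside a rules engine or streaming job.

```python
# Sketch: flag a card used from too many distinct IP addresses in a sliding window.
# Window length, threshold and the transaction fields are illustrative assumptions.
from collections import defaultdict, deque

WINDOW_SECONDS = 300      # 5-minute sliding window
MAX_DISTINCT_IPS = 3

recent = defaultdict(deque)  # card_id -> deque of (timestamp, ip)

def check_transaction(card_id: str, ip: str, ts: float) -> bool:
    """Return True if this transaction should raise a fraud alert."""
    events = recent[card_id]
    events.append((ts, ip))
    # Drop events that have fallen out of the window.
    while events and ts - events[0][0] > WINDOW_SECONDS:
        events.popleft()
    distinct_ips = {seen_ip for _, seen_ip in events}
    return len(distinct_ips) > MAX_DISTINCT_IPS

# Example: the same card hit from many IPs within a few seconds.
for i, ip in enumerate(["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4", "5.5.5.5"]):
    print(check_transaction("card-42", ip, ts=1000.0 + i))
```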

Next Gen Fraud Detection Platform –


Illustration: Integrated Capabilities in a Next Gen Fraud Detection Platform

The fraud detection platform of the future will have four key integrated capabilities: the ability to support multiple channels of ingress (Omnichannel); the ability to store & process massive volumes of data at scale while leveraging an agile, Big Data & Cloud based deployment architecture; advanced predictive modeling & data mining capabilities; and finally, a native approach to Business Process Management (BPM).

1. Native Omnichannel Capabilities

Banking is an increasingly high tech endeavor, with younger customers seeking personalization capabilities across the multiple avenues available to them. Thus, Omnichannel is the future of consumer banking. According to research from Mintel/Compremedia, almost 27 percent of all US consumers still receive at least one credit card offer per month via mail. The company’s research also shows that 43 percent of all customers (both acquisition and retention) receive an offer via digital communications, 25 percent by direct mail, 15 percent in-person and 17 percent by phone or other channels. In terms of response and application, 52 percent apply online, 18 percent by mail, 17 percent by mobile, 8 percent by phone and 6 percent via other channels. [2]

Within the domain of payment cards themselves, the goal is to offer customers multiple avenues to buy goods while making the process as seamless and secure as possible. Firms would do well to consider conducting an ideal “Customer Journey Path Analysis” as the crucial first step. The aim is a seamless experience for customers across internet, phone, kiosk, POS and other channels, while reducing the fraud that can happen due to a siloed data architecture.

Once the customer’s journey is well understood in terms of segments as well as individual attributes & behaviors, Banks can use this data to aim for a variety of positive outcomes, such as selling the customer a higher segment credit card or additional banking services, or simply getting their online newspaper subscription paid with their credit card. Direct marketing in today’s world is a key component of Omnichannel marketing; using it effectively in online, mobile and offline environments will drive positive results. [2]

Big Data is dramatically changing the traditional approach to fraud detection, with advanced analytic solutions that are powerful and fast enough to detect fraud in real time while also building models based on historical data (and deep learning) to proactively identify risks. This is key to improving customer experience while improving loyalty.

2. Data Processing at Scale (NoSQL and Hadoop) –


The business reasons why Hadoop is emerging as the best choice for fraud detection are –

a) Real time insights –  Hadoop can be used to generate insights at a latency of a few milliseconds  that can assist Banks in detecting fraud as soon as it happens
b) A Single View of Customer/Transaction & Fraud enabled by Hadoop
c) Loosely coupled yet Cloud Ready Architecture
d) Highly Scalable yet Cost effective

The technology reasons why Hadoop is emerging as the best choice for fraud detection: from a component perspective, Hadoop supports multiple ways of running the models and algorithms that are used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm etc. and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop. The combination of Big Data and Omnichannel allows the implementation of strong customer segmentation and profiling, which helps usher in fraud detection models based on risk scoring.
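As a small illustration of the segmentation point above, the sketch below clusters cardholders on a few behavioral features with scikit-learn. The features, the number of segments and the synthetic data are assumptions.

```python
# Sketch: segment cardholders by behavior as a precursor to risk scoring.
# Feature names and the number of segments are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Per-customer behavior features: avg ticket size, txns per day, share of online spend.
customers = rng.gamma(shape=2.0, scale=[40.0, 1.5, 0.3], size=(20_000, 3))

segmenter = make_pipeline(StandardScaler(), KMeans(n_clusters=5, n_init=10, random_state=7))
segments = segmenter.fit_predict(customers)

# A new transaction can now be judged against its customer's segment profile
# rather than against the whole population, which sharpens anomaly detection.
print(np.bincount(segments))
```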

3. Predictive Analytics (Machine Learning and Deep Learning) –

Customer profiling and behavior patterns are key techniques used to extract meaningful information from the data produced in Big Data projects. Most fraud detection solutions operate on individual transaction data, which may not contain the entire gamut of information needed to predict the probability of fraud. Transaction data needs to be combined with historical information on the consumer’s spending patterns in order to detect anomalous (and fraudulent) behavior.

Many solutions operate using transactional data, such as credit card purchase transactions, or other types of data that change over time. In its raw form, this data is very difficult to use in predictive models for several reasons. First, an isolated transaction contains very little information about the behavior of the individual who generated the transaction. In addition, transaction patterns change rapidly over time. Finally, this type of data can often be highly complex.
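A minimal sketch of the point above – turning isolated transactions into behavioral context – using pandas to build a per-card profile. The column names and aggregates are illustrative assumptions.

```python
# Sketch: roll raw transactions up into per-card behavioral features a model can use.
# Column names are illustrative; real feeds would carry many more attributes.
import pandas as pd

txns = pd.DataFrame({
    "card_id":   ["c1", "c1", "c1", "c2", "c2"],
    "amount":    [25.0, 900.0, 32.0, 12.0, 15.0],
    "timestamp": pd.to_datetime([
        "2016-01-05 09:00", "2016-01-05 09:02", "2016-01-06 18:30",
        "2016-01-05 11:00", "2016-01-07 12:00",
    ]),
})

profile = txns.groupby("card_id").agg(
    txn_count=("amount", "size"),
    avg_amount=("amount", "mean"),
    max_amount=("amount", "max"),
    first_seen=("timestamp", "min"),
    last_seen=("timestamp", "max"),
)
# Each new transaction can then be compared against its card's historical profile,
# e.g. flagging an amount far above avg_amount for that card.
print(profile)
```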

4. Business Process Management & Business Rules Engines –

Business process management (BPM) as an industry discipline has been around for over a decade, but only recently have verticals like financial services realized how important it is to business automation. The core ideas around BPM are not that complex or arcane. For the newbies among us — every enterprise is composed of repeatable business activities done by human actors. These steps are the core of the functioning of the enterprise, indeed they are the very lifeblood. There is significant business value in being able to document, simulate, manage, automate, and monitor business processes.

Financial services are fertile ground for business process automation, since most banks across their various lines of business are simply a collection of core and differentiated processes. Examples are consumer banking (with processes including onboarding customers, collecting deposits, conducting business via multiple channels, and compliance with regulatory mandates such as KYC and AML); investment banking (including straight-through-processing, trading platforms, prime brokerage, and compliance with regulation); payment services; and wealth management (including modeling portfolio positions and providing complete transparency across the end-to-end life cycle). The key takeaway is that driving automation can result not just in better business visibility and accountability on behalf of various actors; it can also drive revenue and contribute significantly to the bottom line. BPM enables enterprise business and IT users to document, simulate, manage, automate and monitor business processes and policies, and it is designed to empower business and IT users to collaborate more effectively, so business applications can be changed more easily and quickly.

Within the payment card fraud area, BPM tools can be leveraged to create fraud detection workflows, alerts & dashboards, automating the process and thus making it highly repeatable, auditable & amenable to improvement. Every business actor in the fraud domain can be brought into the business process and actions delegated to them.

Open Source alternatives are now very viable in the BPM space as well. For instance, Red Hat’s JBoss BPM Suite is a flexible and powerful BPM platform which offers integrated business process modeling, execution, and monitoring capabilities for numerous use cases. A market leading platform, it can be easily deployed in a variety of environments (on prem/virtual/cloud etc) and, as a result, can be integrated into multiple runtime architectures (app servers, web service APIs etc) and configured to support a variety of business scenarios.

The final word – 

Most existing fraud solutions in use today were created using a (now) legacy approach, were not built for today’s Omnichannel enabled eCommerce requirements, and are not a good fit for the millennial customer. Industry players that lack dynamic and flexible fraud prevention technology platforms are at a severe competitive disadvantage, ultimately running the risk of slowing revenue growth, customer defections & reputational damage.

References – 

[1] “PayPal fights Fraud With Machine Learning and Human Detectives” – From WSJ.com

http://blogs.wsj.com/cio/2015/08/25/paypal-fights-fraud-with-machine-learning-and-human-detectives/

[2] “In Search of the Omnichannel Card Prospect”