A Framework for Model Risk Management

“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.” – Donald Rumsfeld, 2002,  Fmr US Secy of Defense

This is the fourth in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We have previously covered data quality issues @ http://www.vamsitalkstech.com/?p=5396 and the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046. We have also discussed the ideal way Data Scientists would like their models deployed for maximal benefit and use – as a Service @ http://www.vamsitalkstech.com/?p=5321. This fourth blogpost discusses an organizational framework for managing the business risk that comes with a vast portfolio of models.

Introduction

With machine learning increasing in popularity and adoption across industries, models are growing in number and scope. McKinsey estimates that large enterprises have seen an increase of about 10 – 25% in the number of complex models they employ across areas as diverse as customer acquisition, risk management, insurance policy management, insurance claims processing, fraud detection, and other advanced analytics. However, this increase is accompanied by a rise in model risk, where incorrect model results or design contribute to erroneous business decisions. In this blog post, we discuss the need for Model Risk Management (MRM) and a generic framework for achieving it from an industry standpoint.

Model Risk Management in the Industry

The Insurance industry has extensively used predictive modeling across a range of business functions including policy pricing, risk management, customer acquisition, sales, and internal financial functions. However, as predictive analytics has become increasingly important, there is always a danger, or business risk, incurred due to the judgment of the models themselves. While the definition of a model can vary from one company to another, we would like to define a model as a representation of a real-world phenomenon, generated by applying an algorithm to real-world inputs (both quantitative and qualitative) to produce a business insight or decision. The model can also provide some level of explanation for how it arrived at the corresponding business insight. There are many ways to create and deliver models to applications, varying from spreadsheets to specialized packages and platforms. We have covered some of these themes from a model development perspective in a previous blog @ http://www.vamsitalkstech.com/?p=5321.

Models confer a multitude of benefits, namely:

  1. The ability to reason across complex business scenarios spanning customer engagement, back-office operations, and risk management
  2. The ability to automate decision-making based on historical patterns across large volumes of data
  3. The auditability of the model, which can explain to the business user how it arrived at a certain business insight

The performance and the composition of a model depend on the intention of the designer. The reliability of the model depends primarily on access to adequate and representative data, and secondarily on the ability of the designer to model complex real-world scenarios rather than always assuming best-case conditions.

As the financial crisis of 2008 illustrated, the failure of models brought down the insurance company AIG, which caused severe disruption to the global financial system and set off a wider crisis in the global economy. Over the last few years, Machine Learning models have been increasingly embedded into key business processes. If these models go wrong, they can cause severe operational losses, which underscores the importance of putting in place a strategic framework for managing model risk.

A Framework for Model Risk Management

The goal of Model Risk Management (MRM) is to ensure that the entire portfolio of models is governed like any other business asset. To that effect, a Model Risk Management program needs to include the following elements:

  1. Model Planning – The first step in the MRM process is to put a structure in place so that models across the business are created in a strategic and planned manner. This phase covers ensuring that model objectives are well defined across the business, that duplication is avoided, that best practices around model development are followed, and that modelers are provided the right volumes of high-quality data to create the most effective models possible. We have covered some of these themes around data quality in a previous blogpost @ http://www.vamsitalkstech.com/?p=5396
  2. Model Validation & Calibration – As models are created for specific business functions, they must be validated for precision [1] and calibrated to reflect the correct sensitivity [4] and specificity [4] that the business would like to allow for. Every objective can have its own “sweet spot” (i.e. threshold) that the business wants to attain by using the model. For example, a company that wants to go green, but realizes that not all of its customers have access to (or the desire to use) electronic modes of communication, might want to send out the minimum number of flyers that still gets the message out, keeping its carbon footprint low without losing revenue by missing the right set of customers. All validation is driven by the business objectives that must be reached and how much wiggle room there is for negotiation (a minimal metrics sketch follows this list).
  3. Model Management – Models that have made it to this stage must now be managed. Management here means answering questions such as: who should use which model for what purpose, how long should the models be used without re-evaluation, what are the criteria for re-evaluation, who will monitor usage to prevent misuse, etc. Management also deals with logistics: where do the models reside, how are they accessed and executed, who gets to modify them versus merely use them, how will they be swapped out when needed without disrupting dependent business processes, how should they be versioned, can multiple versions of a model be deployed simultaneously, and how can data fluctuations that will disrupt model behavior be detected before they happen.
  4. Model Governance – Model Governance covers some of the most strategic aspects of Model Risk Management. The key goal of this process is to ensure that models conform to industry governance standards and are managed through a multistage process across their lifecycle – from Initiation to Business Value to Retirement.
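
To make the validation and calibration step above concrete, here is a minimal, purely illustrative Python sketch that sweeps candidate thresholds and reports precision [1], sensitivity, and specificity [4]; the labels, scores, and thresholds are invented for the example.

```python
import numpy as np

def confusion_counts(y_true, y_score, threshold):
    """Count TP/FP/TN/FN for a given decision threshold."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    y_true = np.asarray(y_true)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, fp, tn, fn

def validation_report(y_true, y_score, thresholds=(0.3, 0.5, 0.7)):
    """Sweep candidate thresholds to find the business 'sweet spot'."""
    report = []
    for t in thresholds:
        tp, fp, tn, fn = confusion_counts(y_true, y_score, t)
        precision   = tp / (tp + fp) if (tp + fp) else 0.0  # [1]
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall [4]
        specificity = tn / (tn + fp) if (tn + fp) else 0.0  # [4]
        report.append((t, precision, sensitivity, specificity))
    return report

# Illustrative labels and scores from a hypothetical "send a flyer?" model
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.65, 0.8, 0.2, 0.55, 0.35, 0.1]
for t, p, sens, spec in validation_report(y_true, y_score):
    print(f"threshold={t:.2f} precision={p:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In practice, the threshold the business settles on is the one whose trade-off between sensitivity and specificity best matches the objective, as in the flyer example above.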

Regulatory Guidance on Model Risk Management

The most authoritative guide on MRM comes from the Federal Reserve System – FRB SR 11-7/OCC Bulletin 2011-12 [3]. Though it is not directly applicable to the insurance industry (it is meant mainly for the banking industry), its framework is considered by many to contain thought leadership on this topic. The SR 11-7 framework includes documentation as part of model governance. An article in the Society of Actuaries’ The Modeling Platform, April 2016, Issue 3 [2], details a thorough method for documenting a model, the process surrounding it, and why such information is necessary. In a highly regulated industry like insurance, every decision made in the process of creating a model (e.g. assumptions made, judgment calls given the circumstances at the time, etc.) could be brought under scrutiny and affects the risk of the model itself. With adequate documentation you can attempt to mitigate the risks you can foresee, and have a good starting point for those that might blindside you down the road.

And Now a Warning…

Realize that even after putting MRM into place, models are still limited – they cannot cope with what Donald Rumsfeld dubbed the “unknown unknowns”. As stated in an Economist article [5]: “Almost a century ago Frank Knight highlighted the distinction between risk, which can be calibrated in probability distributions, and uncertainty, which is more elusive and cannot be so neatly captured…The models may have failed but it was their users who vested too much faith in them”. Models, by definition, are built using probability distributions based on previous experience to predict future outcomes. If the underlying probability distribution changes radically, they can no longer attempt to predict the future, because the assumptions upon which they were built no longer hold. Hence humans must remain vigilant and not put all their eggs into the one basket of automated predictions. A human should always question whether the results of a model make sense and intervene when they don’t.
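
One pragmatic way to keep that human in the loop is to monitor whether the input distribution a model was trained on still holds. The snippet below is a minimal, hypothetical drift check using a two-sample Kolmogorov-Smirnov test from SciPy; the simulated feature values and the 0.01 cutoff are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

def feature_has_drifted(train_values, live_values, alpha=0.01):
    """Flag a feature whose live distribution no longer matches training data.

    A small p-value from the two-sample KS test suggests the assumption the
    model was built on may no longer hold.
    """
    statistic, p_value = stats.ks_2samp(train_values, live_values)
    return p_value < alpha, statistic, p_value

rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)   # distribution at build time
live  = rng.normal(loc=0.6, scale=1.3, size=5_000)   # shifted production data

drifted, stat, p = feature_has_drifted(train, live)
if drifted:
    print(f"Drift detected (KS={stat:.3f}, p={p:.2e}) - route model output for human review")
```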

Conclusion

As the saying goes – “Models do not kill markets, people do.” A model is only as good as the assumptions and algorithm choices made by the designer, as well as the quality & scope of the data fed to it. Hence, enterprises need to put in place an internal model risk management program that ensures their portfolio of models is constantly updated, enriched with data, and managed like any other strategic corporate asset. And never forget that a knowledgeable human must remain in the loop.

References

[1] Wikipedia – “Precision and Recall”
https://en.wikipedia.org/wiki/Precision_and_recall

[2] The Society of Actuaries – “The Modeling Platform” https://www.soa.org/Library/Newsletters/The-Modeling-Platform/2016/april/mp-2016-iss3-crompton.aspx

[3] The Federal Reserve – SR 11-7: Guidance on Model Risk Management
https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm

[4] Wikipedia – “Sensitivity and Specificity”
https://en.wikipedia.org/wiki/Sensitivity_and_specificity

[5] The Economist – “Economic models and the financial crisis: Why they crashed too”, Jun 19th 2014, by P.W., London
https://www.economist.com/blogs/freeexchange/2014/06/economic-models-and-financial-crisis

The Tao of Data Monetization in Banking and Insurance & Strategies to Achieve the Same…

“We live in a world awash with data. Data is proliferating at an astonishing rate—we have more and more data all the time, and much of it was collected in order to improve decisions about some aspect of a business, government, or society. If we can’t turn that data into better decision making through quantitative analysis, we are both wasting data and probably creating suboptimal performance.”
– Tom Davenport, 2013, Professor at Babson College, Best-Selling Author and Leader at Deloitte Analytics

Data Monetization is the organizational ability to turn data into cost savings & revenues in existing lines of business and to create new revenue streams.

Digitization is driving Banks and Insurance companies to reinvent themselves…

Enterprises operating in the financial services and insurance industries have typically taken a very traditional view of their businesses. As waves of digitization have begun slowly upending their established business models, firms have begun to recognize the importance of harnessing the substantial data assets they have built over decades. These assets include fine-grained data about internal operations, customer information, and external sources (as depicted in the below illustration). So what does the financial opportunity look like? PwC’s Strategy& estimates that the incremental revenue from monetizing data could potentially be as high as US$300 billion [1] every year beginning in 2019. This is across all the important segments of financial services – capital markets, commercial banking, consumer finance & banking, and insurance. FinTechs are also looking to muscle into this massive data opportunity.

The compelling advantages of Data Monetization have been well articulated across new business lines, customer experience, cost reduction et al. One of the key aspects of Digital transformation is data and the ability to create new revenue streams or to save costs using data assets.

..Which leads to a huge Market Opportunity for Data Monetization…

In 2013, PwC estimated the market opportunity in data monetization at a nascent US$175 million. This number is expected to grow immensely over the next five years, with consumer banking and capital markets leading the way.

Digital first has been a reality in the Payments industry with Silicon Valley players like Google and Apple launching their own payments solutions (in the form of Google Pay and Apple Pay).

Visionary Banks & FinTechs are taking the lead in Data Monetization…

Leading firms such as Goldman Sachs & AIG have invested heavily in capabilities around data monetization. In 2012, Goldman purchased the smallest of the three main credit-reporting firms – TransUnion. In three years, Goldman converted TransUnion into a data-mining machine. In addition to credit reporting, TransUnion now gathers billions of data points about American consumers. This data is constantly analyzed and then sold to lenders, insurers, and others. Through data monetization, Goldman Sachs has made nearly $600 million in profit and is expected to make about five times its initial $550 million investment. [2]

From the WSJ article…

By the time of its IPO in 2015, TransUnion had 30 million gigabytes of data, growing at 25% a year and ranging from voter registration in India to drivers’ accident records in the U.S. The company’s IPO documents boasted that it had anticipated the arrival of online lenders and “created solutions that catered to these emerging providers.”

As are forward looking Insurers …

The insurance industry is reckoning with a change in consumer behavior. Younger consumers are very comfortable using online portals to shop for plans, compare them, purchase them, and perform other activities that increase the amount of data companies collect. While data, and the models that operate on it, have always been critical in the insurance industry, their use has traditionally been strongest in the actuarial areas. The industry has now begun heavily leveraging data monetization strategies across areas such as new customer acquisition, underwriting, and dynamic pricing. A newer trend is to form partnerships with automakers to tap into a range of telematics information such as driver behavior, vehicle performance, and location data. In fact, automakers are already ingesting and constantly analyzing this data with the intention of leveraging it for a range of use cases, including selling it to insurance companies.

Leading carriers such as AXA are leveraging their data assets to strengthen broker and other channel relationships. AXA’s EB360 platform helps brokers with a range of analytics-infused functions – e.g. tracking the status of applications, managing compensation and commissions, and monitoring progress on business goals. AXA has also optimized user interfaces to ensure that data entry is minimized while supporting rapid quoting, helping brokers easily manage their business and thereby strengthening the broker-carrier relationship. [3]

Introducing Five Data Monetization Strategies across Financial Services & Insurance…

Let us now identify and discuss five strategies that enable financial services firms to progressively monetize their data assets. It must be mentioned that doing so requires an appropriate business transformation strategy to be put into place. Such a strategy includes clear business goals, ranging from improving core businesses to entering adjacent business areas.

Monetization Strategy #1 – Leverage Data Collected during Business Operations to Ensure Higher Efficiency in Business Operations…

The simplest way to monetize data is to begin collecting the disparate data generated during the course of regular operations. An example in retail banking is to collect data on customer branch visits, online banking usage logs, clickstreams, etc. Once collected, the newer data needs to be fused with existing Book of Record Transaction (BORT) data to obtain added intelligence on branch utilization, branch design & optimization, customer service improvements, etc. It is very important to ensure that the right metrics are agreed upon and tracked across the monetization journey.
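
As a hedged illustration of what fusing newly collected interaction data with BORT data might look like, the short pandas sketch below joins hypothetical branch-visit logs with core account records to produce a simple branch utilization metric; every table, column, and value is invented for the example.

```python
import pandas as pd

# Hypothetical interaction data collected during normal operations
branch_visits = pd.DataFrame({
    "customer_id": [101, 102, 101, 103, 104],
    "branch_id":   ["NYC-01", "NYC-01", "NYC-02", "NYC-02", "NYC-01"],
})

# Hypothetical Book of Record Transaction (BORT) extract
accounts = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "balance":     [12_000, 3_500, 54_000, 800],
})

# Fuse the two sources and track an agreed-upon metric:
# visits per branch and the average balance of visiting customers
fused = branch_visits.merge(accounts, on="customer_id", how="left")
branch_metrics = fused.groupby("branch_id").agg(
    visits=("customer_id", "count"),
    avg_visitor_balance=("balance", "mean"),
)
print(branch_metrics)
```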

Monetization Strategy #2 – Leverage Data to Improve Customer Service and Satisfaction…

The next progressive step in leveraging both internal and external data is to use it to drive new revenue streams in existing lines of business. This requires fusing internal and external data to create new analytics and visualizations, which are then used to drive use cases relating to cross-sell and up-sell of products to existing customers.

Demystifying Digital – Reference Architecture for Single View of Customer / Customer 360..(3/3)

Monetization Strategy #3 – Use Data to Enter New Markets…

A range of third-party data needs to be integrated and combined with internal data to arrive at a true picture of a customer. Once this Single View of the Customer has been created, the bank or insurer has the ability to introduce marketing, customer retention, and other loyalty programs in a dynamic manner. These include the ability to combine historical data with real-time data about customer interactions and other responses, such as clickstreams, to provide product recommendations and real-time offers.

Demystifying Digital – the importance of Customer Journey Mapping…(2/3)

An interesting angle on this is to provide new adjacent products much like the above TransUnion example illustrates.

Monetization Strategy #4 – Establish a Data Exchange…

A Data Exchange is a mechanism by which firms can fill in holes in their existing data about customers, their behaviors, and their preferences. Data exchanges can be created using a consortium-based approach that includes companies spanning various verticals. Companies in the consortium can elect to share specific datasets in exchange for data, while respecting data privacy and regulatory constraints.
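
One common technique for this kind of consortium exchange, sketched below under the assumption that members agree on a shared salt and hashing scheme out of band, is to pseudonymize join keys so that datasets can be matched without exchanging raw customer identifiers. This is illustrative only and is not a substitute for a proper privacy and legal review.

```python
import hashlib

SHARED_SALT = "consortium-2024"  # assumed to be agreed upon out of band

def pseudonymize(customer_id: str) -> str:
    """Replace a raw identifier with a salted SHA-256 digest before sharing."""
    return hashlib.sha256(f"{SHARED_SALT}:{customer_id}".encode("utf-8")).hexdigest()

# Each consortium member shares only pseudonymized keys plus agreed attributes
bank_records    = {pseudonymize("cust-001"): {"segment": "mass-affluent"}}
insurer_records = {pseudonymize("cust-001"): {"policy_type": "auto"}}

# Matching happens on the pseudonyms, never on the raw identifiers
overlap = bank_records.keys() & insurer_records.keys()
for key in overlap:
    print({**bank_records[key], **insurer_records[key]})
```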

Monetization Strategy #5 – Offer Free Products to Gather Customer Data…

Online transactions in both banking and insurance are increasing in number year on year. If data is truly customer gold, then it is imperative that companies collect as much of it as they can. The goal is to create products that drive longer and more continuous online interactions with global customers. Personal financial planning tools and complementary banking and insurance services are examples of free products that firms can offer to augment existing offerings.

A recent topical example in telecom is Verizon Up, a program from the wireless carrier in which consumers earn credits (that they can use for a variety of paid services – phone upgrades, concert tickets, Uber credits, movie premieres, etc.) in exchange for providing access to their browsing history, app usage, and location data. Verizon also intends to use the data to deliver targeted advertising to its customers. [4]


How Data Science Is a Core Capability for any Data Monetization Strategy…

Data Science and machine learning approaches are the true differentiators and key ingredients in any data monetization strategy. It is also a given that technology investments in Big Data platforms, along with analytic investments in areas such as machine learning and artificial intelligence, are needed to stay on the data monetization curve.

How does this tie into the practical use cases discussed above? Let us consider the following use cases of common Data Science algorithms –

  • Customer Segmentation – for a given set of data, predict, for each individual in a population, which of a discrete set of classes that individual belongs to. For example: “Of all retail banking clients in a given population, which are most likely to respond to an offer to move to a higher segment?”
  • Pattern recognition and analysis – discover new combinations of business patterns within large datasets. E.g. combine a customer’s structured data with clickstream data analysis. A major bank in NYC is using this data to bring troubled mortgage loans to quick settlements.
  • Customer sentiment analysis is a technique used to gauge degrees of customer satisfaction and how to improve them, with a view to increasing customer Net Promoter Scores (NPS).
  • Market basket analysis is commonly used to find associations between products that are purchased together, with a view to improving the marketing of products – e.g. recommendation engines that work out which banking products to recommend to customers (a minimal sketch follows this list).
  • Profiling algorithms aim to characterize the normal or typical behavior of an individual or group within a larger population. They are frequently used in anomaly detection systems such as those that detect AML (Anti Money Laundering) violations and credit card fraud.
  • Clustering algorithms divide data into groups, or clusters, of items that have similar properties.
  • Causal modeling algorithms attempt to find out which business events influence others.
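
To make the market basket item concrete, here is a minimal sketch that computes support and confidence for pairs of banking products from hypothetical customer holdings; the product names and the 0.4 support cutoff are invented for illustration.

```python
from itertools import combinations
from collections import Counter

# Hypothetical baskets of banking products held by individual customers
baskets = [
    {"checking", "savings", "credit_card"},
    {"checking", "mortgage"},
    {"checking", "savings"},
    {"savings", "credit_card"},
    {"checking", "savings", "mortgage"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

n = len(baskets)
for pair, count in pair_counts.most_common():
    a, b = sorted(pair)
    support = count / n                   # how often the pair occurs together
    confidence = count / item_counts[a]   # P(b | a), a simple association rule
    if support >= 0.4:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A recommendation engine built this way would suggest product b to holders of product a whenever the rule clears the business's chosen support and confidence thresholds.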

Conclusion..

Banks and insurers who develop data monetization capabilities will be positioned to create new service offerings and revenues. Done right (while maintaining data privacy and respecting consumer considerations), the monetization of data represents a truly transformational opportunity for financial services players in their quest to become highly profitable.

References..

[1] PwC Strategy& – “The Data Gold Rush” – https://www.strategyand.pwc.com/media/file/Strategyand_The-Data-Gold-Rush.pdf

[2] WSJ – “How Goldman Sachs Made More Than $1 Billion With Your Credit Score”
https://www.wsj.com/articles/how-goldman-sachs-made-more-than-1-billion-with-your-credit-score-1491742835

[3] McKinsey Quarterly – “Harnessing the potential of data in insurance”
http://www.mckinsey.com/industries/financial-services/our-insights/harnessing-the-potential-of-data-in-insurance

[4] WSJ – “Verizon Wants to Build an Advertising Juggernaut. It Needs Your Data First”
https://www.wsj.com/articles/verizon-wants-to-build-an-advertising-juggernaut-it-needs-your-data-first-1504603801

Why Data Garbage-In means Analytics Garbage-Out..

This is the third in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We have previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046 and the ideal way Data Scientists would like their models deployed for maximal benefit and use – as a Service @ http://www.vamsitalkstech.com/?p=5321. As the name of this third blog post suggests, the success of a data science initiative depends on data. If the data going into the process is “bad”, then the results cannot be relied upon. Our goal is also to suggest some practical steps that enterprises can take from a data quality & governance standpoint.

“…However, under the strong influence of the current AI hype, people try to plug in data that’s dirty & full of gaps, that spans years while changing in format and meaning, that’s not understood yet, that’s structured in ways that don’t make sense, and expect those tools to magically handle it.” – Monica Rogati (Data Science Advisor and ex-VP, Jawbone), 2017 [1]

Image Credit – The Daily Omnivore

Introduction

Different posts in this blog have discussed Data Science and other analytical approaches in some depth. What is apparent is that whatever the kind of analytics – descriptive, predictive, or prescriptive – the availability of a wide range of quality data sources is key. However, along with the volume and variety of data, the veracity, or truth, in the data is just as important. This blog post discusses the main factors that determine the quality of data from a Data Scientist’s perspective.

The Top Issues of Data Quality

As highlighted in the above illustration, the top quality issues that data assets typically face are the following:

  1. Incomplete Data: The data provided for analysis should span the entire cross-section of what is known about how the organization views its customers and products. This includes data generated by the various applications that belong to the business, and external data bought from vendors to enrich the knowledge base. The completeness criterion measures whether all of the information about the entities under consideration is available and usable (a simple profiling sketch follows this list).
  2. Inconsistent & Inaccurate Data: Consistency measures whether data values give conflicting information and must be fixed, and whether all data elements conform to specific, uniform formats and are stored in a consistent manner. Inaccurate data has duplicate, missing, or erroneous values, and does not reflect an accurate picture of the state of the business at the point in time it was pulled.
  3. Lack of Data Lineage & Auditability: The data framework needs to support auditability, i.e., provide an audit trail of how data values were derived from source to the point of analysis, including the various transformations performed to arrive at the dataset being considered for analysis.
  4. Lack of Contextuality: Data needs to be accompanied by meaningful metadata – data that describes the concepts within the dataset.
  5. Temporal Inconsistency: This measures whether the data is temporally consistent and meaningful given the time at which it was recorded.
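
A simple way to operationalize these criteria is to profile every incoming dataset against them before analysis begins. The pandas sketch below checks completeness, duplicates, format consistency, and temporal sanity; the column names, the email pattern, and the sample records are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email":       ["a@x.com", None, "b@x.com", "not-an-email"],
    "created_at":  pd.to_datetime(["2021-01-05", "2021-02-10", "2021-02-10", "2030-01-01"]),
})

report = {
    # Completeness: share of non-null values per column
    "completeness": df.notna().mean().round(2).to_dict(),
    # Consistency / accuracy: duplicate keys and malformed formats
    "duplicate_customer_ids": int(df["customer_id"].duplicated().sum()),
    "invalid_emails": int((~df["email"].fillna("").str.contains(r"@.+\.", regex=True)).sum()),
    # Temporal consistency: records stamped in the future are suspect
    "future_dated_records": int((df["created_at"] > pd.Timestamp.today()).sum()),
}
print(report)
```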

What Business Challenges does Poor Data Quality Cause…

Image Credit – DataMartist

Poor data quality causes the following business challenges in enterprises:

  1. Customer dissatisfaction: Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise’s ability to promote relevant offerings & detect customer dissatisfaction. Currently, most industry players are woeful at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses its own siloed & limited view of the customer across multiple channels. These views are typically inconsistent, lack synchronization with other departments, & miss a high amount of potential cross-sell and upsell opportunities. This is a data quality challenge at its core.
  2. Lost revenue: The Customer Journey problem is an age-old issue that has become exponentially more complicated over the last five years, as the staggering rise of mobile technology and the Internet of Things (IoT) has vastly increased the number of enterprise touch points through which customers can discover and purchase new products and services. In an omnichannel world, an increasing number of transactions are conducted online; in verticals like retail, banking, and insurance, online transactions now account for an average of roughly 40% of the total. Adding to the problem, more and more consumers are posting product reviews and feedback online. Companies thus need to react in real time to piece together the source of consumer dissatisfaction.
  3. Time and cost in data reconciliation: Every large enterprise nowadays runs expensive data re-engineering projects due to data quality challenges. These are an inevitable first step in other digital projects and cause huge cost and time overheads.
  4. Increased time to market for key projects: Poor data quality causes poor data agility, which increases the time to market for key projects.
  5. Poor data means suboptimal analytics: Poor data quality causes the analytics done using it to be suboptimal – algorithms will end up giving wrong conclusions because the input provided to them is incorrect at best & inconsistent at worst.

Why is Data Quality a Challenge in Enterprises

Image Credit – DataMartist

The top reasons why data quality has been a huge challenge in the industry are:

  1. Prioritization conflicts: For most enterprises, the focus of the business is the product(s) or service(s) being provided; book-keeping is a mandatory but secondary concern. And since keeping the business running is the most important priority, keeping the books accurate for financial matters is the only data aspect that gets the technical attention it deserves. Other data aspects are usually ignored.
  2. Organic growth of systems: Most enterprises have gone through a series of book-keeping methods and applications, most of which have no compatibility with one another. Warehousing data from various systems as they are deprecated, merging in data streams from new systems, and fixing data issues as these processes happen is not prioritized till something on the business end fundamentally breaks. Band-aids are usually cheaper and easier to apply than to try and think ahead to what the business will need in the future, build it, and back-fill it with all the previous systems’ data in an organized fashion.
  3. Lack of time/energy/resources: Nobody has infinite time, energy, or resources. Making all the systems an enterprise uses at any point in time talk to one another, share information between applications, and keep a single consistent view of the business is a near-impossible task. Many well-trained resources, and much time and energy, are required to make sure this can be set up and successfully orchestrated on a daily basis. But how much is a business willing to pay for this? Most do not see short-term ROI and hence lose sight of the long-term problems that could be caused by ignoring the quality of data collected.
  4. What do you want to optimize?: There are only so many balls an enterprise can have up in the air to focus on without dropping one, and prioritizing those can be a challenge. Do you want to optimize the performance of the applications that need to use, gather and update the data, OR do you want to make sure data accuracy/consistency (one consistent view of the data for all applications in near real-time) is maintained regardless? One will have to suffer for the other.

How to Tackle Data Quality

Image Credit – DataMartist


With the advent of Big Data and the need to derive value from ever increasing volumes and a variety of data, data quality becomes an important strategic capability. While every enterprise is different, certain common themes emerge as we consider the quality of data:

  1. The sheer number of transaction systems found in a large enterprise causes multiple challenges across the data quality dimensions. Organizations need to have valid frameworks and governance models to ensure the data’s quality.
  2. Data quality has typically been thought of as just data cleansing and fixing missing fields. However, it is very important to address the originating business processes that cause this data to take multiple dimensions of truth. For example, centralize customer onboarding in one system across channels rather than having every system do its own onboarding.
  3. It is clear from the above that data quality and its management is not a one time or siloed application exercise. As part of a structured governance process, it is very important to adopt data profiling and other capabilities to ensure high-quality data.

Conclusion

Enterprises need to define both quantitative and qualitative metrics to ensure that data quality goals are captured across the organization. Once this is done, an iterative process needs to be followed to ensure that a set of capabilities dealing with data governance, auditing, profiling, and cleansing is applied to continuously ensure that data is brought up to, and kept at, a high standard. Doing so can have salubrious effects on customer satisfaction, product growth, and regulatory compliance.

References

[1] Monica Rogati “The AI hierarchy of Needs” – https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007

Anti Money Laundering (AML) – Industry Insights & Reference Architectures…

This blog has, from time to time, discussed issues around the defensive portion of the financial services industry (banking, payment processing, insurance, etc.). Anti Money Laundering (AML) is a critical area where institutions need to protect themselves and their customers from malicious activity. This post summarizes key blogs on the topic of AML published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing complex AML projects.

Image Credit – FIBA Anti-Money Laundering Compliance Conference

Introduction

Money laundering has emerged as an umbrella crime that facilitates public corruption, drug trafficking, tax evasion, terrorism financing, etc. Banks and other financial institutions are expected to conduct business in a manner that protects their countries of operation and their consumers from security risks such as money laundering, terrorist financing, and corruption (the ML/TF risks). Given the global reach of financial products, a variety of regulatory authorities are concerned about money laundering. Technology has become key to meeting regulatory expectations as well as reducing costs in these onerous programs. As the below graphic from PwC [1] demonstrates, this is one of the most pressing issues facing the financial services industry.

The above infographic from PwC provides a handy visual guide to the state of global AML programs.

The Six Critical Gaps in Global AML Programs…

From an industry standpoint, the highest priority issues that are being pointed out by regulators include the following –

  1. Institutions failing to develop AML frameworks that are tailored to the risks run by the organization given its product and geographic mix
  2. Failure to develop real-time insights into business transactions and to assign them elevated risks based on their elements (an illustrative scoring sketch follows this list)
  3. Failure to develop AML models that draw from the widest possible sources of data – both internal and external – to form a true picture of the business
  4. Failure to demonstrate a consistent approach across geographies
  5. Failure to leverage the latest developments in analytics, including Machine Learning, to enable the automation of AML programs
  6. Lack of appropriate business governance & change management in setting, monitoring and managing AML compliance programs, policies and procedures
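
As a purely illustrative sketch of the kind of elevated-risk assignment described in the second gap above, the snippet below scores a transaction from a few of its elements; the rules, weights, thresholds, and jurisdiction codes are invented, and a real program would derive them from the institution's own risk assessment.

```python
HIGH_RISK_COUNTRIES = {"XX", "YY"}          # hypothetical jurisdiction codes
CASH_STRUCTURING_LIMIT = 10_000             # illustrative reporting threshold

def transaction_risk_score(txn: dict) -> int:
    """Assign an elevated-risk score to a transaction based on its elements."""
    score = 0
    if txn["country"] in HIGH_RISK_COUNTRIES:
        score += 40
    if txn["channel"] == "cash" and txn["amount"] >= 0.9 * CASH_STRUCTURING_LIMIT:
        score += 30                         # just under the reporting limit
    if txn["customer_risk_rating"] == "high":
        score += 30
    return score

txn = {"amount": 9_500, "channel": "cash", "country": "XX", "customer_risk_rating": "high"}
score = transaction_risk_score(txn)
print(f"risk score = {score}; escalate = {score >= 70}")
```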

With this background in mind, the complete list of AML blogs on VamsiTalksTech is included below.

# 1 – Why Banks should Digitize their Operations and how this will help their AML programs –

Digitization implies a mix of business models predicated on agile systems, rapid & iterative development and, more importantly, a Data First strategy. Beyond helping increase market share, these have significant impacts on AML programs as well.

A Digital Bank is a Data Centric Bank..

# 2 – Why Data Silos are a huge challenge in many cross organization projects such as AML –

Organizational data silos inhibit the effectiveness of AML programs, as compliance officers cannot gain a single view of a customer or of a suspicious transaction, or view the social graph in critical areas such as trade finance. This blog discusses the silo anti-pattern and ways to keep silos from proliferating.

Why Data Silos Are Your Biggest Source of Technical Debt..

# 3 – The Major Workstreams Around AML Programs

The headline is self-explanatory but we discuss the five major work streams on global AML projects – Customer Due Diligence, Entity Analysis, Downstream Analytics, Ongoing Monitoring and Investigation Lifecycle.

Deter Financial Crime by Creating an Effective Anti Money Laundering (AML) Program…(1/2)

# 4 – Predictive Analytics Across the AML workstreams –

Here we examine how Predictive Analytics can be applied across all of the five work streams.

How Big Data & Predictive Analytics transform AML Compliance in Banking & Payments..(2/2)

# 5 – The Business Need for Big Data in AML programs  –

This post discusses the most important developments in building AML systems using Big Data technology.

Building AML Regulatory Platforms For The Big Data Era

# 6 – A Detailed Look at how Enterprises can use Big Data and Advanced Analytics to reduce AML costs –

How to leverage Big Data and Advanced Analytics to detect a range of suspicious transactions and actors.

Big Data – Banking’s New Weapon In War Against Financial Crime..(1/2)

# 7 – Reference Architecture for AML  –

We discuss a Big Data enabled Reference Architecture of an enterprise-wide AML program.

Big Data – Banking’s New Weapon In War Against Financial Crime..(2/2)

Conclusion

According to PricewaterhouseCoopers, global money laundering flows were estimated at between 2% and 5% of global GDP [1] in 2016 – however, only about 1% of these transactions were caught. The global financial industry certainly has a long way to go before it can effectively stop these nefarious actors, but there should be no mistaking that technology is a huge part of the answer.

References –

  1. PricewaterhouseCoopers 2016 AML Survey – http://www.pwc.com/gx/en/services/advisory/forensics/economic-crime-survey/anti-money-laundering.html

Data Science in the Cloud A.k.a. Models as a Service (MaaS)..

This is the second in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046. All of the actors in the data science space can agree that becoming responsive to business demands is the overarching goal of the process. In this second blog post, we discuss Models as a Service (MaaS), an approach to ensuring that models and their insights can be leveraged throughout a large organization.

Image Credit – Logistics Industry Blog

Introduction

Hardware as a Service (HaaS), Software as a Service (SaaS), Database as a Service (DBaaS), Infrastructure as a Service (IaaS), Platform as a service (PaaS), Network as a Service (NaaS), Backend as a service (BaaS), Storage as a Service (STaaS). While every IT delivery model is going the way of the cloud, does Data Science lag behind in this movement?  In such an environment, what do Data Scientists dream of to ensure that their models are constantly being trained on high quality and high volume production grade data?… Models as a Service (MaaS).

The Predictive Analytics workflow…

The Predictive Analytics workflow always starts with a business problem in mind. For example: “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns” or “Detect real-time fraud in credit card transactions.”

Illustration – The Predictive Analysis Workflow in a financial services setting

In use cases like these, the goal of the data science process is to be able to segment and filter customers by corralling them into categories that enable easy ranking. Once this is done, the business can set up easy and intuitive visualizations to present the results.

A lot of times, business groups have a hard time explaining what they would like to see – both in terms of input data and output format. In such cases, a prototype makes things easier from a requirements-gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) that are pertinent to the business challenge. A lot of time is then spent collating the data (from a variety of sources like Oracle/SQL Server, DB2, mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup process involves dealing with missing values, corrupted data elements, formatting fields so they are homogeneous in format, etc.

This data wrangling phase involves writing code to join various data elements so that a complete dataset, at the correct granularity for the problem at hand, is gathered in the Data Lake from a raw-features standpoint. If more data is obtained while the development cycle is underway, the Data Science team has to go back and redo the process to incorporate the new data feeds. The modeling phase is where sophisticated algorithms come into play. Feature engineering takes business concepts and raw data features and creates predictive features from them. The Data Scientist takes the raw and engineered features and creates a model by applying various algorithms and testing to find the best one. Once the model has been refined and tested for accuracy and performance, it is ideally deployed as a service.
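
A compressed, hypothetical version of that workflow is sketched below using scikit-learn: cleanup and feature engineering live inside a single pipeline so that exactly the same steps run at training time and later when the model is exposed as a service. The feature names, data, and model choice are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical collated dataset after the data wrangling phase
data = pd.DataFrame({
    "monthly_usage": [12, 45, 3, 60, 22, 8],
    "tenure_years":  [1, 7, 0.5, 10, 3, None],      # missing value to clean up
    "segment":       ["retail", "premium", "retail", "premium", "retail", "retail"],
    "bought_product": [0, 1, 0, 1, 1, 0],            # label for the business question
})

features = data.drop(columns=["bought_product"])
label = data["bought_product"]

preprocess = ColumnTransformer([
    ("numeric", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), ["monthly_usage", "tenure_years"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(features, label)

# The fitted pipeline is the artifact that would be versioned and exposed as a service
print(model.predict_proba(features.head(2)))
```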

Challenges with the existing approach

The challenges with the above approach are:

  1. Business Scalability – Predictive analytics as highlighted above resembles a typical line of business project or initiative. The benefits of the learning from localized application initiatives are largely lost to the larger organization if you don’t allow multiple applications and business initiatives to access the models built.
  2. Lack of Data Richness – The models created by individual teams are not always enriched by cross organizational data constantly being generated by different business applications. In addition to that, the vast majority of industrial applications do not leverage all possible kinds of unstructured data & 3rd party data in their business applications. Enabling the models to be exposed to a range of data (both internal and external) can only enrich the insights generated.
  3. Cross-Application Applicability – This challenge deals with how business intelligence insights from disparate applications (which leverage different models) can be used to enhance business areas they weren’t originally created for. This could allow for customer-centered insights in real time. For example, consider a customer sales application and a call center application: can cross-application insights be used to understand that customers are calling the call center because it has been hard to order products on the website?
  4. Data Monetization – What is critical to the ability to create new commercial business models is agile analytics around existing and new data sources. If enterprise businesses are increasingly being built around data assets, then it naturally follows that data as a commodity can be traded or re-imagined to create revenue streams. As an example, pioneering payment providers now offer retailers analytical services to help them understand which products perform best and how to improve the micro-targeting of customers. Thus, data is a critical prong of any digital initiative, which has led to efforts to monetize data by creating platforms that support ecosystems of capabilities. To vastly oversimplify the discussion, the ability to monetize data needs two prongs – centralizing it in the first place, and then performing strong predictive modeling at large scale, where systems constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. Thus, centralizing models offers more benefits than the typical enterprise can imagine.

Enter Models As A Service…

MaaS takes in business variables (often hundreds or thousands of inputs) and provides as output model results upon which business decisions can be predicated, along with visualizations that augment and support business decision support systems. As depicted in the above illustration, once different predictive models are built, tested, and validated, they are ready to be used in real-world production deployments. MaaS is essentially a way of deploying these advanced models as part of software applications, where they are offered as a software subscription.

MaaS also enables cleaner separation of the application development process and the Data Science workflow.
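
A minimal sketch of what “offered as a software subscription” could look like in practice is a thin HTTP wrapper around a trained model, so that any application can call it without embedding the model itself. The Flask route, the input fields, and the propensity_model.joblib artifact below are all hypothetical.

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical artifact produced and versioned by the Data Science team
model = joblib.load("propensity_model.joblib")

@app.route("/v1/models/propensity/predict", methods=["POST"])
def predict():
    """Score one or more customers posted as a JSON list of records."""
    records = pd.DataFrame(request.get_json())
    scores = model.predict_proba(records)[:, 1]
    return jsonify({"model_version": "1.0.3", "scores": scores.tolist()})

if __name__ == "__main__":
    app.run(port=8080)
```

A client application would then POST records such as [{"monthly_usage": 12, "tenure_years": 1, "segment": "retail"}] and receive scores back, without ever embedding the model or its training data.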

Business Benefits from a MaaS approach

  1. Exposing models to different lines of business increases their usefulness and opens them up to feedback that helps improve their accuracy.
  2. MaaS opens the models to any application that wants to take advantage of them. This lets Data Scientists work with much broader business teams than they would normally have access to.
  3. The provision of dashboards and business intelligence across the organization becomes much easier than with a siloed approach.
  4. MaaS as an approach fundamentally encourages an agile approach to managing data assets and to rationalizing them. For any MaaS initiative to succeed, timely access needs to be provided to potentially hundreds of data sources in an organization. MaaS encourages a move toward viewing data as a reusable asset across the organization.

Technical advantages of the MaaS approach

  • Separation of concerns: software & data feeds are maintained by IT, models are maintained by Data Scientists.
  • Versioning of models can be separated from versioning of the system(s) using those models.
  • The same models can be utilized by multiple software packages for consistency.
  • Consistent handling of data sources: e.g. which “master” source provides which types of data for all the models, so that a customer looks the same regardless of the model acting on the data for insights.
  • A single point for putting a “watch” on the performance of a model.
  • Controlled usage of models.
  • MaaS ensures that the analytic process can be automated from a deployment standpoint.

Conclusion

MaaS can enable organizations to move their analytic practices and capabilities to the next level. It enables the best of both worlds – the ability to centralize the data science capabilities across an organization while keeping customer data securely inside the organization. Done right, it can enable the democratization of data science insights across a large enterprise.

The Why and How of an Enterprise API Strategy..

We discussed the emergence of Application Programming Interfaces (APIs) as a key business capability in Digital Platforms @ http://www.vamsitalkstech.com/?p=3834, and saw how APIs can serve as a business-interaction-driven integration layer connecting backend business services to digital applications across multiple channels. In this second post, we will discuss the foundational business, technology, integration & governance capabilities that any Enterprise API Platform must support. The next and final post will discuss an API-centric deployment architecture for a medium to large enterprise.

What is your API vision?

The first post in this series (http://www.vamsitalkstech.com/?p=3834) covered the need for industry players to treat APIs as a way of reinventing many aspects of their business and their relationships with consumers. From a high-level standpoint, this can be done in one of three ways –

  1. Inculcating Digital Innovation both inside out and outside in: inside out, by extending the boundaries of a large global or national enterprise; outside in, by enabling partners to build innovative applications.
  2. Exposing Data Assets and combining them with advanced analytics to enable customers to consume enterprise business services across the globe.
  3. Taking a Platform first approach to building new applications and enabling API nativity in such greenfield development.

Not every company in the position of a Borders Bookstores can turn into an Amazon, but the ability to create new lines of revenue implies closer integration with business partners. The creation of APIs enables this integration, as we saw in the previous post, but it is really the treatment of APIs as an enterprise enabler that ensures the scalability of innovation. Hence the need for an enterprise API strategy, which senior executives need to devise from both a tactical standpoint and with the strategic vision in mind.

As with all things in digital technology, API Management is founded on strong business use cases. So let us begin by examining a smattering of these.

Industrial Use Cases for API Management Platforms..

Let us first discuss the major business use cases for APIs in a business enterprise.

  1. The simplest use case for any API implementation is to provide Information Retrieval. This ranges from a Free API (which typically accesses non-private information) to a Paid API (which securely accesses business-sensitive data stored in Book of Record Transaction (BORT) systems) – e.g. patient medical records, supply chain data, bank customer account information, insurance policies, etc.
  2. Other complementary use cases include exposing functionality in internal applications (which typically perform document & file management) across a range of business scenarios – typically via a Private API.
  3. Across partner & supplier applications, supporting the invocation of business logic that typically performs a business process, using an internal or trusted-partner API.
  4. Support for mobile applications and web front ends, for applications ranging from field employee enablement to online payments, using consumer-facing Public APIs.
  5. The most complex use case is support for Data Monetization using advanced analytics. The last post discussed how APIs need to help monetize business assets; this implies an ability to provide complex analytic support for select APIs that extend brands by connecting to a range of backend sources.

The technology and platform requirements for an API strategy will cascade from these use cases – all of which should fairly resonate across several industry verticals.

Business Requirements for API Strategy..

The goal of an enterprise API strategy should be to support the creation of a centralized API platform which appeals to various audiences – Customers, Internal & External Developers, Lines of business and Operations teams.

There are ten distinct business challenges that an enterprise API strategy needs to account for.

  1. First and foremost, an API strategy needs to support the ability of existing business systems to expose their business assets for consumption in Digital scenarios. This implies supporting not just a cloud-native/microservice model of application development but also a range of legacy systems such as RDBMSs, ERP and CRM systems, etc. The ability to front these systems with RESTful APIs, at a minimum, will ensure that they can participate in a digital business process without a lot of upfront rewriting. Adapters should provide deep integration with these sources, allowing for efficient API performance using techniques such as query optimization, pagination, and support for business policies. The API platform also needs to support easy ways of composing APIs and orchestrating them across backend applications, which are not always cloud native. The capability of API Composition, where backend APIs are orchestrated to perform a higher business function, is highly desirable.
  2. The API Management Platform needs to support a High Performance Architecture capable of supporting high volumes of client applications – at the high end, potentially millions of API calls per minute.
  3. The platform needs to provide five nines of infrastructure and application reliability. Lost API messages mean missed revenue – it is as simple as that. Thus, APIs need to be highly available and support a high degree of redundancy.
  4. APIs increase the attack surface of an enterprise. Accordingly, the strategy needs to account for the provision of bulletproof Security against a range of threat vectors (malicious API client applications, malware, Denial of Service (DoS) attacks, etc.), as well as strong Identity Management capabilities for client applications across complex backend services.
  5. The ability to Monitor APIs for performance, throttling, etc. to guarantee SLAs (Service Level Agreements). It is also important to be able to generate granular business & IT reporting on API usage across a range of metrics (a minimal throttling sketch follows this list).
  6. As discussed in the last blog @ http://www.vamsitalkstech.com/?p=3834, an API ecosystem provides support for multiple players – customers, partners, employees, etc. Accordingly, the platform needs to support multiple versions of underlying APIs that expose different views of business assets. This is key so that consumers can obtain value from the capabilities that are aligned with their interests.
  7. An ability to support Data Monetization via richer analytics than has been possible before, providing a greater degree of context. This ability to reason around context is what enables the design of new business models that cannot currently be imagined due to a lack of agility in the data and analytics space. This integration helps these systems leverage digital intelligence and insights across (potentially) millions of devices and complex areas of operation.
  8. Application developers need access to APIs with a view to including them in their business applications. Accordingly, an API Management strategy should provide strong capabilities for developers via a Portal. The Portal helps them right from onboarding: exploring the organization's backend capabilities, API documentation, quick-start guides, online videos, API testing capabilities, API version history, and search & discovery tools for API discovery. It should be noted that multiple developer portal views must be supported – for both internal and external communities of developers. Internal developers will want to do a range of tasks that support lines of business, business automation, and workforce-related IT access applications; they will create, package, and upload APIs to the portal. External API developers range from partners to customer communities; they typically access these APIs, subscribe to them, and run a range of dev-test tasks using the Portal.
  9. Supporting Governance across potentially hundreds of API definitions. Governance is the most critical area, and tools need to help right from the definition of the business case, to assigning actors (who may already be defined in business directories), to managing deployment schedules, to change management. Business policies also need to be supported so that business and IT stakeholders can retire APIs.
  10. Finally, an API strategy cannot be divorced from the Industry Vertical that the enterprise operates in. This implies that starter-set APIs, templates, SDKs, etc. be provided as modules for verticals like Financial Services, Insurance, Telecom, Healthcare, Manufacturing, and Connected Cars.
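
As a hedged illustration of the monitoring and throttling requirement in item 5 above, the sketch below shows a toy token-bucket rate limiter of the kind an API gateway enforces per client application; the rate, capacity, and client identifier are invented, and a production API Management platform would provide this out of the box.

```python
import time

class TokenBucket:
    """Toy per-client rate limiter of the kind an API gateway applies to enforce SLAs."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # caller should return HTTP 429 Too Many Requests

# One bucket per API client application (the client ID is hypothetical)
buckets = {"partner-app-42": TokenBucket(rate_per_sec=5, capacity=10)}

def handle_request(client_id: str) -> int:
    bucket = buckets.get(client_id)
    if bucket is None or not bucket.allow():
        return 429
    return 200

print([handle_request("partner-app-42") for _ in range(12)])
```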

Conclusion..

APIs are a product line and should be treated as such, which implies an ability to manage them across their lifecycle. Developers create API client applications; the corporation makes API definitions available for communities of developers to consume in their applications; sysadmins secure, deploy & manage these APIs.

The end goal of an API strategy is to ensure that the process of creating, securing, orchestrating & monitoring these API interfaces is intuitive, consistent, and scalable across a large organization. We will round off this three-part series on APIs by defining a technical deployment architecture in the next & final post.

Risk Management – Industry Insights & Reference Architectures…

Financial Risk Management as it pertains to different industries – Banking, Capital Markets, and Insurance – has been one of the most discussed topics on this blog. The business issues and technology architecture of systems dedicated to aggregating, measuring & visualizing Risk are probably among the more complex in the worlds of finance & insurance. This post summarizes ten key blogs on the topic of Financial Risk published at VamsiTalksTech.com. It aims to serve as a handy guide for business and technology audiences tasked with implementing Risk projects.

Image Credit – ShutterStock

The twin effects of the global financial crisis & the FinTech boom have caused Financial Services, Insurance, and allied companies to become laser focused on risk management. What was once a concern primarily of senior executives in the financial services sector has now become a top-management priority in nearly every industry.

Whatever the kind of risk, certain themes are common from a regulatory-intention standpoint –

  1. Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
  2. Requiring that banks increase the amount and quality of capital held on reserve to back their assets, and requiring higher liquidity positions
  3. Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
  4. Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
  5. Tackling the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.

With this background in mind, a complete list of Risk use case blogs on VamsiTalksTech is included below.

# 1 – Why Banks and Other Financial Institutions Should Digitize Risk Management –

Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC); offensive in revenue producing areas of the business like Customer 360 (whether Institutional or Retail), Digital Marketing, Mobile Payments, Omnichannel Wealth Management etc. If one really thinks about it, the biggest activity that banks perform is to manipulate and deal in information, whether customer, transaction or general ledger data.

Why Banks, Payment Providers and Insurers Should Digitize Their Risk Management..

# 2 – Case Study of a Big Data Enabled IT Architecture for Risk Data Measurement – Volcker Rule/Dodd Frank –

While industry analysts can discuss the implications of a given Risk mandate, it is most helpful for Business & IT audiences to find CIOs discussing overall strategy & specific technology tools. This blogpost discusses how two co-CIOs charged with an enterprise technology mandate are focused on growing and improving a global banking leader's internal systems, platforms and applications, especially from a Risk standpoint.

How a Pioneering Bank leverages Hadoop for Enterprise Risk Data Aggregation & Reporting..



# 3 – A POV on Bank Stress Testing – CCAR and DFast

An in-depth discussion of Bank Stress Testing from both a business and technology standpoint.

A POV on Bank Stress Testing – CCAR & DFAST..

# 4 – Capital Markets – Architectural Approaches to the practice of Risk Management

In Capital Markets, large infrastructures process millions of derivative trades on a typical day. The main implication is that there are a large number of data inserts and updates to handle. Once the data is loaded into the infrastructure, complex mathematical calculations need to be performed in near real time to calculate intraday positions. Most banks use techniques like Monte Carlo modeling and other computational simulations to build & calculate these exposures. Hitherto, these techniques were extremely expensive from the standpoint of both the hardware and the software needed to run them. Nor were tools available that supported a wide variety of data processing paradigms – batch, interactive, realtime and streaming. This post examines a detailed reference architecture applicable to areas such as Market, Credit & Liquidity Risk Measurement.

Big Data architectural approaches to Financial Risk Mgmt..

# 5 – Risk Management in the Insurance Industry  – Solvency II 

A discussion of Solvency II – the Insurance industry’s equivalent of Basel III – from both a business and technology standpoint.

Why the Insurance Industry Needs to Learn from Banking’s Risk Management Nightmares..

# 6 – FRTB (Fundamental Review of the Trading Book)

An in-depth business and technology discussion of the highlights and key implications of the FRTB (Fundamental Review of the Trading Book).

A POV on the FRTB (Fundamental Review of the Trading Book)…

# 7 – Architecture and Data Management Antipatterns

How not to architect Financial Service IT platforms using Risk Applications as an example.

The Five Deadly Sins of Financial Services IT..

# 8 – The Intelligent Banker Needs Better Risk Management –

The Intelligent Banker needs better Risk Management

# 9 – Implications of Basel III

This blogpost discusses the key implications of Basel III.

Towards better Risk Management..Basel III

#10 The Implications of BCBS 239

This blogpost discusses the data management and governance implications of BCBS 239. BCBS 239 provides guidelines to overhaul an organization’s risk data aggregation capabilities and internal risk reporting practices.

BCBS 239 and the need for smart data management

Conclusion..

The industry clearly requires a fresh way of thinking about Risk Management. Leading firms will approach Risk as a way to create customer value and a board-level conversation, rather than as a purely defensive and regulatory challenge. This will mean that budgets for innovation-related spending in areas such as Digital Transformation will also gradually percolate over to Risk. As firms either digitize or deal with gradually eroding market share, business systems that work with and leverage risk will emerge as a strong enterprise capability over the upcoming 3-5 year horizon.

What Your Data Science Team Needs From IT..

“Data matures like wine, applications like fish.” – James Governor, Principal Analyst & Founder of RedMonk, circa 2007

I would like to begin a series of posts on Data Science jointly authored with my friend, ex-colleague, & collaborator, Maleeha Qazi – Data Scientist (https://www.linkedin.com/in/maleehaqazi/). In these posts, we intend to bring to light several technology themes around the industrial use of Data Science and Deep Learning, spanning Industrial Applications, Big Data, Cyber Security, Cognitive Applications, Business Process Management, and Cloud Computing. Our goal for this first post is to discuss the typical issues that bedevil every Data Science initiative at the beginning – namely, the top technical and cultural concerns to communicate to the IT department every time a new project is begun.

Introduction

With Data Science emerging as a key enabler of digital, customer-focused applications, renewed focus is being placed on how the lifecycle of these newfangled applications happens alongside traditional IT development. This blogpost aims to highlight some of the key concerns involved when Data Science groups work with IT departments. Currently there is no one-size-fits-all model for how advanced models are developed and deployed so that they can be accessed and used at scale by customers. It is our wager that almost every large enterprise working on these projects encounters these issues. We wanted to share our experience with the enterprise community over a series of blog posts.

It is clear that Data Science teams, product teams and IT need to collaborate to create business applications that learn from customer needs.

So what are the top asks that Data Science teams have for their IT groups? There are at least nine important focus areas:

#1 Understanding of the business challenge and agreeing on a common vocabulary 

It is a generally accepted fact that most IT/Data Science interactions focus on the technology portion, which includes elements such as: the data sources within the organization, acquisition of and access to external data sources, the availability of tools & infrastructure to support the data science development process (cloud or on-prem), data ingestion engines (e.g. Kafka, Flume, Sqoop) to ingest and process the data, etc. While this is certainly part of the process, a distinct anti-pattern has emerged when this interaction is driven by technology alone. The Data Science team creates models that typically reflect needs that drive business value for an organization’s customers, partners, regulators & employees. In that rather important context, technology at its core is just an engine and does not exist in a vacuum. The most vibrant enterprises understand this ground reality and always ensure that business needs drive both Data Scientists & IT, and not the other way around. It is thus highly important for both the Data Science team and the IT team to agree on the business challenge at hand, to ensure that their interactions (long and short term) are driven with business & competitive outcomes in mind. Examples of such goals are a common organization-wide business language (so that definitions agree semantically) across products, customers, logistics, supply chains & business domains. The shared emphasis of both teams should be on overall goals such as increased customer profitability, enhanced customer segmentation, customer service productivity, etc. Setting this tone upfront will not only ensure that outcomes for both teams are aligned but will also ensure that critical gaps in knowledge and capabilities are filled. One approach that is working well is increased cross-pollination across both teams: collapsing artificial organizational barriers by adopting DevOps & ensuring that Data Science teams have a “slim IT” presence (e.g. an embedded data engineer and datacenter person) to rapidly fill gaps in IT’s business knowledge or capability.

#2 IT needs to help Data Scientists acquire a deep understanding of the overall Data Architecture

Once business requirements have been identified, Data Scientists get right to work understanding the different data sources that will comprise the inputs to their models. In large enterprises, it is not inconceivable to find many varied data sources from which data needs to be sourced. For instance, in Banking there is a range of Book of Record Transaction (BORT) systems from which data needs to be extracted. It is also key to supplement this data with external data sets. Models are only as good as the data they are given to work with – Garbage In, Garbage Out (GIGO) is shorthand for the fact that models fed bad data will perform poorly. A lot of the time, business groups have a hard time explaining what they would like to see, both in terms of data and visualization; in such cases, a prototype makes things easier from a requirements-gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) needed to tackle the business challenge. They spend a lot of time collating the data (from Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup/data-wrangling process includes fixing and standardizing missing value representations, identifying potentially corrupted data elements, formatting fields that indicate time and date in a consistent manner, etc.
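As a concrete illustration of the cleanup step described above, here is a minimal pandas sketch; the column names, sentinel values and corrupted entries are assumptions invented for this example.

```python
import numpy as np
import pandas as pd

# Illustrative raw extract pulled from several source systems;
# the columns and bad values below are made up for this sketch.
raw = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "balance": ["1000.50", "N/A", "2500"],
    "opened_on": ["2016-01-15", "2016-02-30", None],   # second date is corrupted
})

# Standardize missing-value representations to NaN.
clean = raw.replace({"N/A": np.nan})

# Coerce numeric fields; corrupted values become NaN instead of failing the load.
clean["balance"] = pd.to_numeric(clean["balance"], errors="coerce")

# Normalize dates to a single datetime type; unparseable entries become NaT.
clean["opened_on"] = pd.to_datetime(clean["opened_on"], errors="coerce")

print(clean.dtypes)
print(clean)
```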

#3 Infrastructure & IT Self Service Across Environments, Platforms and Tools 

This one is huge. The traditional IT model of hardware acquisition and vetting is typically a drawn-out process. Even with the public cloud, onerous security controls are sometimes added to infrastructure, delaying the Data Science team's ability to develop models in an agile manner. The dreaded term Shadow IT (where business & data science teams go around the IT team to procure compute and storage on the public cloud) is not just an issue with infrastructure software but is slowly creeping up to business intelligence and advanced analytics apps. The delays associated with provisioning legacy data silos, combined with tools that are neither intuitive nor able to scale to deal with the increasing data deluge, are making timely business analysis almost impossible to perform. Insights delivered too late are not very valuable. Data Scientists dearly want the environments they need for development and testing to be made available as soon as possible, ideally via a self-service user interface. This calls for IT investments in cloud computing platforms that enable agility and speedy provisioning of dev/test environments across compute, network and storage.

#4 – Collaboration with IT around the DS development lifecycle

Organizations typically have well established development methodologies and processes. Currently, most data science development and traditional application development happen in two distinct tracks. Software development typically follows an Agile/DevOps process (a combination of Scrum/XP). The development lifecycle is divided into several stages, each producing a working deliverable at the end. The deliverables are incrementally updated to arrive at an acceptable product which is then deployed for customer use. In this model, team members typically follow defined roles.

The Data Science development cycle is different. Data scientists/modelers are given a certain business problem to solve. They proceed to find the appropriate data they need, pull it into Hadoop or a Data Warehouse, wrangle it, try various algorithms to create the best possible models, test the models, and ensure that they perform well for the problem at hand. If they get more data during the process, they go back and rerun the whole cycle. The point is that IT needs to partner with the Data Science team to first strategize and then help provision the different environments (dev, test, prod) that enable data scientists to do iterative model development. IT then needs to help the Data Science team deploy these models into the appropriate deployment architecture.

#5 – Help Improve the Data Science User Experience

Using traditional app dev methodologies, it can take months to design, test and deploy software – which is simply unsustainable. One of the chief goals of the DevOps model is to close the long-standing gap between the engineers who develop and test IT capability and the business requirements for such capabilities. Accordingly, data science teams need best-practice recommendations on IDEs that support iterative model development & debugging. It is important that these development tools support programming languages such as R and Python – the most common go-to languages for data science – to rapidly develop code. It is critical that the IT group partner with the Data Scientists to enable these capabilities from both a development and a deployment standpoint.

#6 – Model Deployment

The data wrangling phase involves writing code to join various data sets so that a single complete dataset can be created from a raw-features standpoint. If more data is obtained while the development cycle is underway, the Data Science team has no option but to go back and redo the whole process. Once the raw features are gathered, feature engineering can begin to create predictive features from the raw data, taking business concepts into account. The modeling phase is where the choice of algorithms comes into play. A Data Scientist takes the raw & engineered features and creates models using the most appropriate algorithms for the task. After the models have been repeatedly tested for accuracy and performance, the best one is typically deployed for use. Once the models have been developed, it is critical to ensure that they can be deployed rapidly, run automatically, and be changed as business requirements and performance dictate. How and where these models get deployed depends on the business case; ideally they should be deployed as a service. Models as a Service (MaaS) is the Data Science counterpart to Software as a Service: a MaaS takes in business variables (often hundreds of inputs) and provides as output business decisions/intelligence, measurements, and visualizations that augment decision support systems. IT help is needed to ensure that the models can scale as customer usage of these Digital Platforms increases.
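A minimal sketch of the Models as a Service idea, assuming a toy scikit-learn model and Flask as the HTTP layer; the feature names, training data and framework choice are illustrative assumptions rather than a prescribed deployment stack.

```python
# Minimal Models-as-a-Service sketch: a toy model behind an HTTP scoring endpoint.
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Toy model trained at startup; in practice the model would be trained offline
# and loaded from a model registry or artifact store.
X_train = np.array([[25, 30000], [40, 85000], [35, 60000], [50, 120000]])
y_train = np.array([0, 1, 0, 1])
model = LogisticRegression().fit(X_train, y_train)

@app.route("/score", methods=["POST"])
def score():
    """Accept business variables as JSON and return a model decision."""
    payload = request.get_json()
    features = np.array([[payload["age"], payload["income"]]])
    probability = float(model.predict_proba(features)[0, 1])
    return jsonify({"approval_probability": probability})

if __name__ == "__main__":
    app.run(port=5000)
```

A consuming application would then POST a JSON document such as {"age": 35, "income": 60000} to /score and receive a probability back, keeping the model's internals hidden behind the service boundary.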

#7 Model Governance and Management

There need to be appropriate checks put in place to allow for the monitoring and maintenance of the models once they are in production. Model versioning must be handled so that customers aren't affected during a maintenance cycle – old models must still function while the new ones are being put into place. And by keeping a check on the performance of models in production, the IT team can tell when a model stops performing optimally and call on the Data Science team to investigate why.
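A minimal sketch of the versioning and monitoring idea, assuming a simple in-memory registry: the old model keeps serving traffic until a new version is explicitly promoted, and a basic drift check signals when the Data Science team should be called in. A production setup would use a proper model registry and monitoring stack instead.

```python
# Illustrative model versioning and monitoring helpers (in-memory only).
class ModelRegistry:
    def __init__(self):
        self._versions = {}   # version label -> model object
        self._active = None   # version currently serving traffic

    def register(self, version, model):
        """Upload a new model version without affecting live traffic."""
        self._versions[version] = model

    def promote(self, version):
        """Switch traffic to a validated version; older ones remain available for rollback."""
        self._active = version

    def predict(self, features):
        return self._versions[self._active].predict(features)


def needs_review(recent_accuracy: float, baseline_accuracy: float, tolerance: float = 0.05) -> bool:
    """Flag the model for Data Science review when accuracy drifts below the baseline."""
    return recent_accuracy < baseline_accuracy - tolerance
```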

#8 Security and Compliance  

How are security constraints around different environments managed? Though IT maintains control over the vast domain of tools and environments in any organization, the Data Science team must maintain control of the models. Allowing just anyone to update the models could lead to performance degradation. This separation of concerns is akin to database security over schemas/tables/columns – only certain individuals should be granted access to perform certain operations for the most optimal results.

#9 – Delivering Results to Business Users –

Once the model has been deployed the results need to be made available to business users. Depending on the application, model results might need to be served up in near-real-time, every day/week/month/year, ad-hoc on demand, or any other time frame in-between. Organizations need to deal with providing appropriate tools (e.g. apps, sandboxes, etc.) to enable end users to explore the results of the analysis, and to perform intelligent visualization of the data.  Visualizations include trend analysis over time, KPIs, list of interesting customers/accounts, etc.

Conclusion

Digital applications will continue to incorporate Data Science at an increasing scale. However, traditional IT Departments need to collaborate in the above specific areas to ensure that the algorithms developed for specific business issues are effective, forward looking and scalable.

Why Banks, Payment Providers and Insurers Should Digitize Their Risk Management..

“When models turn on, brains turn off.” – Dr. Til Schuermann, formerly Research Officer in the Banking Studies function at the Federal Reserve Bank of New York, currently Partner at Oliver Wyman & Company.

There are two primary reasons for enterprises such as Banks, Insurers, Payment Providers and FinTechs to pursue best-in-class Risk Management processes and platforms. The first is compliance, driven by various regulatory reporting mandates such as the Basel reporting requirements, FRTB, the Dodd-Frank Act, Solvency II, CCAR and CAT/MiFID II in the United States & the EU. The second is the need to drive top-line sales growth by leveraging digital technology. This post advocates the application of digital technology to Risk Management across both areas.

Image Credit – Digital Enterprise

Recapping the Goals of Regulatory Reform..

There are many kinds of Risk, ranging from the three keystone kinds – Credit, Market and Operational – to those addressed by the Basel II.5/III accords, FRTB, Dodd-Frank etc. The best enterprises not only manage Risk well but also turn it into a source of competitive advantage. Leading banks have recognized this; according to McKinsey forecasts, while risk-operational processes such as credit administration today account for some 50 percent of the Risk function’s staff, and analytics just 15 percent, by 2025 those figures will be around 25 percent and 40 percent respectively. [1]

Whatever the kind of Risk, certain themes are common from a regulatory intention standpoint –

  1. Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
  2. Requiring that banks increase the amount and quality of capital held on reserve to back their assets, and requiring higher liquidity positions
  3. Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
  4. Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
  5. Tackling the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.

Beyond the standard models used for regulatory Risk reporting, Banks & FinTechs are pushing the use of risk modeling into new areas such as retail lending and SME lending. Since the crisis of 2008, new entrants have begun offering alternatives to traditional financial services in areas such as payments, mortgage loans, cryptocurrency, crowdfunding, alternative lending, and investment management. The innovative use of Risk analytics lies at the core of the FinTechs’ success.

Across these areas, risk models are being leveraged in diverse fields such as marketing analytics to gain customers, defend against competition etc. For instance, realtime analytic tools are also being used to improve credit granting processes. The intention is to gain increased acceptance by pre-approving qualified customers quickly, without the manual intervention that can cause weeks of delays. Again, according to McKinsey, the goals of leading Banks are to approve up to 90 percent of consumer loans in seconds and to generate efficiencies of 50 percent, leading to revenue increases of 5 to 10 percent. Thus, leading institutions are using Risk Analytics to rethink their business models and to expand their product portfolios. [2]

Over the last two years, this blog has extensively covered areas such as cyber security, fraud detection and anti money laundering (AML) from a data analytics standpoint. The industry has treated Risk as yet another defensive function, but over the next 10 years it is expected that the Risk function will be an integral part of all of the above areas, driving business revenue growth while detecting financial fraud and crime. There is no doubt that Risk is a true cross-cutting concern across a range of business functions & not just the traditional Credit, Market, Liquidity and Operational silos. Risk strategy needs to be a priority at the highest levels of an organization.

The Challenges with Current Industry Risk Architectures..

Almost a year ago, we discussed these technology issues in the blogpost below. To recap – most industry players have a mishmash of organically developed & shrink-wrapped IT systems. These platforms run everything from critical Core Banking Applications to Trade Lifecycle to Securities Settlement to Financial Reporting. Each of these systems operates in an application, workflow and data silo with its own view of the enterprise. These are all kept in sync largely via data replication & stove-piped process integration. Siloed risk functions further ensure that different risk reporting applications are developed using duplicative technology paradigms, causing massive IT spend. Finally, the preponderance of complex vendor-supplied systems ensures lengthy release cycles and complex data center deployment requirements.

The Five Deadly Sins of Financial Services IT..

Industry Risk Architectures Suffer From Five Limitations

 A Roadmap for Digitization of Risk Architectures..

The end state – how a Digital Risk function will look – will vary for every institution embarking on this journey. We can still point out a few foundational guideposts.

#1 Automate Back & Mid Office Processes Across Risk and Compliance  –

As discussed, many business processes across the front, mid and back office involve risk management. These processes range from risk data aggregation, customer on-boarding, loan approvals, regulatory compliance (AML, KYC, CRS & FATCA) and enterprise financial reporting to Cyber Security. It is critical to move any and all manual steps in these business functions to a highly automated model. Doing so will not only reduce operational costs in a big way but also demonstrate substantial auditability to regulatory authorities.

#2 Design Risk Architectures to handle Real time Data Feeds –

A critical component of Digital Risk is the need to incorporate real time data feeds across Risk applications. While Risk algorithms have traditionally dealt with historical data, new regulations such as FRTB explicitly call for analytics across various time horizons. These imply that Banks need to run a full spectrum of analytics across many buckets on data seeded from real time interactions. While the focus has been on the overall quality and auditability of data, the real time requirement is critical as one moves from front office applications such as customer on-boarding, loan qualification & pre-approval to key areas such as market, credit and liquidity risk. Why is this critical? We have discussed the need for real time decision making insights for business leaders. Understanding risk exposures and performing root cause analysis in real time is a huge business capability for any Digital Enterprise.
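As an illustration of the real time requirement, the sketch below consumes a streaming transaction feed and keeps a running intraday exposure per counterparty. The Kafka topic name, message schema and use of the kafka-python client are assumptions, and the exposure logic is deliberately simplistic.

```python
# Illustrative real-time exposure tracker fed from a streaming topic.
import json
from collections import defaultdict

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "trades",                                  # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Running intraday exposure per counterparty, updated on every message.
exposure = defaultdict(float)

for message in consumer:
    trade = message.value                      # assumed schema: counterparty, notional
    exposure[trade["counterparty"]] += trade["notional"]
    # In a real system this would feed limit checks, dashboards and alerts.
    print(trade["counterparty"], round(exposure[trade["counterparty"]], 2))
```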

#3 Experiment with Advanced Analytics and Machine Learning 

In response to real time risk reporting, the analytics themselves will begin to get considerably more complex. This technology complexity will only be compounded by multiple teams working on all of these areas, which calls for standardization of the calculations themselves across the firm. It also implies that, from an analytics standpoint, a large number of scenarios must be run on a large volume of data. For Risk to become a truly digital practice, the innovative uses of Data Science across areas such as customer segmentation, fraud detection and social graph analysis must all make their way into risk management. Insurance companies and Banks are already deploying self-learning algorithms in applications that deal with credit underwriting, employee surveillance and fraud detection. Wealth Managers are deploying these in automated investment advisory. Thus, machine learning will support critical risk-influenced areas such as Loan Underwriting, Credit Analytics and Single View of Risk. All of these areas will need to leverage predictive modeling, leading to better business decisions across the board.
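To make the machine learning point concrete, the sketch below trains a toy credit underwriting classifier with scikit-learn; the features, synthetic data and choice of algorithm are assumptions for illustration, not a recommended production model.

```python
# Toy credit underwriting classifier on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic features: [credit_utilization, months_on_book, late_payments], scaled 0-1.
X = rng.random((500, 3))
# Synthetic label: higher utilization plus more late payments implies higher default odds.
y = ((X[:, 0] + X[:, 2]) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = GradientBoostingClassifier().fit(X_train, y_train)

print("holdout accuracy:", round(model.score(X_test, y_test), 3))
```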

#4 Technology Led Cross Organization Collaboration –

McKinsey predicts [1] that in the coming five to ten years, different regulatory ratios such as capital, funding, leverage and total loss-absorbing capacity will drive the composition of the balance sheet to support profitability. Thus the risk function will work with the finance and strategy functions to help optimize the enterprise balance sheet across various economic scenarios, and then provide executives with strategic choices (e.g. increase or shrink a loan portfolio) and the likely regulatory impacts across these scenarios. Leveraging analytical optimization tools, an improvement in return on equity (ROE) of anywhere between 50 and 400 basis points has been forecasted.
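As a toy illustration of the balance sheet optimization idea (not McKinsey's methodology), the sketch below uses scipy's linear programming solver to choose an asset mix that maximizes expected return under invented size and capital constraints.

```python
# Toy balance-sheet allocation using linear programming; all numbers are invented.
from scipy.optimize import linprog

# Decision variables: allocation to [retail loans, corporate loans, securities].
expected_return = [0.06, 0.08, 0.03]    # assumed per-unit expected returns
capital_weight = [0.75, 1.00, 0.20]     # assumed regulatory capital consumed per unit

# linprog minimizes, so negate returns to maximize them.
objective = [-r for r in expected_return]

# Constraints: total allocation <= 100 units, capital consumed <= 60 units.
A_ub = [[1, 1, 1], capital_weight]
b_ub = [100, 60]

result = linprog(objective, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print("optimal allocation:", result.x)
```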

The Value Drivers in Digitization of Risk Architectures..

McKinsey contends that the automation of credit processes and the digitization of the key steps in the credit value chain can yield cost savings of up to 50 percent. The benefits of digitizing credit risk go well beyond even these improvements. Digitization can also protect bank revenue, potentially reducing leakage by 5 to 10 percent. [2]

To give an example, by putting in place real-time credit decision making in the front line, banks reduce the risk of losing creditworthy clients to competitors as a result of slow approval processes. Additionally, banks can generate credit leads by integrating into their suite of products new digital offerings from third parties and FinTechs, such as unsecured lending platforms for businesses. Finally, credit risk costs can be further reduced through the integration of new data sources and the application of advanced-analytics techniques. These improvements generate richer insights for better risk decisions and ensure more effective and forward-looking credit risk monitoring. The use of machine-learning techniques, for example, can help banks improve the predictability of credit early-warning systems by up to 25 percent [2].

The Questions to Ask at the Start of Risk Transformation..

There are three questions every Enterprise needs to ask at the outset of this phase –

  • What customer focused business capabilities can be enabled across the organization by incorporating an understanding of the various kinds of Risk?
  • What aspects of this Risk transformation can be enabled by digital technology? Where are the current organizational and technology gaps that inhibit innovation?
  • How do we measure ROI and business success across these projects before and after the introduction of digital technology? How do we benchmark ourselves from a granular process standpoint against the leaders?

Conclusion..

As the above makes clear, traditional legacy-based approaches to risk data management and reporting do not lend themselves well to managing your business effectively. Even when things are going well, it is very difficult for executives and regulators to get a good handle on how the business is functioning. In the worst of times, the risk function can fail to function well as models do not perform effectively. It is not enough to take an incremental approach to improving current analytics approaches. The need of the hour is to incorporate state-of-the-art data management and analytic approaches based on Big Data, Machine Learning and Artificial Intelligence.

References

What Banks, Retailers & Payment Providers Should Do About Exploding Online Fraud in 2017..

Despite the introduction of new security measures such as EMV chip technology, 2016 saw the highest number of victims of identity fraud, according to a new report from Javelin Strategy & Research and the identity-theft-protection firm LifeLock Inc. [1]

Image Credit: Wall Street Journal

Background

The global credit card industry faces new business pressures in strategic areas. Chief among these shifts are burgeoning online transaction volumes, increased regulatory pressure (e.g. PSD2 in the European Union) and disruptive competition from FinTechs.

As discussed in various posts on this blog in 2016, Consumers, Banks, Law Enforcement, Payment Processors, Merchants and Private Label Card Issuers are faced with yet another critical & mounting business challenge – payment card fraud. Payment card fraud continued to expand at a massive clip in 2016, despite the introduction of security measures such as EMV chip cards, multi-factor authentication and secure point-of-sale terminals. As the accessibility and modes of usage of credit, debit and other payment cards burgeon and transaction volumes increase across the globe, Banks are losing tens of millions of dollars annually to fraudsters.

Regular readers of this blog will recollect that we spent a lot of time last year discussing credit cards and fraud in some depth. I have reproduced some of these posts below for background reading.

Big Data Counters Payment Card Fraud (1/3)…

Hadoop counters Credit Card Fraud..(2/3)

It’s time for a 2017 update on this issue.

Increasing Online Payments means rising Fraud

The growing popularity of alternative payment modes like mobile wallets (e.g. Apple Pay, Chase Pay and Android Pay) is driving increased payment volumes across both open loop and closed loop payments. Couple this with in-app payments (e.g. Uber) and banking providers' digital wallets, and mobile payments only keep rising. Retailers like Walmart, Nordstrom and Tesco have been offering more convenient in-store payments.

This relentless & secular trend towards online payments is clearly visible in all forms of consumer and merchant payments across the globe. It will only accelerate in 2017 as smartphone manufacturers continue to produce devices with more onscreen real estate, driving more mobile commerce. With IoT technology taking center stage, the day is not far off when connected devices (e.g. wearables) make their own payments.

However, the convenience of online payments also confers anonymity, which increases the risk of fraud. Most existing fraud platforms were designed for a previous era – that of point-of-sale payments – with their focus on magnetic stripes, chips and EMV technology. Online payments thus present various challenges that Banks and Merchants did not have to deal with on such a large scale.

According to the WSJ [1], more consumers (15.4 million in the US) became victims of identity fraud in 2016 than at any point in more than a decade. Despite new security protections implemented by the industry in the form of EMV, about $16 billion was lost to fraudulent purchases, with online fraud accounting for a 15% rise in cases.

Fraud is a pernicious problem which in a lot of cases leads to a much worse crime – identity theft. The U.S. Department of Justice (DOJ) terms identity theft “one of the most insidious forms of white collar crime”. Identity theft typically results in multiple instances of fraud, which exact a heavy toll on consumers, merchants, banks and the overall economy. Let us look at some specific recommendations for payment providers to consider.


Sadly, the much hyped “chip on your cards” is useless in countering online fraud..

Javelin Research noted in their study that the vast majority of identity theft fraud was linked to credit cards.[2]

Most credit card holders in the USA will remember 2016 as the year when electronic chip technology became ubiquitous and required at the majority of retail establishments. The media buzz around chips was that they would curtail fraudster activity. However, their arrival has been accompanied by a large increase in online theft. Card-not-present (CNP) fraud, which occurs when a thief buys something online or by phone, rose 40%. [2]

So did account takeover fraud, where thieves access ongoing customer accounts and change the contact details and security information; these incidents increased 61% compared to 2015 and totaled around 1.4 million. [2]

It is very clear that the bulk of fraud now happens in online transactions, and it is here that Banks must focus. And online is a technology game.

How should Banks, Retailers & Payment Providers Respond..

Online card fraud revolves around the unauthorized stealing of an individual's financial data. Fraudsters engage in a range of complex behaviors such as counterfeiting cards, committing mail fraud to open unauthorized accounts, and online Card Not Present (CNP) transactions. Fraud patterns are quickly copied and reproduced across diverse geographies.

Let us consider five key areas where industry players need to make investments.


#1 Augment traditional Fraud Detection Systems & Architectures  with Big Data capabilities

Traditional fraud detection systems have been built leveraging expert systems or rules engines. These expert systems are highly mature, as they encode the domain experience and intuition of fraud analysts. Fraud patterns are captured as business rules in an IF..THEN.. format and made available in these systems. These rules describe a range of well understood patterns, as shown below.

If Consumer Credit = yes And Transaction amount ≤ 1000 And Card present = yes Then Fraud = no

Typically hundreds of such rules are applied in realtime to incoming transactions.
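A minimal sketch of how such rules might be evaluated against incoming transactions is shown below; the rule set mirrors the example rule above plus one invented flagging rule, and is illustrative rather than a depiction of any specific rules engine.

```python
# Minimal rules-engine sketch applying IF..THEN style fraud rules to a transaction.
rules = [
    {   # Mirrors the example above: consumer credit, small amount, card present -> not fraud.
        "name": "low_value_card_present",
        "condition": lambda t: t["consumer_credit"] and t["amount"] <= 1000 and t["card_present"],
        "fraud": False,
    },
    {   # Invented flagging rule: large card-not-present transaction -> flag as fraud.
        "name": "large_cnp",
        "condition": lambda t: not t["card_present"] and t["amount"] > 5000,
        "fraud": True,
    },
]

def evaluate(transaction: dict):
    """Return the verdict of the first matching rule, or None if no rule fires."""
    for rule in rules:
        if rule["condition"](transaction):
            return {"rule": rule["name"], "fraud": rule["fraud"]}
    return None

print(evaluate({"consumer_credit": True, "amount": 250, "card_present": True}))
```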

Expert systems were built for the era of physical card usage and can thus only reason over a limited number of data attributes. In the online world they are focused on looking for factors such as known bad IP addresses or unusual login times, based on business rules and events. However, the scammers have also learnt to stay ahead of the scammed and are leveraging computing advances to come up with ever new ways of cheating the banks. Big Data can help transform the detection process by enriching the data available to the fraud process, including traditional customer data, transaction data, third party fraud data, social data and location-based data.

Big Data also provides capabilities to tackle the most complex types of fraud and to learn from fraud data & patterns so as to stay ahead of criminal networks. It is recommended that fraud systems be built using a layering paradigm, e.g. providing multiple levels of detection capabilities starting with a) configurable business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive). Fraud systems also need to adopt Big Data frameworks like Spark, Storm etc. to move to a real time mode. Frameworks like Spark make it fairly intuitive to implement advanced risk scoring based on user account behavior, suspicious behavior etc.

Advanced fraud detection systems augment the Big Data approach by building models of customer behavior at the macro level. They then use these models to detect anomalous transactions and flag them as potentially fraudulent.
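As a sketch of the anomaly detection idea, the example below fits an Isolation Forest over toy per-transaction features and flags an out-of-pattern purchase; the features, data and contamination setting are assumptions.

```python
# Illustrative anomaly detection over per-transaction behavioral features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Toy feature matrix: [amount, hour_of_day, distance_from_home_km].
normal = rng.normal(loc=[50, 14, 5], scale=[20, 3, 2], size=(300, 3))
suspicious = np.array([[2500, 3, 800]])    # large, late-night, far-from-home purchase
transactions = np.vstack([normal, suspicious])

detector = IsolationForest(contamination=0.01, random_state=7).fit(transactions)

# predict() returns -1 for transactions the model considers anomalous.
print(detector.predict(suspicious))
```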


#2 Create Dynamic Single View of Cardholders

The Single View provides comprehensive business advantages, as captured here – http://www.vamsitalkstech.com/?p=2517. The SVC brings the ability to view a customer as a single entity (a Customer 360) across all channels and to profile them, as well as the ability to segment those customers into populations based on their behavior patterns. This will vastly improve anomaly detection capabilities while also helping reduce the false positive problem.
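A minimal sketch of the behavioral segmentation idea, assuming toy per-customer features and k-means clustering; k-means is just one of several techniques that could be used here.

```python
# Illustrative behavioral segmentation of cardholders with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Toy behavioral profile per customer: [avg_spend, txns_per_week, online_share].
profiles = np.column_stack([
    rng.gamma(shape=2.0, scale=60.0, size=200),
    rng.poisson(lam=6, size=200),
    rng.random(200),
])

segments = KMeans(n_clusters=4, n_init=10, random_state=3).fit_predict(profiles)
print("customers per segment:", np.bincount(segments))
```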

#3 Adopt Graph Data processing capabilities

As noted above, fraudsters engage in a range of complex behaviors such as counterfeiting cards, committing mail fraud to open unauthorized accounts and online Card Not Present (CNP) transactions, and they operate in concert, so fraud patterns are quickly copied and reproduced across diverse geographies. Thus, fraud displays a strong social element, which leads to a higher risk of repetitive fraud across geographies.

The ability to link social network identities with customer profiles, to detect synthetic (fraudulent) customer profiles and to reduce false identities is a key capability to possess. As fraud detection algorithms constantly analyze thousands of data points, it is important to perform network-based analysis to understand whether an account, IP address or fraud pattern is occurring across different and seemingly unrelated actors. The ability to search for the same telephone numbers, email accounts and social network profiles – in addition to machine data such as similar IP addresses, device signatures and addresses – can be used to establish these connections. Thus, graph and network analysis lends a different dimension to detection.
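A minimal sketch of the network analysis idea using networkx: accounts that share a phone number or device signature are linked, and connected components then surface rings of seemingly unrelated actors. All data below is invented.

```python
# Illustrative link analysis: connect accounts that share contact or device attributes.
import networkx as nx

accounts = {
    "A1": {"phone": "555-0101", "device": "dev-9"},
    "A2": {"phone": "555-0101", "device": "dev-3"},   # shares a phone number with A1
    "A3": {"phone": "555-0202", "device": "dev-9"},   # shares a device signature with A1
    "A4": {"phone": "555-0303", "device": "dev-7"},   # no shared attributes
}

graph = nx.Graph()
graph.add_nodes_from(accounts)

ids = list(accounts)
for i, a in enumerate(ids):
    for b in ids[i + 1:]:
        shared = [k for k in ("phone", "device") if accounts[a][k] == accounts[b][k]]
        if shared:
            graph.add_edge(a, b, shared=shared)

# Each connected component is a candidate ring of related accounts.
print([sorted(component) for component in nx.connected_components(graph)])
```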


#4 Personalize Fraud Detection by Adopting Machine Learning

Incorporating as many sources of data (both deep and wide) as possible into the decisioning process helps majorly in analyzing fraud. This data includes not just existing customer databases and data on historical spending patterns, but also credit reports, social media data and other datasets (e.g. government watch-lists of criminal activity).

Some of these non-traditional sources are depicted below –

  • Geolocation Data
  • Purchase Channel Data
  • Website clickstream data
  • POS Sensor, Camera, ATM data
  • Social Media Data
  • Customer Complaint Data

Payment providers assess the risk score of transactions in realtime based on hundreds of such attributes. Big Data enables this reasoning over more detailed and granular attributes. Advanced statistical techniques are used to incorporate behavioral (e.g. a transaction that is out of line with a consumer's normal buying patterns), temporal and spatial signals. The models often weigh attributes differently from one another, thus separating the vast majority of good transactions from the small percentage of fraudulent ones.
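A toy sketch of attribute-weighted risk scoring; the signals, weights and threshold below are invented for illustration and would in practice be learned from data rather than hand-set.

```python
# Illustrative weighted risk score combining behavioral, temporal and spatial signals.
WEIGHTS = {
    "amount_vs_typical": 0.5,    # behavioral: how far above the customer's usual spend
    "odd_hour": 0.2,             # temporal: transaction at an unusual time of day
    "distance_from_home": 0.3,   # spatial: distance from the customer's usual locations
}

def risk_score(signals: dict) -> float:
    """Weighted sum of signals normalized to [0, 1]; higher means riskier."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

txn = {"amount_vs_typical": 0.9, "odd_hour": 1.0, "distance_from_home": 0.4}
score = risk_score(txn)
print("risk score:", round(score, 2), "-> review" if score > 0.6 else "-> approve")
```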

We discussed the fact that fraud happens at every stage of the process – account opening, customer on-boarding, account validation & cross verification, card usage & chargebacks etc. It is imperative that fraud models be created and leveraged across the entire business workflow.


#5 Automate the Fraud Monitoring, Detection Lifecycle

Business Process Management (BPM) is a more prosaic and mature field compared to Big Data and Predictive Analytics. Pockets of BPM implementations exist at every large Bank in customer facing areas such as issuance, on-boarding, reporting and compliance. However, the ability to design and deploy automated processes is critical across the cards fraud lifecycle. Areas like dispute management and false positive case resolution depend upon a robust Case Management capability – which a good BPM platform or tool can provide.

Improvements can be seen in agent productivity, the number of cases handled per agent and customer satisfaction. Errors and lags due to issues in manually driven processes come down. On the front end, providing customers with handy mobile apps to instantaneously report suspicious transactions, and tying those reports to automated handling, can drastically improve fraud detection, saving tens of millions of dollars. Major improvements can also be seen in compliance, dispute resolution and cross border customer service.

Conclusion  

Online fraud keeps going up year after year, so enterprises – especially banks and retailers – must remain vigilant. Online retail sales are expected to total nearly $28 trillion in 2020 [2], and it is a given that fraudsters will invent new techniques to steal customer data. Effective fraud prevention has become an essential part of the customer experience.

References

[1] WSJ – Credit Card Fraud Keeps Rising Despite New Security Chips – https://www.wsj.com/articles/credit-card-fraud-keeps-rising-despite-new-security-chipsstudy-1485954000

[2] Fortune – That Chip on Your Credit Card Isn’t Stopping Fraud After All – http://fortune.com/2017/02/01/credit-card-chips-fraud/