What Banks, Retailers & Payment Providers Should Do About Exploding Online Fraud in 2017..

Despite the introduction of new security measures such as EMV chip technology, 2016 saw the highest number of identity fraud victims in more than a decade, according to a new report from Javelin Strategy & Research and identity-theft-protection firm LifeLock Inc. [1]

Image Credit: Wall Street Journal

Background

Players in the global credit card industry are facing new business pressures in several strategic areas. Chief among these shifts are burgeoning online transaction volumes, increased regulatory pressures (e.g. PSD2 in the European Union) and disruptive competition from FinTechs.

As discussed in various posts on this blog in 2016, Consumers, Banks, Law Enforcement, Payment Processors, Merchants and Private Label Card Issuers face yet another critical and mounting business challenge: payment card fraud. Payment card fraud continued to expand at a massive clip in 2016, despite the introduction of security measures such as EMV chip cards, multi-factor authentication and secure point-of-sale terminals. As the accessibility and modes of usage of credit, debit and other payment cards burgeon and transaction volumes increase across the globe, Banks are losing billions of dollars every year to fraudsters.

Regular readers of this blog will recall that we spent a lot of time last year discussing Credit Card Fraud in some depth. I have reproduced some of these posts below for background reading.

Big Data Counters Payment Card Fraud (1/3)…

Hadoop counters Credit Card Fraud..(2/3)

It’s time for a 2017 update on this issue.

Increasing Online Payments means rising Fraud

The growing popularity of alternative payment modes like Mobile Wallets (e.g. Apple Pay, Chase Pay and Android Pay) is driving increased payment volumes across both open loop and closed loop payments. Add to this in-app payments (e.g. Uber) and banks' own digital wallets, and mobile payment volumes only keep growing. Retailers like Walmart, Nordstrom and Tesco have also been offering more convenient in-store payments.

This relentless & secular trend towards online payments is being clearly seen in all forms of consumer and merchant payments across the globe. This trend will only continue to accelerate in 2017 as smartphone manufacturers continue to produce devices that have more onscreen real estate. This will drive more mobile commerce. With IoT technology taking center stage, the day is not long off when connected devices (e.g. wearables) make their own payments.

However, the convenience of online payments also confers anonymity, which increases the risk of fraud. Most existing fraud platforms were designed for a previous era of point-of-sale payments, with their focus on magnetic stripes, chips and EMV technology. Online payments thus present challenges that Banks and Merchants did not have to deal with on such a large scale.

According to the WSJ [1], more consumers (15.4 million in the US) became victims of identity fraud in 2016 than at any point in more than a decade. Despite the new security protections the industry implemented in the form of EMV, about $16 billion was lost to fraudulent purchases, with online fraud accounting for a 15% rise in cases.

Fraud is a pernicious problem which in many cases leads to a much worse crime: identity theft. The U.S. Department of Justice (DOJ) terms identity theft “one of the most insidious forms of white collar crime”. Identity theft typically results in multiple instances of fraud, which exact a heavy toll on consumers, merchants, banks and the overall economy. Let us look at some specific recommendations for Payment providers to consider.


Sadly, the much hyped “Chip on your card” is useless in countering online fraud..

Javelin Research noted in their study that the vast majority of identity theft fraud was linked to credit cards.[2]

Most credit card holders in the USA will remember 2016 as the year when electronic chip technology became ubiquitous and required at the majority of retail establishments. The media buzz around chips was that they would curtail fraudster activity. However, their rollout has been accompanied by a large increase in online theft. Card-not-present (CNP) fraud, which is when a thief buys something online or by phone, rose 40%.[2]

Account takeover fraud, where thieves access existing customer accounts and change the contact details or security information, also rose sharply: it increased 61% compared to 2015, totaling around 1.4 million incidents.[2]

It is very clear that the bulk of fraud happens over online transactions. It is here that the Banks must focus now. And online is a technology game.

How should Banks, Retailers & Payment Providers Respond..

Online card fraud revolves around the unauthorized theft of an individual’s financial data. Fraudsters engage in a range of complex behaviors such as counterfeiting cards, committing mail fraud to open unauthorized accounts and making fraudulent online Card Not Present (CNP) transactions. Fraud patterns are quickly copied and reproduced across diverse geographies.

Let us consider five key areas where industry players need to make investments.


#1 Augment traditional Fraud Detection Systems & Architectures with Big Data capabilities

Traditional Fraud detection systems have been built on expert systems or rules engines. These expert systems are highly mature, as they capture the domain experience and intuition of fraud analysts. Fraud patterns, called business rules, are written in an IF...THEN format and loaded into these systems. These rules describe a range of well understood patterns, such as the one shown below.

If Consumer Credit = yes And Transaction amount ≤ 1000 And Card present = yes Then Fraud = no

Typically, hundreds of such rules are applied in real time to incoming transactions.
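To make the IF...THEN format concrete, here is a minimal Python sketch of how such rules might be represented and evaluated against an incoming transaction. The first rule encodes the example above; the second rule, the field names and the first-match evaluation order are assumptions for illustration, not the behavior of any particular rules engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Transaction:
    consumer_credit: bool
    amount: float
    card_present: bool


@dataclass
class Rule:
    name: str
    condition: Callable[[Transaction], bool]  # the IF ... part
    fraud: bool                               # the THEN Fraud = ... part


RULES: List[Rule] = [
    # IF Consumer Credit = yes AND Transaction amount <= 1000 AND Card present = yes THEN Fraud = no
    Rule("credit-small-card-present",
         lambda t: t.consumer_credit and t.amount <= 1000 and t.card_present,
         fraud=False),
    # Hypothetical second rule, for illustration only
    Rule("large-card-not-present",
         lambda t: not t.card_present and t.amount > 5000,
         fraud=True),
]


def evaluate(txn: Transaction, rules: List[Rule] = RULES) -> Optional[Tuple[str, bool]]:
    """Return (rule name, fraud verdict) for the first rule whose condition matches."""
    for rule in rules:
        if rule.condition(txn):
            return rule.name, rule.fraud
    return None  # no rule fired: hand the transaction to the next detection layer
```

For example, `evaluate(Transaction(True, 250.0, True))` returns `("credit-small-card-present", False)`, while a debit purchase of 2000 with the card present matches neither rule and returns `None`.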

Expert systems were built for the era of physical card usage and can thus only reason on a limited number of data attributes. In the online world they focus on factors such as known bad IP addresses or unusual login times, based on Business Rules and Events. However, scammers have learnt to stay ahead of the scammed and are leveraging computing advances to come up with ever new ways of cheating the banks. Big Data can help transform the detection process by enriching the data available to the fraud process, including traditional customer data, transaction data, third party fraud data, social data and location based data.

Big Data also provides capabilities to tackle the most complex types of fraud and to learn from fraud data & patterns in order to stay ahead of criminal networks. It is recommended that fraud systems be built using a layering paradigm, i.e. providing multiple levels of detection capabilities, starting with a) configured business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive). Fraud systems also need to adopt Big Data frameworks like Spark and Storm to move to a real time mode. Frameworks like Spark make it considerably easier to implement advanced risk scoring based on user account behavior, suspicious behavior etc.
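The layering idea can be sketched as follows, assuming (hypothetically) a rule layer that blocks known fraud patterns outright and a model layer that returns a fraud probability between 0 and 1. The `toy_model` stand-in, the feature names and the 0.9/0.6 thresholds are illustrative assumptions; a real deployment would plug in a trained model and run this logic on a streaming framework.

```python
from typing import Callable, Dict, List

Txn = Dict[str, float]  # hypothetical flat feature map for one transaction


def layered_decision(
    txn: Txn,
    rules: List[Callable[[Txn], bool]],
    model_score: Callable[[Txn], float],
    block_at: float = 0.9,
    review_at: float = 0.6,
) -> str:
    """Layer (a): business rules; layer (b): a learned risk score for everything else."""
    # Layer (a): any rule describing a known fraud pattern blocks the transaction outright.
    if any(rule(txn) for rule in rules):
        return "block"
    # Layer (b): the model's probability-like score decides the remaining cases.
    score = model_score(txn)
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "approve"


# Hypothetical inputs for illustration only.
BLOCKLIST_RULES = [lambda t: t["ip_on_blocklist"] == 1.0]


def toy_model(t: Txn) -> float:
    # Stand-in for a trained model: risk grows with the amount, capped at 1.0.
    return min(1.0, t["amount"] / 10_000)
```

With these stand-ins, `layered_decision({"ip_on_blocklist": 0.0, "amount": 7000.0}, BLOCKLIST_RULES, toy_model)` returns `"review"`: no rule fires, and the score of 0.7 falls between the two thresholds. Keeping the rules ahead of the model preserves the analysts' well understood patterns while letting the model handle the long tail.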

Advanced fraud detection systems augment the Big Data approach by building models of customer behavior at the macro level. They then use these models to detect anomalous transactions and flag them as potentially fraudulent.
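One simple way to model "normal" behavior per customer is to keep running statistics of each cardholder's spend and flag amounts that fall far outside them. The sketch below uses Welford's running mean and variance with a z-score threshold of 3; both the statistic and the threshold are assumptions chosen for illustration, and real behavior models use many more attributes than the amount alone.

```python
import math
from collections import defaultdict


class SpendProfile:
    """Running mean/variance of one customer's transaction amounts (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, amount: float) -> None:
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def zscore(self, amount: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (amount - self.mean) / std


profiles = defaultdict(SpendProfile)  # customer id -> behavior profile


def is_anomalous(customer_id: str, amount: float, threshold: float = 3.0) -> bool:
    """Score against the customer's own history; only unflagged amounts update the profile."""
    profile = profiles[customer_id]
    flagged = abs(profile.zscore(amount)) > threshold
    if not flagged:
        profile.update(amount)  # keep suspected fraud out of the "normal" baseline
    return flagged
```

After a customer's ordinary purchases of 40, 55, 60, 45 and 50, a purchase of 900 is flagged, while a later purchase of 52 is not.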


#2 Create Dynamic Single View of Cardholders

The Single View of Customer (SVC) provides comprehensive business advantages, as captured here: http://www.vamsitalkstech.com/?p=2517. The SVC makes it possible to view a customer as a single entity (or Customer 360) across all channels and to profile that customer. It also makes it possible to segment customers into populations based on their behavior patterns. This vastly improves anomaly detection capabilities while also helping reduce the false positive problem.
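As a rough sketch of the idea, assuming each channel system can emit (customer id, channel, amount) records, a single view can be built by folding every feed into one profile per customer and then assigning a behavioral segment. The record layout, segment names and cut-offs below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, float]  # hypothetical (customer_id, channel, amount) record


def build_single_view(feeds: Iterable[Iterable[Event]]) -> Dict[str, dict]:
    """Fold events from every channel system into one profile per customer."""
    view: Dict[str, dict] = defaultdict(
        lambda: {"channels": set(), "txn_count": 0, "total_spend": 0.0}
    )
    for feed in feeds:
        for customer_id, channel, amount in feed:
            profile = view[customer_id]
            profile["channels"].add(channel)
            profile["txn_count"] += 1
            profile["total_spend"] += amount
    return dict(view)


def segment(profile: dict) -> str:
    """Toy behavioral segmentation; the cut-offs are illustrative assumptions."""
    if len(profile["channels"]) >= 3:
        return "omnichannel"
    avg = profile["total_spend"] / profile["txn_count"]
    return "high-ticket" if avg >= 500 else "everyday"
```

Anomaly checks can then compare a transaction with the norms of its customer's segment instead of a single global norm, which is one way the single view helps with false positives.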

#3 Adopt Graph Data processing capabilities

Fraudsters engage in a range of complex behaviors such as counterfeiting cards, committing mail fraud to open unauthorized accounts and making fraudulent online Card Not Present (CNP) transactions. Fraud patterns are quickly copied and reproduced across diverse geographies as fraudsters operate in concert. Thus, fraud displays a strong social element, which leads to a higher risk of repetitive fraud across geographies.

A key capability to possess is the ability to use Social Network identity links between customer profiles to expose synthetic (or fraudulent) customer profiles and to weed out false identities. As fraud detection algorithms constantly analyze thousands of data points, it is important to perform network based analysis to understand whether an account, IP address or fraud pattern is recurring across different and seemingly unrelated actors. Searching for the same telephone numbers, email accounts, social network profiles etc., in addition to machine data such as similar IP addresses, device signatures and addresses, can establish these connections. Thus, graph and network analysis lends a different dimension to detection.
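A minimal sketch of this link analysis, assuming accounts carry attributes such as phone, email, IP address and device signature: treat every attribute value as a graph node, connect each account to its values, and read off connected components with a union-find structure. Accounts that land in the same component share identifiers directly or transitively, which makes them candidates for a fraud ring or synthetic identity review. The data layout is an assumption; large deployments would typically use a graph engine rather than an in-memory structure.

```python
from collections import defaultdict
from typing import Dict, List, Set


class DisjointSet:
    """Union-find over string node ids, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def linked_accounts(accounts: Dict[str, Dict[str, str]]) -> List[Set[str]]:
    """Group accounts sharing a phone, email, IP, device etc., directly or transitively."""
    ds = DisjointSet()
    for account, attrs in accounts.items():
        ds.find(f"acct:{account}")  # register accounts even if they have no attributes
        for kind, value in attrs.items():
            ds.union(f"acct:{account}", f"{kind}:{value}")  # attribute values become shared nodes
    groups: Dict[str, Set[str]] = defaultdict(set)
    for account in accounts:
        groups[ds.find(f"acct:{account}")].add(account)
    return [g for g in groups.values() if len(g) > 1]  # keep only multi-account clusters
```

In the usage below (hypothetical data), accounts A and B share an IP address and B and C share an email address, so A, B and C form one cluster even though A and C have nothing in common directly.

```python
accounts = {
    "A": {"phone": "555-0100", "ip": "10.0.0.1"},
    "B": {"email": "x@example.com", "ip": "10.0.0.1"},
    "C": {"email": "x@example.com", "device": "d9"},
    "D": {"phone": "555-0199"},
}
print(linked_accounts(accounts))  # one cluster containing A, B and C
```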


#4 Personalize Fraud Detection by Adopting Machine Learning

Incorporating as many sources of data (both deep and wide) as possible into the decisioning process helps significantly in analyzing fraud. This data includes not just existing customer databases, data on historical spending patterns etc. but also credit reports, social media data and other datasets (e.g. Government watch-lists of criminal activity).

Some of these non-traditional sources are depicted below –

  • Geolocation Data
  • Purchase Channel Data
  • Website clickstream data
  • POS Sensor, Camera, ATM data
  • Social Media Data
  • Customer Complaint Data

Payment Providers assess the risk score of transactions in real time based on hundreds of such attributes. Big Data enables reasoning on more detailed and granular attributes. Advanced statistical techniques are used to incorporate behavioral (e.g. a transaction is out of line with a consumer’s normal buying patterns), temporal and spatial factors. The models often weigh attributes differently from one another, thus separating the vast majority of good transactions from the small percentage of fraudulent ones.
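To illustrate how a model can weigh behavioral, temporal and spatial attributes differently, here is a hand-weighted logistic score in Python. The features, weights and cut-offs are invented for the example (in practice they would be learned from labelled transaction history), and the haversine distance is just one plausible way to derive the spatial feature.

```python
import math
from dataclasses import dataclass


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


@dataclass
class Features:
    amount: float        # transaction amount
    avg_amount: float    # customer's historical average amount (behavioral)
    hour: int            # local hour of the transaction, 0-23 (temporal)
    km_from_home: float  # distance from the customer's usual location (spatial)
    new_device: bool     # first time this device is seen for the customer


# Hypothetical, hand-picked weights; a real model would learn these from labelled data.
WEIGHTS = {"bias": -6.0, "amount_ratio": 1.2, "night": 1.0, "far_away": 2.0, "new_device": 1.5}


def risk_score(f: Features) -> float:
    """Logistic combination of weighted features; returns a value in (0, 1)."""
    ratio = min(f.amount / max(f.avg_amount, 1.0), 10.0)  # cap the ratio to bound its contribution
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount_ratio"] * ratio
         + WEIGHTS["night"] * (1.0 if f.hour < 6 else 0.0)
         + WEIGHTS["far_away"] * (1.0 if f.km_from_home > 500 else 0.0)
         + WEIGHTS["new_device"] * (1.0 if f.new_device else 0.0))
    return 1.0 / (1.0 + math.exp(-z))
```

With these weights, a routine afternoon purchase of 50 near home (average spend 50) scores below 0.05, while a 2000 purchase at 3 a.m. from a new device more than 500 km from home scores above 0.95. `haversine_km(51.5074, -0.1278, 48.8566, 2.3522)` (London to Paris) returns roughly 340 km.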

We discussed the fact that fraud happens at every stage of the process – account opening, customer on-boarding, account validation & cross verification, card usage & chargebacks etc. It is imperative that fraud models be created and leveraged across the entire business workflow.


#5 Automate the Fraud Monitoring, Detection Lifecycle

Business Process Management (BPM) is a more prosaic and mature field compared to Big Data and Predictive Analytics. Pockets of BPM implementations exist at every large Bank in customer facing areas such as issuance, on-boarding, reporting and compliance. However, the ability to design and deploy automated processes is critical across the Cards fraud lifecycle. Areas like dispute management and false positive case resolution depend upon a robust Case Management capability, which a good BPM platform or tool can provide.

Improvements can be seen in agent productivity, the number of cases handled per agent and customer satisfaction. Errors and lags caused by human driven manual processes come down. On the front end, providing customers with handy mobile apps to instantly report suspicious transactions, and tying those reports to automated handling, can drastically improve fraud detection, thus saving tens of millions of dollars. Major improvements can also be seen in compliance, dispute resolution and cross border customer service.
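Case management is, at its core, a controlled state machine with an audit trail. The sketch below shows that idea in plain Python; the state names and allowed transitions are assumptions for illustration and do not reflect the process model of any specific BPM product.

```python
from enum import Enum
from typing import List, Tuple


class CaseState(Enum):
    OPEN = "open"
    UNDER_REVIEW = "under_review"
    CUSTOMER_CONTACTED = "customer_contacted"
    CONFIRMED_FRAUD = "confirmed_fraud"
    FALSE_POSITIVE = "false_positive"
    CLOSED = "closed"


# Hypothetical allowed transitions; anything else is rejected, which keeps the process auditable.
TRANSITIONS = {
    CaseState.OPEN: {CaseState.UNDER_REVIEW},
    CaseState.UNDER_REVIEW: {CaseState.CUSTOMER_CONTACTED, CaseState.CONFIRMED_FRAUD,
                             CaseState.FALSE_POSITIVE},
    CaseState.CUSTOMER_CONTACTED: {CaseState.CONFIRMED_FRAUD, CaseState.FALSE_POSITIVE},
    CaseState.CONFIRMED_FRAUD: {CaseState.CLOSED},
    CaseState.FALSE_POSITIVE: {CaseState.CLOSED},
    CaseState.CLOSED: set(),
}


class FraudCase:
    def __init__(self, case_id: str) -> None:
        self.case_id = case_id
        self.state = CaseState.OPEN
        self.history: List[Tuple[CaseState, str]] = [(CaseState.OPEN, "system")]  # (state, actor)

    def move(self, new_state: CaseState, actor: str) -> None:
        """Advance the case, recording who did it; illegal jumps raise ValueError."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append((new_state, actor))
```

A typical false positive flows OPEN, UNDER_REVIEW, CUSTOMER_CONTACTED (for example, the customer confirms the purchase in a mobile app), FALSE_POSITIVE and CLOSED, leaving a five-entry audit trail; reopening a closed case raises an error.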

Conclusion  

Online fraud keeps rising year after year, so enterprises, especially banks and retailers, must remain vigilant. Online retail sales are expected to total nearly $28 trillion in 2020 [2], and it is a given that fraudsters will invent new techniques to steal customer data. Effective fraud prevention has become an essential part of the customer experience.

References

[1] WSJ – Credit Card Fraud Keeps Rising Despite New Security Chips – https://www.wsj.com/articles/credit-card-fraud-keeps-rising-despite-new-security-chipsstudy-1485954000

[2] Fortune – That Chip on Your Credit Card Isn’t Stopping Fraud After All – http://fortune.com/2017/02/01/credit-card-chips-fraud/

A Holistic New Age Technology Approach To Countering Payment Card Fraud (3/3)…

This final post in this three part series on Payment Card Fraud covers the evolving business landscape in countering Payment Card Fraud and will make specific recommendations from a best practices, futuristic & thought leadership standpoint.

Business Background – 

As discussed previously, Consumers, Banks, Law Enforcement, Payment Processors, Merchants and Private Label Card Issuers are faced with payment card fraud on a massive scale. As the accessibility and modes of usage of credit, debit and other payment cards burgeon and transaction volumes increase, Banks are losing tens of billions of dollars every year to fraudsters. Meridian Research estimates the annual figure at about $189 billion.

Fraud is a pernicious problem which can also lead to identity theft. The U.S. Department of Justice (DOJ) terms identity theft “one of the most insidious forms of white collar crime”. Identity theft typically results in multiple instances of fraud, which exact a heavy toll on consumers, merchants and the overall economy.

Improved consumer access to credit, complex financial products, faster electronic communications, a changing regulatory landscape and a high degree of competition among financial institutions make it easier than ever for perpetrators to steal identities and falsify information. Identity theft is estimated to claim 100,000 victims a year, according to the U.S. Federal Trade Commission (FTC) and the Consumer Data Industry Association. The economic and emotional loss to consumers is staggering. It may take as long as several years to restore a victim’s credit reputation, and in the meantime financial and job opportunities may be lost. According to FTC estimates, the average identity theft victim doesn’t discover the problem for 13 months and then invests an average of $1,173 and 175 hours attempting to repair his or her credit record.[1]

Added to all this, from a regulatory perspective, the Consumer Financial Protection Bureau (CFPB), created as part of the Dodd-Frank Act, prohibits unfair and abusive acts or practices in the collection of consumer debts.

Thus, fraud has become such a complex problem that it can only be tackled via a strategic, meaningful & multi-faceted operational IT model. This model should give financial organizations a migration path from more traditional approaches toward innovation, all while balancing risk. Fraud is a dynamic problem where new modes & threats are constantly evolving; these call for counter approaches that extend beyond the capabilities of older fraud prevention software.

The regulatory authorities in the US (where the bulk of the problem has been centered) are now mandating the use of Europay, MasterCard and Visa (EMV chip) technology, which should help reduce the risk of fraudulent point of sale (POS) transactions. To be fair, Visa and MasterCard have also developed EMV technology that facilitates secure Card Not Present (CNP) transactions over the internet and phone.

However, as we have also discussed, fraudster rings now increasingly resemble sophisticated IT operations, so EMV alone may not put a large dent in the overall problem.

Business Requirements – 

  1. Provide capabilities to tackle the most complex types of fraud and to learn from fraud data & patterns to be able to stay ahead of criminal networks
  2. Provide comprehensive Omnichannel abilities to detect fraud across the multiple (and growing) avenues that payment cards are used by consumers
  3. Ability to view a customer as a single entity (or Customer 360) across all those channels & to be able to profile those
  4. Ability to segment those customers into populations based on their behavior patterns. This will vastly help improve anomaly detection capabilities while also helping reduce the false positive problem
  5. Incorporate many sources of data (both deep and wide) into the decisioning process in analyzing fraud. This includes not just the existing customer databases, data on historical spending patterns etc. but also credit reports, social media data and other datasets (e.g. Government watchlists of criminal activity)
  6. Provide multiple levels of detection capabilities starting with a) configuring business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive)
  7. Provide multiple levels of dashboards ranging from the Descriptive (Business Intelligence) to the Prescriptive (business simulation as well as optimization)
  8. Out-of-the-box integration with watch list information provided by Governments (and peer Banks) and an ability to incorporate it into the decisioning engine in real time

Technology Requirements – 

  1. Ingest (& cleanse) real time Card usage data to get complete view of every transaction with a view to detecting potential fraud
  2. Support multiple ways of ingest across a pub-sub messaging paradigm, clickstreams, logfile aggregation and batch data – at a minimum
  3. Allow business users to specify thousands of rules that signal fraud, e.g. when the same credit card is used from multiple IP addresses within a very short span of time
  4. Support batch oriented analytics that provide predictive and historical models of performance
  5. As much as possible, eliminate false positives as these cause inconvenience to customers and also inhibit transaction volumes
  6. Support a very high degree of scalability – tens of millions of transactions a day and hundreds of TB of historical information
  7. Predict cardholder behavior (using a 360 degree view) to provide better customer service
  8. Help target customer transactions for personalized communications on transactions that raise security flags
  9. Deliver alerts the ways customers want — web, text, email and mail etc
  10. Track these events end to end from a strategic perspective across dashboards and predictive models
  11. Help provide a complete picture of high value customers to help drive loyalty programs
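Requirement 3 gives a typical rule: the same card used from multiple IP addresses within a very short span of time. A minimal sliding-window sketch of that rule follows; the 300-second window, the limit of two distinct IPs and the in-memory state are assumptions, and a streaming engine would keep this state partitioned by card.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class MultiIpRule:
    """Flag a card seen from more than `max_ips` distinct IPs within `window_s` seconds."""

    def __init__(self, window_s: float = 300.0, max_ips: int = 2) -> None:
        self.window_s = window_s
        self.max_ips = max_ips
        self.events: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def observe(self, card_id: str, ip: str, ts: float) -> bool:
        """Record one authorization attempt and return True if the rule fires."""
        q = self.events[card_id]
        q.append((ts, ip))
        # Evict events that fell out of the window (assumes events arrive in timestamp order).
        while q and q[0][0] <= ts - self.window_s:
            q.popleft()
        return len({addr for _, addr in q}) > self.max_ips
```

For example, three attempts on one card from three different IP addresses within two minutes make the third `observe` call return `True`, whereas the same three addresses spread 400 seconds apart never fire.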

Next Gen Fraud Detection Platform –


Illustration: Integrated Capabilities in a Next Gen Fraud Detection Platform

The Fraud detection platform of the future will have four key integrated capabilities: an ability to support multiple channels of ingress (Omnichannel), an ability to store & process massive volumes of data at scale on an agile, Big Data & Cloud based deployment architecture, advanced predictive modeling & Data Mining capabilities and, finally, a native approach to Business Process Management (BPM).

1. Native Omnichannel Capabilities

Banking is an increasingly high tech endeavor with younger customers increasingly seeking personalization capabilities across the multiple avenues available to them. Thus, Omnichannel is the future of consumer banking. According to research from Mintel/Compremedia, almost 27 percent of all US consumers still receive at least one credit card offer per month via mail. The company’s research also shows that 43 percent of all customers (both acquisition and retention) receive an offer via digital communications, 25 percent by direct mail, 15 percent in-person and 17 percent by phone or other channels. In terms of response and application, 52 percent apply online, 18 percent by mail, 17 percent by mobile, 8 percent by phone and 6 percent other channels. [2]

Within the domain of Payment Cards themselves, the aim is to offer customers multiple avenues to buy goods while making the process as seamless and secure as possible. Firms would do well to consider conducting an ideal “Customer Journey Path Analysis” as the crucial first step. The goal is to have a seamless experience for customers across internet, phone, kiosk, POS and other channels while reducing the fraud that can arise from a siloed data architecture.

Once the customer’s journey is well understood in terms of segments as well as individual attributes & behaviors, Banks can use this data to aim for a variety of positive outcomes, such as selling the customer a higher segment credit card or additional banking services, or simply having them pay their online newspaper subscription with their credit card. Direct marketing is a key component of Omnichannel marketing in today’s world. Using it effectively in online, mobile and offline environments will drive positive results.[2]

Big Data is dramatically changing the traditional approach to fraud detection with advanced analytic solutions that are powerful and fast enough to detect fraud in real time, and that can also build models based on historical data (and deep learning) to proactively identify risks. This is key to improving customer experience as well as loyalty.

2. Data Processing at Scale (NoSQL and Hadoop) –


The business reasons why Hadoop is emerging as the best choice for fraud detection are –

a) Real time insights – Hadoop can be used to generate insights at a latency of a few milliseconds that can assist Banks in detecting fraud as soon as it happens
b) A Single View of Customer/Transaction & Fraud enabled by Hadoop
c) Loosely coupled yet Cloud Ready Architecture
d) Highly Scalable yet Cost effective

The technology reasons why Hadoop is emerging as the best choice for fraud detection: From a component perspective Hadoop supports multiple ways of running models and algorithms that are used to find patterns of fraud and anomalies in the data to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm etc. and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop. The combination of Big Data as well as Omnichannel allows the implementation of strong customer segmentation and profiling, which helps usher in fraud detection models based on risk scoring.

3. Predictive Analytics (Machine Learning and Deep Learning) –

Customer profiling and Behavior Patterns are key techniques used to extract meaningful information from the data produced in Big Data projects. Most fraud detection solutions operate on individual transaction data which may not contain the entire gamut of information needed to predict probability of fraud. This needs to be combined with past historical information based on the consumer’s spending patterns and history to detect anomalous (and fraudulent) behavior.

Many solutions operate using transactional data, such as credit card purchase transactions, or other types of data that change over time. In its raw form, this data is very difficult to use in predictive models for several reasons. First, an isolated transaction contains very little information about the behavior of the individual who generated the transaction. In addition, transaction patterns change rapidly over time. Finally, this type of data can often be highly complex.
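The usual remedy is to turn raw transactions into behavioral features computed over the customer's own history. The sketch below derives a few such features for an incoming transaction; the feature set and the 24-hour and 30-day windows are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

DAY = 86_400.0  # seconds


@dataclass
class Txn:
    ts: float      # seconds since epoch
    amount: float


def behavioral_features(history: List[Txn], current: Txn) -> Dict[str, float]:
    """Describe `current` relative to the same customer's earlier transactions in `history`."""
    last_day = [t for t in history if current.ts - t.ts <= DAY]
    last_30d = [t for t in history if current.ts - t.ts <= 30 * DAY]
    mean_30d = sum(t.amount for t in last_30d) / len(last_30d) if last_30d else 0.0
    prev_ts = max((t.ts for t in history), default=None)
    return {
        "txn_count_24h": float(len(last_day)),                  # velocity
        "mean_amount_30d": mean_30d,                            # recent spending level
        "amount_vs_30d_mean": current.amount / mean_30d if mean_30d else 0.0,
        "secs_since_prev": current.ts - prev_ts if prev_ts is not None else -1.0,
    }
```

A single transaction of 500 says little on its own; once it is expressed as roughly nine times the customer's recent average, one hour after the previous purchase, a model has something it can reason about.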

4. Business Process Management & Business Rules Engines –

Business process management (BPM) as an industry discipline has been around for over a decade, but only recently have verticals like financial services realized how important it is to business automation. The core ideas around BPM are not that complex or arcane. For the newbies among us — every enterprise is composed of repeatable business activities done by human actors. These steps are the core of the functioning of the enterprise, indeed they are the very lifeblood. There is significant business value in being able to document, simulate, manage, automate, and monitor business processes.

Financial services are fertile ground for business process automation, since most banks across their various lines of business are simply a collection of core and differentiated processes. Examples are consumer banking (with processes including onboarding customers, collecting deposits, conducting business via multiple channels, and compliance with regulatory mandates such as KYC and AML); investment banking (including straight-through-processing, trading platforms, prime brokerage, and compliance with regulation); payment services; and wealth management (including modeling portfolio positions and providing complete transparency across the end-to-end life cycle). The key takeaway is that driving automation can result not just in better business visibility and accountability on behalf of various actors; it can also drive revenue and contribute significantly to the bottom line. BPM enables enterprise business and IT users to document, simulate, manage, automate and monitor business processes and policies. It is designed to empower business and IT users to collaborate more effectively, so business applications can be changed more easily and quickly.

Within the Payment Card Fraud area, BPM tools can be leveraged to create fraud detection workflows, alerts & dashboards that automate the process, making it highly repeatable, auditable & amenable to improvement. Every business actor in the fraud domain can be brought into the business process and have actions delegated to them.

Open Source alternatives are now very viable in the BPM space as well. For instance, Red Hat’s JBoss BPM Suite is a flexible and powerful BPM platform which offers integrated business process modeling, execution, and monitoring capabilities for numerous use cases. A market leading platform, it can be easily deployed in a variety of environments (on prem/virtual/cloud etc.) and, as a result, can be integrated into multiple runtime architectures (appservers, webservice APIs etc.) and configured to support a variety of business scenarios.

The final word – 

Most existing fraud solutions that are in use today have been created using a (now) legacy approach and not built for today’s requirements for Omnichannel enabled eCommerce, and are not a good fit for the millennial customer. Industry players that lack dynamic and flexible fraud prevention technology platforms are at a severe competitive disadvantage, ultimately running the risk of slowing revenue growth, customer defections & reputational risk.

References – 

[1] WSJ – “PayPal fights Fraud With Machine Learning and Human Detectives” – http://blogs.wsj.com/cio/2015/08/25/paypal-fights-fraud-with-machine-learning-and-human-detectives/

[2] “In Search of the Omnichannel Card Prospect”

Hadoop counters Credit Card Fraud..(2/3)

This article is the second installment in a three part series that covers one of the most critical issues facing the financial industry – Payment Card Fraud. While the first (and previous) post discussed the global scope of the problem & the business ramifications –  this post will discuss a candidate Big Data Architecture that can help financial institutions turn the tables on Fraudster Networks. The final post will cover the evolving business landscape in this sector – in the context of disruptive technology innovation (predictive & streaming analytics) and will make specific recommendations from a thought leadership standpoint.

Traditional Approach to Fraud Monitoring & Detection – 

Traditional Fraud detection systems have been focused on looking for factors such as known bad IP addresses or unusual login times based on Business Rules and Events. Advanced fraud detection systems augment the above approach with building models of customer behavior at the macro level. Then they would use these models to detect anomalous transactions and flag them as potentially being fraudulent. However, the scammers have also learnt to stay ahead of the scammed and are leveraging computing advances to come up with ever new ways of cheating the banks.

Case in point [1] –

In 2008 and 2009, PayPal tested several fraud detection packages, finding that none could provide correct analysis fast enough, Dr. Wang (head of Fraud Risk Sciences – PayPal) said. She declined to name the packages but said that the sheer amount of data PayPal must analyze slowed those systems down.

Why Big Data and Hadoop for Fraud Detection?

Big Data is dramatically changing that approach with advanced analytic solutions that are powerful and fast enough to detect fraud in real time but also build models based on historical data (and deep learning) to proactively identify risks.

The business reasons why Hadoop is emerging as the best choice for fraud detection are –

  1. Real time insights – Hadoop can be used to generate insights at a latency of a few milliseconds that can assist Banks in detecting fraud as soon as it happens
  2. A Single View of Customer/Transaction & Fraud enabled by Hadoop
  3. Loosely coupled yet Cloud Ready Architecture
  4. Highly Scalable yet Cost effective

The technology reasons why Hadoop is emerging as the best choice for fraud detection are:

  1. Hadoop (Gen 2) is not just a data processing platform. It has multiple personas – a real time, streaming data, interactive platform for any kind of data processing (batch, analytical, in memory & graph based) along with search, messaging & governance capabilities built in – all of which support fraud detection architecture patterns
  2. Hadoop provides not just massive data storage capabilities but also multiple frameworks to process the data, resulting in response times of milliseconds with the utmost reliability, whether for realtime data or historical processing of backend data
  3. Hadoop can ingest billions of events at scale thus supporting the most mission critical analytics irrespective of data size
  4. From a component perspective Hadoop supports multiple ways of running models and algorithms that are used to find patterns of fraud and anomalies in the data to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm etc. and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop
  5. Hadoop is not all about highly scalable filesystems and processing engines. It also provides native integration with highly scalable NoSQL options including a database called HBase. HBase has been proven to support near real-time ingest of billions of data streams. HBase provides near real-time, random read and write access to tables containing billions of rows and millions of columns
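For the HBase case, much of the low-latency behavior depends on row key design. A common pattern, sketched here in plain Python with no HBase client calls, is to prefix the card id with a small salt (to spread writes across regions) and append a reversed timestamp so that a prefix scan returns a card's newest transactions first. The bucket count, the `|` delimiter and the key layout are assumptions, not a prescribed schema.

```python
import hashlib
import struct

NUM_SALT_BUCKETS = 16  # assumption: matches the number of pre-split regions
MAX_TS_MS = 2**63 - 1  # largest signed 64-bit value


def salt_for(card_id: str) -> int:
    """Deterministic salt so every row of one card lands in the same bucket."""
    return hashlib.md5(card_id.encode()).digest()[0] % NUM_SALT_BUCKETS


def scan_prefix(card_id: str) -> bytes:
    """Prefix shared by all rows of one card (salt byte + card id + delimiter)."""
    return bytes([salt_for(card_id)]) + card_id.encode() + b"|"


def txn_row_key(card_id: str, ts_ms: int) -> bytes:
    """Row key = salt + card id + reversed big-endian timestamp (newest sorts first)."""
    return scan_prefix(card_id) + struct.pack(">q", MAX_TS_MS - ts_ms)
```

Because HBase sorts row keys as raw bytes, `txn_row_key("4111", 2_000)` sorts before `txn_row_key("4111", 1_000)`, so reading a card's most recent activity during authorization is a short scan from `scan_prefix("4111")`.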

Again, from [1] –

PayPal processes more than 1.1 petabytes of data for 169 million active customer accounts, according to James Barrese, PayPal’s chief technology officer. During customer transactions, subsets of the data are analyzed in real-time.

Since 2009, PayPal has been building and modifying its fraud analytics systems, incorporating new open-source technologies as they have evolved. For example, the company uses Hadoop to store data, and related analytics tools, such as the Kraken. A data warehouse from Teradata Corp. stores structured data. The fraud analysis systems run on both grid and cloud computing infrastructures.

Several kinds of algorithms analyze thousands of data points in real-time, such as IP address, buying history, recent activity at the merchant’s site or at PayPal’s site and information stored in cookies. Results are compared with external data from identity authentication providers. Each transaction is scored for likely fraud, with suspicious activity flagged for further automated and human scrutiny, Mr. Barrese said.
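The idea of scoring each transaction and routing suspicious ones to further automated and human scrutiny can be sketched as follows. This is not PayPal's method. It is a toy logistic score over a few hypothetical features (IP/country mismatch, new device, amount relative to the customer's norm, merchant risk). The hand-picked weights and the approve/review/decline thresholds are assumptions; a real system would learn its weights from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class TxnFeatures:
    ip_country_mismatch: bool   # IP geolocation differs from the card's home country
    new_device: bool            # device/cookie not previously seen for this account
    amount_vs_typical: float    # amount divided by the customer's typical amount
    merchant_risk: float        # 0..1 risk score for the merchant (hypothetical input)


# Illustrative, hand-picked weights; a production model would learn these.
WEIGHTS = {"ip_country_mismatch": 1.5, "new_device": 1.0,
           "amount_vs_typical": 0.4, "merchant_risk": 2.0}
BIAS = -4.0


def fraud_score(f: TxnFeatures) -> float:
    """Combine features linearly, then squash to a 0..1 score with the logistic function."""
    z = (BIAS
         + WEIGHTS["ip_country_mismatch"] * f.ip_country_mismatch
         + WEIGHTS["new_device"] * f.new_device
         + WEIGHTS["amount_vs_typical"] * f.amount_vs_typical
         + WEIGHTS["merchant_risk"] * f.merchant_risk)
    return 1.0 / (1.0 + math.exp(-z))


def decide(score: float, review_at: float = 0.5, decline_at: float = 0.9) -> str:
    """Map a score to an action; 'review' stands for automated plus human scrutiny."""
    if score >= decline_at:
        return "decline"
    if score >= review_at:
        return "review"
    return "approve"
```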

After implementing multiple large real time data processing applications using Big Data technologies in financial services, we present an architectural pattern & technology stack that has been proven in very large production deployments. The key goal is to process millions of events per second, tens of billions of events per day and tens of terabytes of financial data per day – as is to be expected in a large Payment Processor or Bank.

Business Requirements

  1. Ingest (& cleanse) real time Card usage data to get complete view of every transaction with a view to detecting potential fraud
  2. Support multiple ways of ingest across a pub-sub messaging paradigm, clickstreams, logfile aggregation and batch data – at a minimum
  3. Allow business users to specify thousands of rules that signal fraud, e.g. when the same credit card is used from multiple IP addresses within a very short span of time
  4. Support batch oriented analytics that provide predictive and historical models of performance
  5. As much as possible, eliminate false positives as these cause inconvenience to customers and also inhibit transaction volumes
  6. Support a very high degree of scalability – tens of millions of transactions a day and hundreds of TB of historical information
  7. Predict cardholder behavior (using a 360 degree view) to provide better customer service
  8. Help target customer transactions for personalized communications on transactions that raise security flags
  9. Deliver alerts the ways customers want: web, text, email, mail etc.
  10. Track these events end to end from a strategic perspective across dashboards and predictive models
  11. Help provide a complete picture of high value customers to help drive loyalty programs
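Requirement 3 gives the example rule of one card being used from multiple IP addresses within a short span. That rule can be expressed as a per-card sliding window. The sketch below is a minimal version that assumes events for a given card arrive in timestamp order. The 10-minute window and the limit of three distinct IPs are hypothetical defaults, not values from the source. A rule engine holding thousands of such rules would evaluate many objects like this one against each event.

```python
from collections import defaultdict, deque


class MultiIpRule:
    """Flags a card seen from more than `max_ips` distinct IPs within `window_s` seconds."""

    def __init__(self, window_s: float = 600.0, max_ips: int = 3):
        self.window_s = window_s
        self.max_ips = max_ips
        self._events = defaultdict(deque)  # card_id -> deque of (timestamp, ip)

    def observe(self, card_id: str, ts: float, ip: str) -> bool:
        """Record one transaction and return True if the rule fires for this card."""
        q = self._events[card_id]
        q.append((ts, ip))
        # Evict events that have fallen out of the sliding window.
        while q and q[0][0] <= ts - self.window_s:
            q.popleft()
        distinct_ips = {event_ip for _, event_ip in q}
        return len(distinct_ips) > self.max_ips
```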

Design and Architecture

The architecture thus needs to consider two broad data paradigms: data in motion and data at rest.

Data in motion is defined as streaming data that is being sent into an information architecture in real time. Examples of data in motion include credit card swipes, e-commerce tickets, web-based interactions and social media feeds that are a result of purchases or feedback about services. The challenge in this area is to assimilate a huge volume of data, filter it, derive meaning from it and send it to downstream systems such as a business process management (BPM) system or partner systems. Managing the event data so that changing business rules and regulations are consistently applied to it is another key facet in this area.

Data at rest is defined as data that has been collected and ingested in a form that conforms to enterprise data architecture and governance specifications. This data needs to be assimilated or federated with pre-existing sources so that the business can query it in a read/write manner from a strategic and long-term perspective.

A Quick Note on Data Science and its Applicability to Fraud Monitoring & Detection – 

Various posts in this blog have discussed the augmented capability of financial organizations to acquire, store and process large volumes of data using commodity (x86) hardware. At the same time, technologies such as Hadoop and Spark have enabled the collection, organization and analysis of Big Data at scale. The convergence of cost effective storage and scalable processing allows us to extract richer insights from data. These insights can then be operationalized to provide commercial and social value.

Data science is a term that refers to the process of extracting meaningful insights from large volumes of structured and unstructured data. Data science is about scientific exploration of data to extract meaning or insight, and the construction of software systems to utilize such insights in a business context. This involves the art of discovering data insights combined with the science of operationalizing them. A data scientist uses a combination of machine learning, statistics, visualization, and computer science to extract valuable business insights hiding in data and builds operational systems to deliver that value.

Data Science based approaches are core to the design and architecture of a Fraud Detection System. Data Mining techniques such as clustering and classification are used to find patterns and associations within large groups of data. The machine learning components fall into two categories: ‘supervised’ and ‘unsupervised’ learning. These methods look for accounts, customers, suppliers, etc. that behave ‘unusually’ in order to output suspicion scores, rules or visual anomalies, depending on the method. (Ref – Wikipedia).
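Unsupervised methods can produce a suspicion score without any labeled fraud. As a deliberately simple stand-in for the clustering-style techniques mentioned above, the sketch below scores a new amount by how many standard deviations it sits above the customer's own history. The function name is hypothetical, and real systems combine many such behavioral features rather than relying on amount alone.

```python
import statistics


def amount_suspicion(history: list[float], amount: float) -> float:
    """Return how many standard deviations `amount` sits above the customer's mean.

    An unsupervised baseline: no fraud labels are needed, only past behaviour.
    """
    if len(history) < 2:
        return 0.0  # not enough history to judge
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # Constant history: anything above it is maximally unusual.
        return 0.0 if amount <= mean else float("inf")
    return (amount - mean) / stdev
```

A supervised approach would instead train a classifier on transactions already labeled as fraudulent or legitimate.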

It needs to be kept in mind that Data science is a cross-functional discipline. A data scientist is part statistician, part developer and part business strategist. The Data Science team collaborates with an extended umbrella team which includes visualization specialists, developers, business analysts, data engineers, applied scientists, architects, LOB owners and DevOps (ref – Hortonworks). The success of data science projects often relies on the communication, collaboration, and interaction that takes place with the extended team, both internally and possibly externally to their organizations.

Reference Architecture


Illustration 1:  Candidate Architecture Pattern for a Fraud Detection Application 

 

The key technology components of the above reference architecture stack include:

  1. Information sources are depicted at the left. These encompass a variety of machine and human actors, potentially transmitting thousands of real time messages per second. These are your typical Credit Card Swipes, Online transactions, Fraud databases and other core Banking data.
  2. A highly scalable messaging system to help bring these feeds into the architecture, normalize them and send them on for further processing. Apache Kafka is chosen for this tier. Realtime data is published by Payment Processing systems over Kafka topics. Each transaction has hundreds of attributes that can be analyzed in real time to detect patterns of usage. We leverage Kafka's integration with Apache Storm to read one message at a time and process it, for example by persisting the data into an HBase cluster. In a modern data architecture built on Apache Hadoop, Kafka (a fast, scalable and durable message broker) works in combination with Storm, HBase (and Spark) for real-time analysis and rendering of streaming data. Kafka has been used to move data as varied as geospatial data from fleets of long-haul trucks, financial data and sensor data from HVAC systems in office buildings.
  3. A Complex Event Processing tier that can process these feeds at scale to understand relationships among them, where the relationships among these events are defined by business owners in non-technical language or by developers in a technical language. Apache Storm integrates with Kafka to process incoming data. Storm architecture is covered briefly in the section below.
  4. Once the machine learning models are defined, incoming data received from the Storm/Spark tier will be fed into the models to predict outlier transactions or potential fraud. When specific patterns that indicate potential fraud are matched, business process workflows are triggered that follow a well-defined process modeled in advance by the business. The external lookup pattern illustrated below is one example:
    • Credit card transaction data arrives as a stream (typically through Kafka)
    • An external system has information about the credit card holder’s recent location (collected from GPS on a mobile device and/or from mobile towers)
    • Each credit card transaction is looked up against the user’s current location
    • If the geographic distance between the credit card transaction location and the user’s recent known location is significant (say 100 miles), the credit card transaction is flagged as potential fraud

Illustration 2: External Lookup Pattern for a Fraud Detection Application (Sheetal Dolas – Hortonworks)
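The distance check at the heart of the external lookup pattern can be sketched with the haversine great-circle formula. The sketch makes three assumptions: the external location service has already returned the cardholder's last known (latitude, longitude), or None; both points are in degrees; and the 100-mile threshold from the example is used as the default. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_by_location(txn_lat: float, txn_lon: float, last_known, threshold_miles: float = 100.0) -> bool:
    """`last_known` is the cardholder's recent (lat, lon) from the external system, or None."""
    if last_known is None:
        return False  # no location signal; defer to other rules
    return haversine_miles(txn_lat, txn_lon, *last_known) > threshold_miles


nyc = (40.7128, -74.0060)
la = (34.0522, -118.2437)
print(flag_by_location(*la, nyc))  # a Los Angeles swipe vs. a New York location is flagged
```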

  5. Data that has business relevance and needs to be kept for offline or batch processing can be handled using the storage platform based on the Hadoop Distributed Filesystem (HDFS). The idea is to deploy Hadoop oriented workloads (MapReduce or Machine Learning) to understand fraud patterns as they occur over a period of time. Historical data can be fed into the Machine Learning models described in step 4 and commingled with the streaming data ingested in step 2.
  6. Horizontal scale-out is preferred as a deployment approach as this helps the architecture scale linearly as the loads placed on the system increase over time
  7. Output data elements can be written out to HDFS and managed by HBase. From here, reports and visualizations can easily be constructed.
  8. One can optionally layer in search and/or workflow engines to present the right data to the right business user at the right time.


Messaging Broker Tier

The messaging broker tier (based on Apache Kafka) is the first point of entry into the system. It fundamentally hosts a set of message queues (topics, in Kafka's terminology). The broker tier needs to be highly scalable while supporting a variety of cross-language clients, including Java, C, C++, C#, Ruby, Perl, Python and PHP, and protocols. Using various messaging patterns to support real-time messaging, this tier integrates applications, endpoints and devices quickly and efficiently. The architecture of this tier needs to be flexible enough to be deployed in various configurations to connect to customized solutions at every endpoint, payment outlet, partner, or device.
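Kafka scales a topic by splitting it into partitions. Messages that share a key go to the same partition, which preserves their relative order. Keying messages by card token therefore keeps every event for one card in sequence for the downstream rules. The sketch below illustrates that idea in plain Python. It uses `zlib.crc32` as a stand-in hash, not the murmur2 hash used by Kafka's Java client, and `partition_for` and `route` are hypothetical helpers rather than Kafka APIs.

```python
import zlib
from collections import defaultdict


def partition_for(card_token: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always maps to the same partition.

    Stand-in hash: zlib.crc32, NOT the murmur2 hash that Kafka's Java client uses.
    """
    return zlib.crc32(card_token.encode("utf-8")) % num_partitions


def route(messages, num_partitions: int):
    """Append (key, value) messages to per-partition logs, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```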


Illustration 3: Multistage Data Refinery Pipeline for a Fraud Detection Application

Apache Storm is an open source, distributed, reliable, fault-tolerant system for real time processing of large volumes of data. Spouts and Bolts are the two main components in Storm, and they work together to process streams of data.

  • Spout: Works on the source of data streams. In this use case, Spout will read realtime transaction data from Kafka topics.
  • Bolt: A Spout passes streams of data to a Bolt, which processes them and passes the results either to a data store or to another Bolt.
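The spout-to-bolt flow can be illustrated with plain Python generators, each stage consuming the previous stage's stream. This is only an analogy for how data moves through a topology. It does not use Storm's actual API (Storm topologies are typically written against its Java classes), and the stage names, JSON message format and amount threshold are hypothetical.

```python
import json
from typing import Iterable, Iterator


def spout(raw_messages: Iterable[str]) -> Iterator[str]:
    """Stands in for a Kafka spout: emits raw messages one at a time."""
    for msg in raw_messages:
        yield msg


def parse_bolt(stream: Iterable[str]) -> Iterator[dict]:
    """Parses JSON records; drops malformed ones (a real topology might route them elsewhere)."""
    for msg in stream:
        try:
            yield json.loads(msg)
        except json.JSONDecodeError:
            continue


def threshold_bolt(stream: Iterable[dict], limit: float) -> Iterator[dict]:
    """Tags each transaction with a simple amount-based flag and passes it downstream."""
    for txn in stream:
        txn["flagged"] = txn.get("amount", 0) > limit
        yield txn


def sink_bolt(stream: Iterable[dict], store: list) -> None:
    """Stands in for a bolt that persists records to a data store such as HBase."""
    for txn in stream:
        store.append(txn)


store: list = []
raw = ['{"card": "t1", "amount": 25}', "not json", '{"card": "t2", "amount": 5000}']
sink_bolt(threshold_bolt(parse_bolt(spout(raw)), limit=1000), store)
```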


Illustration 4: Kafka-Storm integration

Storage Tier

There are broad needs for two distinct data tiers that can be identified based on business requirements.

  1. Some data needs to be pulled in near realtime, accessed in a low latency pattern as well as have calculations performed on this data. The design principle here needs to be “Write Many and Read Many” with an ability to scale out tiers of servers
  2. In memory technology based on Spark is very suitable for this use case as it not only supports a very high write rate but also gives users the ability to store, access, modify and transfer extremely large amounts of distributed data. A key advantage here is that Hadoop based architectures can pool memory and scale out across a cluster of servers in a horizontal manner. Further, computation can be pushed into the tiers of servers running the data grid as opposed to pulling data into the computation tier.
  3. As the data volumes increase in size, compute can scale linearly to accommodate them. The standard means of doing so is through techniques such as data distribution and replication. Replicas are nothing but copies of the same segment or piece of data that are stored across (aka distributed) a cluster of servers for purposes of fault tolerance and speedy access. Smart clients can retrieve data from a subset of servers by understanding the topology of the grid. This speeds up query performance for tools like business intelligence dashboards and web portals that serve the business community.
  4. The second data access pattern that needs to be supported is storage for data that is older. This is typically large scale historical data. The primary data access principle here is “Write Once, Read Many.” This layer contains the immutable, constantly growing master dataset stored on a distributed file system like HDFS. Besides being a storage mechanism, the data stored in this layer can be formatted in a manner suitable for consumption from any tool within the Apache Hadoop ecosystem like Hive or Pig or Mahout.
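Point 3 notes that smart clients can find replicas by understanding the topology of the grid. One technique that makes this possible is rendezvous (highest-random-weight) hashing: every client that knows the server list computes the same owners for a key, with no central lookup. It is shown here as an illustrative technique, not as what HDFS does; in HDFS the NameNode tracks block locations. The function names and the replica count of three are assumptions.

```python
import hashlib


def _weight(server: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (server, key) pair."""
    digest = hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key: str, servers: list[str], n_replicas: int = 3) -> list[str]:
    """Rendezvous hashing: rank servers by weight for this key and take the top n.

    Any client with the same server list computes the same owners, so it can go
    straight to a server holding the data.
    """
    ranked = sorted(servers, key=lambda s: _weight(s, key), reverse=True)
    return ranked[:n_replicas]
```

A useful property of this scheme is that removing a server only moves the keys that server owned. Every other key keeps its replica set.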

The final word [1] – 

Several kinds of algorithms analyze thousands of data points in real-time, such as IP address, buying history, recent activity at the merchant’s site or at PayPal’s site and information stored in cookies. Results are compared with external data from identity authentication providers. Each transaction is scored for likely fraud, with suspicious activity flagged for further automated and human scrutiny, Mr. Barrese said.

For example, “a very bad sign” is when one account shows IP addresses from 10 parts of the world, Dr. Wang said, because it suggests the account might have been hacked.

The system tags the account for review by human experts, she said. “They might discover that the IP addresses are at airports and this guy is a pilot,” she said. Once verified, that intelligence is fed back into PayPal’s systems. Humans don’t make the system faster, but they make real-time decisions as a check against, and supplement to, the algorithms, she said.
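The loop described here has three parts: flag an account whose IP addresses span many regions, let a human reviewer verify it, then feed that verdict back into the automated check. A minimal sketch follows. Several pieces are assumptions: the regions are taken to have already been derived from IP geolocation; the threshold of five distinct regions is an illustrative default (the quote mentions 10 parts of the world as "a very bad sign"); and the class is hypothetical, not PayPal's system.

```python
from dataclasses import dataclass, field


@dataclass
class GeoSpreadCheck:
    """Flags accounts whose recent IPs span many regions, unless a human reviewer
    has already verified the pattern (e.g. the account holder is a pilot)."""
    max_regions: int = 5
    verified_accounts: set = field(default_factory=set)

    def needs_review(self, account_id: str, recent_ip_regions: list[str]) -> bool:
        if account_id in self.verified_accounts:
            return False
        return len(set(recent_ip_regions)) > self.max_regions

    def record_human_verdict(self, account_id: str, legitimate: bool) -> None:
        """Feed the analyst's finding back into the automated check."""
        if legitimate:
            self.verified_accounts.add(account_id)
        else:
            self.verified_accounts.discard(account_id)
```

In practice such an allowlist entry would probably expire or be scoped, so that a later account takeover is not masked by an old verdict.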

The combination of open-source technology, online caching, algorithms and “human detectives,” she said, “gives us the best analytical advantage.”

References – 

[1] “PayPal fights Fraud With Machine Learning and Human Detectives” – From WSJ.com

http://blogs.wsj.com/cio/2015/08/25/paypal-fights-fraud-with-machine-learning-and-human-detectives/

Big Data Counters Payment Card Fraud (1/3)…

This article is the first installment in a three part series that covers one of the most critical issues facing the financial industry – Payment Card Fraud. Payment Cards include Credit, ATM & Debit Cards. This first post discusses the origin and scope of the problem. The next post will discuss a candidate Big Data Architecture that can help financial institutions turn the tables on Fraudster Networks. The final post will cover the evolving technology landscape in this sector – in the context of disruptive technology innovation in predictive & streaming analytics enabled by Big Data.

“We are confronting a criminal population that continues to improve its sophistication and its attack vectors, so we can’t stand still,” says Ellen Richey, chief enterprise risk officer at Visa Inc. “You see the criminal capability evolving on the technology side,” she said. “They are getting into the systems of [Visa] stakeholders and other companies that process payments, and they are able to encrypt their own movements on networks, sometimes for months, and exfiltrate the data.” (Source – The Wall Street Journal)

Payment Card fraud has mushroomed into a massive challenge for consumers, financial institutions, regulators and law enforcement. As the accessibility and usage of Credit Cards burgeons and transaction volumes increase, Banks are losing tens of billions of dollars on an annual basis to fraudsters. Meridian Research puts the annual figure at about $189 billion.

The Nilson Report depicts the global scale of the problem as of 2015. Nilson counted the fraud losses incurred by banks and merchants on all credit, debit, and prepaid general purpose and private label payment cards issued worldwide. These reached $16.31 billion in 2015, when global card volume totaled $28.844 trillion. This means that for every $100 in volume, 5.65¢ was fraudulent. Fraud, which grew by 19%, outpaced volume, which grew by 15%.
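The per-$100 figure, and what the two growth rates imply for the fraud rate itself, can be checked with a few lines of arithmetic. The inputs are the Nilson figures quoted above. The "rate change" line is simply derived from the stated 19% and 15% growth figures.

```python
fraud_losses_usd = 16.31e9      # Nilson Report, 2015
card_volume_usd = 28.844e12     # Nilson Report, 2015

# Fraud as a share of volume, expressed in cents per $100 (equivalently, basis points).
cents_per_100 = fraud_losses_usd / card_volume_usd * 100 * 100

# Fraud grew 19% while volume grew 15%, so fraud per dollar of volume rose by about:
rate_change = 1.19 / 1.15 - 1

print(f"{cents_per_100:.2f} cents per $100")  # 5.65 cents per $100
print(f"fraud rate up {rate_change:.1%}")     # fraud rate up 3.5%
```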


                 Figure 1 – Payment Card Fraud Worldwide 2015  (source – The Nilson Report)

In 2015, fraud losses incurred by banks and merchants on all credit, debit, and prepaid general purpose and private label payment cards (worldwide) reached $16 billion while global card volume totaled almost $29 trillion[2]. This means that for every $100 in volume, almost 6¢ was fraudulent. Fraud increases (up by 19%) also handily outpaced growth in transaction volume, which grew by 15%.

The US Federal Reserve defines credit card fraud as “Unauthorized account activity by a person for which the account was not intended. Operationally, this is an event for which action can be taken to stop the abuse in progress and incorporate risk management practices to protect against similar actions in the future.”

The US leads the world in Payment Card fraud, with 48% of the total fraud occurring in the States. The problem bedevils both Card Issuers and Merchants. High profile hacks at Target, TJX Companies and Sony Pictures only serve to illustrate the scale of the challenge.


                    Figure 2 – US Share of Payment Card Fraud (source – The Nilson Report)

Types of Credit Card Fraud – 

The various categories of credit card fraud include – application fraud (where an unauthorized person opens a credit card account using stolen personal information), lost or stolen payment card information (misplaced or stolen card details, typically used to make online purchases), counterfeit cards, and account takeovers. Oftentimes Credit or Payment Card fraud also involves identity theft. According to the FTC, identity theft is escalating at 40 percent a year and is particularly problematic compared with more traditional forms of financial fraud.


                                              Figure 3 – US Card Fraud by Type

As can be seen from the above pie chart, the highest amount occurs via online fraud. Organized criminal groups now resemble sophisticated and agile IT operations. Gartner reports that online fraud is 12 times more likely than offline fraud. Why is this occurring at such an alarming clip, and why now?

The FTC (Federal Trade Commission) estimates that enhanced consumer access to various forms of payments, sophisticated technology & high speed communications make it ever easier for fraudsters.

How Do Big Data and Hadoop Change the Game in Fraud Detection?

Banks are increasingly turning to predictive analytics to predict and prevent fraud in real-time. That can sometimes be an inconvenience for customers who are traveling or making large purchases, but it is a necessary inconvenience today if banks are to reduce billions in losses.

A recent WSJ article highlights advances made in the area of fraud detection and management at Visa by using Big Data techniques. The company estimates that its models have helped identify at least $2 billion worth of annual fraud, and have also given it the chance to address those vulnerabilities before that money was lost.

In August 2011, Visa, one of the early pioneers, moved to an analytics platform that harnesses the power of Big Data. The term may not have been coined yet, but the idea was to tackle larger and more varied sets of transaction data using intensive algorithms, running on underlying hardware and software that perform calculations faster and more cheaply than traditional databases or analytic engines.

Big Data is dramatically changing the traditional approach with advanced analytic solutions that are powerful and fast enough to detect fraud in real time, while also building models based on historical data (and deep learning) to proactively identify risks.

Traditional (pre-Hadoop) fraud detection systems were designed for an older era and were primarily based on Business Rules and Complex Events. However, they fall short in the following ways.

  1. Static Data Analysis vs Advanced Predictive Analytics – Traditional systems have been focused on looking for a few static factors such as known bad IP addresses, unusual login times or excessive transaction amounts. These systems are typically based on hardcoded business rules and a barebones eventing model. Advanced fraud detection systems augment this approach by building models of customer behavior at the macro level. They then use these models to detect anomalous transactions and flag them as potentially fraudulent. However, the scammers have also learnt to stay ahead of the scammed and are leveraging computing advances to come up with ever new ways of cheating the banks. To accommodate larger data sets, Visa has updated its database technology. In 2010, it began using Hadoop, a software framework that is based on open-source technology from Google. It is designed to quickly process huge amounts of information from disparate sets, and to work with clusters of lower-cost machines, instead of expensive servers[1].
  2. Scope and Precision of Data Coverage – Big Data enables Banks to incorporate far more information into the decisioning process than was possible before. Per Visa[1], their earlier analytic models studied as little as 2% of transaction data. Adopting Big Data gives them complete coverage and a massive breadth of attributes for every transaction. Now the company said it endeavors to analyze all of its data. In the past, the company based its security assumptions on average fraud rates for merchant categories, like grocery stores. Now it said it can analyze the actual market, right down to individual merchant terminals. That allows it to drill down on hundreds of attributes, such as average authorization volumes, average ticket sizes and frequency of purchases that turn out to be fraudulent, the company said[1].
  3. Fraud Detection in Realtime – As Visa points out, the ability to analyze much larger & richer data sets helps them identify fraud more quickly, virtually within milliseconds of a payment card being used. While one transaction at a merchant might not look suspicious, a data set that includes hundreds or thousands of transactions makes it easier to spot a problem, such as a tampered PIN pad. The new analytic engine can study as many as 500 aspects of a transaction at once. That’s a sharp improvement from 2005, when the company’s previous analytic engine could study only 40 aspects at once[1].
  4. Fraud Detection via Machine Learning – Big Data brings machine learning to the table. Using a variety of techniques (both supervised and unsupervised learning methods), Banks and Payment Networks can build models that detect anomalous transactions with a very high degree of confidence. These models can also be updated very quickly. From [1] – And instead of using just one analytic model, as it did in 2005, Visa now operates 16 models, covering different segments of its market, such as geographic regions. The models can be updated much more quickly, too. An attribute can be added to a model in as little as an hour. Back in 2005, it would take two or three days to make that happen.
  5. Big Data now supports Cyber Security – As Hadoop undergoes multiple changes and evolves into a true Application Platform, an important use-case emerges: Hadoop as a framework for security analytics via frameworks like OpenSOC. We will cover the detailed architecture in the next post, but making big data part of technical security strategy, by providing a platform for applying anomaly detection and incident forensics to the data loss problem, has particular relevance to the Payment Card Fraud problem.
    In the future, Big Data will play a bigger role in authenticating users, reducing the need for the system to ask users for multiple proofs of their identity, according to Visa's Richey, and 90% or more of transactions will be processed without asking customers those extra questions, because algorithms that analyze their behavior and the context of the transaction will dispel doubts. “Data and authentication will come together,” Richey said. The data-driven improvement in security accomplishes two strategic goals at once, according to Richey. It improves security itself, and it increases trust in the brand, which is critical for the growth and well-being of the business, because consumers won’t put up with a lot of credit-card fraud. “To my mind, that is the importance of the security improvements we are seeing,” she said. “Our investments in data and analysis are baseline to our ability to thrive and grow as a company.”[1]
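Point 4 describes Visa running 16 models, each covering a different market segment such as a geographic region. A natural way to serve such models is a router that picks the segment's model and falls back to a global one. The sketch below is hypothetical: the segment key (`region`), the fallback behavior and the lambda "models" are illustrative assumptions, not a description of Visa's system. It also shows that swapping in an updated model can be a cheap operation.

```python
from typing import Callable, Dict

Model = Callable[[dict], float]  # takes a transaction, returns a fraud score in [0, 1]


class SegmentedScorer:
    """Routes each transaction to the model for its market segment,
    falling back to a global model when no segment-specific model exists."""

    def __init__(self, default_model: Model):
        self.default_model = default_model
        self.models: Dict[str, Model] = {}

    def register(self, segment: str, model: Model) -> None:
        # Replacing a segment's model is a dictionary update, so a retrained
        # model can be swapped in without touching the other segments.
        self.models[segment] = model

    def score(self, txn: dict) -> float:
        model = self.models.get(txn.get("region", ""), self.default_model)
        return model(txn)
```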

Thus, from a pure technology stack perspective, Hadoop is emerging as the best choice for fraud detection, namely because –

  1. Hadoop (Gen 2) is not just a data processing platform. It has multiple personas – a real time, streaming data, interactive platform for any kind of data processing (batch, analytical, in memory & graph based) along with search, messaging & governance capabilities built in – all of which support fraud detection architecture patterns.
  2. Hadoop provides not just massive data storage capabilities but also multiple frameworks to process the data, resulting in response times of milliseconds with the utmost reliability, whether for realtime data or for historical processing of backend data.
  3. Hadoop can ingest billions of events at scale thus supporting the most mission critical analytics irrespective of data size.
  4. From a component perspective, Hadoop supports multiple ways of running the models and algorithms used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Developers have a choice of MapReduce, Spark (via Java, Python, R), Storm and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop.
  5. Hadoop provides a highly scalable NoSQL option – HBase. HBase has been proven to support near real-time ingest of billions of data streams. HBase provides near real-time, random read and write access to tables containing billions of rows and millions of columns.

Visa estimates that its models have identified $2 billion in potential annual incremental fraud opportunities, and have also given it the chance to address those vulnerabilities before that money was lost[1].

Having set the stage, the next post will present a real world reference architecture, covering end to end infrastructure and application re-architecture, for any organization that is considering a Big Data initiative in the area of fraud detection and prevention.

References

[1] http://blogs.wsj.com/cio/2013/03/11/visa-says-big-data-identifies-billions-of-dollars-in-fraud/

[2] http://www.nilsonreport.com/