Big Data Counters Payment Card Fraud (1/3)…

This article is the first installment in a three-part series that covers one of the most critical issues facing the financial industry – Payment Card Fraud. Payment Cards include Credit, ATM & Debit Cards. This first post discusses the origin and scope of the problem. The next post will discuss a candidate Big Data Architecture that can help financial institutions turn the tables on Fraudster Networks. The final post will cover the evolving technology landscape in this sector, in the context of disruptive innovation in predictive & streaming analytics enabled by Big Data.

“We are confronting a criminal population that continues to improve its sophistication and its attack vectors, so we can’t stand still,” says Ellen Richey, chief enterprise risk officer at Visa Inc. “You see the criminal capability evolving on the technology side,” she said. “They are getting into the systems of [Visa] stakeholders and other companies that process payments, and they are able to encrypt their own movements on networks, sometimes for months, and exfiltrate the data.” (Source – The Wall Street Journal)

Payment Card fraud has mushroomed into a massive challenge for consumers, financial institutions, regulators and law enforcement. As the accessibility and usage of Credit Cards burgeon and transaction volumes increase, Banks are losing tens of billions of dollars annually to fraudsters. Meridian Research puts the annual figure at about $189 billion.

The Nilson Report depicts the global scale of the problem as of 2015. Nilson counted the fraud losses incurred by banks and merchants on all credit, debit, and prepaid general purpose and private label payment cards issued worldwide. These reached $16.31 billion in 2015, when global card volume totaled $28.844 trillion. This means that for every $100 in volume, 5.65¢ was fraudulent. Fraud, which grew by 19%, outpaced volume, which grew by 15%.


Figure 1 – Payment Card Fraud Worldwide 2015 (source – The Nilson Report)

In 2015, fraud losses incurred by banks and merchants on all credit, debit, and prepaid general purpose and private label payment cards (worldwide) reached $16 billion while global card volume totaled almost $29 trillion[1]. This means that for every $100 in volume, almost 6¢ was fraudulent. Fraud increases (up by 19%) also handily outpaced growth in transaction volume, which grew by 15%.
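The per-$100 figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the two Nilson numbers cited in the text; the function and variable names are chosen here for illustration.

```python
# Figures quoted in the text from The Nilson Report (2015).
fraud_losses_usd = 16.31e9    # $16.31 billion in fraud losses
card_volume_usd = 28.844e12   # $28.844 trillion in global card volume


def fraud_cents_per_100_dollars(losses: float, volume: float) -> float:
    """Return fraud losses in cents for every $100 of card volume."""
    dollars_per_100 = losses / volume * 100  # dollars lost per $100 of volume
    return dollars_per_100 * 100             # convert dollars to cents


rate = fraud_cents_per_100_dollars(fraud_losses_usd, card_volume_usd)
print(f"{rate:.2f} cents per $100")  # prints "5.65 cents per $100"
```

Rounded to the nearest cent, this is the "almost 6¢" quoted in the summary.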

The US Federal Reserve defines credit card fraud as “Unauthorized account activity by a person for which the account was not intended. Operationally, this is an event for which action can be taken to stop the abuse in progress and incorporate risk management practices to protect against similar actions in the future.”

The US leads the world in Payment Card fraud, with 48% of the total occurring in the States. The problem bedevils both Card Issuers and Merchants. High-profile hacks at Target, TJX Companies, Sony Pictures and others only serve to illustrate the scale of the challenge.


Figure 2 – US Share of Payment Card Fraud (source – The Nilson Report)

Types of Credit Card Fraud – 

The various categories of credit card fraud include application fraud (where an unauthorized person opens a credit card account using stolen personal information), lost or stolen payment card information (misplaced or stolen card details, typically used to make online purchases), counterfeit cards, and account takeovers. Oftentimes Credit or Payment Card fraud also involves identity theft. According to the FTC, identity theft is escalating at 40 percent a year and is particularly problematic compared with more traditional forms of financial fraud.


Figure 3 – US Card Fraud by Type

As can be seen from the above pie chart, the highest amount occurs via online fraud. Organized criminal groups now resemble sophisticated and agile IT operations. Gartner reports that online fraud is 12 times more likely than offline fraud. Why is this occurring at such an alarming clip, and why now?

The FTC (Federal Trade Commission) notes that enhanced consumer access to various forms of payment, sophisticated technology and high-speed communications make it ever easier for fraudsters.

How Big Data and Hadoop Change the Game in Fraud Detection

Banks are increasingly turning to predictive analytics to predict and prevent fraud in real time. That can sometimes be an inconvenience for customers who are traveling or making large purchases, but today it is a necessary inconvenience if banks are to reduce billions in losses.

A recent WSJ article highlights advances made in fraud detection and management at Visa through Big Data techniques. The company estimates that its models have helped identify at least $2 billion worth of annual fraud, and have also given it the chance to address those vulnerabilities before that money was lost.

In August 2011, Visa, one of the early pioneers, moved to a Big Data based analytic platform. The term may not have been coined yet, but the idea was to tackle larger and more varied sets of transaction data using intensive algorithms, running on underlying hardware and software that perform calculations faster and more cheaply than traditional databases or analytic engines.

Big Data is dramatically changing fraud detection, with advanced analytic solutions that are powerful and fast enough to detect fraud in real time and that also build models based on historical data (and deep learning) to proactively identify risks.
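As a rough illustration of building a model from historical data and then scoring a new transaction against it, here is a minimal sketch. It assumes a deliberately simple model (the mean and standard deviation of one cardholder's past amounts); the function names, the threshold and the sample amounts are all hypothetical, and production systems use far richer features and models than this.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarize a cardholder's past transaction amounts (hypothetical model)."""
    return {"mean": mean(history), "stdev": stdev(history)}


def is_anomalous(amount, profile, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    if profile["stdev"] == 0:
        # All past amounts were identical; any different amount is unusual.
        return amount != profile["mean"]
    z_score = abs(amount - profile["mean"]) / profile["stdev"]
    return z_score > threshold


# Made-up historical amounts for one card, used only for illustration.
history = [42.10, 18.75, 63.00, 25.40, 51.20, 33.80]
profile = build_profile(history)

print(is_anomalous(47.00, profile))   # False: close to past behavior
print(is_anomalous(950.00, profile))  # True: far outside past behavior
```

The profile is computed once from history (the batch side), while the check on each new amount is cheap enough to run at transaction time (the real-time side).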

Traditional (pre-Hadoop) fraud detection systems were designed for an older era and were primarily based on Business Rules and Complex Events. However, they fall short in the following ways.

  1. Static Data Analysis vs Advanced Predictive Analytics – Traditional systems have focused on looking for a few static factors such as known bad IP addresses, unusual login times or excessive transaction amounts. These systems are typically based on hardcoded business rules and a barebones eventing model. Advanced fraud detection systems augment this approach by building models of customer behavior at the macro level. They then use these models to detect anomalous transactions and flag them as potentially fraudulent. However, the scammers have also learnt to stay ahead of the scammed and are leveraging computing advances to come up with ever new ways of cheating the banks. To accommodate larger data sets, Visa has updated its database technology. In 2010, it began using Hadoop, an open-source software framework inspired by technology that Google described in its published papers. It is designed to quickly process huge amounts of information from disparate sets, and to work with clusters of lower-cost machines, instead of expensive servers[1].
  2. Scope and Precision of Data Coverage – Big Data enables Banks to incorporate far more information into the decisioning process than was possible before. Per Visa[1], its earlier analytic models studied as little as 2% of transaction data. Adopting Big Data gives the company completeness and a massive breadth of attributes for every transaction. The company now says it endeavors to analyze all of its data. In the past, it based its security assumptions on average fraud rates for merchant categories, like grocery stores. Now it says it can analyze the actual market, right down to individual merchant terminals. That allows it to drill down on hundreds of attributes, such as average authorization volumes, average ticket sizes and frequency of purchases that turn out to be fraudulent, the company said[1].
  3. Fraud Detection in Realtime – As Visa points out, the ability to analyze much larger & richer data sets helps it identify fraud more quickly – virtually within milliseconds of a payment card being used. While one transaction at a merchant might not look suspicious, a data set that includes hundreds or thousands of transactions makes it easier to spot a problem, such as a tampered PIN pad. The new analytic engine can study as many as 500 aspects of a transaction at once. That’s a sharp improvement from 2005, when the company’s previous analytic engine could study only 40 aspects at once[1].
  4. Fraud Detection via Machine Learning – Big Data brings machine learning to the table. Using a variety of techniques (both supervised and unsupervised learning methods), Banks and Payment Networks can build models that detect anomalous transactions with a very high degree of confidence. These models can also be updated very quickly. From [1]: instead of using just one analytic model, as it did in 2005, Visa now operates 16 models, covering different segments of its market, such as geographic regions. The models can be updated much more quickly, too. An attribute can be added to a model in as little as an hour. Back in 2005, it would take two or three days to make that happen.
  5. Big Data now supports Cyber Security – As Hadoop evolves into a true Application Platform, an important use-case emerges: Hadoop as a foundation for security analytics via frameworks like OpenSOC. We will cover the detailed architecture in the next post. Making big data part of a technical security strategy, by providing a platform for applying anomaly detection and incident forensics to the data loss problem, has particular relevance to the Payment Card Fraud problem.
    In the future, Big Data will play a bigger role in authenticating users, reducing the need for the system to ask users for multiple proofs of their identity, according to Visa’s Richey. She expects that 90% or more of transactions will be processed without asking customers those extra questions, because algorithms that analyze their behavior and the context of the transaction will dispel doubts. “Data and authentication will come together,” Richey said. The data-driven improvement in security accomplishes two strategic goals at once, according to Richey. It improves security itself, and it increases trust in the brand, which is critical for the growth and well-being of the business, because consumers won’t put up with a lot of credit-card fraud. “To my mind, that is the importance of the security improvements we are seeing,” she said. “Our investments in data and analysis are baseline to our ability to thrive and grow as a company.”[1]
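Points 2 and 3 in the list above describe moving from category-level averages to analysis of individual merchant terminals, where a pattern such as a tampered PIN pad only shows up across many transactions. The sketch below is one hypothetical way to express that idea: it compares each terminal's observed fraud rate with the overall rate and flags outliers. The terminal IDs, the `ratio` and `min_count` thresholds and the sample data are assumptions made for illustration, not details of Visa's system.

```python
from collections import defaultdict


def terminal_fraud_counts(transactions):
    """Return {terminal_id: (fraud_count, total_count)} from (terminal_id, was_fraud) pairs."""
    counts = defaultdict(lambda: [0, 0])
    for terminal_id, was_fraud in transactions:
        counts[terminal_id][0] += int(was_fraud)
        counts[terminal_id][1] += 1
    return {t: (fraud, total) for t, (fraud, total) in counts.items()}


def suspicious_terminals(transactions, ratio=3.0, min_count=50):
    """List terminals whose fraud rate exceeds `ratio` times the overall rate."""
    counts = terminal_fraud_counts(transactions)
    all_fraud = sum(fraud for fraud, _ in counts.values())
    all_total = sum(total for _, total in counts.values())
    baseline = all_fraud / all_total if all_total else 0.0
    return sorted(
        terminal
        for terminal, (fraud, total) in counts.items()
        # Require enough volume so that a single bad transaction does not trip the flag.
        if total >= min_count and fraud / total > ratio * baseline
    )


# Made-up data: T-C has 12 fraudulent transactions out of 100.
transactions = (
    [("T-A", i == 0) for i in range(200)]
    + [("T-B", False) for _ in range(200)]
    + [("T-C", i < 12) for i in range(100)]
)
print(suspicious_terminals(transactions))  # ['T-C']
```

No single transaction at T-C looks unusual on its own; the terminal only stands out once its transactions are aggregated and compared with the rest.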

Thus, from a pure technology stack perspective, Hadoop is emerging as the best choice for fraud detection, namely because –

  1. Hadoop (Gen 2) is not just a data processing platform. It has multiple personas – a real time, streaming data, interactive platform for any kind of data processing (batch, analytical, in memory & graph based) along with search, messaging & governance capabilities built in – all of which support fraud detection architecture patterns.
  2. Hadoop provides not just massive data storage capabilities but also multiple frameworks to process the data, resulting in response times of milliseconds with the utmost reliability, whether for realtime data or historical processing of backend data.
  3. Hadoop can ingest billions of events at scale thus supporting the most mission critical analytics irrespective of data size.
  4. From a component perspective, Hadoop supports multiple ways of running the models and algorithms that are used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Developers have a choice of MapReduce, Spark (via Java, Python, R), Storm and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop.
  5. Hadoop provides a highly scalable NoSQL option – HBase. HBase has been proven to support near real-time ingest of billions of data streams. HBase provides near real-time, random read and write access to tables containing billions of rows and millions of columns.
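Item 4 in the list above names MapReduce as one of the processing options. The sketch below imitates the map, shuffle and reduce phases in plain Python on a small in-memory list, to show the shape of a per-card aggregation that a fraud model might consume. It does not use the Hadoop API; the function names and records are hypothetical, and on a real cluster the framework would distribute these phases across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (card_id, amount) pair for one transaction record."""
    card_id, amount = record
    yield card_id, amount


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(card_id, amounts):
    """Collapse all amounts for one card into a transaction count and total spend."""
    return card_id, {"count": len(amounts), "total": round(sum(amounts), 2)}


# Made-up transaction records: (card_id, amount).
records = [("card-1", 20.00), ("card-2", 5.50), ("card-1", 310.25), ("card-1", 9.75)]

mapped = chain.from_iterable(map_phase(record) for record in records)
summary = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(summary["card-1"])  # {'count': 3, 'total': 340.0}
```

Per-card counts and totals like these are typical inputs to velocity checks, where a sudden burst of transactions on one card is treated as a warning sign.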

Visa estimates that its approach has identified $2 billion in potential annual incremental fraud opportunities, and has also given it the chance to address those vulnerabilities before that money was lost[1].

Having set the stage, the next post will present a real-world reference architecture, covering end-to-end infrastructure and application re-architecture, for any organization that is considering a Big Data initiative in the area of fraud detection and prevention.