Credit Card Fraud Detection

Contributed by: Saurabh Bagchi

Introduction

Credit cards are now the most preferred way for customers to transact either offline or online. There are a number of reasons, as illustrated below, due to which consumers are slowly shifting from debit card transactions to credit cards, especially in developing countries like India.

Lucrative cashback and reward point options are present for each credit card transaction. These are generally not offered by financial institutions for debit cards.
Credit card issuers tie up with online and offline merchants, especially during festive seasons like Diwali, Eid, and Christmas, to offer further discounts on transactions. Several online merchants also run their own promotional campaigns tied to credit cards, for example, Amazon Prime Day, which happens at least once a year.
Immediate needs (for example, medical emergencies or major life events) can be met quickly without having a sufficient account balance. Most credit cards offer 0% EMI options, which makes this even more attractive.
Having a good credit history helps to build a good CIBIL score, which in turn helps customers avail themselves of better, more competitive interest rates on longer-term needs like home loans or car loans.
Credit cards are tailored to suit individual customer needs. For example, customers who want a credit card for daily usage are usually offered a card with no annual fees or joining fees (marketed as lifetime free credit cards). On the other hand, there are premium cards with annual or joining fees for affluent customers, which offer golf membership, airport lounge access, seamless transactions at international and domestic merchants with lower transaction fees, 5x to 10x reward points, etc.

Beyond these advantages, credit cards are also easy to use since we do not have to carry currency around, and credit card statements give us a record of all our digital transactions far more easily than cash transactions or bank statements do. One downside that has been witnessed over the past few years of increasing digitization is the rise of credit card fraud. Fraud can be of several types, as we will see a bit later on in this blog.

Before going into the details of credit card fraud detection, let us try to understand the size of the overall credit card industry, especially in large western economies such as the US, the UK, and some of the European countries. Below are some worldwide numbers for the credit card industry in general.

There are 1.06 billion credit cards in use in America and 2.8 billion credit cards worldwide.
A US citizen, on average, has four active credit cards.
In the European Union (EU), the number of cards carried per person ranges from 0.8 to 3.9.
In the UK, there were 32.3 million people with credit cards or charge cards in 2016, which roughly translates to six in every ten adults. The numbers have only grown since then.
There were 368.92 billion card transactions worldwide in 2018. However, the average value per card payment is decreasing in most of the major economies, as a credit card is used more and more as a preferred financial product compared to other means. The average value per card payment drop indicates that customers are using a credit card more and more for daily use compared to one-off events like big purchases.

Impact of COVID-19

The Covid-19 pandemic has given more impetus worldwide to digital transactions over cash transactions, especially as customers are now more hygiene-aware.

Cash volume in the UK for financial transactions has dropped by about 60% as of 2020.
In the US, about 28% of the population has stopped using cash altogether.
Online shopping increased from 19% to 28% in the UK from 2019 to 2020. It is expected to go up further in 2021.

The numbers above indicate that credit card usage and popularity have gained significant ground due to the pandemic. Contactless payment has also gained a lot of impetus, with a few countries making it mandatory. However, on the flip side, travel and work-related expenses have dropped, as a large part of the working population (which is also a large part of the customer base for credit cards) is working from home instead of the office. Also, international travel has taken a severe hit due to fears related to the spread of the pandemic.

In other words, the pandemic has really accelerated the digitization of payments. In contrast to previous years, in 2020 it was not the technology around financial transactions that changed, but the way consumers used it. Many consumers who had never used digital transactions before were forced to learn and use them during the pandemic. However, with this, consumer anxiety about fraud has also increased manifold.

Credit Card Fraud

There is an explosion of demand for new payment methods. New payment methods come with an extremely complex backend, which makes fraud detection all the harder. On average, nearly 1.8 billion Euros of fraudulent transactions are detected in Europe every year.

Global fraud has more than tripled, from $9.84 billion to $32.39 billion, in less than a decade (2011 to 2020). Fraud can be broadly categorized into the following types:

Card Not Present (CNP) fraud: These are mostly digital frauds for which the physical card need not be present at the point of sale (POS). This usually means online payments. It is also known as “remote purchase” fraud.
Card Present (CP) fraud: As the name suggests, this is the opposite of the above: the physical card needs to be present at the POS.
Mail and telephone order (MOTO) fraud: Instances of stolen card details being used over the phone or by mail (more common in western countries like the US than in India).
First-party fraud: In this case, the customer is the fraudster. He or she might have run into a financial crisis such as a job loss or a medical emergency, or might simply have the malicious intention of not paying the card issuer its dues.
Identity fraud: In this case, the customer is the victim, and the fraudster is someone else. For example, say Shyam, the fraudster, knows that Ram, the victim, has a very good credit score and other credentials. Shyam applies for a credit card in Ram’s name, and the bank, thinking that it is actually Ram who has applied, approves the card. (This is more likely to happen in the US and other western countries, where customers can apply for a credit card online by providing only a few pieces of personal information and no documents. In India, it is less likely because stringent screening processes such as in-person verification and salary verification are carried out.) Shyam then intercepts the card, either through a change of address or by setting up an online profile, and starts transacting on the card. At the end of the month, Ram receives the statement in his email and disputes the charges, and this is when the bank learns that identity fraud has happened. Identity fraud can occur either at the start of the credit card lifecycle, in which case it is known as a fraudulent application, or at any time during the lifecycle, in which case it is known as account takeover.
Plastic fraud: In this scenario, it is generally a one-off transaction or a few transactions on the credit card that are fraudulent, rather than all of them. A popular example: Ram loses his contactless-enabled credit card at a shopping mall while having lunch. Shyam happens to spot the card and, instead of returning it to the rightful owner, goes on a shopping spree in the mall using “tap and pay” or “contactless” transactions, which do not require a PIN (as per the latest VISA regulation in India, we can transact up to Rs 5,000 using “tap and pay” or “contactless” without entering the card PIN). Ram later spots the transaction alerts on his registered mobile or email and immediately blocks the card, but by then the fraudster has done the damage. This is an example of a lost or stolen card scenario. Similarly, a counterfeit (CFT) credit card can also be created, although current EMV chip technology has made this tougher (it was much easier with older credit cards that had only magnetic stripes and no EMV chips). CNP, CP, CFT, and MOTO frauds all belong under the larger umbrella of plastic fraud.

Credit Card Fraud in the United States

The United States has its own banking and finance system, which is different from the rest of the world. The United States is the world’s largest economy, and Americans have a particular affinity towards credit cards, or towards credit in general. As a result, the country is a large target for external hackers, and credit card fraud is more likely to happen in the US than in any other part of the world.

The US reports the largest credit card fraud losses in the world, close to 38.6% of the global total. Credit card fraud is the most common form of fraud that occurs in the United States.
Credit card fraud has been on the rise year after year for the last five years. At the same time, total fraud and identity-based frauds have decreased.
CNP fraud is 81% more likely to happen in the US compared to CP fraud.
CNP fraud hit $4.57 billion in the US in 2016, rising about 34% year on year.
Georgia, Nevada, Florida, and California are the states whose residents are most susceptible to fraud. Florida and California also have more avenues for travel and entertainment spending, such as casinos and water sports.
About 80% of the credit cards in the US have been compromised at some point in time or other.
About three-quarters of Americans (~73%) are concerned that their financial account, email, or social profiles can be hacked.

Credit Card Fraud stats in the European Union (EU)

The European statistics below cover the European Union (which excludes the UK after Brexit) along with the Nordics and several other key countries like Switzerland, Monaco, and Liechtenstein, not all of which are EU member states. These countries have their own specific rules, especially where the common currency is not the Euro. We will use a financial term known as basis points (bps) instead of percentages: 1 bps equals 0.01%. You might have heard of this when the RBI cuts repo rates.
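
As a quick illustration of the bps convention, here is a minimal sketch of the conversion. The function names are hypothetical helpers introduced only for this example, and the input numbers are made up.

```python
def percent_to_bps(percent: float) -> float:
    """Convert a percentage to basis points (1% = 100 bps)."""
    return percent * 100.0


def fraud_rate_bps(fraud_value: float, total_value: float) -> float:
    """Express fraud as a share of total transaction value, in basis points."""
    return fraud_value / total_value * 10_000.0


# 0.01% is one basis point.
print(percent_to_bps(0.01))

# Hypothetical example: 3.2 units of fraud per 10,000 units of transactions.
print(fraud_rate_bps(3.2, 10_000.0))
```

Working in bps keeps very small fraud ratios readable, which is why the figures in this section are quoted that way.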

The fraud value for cards issued within Europe was estimated to be 1.8 billion Euros in 2016 (as per the European Central Bank).
73% of the above comes from CNP payments, 19% from transactions at POS (point of sale) terminals, and 8% from ATMs (automated teller machines). It is worthwhile to note that CNP fraud has increased while card-present fraud has decreased. This suggests a migration from physical fraud to digital or online fraud, which is expected as financial institutions make digital access to credit cards easy for customer convenience (as per the European Central Bank).
Portugal is the only exception, with more POS fraud than CNP fraud (as per the European Central Bank).
The fraud level as a portion of transactions ranges from 0.5 bps (basis points) for cards issued in Poland to 7.3 bps for cards issued in the Netherlands in terms of value, and from 0.2 bps in Poland to 4.3 bps for credit cards issued in France in terms of volume (as per the European Central Bank).
In general, countries with large card markets (a high volume of transactions and greater value per transaction, like the UK and France) also experience a high level of card fraud (as per the European Central Bank).
The Netherlands (0.6 bps), Denmark (1.3 bps), and Norway (1.6 bps) have the lowest ratios of fraud to legitimate purchases, compared with 53 bps in France and 50 bps in the UK.

As per one of the reports, the Netherlands and Nordic countries are excellent examples of fraud control best practices in Europe due to well-managed fraud and risk prevention services thanks to their pan-European processors, which cover fraud prevention expertise across multiple country borders. In absolute contrast, the UK and France continue to experience higher card fraud losses, mainly from CNP fraud on internet purchases, lost or stolen card fraud, or fraud losses on domestic cards used across multiple country borders.

The share of fraud is much higher for credit cards than for debit cards, showing that fraudsters prefer to target credit cards over debit cards.
Fraud share can further be broken down into region-wise components, which helps us observe that the majority of fraud is within the European Union itself.
- 43% of fraud is within the European Union but outside the domestic country.
- 35% of fraud is within the domestic country of the card issuer.
- 22% of fraud is outside both the domestic country and the European Union.

Let’s deep dive into a few specific markets to understand how the underlying fraud trends differ from each other.

Credit Card Fraud in France

France has its own system for cards known as CB (cartes bancaires). This means it has some different rules governing payments, on top of the standard rules for European countries.

It is quite possible that these rules make committing credit card fraud within France (that is, domestic fraud) far more difficult. Compared with the UK, the fraud rates in France are lower but still sizable.

Domestic fraud losses on French cards have stabilized at about 3.2 bps.
On the other hand, the fraud rate on French cards used abroad outside European countries is 16 times higher than on domestic transactions in 2017, while for foreign cards used in France, the rate was 12.1 times higher.

Identity theft of card details accounted for 66.1% of total domestic card fraud losses in France.
The main methods of compromise responsible for fraud losses are as below:
- Lost and stolen fraud (16.3%)
- CNP fraud (72.3%) based on theft of card credentials

Together, the two categories accounted for 88.6% of losses as of 2017.

Credit Card Fraud in the Netherlands

The Netherlands brought in a new digital ID service (iDIN) in 2016. This collaboration between Dutch banks serves to increase online security, especially for domestic card usage.

Dutch shoppers rely far more on debit cards than credit cards, and the country also has a popular transfer service where customers can pay online from their bank account.

Overall, levels of credit card fraud in the Netherlands are low and have decreased substantially in recent years.

Fraud levels fell significantly, from €33.3 million in 2013 to €12.6 million in 2018.
39% of card fraud losses in 2018 occurred on debit cards. This is down from 57% in 2017.
While debit card fraud has fallen substantially, internet banking fraud actually increased in 2018. This mirrors a trend seen in other western countries, where fraudsters prefer digital fraud over physical fraud. This was usually a result of phishing techniques, including “scam emails and text messages, fake apps, fake invoices, identity fraud, and deception of financial employees of companies (known as CEO fraud).”

This may also be partly due to the popularity of iDEAL, the bank transfer system mentioned above. Fewer customers use credit cards in general, and bank accounts themselves may be a juicier target for fraudsters, since the fraudster has direct access to a known amount that can be cashed immediately.

Regardless, compared to other European countries, the Netherlands can be considered a success story in tackling credit card fraud to a large extent.

Credit Card Fraud in Denmark

As explained above, we have somewhat conflicting data about Danish rates of fraud (as a percentage of total card payment value). The European Central Bank (ECB) assigns Denmark the highest ratio of fraud to total payments. Meanwhile, other important sources like Nets.eu state that it has one of the lowest ratios.

Part of the cause for this may simply be related to the timing of the reports as the ECB report was published in 2016, while Nets.eu published its report later.

CNP losses rose significantly from 2014-16 and “show no signs of slowing as fraudulent attacks continue to migrate across Europe, away from France and the UK.”
Denmark also has an abnormally high level of lost and stolen fraud (52.7% of total losses), perhaps due to high credit limits.
In Q2 2018, contactless card fraud made up 65% of all fraudulent card payments, despite only 56% of all payments being contactless. In other words, contactless represents a disproportionate amount of card fraud in Denmark.

What makes credit card fraud detection hard?

Having understood the gravity of the fraud situation worldwide, particularly in the United States and some of the major European countries, the natural next question is how we prevent this fraud and the damage it causes to the overall economy, and especially to customer sentiment and trust in financial institutions. Let us first discuss some of the challenges that we face while dealing with credit card fraud.

Data imbalance: Fraud and non-fraud data are generally heavily skewed. For example, in the sample open-source dataset that we will be dealing with here, we have 492 frauds out of a total of 284,807 transactions. This is only about 0.172% of all the transactions. So a naive model that simply predicts every transaction as non-fraud achieves more than 99.8% accuracy while catching no fraud at all.
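
The arithmetic behind this point can be checked in a few lines. This is only a sketch using the two counts quoted above; the variable names are chosen for illustration.

```python
# Counts quoted in the text for the open-source dataset.
total_transactions = 284_807
fraud_transactions = 492

# Share of transactions that are fraudulent.
fraud_share = fraud_transactions / total_transactions

# A naive model that labels every transaction as non-fraud is right
# on every genuine transaction and wrong on every fraudulent one.
naive_accuracy = (total_transactions - fraud_transactions) / total_transactions

# It never flags a fraud, so its recall on the fraud class is zero.
naive_recall = 0 / fraud_transactions

print(f"fraud share:    {fraud_share:.5f}")
print(f"naive accuracy: {naive_accuracy:.5f}")
print(f"naive recall:   {naive_recall:.1f}")
```

High accuracy with zero recall is exactly why accuracy alone is a misleading yardstick for this problem.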

Customer friction: The most likely outcome if a model predicts a current transaction as fraud is to decline the transaction outright to prevent any financial loss. However, we will soon see that it sometimes proves to be a bone of contention with genuine customers, who might get declined if the model has too many false positives or Type 1 errors. Though we may never achieve 100% accuracy in a real-world scenario, it is desirable for the model to be as accurate as possible to minimize any real customer friction.

Real-Time Detection: Most fraud detection models in practice have to work under very stringent timing conditions. Take the example of a transaction-level fraud detection model. This model has to run and decide whether the current transaction is fraudulent within a fraction of a second. If we employ a highly accurate but time-consuming model, we might irritate the customer who is waiting to complete the transaction; if we use a faster model, we may improve the customer experience but lose out on accuracy. So it is a very thin line that we have to tread while developing such fraud detection models.

Let us now discuss the dataset that we will be working with to build models and decide their effectiveness in fraud detection.

Dataset description

The dataset contains transactions made by credit cards in September 2013 by European cardholders.

This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced. The positive class (fraud transactions) accounts for 0.172% of all transactions.

It contains only numeric input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data are not provided. Features V1, V2, … V28 are the principal components obtained with PCA. The only features which have not been transformed with PCA are ‘Time’ and ‘Amount’. Feature ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature ‘Amount’ is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature ‘Class’ is the response variable, and it takes the value 1 in case of fraud and 0 otherwise.

Given the class imbalance ratio, it is recommended to measure the model accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification. The source code for all the analysis done here can be found in GitHub. The platform to run the analysis was Google Colab.
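
To make the recommendation concrete, here is a minimal sketch comparing accuracy with AUPRC on synthetic, heavily imbalanced data. The data, the roughly 0.2% positive rate, and the scoring rule are all assumptions made for illustration, not the fraud dataset itself. scikit-learn's `average_precision_score` is used as the usual estimate of the area under the precision-recall curve.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

rng = np.random.default_rng(0)
n = 20_000

# Synthetic labels with roughly 0.2% positives (assumed rate).
y_true = (rng.random(n) < 0.002).astype(int)

# Hypothetical model scores: positives score higher on average.
scores = rng.normal(loc=0.0, scale=1.0, size=n) + 3.0 * y_true

# Accuracy of a model that predicts "not fraud" for everything.
accuracy_all_negative = accuracy_score(y_true, np.zeros(n, dtype=int))

# AUPRC of the hypothetical model versus an uninformative constant score.
auprc_model = average_precision_score(y_true, scores)
auprc_constant = average_precision_score(y_true, np.zeros(n))

print(f"accuracy (all negative): {accuracy_all_negative:.4f}")
print(f"AUPRC (model):           {auprc_model:.4f}")
print(f"AUPRC (constant score):  {auprc_constant:.4f}")
```

The uninformative scorer still looks excellent on accuracy, while its AUPRC collapses to the positive-class rate, so AUPRC separates the two cases where accuracy does not.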

Exploratory Data Analysis and other findings

As noted earlier, the dataset is highly skewed, which can be seen from the bar plot below. Only 492 (or 0.172%) of the transactions are fraudulent. That means the data is highly unbalanced with respect to the target variable Class.

Fraudulent transactions are more evenly distributed over time than non-fraudulent transactions. This means that fraudsters also operate at unusual hours, whereas real customers mostly transact during business hours.

Genuine transactions have a larger mean value, a larger first quartile (Q1), smaller third and fourth quartiles (Q3 and Q4), and larger outliers. Fraudulent transactions, on the other hand, have a smaller Q1 and mean, a larger Q4, and smaller outliers.

Let us now plot the amounts of the fraudulent transactions against time. We see that a lot of these transactions are outliers, which suggests that fraud is skewed towards larger transactions.

Correlation among independent variables

We can see that there is no notable correlation among the features V1 to V28, which is expected because the data providers applied PCA (principal component analysis) to all input variables except Time and Amount. There are some small correlations between a few of these features and Time (inverse correlation with V3) and Amount (direct correlation with V7 and V20, inverse correlation with V1 and V5).
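
To see why PCA output features show no mutual correlation, here is a minimal sketch on synthetic data (not the fraud dataset). The data-generating recipe and variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Four strongly correlated raw features built from one shared signal
# plus independent noise (hypothetical data).
base = rng.normal(size=(5000, 1))
raw = np.hstack([base + 0.3 * rng.normal(size=(5000, 1)) for _ in range(4)])

# Correlation matrix of the raw features: large off-diagonal values.
raw_corr = np.corrcoef(raw, rowvar=False)

# After PCA, the components are uncorrelated on the fitted sample.
components = PCA(n_components=4).fit_transform(raw)
pca_corr = np.corrcoef(components, rowvar=False)

# Largest absolute correlation between two different components.
off_diagonal = pca_corr[~np.eye(4, dtype=bool)]
max_off_diagonal = np.abs(off_diagonal).max()

print(f"raw corr(feature 0, feature 1): {raw_corr[0, 1]:.3f}")
print(f"max |corr| between components:  {max_off_diagonal:.2e}")
```

Time and Amount were not part of the PCA step, which is why they can still correlate with individual components.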

Let us try to observe the directly and inversely correlated pairs on the same plot. Let us first start with the directly correlated pairs (V20, Amount) and (V7, Amount).

We can confirm that the two pairs of variables are correlated (the regression lines for Class = 0 have a positive slope, while the regression lines for Class = 1 have a smaller positive slope). Let us now plot the inversely correlated pairs.

We can observe that the two pairs of features are inversely correlated (the regression lines for Class = 0 have a negative slope, while the regression lines for Class = 1 have a very small negative slope). We will now visualize the density plots for each of the independent variables by class.

For some of the independent variables, we can see a good separation between the distributions of the two classes, that is, fraud versus non-fraud. For example, V4 and V11 have clearly different distributions for the two classes. V12, V14, and V18 are somewhat separated. V1, V2, V3, and V10 have quite distinct distributions, while V25, V26, and V28 have similar distributions for both classes.

As a general statement, with the exception of the two independent variables, Time and Amount, the distribution for non-fraud transactions is centered on zero, sometimes with a long tail for the extreme or outlier observations. At the same time, fraud transactions have a skewed distribution.

After extensive exploratory data analysis, it is time to move on to the actual predictive modeling. Let us first define the independent variables that we will use. In this case, we do not have any categorical variables in the list of independent variables, so we do not need additional steps like one-hot encoding.

We will split the data into three sets as follows:

Train data -> 60%
Validation data -> 20%
Test data -> 20%
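
A 60/20/20 split like the one above can be sketched with two calls to scikit-learn's `train_test_split`. The data here is synthetic, and both the stratification on the label and the random seed are assumptions of this sketch rather than details given in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2018)
n = 10_000

# Synthetic stand-in for the feature matrix and a rare positive class.
X = rng.normal(size=(n, 5))
y = (rng.random(n) < 0.01).astype(int)

# Step 1: hold out 40% of the rows, keeping 60% for training.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=2018
)

# Step 2: split the held-out 40% in half: 20% validation, 20% test.
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=2018
)

print(len(X_train), len(X_valid), len(X_test))
```

Stratifying keeps the small share of positive examples roughly equal across the three sets, which matters when frauds are this rare.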

Logistic Classifier

For building a logistic classifier, we take all the independent variables and fit a logistic regression on them. The results of the logistic regression are given below. We see that some of the variables, namely V12, V23, and V24, have p-values greater than 0.05, which means that at the 5% significance level we cannot rule out that their apparent effect in the model is due to chance.

In the next iteration of the model, we remove these variables and fit the model again.

Now, we see that all the variables in the model have a p-value less than 0.05, which is expected.

Let us plot the feature importance of the independent variables. We take the absolute value of the t statistic as the feature importance, on the assumption that the larger the t value, the more important the variable.

We can see that the most important variables are Time, V3, V2, and Amount.

Let us now plot the confusion matrix, which we can see below. With imbalanced data, a confusion matrix is not a very good tool to judge whether the model is good or bad. We instead rely on the AUC metric, which for the logistic classifier gives a value of about 0.81 (the closer the value is to 1, the better the model is, while 0.5 means a completely random model). Since 0.81 is closer to 1 than to 0.5, the logistic classifier does a decent job at an overall level.
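
The difference between the two views can be sketched on a tiny hand-made example. The labels, the scores, and the 0.5 decision threshold below are all assumptions for illustration. The confusion matrix depends on the chosen threshold, while the ROC AUC is computed from the ranking of the scores.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# 95 genuine transactions and 5 frauds (hypothetical).
y_true = np.array([0] * 95 + [1] * 5)

# Hypothetical model scores: genuine ones spread over [0, 0.6],
# frauds mostly, but not always, scored higher.
y_score = np.concatenate(
    [np.linspace(0.0, 0.6, 95), np.array([0.31, 0.7, 0.8, 0.9, 0.95])]
)

# Hard predictions at an assumed threshold of 0.5.
y_pred = (y_score >= 0.5).astype(int)

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Threshold-free ranking quality.
auc = roc_auc_score(y_true, y_score)

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"ROC AUC: {auc:.3f}")
```

Moving the threshold changes every cell of the confusion matrix but leaves the AUC untouched, which is one reason AUC is the more convenient summary when comparing models.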

Random Forest Classifier

Let us first set the model parameters and fit a model using the training data. Subsequently, we will use the validation data. Our validation criterion is the GINI index, defined as 2 * AUC - 1, where AUC stands for the area under the curve. We set the number of estimators to 100 and the number of parallel jobs to 4.
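
A minimal sketch of this setup follows, using scikit-learn's `RandomForestClassifier` with the two parameters mentioned above. The synthetic data from `make_classification`, the class weights, and the random seed are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the real features.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.97, 0.03], random_state=2018,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2018
)

# 100 trees, fitted using 4 parallel jobs, as described in the text.
clf = RandomForestClassifier(n_estimators=100, n_jobs=4, random_state=2018)
clf.fit(X_train, y_train)

# Score the validation set with the predicted fraud probability.
valid_scores = clf.predict_proba(X_valid)[:, 1]
auc = roc_auc_score(y_valid, valid_scores)

# GINI as defined above: 0 for a random model, 1 for a perfect one.
gini = 2 * auc - 1

print(f"AUC:  {auc:.3f}")
print(f"GINI: {gini:.3f}")
```

Because GINI is a linear rescaling of AUC, ranking models by one is the same as ranking them by the other.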

Let us plot the feature importance of the independent variables.

It seems that the most important independent variables are V17, V12, V14, V16, and V11.

Let us plot a confusion matrix to see what the results look like.

As noted before, with imbalanced data a confusion matrix is not a very good tool to judge whether the model is good or bad. We instead rely on the AUC metric, which for the random forest classifier gives a value of about 0.85 (the closer the value is to 1, the better the model is, while 0.5 means a completely random model). Since 0.85 is closer to 1 than to 0.5, the random forest classifier does a decent job at an overall level.

AdaBoost Classifier

AdaBoost stands for adaptive boosting. We use mostly the default parameters and fit the model. We then plot the feature importance as per the model.
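
As a sketch of this step, the example below fits scikit-learn's `AdaBoostClassifier` with default settings and ranks the features by importance. The synthetic data and the seed are assumptions for illustration, so the ranking says nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic imbalanced data standing in for the real features.
X, y = make_classification(
    n_samples=3000, n_features=8, n_informative=4,
    weights=[0.95, 0.05], random_state=2018,
)

# Default parameters apart from a fixed seed for reproducibility.
clf = AdaBoostClassifier(random_state=2018)
clf.fit(X, y)

# Importances are non-negative and normalized across features.
importances = clf.feature_importances_

# Feature indices ordered from most to least important.
ranking = np.argsort(importances)[::-1]

for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

With the real data, the same ranking could be drawn as a bar chart of importance per feature name.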

Now we can plot the confusion matrix for the model. It is given below for reference.

We calculate the AUC (Area under the curve) and observe that the value is 0.83, which is a decent value for the model, but let us see if we can improve more through some other models. We note that this value is lower than the one obtained earlier for the random forest.

CatBoost Classifier

CatBoost is a gradient boosting classifier based on decision trees, with built-in support for handling categorical data. We use mostly the default parameters and fit the model. We then plot the feature importance as per the model.

Let us now plot the confusion matrix for the model and observe it.

The AUC (area under the curve) for this model is 0.86, which is the best so far: slightly better than random forest (AUC = 0.85) and better than AdaBoost (AUC = 0.83).

XGBoost

XGBoost is a gradient boosting algorithm. Let us prepare a model similar to the previous ones. We use mostly the default parameters and fit the model. We then plot the feature importance as per the model.

The AUC (Area under the curve) for this model is 0.97 which is the best obtained so far.

LightGBM

Let us try another gradient boosting algorithm, LightGBM, and build a model as before. We use mostly the default parameters and fit the model. We then plot the feature importance as per the model.

The AUC (Area under the curve) value is 0.95 which is slightly worse than the XGBoost model but much better than the other models.

Ensemble Model

We can make an ensemble of a few models, such as Logistic Classifier, Random Forest, K Nearest Neighbor, and Decision Tree, to see if the ensemble performs better than the individual models. In practice, too, if the majority of the models predict a particular transaction as fraud, it is highly likely that it ultimately turns out to be fraud. One downside is that the processing time takes a hit even if the accuracy improves.
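
One way to sketch such a majority-vote ensemble is scikit-learn's `VotingClassifier` with hard voting. The choice of `VotingClassifier`, the synthetic data, and the hyperparameters are assumptions of this example and may differ from the ensemble actually used in the analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the real features.
X, y = make_classification(
    n_samples=3000, n_features=8, n_informative=4,
    weights=[0.95, 0.05], random_state=2018,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2018
)

# Hard voting: each model casts one vote and the majority class wins.
# With four voters a 2-2 tie is possible; scikit-learn then picks the
# class that comes first in sorted order (class 0 here).
ensemble = VotingClassifier(
    estimators=[
        ("logistic", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=2018)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=2018)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
pred = ensemble.predict(X_test)

print("transactions flagged by the majority:", int(pred.sum()))
```

Every member model has to score each transaction, which is the processing-time cost mentioned above.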

Training and validation using cross validation

We can now use cross-validation. We will use K-fold cross-validation with five folds, which is generally the standard practice. The data is thus divided into five folds and, by rotation, we train on four folds (n-1) and validate on the fifth (nth) fold.

The test set prediction is then calculated as the average of the predictions from the five folds.
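
The fold rotation and averaging can be sketched as follows. A logistic regression is used here purely as a lightweight stand-in model, and the synthetic data, the shuffling, and the seed are assumptions of this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, train_test_split

# Synthetic imbalanced data standing in for the real features.
X, y = make_classification(
    n_samples=4000, n_features=8, n_informative=4,
    weights=[0.95, 0.05], random_state=2018,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2018
)

kfold = KFold(n_splits=5, shuffle=True, random_state=2018)
n_folds = kfold.get_n_splits()

# Out-of-fold predictions on the training data and averaged test predictions.
oof_pred = np.zeros(len(X_train))
test_pred = np.zeros(len(X_test))

for train_idx, valid_idx in kfold.split(X_train):
    # Train on four folds, validate on the remaining one.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[train_idx], y_train[train_idx])
    oof_pred[valid_idx] = model.predict_proba(X_train[valid_idx])[:, 1]
    # Each fold's model contributes one fifth of the final test prediction.
    test_pred += model.predict_proba(X_test)[:, 1] / n_folds

print(f"out-of-fold AUC: {roc_auc_score(y_train, oof_pred):.3f}")
print(f"test AUC:        {roc_auc_score(y_test, test_pred):.3f}")
```

Averaging over folds smooths out the variation between the five models, at the cost of training five times.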

The AUC score for the prediction from the test data was 0.93. We prepare the test prediction from the averaged predictions for the test over the five folds. Below we prepare a results table with all the model AUC values consolidated so far.

Model Name	Area under the curve (AUC)
Logistic Classifier	0.81
Random Forest Classifier	0.85
AdaBoost Classifier	0.83
CatBoost Classifier	0.86
XGBoost Classifier	0.97
LightGBM Classifier	0.95
Ensemble Model	0.82

Conclusion

We investigated the data, checking for data imbalance, visualizing the features, and understanding the relationships between different features. We then investigated several predictive models. The data was split into three parts: a train set, a validation set, and a test set. For the first three models, we only used the train and test sets.

We started with a Logistic Classifier and then a RandomForestClassifier, for which we obtained AUC scores of 0.81 and 0.85, respectively, when predicting the target for the test set.

We followed with an AdaBoostClassifier model, with a lower AUC score (0.83) for the prediction of the test set target values.

We then followed with a CatBoostClassifier, which gave an AUC score of 0.86 after training for 500 iterations.

We then experimented with an XGBoost model. In this case, we used the validation set for validation of the training model. The best validation score obtained was 0.984. Then we used the model with the best training step to predict the target value from the test data; the AUC score obtained was 0.974.

We then presented the data to a LightGBM model. We used both train-validation split and cross-validation to evaluate the model effectiveness to predict the ‘Class’ value, i.e., detecting if a transaction was fraudulent. With the first method, we obtained values of AUC for the validation set around 0.974. For the test set, the score obtained was 0.946.

With the cross-validation, we obtained an AUC score for the test prediction of 0.93.

Future Work

One additional piece of work that could not be completed due to time constraints was using neural nets to see if we could further improve the model results. Also, recording the training time for each of the models would give us another dimension for model selection, balancing a better AUC against a shorter training time.
