Credit Score Prediction using Machine Learning

Credit Score Prediction using Machine Learning

In today's world, credit scores are essential to determine creditworthiness for lending institutions, and they impact everything from getting a mortgage to renting an apartment. With the rise of big data and machine learning, the credit scoring process has been revolutionized, making it more accurate and efficient. Machine learning algorithms have the ability to analyze vast amounts of data and provide more accurate predictions than traditional credit scoring models. This article will explore credit score prediction using machine learning, including its benefits and challenges.

Credit Score and its Importance

Credit Score Prediction using Machine Learning

A credit score is a numerical representation of a person's creditworthiness based on their credit history, income, and other financial factors. It is a critical factor for lenders and credit card companies when deciding whether to approve a loan or extend credit. A credit score ranges from 300 to 850, with higher scores indicating better creditworthiness. A good credit score is typically above 700, while a score below 600 is considered poor.

Benefits of Machine Learning in Credit Scoring

Machine learning algorithms have revolutionized credit scoring by providing more accurate predictions of creditworthiness. Machine learning models are trained on vast amounts of data, enabling them to identify patterns and make more accurate predictions than traditional credit scoring models. Machine learning algorithms can also take into account a broader range of data, including non-traditional data sources such as social media, to make more accurate predictions.

One of the main benefits of machine learning in credit scoring is its ability to reduce bias. Traditional credit scoring models often have inherent biases based on factors such as race or gender. Machine learning algorithms are designed to be unbiased, as they are trained on data and do not incorporate any preconceived biases. This results in fairer credit-scoring decisions.

Machine learning algorithms are also more efficient than traditional credit scoring models. They can analyze vast amounts of data in a matter of seconds, providing near-instantaneous credit-scoring decisions. This makes the lending process faster and more efficient for both borrowers and lenders.

Challenges of Machine Learning in Credit Scoring

While machine learning has many benefits for credit scoring, there are also challenges to consider.

  • One of the main challenges is the complexity of machine learning models. Machine learning algorithms are often black boxes, making it difficult for lenders to understand how the algorithm arrived at its credit scoring decision. This can make it difficult for borrowers to understand why they were denied credit or how they can improve their credit scores.
  • Another challenge is the need for large amounts of high-quality data. Machine learning algorithms rely on large amounts of data to make accurate predictions. However, if the data is of poor quality or limited in scope, the algorithm may not be able to make accurate predictions.
  • Privacy is also a concern when using machine learning algorithms for credit scoring. Machine learning models require access to personal and financial data, which can be a concern for borrowers. Lenders must take steps to ensure that borrower data is protected and secure.

Python Implementation

Now we will try to implement it in the code.

Objective

Based on a client's monthly customer profile, the goal is to estimate the likelihood that they won't pay off their credit card bill in the future. The binary target variable is derived by tracking performance over the 18 months following the most recent credit card statement, and a default event is deemed to have occurred if the consumer does not make the required payment within 120 days of the statement date.

About Data

Each customer's aggregated profile characteristics at each statement date are contained in the dataset. Features fall into the following broad groups after being anonymized and normalized:

  • D_* = Delinquency variables
  • S_* = Spend variables
  • P_* = Payment variables
  • B_* = Balance variables
  • R_* = Risk variables

The following features are categorical:

  • Importing Libraries
  • Loading Data

Output:

Credit Score Prediction using Machine Learning

We have 5531451 rows and 190 columns in the Training dataset.

Output:

Credit Score Prediction using Machine Learning
  • Target Distribution

Output:

Credit Score Prediction using Machine Learning

Here ,0 --> Non Default and 1 --> Default

  • EDA

Output:

Credit Score Prediction using Machine Learning

We can see that there are no user intersections in the train-test data.

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

Also not intersect the timeline.

Output:

Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning

We can see that test profiles increased from October to April while train profiles remained consistent.

Output:

Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning

We can see that the distributions of the train and test profile lengths are similar.

  • Feature Selection
  • Information Value

Information value is one of the most useful techniques for selecting important variables in a predictive model. It helps to rank variables on the basis of their importance.

Credit Score Prediction using Machine Learning

If the IV statistic is:

  • Less than 0.02, then the predictor is not useful for modeling (separating the Good from the Bads)
  • 02 to 0.1, then the predictor has only a weak relationship to the Goods/Bads odds ratio
  • 1 to 0.3, then the predictor has a medium strength related to the Goods/Bads odds ratio
  • 3 to 0.5, then the predictor has a strong relationship to the Goods/Bads odds ratio.

Now, for selecting features, we are calculating IV values for each feature.

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

We can observe that the top 75 features have > 0.5 IV value, so those top 75 IV value features are strong predictors.

Weight of Evidence

The weight of evidence indicates how well an independent variable may predict the dependent variable. It is frequently referred to as a measure of the separation of good and poor consumers because it developed from the realm of credit scoring. Customers who miss a loan payment are referred to as "Bad Customers." and "Good Clients" are those who repaid their loans.

Credit Score Prediction using Machine Learning
  • Distribution of Goods - % of Good Customers in a particular group
  • Distribution of Bads - % of Bad Customers in a particular group
  • ln - Natural Log

Steps of Calculating WOE

  1. For a continuous variable, split data into ten parts (or lesser depending on the distribution).
  2. Calculate the number of events and non-events in each group (bin)
  3. Calculate the % of events and % of non-events in each group.
  4. Calculate WOE by taking the natural log of a division of % of non-events and % of events.

For one feature, we'll try to describe woe values and a woe plot.

Output:

Credit Score Prediction using Machine Learning
  • P_2 is a continuous feature, so we split it into 15 bins
  • each bin has non-event and event counts and rates
  • each bin has WOE, and IV values
  • for missing values it's created in the 16th bin

Output:

Credit Score Prediction using Machine Learning
  • from this woe plot, we can observe that while increasing bins, the event rate decrease
  • you can observe that the black dotted line that is positively correlated with the target

Output:

Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning

Selecting features that have IV values > 0.5.

Correlation Heap

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning
  • Modeling

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning
  • We can observe that P_2 features Points (score).
  • while increasing bins, the score also increasing
  • for example, if the user P_2 value is 0.73, then that user belongs to the 7th bin corresponding score is 22.45
  • Metrics

Output:

Credit Score Prediction using Machine Learning

We have got a accuracy of 75 percent.

Output:

Credit Score Prediction using Machine Learning
Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

Output:

Credit Score Prediction using Machine Learning

We have an accuracy of 87%

Output:

Credit Score Prediction using Machine Learning
  • Scores

Output:

Credit Score Prediction using Machine Learning
  • we can observe that default vs non-default score distribution
  • some overlap between 550 to 650
  • overall well separated
  • The number of non-default is higher than the default.

Output:

Credit Score Prediction using Machine Learning

We can see that a lot of scorecards are in the range of 650-750.

Output:

Credit Score Prediction using Machine Learning

Conclusion

Machine learning algorithms have revolutionized the credit scoring process by providing more accurate and efficient credit scoring decisions. They are able to analyze vast amounts of data and identify patterns to make more accurate predictions. However, there are challenges to consider, such as the complexity of machine learning models and the need for large amounts of high-quality data. Privacy is also a concern, and lenders must take steps to ensure that borrower data is protected and secure. Despite these challenges, the benefits of machine learning in credit scoring are clear, and it is likely that machine learning will continue to play an increasingly important role in credit scoring in the future.