This is a project presented by Subramanian Gopalakrishnan, Apurva Dhingra, Sahil Linjhara, George Varghese and Ankush Kharbanda, PGP DSBA students, at the AICTE-sponsored Online International Conference on Data Science, Machine Learning and its Applications (ICDML-2020). A follow-up paper was published in the conference journal.

Over the last few years, the credit industry in India has grown exponentially, and the retail loan book of Financial Institutions (FIs) in India is expected to double to Rs. 96 trillion by 2024. To remain competitive in retail lending, FIs face a major challenge: maximizing the loan amount with minimum processing time while keeping defaults as low as possible. In this study, our learners aimed to build a machine learning model to predict the probability that a new applicant will default on the first EMI (Equated Monthly Instalment), and to calculate the optimum Loan-to-Asset Value (LTV) for each applicant applying for a two-wheeler loan. The LTV ratio is a financial term used by lenders to express the ratio of a loan to the value of the asset purchased.
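The LTV ratio is the loan amount divided by the value of the asset, usually quoted as a percentage. A minimal Python sketch of that calculation; the function name and the rupee figures are illustrative assumptions, not values from the study:

```python
def ltv_ratio(loan_amount: float, asset_value: float) -> float:
    """Return the loan-to-asset value ratio as a percentage."""
    if asset_value <= 0:
        raise ValueError("asset_value must be positive")
    return 100.0 * loan_amount / asset_value


# Illustrative figures: a Rs. 60,000 loan against a two-wheeler valued at Rs. 80,000
print(ltv_ratio(60_000, 80_000))  # 75.0
```

A lower LTV means the borrower funds a larger share of the purchase, which leaves the lender more cushion if the loan defaults.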

The study sample came from a Non-Banking Financial Company (NBFC) and contained details of 2,33,154 applicants who applied for a two-wheeler loan. The dataset included KYC details, demographics, security assets, past loan records and the credit score of each applicant. In the data, 22% of the applicants had defaulted on their first EMI.
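With only 22% of applicants defaulting, the two classes are imbalanced: a model that always predicts "no default" would already be about 78% accurate, so plain accuracy says little about how well a PD model separates risky applicants. A toy sketch of that arithmetic, using made-up labels rather than the NBFC data (1 = defaulted on the first EMI, 0 = paid):

```python
def default_rate(labels):
    """Share of applicants whose label is 1 (defaulted on the first EMI)."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


# Toy labels only: 22 defaults out of 100 applicants, mirroring the reported 22% rate.
labels = [1] * 22 + [0] * 78
rate = default_rate(labels)
baseline_accuracy = 1 - rate  # accuracy of always predicting "no default"
print(f"default rate = {rate:.2f}, majority-class accuracy = {baseline_accuracy:.2f}")
```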

A machine learning model was built to predict the probability that a new applicant will default on their first EMI; the higher an applicant's Probability of Default (PD), the higher the risk of default. Based on the PD, applicants were grouped into three buckets: Category A, Category B and Category C, where Category A contains low-risk applicants and Category C contains high-risk applicants. An optimized LTV was then defined for each category, based on historical data and business requirements and in line with leading industry practices. Combining the PD with the optimized LTV could help the NBFC verify an applicant's eligibility almost instantly and reduce the number of defaults.

A prototype automation tool was also built that recommends the optimum LTV range for a new applicant once their details are entered. Such a tool would let applicants check their eligibility in terms of LTV, which would in turn reduce the NBFC's processing costs. This analysis can be extended to other types of loans.
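The bucketing and LTV recommendation steps can be sketched as a simple lookup that takes the PD produced by the model as input. The study does not publish its PD cut-offs or its per-category LTV ranges, so every threshold and range below is a HYPOTHETICAL placeholder, as are the names `risk_category`, `recommend_ltv` and `Recommendation`:

```python
from dataclasses import dataclass

# HYPOTHETICAL PD cut-offs: PD below 0.15 -> A, below 0.30 -> B, otherwise C.
PD_CUTOFFS = [(0.15, "A"), (0.30, "B")]

# HYPOTHETICAL LTV ranges (percent of asset value) for each risk category.
LTV_RANGES = {"A": (75.0, 85.0), "B": (65.0, 75.0), "C": (50.0, 65.0)}


@dataclass
class Recommendation:
    category: str
    ltv_min: float
    ltv_max: float
    max_loan: float  # loan amount at the top of the recommended LTV range


def risk_category(pd_score: float) -> str:
    """Map a probability of default to category A (low risk), B or C (high risk)."""
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("pd_score must be a probability in [0, 1]")
    for cutoff, category in PD_CUTOFFS:
        if pd_score < cutoff:
            return category
    return "C"


def recommend_ltv(pd_score: float, asset_value: float) -> Recommendation:
    """Recommend an LTV range and the matching maximum loan for an applicant."""
    category = risk_category(pd_score)
    ltv_min, ltv_max = LTV_RANGES[category]
    return Recommendation(category, ltv_min, ltv_max, asset_value * ltv_max / 100.0)


# Example: the model returns a PD of 0.22 for an applicant buying an Rs. 80,000 two-wheeler.
rec = recommend_ltv(0.22, 80_000)
print(rec)  # Recommendation(category='B', ltv_min=65.0, ltv_max=75.0, max_loan=60000.0)
```

Keeping the cut-offs and ranges in plain tables rather than hard-coding them in the functions makes it easy for the business to retune them as new default data arrives, without retraining the PD model.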

Want to work on interesting capstone projects like this one and learn new concepts? Upskill with Great Learning's PGP Data Science and Business Analytics course today and power ahead in your career. You will have access to personalized mentorship and work on industry-relevant projects with guidance from industry experts. Feel free to reach out to us if you have any queries.