This is a project by Jai Kushwaha and Richa Agarwal, presented at the SAMAROH Conference 2021, where it also won the “Best Paper Award”.

The banking industry generates a huge volume of data every day. To differentiate themselves from the competition, banks are increasingly adopting big data analytics as part of their core strategy. Strategies informed by machine learning models have become necessary in almost all the sectors that banks deal with. One such sector, explored in this research paper, is the MSME sector.

The Micro, Small, and Medium Enterprises (MSME) sector has emerged as a highly vibrant and dynamic sector of the Indian economy over the last five decades. It contributes significantly to the economic and social development of the country by fostering entrepreneurship and by generating employment at comparatively low capital cost, second only to agriculture. MSMEs complement large industries as ancillary units, and the sector contributes significantly to the inclusive industrial development of the country. MSMEs are widening their domain across sectors of the economy, producing a diverse range of products and services to meet the demands of domestic as well as global markets.

In this paper, the patterns of erratic EMI (equated monthly instalment) payment that lead to delinquency are studied, and the factors that cause MSME loans to default are explored. A database of 32,191 MSME borrowers covering 2017 to 2019 was assembled. A loan with even one missed EMI payment was labelled delinquent, while a loan for which interest or principal repayment had not been received for more than 90 days was labelled a loan default. A Logistic Regression model was built for delinquency prediction, with care taken to avoid extreme multicollinearity among the predictors. The model correctly identified 97 of 100 delinquent accounts. Other machine learning techniques, such as Random Forest and Extreme Gradient Boosting, were applied to develop the loan default model; the best of these identified 75 of 100 loan defaults. Finally, a user interface was developed so that financial institutions can use the proposed models to identify delinquency and potential loan default.
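The two labelling rules and the reported hit rates can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `Loan` record and its field names are hypothetical and are not taken from the paper's dataset, and the "97 of 100" figure is read here as recall (the share of truly delinquent accounts that the model flags).

```python
from dataclasses import dataclass

# From the text: a loan is a default once repayment is overdue for more than 90 days.
DEFAULT_THRESHOLD_DAYS = 90


@dataclass
class Loan:
    # All field names are hypothetical, chosen for illustration only.
    loan_id: str
    missed_emis: int        # number of missed EMI payments
    max_days_overdue: int   # longest stretch without interest/principal repayment


def is_delinquent(loan: Loan) -> bool:
    """A loan with even one missed EMI payment counts as delinquent."""
    return loan.missed_emis >= 1


def is_default(loan: Loan) -> bool:
    """A loan overdue for more than 90 days counts as a loan default."""
    return loan.max_days_overdue > DEFAULT_THRESHOLD_DAYS


def recall(actual: list, predicted: list) -> float:
    """Share of actual positives that were also predicted positive."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a and p)
    actual_positives = sum(1 for a in actual if a)
    return true_positives / actual_positives if actual_positives else 0.0


# Made-up example loans, not real data.
loans = [
    Loan("A", missed_emis=0, max_days_overdue=0),
    Loan("B", missed_emis=1, max_days_overdue=30),
    Loan("C", missed_emis=4, max_days_overdue=120),
    Loan("D", missed_emis=3, max_days_overdue=90),   # exactly 90 days: not yet a default
]
delinquent_flags = [is_delinquent(loan) for loan in loans]
default_flags = [is_default(loan) for loan in loans]

# A model that flags 97 of 100 truly delinquent accounts has recall 0.97.
delinquency_recall = recall([True] * 100, [True] * 97 + [False] * 3)
```

Under this reading, every default is also delinquent, but not every delinquent loan is a default, which is why the two targets are modelled separately.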

This research project tries to answer two fundamental questions by predicting loan delinquency and loan default in the MSME sector. First, a parametric algorithm (Logistic Regression) is applied to understand the probable contributing factors behind loan delinquency, so that one variable can be changed (where that is possible) to see how the outcome varies. Second, modern machine learning algorithms are applied to catch as many loan defaults as possible and so identify the likely defaulters. Lastly, a user interface was developed for easy consumption of these models.
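The idea of changing one variable and watching the outcome can be sketched with a logistic model in a few lines. The coefficients and feature names below are hypothetical and chosen only for illustration; they are not the paper's fitted values. Reading coefficients this way is also the reason multicollinearity matters: when predictors are strongly correlated, individual coefficients become unstable and a one-variable comparison is harder to trust.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
INTERCEPT = -2.0
COEFFICIENTS = {
    "loan_to_income_ratio": 1.5,
    "years_in_business": -0.2,
}


def predict_probability(features: dict) -> float:
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(features: dict, name: str, new_value: float) -> tuple:
    """Return the predicted probability before and after changing one variable."""
    changed = dict(features)
    changed[name] = new_value
    return predict_probability(features), predict_probability(changed)


# A made-up borrower; only years_in_business is varied.
borrower = {"loan_to_income_ratio": 1.0, "years_in_business": 5.0}
before, after = what_if(borrower, "years_in_business", 10.0)

# exp(beta) is the multiplicative change in the odds per one-unit increase.
odds_ratio_per_year = math.exp(COEFFICIENTS["years_in_business"])
```

With a negative coefficient, as assumed here for `years_in_business`, increasing the variable lowers the predicted probability of delinquency, and the odds ratio falls below 1.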



