online shopping

This project was presented by Sahil Sachdeva, Amitesh Bajpai, Kushal Maheshwari, Amit Sharma and Neeraj Sharma, PGP DSBA students, at the AICTE-sponsored Online International Conference on Data Science, Machine Learning and its Applications (ICDML-2020). A follow-up paper was published in the conference journal.

With e-commerce becoming more and more prevalent in today’s economy, businesses in this sector need to understand what factors turn a visitor into a purchaser. In their study, our learners aimed to build a machine learning set-up for the online shopping environment that predicts the purchase intentions of prospective buyers through various analytical models. The sample data covered 12,330 users, each record containing metrics of that user's web visits within a one-year timeframe; 15% of these users made a purchase. The dataset had Administrative, Product, Demographic and Navigation information for each user, such as the different types of pages visited in a session and the total time spent in each of these page categories. Bounce rate, exit rate and page value features are also available. The bounce rate of a page is the percentage of visitors who entered the site through that page and left without any further activity. The exit rate of a page is the percentage of views of that page that were the last in a session. The page value is the average value of a page that visitors viewed before completing a purchase.
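To make the difference between bounce rate and exit rate concrete, here is a minimal sketch that computes both from raw page sequences. The function name `bounce_and_exit_rates`, the toy sessions and the page names are assumptions made for illustration; the dataset used in the study already ships these metrics as precomputed features.

```python
from collections import Counter

def bounce_and_exit_rates(sessions):
    """Return {page: (bounce_rate_percent, exit_rate_percent)}.

    sessions: list of sessions, each an ordered list of page names.
    Bounce rate: share of sessions entering on a page that viewed nothing else.
    Exit rate: share of all views of a page that were the last in a session.
    """
    entrances = Counter()  # sessions that started on the page
    bounces = Counter()    # single-page sessions that started on the page
    views = Counter()      # total views of the page
    exits = Counter()      # sessions that ended on the page

    for pages in sessions:
        if not pages:
            continue
        entrances[pages[0]] += 1
        if len(pages) == 1:
            bounces[pages[0]] += 1
        views.update(pages)
        exits[pages[-1]] += 1

    rates = {}
    for page in views:
        # A page nobody entered through has no bounces; report 0.0 for it.
        bounce = 100.0 * bounces[page] / entrances[page] if entrances[page] else 0.0
        exit_rate = 100.0 * exits[page] / views[page]
        rates[page] = (bounce, exit_rate)
    return rates

# Hypothetical toy sessions, not taken from the study's data
toy_sessions = [
    ["home"],
    ["home", "product", "cart"],
    ["product"],
    ["home", "product"],
]
rates = bounce_and_exit_rates(toy_sessions)
for page, (bounce, exit_rate) in sorted(rates.items()):
    print(f"{page}: bounce {bounce:.1f}%, exit {exit_rate:.1f}%")
```

In this toy data the "product" page was an entry point only once, and that session was a bounce, while two of its three views ended a session, so the two metrics can diverge sharply for the same page.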

The users are classified on the basis of their revenue-generating propensity, and multiple ML models, including logistic regression, Support Vector Machine, AdaBoost and a Voting Classifier, are applied to predict their intention to purchase. Class imbalance is handled with techniques such as random oversampling and SMOTE. The best model is selected based on the F1 score, cross-validation accuracy and cross-validation ROC AUC, and overfit models are discarded. The study suggested a strong relationship between the purchase intention of online shoppers and the Administrative, Product, Demographic and Navigation variables of users. A high-level analysis suggests that users spend a lot of time on Administrative pages such as login, logout, password recovery, profile and email wish list; reducing this time would let users spend more time on Product Related pages.
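The sketch below illustrates two ideas from this paragraph on toy data: random oversampling of the minority class, and why the F1 score is more informative than plain accuracy when only 15% of users buy. It is not the study's code; the helper names, the toy labels and the single dummy feature are assumptions, and SMOTE (which synthesizes new minority points by interpolation rather than duplicating existing ones) is not shown.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: 17 non-buyers (0) and 3 buyers (1), i.e. 15% positives
labels = [0] * 17 + [1] * 3
rows = [[float(i)] for i in range(20)]  # one dummy feature per user

balanced_rows, balanced_labels = random_oversample(rows, labels)
print("buyers after oversampling:", balanced_labels.count(1))

# A "model" that always predicts "no purchase" looks fine on accuracy alone.
always_no = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_no)) / len(labels)
baseline_f1 = f1_score(labels, always_no)
print("accuracy:", accuracy, "F1:", baseline_f1)
```

When oversampling is combined with cross-validation, it is generally applied to the training folds only, so that duplicated minority rows do not leak into the validation data and inflate the scores.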

The study also recommended targeted strategies for returning users, as they contributed more to revenue generation. It also identified the seasonal peaks in revenue trends, along with the regions accounting for the maximum revenue potential.

Upskill with Great Learning’s PGP Data Science and Business Analytics Course today and power ahead in your career. The course is designed for working professionals. You will have access to personalized mentorship and work on industry-relevant projects with guidance from industry experts. Feel free to reach out to us in the comments below if you have any queries.


