Research: A Detailed Analysis of AI Models for Predicting Employee Attrition Risk
Employee attrition is one of the biggest problems organizations face today. Typical organizations have 12-15% attrition on average. Replacing an employee is expensive, and a departure puts stress on the remaining team by lowering morale and forcing unnecessary overtime. The average cost of hiring a software engineer is approximately $40,000. In light of these facts, organizations need ways to control or reduce attrition, and a first step is predicting the attrition risk of each employee. In this paper, we analyze the prominent factors affecting employee attrition using the IBM HR Analytics dataset from Kaggle and train several machine learning models to predict attrition risk. We compare these models by their Area Under the Curve (AUC). We select the main factors affecting employee attrition using Random Forest, and classify which types of employees are more likely to quit using the XGBoost classifier. We also discuss approaches that an organization can use to keep its employees engaged.
I. INTRODUCTION
Happy and engaged employees are a crucial asset for the success of any organization. Engaged employees work with better focus and hence have a higher chance of success. Successful employees meet and exceed goals such as development deadlines, sales targets, and brand building through positive customer interactions. Organizations that continually strive to minimise employee attrition can gain a competitive advantage. It is therefore essential for company leaders to know the main reasons why their employees choose to leave, and then to take relevant measures to improve the company’s productivity, overall workflow and business performance.
Employee attrition has been an ongoing problem across industries all over the world. The average employee attrition rate is 12-15% according to various surveys. When an employee leaves a company, it takes considerable time to fill the resulting gap. Replacing an entry-level employee costs nearly half of that employee’s salary. On average, replacing a worker costs 33% of the worker’s annual salary.
Hiring an employee is a time-consuming and expensive process. According to CodeSubmit, hiring a software developer can cost a total of $41,049. The acquisition cost includes recruiter and recruiting costs, interviewing costs, the cost of keeping bench strength, and the productivity lost until the gap is filled. Recruiter and recruiting costs cover sourcing profiles from hiring sites and job boards, pre-hire assessments, recruitment technology and tooling, shortlisting profiles, scheduling interviews, background verification, referral rewards, and so on. According to a Glassdoor article, the average recruitment cost of hiring an employee is $4,000.
Interviewing costs vary with the organization and the seniority of the position. On average, a candidate in most organizations goes through a minimum of 3 hours of interviews. Assuming an average salary of $150,000 per annum, the cost per hour of interviewing time is $75. With a success rate of 1 in 10, filling one position takes about 30 interview hours (3 hours for each of 10 candidates), so interviewing alone costs close to $2,250.
Organizations typically handle attrition in one of two ways: keeping bench strength for critical areas, or replacement hiring for non-critical areas and areas where secondary expertise is available. Each option comes with a different cost. Keeping bench strength incurs continuing costs because the resources available always exceed the resources required. At the other end of the spectrum, late hiring has a hidden business impact, which can be counted as a cost. Finding a replacement, waiting out the notice period and ramping up the new employee can take anywhere between 3 and 6 months. Delays also put pressure on the existing employees, reducing their job satisfaction and making them the next candidates for attrition. In most cases, the outgoing employee’s pay is lower than that of the new employee being hired.
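The interviewing-cost estimate above can be reproduced with a few lines of arithmetic. The following Python snippet is a minimal sketch: the 2,000 working hours per year and the single interviewer per interview hour are assumptions made here for illustration, chosen because they reproduce the $75 hourly figure quoted in the text.

```python
# Figures taken from the text, except where marked as assumptions.
ANNUAL_SALARY_USD = 150_000
WORKING_HOURS_PER_YEAR = 2_000      # assumption: 50 weeks x 40 hours
INTERVIEW_HOURS_PER_CANDIDATE = 3
CANDIDATES_PER_HIRE = 10            # success rate of 1 in 10


def interviewing_cost_per_hire(annual_salary, hours_per_year,
                               hours_per_candidate, candidates_per_hire):
    """Cost of interviewer time needed to fill one position.

    Assumes one interviewer is occupied for each interview hour.
    """
    hourly_cost = annual_salary / hours_per_year
    total_hours = hours_per_candidate * candidates_per_hire
    return hourly_cost * total_hours


cost = interviewing_cost_per_hire(
    ANNUAL_SALARY_USD,
    WORKING_HOURS_PER_YEAR,
    INTERVIEW_HOURS_PER_CANDIDATE,
    CANDIDATES_PER_HIRE,
)
print(f"Estimated interviewing cost per hire: ${cost:,.0f}")
# Estimated interviewing cost per hire: $2,250
```

Interview panels with several interviewers would scale this estimate roughly linearly with panel size.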
Attrition also impacts the morale of other employees.
Employee engagement is the key to retaining employees and reducing attrition. For the past couple of decades, organizations have typically depended on engagement pulse surveys to understand employee engagement. These surveys have a few disadvantages. They are manually initiated and followed up. They typically cover a sample of employees rather than everyone, as the analysis can only be done on finite data sets. As most of these surveys are run by dedicated vendors, they are expensive and can only be done once or twice a year. An early indication that an employee is disgruntled is an opportunity to turn the employee around, but the above factors make it hard to identify such employees in time.
As in many other fields, Artificial Intelligence can help here. With the compute and technologies available today, building an AI tool for this purpose is cost-effective and addresses the problem of predicting attrition risk, which is a first step to controlling employee attrition.
The world during and after COVID-19 is very different from the pre-COVID world. The pandemic has brought unprecedented changes in what employees expect for themselves and from their organizations. Changing jobs has become easier thanks to the tools available and to online interactions that do not require the employee to commute. The “Great Resignation” or “Great Reshuffle” is a result of this changed outlook. This fast-changing world calls for tools that can adapt quickly. In addition to faster prediction, AI tools can also adapt faster to changes in the influencing factors. Once an AI model is built, it is easy to add or remove features as the data becomes available. AI can also be used for the data collection process itself, which is outside the scope of this paper.
There are some existing tools that predict the probability of an employee’s attrition based on certain factors. The attributes used by these tools are described in later sections. Akkio.com is one such tool. Akkio’s historical data is the “IBM HR Analytics Employee Attrition & Performance” dataset from Kaggle. The dataset contains synthetic data on over 2,000 employees. Some of the attributes are the employee’s wage, department, travel amount, education, and overtime hours. A second tool is Obviously.ai, which also uses the IBM HR Analytics dataset from Kaggle to build its model.
Many papers have dealt with building models that predict whether an employee will leave the company. Mohbey and Kumar [1] used Random Forest, Naïve Bayes, Logistic Regression, SVM, and Decision Tree classifiers for this. Based on precision, recall, and F1-score, they concluded that Logistic Regression performed well in the attrition prediction task, with some metrics higher than those of the other classifiers. A Logistic Regression classifier is used in [3] for attrition prediction on the IBM HR dataset. However, that paper did not filter out insignificant features, even though the dataset contains many features. Only eight of the 70 employees predicted to leave actually left; the other 62 were wrongly flagged. Reference [4] trained Logistic Regression, Random Forest, and K-Nearest Neighbour (KNN) models for attrition prediction. That paper used Principal Component Analysis (PCA) to reduce the dimensionality of the feature space.
In this paper, we aim to identify the main causes that contribute to an employee’s decision to leave a company, and to predict whether a particular employee will leave using machine learning models. We select the main factors affecting employee attrition using Random Forest, and classify which types of employees are more likely to quit using the XGBoost classifier.
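The figures reported for [3] (8 correct out of 70 employees flagged as leavers) correspond to the precision of the positive class. As a small illustrative sketch, the Python snippet below computes that value; the function name `precision` is ours and is not taken from any cited work.

```python
def precision(true_positives, false_positives):
    """Share of employees flagged as leavers who actually left."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("no positive predictions were made")
    return true_positives / predicted_positives


# 8 of the 70 flagged employees left; the other 62 were false alarms.
p = precision(true_positives=8, false_positives=62)
print(f"Precision: {p:.3f}")
# Precision: 0.114
```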
II. DATA AND METHODOLOGY
Employee attrition history is internal company information, which is difficult to obtain given its confidentiality. Our paper therefore uses the dataset published on Kaggle.
The IBM HR Analytics dataset is a synthetic dataset created by the IBM data science team. The dataset comprises 1471 records and 34 feature variables divided into three categories: personal information, work experience and attendance rate. Features such as work-life balance and marital status are included.
The factors are listed below:
Age, Attrition (Dependent Variable), BusinessTravel, DailyRate, Department, DistanceFromHome, Education, EducationField, EmployeeNumber, EnvironmentSatisfaction, Gender, TrainingTimes, HourlyRate, Salary, JobInvolvement, JobLevel, JobRole, JobSatisfaction, MaritalStatus, MonthlyIncome, MonthlyRate, NumberofPreviousEmployers, YearsAtCompany, YearsWithCurrentManager, YearsInCurrentRole, YearsSinceLastPromotion.
III. EXPLORATORY DATA ANALYSIS
Overtime had a direct bearing on employee attrition: the majority of employees who did not work overtime stayed with the company, while the attrition percentage was disproportionately higher among employees who were asked to work overtime.
Distance from home correlates positively with attrition: more than half of the employees who left commuted for longer than those who stayed, irrespective of gender and marital status.
Employees who contributed more by working overtime were, on average, promoted only after spending more years in a specific role, which can demoralise such employees and prompt them to leave the company.
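Comparisons such as the overtime finding above come down to computing the attrition rate within each group. The Python snippet below is a minimal sketch of that calculation. The rows are toy values invented for illustration, not records from the IBM dataset, and the column names `OverTime` and `Attrition` with `Yes`/`No` values are assumptions about how the data is encoded.

```python
from collections import defaultdict

# Toy rows for illustration only; NOT taken from the IBM dataset.
rows = [
    {"OverTime": "Yes", "Attrition": "Yes"},
    {"OverTime": "Yes", "Attrition": "No"},
    {"OverTime": "Yes", "Attrition": "Yes"},
    {"OverTime": "No", "Attrition": "No"},
    {"OverTime": "No", "Attrition": "No"},
    {"OverTime": "No", "Attrition": "Yes"},
    {"OverTime": "No", "Attrition": "No"},
]


def attrition_rate_by(rows, key):
    """Return {group value: fraction of that group with Attrition == 'Yes'}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["Attrition"] == "Yes":
            leavers[row[key]] += 1
    return {group: leavers[group] / totals[group] for group in totals}


rates = attrition_rate_by(rows, "OverTime")
for group, rate in sorted(rates.items()):
    print(f"OverTime={group}: attrition rate {rate:.2f}")
```

The same function can be pointed at any categorical column, for example a binned commute distance, to repeat the comparison for other factors.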
Random forest is an ensemble learning method for classification and regression that constructs a group of decision trees at training time. For classification, the output is the class selected by most trees. We determined the top 15 factors that lead to employee attrition using the random forest model. Working overtime is the most important feature leading to attrition among the employees.
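The voting step described above, in which the forest outputs the class chosen by most trees, can be sketched in a few lines of Python. This is an illustration of hard majority voting only, not the implementation used in our experiments; the per-tree predictions are made up, and library implementations may combine trees differently (for instance by averaging class probabilities).

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees.

    Ties are resolved in favour of the class that appears first.
    """
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions of five trees for a single employee.
votes = ["leave", "stay", "leave", "leave", "stay"]
forest_prediction = majority_vote(votes)
print(forest_prediction)
# leave
```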
IV. RESULTS AND EVALUATION CRITERIA
All the models are trained on the IBM HR Analytics dataset. The dataset is split into training data and test data in the ratio 4:1. The Area Under the Curve (AUC) is used to evaluate the models.
In the table below, the models are compared on the chosen evaluation metric, AUC. eXtreme Gradient Boosting (XGBoost) provides the best result among the chosen models with an AUC of 0.845960. A benefit of gradient boosting is that, once the boosted trees are constructed, it is easy to retrieve an importance score for each attribute. The importance score indicates the usefulness of each feature within the model.
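The evaluation protocol above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used for our experiments: we assume AUC refers to the area under the ROC curve, computed here as the probability that a randomly chosen leaver receives a higher score than a randomly chosen stayer, and the labels and scores are made up. The helper names are hypothetical.

```python
import random


def train_test_split_4_to_1(records, seed=0):
    """Shuffle the records and split them 80% / 20% (ratio 4:1)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train, test = train_test_split_4_to_1(range(10))
print(len(train), len(test))  # 8 2

# Made-up test labels (1 = left) and predicted attrition scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # AUC = 0.833
```

The pairwise loop is quadratic in the number of test samples, which is acceptable for a dataset of this size; rank-based formulations give the same value more efficiently.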
TABLE 1. MODEL COMPARISON – AUC
| Model | Area under the curve (AUC) |
|---|---|
| SVM | 0.72 |
| Random Forest | 0.81 |
| Logistic Regression | 0.83 |
| eXtreme Gradient Boosting | 0.86 |
| Ensemble Average | 0.85 |
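One plausible reading of the “Ensemble Average” row is that each employee’s predicted attrition probability is averaged across the individual models before the AUC is computed. The Python snippet below sketches that averaging step under this assumption. The probabilities are invented for illustration and are not results from this study.

```python
def ensemble_average(probabilities_by_model):
    """Average each sample's predicted probability across all models."""
    model_outputs = list(probabilities_by_model.values())
    if not model_outputs:
        raise ValueError("at least one model is required")
    n_samples = len(model_outputs[0])
    if any(len(output) != n_samples for output in model_outputs):
        raise ValueError("all models must score the same samples")
    n_models = len(model_outputs)
    return [
        sum(output[i] for output in model_outputs) / n_models
        for i in range(n_samples)
    ]


# Made-up attrition probabilities for three employees.
predictions = {
    "logistic_regression": [0.25, 0.5, 0.75],
    "random_forest": [0.5, 0.25, 1.0],
    "xgboost": [0.75, 0.75, 0.5],
}
averaged = ensemble_average(predictions)
print(averaged)
```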
Importance is the amount by which each attribute’s split points improve the performance measure, weighted by the number of observations at the node. The Gini index is one possible performance measure.
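To make this definition concrete, the Python snippet below sketches the contribution of a single split to a feature’s importance when the Gini index is the performance measure. It is an illustration for binary labels only; the function names are ours, the node contents are made up, and gradient boosting libraries may use other measures, such as the reduction in the training loss.

```python
def gini_impurity(labels):
    """Gini impurity of a node holding binary labels (0 = stayed, 1 = left)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_left_company = sum(labels) / n
    return 1.0 - p_left_company ** 2 - (1.0 - p_left_company) ** 2


def weighted_gini_gain(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the node's share of data."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return (n / n_total) * (gini_impurity(parent) - children)


# Hypothetical node holding 8 of 16 training samples, split perfectly.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 1], [0, 0, 0, 0]
gain = weighted_gini_gain(parent, left, right, n_total=16)
print(gain)
# 0.25
```

Summing these weighted gains over every split that uses a given attribute, across all trees, yields that attribute’s importance score.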
V. DISCUSSION AND CONCLUSION
The decision to leave a job is intricate, and people leave companies for various explicit and implicit reasons. By understanding the circumstances that surround an employee leaving a company, organisations can attempt to build a profile of someone likely to leave, thus helping them identify probable candidates for attrition. The methodology proposed in this paper will assist human resources departments by providing early warning of an employee’s forthcoming decision to leave the organization. Inferring such signals with the proposed method helps an organisation predict whether there is a potential risk of employee attrition. Overtime hours, job level, and monthly income are the critical features that influence an employee’s decision to leave a company.
The effect of the length of time between promotions and of overall job tenure is quite evident: if an employee has been in the same job for longer than average, dissatisfaction and demotivation may follow. Managers can intervene effectively by designing training programmes to help such employees upskill. Determining the average tenure for each role at the company is crucial. Some employees may be content in their current role irrespective of tenure, while others may be eager to advance in their careers. Other factors such as a long commute and frequent overtime can also factor into an employee’s decision to leave a company. If the commute is a problem for the majority of employees, offering flexible and work-from-home options lets employees either work from home or, when needed, shift their work hours to avoid rush-hour traffic. Our EDA found a strong link between excessive overtime and attrition. Employees working more than 15 hours of overtime per week were more prone to quit. While a little overtime is acceptable during high-pressure periods, if the data reveals that some employees are spending many evenings and weekends working, managers should intervene to see whether they need additional support. Showing that the company is committed to helping employees manage their workload efficiently and achieve a healthy work-life balance will improve morale and employee engagement, reducing turnover as a result.
VI. FUTURE WORK
Further research can focus on removing the class imbalance in the target variable, collecting more data, and then applying deep learning models to achieve better accuracy through effective hyperparameter tuning. Deep learning models may adapt faster to changes in the influencing factors and, with appropriate tuning, can yield a model that does not overfit the training dataset.
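One common way to counter class imbalance without collecting new data is to weight each class inversely to its frequency during training. The Python snippet below sketches this heuristic as one possible option rather than the approach we commit to; resampling is another. The label counts are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Hypothetical imbalanced target: 8 stayers and 2 leavers.
labels = ["No"] * 8 + ["Yes"] * 2
weights = balanced_class_weights(labels)
print(weights)
# {'No': 0.625, 'Yes': 2.5}
```

With these weights, errors on the minority class (employees who left) count four times as much as errors on the majority class in this toy example.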
VII. REFERENCES
- Mohbey, K.K. Employee’s Attrition Prediction Using Machine Learning Approaches. In Machine Learning and Deep Learning in Real-Time Applications; IGI Global: Hershey, PA, USA, 2020; pp. 121–128.
- Ponnuru, S.; Merugumala, G.; Padigala, S.; Vanga, R.; Kantapalli, B. Employee Attrition Prediction using Logistic Regression. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 2871–2875
- Frye, A.; Boomhower, C.; Smith, M.; Vitovsky, L.; Fabricant, S. Employee Attrition: What Makes an Employee Quit? Smu Data Sci. Rev. 2018, 1, 9.
- S. Alduayj and K. Rajpoot, “Predicting Employee Attrition using Machine Learning,” 2018 13th Int. Conf. on Innovations in Information Technology (IIT), 2018.
- Sri Harsha B, A. JithendraVaraprasad, L.V N Pavan Sai Sujith, “Early prediction of employee attrition,” International Journal of Scientific & Technology, Mar. 2020.
- “Employee retention,” Akkio. [Online]. Available: https://www.akkio.com/applications/employee-retention. [Accessed: 05-Apr-2022].
- T. Phillips, “The cost of hiring a software developer in 2021 (the exact numbers),” CodeSubmit Blog, 02-Sep-2021. [Online]. Available: https://codesubmit.io/blog/cost-of-hiring-a-software-developer/. [Accessed: 05-Apr-2022].
- “Using HR data to predict when your best employees will leave,” Obviously.ai. [Online]. Available: https://www.obviously.ai/post/using-hr-data-to-predict-when-your-best-employees-will-leave. [Accessed: 05-Apr-2022].
- Glassdoor.com. [Online]. Available: https://www.glassdoor.com/employers/blog/talent-analytics-101/. [Accessed: 05-Apr-2022].
- Aggarwal, M. Singh, S. Chauhan, M. Sharma, and D. Jain, “Employee attrition prediction using machine learning comparative study,” in Intelligent Manufacturing and Energy Sustainability, Singapore: Springer Singapore, 2022, pp. 453–466.
- “Employee attrition in human resource using machine learning techniques,” 186.108. [Online]. Available: http://14.139.186.108/jspui/handle/123456789/32135. [Accessed: 05-Apr-2022].
- “ShieldSquare captcha,” Iop.org. [Online]. Available: https://iopscience.iop.org/article/10.1088/1757-899X/1085/1/012029/meta. [Accessed: 05-Apr-2022]