MC1 -- Predicting Employee Attrition Using Machine Learning Models
Start Date
8-4-2022 10:30 AM
End Date
8-4-2022 12:15 PM
Location
URC Greatroom
Document Type
Event
Abstract
Attracting and retaining talent can be crucial to any business's success. It is also an area where many businesses struggle, especially during the Covid-19 pandemic and what many are calling “The Great Resignation” (Hopkins & Figaro, 2021). This project examines various factors commonly thought to contribute to employee retention or attrition and applies machine learning to take a data-driven approach to decreasing turnover within organizations. By determining the effects that certain factors, or combinations of factors, have on retention, management and HR can decide where best to allocate resources to minimize the costs associated with employee churn. Similar research on this topic includes An Improved Random Forest Algorithm for Predicting Employee Turnover (Xiang et al., 2019) and Predicting Employee Attrition Using Machine Learning Techniques (Fallucchi et al., 2020). Both studies use fairly sizable datasets (2,000 and 1,500 instances, respectively) and common machine learning models such as Support Vector Machine classifiers, Decision Tree classifiers, and Random Forest classifiers. Our project uses many of these same models as well as some newer, more advanced ones, all trained on a data set of 2,000 instances. We are using an employee attrition data set from Kaggle.com that contains categorical and numerical features, including monthly hours, performance evaluation metrics, and job satisfaction, along with the label, which indicates whether the employee is still with the company. We have preprocessed and cleaned the data by filling in null values, removing instances, and deriving new features from the existing ones. The main machine learning models we will use include Support Vector Machine, Decision Tree, and Ensemble Learning models such as the Random Forest Classifier and the more advanced and efficient XGBoost.
We will evaluate our models using metrics such as precision, recall, the ROC curve, and AUC, and we will use these results to tune our hyperparameters and develop a high-performance classifier. Our models should be able to predict, based on the features given, whether or not an employee will stay with the organization. We will also use the models to compute feature importance, which will let us identify the features that contribute most to employee retention.
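As an illustration of the evaluation metrics named above, the following is a minimal sketch in plain Python, not the project's actual code. The function names, the toy labels and scores, the 0.5 decision threshold, and the convention that 1 means the employee left are all assumptions made for this example. In practice a library such as scikit-learn provides equivalent metrics.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = employee left, 0 = employee stayed (assumed).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]  # made-up model scores

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} auc={auc:.3f}")
```

Precision and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason to report them together when tuning hyperparameters.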
Keywords
Math, Computer Science, Informatics