Space Identification of Sexual Harassment Reports with Text Mining


Sexual harassment is an invisible problem that has been difficult to combat because victims are often reluctant to report it. In recent years, however, the sheer number of women who have spoken up about sexual harassment has brought the issue to the forefront. This change has been driven in large part by the Internet and social media. Because the volume of data posted on these platforms is too large to analyze and organize manually, data and text mining methods are needed. To support the fight against sexual harassment, this study proposes a predictive framework that collects more than 14,000 sexual harassment reports from the Everyday Sexism Project (ESP) website and identifies the space (location) described in each report. Our framework achieves 85.33% accuracy across seven space classes: workplace, public space, home, public transport, school, university, and media. The paper also extends the experiments by merging similar classes (e.g., school and university) and by applying a feature selection method that reduces the number of features for efficiency and effectiveness. These extensions yield promising results for different sets of classes and features, with accuracy ranging from 86% to 93%.
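The pipeline summarized above (merge similar classes, select features, classify each report by space) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract names neither the classifier nor the feature selection method, so the sketch substitutes a multinomial Naive Bayes classifier and a simple document-frequency margin score. The toy reports, the `MERGE` table, and all function names are hypothetical and are not taken from the ESP data or the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reports, invented for illustration (not ESP data).
TRAIN = [
    ("my manager made comments in the office meeting", "workplace"),
    ("a colleague at the office kept messaging me", "workplace"),
    ("a man on the bus would not stop staring", "public transport"),
    ("someone grabbed me on the crowded train", "public transport"),
    ("a teacher at school made remarks in class", "school"),
    ("a lecturer at university commented during class", "university"),
]

# Assumed merge table, mirroring the school/university example.
MERGE = {"university": "school"}


def tokenize(text):
    return text.lower().split()


def select_features(docs, k):
    """Keep the k words that occur in more documents of their dominant
    class than in all other classes combined (a stand-in score, not the
    paper's feature selection method)."""
    df = defaultdict(Counter)  # df[label][word] = documents containing word
    for text, label in docs:
        df[label].update(set(tokenize(text)))
    words = set().union(*df.values())

    def score(word):
        counts = [c[word] for c in df.values()]
        return 2 * max(counts) - sum(counts)

    # Sort by score (descending), then alphabetically for determinism.
    ranked = sorted(words, key=lambda w: (-score(w), w))
    return set(ranked[:k])


def train_nb(docs, vocab):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(w for w in tokenize(text) if w in vocab)
    return class_counts, word_counts


def predict(model, vocab, text):
    """Return the class with the highest Laplace-smoothed log-probability."""
    class_counts, word_counts = model
    n_docs = sum(class_counts.values())
    best_label, best_lp = None, -math.inf
    for label in sorted(class_counts):
        lp = math.log(class_counts[label] / n_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                lp += math.log((word_counts[label][word] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label


# Merge similar classes, select features, then train.
DOCS = [(text, MERGE.get(label, label)) for text, label in TRAIN]
VOCAB = select_features(DOCS, k=15)
MODEL = train_nb(DOCS, VOCAB)


def classify(text):
    return predict(MODEL, VOCAB, text)
```

Calling `classify("a man grabbed me on the bus")` on this toy model returns a space label such as `"public transport"`. Setting `MERGE = {}` keeps school and university as separate classes, and lowering `k` shrinks the vocabulary, which is the kind of class and feature variation the abstract describes.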

APA Citation

Karami, A., Swan, S., & Moraes, M. F. (2020). Space identification of sexual harassment reports with text mining. Proceedings of the Association for Information Science and Technology, 57(1), e265.