Document Type



Introduction Linkage and retention in HIV medical care remains problematic in the USA. Extensive health utilisation data collection through electronic health records (EHR) and claims data represent new opportunities for scientific discovery. Big data science (BDS) is a powerful tool for investigating HIV care utilisation patterns. The South Carolina (SC) office of Revenue and Fiscal Affairs (RFA) data warehouse captures individual-level longitudinal health utilisation data for persons living with HIV (PLWH). The data warehouse includes EHR, claims and data from private institutions, housing, prisons, mental health, Medicare, Medicaid, State Health Plan and the department of health and human services. The purpose of this study is to describe the process for creating a comprehensive database of all SC PLWH, and plans for using BDS to explore, identify, characterise and explain new predictors of missed opportunities for HIV medical care utilisation.

Methods and analysis This project will create person-level profiles guided by the Gelberg-Andersen Behavioral Model and describe new patterns of HIV care utilisation. The population for the comprehensive database comes from statewide HIV surveillance data (2005-2016) for all SC PLWH (N approximate to 18000). Surveillance data are available from the state health department's enhanced HIV/AIDS Reporting System (e-HARS). Additional data pulls for the e-HARS population will include Ryan White HIV/AIDS Program Service Reports, Health Sciences SC data and Area Health Resource Files. These data will be linked to the RFA data and serve as sources for traditional and vulnerable domain Gelberg-Anderson Behavioral Model variables. The project will use BDS techniques such as machine learning to identify new predictors of HIV care utilisation behaviour among PLWH, and 'missed opportunities' for re-engaging them back into care.

Ethics and dissemination The study team applied for data from different sources and submitted individual Institutional Review Board (IRB) applications to the University of South Carolina (USC) IRB and other local authorities/agencies/state departments. This study was approved by the USC IRB (#Pro00068124) in 2017. To protect the identity of the persons living with HIV (PLWH), researchers will only receive linked deidentified data from the RFA. Study findings will be disseminated at local community forums, community advisory group meetings, meetings with our state agencies, local partners and other key stakeholders (including PLWH, policy-makers and healthcare providers), presentations at academic conferences and through publication in peer-reviewed articles. Data security and patient confidentiality are the bedrock of this study. Extensive data agreements ensuring data security and patient confidentiality for the deidentified linked data have been established and are stringently adhered to. The RFA is authorised to collect and merge data from these different sources and to ensure the privacy of all PLWH. The legislatively mandated SC data oversight council reviewed the proposed process stringently before approving it. Researchers will get only the encrypted deidentified dataset to prevent any breach of privacy in the data transfer, management and analysis processes. In addition, established secure data governance rules, data encryption and encrypted predictive techniques will be deployed. In addition to the data anonymisation as a part of privacy-preserving analytics, encryption schemes that protect running prediction algorithms on encrypted data will also be deployed. Best practices and lessons learnt about the complex processes involved in negotiating and navigating multiple data sharing agreements between different entities are being documented for dissemination.

Digital Object Identifier (DOI) bmjopen-2018-027688


© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

APA Citation

Olatosi, B., Zhang, J., Weissman, S., Hu, J., Haider, M., & Li, X. (2019). Using big data analytics to improve HIV medical care utilisation in South Carolina: A study protocol. BMJ Open, 9(7) bmjopen-2018-027688.