Date of Award
Fall 2025
Document Type
Open Access Dissertation
Department
Statistics
First Advisor
Jiajia Zhang
Abstract
Electronic health records (EHRs) provide valuable information for monitoring disease progression, which can significantly enhance disease management and reduce healthcare burdens. Each encounter records a patient’s demographics, treatment progress, medications, vital signs, past medical history, immunizations, and laboratory data. The resulting longitudinal EHR data are complex, exhibiting irregularity, sparsity, and non-linearity. Additionally, clinical notes recorded by clinicians are unstructured. Consequently, using EHRs in public health research presents numerous challenges, particularly in data management and analysis. Dynamic prediction is designed to update predictions as new data become available. It is particularly advantageous in healthcare settings where patient data are continuously collected. The objective of this dissertation is to develop methodology for constructing dynamic prediction algorithms across three projects. These projects model longitudinal numeric and unstructured predictors with scalar or time-to-event outcomes.
The first project is motivated by COVID-19 EHRs from India. We propose a two-step landmark competing risk model that first summarizes historical laboratory measurements using functional principal component analysis (FPCA) and then employs a landmark competing risk model for prediction. Alternative approaches for handling longitudinal observations, including the baseline measurement, the mean, the last value carried forward, and linear regression, are also adopted in the two-step estimation and compared with the proposed method via the weighted Harrell’s C-index, multi-class area under the curve, and Brier score.
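The simpler comparison summaries named above (baseline measurement, mean, last value carried forward, and a linear regression slope) can be illustrated for a single patient at a single landmark time. The following is a minimal sketch, not the dissertation's implementation: the function name `summarize_history`, its arguments, and the choice to return a slope of 0.0 when only one time point is available are all assumptions made for illustration. It does not cover the FPCA summary or the competing risk model itself.

```python
from statistics import mean


def summarize_history(times, values, landmark):
    """Summarize one patient's lab measurements observed up to the landmark time.

    Hypothetical helper for illustration only; names and return format are assumptions.
    """
    # Keep only measurements taken at or before the landmark, in time order.
    history = sorted((t, v) for t, v in zip(times, values) if t <= landmark)
    if not history:
        raise ValueError("no measurements at or before the landmark time")

    ts = [t for t, _ in history]
    vs = [v for _, v in history]

    # Ordinary least-squares slope of value on time.
    t_bar, v_bar = mean(ts), mean(vs)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(ts, vs))
    # Assumption: with a single distinct time point the slope is set to 0.0.
    slope = sxy / sxx if sxx > 0 else 0.0

    return {
        "baseline": vs[0],   # first observed value
        "mean": v_bar,       # average of observed values
        "locf": vs[-1],      # last value carried forward to the landmark
        "slope": slope,      # linear trend over the observed history
    }


# Example: four measurements, landmark at time 5 (the value at time 9 is ignored).
summary = summarize_history([0, 2, 4, 9], [1.0, 2.0, 3.0, 10.0], landmark=5)
```

In a two-step approach of this kind, each summary would then serve as a covariate in the prediction model fitted at the landmark time; only data available up to the landmark are used, which is what allows the prediction to be updated as new measurements arrive.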
The second project is motivated by the EHRs of Prisma Hospital in South Carolina. Our objective is to dynamically predict the risk of all-cause mortality among patients using a landmark large language model that deciphers the chronological comorbidity history. Longitudinal features are first extracted from the concatenated history of comorbidity descriptions via Bidirectional Encoder Representations from Transformers (BERT) and its variants. A binary classification model is then employed to predict all-cause mortality among patients.
The third project is motivated by the Medical Information Mart for Intensive Care III (MIMIC-III) datasets. Our focus is on patients who have been discharged from the intensive care unit (ICU). Dynamic predictions are made at the time of discharge, and discharge clinical notes are used as predictors of 360-day mortality for these patients. BERT and its variants are employed to decipher the clinical notes, and a Cox proportional hazards model is integrated to model the time to death within 360 days.
All three proposed methods are applied to their motivating data to demonstrate their practical application in real-world scenarios.
Rights
© 2025 Cai
Recommended Citation
Cai, R. (2025). Dynamic Prediction Using Complex Survival Model on Longitudinal Electronic Health Records Data. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/8659