Date of Award

Fall 2024

Document Type

Open Access Dissertation

Department

Computer Science and Engineering

First Advisor

Dezhi Wu

Second Advisor

Yan Tong

Abstract

Low birthweight (LBW) is a major public health issue that results in increased neonatal mortality and long-term health complications. Traditional LBW analysis methods, which focus on incidence rates and risk factors through statistical models, often struggle with complex unseen data. This limits their effectiveness in the early prevention of LBW and calls for more advanced prediction models. This dissertation addresses that need by proposing and examining novel machine learning (ML) and deep learning (DL) algorithms that aim to predict LBW more accurately during the early stage of pregnancy. It consists of three studies, each designed to build upon the insights and findings of the previous one. Each study contributes uniquely to the development and refinement of an increasingly sophisticated predictive framework for LBW, enhancing the overall robustness and accuracy of the model.

The first study examines the effectiveness and impact of various data rebalancing techniques for LBW prediction on extremely imbalanced data. Through this investigation, it establishes a foundational pipeline for LBW prediction using patients’ pre-delivery features, paving the way for further development and refinement in the subsequent studies. The study also includes an extensive feature importance analysis to identify key factors in LBW classification, which is crucial for guiding targeted interventions to improve birth outcomes.

The second study develops a novel longitudinal transformer-based LBW prediction framework that integrates prenatal mothers’ historical health records with current pre-delivery data, providing more comprehensive and relevant input features for LBW prediction. The framework’s ability to process and analyze these diverse data inputs marks a significant advancement over previous approaches that primarily focus on immediate pre-delivery factors. As a result, the enhanced model demonstrates improved accuracy in LBW prediction and offers a more robust tool for early intervention strategies.

The third study proposes and examines a new large language model (LLM)-based fusion framework that combines structured medical records with rich text-based data. This approach aims to exploit the strengths of both quantitative and qualitative data sources to enhance the predictive accuracy and explainability of LBW prediction models. By integrating both data types, the proposed method offers more in-depth insights into the myriad factors contributing to LBW, potentially unveiling previously unrecognized and more granular risk factors that can further refine the prediction models.

In summary, this dissertation comprehensively explores advanced ML and DL algorithms for predicting LBW through a series of three studies. From establishing the LBW prediction pipeline with rebalancing strategies (Study 1), to developing a transformer-based approach (Study 2), to introducing a tabular-text fusion framework (Study 3), this research contributes to a substantial advancement in prenatal care. By enabling earlier and more accurate identification of LBW risks, this work has the potential to transform early prenatal intervention strategies, leading to improved health outcomes for both mothers and their infants.

Rights

© 2025, Yang Ren
