Date of Award

Summer 2025

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

Xiaoyan Lin

Abstract

This dissertation is dedicated to the analysis of arbitrarily censored data, which have a more complex structure than conventional right-censored data or general interval-censored data. We investigate such data with two statistical models: the Bayesian semiparametric proportional hazards model and the Bayesian discrete-time survival model.

In Chapter 2, we introduce a novel Bayesian method for the analysis of arbitrarily censored data within the framework of the semiparametric proportional hazards (PH) model. The proposed method uses M-splines to model the baseline hazard function and I-splines to model the cumulative baseline hazard function. We propose a two-stage data augmentation strategy based on exponential and multinomial latent variables, which yields a convenient form of the augmented likelihood. An efficient Gibbs sampler built on this augmented likelihood makes posterior inference straightforward. Simulation studies show that the method accurately estimates the regression parameters and survival functions. We also present a numerical comparison with existing Bayesian methods for further validation. The applicability of our approach is demonstrated through analyses of two real datasets on colorectal cancer and childhood mortality.
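
As a rough illustration of the model structure described above, the PH model with a spline baseline and the likelihood contribution of each censoring type can be written as follows. The notation (K spline basis functions, coefficients \gamma_l, covariate vector x, observed interval endpoints L and R) is assumed here for illustration and is not taken from the dissertation; the contributions are the standard ones for exact, right-, left-, and interval-censored observations.

```latex
% Assumed notation for illustration only; not the dissertation's own.
\begin{align*}
h(t \mid x) &= h_0(t)\exp(x^\top \beta), &
h_0(t) &= \sum_{l=1}^{K} \gamma_l M_l(t), \\
\Lambda_0(t) &= \sum_{l=1}^{K} \gamma_l I_l(t), &
\gamma_l &\ge 0, \\
S(t \mid x) &= \exp\{-\Lambda_0(t)\exp(x^\top \beta)\}.
\end{align*}
% Likelihood contribution of one subject, by censoring type:
\begin{align*}
\text{exact failure at } t &: \quad h(t \mid x)\, S(t \mid x), \\
\text{right-censored at } L &: \quad S(L \mid x), \\
\text{left-censored at } R &: \quad 1 - S(R \mid x), \\
\text{interval-censored in } (L, R] &: \quad S(L \mid x) - S(R \mid x).
\end{align*}
```

Because each I-spline is the integral of the corresponding M-spline, nonnegative coefficients give a nonnegative baseline hazard and a nondecreasing cumulative baseline hazard.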

In Chapter 3, we introduce a discrete-time survival model tailored to the analysis of arbitrarily censored data, motivated by the fact that real-world data are predominantly collected at discrete intervals. In the proposed model, hazard probabilities are related to the additive effects of covariates through the probit link function, and B-splines are used to model the time-varying effects of the covariates. To address computational challenges, we devise a novel data augmentation technique that imputes failure times for left-censored and interval-censored observations. Using the transformed person-period binary data together with normal latent variables, we develop an easy-to-implement Gibbs sampler for posterior computation. Simulation studies show that the proposed model accurately estimates time-varying coefficients and survival functions. The utility and effectiveness of the method are further illustrated through applications to the datasets on colorectal cancer and childhood mortality.
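
The person-period representation and the probit link mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names are hypothetical, only exactly observed and right-censored subjects are handled, and the imputation of failure times for left- and interval-censored observations is not reproduced.

```python
import math


def probit_hazard(eta):
    """Map a linear predictor to a hazard probability via the probit link.

    Uses the standard normal CDF: Phi(eta) = 0.5 * (1 + erf(eta / sqrt(2))).
    """
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def person_period(last_period, failed):
    """Expand one subject into binary person-period rows (hypothetical helper).

    Returns (period, y) pairs for periods 1..last_period. y is 1 only in the
    final period, and only if the subject failed there; a right-censored
    subject contributes zeros for every period at risk.
    """
    return [(j, int(failed and j == last_period))
            for j in range(1, last_period + 1)]


def survival_curve(hazards):
    """Discrete-time survival: S(j) = prod_{k <= j} (1 - h_k)."""
    surv, out = 1.0, []
    for h in hazards:
        surv *= 1.0 - h
        out.append(surv)
    return out
```

For example, `person_period(3, True)` gives one row per period at risk with the event indicator set only in period 3, and `survival_curve` turns a sequence of hazard probabilities into the corresponding survival probabilities.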

In Chapter 4, we extend the discrete-time survival model developed in Chapter 3 by incorporating Bayesian Additive Regression Trees (BART) to model the covariate effects on the hazard probabilities. BART, constructed as a sum of trees in which each tree serves as a weak learner, offers enhanced modeling flexibility without relying on strong parametric assumptions. Compared to the B-spline approach in Chapter 3, BART not only captures complex nonlinear effects of covariates but also accommodates interactions among covariates. In addition, BART enables model-free variable selection through its variable importance metric, making it a powerful tool for exploratory analysis. Simulation studies confirm that the BART method accurately estimates discrete-time hazard probabilities, predictor functions, and survival functions. Although the model is primarily formulated for discrete-time survival data, we demonstrate that it can naturally handle continuous-time survival settings. In particular, BART provides reliable estimates of survival functions when data are generated from the PH and accelerated failure time (AFT) models. The discrete-time model with the BART method is also applied to colorectal cancer data and childhood mortality data to illustrate the utility and effectiveness of the method.

Rights

© 2025, Xin Zhi

Available for download on Monday, May 31, 2027
