# Statistical Analysis of Interval-Censored Data Subject to Additional Complications

Summer 2019

## Document Type

Open Access Dissertation

Statistics

Lianming Wang

## Abstract

Survival analysis is an important branch of statistics that studies time-to-event data (or survival data), in which the response variable is the time to a certain event of interest. The most prominent feature of survival data is that the response is often not exactly observed because of limits of the study design or the nature of the event of interest. Interval-censored data are a common type of survival data and occur frequently in real-life studies where subjects are examined at periodic follow-ups. The response time is usually not observed, but the status of the event of interest is known at each examination time. In such cases, the response time for each subject is only known to fall within the interval formed by the two examination times between which the status of the event changed. This dissertation proposes new statistical approaches for analyzing real-life interval-censored data with additional complications.
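As a minimal sketch of how an interval-censored observation arises, the following Python function brackets an unobserved event time between two examination times. The function name `censoring_interval` and the conventions used (a left endpoint of 0 when the event precedes the first examination, a right endpoint of infinity when it follows the last one) are assumptions made for illustration, not notation taken from the dissertation.

```python
import math
from typing import List, Tuple


def censoring_interval(event_time: float, exam_times: List[float]) -> Tuple[float, float]:
    """Return the interval (left, right] known to contain event_time.

    Hypothetical helper for illustration only.
    left  = last examination at which the event had not yet occurred (0 if none)
    right = first examination at which the event had occurred (inf if none)
    """
    left, right = 0.0, math.inf
    for t in sorted(exam_times):
        if t < event_time:
            # Event not yet observed at this examination.
            left = t
        else:
            # First examination at which the event status has changed.
            right = t
            break
    return left, right


# The true event time (2.5) is never recorded; only the bracketing
# examination times are available to the analyst.
print(censoring_interval(2.5, [1, 2, 3, 4]))
```

Under these conventions, an interval starting at 0 corresponds to a left-censored observation and an interval ending at infinity corresponds to a right-censored one.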

Chapter 1 introduces the dissertation. First, it describes interval-censored data and explains, with illustrative examples, how such data arise. Second, it introduces the proportional hazards (PH) model, a widely used model for analyzing interval-censored data. Third, it reviews the literature on fitting the PH model to interval-censored data. Fourth, it presents three additional complications in the analysis of interval-censored data. Finally, it describes real data sets that motivate the study of these complications.
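For reference, the PH model is commonly written as shown here, together with the standard likelihood for interval-censored data under noninformative censoring. The notation (baseline hazard $\lambda_0$, cumulative baseline hazard $\Lambda_0$, covariate vector $x_i$, coefficients $\beta$, and observed interval $(L_i, R_i]$ for subject $i$) is assumed for illustration and may differ from the notation used in the dissertation.

$$
\lambda(t \mid x) = \lambda_0(t)\exp(x^\top \beta),
\qquad
S(t \mid x) = \exp\{-\Lambda_0(t)\exp(x^\top \beta)\},
\qquad
\Lambda_0(t) = \int_0^t \lambda_0(u)\,du
$$

$$
L(\beta, \Lambda_0) = \prod_{i=1}^{n} \bigl\{ S(L_i \mid x_i) - S(R_i \mid x_i) \bigr\}
$$

With $S(0 \mid x) = 1$ and $S(\infty \mid x) = 0$, left-censored observations ($L_i = 0$) and right-censored observations ($R_i = \infty$) are covered as special cases.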

Chapter 2 develops an expectation-maximization (EM) algorithm for analyzing arbitrarily censored data under the PH model. Arbitrarily censored data refer to data sets that include both interval-censored observations and exactly observed failure times. The method developed in Chapter 2 can be considered an extension of the method of Wang et al. (2016). The proposed method retains all the good properties of Wang’s method, such as flexibility, computational efficiency, accuracy, robustness to the choice of initial values, quick convergence, and closed-form variance estimation.

Chapter 3 studies current status data, a special case of interval-censored data, with informative censoring. This study was motivated by the tumor studies conducted by the National Toxicology Program (NTP). In such studies, the tumor onset time at a specific important organ of a mouse or rat is usually not observed; it is either left- or right-censored at the sacrifice time depending on whether a tumor is found there, resulting in current status data for the tumor onset time. However, the sacrifice time can be correlated with the tumor onset time because some of these animals are killed when they show symptoms of sickness or serious weight loss, potentially due to exposure to the substance being tested. This leads to an informative censoring problem, and ignoring it may cause serious bias and misleading results. In this chapter, a new estimation approach based on an EM algorithm is proposed, and it shows excellent performance in the simulation study. The new approach has several merits: it is robust to initial values, fast to converge, and easy to implement, and it provides variance estimates in closed form. The approach is illustrated by applications to two real data sets from NTP studies.
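The structure of a current status observation can be sketched in a few lines of Python. The names `current_status` and `as_interval` are hypothetical helpers introduced for illustration; the sketch only shows what is recorded at the sacrifice time and does not implement the estimation approach proposed in the chapter.

```python
import math
from typing import Tuple


def current_status(onset_time: float, sacrifice_time: float) -> Tuple[float, int]:
    """Return (sacrifice_time, delta) for one animal.

    Hypothetical helper: delta = 1 if a tumor is present at sacrifice
    (onset time is left-censored), delta = 0 otherwise (right-censored).
    The onset time itself is never part of the recorded data.
    """
    delta = 1 if onset_time <= sacrifice_time else 0
    return sacrifice_time, delta


def as_interval(sacrifice_time: float, delta: int) -> Tuple[float, float]:
    """Express a current status observation as a censoring interval."""
    if delta == 1:
        return 0.0, sacrifice_time      # onset occurred in (0, C]
    return sacrifice_time, math.inf     # onset occurs in (C, inf)


c, d = current_status(onset_time=60.0, sacrifice_time=104.0)
print(c, d, as_interval(c, d))
```

Informative censoring corresponds to the case where `sacrifice_time` is statistically dependent on `onset_time`, which this sketch does not model.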

Chapter 4 studies the estimation of system reliability when the status of every component is also known. Both system and component data are available in such situations, and all of these failure times are either left-censored or right-censored at the examination time depending on whether the system and each component have failed. Two strategies for estimating system reliability are discussed: (1) using system data only and (2) using component data. When component data are used, two models are studied under different assumptions about whether the component failure times are independent or correlated. A new estimation method under the gamma frailty proportional hazards model is proposed to handle the situation in which the component failure times are correlated. A detailed comparison of these strategies is conducted.