Date of Award


Document Type

Open Access Dissertation



First Advisor

Lianming Wang


This dissertation discusses three important research topics in semiparametric regression analysis of panel count data and interval-censored data. Both types of data arise commonly in real-life studies in many fields, such as epidemiology, social science, and medical research. In these studies, subjects are usually examined multiple times at periodic or irregular follow-up examinations. For panel count data, the response variable is the number of occurrences of some recurrent event; the exact occurrence times are usually unknown, and only the numbers of events between successive examinations are observed. For interval-censored data, the response variable is the time to some event of interest, often called the survival time or failure time; the exact response time is never observed but is known to fall within an interval formed by two examination times. The primary goal for both types of data is to study the effects of covariates on the response variable, which can be accomplished through regression analysis.
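To make the two data structures concrete, here is a minimal Python sketch with hypothetical helper names (not taken from the dissertation). It starts from fully known event times, which are never observed in practice, and reduces them to what a study would actually record at a given set of examination times: cumulative counts for panel count data, and a bracketing interval for interval-censored data.

```python
import math
from bisect import bisect_right

def panel_counts(event_times, exam_times):
    """Cumulative number of recurrent events N(t_j) recorded at each exam time t_j."""
    events = sorted(event_times)
    return [bisect_right(events, t) for t in sorted(exam_times)]

def censoring_interval(failure_time, exam_times):
    """Interval (L, R] known to contain the unobserved failure time.

    L = 0 means left-censored (event before the first exam);
    R = inf means right-censored (no event by the last exam).
    """
    left = 0.0
    for t in sorted(exam_times):
        if failure_time <= t:
            return (left, t)
        left = t
    return (left, math.inf)

exams = [1.0, 2.0, 3.0]
print(panel_counts([0.5, 1.2, 2.7], exams))  # [1, 2, 3]
print(censoring_interval(1.7, exams))        # (1.0, 2.0)
print(censoring_interval(3.5, exams))        # (3.0, inf)
```

With a single exam time, `censoring_interval` returns either (0, c] or (c, inf), which is exactly the current status (case 1) structure: only whether the event occurred before that one exam is known.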

Chapter 1 of this dissertation describes panel count data and interval-censored data in detail, with several real-life examples. It reviews existing approaches and commonly used semiparametric regression models for analyzing the two types of data, and it presents preliminary tools used in our approaches, such as monotone splines and the EM algorithm.
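Monotone splines are useful in this setting because a baseline mean function or cumulative hazard function must be nondecreasing. The idea is to write the function as a sum of nondecreasing basis functions b_l(t) with nonnegative coefficients gamma_l, so monotonicity holds automatically. The sketch below uses the simplest such basis, piecewise-linear (order-1) I-splines, with hypothetical function names; higher-order I-splines give smoother fits and are often preferred, but they rely on the same principle.

```python
def ramp_basis(t, knots):
    """Order-1 I-spline basis: the l-th function rises linearly from 0 to 1 on [knots[l], knots[l+1]]."""
    return [min(max((t - a) / (b - a), 0.0), 1.0) for a, b in zip(knots[:-1], knots[1:])]

def monotone_spline(t, knots, gammas):
    """Evaluate sum_l gamma_l * b_l(t); nondecreasing in t whenever every gamma_l >= 0."""
    if any(g < 0 for g in gammas):
        raise ValueError("coefficients must be nonnegative to guarantee monotonicity")
    return sum(g * b for g, b in zip(gammas, ramp_basis(t, knots)))

knots = [0.0, 1.0, 2.0, 4.0]   # strictly increasing knots (illustrative values)
gammas = [0.5, 1.0, 2.0]       # one nonnegative coefficient per basis function
print(monotone_spline(0.5, knots, gammas))  # 0.25
print(monotone_spline(3.0, knots, gammas))  # 2.5
```

Because each basis function is nondecreasing, keeping the coefficients nonnegative is the only constraint an estimation routine has to enforce.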

In Chapter 2, we propose a gamma frailty non-homogeneous Poisson process model for the regression analysis of panel count data that accounts for within-subject correlation. This topic is important because ignoring such within-subject correlation leads to biased estimation and may lead to misleading conclusions, yet the literature on it is limited. We propose an efficient estimation approach based on an EM algorithm. Our approach is robust to initial values, converges fast, and provides variance estimates in closed form. It has shown excellent performance in estimating both the regression parameters and the baseline mean function when within-subject correlation is indeed present, and it can also be used when no such correlation exists. An R package, PCDSpline, has been developed and is available on CRAN to disseminate our approach.
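To illustrate how a gamma frailty induces within-subject correlation, the sketch below computes one subject's marginal log-likelihood under a gamma frailty non-homogeneous Poisson process. It assumes a common parameterization that the abstract does not spell out: given a frailty phi, N(t) is a Poisson process with mean function phi * Lambda0(t) * exp(x'beta), and phi follows a gamma distribution with mean 1 and variance nu. Because every increment of the same subject shares phi, the increments are positively correlated, and integrating phi out gives a closed-form, negative-binomial-type likelihood. The function name and arguments are hypothetical; this is neither the dissertation's EM algorithm nor the PCDSpline interface.

```python
import math

def frailty_nhpp_loglik(exam_times, cum_counts, baseline_mean, x, beta, nu):
    """Marginal log-likelihood of one subject's panel counts under a gamma frailty NHPP.

    Assumed model: given phi, N(t) is Poisson with mean phi * baseline_mean(t) * exp(x'beta),
    and phi ~ Gamma(shape=1/nu, rate=1/nu), so E[phi] = 1 and Var[phi] = nu > 0.
    Integrating phi out gives, with r = 1/nu,
        sum_j [n_j log mu_j - log n_j!] + log G(n + r) - log G(r) - n log(M + r) - r log(1 + M/r),
    where n_j are the count increments, mu_j the interval means, n = sum n_j, M = sum mu_j.
    """
    r = 1.0 / nu
    risk = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    loglik, n_total, m_total = 0.0, 0, 0.0
    prev_t, prev_n = 0.0, 0
    for t, n in zip(exam_times, cum_counts):
        dn = n - prev_n  # events between consecutive exams
        mu = (baseline_mean(t) - baseline_mean(prev_t)) * risk  # expected events given phi = 1
        if dn > 0:
            loglik += dn * math.log(mu)
        loglik -= math.lgamma(dn + 1)
        n_total += dn
        m_total += mu
        prev_t, prev_n = t, n
    # Gamma integral over phi; log1p keeps the small-nu limit numerically stable.
    loglik += math.lgamma(n_total + r) - math.lgamma(r)
    loglik -= n_total * math.log(m_total + r) + r * math.log1p(m_total / r)
    return loglik

# Hypothetical linear baseline mean function Lambda0(t) = 0.5 t.
print(frailty_nhpp_loglik([2.0], [0], lambda t: 0.5 * t, [0.0], [1.0], nu=1.0))  # about -0.6931, i.e. -log 2
```

As nu approaches 0 the frailty concentrates at 1 and the expression reduces to the ordinary Poisson process log-likelihood with independent increments, which is why a frailty model of this kind remains usable when there is no within-subject correlation.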

In Chapter 3, we study regression analysis of case 1 interval-censored data, also referred to as current status data, using the generalized odds-rate hazards (GORH) models. The GORH models form a general class of semiparametric models and have been widely used for analyzing right-censored data; however, their use for current status data has not been studied in the literature. We propose an efficient estimation approach for the GORH models with fixed p, based on a novel EM algorithm. The proposed method is robust to initial values, converges fast, and provides variance estimates in closed form. A working model approach is proposed for when the true value of p is unknown, and it does not require fitting the GORH models with different values of p. The proposed approach and the working model strategy show good performance in an extensive simulation study, and they are illustrated with a large real-life data set.
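For concreteness, the sketch below evaluates a current status log-likelihood under one common parameterization of the GORH class, which is assumed here because the abstract does not state the model: S(t | x) = {1 + p Lambda0(t) exp(x'beta)}^(-1/p) for p > 0, with the proportional hazards model exp{-Lambda0(t) exp(x'beta)} as the limiting case p = 0 and the proportional odds model at p = 1. With current status data each subject contributes only an exam time c and an indicator delta = 1 if the event occurred by c, so the likelihood term is F(c | x) = 1 - S(c | x) when delta = 1 and S(c | x) otherwise. Function names are hypothetical, and Lambda0 is passed in as a known function rather than estimated.

```python
import math

def gorh_survival(cum_baseline, eta, p):
    """S(t | x) under the assumed GORH form, given Lambda0(t) and the linear predictor eta = x'beta."""
    h = cum_baseline * math.exp(eta)
    if p == 0:
        return math.exp(-h)  # proportional hazards limit
    return (1.0 + p * h) ** (-1.0 / p)

def current_status_loglik(data, baseline_mean, beta, p):
    """Log-likelihood for current status records (c, delta, x) with delta = 1 if T <= c."""
    ll = 0.0
    for c, delta, x in data:
        eta = sum(xi * bi for xi, bi in zip(x, beta))
        s = gorh_survival(baseline_mean(c), eta, p)
        ll += math.log(1.0 - s) if delta else math.log(s)
    return ll

print(gorh_survival(1.0, 0.0, 1.0))  # 0.5, the proportional odds case 1 / (1 + 1)
# Two hypothetical subjects examined once each, with Lambda0(t) = t and a single covariate.
print(current_status_loglik([(1.0, 1, [0.0]), (2.0, 0, [1.0])], lambda t: t, [0.5], p=1.0))
```

Holding p fixed and maximizing such a likelihood over beta and Lambda0 corresponds to the fixed-p setting; the sketch performs neither that maximization nor the EM algorithm.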

In Chapter 4, we study the joint modeling of panel count data and interval-censored failure time data, motivated by a real-life data set on sexually transmitted infections (STIs). The failure time of interest is the time from enrollment to a new STI, which is interval-censored. The other response variable is the number of unprotected sexual acts over time, which has a panel count data structure. The proposed joint analysis, based on an EM algorithm, is more efficient than separate univariate analyses of the panel count data and the interval-censored data. The proposed joint model and approach are applied to the STI data.


© 2016, Bin Yao