Date of Award


Document Type

Open Access Dissertation




College of Arts and Sciences

First Advisor

Yanyuan Ma


In Chapter 1, we predict disease risk using transformation models in the presence of missing subgroup identifiers. When a discrete covariate defining subgroup membership is missing for some of the subjects in a study, the outcome follows a mixture of the subgroup-specific distributions. Taking into account the uncertain distribution of the group membership and the covariates, we model the relation between the disease onset time and the covariates through a transformation model in each sub-population, and we develop a nonparametric maximum likelihood estimation procedure, implemented through an EM algorithm, along with its inference procedure. We further propose methods to identify the covariates that have different effects or common effects in distinct populations, which enables parsimonious modeling and a better understanding of the differences across populations. The methods are illustrated through extensive simulation studies and a real data example.
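As a rough illustration of the mixture structure described for Chapter 1, the display below writes out the outcome density for a subject whose subgroup label is missing, together with the posterior membership weights that an EM-type algorithm would compute in its E-step. All notation here is assumed for illustration and is not taken from the dissertation: $G \in \{1,\dots,K\}$ is the subgroup label, $p_k(x) = P(G = k \mid X = x)$ is the membership probability, and $f_k(t \mid x)$ is the subgroup-specific density of the onset time $T$. If onset times are censored, $f_k$ would be replaced by the corresponding likelihood contribution.

```latex
% Outcome density when the subgroup label G is missing (illustrative notation)
f(t \mid x) \;=\; \sum_{k=1}^{K} p_k(x)\, f_k(t \mid x)

% Posterior membership weight for subject i with missing label,
% evaluated at the current parameter values
w_{ik} \;=\; \frac{p_k(x_i)\, f_k(t_i \mid x_i)}
                  {\sum_{l=1}^{K} p_l(x_i)\, f_l(t_i \mid x_i)}
```

For subjects whose label is observed, the weight is simply the indicator of the observed subgroup.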

In Chapter 2, we study a generalized partially linear single index model with measurement error, instruments, and a binary response. Instrumental variables are important elements in many errors-in-variables problems. We use the relation between the unobservable variables and the instruments to devise consistent estimators for partially linear generalized single index models with binary response. We establish the consistency and asymptotic normality of the estimator, and illustrate the numerical performance of the method through simulation studies and a data example. Despite the connection to Xu et al. (2015) in its general layout, the mathematical derivations are much more challenging in the context studied here.

In Chapter 3, we investigate the errors-in-covariates problem in a generalized partially linear model. Unlike the usual literature (Ma & Carroll 2006), we consider the case where the measurement error occurs in the covariate that enters the model nonparametrically, while the precisely observed covariates enter the model parametrically. To avoid deconvolution-type operations, which can suffer from very slow convergence rates, we use a B-spline representation to approximate the nonparametric function and convert the problem into a parametric form for operational purposes. We then use a parametric working model to replace the distribution of the unobservable variable, and devise an estimating equation to estimate both the model parameters and the functional dependence of the response on the latent variable. The estimation procedure is devised under the functional model framework, without assuming any distributional structure for the latent variable. We further derive the large sample properties of our estimator. Numerical simulation studies are carried out to evaluate the finite sample performance, and the practical performance of the method is illustrated through a data example.
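The B-spline step mentioned for Chapter 3 can be sketched in a few lines. The example below is only a minimal illustration of how an unknown smooth function is replaced by a finite coefficient vector; it ignores measurement error entirely and is not the estimating-equation procedure of the dissertation. The cubic degree, the equally spaced knots, the test function sin(2πx), and the helper name `bspline_basis` are all assumptions chosen for the example.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    Hypothetical helper for illustration; returns an array of shape
    (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicators of the half-open knot intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time, skipping zero-width denominators.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - t[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (t[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

degree = 3                                  # assumed: cubic splines
interior = np.linspace(0.0, 1.0, 9)[1:-1]   # assumed: 7 equally spaced interior knots
knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])

x = np.linspace(0.0, 1.0, 200)
m_true = np.sin(2.0 * np.pi * x)            # assumed smooth function to approximate

B = bspline_basis(x, knots, degree)         # design matrix, one column per basis function
coef, *_ = np.linalg.lstsq(B, m_true, rcond=None)
max_err = np.max(np.abs(B @ coef - m_true)) # worst-case approximation error on the grid
```

Once the function is written as `B @ coef`, estimating it amounts to estimating the finite vector `coef`, which is what makes the problem parametric for operational purposes.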