Yizheng Wei

Date of Award

Fall 2020

Document Type

Open Access Dissertation



First Advisor

Yanyuan Ma


Chapter 1 of this dissertation proposes a consistent and locally efficient estimator of the model parameters in a logistic mixed effect model with random slopes. Our approach relaxes two typical assumptions: that the random effects are normally distributed, and that the covariates and random effects are independent of each other. These assumptions are particularly difficult to guarantee in health studies, where resources for designing experiments and gathering data in long-term studies are often limited, and where new findings from other fields may emerge suggesting that the assumptions are violated. An estimator that is robust to such violations is therefore crucial, because it allows better use of existing data that were gathered using valuable resources. Our method generalizes the framework of Garcia & Ma (2016), which also deals with a logistic mixed effect model but considers only a random intercept. A simulation study reveals that our proposed estimator remains consistent even when the independence and normality assumptions are violated. This contrasts with the traditional maximum likelihood estimator, which is likely to be inconsistent when there is dependence between the covariates and random effects. Application of this work to a Huntington disease study reveals that disease diagnosis can be further improved using assessments of cognitive performance.

When a model of main research interest shares some of its parameters with several other models, it is beneficial to incorporate the information contained in these other models to improve estimation and prediction for the main model. Various methods can make use of the additional models and the additional observations related to them. In Chapter 2, we propose a strategy for doing so that is optimal in terms of prediction. We develop a fusion learning method that combines model averaging with meta-analysis, and we obtain the optimal weights. We also establish theory to support the method and show its desirable properties both when the main model is correct and when it is incorrect. Numerical experiments, including simulation studies and data analysis, demonstrate the superior performance of our method.

In Chapter 3, we propose a new pseudo-likelihood approach to fitting logistic regression models with two-phase data, which have an incomplete data structure. Existing methods include inverse probability weighted (IPW) methods, pseudo-likelihood (PL) methods, and maximum likelihood (ML) methods. Maximum likelihood estimators (MLEs) require either that the complete phase I covariates be discrete with a small number of levels or of low dimension, or that the continuous phase I covariates can be stratified properly. Therefore, they may not be able to make full use of the complete covariate information. In comparison, our method does not require stratifying the continuous phase I covariates and is more resilient to misclassified phase I covariates. It can also handle a relatively large number of phase I covariates when the sample size is relatively small, a setting in which MLEs may not have enough samples in certain strata to obtain a valid estimate.