Date of Award

2023

Document Type

Open Access Dissertation

Department

Public Health

First Advisor

Bo Cai

Abstract

Variable selection is often used to identify important variables and improve model fit. Many existing variable selection methods either perform poorly on correlated data because the independence assumption is violated, or provide no information on the correlation patterns. Factor analysis can explore the latent structure by grouping correlated covariates into independent factors, but it lacks the flexibility to accommodate nonparametric settings. In Chapter 2, a Bayesian nonparametric model is proposed within the factor analysis framework. The model relaxes the normality assumption of linear regression and allows correlated covariates to be grouped and latent factors to be selected simultaneously.
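As a point of reference only, a standard Gaussian latent factor regression can be written as below. The notation (p covariates x_i, K latent factors f_i, a p × K loading matrix Λ, a diagonal noise covariance Ψ, and factor regression coefficients θ) is assumed for illustration and is not taken from the dissertation. The Chapter 2 model relaxes the normal error assumption on e_i. Covariates that load on the same factor are correlated through ΛΛᵀ, and setting θ_k = 0 drops factor k from the regression, which is one way to view factor selection.

```latex
\begin{aligned}
\mathbf{x}_i &= \boldsymbol{\Lambda}\mathbf{f}_i + \boldsymbol{\epsilon}_i,
  \qquad \mathbf{f}_i \sim N_K(\mathbf{0}, \mathbf{I}_K),
  \qquad \boldsymbol{\epsilon}_i \sim N_p(\mathbf{0}, \boldsymbol{\Psi}), \\
y_i &= \boldsymbol{\theta}^\top \mathbf{f}_i + e_i,
  \qquad e_i \sim N(0, \sigma^2), \\
\operatorname{Cov}(\mathbf{x}_i) &= \boldsymbol{\Lambda}\boldsymbol{\Lambda}^\top + \boldsymbol{\Psi}.
\end{aligned}
```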

For longitudinal and clustered data, random effects account for within-subject correlation and heterogeneity across subjects, so selecting appropriate random effects is important. In Chapter 3, a stochastic search variable selection (SSVS) method is applied to the components of a decomposition of the random effects covariance matrix under the generalized linear mixed effects model (GLMM). A Gibbs sampling algorithm based on the penalized quasi-likelihood (PQL) approximation is proposed for GLMMs. This iterative approach provides a simple, unified sampling algorithm across GLMMs.
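The sketch below illustrates two generic ingredients mentioned above under stated assumptions; it is not the dissertation's algorithm. First, for a Bernoulli GLMM with a logit link (assumed here), one PQL step linearizes the model around the current linear predictor η and yields a working response z and weights w, so the GLMM can be handled as a weighted linear mixed model inside a Gibbs sweep. Second, it assumes a modified Cholesky decomposition Σ = DΓΓᵀD, with D = diag(λ) and Γ unit lower triangular, as the decomposed covariance; setting λ_k = 0 removes random effect k, which a spike-and-slab SSVS prior on each λ_k can exploit. The function names `pql_working_response` and `random_effects_cov` are hypothetical.

```python
import numpy as np


def pql_working_response(y, eta):
    """PQL working response and weights, assuming a Bernoulli GLMM with logit link.

    Linearizing around the current linear predictor eta = X @ beta + Z @ b gives
    z = eta + (y - mu) / (d mu / d eta) with weights w = mu * (1 - mu).
    """
    mu = 1.0 / (1.0 + np.exp(-eta))  # inverse logit
    w = mu * (1.0 - mu)              # d mu / d eta, also the IRLS weight for logit
    z = eta + (y - mu) / w           # working (pseudo) response
    return z, w


def random_effects_cov(lam, gamma):
    """Sigma = D Gamma Gamma^T D with D = diag(lam) and Gamma unit lower triangular.

    Assumed decomposition for illustration: only the strictly lower part of
    `gamma` is used. A zero lam[k] zeroes row and column k of Sigma.
    """
    lam = np.asarray(lam, dtype=float)
    D = np.diag(lam)
    G = np.eye(lam.size) + np.tril(np.asarray(gamma, dtype=float), k=-1)
    return D @ G @ G.T @ D


if __name__ == "__main__":
    # Hypothetical current state: two binary outcomes with eta = 0.
    z, w = pql_working_response(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
    print(z, w)  # z = [2, -2], w = [0.25, 0.25]

    # Hypothetical draw in which the second random effect is switched off.
    gamma = np.array([[0.0, 0.0, 0.0],
                      [0.3, 0.0, 0.0],
                      [-0.5, 0.4, 0.0]])
    Sigma = random_effects_cov([1.2, 0.0, 0.7], gamma)
    print(np.round(Sigma, 3))
```

With λ₂ = 0, the second row and column of Σ are zero, so the second random effect drops out of the model. In a full sampler, z and w would be recomputed from the current draws of the fixed and random effects at each iteration.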

The performance of the proposed methods is evaluated through comprehensive simulation studies and real datasets. For the factor model, the proposed method is compared with four commonly used methods: LASSO regression, group LASSO regression, factor analysis, and the sparse Bayesian infinite factor model. For the GLMM, the proposed method is compared with a Bayes factor-based model selection method.

Rights

© 2023, Yanan Zhang

Available for download on Sunday, August 31, 2025

Included in

Biostatistics Commons
