Document Type

Open Access Dissertation




Norman J. Arnold School of Public Health

First Advisor

James W. Hardin

Second Advisor

Bo Cai


Count data are common in practice, especially in self-reported behavioral studies. One problem with self-reported counts is inaccuracy. In the first part of this dissertation, we address one specific type of inaccuracy in bivariate count data: heaping. Copula functions are used to formulate the bivariate distribution. Using copula functions to address data inaccuracy is still a new area, which we explore in this dissertation.

We also discuss methods for variable selection when the explanatory variables are highly correlated. In particular, our method is based on the sparse Bayesian infinite factor model (Bhattacharya and Dunson, 2011). Classic Bayesian variable selection priors are integrated into the factor analysis method. The proposed method can accommodate both binary and continuous variables.

In the last part of this dissertation, we extend Bayesian factor models to the nonparametric setting. Because the normality assumption can be too strict for some data, and outliers can degrade model performance, our proposed method relaxes the normality assumption while simultaneously grouping the correlated explanatory variables. It is one of the first explorations of nonparametric assumptions in a Bayesian factor analysis setting.

Included in

Biostatistics Commons