Date of Award


Document Type

Campus Access Thesis


Epidemiology and Biostatistics



First Advisor

Hongmei Zhang


A variable selection approach was investigated through simulations and applied to DNA methylation data generated from the Isle of Wight birth cohort study. The approach combines clustering with a penalty function to select informative variables. We evaluated the method through simulations covering a variety of scenarios. For clustering, we assessed sensitivity and specificity; for variable selection, we assessed the percentages of correct selection, overselection, underselection, and partial selection. We also tested the method's robustness to moderate correlation between variables. In the real-data application, the method identified 33 informative DNA methylation CpG sites out of 38 candidate sites. The method was compared with three association-based variable selection methods: linear regression, LASSO, and adaptive LASSO. None of these association-based methods identified any methylation sites. The proposed variable selection method is suitable for data preprocessing and will benefit subsequent association studies in terms of testing power.