Date of Award
1-1-2012
Document Type
Campus Access Thesis
Department
Epidemiology and Biostatistics
Sub-Department
Biostatistics
First Advisor
Hongmei Zhang
Abstract
A variable selection approach was investigated through simulations and applied to DNA methylation data generated from the Isle of Wight birth cohort study. The approach uses clustering with a penalty function to select informative variables. We evaluated the method through simulations under a variety of scenarios. For clustering, we evaluated sensitivity and specificity; for variable selection, we evaluated the percentages of correct selection, overselection, underselection, and partial selection. The method was also tested for robustness under moderate correlation between variables. In the real-data application, the method identified 33 informative DNA methylation CpG sites out of 38 candidate sites. The method was compared with association-based variable selection methods (linear regression, LASSO, and adaptive LASSO), none of which identified any methylation sites. This variable selection method is suitable for data preprocessing and will benefit subsequent association studies in terms of testing power.
Rights
© 2012, Genevieve Ray
Recommended Citation
Ray, G. (2012). Variable Selection for DNA Methylation Data Using Model-Based Clustering. (Master's thesis). Retrieved from https://scholarcommons.sc.edu/etd/554