Date of Award
Campus Access Thesis
Epidemiology and Biostatistics
A variable selection approach was investigated through simulations and applied to DNA methylation data from the Isle of Wight birth cohort study. The approach uses clustering with a penalty function to select informative variables. We evaluated the method in simulations covering a variety of scenarios. For clustering, we evaluated sensitivity and specificity; for variable selection, we evaluated the percentages of correct selection, overselection, underselection, and partial selection. We also tested the robustness of the method to moderate correlation between variables. In the real-data application, the method identified 33 informative DNA methylation CpG sites out of 38 candidate sites. For comparison, three association-based variable selection methods (linear regression, LASSO, and adaptive LASSO) were applied to the same data, and none of them identified any methylation sites. This variable selection method is suitable for data preprocessing and will benefit subsequent association studies in terms of testing power.
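The four variable selection outcomes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the categories, so the set-based definitions below are assumptions, not the thesis's own: "correct" means the selected set equals the informative set, "overselection" means it strictly contains it, "underselection" means it is strictly contained in it, and "partial" covers everything else. The function names `classify_selection` and `selection_percentages` are hypothetical.

```python
from collections import Counter

CATEGORIES = ("correct", "overselection", "underselection", "partial")


def classify_selection(selected, informative):
    """Classify one simulation run by comparing selected vs. truly informative variables.

    The category definitions are assumed for illustration; the source does not state them.
    """
    selected, informative = set(selected), set(informative)
    if selected == informative:
        return "correct"
    if selected > informative:   # all informative variables plus extra noise variables
        return "overselection"
    if selected < informative:   # only informative variables, but some are missing
        return "underselection"
    return "partial"             # misses some informative and/or includes noise variables


def selection_percentages(runs, informative):
    """Return the percentage of runs falling into each category."""
    counts = Counter(classify_selection(s, informative) for s in runs)
    total = len(runs)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


if __name__ == "__main__":
    truth = {1, 2}                              # hypothetical informative variable indices
    runs = [{1, 2}, {1, 2, 3}, {1}, {1, 3}]     # hypothetical selections from four runs
    print(selection_percentages(runs, truth))
```

Under these assumed definitions, each of the four hypothetical runs falls into a different category, so each percentage is 25.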
Ray, G. (2012). Variable Selection for DNA Methylation Data Using Model-Based Clustering. (Master's thesis). Retrieved from https://scholarcommons.sc.edu/etd/554