Date of Award

2018

Document Type

Open Access Dissertation

Department

Statistics

Sub-Department

College of Arts and Sciences

First Advisor

David Hitchcock

Second Advisor

Paramita Chakraborty

Abstract

Supervised and unsupervised classification are common topics in machine learning, in both scientific and industrial settings, and usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory is closely connected to classical classification theory, and it must be applied carefully to achieve good performance in various contexts. This study uses FDR to develop novel supervised classifiers for functional data and novel unsupervised classification approaches for high-dimensional genomic data. The first work develops a novel classifier for functional data by casting the classification problem as a multiple testing task that uses statistical depth functions. The other two works deal with p-values, or tail areas, by applying FDR to large-scale testing problems. The second work proposes a novel algorithm that yields reproducible differential expression analysis for microarray and RNA-Seq data. The algorithm combines cross-validation-type subsampling with the false discovery rate: the p-values obtained from the training data are used to fit a mixture of baseline and signal distributions with the EM algorithm, and the fitted mixture is in turn used to screen the p-values obtained from the testing data for significance. The third work proposes a novel weighted p-value approach to explore the association between microRNAs (miRNAs) and COPD emphysema severity through the regulation of mRNA expression, while integrating patient phenotype information. The proposed method can be applied to study the causal relationship between miRNAs and any particular disease by exploring the precise role of miRNAs in regulating genes.
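The train-then-screen idea in the second work can be illustrated with a minimal sketch. It is not the dissertation's implementation. The abstract does not name the component distributions, so the sketch assumes a Uniform(0, 1) baseline and a Beta(a, 1) signal with 0 < a < 1. The function names (`fit_uniform_beta_mixture`, `local_fdr`, `screen`) and the local-FDR cutoff of 0.2 are hypothetical choices made only for illustration.

```python
import math


def fit_uniform_beta_mixture(pvals, n_iter=500, tol=1e-9):
    """EM fit of f(p) = pi0 * 1 + (1 - pi0) * a * p**(a - 1).

    Assumed model (not taken from the source): Uniform(0, 1) baseline
    and Beta(a, 1) signal. Returns the estimates (pi0, a).
    """
    # Guard against log(0) for p-values that are exactly zero.
    logs = [math.log(max(p, 1e-300)) for p in pvals]
    pi0, a = 0.5, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value is a signal.
        z = []
        for lp in logs:
            signal = (1.0 - pi0) * a * math.exp((a - 1.0) * lp)
            z.append(signal / (pi0 + signal))
        # M-step: closed-form updates for the weight and shape.
        total = sum(z)
        new_pi0 = 1.0 - total / len(z)
        weighted_logs = sum(zi * lp for zi, lp in zip(z, logs))
        new_a = -total / weighted_logs if weighted_logs < 0 else a
        new_a = min(max(new_a, 1e-3), 1.0)  # keep the shape in (0, 1]
        converged = abs(new_pi0 - pi0) + abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if converged:
            break
    return pi0, a


def local_fdr(p, pi0, a):
    """Posterior probability that a p-value comes from the baseline."""
    p = max(p, 1e-300)
    signal = (1.0 - pi0) * a * p ** (a - 1.0)
    return pi0 / (pi0 + signal)


def screen(test_pvals, pi0, a, cutoff=0.2):
    """Indices of test p-values whose local FDR is at or below cutoff."""
    return [i for i, p in enumerate(test_pvals)
            if local_fdr(p, pi0, a) <= cutoff]
```

In use, the mixture would be fitted once on the training-split p-values, for example `pi0, a = fit_uniform_beta_mixture(train_pvals)`, and then `screen(test_pvals, pi0, a)` would flag features in the testing split. Repeating this over many random splits and keeping features that are flagged consistently is one plausible way to read "cross-validation-type subsampling", but that aggregation step is an assumption here as well.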
