Author

Zichen Ma

Date of Award

Spring 2021

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

Yen-Yi Ho

Second Advisor

Joshua M. Tebbs

Abstract

This dissertation focuses on studying the association between random variables or random vectors from the Bayesian perspective. In particular, it consists of two topics: (1) hypothesis testing for the independence among groups of random variables; and (2) modeling the dynamic association between two random variables given covariates.

In Chapter 2, a nonparametric approach for testing independence among groups of continuous random variables is proposed. Gaussian-centered multivariate finite Polya tree priors are used to model the underlying probability distributions. Integrating out the random probability measure yields a tractable empirical Bayes factor, which is used as the test statistic. The Bayes factor is consistent in the sense that it tends to infinity under the alternative hypothesis and to zero under the null. A $p$-value is then obtained through a permutation test based on the observed Bayes factor. Through a series of simulation studies, the performance of the proposed approach is examined and compared to several existing approaches based on the power of the test and the observed Bayes factor. In these comparisons, the proposed approach outperformed the other methods in all bivariate cases we considered and in several higher-dimensional situations. Finally, the proposed method is applied to a real ecological data set, where we test whether the spread of a specific disease among amphibians in an area exhibits spatial-temporal dependency.
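
As a rough illustration of the permutation step only, the sketch below computes a permutation $p$-value for a generic test statistic. It does not implement the Polya tree Bayes factor: the function names `permutation_pvalue` and `abs_corr`, the simulated data, and the use of absolute Pearson correlation as a stand-in statistic are all assumptions made for this example.

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=999, seed=0):
    """Permutation p-value for independence of x and y.

    `statistic` is any function of (x, y) that is large under dependence.
    In the dissertation this role is played by the Bayes factor; here it
    is a placeholder supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks any dependence between x and y.
        y_perm = y[rng.permutation(len(y))]
        if statistic(x, y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement as one permutation.
    return (exceed + 1) / (n_perm + 1)

def abs_corr(x, y):
    """Stand-in statistic (assumption): absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])

# Hypothetical data: one dependent pair and one independent pair.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y_dep = x + 0.5 * rng.normal(size=200)
y_ind = rng.normal(size=200)

p_dep = permutation_pvalue(x, y_dep, abs_corr)
p_ind = permutation_pvalue(x, y_ind, abs_corr)
```

With 999 permutations the smallest attainable $p$-value is 1/1000, so `n_perm` bounds the resolution of the test.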

Chapter 3 proposes three approaches to analyzing the dynamic association among multivariate count data. The first, direct approach utilizes a bivariate negative binomial probability mass function developed in Famoye (2010, Journal of Applied Statistics). The second approach models bivariate count data indirectly using a bivariate Poisson-gamma mixture model. The third approach is a bivariate Gaussian copula model. In all three cases, the marginal means and the correlation are simultaneously regressed onto covariates. Based on the simulation results, the indirect and copula approaches perform better overall than the direct approach in terms of model fit and of identifying covariate-dependent association. The proposed approaches are applied to two RNA-sequencing data sets for studying breast cancer and melanoma (BRCA-US and SKCM-US) from The Cancer Genome Atlas.
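
To make the idea of regressing both the marginal means and the correlation onto covariates concrete, the following sketch simulates bivariate counts from a Gaussian copula with negative binomial margins. It is an illustration under stated assumptions, not the dissertation's model specification: the log link for the means, the tanh link for the correlation, the shared dispersion `size`, and the function name `simulate_copula_counts` are all choices made for this example.

```python
import numpy as np
from scipy.stats import norm, nbinom

def simulate_copula_counts(z, beta1, beta2, gamma, size=5.0, seed=0):
    """Simulate count pairs whose means and correlation depend on z.

    Assumed links (not taken from the source):
      log(mu_j)   = beta_j[0] + beta_j[1] * z
      atanh(rho)  = gamma[0]  + gamma[1]  * z
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    mu1 = np.exp(beta1[0] + beta1[1] * z)
    mu2 = np.exp(beta2[0] + beta2[1] * z)
    rho = np.tanh(gamma[0] + gamma[1] * z)  # keeps rho inside (-1, 1)

    # Correlated standard normals with observation-specific rho.
    e1 = rng.standard_normal(z.shape)
    e2 = rho * e1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(z.shape)

    # Map to uniforms, then through the negative binomial quantile function.
    # scipy's nbinom(n, p) has mean n * (1 - p) / p, so p = size / (size + mu).
    y1 = nbinom.ppf(norm.cdf(e1), size, size / (size + mu1)).astype(int)
    y2 = nbinom.ppf(norm.cdf(e2), size, size / (size + mu2)).astype(int)
    return y1, y2, rho

# Hypothetical binary covariate: latent correlation 0 when z = 0, tanh(1.5) when z = 1.
z = np.repeat([0.0, 1.0], 2000)
y1, y2, rho = simulate_copula_counts(
    z, beta1=(2.0, 0.0), beta2=(2.0, 0.0), gamma=(0.0, 1.5)
)
corr0 = np.corrcoef(y1[z == 0], y2[z == 0])[0, 1]
corr1 = np.corrcoef(y1[z == 1], y2[z == 1])[0, 1]
```

In this simulated example the sample correlation of the counts is near zero in the `z = 0` group and clearly positive in the `z = 1` group, which is the kind of covariate-dependent association the chapter aims to detect.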

With recent advances in technologies to profile multi-omics data at the single-cell level, integrative multi-omics data analysis has become increasingly popular. For example, information such as methylation changes, chromatin accessibility, and gene expression is jointly collected in a single-cell experiment. These different data types often have distinct marginal distributions. Chapter 4 extends the Gaussian copula model of Chapter 3 to this multi-omics setting and proposes a flexible copula-based framework to study the dynamic association across different data types. This approach can incorporate a wide variety of marginal distributions, including the class of zero-inflated distributions. A Gaussian copula is used to jointly model variables with different marginal distributions while accommodating a flexible correlation structure. We present a Markov chain Monte Carlo sampling algorithm to estimate the parameters. The usefulness of the proposed framework is demonstrated through a series of simulation studies. Finally, it is applied to a real data set to investigate the dynamic relationship among single-cell RNA-sequencing, chromatin accessibility, and DNA methylation in different germ layers during mouse gastrulation.
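
The sketch below shows, under assumptions, how a Gaussian copula can join two variables with different marginal distributions, one of them zero-inflated. The specific margins (a normal and a zero-inflated Poisson), the parameter values, and the names `zip_ppf` and `sample_mixed_pair` are hypothetical and chosen only for illustration; the example covers simulation, not the MCMC estimation described above.

```python
import numpy as np
from scipy.stats import norm, poisson

def zip_ppf(u, pi0, lam):
    """Quantile function of a zero-inflated Poisson.

    With probability pi0 the value is a structural zero; otherwise it is
    Poisson(lam). The CDF is F(y) = pi0 + (1 - pi0) * PoissonCDF(y).
    """
    u = np.asarray(u, dtype=float)
    v = np.clip((u - pi0) / (1.0 - pi0), 0.0, 1.0)
    # For u <= pi0 the quantile is 0; otherwise invert the Poisson part.
    return np.where(u <= pi0, 0.0, poisson.ppf(v, lam))

def sample_mixed_pair(n, rho, mean, sd, pi0, lam, seed=0):
    """Draw (x, y): x is normal, y is zero-inflated Poisson, joined by a
    Gaussian copula with latent correlation rho (all values assumed)."""
    rng = np.random.default_rng(seed)
    e1 = rng.standard_normal(n)
    e2 = rho * e1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    x = mean + sd * e1                            # normal margin
    y = zip_ppf(norm.cdf(e2), pi0, lam).astype(int)  # zero-inflated margin
    return x, y

x, y = sample_mixed_pair(n=5000, rho=0.7, mean=0.0, sd=1.0, pi0=0.3, lam=4.0)
zero_fraction = np.mean(y == 0)
expected_zero_fraction = 0.3 + 0.7 * np.exp(-4.0)
```

Because the dependence lives entirely in the latent normals, either margin can be swapped for another distribution by replacing its quantile function, which is what makes the copula construction convenient for mixed data types.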
