Author

Zhen Yang

Date of Award

Summer 2022

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

Yen-Yi Ho

Abstract

This dissertation focuses on studying methods in dependence structure analysis. In particular, it consists of two topics: (1) modeling dynamic correlation in zero-inflated bivariate count data; and (2) gene co-expression latent factor analysis for cell-type clustering.

In Chapter 2, a zero-inflated negative binomial model for analyzing the dynamic correlation in zero-inflated bivariate count data is proposed. Interactions between biological molecules in a cell are tightly coordinated and often highly dynamic. As a result of these varying signaling activities, changes in gene co-expression patterns can often be observed. Advancements in next-generation sequencing technologies bring new statistical challenges for studying these dynamic changes in gene co-expression. In recent years, methods have been developed to examine genomic information from individual cells. Single-cell RNA sequencing (scRNA-seq) data are count-based and often exhibit characteristics such as over-dispersion and zero-inflation. To explore the dynamic dependence structure in scRNA-seq data and other zero-inflated count data, new approaches are needed. We consider over-dispersion and zero-inflation in count outcomes and propose a ZEro-inflated Negative binomial dynamic COrrelation model (ZENCO). In ZENCO, the observed count data are modeled as a mixture of two components: successful amplifications and dropout events. A latent variable is incorporated into ZENCO to model the covariate-dependent correlation structure. We conduct simulation studies to evaluate the performance of our proposed method and to compare it with existing approaches. We also illustrate the implementation of our proposed approach using scRNA-seq data in melanoma.
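
The kind of data described in Chapter 2 can be illustrated with a small simulation. The sketch below is not the ZENCO model itself, whose exact specification is not given in this abstract. It only assumes a generic setup: a pair of latent Gaussian variables whose correlation depends on a covariate through a hypothetical `tanh` link, negative binomial counts drawn as a gamma-Poisson mixture, and independent dropout events that force some counts to zero. The function name `simulate_zi_counts` and all parameter names and default values are illustrative assumptions.

```python
import numpy as np


def simulate_zi_counts(x, base_log_mean=2.0, size=5.0, dropout=0.3,
                       a=0.0, b=1.0, seed=0):
    """Simulate bivariate zero-inflated, over-dispersed counts.

    Illustrative assumptions (not taken from the dissertation):
      * the correlation is rho(x) = tanh(a + b * x);
      * a latent bivariate normal shifts the log-mean of each gene;
      * counts are negative binomial, drawn as a gamma-Poisson mixture;
      * each count is independently replaced by zero with probability
        `dropout` (a dropout event).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]

    # Covariate-dependent correlation, always strictly inside (-1, 1).
    rho = np.tanh(a + b * x)

    # Latent standard normals with per-cell correlation rho.
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    latent = np.column_stack([z1, z2])

    # Per-cell, per-gene mean of the count component.
    mean = np.exp(base_log_mean + 0.5 * latent)

    # Negative binomial via gamma-Poisson: smaller `size` gives more
    # over-dispersion relative to a Poisson with the same mean.
    lam = rng.gamma(shape=size, scale=mean / size)
    counts = rng.poisson(lam)

    # Dropout events: keep a count with probability 1 - dropout.
    keep = rng.random((n, 2)) >= dropout
    return counts * keep, rho


# Usage: 2000 cells with a covariate ranging over [-1, 1].
x = np.linspace(-1.0, 1.0, 2000)
counts, rho = simulate_zi_counts(x)
zero_fraction = (counts == 0).mean()
```

In data simulated this way, the excess zeros come from the dropout component rather than from the count distribution, which is the feature that motivates a mixture model for scRNA-seq counts.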

Chapter 3 proposes a cell-type clustering approach that allows for joint analysis of both the expression structures and the co-expression structures of the data. Due to complex regulatory mechanisms, biological molecules in a cell often participate in complicated interaction processes. Traditional cell-type clustering approaches use average expression levels of the data and are therefore insufficient for understanding the intricate regulatory mechanisms that underlie different cellular conditions. Co-expression structures can often bring insights into complex genetic interactions and can help detect correlation changes between pairs of genes across different modulating conditions. Therefore, co-expression structures can help identify hidden sub-groups in the data and improve the performance of clustering. Our method learns the joint features shared among the expression structures and co-expression structures of the data and identifies the unique variation present in each type of structure to further cluster the cell types. The proposed approach is applied to a breast cancer cell data set.

In Chapter 4, a subject-specific random effects model for zero-inflated count-based data is proposed. Tumor heterogeneity is very common and plays an important role in therapy design. The development of scRNA-seq technologies brings new opportunities, along with challenges, for studying tumor heterogeneity. For scRNA-seq data, one of the main analytical approaches is differential co-expression analysis, which can reveal the intricate underlying gene regulatory mechanisms in tumor cells. In recent years, methods have been developed for modeling the dynamic changes in gene co-expression in scRNA-seq data. However, due to the heterogeneous nature of tumors, new approaches are needed. In this chapter, we propose a subject-specific random effects model for zero-inflated count-based data such as scRNA-seq data. A latent variable is incorporated into the model to quantify the correlation dependency structure. We conduct simulation studies to evaluate the performance of our proposed method and to compare it with existing approaches. We also illustrate the implementation of our proposed approach using scRNA-seq data from a study of immunotherapy resistance in melanoma tumors.

Rights

© 2022, Zhen Yang
