Open Access Dissertation
The recent emergence of single cell sequencing (SCS) technology has provided single-cell DNA or RNA sequencing (scDNA/RNA-seq) data with which to investigate cellular evolutionary relationships. Although many analysis methods have been developed to infer intra-tumor genetic heterogeneity, cluster cellular subclones, detect genetic mutations, and investigate spatially variable (SV) genes, exploring SCS data remains statistically challenging because the data are noisy.
To identify subclones with scDNA-seq data, many existing studies first use an independent statistical model to detect copy number profiles, and then apply classical clustering methods to identify subclones in downstream analyses. However, this two-step clustering strategy can generate spurious results, because copy number aberrations (CNAs) falsely identified in the copy number profiling step carry over into the clustering step. Furthermore, although advances in spatial transcriptomics enable gene expression profiling with molecular resolution while preserving the spatial information of the tissue, it is still challenging to identify spatially variable (SV) genes by modeling transcriptomic data with hundreds of spatial locations.
To address these issues, we developed two methods. First, we developed FLCNA, a subclone clustering method based on a fused lasso model, which clusters subclones and detects CNAs simultaneously from scDNA-seq data. Extensive simulations and a real data application demonstrated the desirable performance of FLCNA in clustering subclones and estimating copy number profiles in scDNA-seq data. Second, we developed SPADE, a spatial pattern and differential expression analysis method, to accurately identify SV genes within or between groups in spatial transcriptomic data. To facilitate the application of these two methods, we developed an R package for each.
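As a rough illustration of why a fused lasso penalty suits copy number data, the following minimal sketch evaluates a generic one-dimensional fused lasso objective in Python. It is not FLCNA's actual model or its R implementation: the function name, the penalty weights `lam1` and `lam2`, and the toy read-depth values are all assumptions made for this example. The point is only that penalizing differences between adjacent genomic bins favors piecewise-constant profiles, which is the shape a copy number profile is expected to have.

```python
import numpy as np

def fused_lasso_objective(y, beta, lam1, lam2):
    """Generic 1-D fused lasso objective (illustrative, not FLCNA's model).

    0.5 * ||y - beta||^2  +  lam1 * sum|beta_j|  +  lam2 * sum|beta_j - beta_{j-1}|
    """
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    fit = 0.5 * np.sum((y - beta) ** 2)            # squared-error data fit
    sparsity = lam1 * np.sum(np.abs(beta))         # shrinks bins toward zero (neutral)
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))  # penalizes jumps between adjacent bins
    return fit + sparsity + fusion

# Hypothetical normalized read-depth signal over 8 bins: neutral, then a gain.
y = np.array([0.1, -0.1, 0.0, 0.1, 0.9, 1.1, 1.0, 0.9])

# Candidate 1: reproduce the noisy signal exactly.
obj_raw = fused_lasso_objective(y, y, lam1=0.1, lam2=1.0)

# Candidate 2: a piecewise-constant profile with a single change point.
piecewise = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
obj_piecewise = fused_lasso_objective(y, piecewise, lam1=0.1, lam2=1.0)

print(f"raw: {obj_raw:.2f}, piecewise: {obj_piecewise:.2f}")
```

With these toy values the piecewise-constant candidate attains the lower objective, because its small loss in data fit is outweighed by having one jump instead of many.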
Our investigation into the analysis of SCS data is expected to help investigators gain deeper insights from single cell studies, ultimately improving the understanding of cellular evolution and the design of treatment approaches for various diseases.
Qin, F. (2023). Statistical Methods for Single Cell Sequencing Data Analysis. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/7433