Date of Award

8-16-2024

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

Xianzheng Huang

Abstract

Directional data arise in many scientific fields, such as meteorology, oceanography, geology, zoology, and biomechanics. Geometrically, directional data lie on a sphere. Although compositional data, such as microbiome data, do not lie on a sphere, they can be mapped from a simplex to a sphere via the component-wise square-root transformation. Other examples of compositional data include the mineral composition of rocks, the composition of chemical mixtures, investment portfolios, and the demographic composition of a population. This dissertation first develops inference procedures for directional data in general and then focuses on compositional data transformed into directional data.
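The square-root map can be checked directly. If x = (x_1, ..., x_d) has nonnegative entries summing to 1, then u = (√x_1, ..., √x_d) satisfies ‖u‖² = x_1 + ... + x_d = 1, so u lies on the unit sphere, in its nonnegative orthant. The NumPy sketch below illustrates this. The function names `sqrt_transform` and `inverse_sqrt_transform` and the row-normalization (closure) step are illustrative assumptions, not part of the dissertation.

```python
import numpy as np


def sqrt_transform(comp):
    """Map compositions to the unit sphere via the component-wise square root.

    Each row of `comp` is a nonnegative vector. Rows are first rescaled to
    sum to 1 (an added closure step, so raw counts are accepted); the square
    root of a vector whose entries sum to 1 has Euclidean norm 1.
    """
    comp = np.asarray(comp, dtype=float)
    if np.any(comp < 0):
        raise ValueError("compositional parts must be nonnegative")
    comp = comp / comp.sum(axis=-1, keepdims=True)
    return np.sqrt(comp)


def inverse_sqrt_transform(u):
    """Map points in the nonnegative orthant of the sphere back to the simplex."""
    return np.asarray(u, dtype=float) ** 2


u = sqrt_transform([2, 0, 6, 8])   # counts -> proportions [0.125, 0, 0.375, 0.5]
print(u)                            # approx. [0.3536, 0.0, 0.6124, 0.7071]
print(np.linalg.norm(u))            # 1.0 up to rounding
```

Zero parts stay at zero after the transformation. Zero-inflated compositions therefore place many points on the boundary of the nonnegative orthant.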

More specifically, we first investigate a family of directional distributions that is suitable and convenient for drawing inference from directional data. Widely used distributions for directional random variables include the von Mises distribution, the Kent distribution, the projected Gaussian, and the wrapped Gaussian (K. V. Mardia 2014). We adopt the elliptically symmetric angular Gaussian (ESAG) distribution, first proposed by Paine et al. (2018). Compared with many popular directional distributions, ESAG has three main virtues. First, it is flexible with respect to isotropy and anisotropy, allowing it to model directional data with different amounts of variation in different directions. Second, it has a closed-form probability density function with no hard-to-evaluate normalizing constant, which is advantageous for maximum likelihood estimation. Lastly, it belongs to the broader family of angular Gaussian distributions, so generating data from ESAG is almost as straightforward as generating data from a multivariate Gaussian. This last feature makes the parametric bootstrap fast and easy to implement.
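The last two virtues can be illustrated for d = 3. The sketch below assumes the ESAG definition of Paine et al. (2018): ESAG is the distribution of Y/‖Y‖ with Y ~ N_3(μ, V), subject to Vμ = μ and det V = 1. Under these constraints, the general angular Gaussian density reduces to f(x) = (2π)^(-1) (xᵀV⁻¹x)^(-3/2) exp{(α² − μᵀμ)/2} M₂(α), where α = xᵀμ / (xᵀV⁻¹x)^(1/2) and M₂(α) = (1 + α²)Φ(α) + αφ(α). This involves only the standard normal CDF Φ and density φ. Sampling requires only a multivariate normal draw followed by normalization. The helper `esag_V` builds V from μ and a single eigenvalue ratio ρ. It is a hypothetical construction for illustration and is neither Paine et al.'s parameterization nor the reparameterization proposed in this dissertation.

```python
import math

import numpy as np


def esag_V(mu, rho, seed=0):
    """Hypothetical helper: a 3x3 matrix V with V @ mu = mu and det(V) = 1.

    mu/|mu| is an eigenvector with eigenvalue 1; the two orthogonal
    eigenvectors get eigenvalues rho and 1/rho (anisotropy when rho != 1).
    """
    mu = np.asarray(mu, dtype=float)
    rng = np.random.default_rng(seed)
    # QR of [mu, random, random] completes mu/|mu| to an orthonormal basis.
    Q, _ = np.linalg.qr(np.column_stack([mu, rng.standard_normal((3, 2))]))
    V = Q @ np.diag([1.0, rho, 1.0 / rho]) @ Q.T
    return 0.5 * (V + V.T)  # symmetrize away rounding error


_erf = np.vectorize(math.erf)


def esag_pdf(x, mu, V):
    """ESAG density on the unit sphere S^2 (w.r.t. surface area).

    Rows of x are unit vectors; assumes V @ mu = mu and det(V) = 1, so that
    V^{-1} mu = mu and the |V|^{-1/2} factor equals 1.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mu = np.asarray(mu, dtype=float)
    Vinv = np.linalg.inv(V)
    a = np.einsum("ij,jk,ik->i", x, Vinv, x)                  # x^T V^{-1} x per row
    alpha = (x @ mu) / np.sqrt(a)
    Phi = 0.5 * (1.0 + _erf(alpha / math.sqrt(2.0)))          # standard normal CDF
    phi = np.exp(-0.5 * alpha**2) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    M2 = (1.0 + alpha**2) * Phi + alpha * phi
    return np.exp(0.5 * (alpha**2 - mu @ mu)) * M2 / (2.0 * math.pi * a**1.5)


def esag_sample(mu, V, n, rng):
    """Draw n unit vectors: simulate Y ~ N_3(mu, V), then project to Y / |Y|."""
    Y = rng.multivariate_normal(mu, V, size=n)
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)


mu = np.array([0.5, 0.3, 1.0])
V = esag_V(mu, rho=2.0)
X = esag_sample(mu, V, n=5, rng=np.random.default_rng(1))
print(esag_pdf(X, mu, V))
```

With ρ = 1, V reduces to the identity and the model is an isotropic angular Gaussian. With ρ ≠ 1, the spread around μ/‖μ‖ differs across directions. Because `esag_sample` is only a Gaussian draw plus a normalization, each parametric-bootstrap replicate costs little more than simulating multivariate normal data.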

To prepare for developing inference methods for data modeled by ESAG, we propose a novel reparameterization of ESAG that accommodates directional data of arbitrary dimension and whose parameters range over the entire real space. Under this reparameterization, we develop a complete package of inferential methods for point estimation, interval estimation, and hypothesis testing of model parameters, as well as prediction and model diagnostics. We apply this methodology to analyze compositional data from a real-life application in hydrochemistry. The methodology is then extended to regression settings, allowing one to analyze directional data from a heterogeneous population through covariate-dependent model parameters. We apply the regression methodology to explore the impact of age and body mass index on gut microbiota composition.

Despite the flexibility and convenience that the proposed ESAG-based inference procedures offer in a wide range of applications, microbiome data, a particularly interesting type of compositional data, have features that make modeling and drawing inference directly via ESAG inadequate. These features include the high dimensionality of the compositional vectors encountered in many biomedical studies and zero inflation, that is, an excess of zero measurements. This motivates our exploration of mixture models that use ESAG distributions as building blocks, both to provide an even more flexible class of regression models for compositional data and to develop scalable inference methods that address these challenges.

Rights

© 2024, Zehao Yu
