Date of Award

Summer 2025

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

David Hitchcock

Abstract

The analysis of functional data is an increasingly relevant part of statistics, and cluster analysis, an exploratory data-analytic method, plays an important role in many fields. Over the years, researchers have developed numerous clustering approaches in pursuit of more accurate and efficient clustering. In Chapter 1, we aim to improve clustering accuracy for functional data. To achieve this goal, we use a predictive likelihood function as the objective function whose optimization determines the most appropriate clusters. To optimize this objective over the space of clustering partitions, we produce a Markov chain of partitions: at each step, a simulated annealing algorithm generates new candidate partitions, for which the predictive likelihood is evaluated, and a schedule for adjusting the temperature parameter enhances the efficiency of the chain. Compared with the K-means algorithm and the funFEM algorithm from the R package funFEM, our method achieves higher performance, as measured by the Rand index, on a variety of simulated data sets. Our approach is also applied to two real data sets, a vertical density profile data set and a yeast gene data set, and achieves good clustering results.

In Chapter 2, we explore the application of the James-Sugar method, which combines model-based clustering with basis expansions to manage the sparsity and irregularity of the observation points, to the clustering of sparse functional data. We investigate the optimal tuning parameters needed to maximize clustering accuracy, providing a deeper understanding of how model settings affect clustering results in functional data analysis.

Chapter 3 presents a robust framework for clustering sparse functional data by integrating multiple imputation using smoothing splines with a predictive likelihood-based clustering method. To address missing data, we generate multiple imputed data sets by perturbing the smoothing parameter in a smoothing-spline nonparametric regression; varying this parameter across imputations captures the uncertainty in the imputation process. Each completed data set is clustered using a likelihood-based approach that models curves through basis expansions. To ensure consistent labeling across imputations, we employ the Hungarian algorithm and finalize cluster assignments via majority voting. Simulation studies demonstrate the method's strong performance across varying levels of sparsity and noise. We further apply the method to a real-world wages data set with irregular time points, successfully identifying stable and interpretable clusters. Our approach offers a flexible and statistically grounded solution for functional data clustering in the presence of missingness and sparsity, with practical relevance in economics, biostatistics, and the social sciences.
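
As a rough illustration of the Chapter 1 search strategy, the R sketch below runs simulated annealing over clustering partitions with a geometric cooling schedule. It is only a sketch under stated assumptions: the dissertation's predictive likelihood is replaced by a simple negative within-cluster sum-of-squares stand-in, and all object names are hypothetical.

    ## Simulated annealing over clustering partitions (illustrative only).
    ## 'score' is a stand-in objective, not the dissertation's predictive likelihood.
    set.seed(1)
    X <- matrix(rnorm(50 * 10), nrow = 50)   # 50 toy "curves" observed at 10 points
    K <- 3

    score <- function(labels, X) {
      -sum(sapply(unique(labels), function(k) {
        Xk <- X[labels == k, , drop = FALSE]
        sum(scale(Xk, scale = FALSE)^2)      # within-cluster sum of squares
      }))
    }

    labels <- sample(1:K, nrow(X), replace = TRUE)   # initial partition
    cur  <- score(labels, X)
    temp <- 1                                        # initial temperature
    for (iter in 1:2000) {
      prop <- labels
      i <- sample(nrow(X), 1)
      prop[i] <- sample(setdiff(1:K, labels[i]), 1)  # move one curve to another cluster
      new <- score(prop, X)
      if (log(runif(1)) < (new - cur) / temp) {      # Metropolis acceptance rule
        labels <- prop; cur <- new
      }
      temp <- temp * 0.995                           # geometric cooling schedule
    }
    table(labels)

Improving moves are always accepted, while worsening moves are accepted with a probability that shrinks as the temperature decreases, which is what lets the chain escape poor local optima early on.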
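
For the Chapter 3 imputation step, a minimal sketch of completing one sparse curve on a common grid with smoothing splines might look as follows; the grid, the range of the smoothing parameter spar, and the uniform perturbation scheme are illustrative assumptions rather than the dissertation's exact choices.

    ## Multiple imputation of one sparse curve (illustrative assumptions throughout).
    set.seed(3)
    t_grid <- seq(0, 1, length.out = 30)
    t_obs  <- sort(sample(t_grid, 12))               # irregular, sparse time points
    y_obs  <- sin(2 * pi * t_obs) + rnorm(12, sd = 0.2)

    M <- 5                                           # number of imputations
    imputed <- sapply(1:M, function(m) {
      spar_m <- runif(1, 0.3, 0.8)                   # perturbed smoothing parameter
      fit <- smooth.spline(t_obs, y_obs, spar = spar_m)
      predict(fit, t_grid)$y                         # completed curve on the full grid
    })
    dim(imputed)                                     # 30 grid points x 5 imputations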
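
The label-alignment and voting step of Chapter 3 can likewise be sketched: given cluster labels from M imputed data sets, each labeling is matched to a reference labeling with the Hungarian algorithm and the final assignment is taken by majority vote. The use of clue::solve_LSAP and all variable names here are assumptions made for illustration.

    ## Align labels across imputations, then take a majority vote (sketch only).
    library(clue)

    align_to_reference <- function(ref, lab, K) {
      # Agreement matrix: counts of observations with reference label r and candidate label c
      agree <- table(factor(ref, levels = 1:K), factor(lab, levels = 1:K))
      perm  <- solve_LSAP(agree, maximum = TRUE)     # best one-to-one relabeling (Hungarian)
      match(lab, as.integer(perm))                   # map candidate labels onto reference labels
    }

    set.seed(2)
    K <- 3; n <- 40; M <- 5
    labs <- replicate(M, sample(1:K, n, replace = TRUE))  # stand-in for per-imputation clusterings
    aligned <- labs
    for (m in 2:M) aligned[, m] <- align_to_reference(labs[, 1], labs[, m], K)

    # Majority vote across the M aligned labelings gives the final cluster assignment
    final <- apply(aligned, 1, function(v) as.integer(names(which.max(table(v)))))
    table(final)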

Rights

© 2025, Tong Shan
