Date of Award

2016

Document Type

Open Access Dissertation

Department

Statistics

Sub-Department

College of Arts and Sciences

First Advisor

David B. Hitchcock

Abstract

Clustering of functional data, in which curves of similar shape are classified into groups, is an important exploratory analysis. Phase variation, or time distortion, is often encountered in biological processes such as growth patterns or gene profiles. As a result of time distortion, curves of similar shape may not be aligned. Standard clustering methods for functional data usually ignore phase variation, which may result in low clustering accuracy. However, it is difficult to account for phase variation without knowing the cluster structure.

In this dissertation, we first propose a Bayesian method that simultaneously clusters and registers functional data. We model each warping function with a discrete approximation generated from the family of Dirichlet distributions, which allows great flexibility and computational simplicity. Then, we modify our Bayesian algorithm to obtain a fast registration method that does not require any template curve. We also propose a distance-based clustering method that uses a “derivative sign” to measure the dissimilarity between two curves after potential phase variation is removed. Finally, we derive a modified variational approximation for our Bayesian method for simultaneous registration and clustering, which provides a faster alternative to full Markov chain Monte Carlo (MCMC) sampling.
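
To illustrate the idea of a Dirichlet-based discrete warping function, the following is a minimal sketch and not the dissertation's implementation. It assumes the warping function maps [0, 1] onto itself, that its increments over equal-width intervals follow a symmetric Dirichlet distribution, and that the cumulative sums of those increments give the warped times. The function names and the parameters `n_intervals` and `concentration` are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_warping(n_intervals, concentration, rng):
    """Return (grid, warp), where warp[k] approximates h(grid[k]) on [0, 1].

    Assumption (not from the source): increments over equal-width intervals
    follow a symmetric Dirichlet, so warp is increasing with warp[0] = 0
    and warp[-1] = 1.
    """
    increments = sample_dirichlet([concentration] * n_intervals, rng)
    grid = [k / n_intervals for k in range(n_intervals + 1)]
    warp = [0.0]
    for inc in increments:
        warp.append(warp[-1] + inc)  # cumulative sum keeps warp monotone
    warp[-1] = 1.0  # guard against floating-point drift at the endpoint
    return grid, warp


rng = random.Random(0)
grid, warp = sample_warping(n_intervals=10, concentration=5.0, rng=rng)
```

Under these assumptions, a larger `concentration` pulls the increments toward equal size, so the sampled warping stays close to the identity map, while a smaller value permits stronger time distortion.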

We demonstrate our proposed methods on simulated data as well as the well-known Berkeley growth data, a set of yeast gene profile data, and a data set on the response of human fibroblasts to serum.
