Author

Yawei Liang

Date of Award

Spring 2019

Document Type

Open Access Dissertation

First Advisor

David Hitchcock

Abstract

In the modern world, data have become increasingly complex and often contain different types of features. Two very common types of features are continuous and discrete variables. Clustering mixed-mode data, which include both continuous and discrete variables, can be done in various ways. A continuous variable can take any value between its minimum and maximum. Types of continuous variables include bounded or unbounded normal variables, uniform variables, circular variables, etc. Discrete variables cover the remaining types, such as binary variables, categorical (nominal) variables, Poisson variables, etc. Difficulties in clustering mixed-mode data include handling the association between the different types of variables, determining distance measures, and imposing model assumptions upon variable types. We first propose a latent realization method (LRM) for clustering mixed-mode data. Our method works by generating numerical realizations of the latent variables underlying the categorical variables. In comparing LRM with other clustering methods, we find that the finite mixture model (FMM) is superior to LRM in terms of accuracy. Thus, in the second project, we apply the FMM to multi-culture data. As a motivating example, we test the difference in human responses to the same questions across different cultural backgrounds. In the last project, we first extend the FMM, within the framework of the EM algorithm, to include circular data, a continuous type that is rarely discussed in the mixed-mode area. We then add a Gaussian copula to the FMM to take into account the dependency of variables within each cluster.
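
The idea of generating numerical realizations of a latent variable underlying a categorical variable can be illustrated with a minimal sketch. This is not the dissertation's LRM, whose details the abstract does not give. It assumes a standard normal latent variable, thresholds taken from the empirical category proportions, and an ordering of the categories (natural for binary or ordinal data, arbitrary for nominal data). The function name `latent_realizations` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def latent_realizations(categories, seed=0):
    """Draw one latent standard-normal value per categorical observation.

    Assumption (not from the source): each category corresponds to an
    interval of a standard normal latent variable, with cut points chosen
    so that the interval probabilities match the observed proportions.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    n = len(categories)

    # Map each category to its slice of the unit interval (CDF scale).
    bounds = {}
    cumulative = 0.0
    for level in sorted(set(categories)):
        proportion = categories.count(level) / n
        bounds[level] = (cumulative, cumulative + proportion)
        cumulative += proportion

    draws = []
    for value in categories:
        low, high = bounds[value]
        u = rng.uniform(low, high)
        # Keep u strictly inside (0, 1) so the inverse CDF is defined.
        u = min(max(u, 1e-12), 1 - 1e-12)
        draws.append(std_normal.inv_cdf(u))
    return draws


# Example: a binary variable becomes a continuous column that could be
# passed, together with the genuinely continuous variables, to a
# standard clustering routine.
binary = [0, 0, 1, 1, 1, 0, 1, 1]
latent = latent_realizations(binary, seed=1)
```

Under these assumptions, every realization for category 0 falls below the cut point and every realization for category 1 falls above it, so the category information is preserved while the variable becomes numeric.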
