Date of Award

2018

Document Type

Open Access Dissertation

Department

Statistics

Sub-Department

College of Arts and Sciences

First Advisor

Edsel Peña

Abstract

With the development of computer technology, researchers are able to observe and collect enormous amounts of data for which the independent and identically distributed assumption is violated. For example, in sociology, individuals in an organization interact with each other and thereby change the underlying social structure; in biology, understanding gene-gene interactions helps researchers detect potential diseases; in politics, voters influence one another before an election through private and public speeches and parades, which might ultimately change the election results. It is crucial to study from the data how individuals interact with each other, which would lead to tremendous contributions to society.

Centuries ago, mathematicians started to describe the interaction of objects in mathematical language in the field of graph theory. The concepts of vertices (nodes) and edges are the cornerstones of graph theory. A vertex can be used to describe an individual, and an edge portrays the interaction between a pair of vertices. Taking advantage of the accumulated discoveries in graph theory, statisticians are able to develop stochastic models to make inferences about data that can be represented by network structures.

My main research goal is to develop statistical models to discover the underlying community structure in various types of network data, including a snapshot of a network and a time-varying network. The word "community" is an intermediate concept between a single node and the whole network, and can refer to a partition, a block structure, etc. Additionally, I want my models to be feasible for large data sets, so that gigantic networks, e.g. social networks, can be analyzed using my contributed methodologies. Spectral clustering type methods, which usually require fewer computational resources, are proposed to achieve the research goal.

I first explore methodologies for discovering community structure under an unobserved latent space by shrinking the latent positions of nodes belonging to the same community. Unlike traditional community detection algorithms, the proposed approach takes the information in edge covariates into consideration for better estimation. I apply the proposed algorithm to an attorney friendship network to check the correlation between friendship status and office location.

I am also interested in analyzing dynamic network data, where a series of networks is observed. For example, the friendships among the same group of undergraduate students are different in the fourth year compared with the first year. One way to detect communities in a dynamic network is to treat the network at each time point independently. This is convenient; however, historical information (e.g. the network or community structure at previous time points), which has the potential to improve estimation accuracy, is ignored. I build an algorithm that borrows the historical information and improves the clustering quality with the help of node degrees.
