Date of Award


Document Type

Open Access Dissertation



First Advisor

Diansheng Guo


Exploratory and statistical spatial data analyses are commonly used in a wide range of research fields, such as epidemiology, disease surveillance, and crime analysis. Spatial epidemiology, for example, needs to detect significant spatial clusters of disease incidents to help epidemiologists identify environmental factors and spreading patterns associated with certain diseases. Existing spatial analysis approaches mostly focus on the analysis of spatial lattice data, i.e., observations organized by locations such as counties or census tracts. With the spread of location-aware technologies such as GPS and smartphones, spatial interaction data, e.g., daily human mobility, travel, and migration, have become increasingly available. The goal of this dissertation is to develop new methodologies for the analysis of both spatial lattice data and spatial interaction data, with a focus on statistical and modeling approaches. The contributions of this dissertation include three new methodologies: spatial scan statistics (Chapter 2), flow scan statistics (Chapter 3), and spatial interaction modeling (Chapter 4). The first methodology is a new spatial scan statistic incorporating smoothing and regionalization techniques. Its contribution is three-fold: 1) the new method can detect irregularly shaped spatial clusters more efficiently and effectively than existing methods; 2) the method can alleviate the multiple-testing problem by dramatically reducing the cluster search space with hierarchical regionalization; and 3) the integration of a smoothing strategy addresses the small-area problem and significantly improves the accuracy of cluster detection. The new method is evaluated with a series of benchmark data sets that are widely used in the related literature. The second approach, a new flow scan statistic, is specifically designed for spatial interaction data to detect significant flow clusters.
To the best of my knowledge, it is the first scan statistic approach for spatial interaction data that can extract significant flow clusters from very large origin-destination (OD) data sets such as migration and taxi trips. The flow scan statistic method scans a given OD data set with a flow tube, which is defined by a neighborhood at the origin and a neighborhood at the destination, to detect clusters of significantly higher-than-expected flows among locations. The test statistic is based on the Generalized Likelihood Ratio (GLR) and is specifically designed to work with both area-based and point-based spatial interaction data. The new method is demonstrated and evaluated with case studies of county-to-county migration data in the U.S. and a synthetic point-based OD flow data set. The third method presented in this dissertation is a spatial interaction modeling and analysis framework that consists of (1) a piece-wise spatial interaction model to understand global flow patterns; (2) an extended spatial autocorrelation statistic based on Moran’s I to examine the spatial distribution of model residuals; and (3) a new mapping approach to visualize local flow patterns (spatial clusters of model residuals) that cannot be explained by the configured model and global patterns. The model takes into account the distance, origin and destination sizes, and an accessibility measure for each flow. The model outcomes (i.e., coefficients) reveal interesting global patterns; the subsequent statistical analysis and mapping of model residuals allow one to investigate local deviations from global trends and gain a comprehensive understanding of the complex patterns hidden in spatial interaction data. A case study analyzes migration among Metropolitan Statistical Areas (MSAs) in the United States.
The major contributions of the proposed framework include a method to configure piece-wise spatial interaction models, an extended Local Moran’s I statistic for analyzing flow residuals, and a novel mapping method for visualizing flow residual patterns. The first and second approaches focus on scan statistics: the first improves existing spatial scan statistics by detecting irregularly shaped clusters based on regionalization and smoothing, while the second is a new scan statistic method for analyzing spatial interaction data (i.e., location-to-location flows). The second and third methods both analyze spatial interaction data, with the former detecting significant flow clusters by developing a new statistic and the latter providing an exploratory framework and new approaches for spatial interaction modeling and residual analysis. The methodologies and framework introduced in this dissertation can be extended in the future to analyze spatiotemporal patterns in spatial interaction data. In this dissertation I focus on migration data analysis, but the methodologies can also be used in many other spatial data applications, such as economic activities, trade analysis, animal migration, and disease spread.
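
To make the flow-tube idea in the abstract concrete, the following is a minimal sketch of how a likelihood-ratio score for one candidate tube could be computed. It is not the dissertation's implementation: the abstract does not give the exact form of its GLR statistic, so the sketch assumes the standard Poisson log-likelihood ratio used in spatial scan statistics. The function names `poisson_llr` and `tube_llr` and the toy flow table are hypothetical.

```python
import math

def poisson_llr(observed, expected, total):
    """Poisson log-likelihood ratio for one candidate cluster.

    Assumed form (standard spatial scan statistic), not taken from the
    dissertation. Returns 0 unless the count is higher than expected.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside

# Hypothetical toy OD table: (origin, destination) -> (observed, expected).
# Expected counts are scaled so that they sum to the observed total.
flows = {
    ("A", "X"): (30, 10.0),
    ("A", "Y"): (12, 10.0),
    ("B", "X"): (25, 10.0),
    ("B", "Y"): (8, 15.0),
    ("C", "X"): (5, 20.0),
    ("C", "Y"): (20, 35.0),
}

def tube_llr(flows, origins, destinations):
    """Score a flow tube defined by an origin and a destination neighborhood."""
    total = sum(obs for obs, _ in flows.values())
    in_tube = [
        counts for (src, dst), counts in flows.items()
        if src in origins and dst in destinations
    ]
    observed = sum(obs for obs, _ in in_tube)
    expected = sum(exp for _, exp in in_tube)
    return poisson_llr(observed, expected, total)

# Tube with origin neighborhood {A, B} and destination neighborhood {X}.
score = tube_llr(flows, {"A", "B"}, {"X"})
print(round(score, 2))
```

In a full scan, a score like this would be computed for every candidate tube, and the significance of the maximum would typically be assessed by Monte Carlo simulation under the null hypothesis; that step is omitted here.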

Included in

Geography Commons