Author

Xiangyang Cao

Date of Award

Summer 2020

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

Karl Gregory

Second Advisor

Dewei Wang

Abstract

The increasingly rapid emergence of high-dimensional data, in which the number of variables p may exceed the sample size n, has necessitated the development of new statistical methodologies. The LASSO and its variants have become the most popular estimators for high-dimensional regression models. However, little work has focused on analyzing and summarizing the information contained in the entire LASSO solution path. This dissertation consists of three research projects that propose the Leave-One-Covariate-Out (LOCO) solution path statistic and extend it to regression and graphical models.

In the first chapter, we propose a new measure of variable importance in high-dimensional regression based on the change in the LASSO solution path when one covariate is left out. For low-dimensional linear models, our method can achieve higher power than the t-test. In the high-dimensional setting, our proposed solution-path-based test achieves greater power than some other recently developed high-dimensional inference methods.
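To make "the change in the LASSO solution path when one covariate is left out" concrete, here is a minimal sketch. The abstract does not state the statistic's formula, so this assumes one plausible form: T_j = ∫ ||β̂(λ) − β̂^(−j)(λ)||_1 dλ, where β̂^(−j) is the LASSO fit without covariate j (padded with a zero in slot j). The integral is approximated by a left Riemann sum over a geometric λ grid. The helper names (lasso_cd, lasso_path, lambda_grid, loco_path_stat), the L1 norm, the grid, and the simulated data are all illustrative assumptions, not the dissertation's implementation. The sketch computes only the statistic, not a p-value or test.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta=None, max_sweeps=500, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = [y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # Correlation of x_j with the partial residual that excludes x_j's own fit.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """LASSO solutions along a decreasing lambda grid, using warm starts."""
    path, beta = [], None
    for lam in lambdas:
        beta = lasso_cd(X, y, lam, beta)
        path.append(beta)
    return path


def lambda_grid(X, y, n_lambda=30, ratio=0.01):
    """Geometric grid from lambda_max (all coefficients zero) down to ratio * lambda_max."""
    n, p = len(X), len(X[0])
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    return [lam_max * ratio ** (k / (n_lambda - 1)) for k in range(n_lambda)]


def loco_path_stat(X, y, j, lambdas):
    """Assumed LOCO path statistic: integral over lambda of the L1 distance between
    the full LASSO path and the path refit without covariate j (zero in slot j)."""
    full = lasso_path(X, y, lambdas)
    reduced = lasso_path([row[:j] + row[j + 1:] for row in X], y, lambdas)
    total = 0.0
    for k in range(len(lambdas) - 1):
        padded = reduced[k][:j] + [0.0] + reduced[k][j:]
        dist = sum(abs(a - b) for a, b in zip(full[k], padded))
        total += dist * (lambdas[k] - lambdas[k + 1])  # left Riemann sum
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 6
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Hypothetical simulated data: only covariates 0 and 1 affect the response.
    y = [2.0 * row[0] + 1.0 * row[1] + rng.gauss(0, 1) for row in X]
    lambdas = lambda_grid(X, y)
    for j in range(p):
        print(f"covariate {j}: LOCO path statistic = {loco_path_stat(X, y, j, lambdas):.4f}")
```

Leaving out an active covariate removes its whole coefficient trajectory from the path, so its statistic should be much larger than that of a null covariate, which only enters near the small end of the λ grid. Turning the statistic into a calibrated test would still require a null reference distribution, which this sketch does not attempt.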

In the second and third chapters, we extend the LOCO path statistic developed for linear regression with a continuous response to generalized linear models and graphical models. Our procedure allows for the construction of p-values for testing hypotheses about single regression coefficients and hypotheses involving multiple regression coefficients, as well as for variable screening in graphical models. In the high-dimensional setting, our proposed solution-path-based test achieves greater power than some other recently developed high-dimensional inference and screening methods.

Rights

© 2020, Xiangyang Cao