Date of Award
5-8-2015
Document Type
Open Access Dissertation
Department
Epidemiology and Biostatistics
Sub-Department
Biostatistics
First Advisor
James W. Hardin
Abstract
Generalized linear models and generalized estimating equations are important research tools in public health and medicine, particularly for modeling clinical trials, evaluating preventive measures, and analyzing secondary data. Researchers in these fields need appropriate tools to analyze and model their data correctly. This dissertation focuses on a penalized maximum likelihood estimation method for generalized linear models, measures of association such as the coefficient of determination (R2) and pseudo-R2 for generalized estimating equations, and a modified quasi-likelihood information criterion for generalized estimating equations.
Common problems that arise when estimating generalized linear models include biased estimates, small sample sizes, and complete or quasi-complete separation of data points. To address these problems, the first part of this dissertation introduces a penalized maximum likelihood approach that adds a penalty term directly to the score function before the likelihood is maximized, and implements the method in statistical software.
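As an illustration of how a penalty in the score function can stabilize estimation, the following is a minimal sketch of a Firth (1993)-style bias-reducing penalty for logistic regression. It is an assumption that this resembles the approach developed in the dissertation; the abstract does not specify the penalty, and the function name `firth_logistic` and the toy data are hypothetical.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Logistic regression with a Firth-type penalized score (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        w = p * (1.0 - p)                         # binomial variance weights
        info = X.T @ (w[:, None] * X)             # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Usual score X'(y - p) plus the penalty contribution h * (0.5 - p)
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score                   # Fisher scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data: y = 1 exactly when x > 0, so the
# unpenalized maximum likelihood slope estimate does not exist (it diverges).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta_hat = firth_logistic(X, y)
print(beta_hat)
```

In this sketch the penalty term pulls the fitted probabilities toward 0.5, which keeps the slope estimate finite even though the data are completely separated.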
Generalized estimating equations are an innovative way to model the within-group correlation in longitudinal, clustered, or panel data. Few diagnostic statistics are currently available for these models. In the second part of this dissertation, we propose an R2 and several pseudo-R2 measures that help researchers with variable selection and provide a goodness-of-fit measure for the selected model. These calculations are also made accessible to researchers in statistical software.
Generalized estimating equations are an extension of the generalized linear model specifically designed to address within-group correlation. To model this correlation, the researcher must select a working correlation structure. However, the current quasi-likelihood information criterion for selecting the working correlation structure is not efficient, in that it tends to favor the independence structure, which assumes there is no within-group correlation. In the last part of this dissertation, we propose a modified quasi-likelihood information criterion that outperforms the current criterion by selecting the correct structure in a large majority of cases. The efficiency of the estimates improves when the modified quasi-likelihood information criterion is used.
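For context, the quasi-likelihood information criterion in common use is usually written in the form given by Pan (2001), shown below. It is an assumption that this is the "current" criterion the abstract refers to, and the notation is chosen here for illustration; the form of the modified criterion is not stated in the abstract and is not reproduced.

```latex
\mathrm{QIC}(R) = -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
                  + 2\, \operatorname{tr}\bigl(\hat{\Omega}_I \hat{V}_R\bigr)
```

Here $Q(\hat{\beta}(R); I)$ is the quasi-likelihood computed under the independence working structure and evaluated at the estimates obtained under working structure $R$, $\hat{\Omega}_I$ is the inverse of the model-based covariance estimate under independence, and $\hat{V}_R$ is the robust (sandwich) covariance estimate under $R$. Smaller values indicate a preferred working correlation structure.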
Rights
© 2015, Chelsea Boquet Deroche
Recommended Citation
Deroche, C. B. (2015). Diagnostics and Model Selection for Generalized Linear Models and Generalized Estimating Equations. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/3059