Date of Award

Summer 2025

Document Type

Open Access Thesis

Department

Statistics

First Advisor

Ian Dryden

Abstract

The assumption of independence rarely holds in real-world data. Correlated observations are ubiquitous, especially in sequential contexts where time series models are essential for capturing temporal dependence. This study analyzed six groups of damaged and undamaged DNA sequences, where an "F" in the middle of a sequence indicates damage. One biological aim is to understand how DNA regenerates with the assistance of proteins that recognize damaged regions. Motivated by empirical support for AR(2) modeling, we fit autoregressive models to the first three principal component scores of each DNA group, capturing the dominant structure in the data. We conducted model diagnostics, estimated parameters via both MCMC and MLE, and produced plots to evaluate model fit and assess proximity to nonstationarity. To streamline interpretation, we selected AFA and AGA to illustrate typical trends. Additionally, we investigated the behavior of the Ljung-Box test for residual autocorrelation. Our custom implementation, supported by a simulation study, yielded results consistent with the sarima() function in the astsa package, with both methods relying on the Ljung-Box test to assess model adequacy. Our approach also corrects inaccuracies in R’s default tsdiag() output. Under the null hypothesis, the simulated p-values followed a continuous Uniform distribution, with sample means and variances closely matching theoretical expectations. Power analysis further demonstrated that the Ljung-Box test becomes increasingly sensitive to residual autocorrelation as the omitted parameter ϕ(2) increases, clearly indicating model misspecification and underscoring the importance of including all relevant autoregressive components.
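The Ljung-Box check on AR(2) residuals described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which is not shown on this page; it is a pure-Python stand-in written under stated assumptions. The function names (`autocorr`, `chi2_sf`, `fit_ar2_residuals`, `ljung_box`), the least-squares fit, and the simulated coefficients 0.5 and 0.3 are all hypothetical choices made for illustration. The `fitdf` argument subtracts the number of fitted AR parameters from the chi-square degrees of freedom; whether the absence of that adjustment is the tsdiag() inaccuracy the abstract refers to is an assumption here.

```python
import math
import random


def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    # Start from the closed form for df = 1 or df = 2, then step df up by 2.
    if df % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(x / 2))
    else:
        k, sf = 2, math.exp(-x / 2)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def fit_ar2_residuals(x):
    """Least-squares AR(2) fit on the centered series.

    Returns (phi1, phi2, residuals). Illustrative only: the thesis uses
    MCMC and MLE, not this estimator.
    """
    m = sum(x) / len(x)
    z = [v - m for v in x]
    idx = range(2, len(z))
    # Normal equations for z[t] = phi1 * z[t-1] + phi2 * z[t-2] + e[t]
    a11 = sum(z[t - 1] ** 2 for t in idx)
    a12 = sum(z[t - 1] * z[t - 2] for t in idx)
    a22 = sum(z[t - 2] ** 2 for t in idx)
    b1 = sum(z[t] * z[t - 1] for t in idx)
    b2 = sum(z[t] * z[t - 2] for t in idx)
    det = a11 * a22 - a12 ** 2
    phi1 = (b1 * a22 - a12 * b2) / det
    phi2 = (a11 * b2 - a12 * b1) / det
    resid = [z[t] - phi1 * z[t - 1] - phi2 * z[t - 2] for t in idx]
    return phi1, phi2, resid


def ljung_box(resid, lag, fitdf=0):
    """Return (Q, p-value), using a chi-square with lag - fitdf degrees of freedom."""
    n = len(resid)
    q = n * (n + 2) * sum(
        autocorr(resid, k) ** 2 / (n - k) for k in range(1, lag + 1)
    )
    return q, chi2_sf(q, lag - fitdf)


# Simulate a stationary AR(2) series with made-up coefficients.
random.seed(1)
x = [0.0, 0.0]
for _ in range(500):
    x.append(0.5 * x[-1] + 0.3 * x[-2] + random.gauss(0, 1))

phi1, phi2, resid = fit_ar2_residuals(x)
q, p_corrected = ljung_box(resid, lag=10, fitdf=2)    # df = 10 - 2
_, p_uncorrected = ljung_box(resid, lag=10, fitdf=0)  # df = 10
print(f"Q = {q:.3f}, p (df=8) = {p_corrected:.3f}, p (df=10) = {p_uncorrected:.3f}")
```

For the same Q statistic, the unadjusted degrees of freedom always give the larger p-value, so omitting the adjustment makes a fitted model look more adequate than the reference distribution warrants. Repeating the simulation many times and collecting `p_corrected` would give a rough version of the null-distribution check mentioned in the abstract.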

Rights

© 2025, David William Custer
