Date of Award

8-19-2024

Document Type

Open Access Dissertation

Department

Statistics

First Advisor

Xiaoyan Lin

Abstract

This dissertation focuses on ordinal classification ratings, which are commonly used in medical practice to assess the severity of a disease or condition. For example, a group of radiologists may rate a set of mammograms and assign a BI-RADS (Breast Imaging Reporting and Data System) score to each mammogram. A Bayesian probit hierarchical model is first proposed to analyze this type of data. It links the ordinal ratings to both rater diagnostic skills and patient latent disease severity. Each rater's diagnostic skills are quantified by two parameters: diagnostic bias and diagnostic magnifier. Patient latent disease severity is assumed to follow one of two normal distributions, depending on the true but unknown binary disease status. An MCMC algorithm is developed for model fitting. Besides evaluating rater-specific diagnostic skills, this model specification yields closed-form expressions for the overall and individual rater ROC (Receiver Operating Characteristic) curves and the corresponding AUCs (Areas Under the ROC Curve). By accounting for the complexity and variability in the data through latent variables, our model allows not only for assessing overall and individual rater diagnostic accuracy but also for measuring the variability of diagnostic accuracy across the whole group of raters.
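To illustrate why a normal-mixture assumption on latent severity leads to closed-form ROC curves, the sketch below uses the generic binormal result rather than the dissertation's own formulas, which also involve rater bias and magnifier and are not reproduced in this abstract. The function names and the parameters `mu0`, `sd0`, `mu1`, `sd1` (mean and standard deviation of severity without and with disease) are assumptions made for this example.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def binormal_roc(fpr, mu0, sd0, mu1, sd1):
    """True positive rate at a given false positive rate (0 < fpr < 1).

    Assumes severity ~ N(mu0, sd0^2) without disease and
    N(mu1, sd1^2) with disease; a case is called positive when
    its severity exceeds a threshold c.
    """
    # Threshold c chosen so that P(severity > c | no disease) = fpr
    c = mu0 + sd0 * _STD.inv_cdf(1.0 - fpr)
    # TPR = P(severity > c | disease)
    return 1.0 - _STD.cdf((c - mu1) / sd1)


def binormal_auc(mu0, sd0, mu1, sd1):
    """Closed-form AUC for the binormal model above."""
    return _STD.cdf((mu1 - mu0) / (sd0 ** 2 + sd1 ** 2) ** 0.5)
```

When the two distributions coincide, the ROC curve is the diagonal and the AUC is 0.5; the AUC rises toward 1 as the diseased mean moves away from the non-diseased mean.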

The Bayesian probit hierarchical model is then extended by incorporating rater and/or patient covariate information to investigate important factors that affect diagnostic accuracy. The three extended models build on the original model as follows. The first extension adds a regression layer that regresses the latent disease severity on patient covariates such as age, gender, medical history, and other relevant patient characteristics. The second extension adds a regression layer that regresses both diagnostic bias and diagnostic magnifier on rater covariates such as years of experience, specialty, and other relevant rater characteristics. The third extension adds both regression layers to the original model, allowing for a more comprehensive assessment of diagnostic accuracy that accounts for the complex interactions between patient and rater characteristics. One important feature of these extended models is the ability to derive covariate-related ROC curves, which provide valuable information on the diagnostic accuracy of the raters or the diagnostic tests in different patient subgroups defined by their covariate values.

Missing data often occur in medical records, surveys, and longitudinal studies. Mishandling missing data can result in inaccurate statistical inferences, underscoring the need for appropriate handling. In this dissertation, we investigate the estimation behavior of the primary model under different percentages of missing ratings. The proposed Bayesian MCMC algorithm naturally accommodates missing ratings and imputes them in each MCMC iteration. Additionally, the label switching issue in the mixture distribution of patient latent disease severity is explored, with mitigation strategies including artificial identifiability constraints, informative priors, and relabelling algorithms.
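As a rough illustration of an artificial identifiability constraint, the sketch below post-processes a single MCMC draw from a two-component mixture so that component 0 always has the smaller mean. This ordering constraint is a common choice and is assumed here; the abstract does not state which constraint the dissertation uses. The function name and the arguments `mu`, `sigma`, and `status` (component means, component standard deviations, and per-patient 0/1 labels) are hypothetical.

```python
def relabel_draw(mu, sigma, status):
    """Enforce mu[0] <= mu[1] in one MCMC draw of a two-component mixture.

    mu, sigma: length-2 sequences of component means and standard deviations.
    status: sequence of 0/1 component labels, one per patient.
    Returns new lists; the inputs are not modified.
    """
    if mu[0] <= mu[1]:
        # Labels already satisfy the ordering constraint
        return list(mu), list(sigma), list(status)
    # Labels have switched: swap the component parameters and flip every label
    return [mu[1], mu[0]], [sigma[1], sigma[0]], [1 - d for d in status]
```

Applying this to every stored draw keeps posterior summaries of each component from averaging over swapped labels, although an ordering constraint can work poorly when the two component means are close.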

Rights

© 2024, Yun Yang
