Yuqi Wu

Date of Award

Fall 2022

Document Type

Open Access Dissertation


Health Services and Policy Management

First Advisor

Sudha Xirasagar



Colorectal cancer (CRC), the second leading cause of cancer and cancer deaths in the United States, can largely be prevented by screening the age-appropriate adult population with colonoscopy to remove precancerous lesions. However, despite the encouraging annual, population-wide decline in CRC incidence and mortality in recent decades and the widespread adoption of screening colonoscopy since 2000 (~60% population coverage rate), the Black population has consistently suffered higher CRC incidence and mortality than Whites. The excess currently stands at 20% and 40%, respectively, little changed from the 2000 rates of 26% and 52%.

One proposed explanation is that the Black population may experience polyp formation at younger ages than Whites, and/or more rapid progression to cancer at younger ages. More than 90% of all CRCs are thought to start as benign polyps. If so, earlier initiation of colonoscopy screening is needed among the Black population. However, there is no empirical evidence on age-related racial differences in the prevalence of early-stage polyps, or in the progression of precancerous polyps, to support such a policy. Exploring this issue requires a data source that represents the population-based polyp prevalence and profile among the Black and White populations. Such representativeness could be inferred if the screened sample achieved near-100% polyp clearance at screening, which in turn would be validated if the sample achieved the maximal CRC protection after colonoscopy documented in clinical trials.

This study used a patient and polyp database from a single center in the Midlands region of South Carolina, which may be the nearest possible proxy for the population polyp prevalence and profile. This sample achieved 83% CRC incidence prevention and 89% CRC mortality prevention at 4.8-year follow-up, results that are far superior to those of any other observational, real-world colonoscopy follow-up study. The study’s objective was to identify the specific polyp characteristics, or combinations of characteristics, that are significantly associated with Black vs. White race within each gender group, using regression and machine learning methods.


The study sample contained 29,425 patients (with 48,761 adenomatous polyps) who received screening colonoscopy at a free-standing licensed ambulatory surgery center for endoscopy in South Carolina from September 4, 2001, to July 28, 2016. In the first component, descriptive analyses and regression modeling were conducted on five outcome variables: the presence of any adenoma (y/n), the presence of any advanced adenoma (y/n), the presence of any advanced adenoma or more than 3 non-advanced adenomas (y/n), the presence of a hyperplastic polyp in the right colon (y/n), and the total polyp burden in mm (continuous variable). Analyses were carried out within each gender group to compare Black males with White males, and Black females with White females.
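Group comparisons on yes/no outcomes such as these are commonly summarized as odds ratios with confidence intervals. The following is a minimal illustrative sketch, not the study's analysis: it computes an unadjusted odds ratio with a Wald (Woolf) 95% confidence interval from a 2x2 table, whereas a multivariable logistic regression would additionally adjust for covariates. The function name `odds_ratio_ci` and all counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald (Woolf) confidence interval.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; these are NOT study data.
or_value, ci_low, ci_high = odds_ratio_ci(a=300, b=700, c=350, d=650)
print(f"OR={or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 indicates a statistically significant difference in odds at the chosen level; the continuous outcome (total polyp burden) would need a different model and is not covered by this sketch.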

To extend the analysis, a second component used machine learning (ML) methods to identify the most important polyp features associated with race. Seven supervised classification models were used: Logistic Regression (LR), Naïve Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), XGBoost, and AdaBoost. The best-performing models were selected based on information maximization, that is, the highest information gain performance among these algorithms, and a risk factor ensemble analysis was then performed. SAS v9.4 and appropriate Python packages were used.
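Information gain can be read as the reduction in uncertainty (entropy) about the class label once a feature is known. The sketch below illustrates that general definition for discrete features, using only the Python standard library. It is an assumption about what "information gain" means here and is not the dissertation's code; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy within each feature level.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data for illustration only: a binary class label and two binary features.
labels = [0, 0, 1, 1]
perfect_feature = [0, 0, 1, 1]      # fully determines the label
uninformative_feature = [0, 1, 0, 1]  # carries no information about the label

print(information_gain(perfect_feature, labels))
print(information_gain(uninformative_feature, labels))
```

A feature that perfectly separates the two classes recovers the full label entropy, while a feature that is independent of the label yields a gain of zero.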


Of 28,100 total patients, 46.2% were male, 54.8% Black, and 14.7% aged 40-49 years. The results of multivariable logistic regressions showed that the Black patients were less likely to have an adenoma compared to Whites (adjusted odds ratio, AOR=0.88, 95% CI 0.84-0.93, p

Among the machine learning methods, Logistic Regression and AdaBoost performed best across all three study samples (total, male, and female), with the highest AUROC and accuracy scores. Using the selected models, the most important polyp features that distinguished patients by race were identified. (1) Total polyp burden showed the highest importance in the full cohort and among both males and females; in the traditional regression, this variable was also statistically significant among males and females. (2) Presence of a hyperplastic polyp in the right colon was important in the ML models in the full cohort, and it was also significant in the traditional regression in the full cohort and among males and females. (3) In the ML models, total polyp burden in the right colon consistently ranked second most important in the full cohort and among males and females. (4) Hyperplastic polyp burden in the left colon was important in the ML models in the full cohort and among males and females.
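For readers unfamiliar with the two performance metrics named above, the following minimal sketch computes them from predicted scores using only the Python standard library. AUROC is computed from its pairwise definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as one half. The function names, the 0.5 threshold, and the toy data are assumptions for illustration and are not taken from the study.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, y_true)) / len(y_true)


# Toy labels and predicted scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auroc(y_true, scores))     # 0.75
print(accuracy(y_true, scores))  # 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; for cohorts of tens of thousands of patients, a rank-based implementation from an established library would be the practical choice.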


In conclusion, the study finds that, in general, Black patients do not differ from Whites in polyp profile on almost all measures (likelihood, advanced histology, or hyperplastic polyps in the right colon), all of which are considered to represent higher risk for future CRC. The differences we observed were of marginal effect size. Given the observed absence of racial differences in polyp status at screening, the CRC racial disparities in the general population are most plausibly attributed to default explanations, namely differences in colonoscopy procedure details such as bowel preparation, for which there is evidence in the literature. Poorer bowel preparation among a subset of Black patients (especially Medicaid patients), which is documented, may be driven by differences in health literacy and internet connectivity, which are in turn largely driven by socioeconomic status (SES). Another key factor may be poor benefit coverage under Medicaid, which does not cover continuous intravenous sedation and anesthesia with propofol administered by a nurse anesthetist, an approach that may enable the endoscopist to perform a thorough colonoscopy unimpeded. In contrast, colonoscopy conducted under the unpredictable sedation achieved by a single dose of intravenous sedation administered before the procedure may result in patient discomfort and suboptimal colonoscopy completion, or even premature procedure termination. Black patients disproportionately have Medicaid coverage.

To reiterate, the study showed that the magnitude of racial differences in polyp profile is at best marginal. ML methods largely confirmed the findings of traditional regression and extracted additional information on racial differences in polyp features that are less frequent but have higher cancer development potential. The study findings suggest that equal and high prevention of CRC among the Black and White populations can be attained through consistent polyp clearance at all colonoscopies, which in turn requires addressing the barriers that may be impeding polyp clearance among a subset of the Black population.

