Date of Award

Spring 2020

Director of Thesis

Joshua Tebbs

Second Reader

Khalid Ballouli


The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from the player’s recruiting profile. The response variable is binary: drafted (1) or undrafted (0). The independent variables were scraped from recruiting websites and include height, weight, position, hometown, recruiting grade, and socioeconomic factors based on the player’s high school. Two recruiting services, 247Sports and ESPN, were used and compared in this study. Because the dependent variable is binary, logistic regression and decision trees were chosen to analyze and model the data. All analysis was conducted in R using RStudio. Once the data were cleaned, they were separated into two sets: one including all public-school players and another including public-school players from the south region. Logistic models were selected based on AIC, BIC, ROC curves, and misclassification error. The decision trees were pruned to reduce overfitting and increase the power of the test. Ultimately, the best model for both sets was a logistic regression fit to the 247Sports data.
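The model-selection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the thesis analysis was done in RStudio, while this example uses only the Python standard library, and the data are synthetic. The feature names (`grade`, `noise`), the coefficients used to simulate draft outcomes, and all function names are assumptions made for illustration. The sketch fits two candidate logistic models by gradient ascent and compares them using AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L, and misclassification error at a 0.5 threshold.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(beta, row):
    # beta[0] is the intercept; the remaining entries pair with the features.
    xi = [1.0] + list(row)
    return sigmoid(sum(b * v for b, v in zip(beta, xi)))


def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood.

    X is a list of feature rows (without an intercept column),
    y is a list of 0/1 labels. Returns [intercept, coef_1, ...].
    """
    n, k = len(X), len(X[0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for row, label in zip(X, y):
            xi = [1.0] + list(row)
            resid = label - predict_proba(beta, row)
            for j in range(k):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def log_likelihood(beta, X, y):
    eps = 1e-12  # guard against log(0)
    ll = 0.0
    for row, label in zip(X, y):
        p = min(max(predict_proba(beta, row), eps), 1.0 - eps)
        ll += label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return ll


def aic(ll, k):
    return 2 * k - 2 * ll


def bic(ll, k, n):
    return k * math.log(n) - 2 * ll


def misclassification_error(beta, X, y, threshold=0.5):
    wrong = sum(
        (predict_proba(beta, row) >= threshold) != bool(label)
        for row, label in zip(X, y)
    )
    return wrong / len(y)


# Synthetic stand-in for recruiting data (assumption: not the thesis data).
random.seed(0)
n = 200
grade = [random.gauss(0, 1) for _ in range(n)]   # standardized recruiting grade
noise = [random.gauss(0, 1) for _ in range(n)]   # feature unrelated to the outcome
drafted = [1 if random.random() < sigmoid(-1.0 + 1.5 * g) else 0 for g in grade]

candidates = {
    "grade only": [[g] for g in grade],
    "grade + noise": [[g, z] for g, z in zip(grade, noise)],
}
for name, X in candidates.items():
    beta = fit_logistic(X, drafted)
    ll = log_likelihood(beta, X, drafted)
    k = len(beta)
    print(name,
          "AIC=%.1f" % aic(ll, k),
          "BIC=%.1f" % bic(ll, k, n),
          "error=%.3f" % misclassification_error(beta, X, drafted))
```

Lower AIC and BIC indicate a better trade-off between fit and model size, with BIC penalizing extra parameters more heavily once n exceeds about 8. In practice the misclassification error would be estimated on held-out data rather than on the training set used here.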
