Date of Award

Summer 2020

Document Type

Open Access Thesis

Department

Computer Science and Engineering

First Advisor

Marco Valtorta

Abstract

Bayesian networks are effective tools for discovering relationships between variables in a data set. Algorithms that learn Bayesian networks from data fall into three categories: constraint-based, score-based, and hybrid. Hybrid algorithms contain a constraint testing sub-procedure as well as a score function to create the network. Malicious changes to the training set can cause invalid networks that do not model the true data. The effects of these changes have been demonstrated using the PC algorithm, a constraint-based algorithm. In this thesis a method was developed to measure the robustness of various algorithms to determine potential malicious changes. The robustness analysis involves determining the weakest link in the network and then finding the changes to entries in the training set that will remove this link. In particular, this work focused on the difference in robustness of algorithms from the three categories. The algorithms that were studied were PC-stable, tabu search, and CB. Because the only current implementation of CB was insufficient for this analysis, the algorithm was reconstructed in R. The framework developed for the comparison of the algorithms can be used for further investigation of other algorithms. The tabu search algorithm was found to have the highest robustness. This thesis provides a method for developing hybrid learning algorithms and testing these and other learning algorithms to determine their resistance to training data set manipulation.

Rights

© 2020, Noah Joseph Geveke

Share

COinS