Open Access Dissertation
Computer Science and Engineering
In this dissertation, we investigate and address two kinds of data integrity threats. We first study the limitations of secure cryptographic shuffling algorithms with regard to preserving data dependencies. We then study the limitations of machine learning models with regard to detecting concept drift. We propose solutions to address both threats.
Shuffling algorithms have been used to protect the confidentiality of sensitive data. However, these algorithms may not preserve data dependencies, such as functional dependencies and data-driven associations. We present two solutions that address these shortcomings: (1) a functional-dependency-preserving shuffle and (2) a data-driven-association-preserving shuffle. To preserve functional dependencies, we propose a method based on Boyce-Codd Normal Form (BCNF) decomposition. Instead of shuffling the original relation, we recommend shuffling each relation of the BCNF decomposition. The final shuffled relation is constructed by joining the shuffled decompositions. We show that our approach is lossless and preserves functional dependencies if the BCNF decomposition is dependency preserving. To preserve data-driven associations, we compute the transitive closure of the sets of associated attributes; the attributes of each resulting set are bundled together during shuffling.
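The two shuffles described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: relations are lists of dicts, each BCNF fragment is supplied as a hypothetical `(key_attrs, nonkey_attrs)` pair, and "shuffling" is taken to mean applying one random permutation to a bundle of columns while the other columns stay in place. The reasoning behind the first shuffle is that in a BCNF fragment every nontrivial functional dependency has a superkey on its left-hand side, so after duplicate elimination no two tuples agree on the key, and permuting the non-key values cannot break a dependency whose left-hand side contains that key. All helper names (`project`, `shuffle_bundles`, `natural_join`, `fd_preserving_shuffle`, `association_bundles`, `fd_holds`) are illustrative.

```python
import random
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def project(rows: Sequence[Row], attrs: Sequence[str]) -> List[Row]:
    """Projection with duplicate elimination (relations are sets of tuples)."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def shuffle_bundles(rows: Sequence[Row], bundles: Sequence[Sequence[str]],
                    rng: random.Random) -> List[Row]:
    """Apply an independent random permutation to each bundle of attributes.
    Attributes in the same bundle move together, so their co-occurrence is kept."""
    out = [dict(r) for r in rows]
    for bundle in bundles:
        values = [tuple(r[a] for a in bundle) for r in rows]
        rng.shuffle(values)
        for r, vals in zip(out, values):
            r.update(zip(bundle, vals))
    return out


def natural_join(left: Sequence[Row], right: Sequence[Row]) -> List[Row]:
    """Nested-loop natural join on the attributes the two relations share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def fd_preserving_shuffle(rows: Sequence[Row],
                          fragments: Sequence[Tuple[Sequence[str], Sequence[str]]],
                          rng: random.Random) -> List[Row]:
    """Shuffle each BCNF fragment (key kept fixed, non-key attributes permuted
    as one bundle), then join the shuffled fragments back together."""
    result: List[Row] = []
    for i, (key, nonkey) in enumerate(fragments):
        frag = project(rows, list(key) + list(nonkey))
        frag = shuffle_bundles(frag, [list(nonkey)], rng)
        result = frag if i == 0 else natural_join(result, frag)
    return result


def association_bundles(attrs: Sequence[str],
                        associations: Iterable[Iterable[str]]) -> List[List[str]]:
    """Transitive closure of associated attribute sets: sets that share an
    attribute are merged (union-find). Unassociated attributes stay singletons."""
    parent = {a: a for a in attrs}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for group in associations:
        members = list(group)
        for a in members[1:]:
            parent[find(a)] = find(members[0])
    bundles: Dict[str, List[str]] = {}
    for a in attrs:
        bundles.setdefault(find(a), []).append(a)
    return list(bundles.values())


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check that the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[tuple, tuple] = {}
    for r in rows:
        x, y = tuple(r[a] for a in lhs), tuple(r[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True
```

For a relation R(A, B, C) with A → B and B → C, decomposed into R1(A, B) and R2(B, C), a call such as `fd_preserving_shuffle(rows, [(["A"], ["B"]), (["B"], ["C"])], random.SystemRandom())` shuffles the two fragments independently, and `fd_holds` can confirm that both dependencies survive the join. For associations, `association_bundles` merges overlapping sets, so {A, B}, {B, C}, {D, E} become the bundles [A, B, C] and [D, E]; passing them to `shuffle_bundles` keeps each bundle's values together. `random.SystemRandom` draws from the operating system's secure source and is used here only as a stand-in for a cryptographic shuffle.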
Concept drift is a significant challenge that greatly influences the accuracy and reliability of machine learning models. There is, therefore, a need to detect concept drift in order to ensure the validity of learned models. We study the issue of concept drift in the context of discrete Bayesian networks. We propose a probabilistic graphical model framework that explicitly detects the presence of concept drift using latent variables. We employ latent variables to model real concept drift and uncertainty drift over time. For modeling real concept drift, we propose monitoring the mean of the distribution of the latent variable over time. For modeling uncertainty drift, we suggest monitoring the change in belief of the latent variable over time; that is, we monitor the maximum value that the probability density function of the distribution takes over time. We also propose a probabilistic graphical model framework, based on latent variables, that explains the detected posterior probability drift across time.
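The two monitored quantities can be illustrated with a small Python sketch. It assumes, hypothetically, that inference in the Bayesian network has already produced, for each time window, the distribution of the latent variable as a probability vector over numeric state values; for such a discrete variable, the "maximum value of the density" becomes the largest probability mass. The window-to-window comparison and the thresholds `mean_tol` and `peak_tol` are illustrative choices, not the dissertation's decision rule. A shift in the mean is flagged as real concept drift; a change in the peak, which reflects how concentrated the belief is, is flagged as uncertainty drift.

```python
from typing import List, Sequence, Tuple


def summarize(states: Sequence[float], pmf: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, peak) of a discrete distribution over numeric latent states.
    The peak is the largest probability mass, i.e. the height of the mode."""
    total = sum(pmf)
    probs = [p / total for p in pmf]  # normalize defensively
    mean = sum(s * p for s, p in zip(states, probs))
    return mean, max(probs)


def detect_drift(states: Sequence[float], pmfs: Sequence[Sequence[float]],
                 mean_tol: float = 0.5, peak_tol: float = 0.2
                 ) -> List[Tuple[int, bool, bool]]:
    """For each time step t >= 1, compare with step t - 1 and return
    (t, real_drift, uncertainty_drift). Thresholds are hypothetical."""
    flags = []
    prev_mean, prev_peak = summarize(states, pmfs[0])
    for t in range(1, len(pmfs)):
        mean, peak = summarize(states, pmfs[t])
        flags.append((t, abs(mean - prev_mean) > mean_tol,
                      abs(peak - prev_peak) > peak_tol))
        prev_mean, prev_peak = mean, peak
    return flags
```

With `states = [0, 1, 2]` and window distributions `[0.8, 0.1, 0.1]`, `[0.8, 0.1, 0.1]`, `[0.1, 0.1, 0.8]`, `[0.05, 0.4, 0.55]`, `detect_drift` returns `[(1, False, False), (2, True, False), (3, False, True)]`: moving the mass to state 2 shifts the mean without changing the peak, while the flatter last window lowers the peak but moves the mean by less than the threshold.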
Our results show that neither cryptographic shuffling algorithms nor machine learning models are robust against data integrity threats. However, our proposed approaches are capable of detecting and mitigating such threats.
Alsuwat, H. (2019). Cybersecurity Issues in the Context of Cryptographic Shuffling Algorithms and Concept Drift: Challenges and Solutions (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/5590