Date of Award


Document Type

Open Access Dissertation


Department
Computer Science and Engineering


College
College of Engineering and Computing

First Advisor

Csilla Farkas


Abstract

We investigate privacy violations that occur when non-confidential patient data is combined with medical domain ontologies to disclose a patient's protected health information (PHI). We propose a framework that detects privacy violations and eliminates undesired inferences. Our inference-channel removal process controls the release of the data items that lead to undesired inferences: these items are either blocked from release or generalized to prevent disclosure of the PHI. We show that our method is sound and complete. Soundness means that every generated inference path logically follows from the released data and the corresponding domain knowledge; completeness means that we detect all inference channels leading to undesired data disclosures. Our approach maximizes data availability by minimizing the number of data items that must be generalized or removed.
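The detection step can be illustrated with a small sketch. Here, domain-ontology knowledge is modeled as forward-chaining rules (a set of released facts implies a new fact), and a violation exists whenever the inferential closure of the released data reaches a protected item. All names and the rule format are illustrative, not the dissertation's actual formalism.

```python
# Hypothetical sketch of inference-channel detection via forward chaining.
# A rule (premises, conclusion) models a domain-ontology implication.

def infer_closure(released, rules):
    """Return all facts derivable from `released` under `rules`."""
    known = set(released)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def violates(released, rules, phi):
    """True if releasing `released` discloses any protected item in `phi`."""
    return bool(infer_closure(released, rules) & phi)

# Example: a prescription plus an ontology rule discloses a diagnosis.
rules = [({"takes_insulin"}, "has_diabetes")]
print(violates({"takes_insulin"}, rules, {"has_diabetes"}))   # True
print(violates({"age_group_40s"}, rules, {"has_diabetes"}))   # False
```

Blocking or generalizing `takes_insulin` in this example would break the only channel to the protected diagnosis.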

In Phase 1 of our research, we construct an optimal solution that disrupts all privacy violations. We develop a cost model based on the number of data items that are removed or generalized, calculate the cost of each candidate solution, and select the solution with the lowest cost as optimal.
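Under this cost model, one plausible way to find the optimum is to search candidate disruption sets in order of increasing size and return the first one that intersects every inference channel. The channel representation and item names below are hypothetical, and the exhaustive search is only a sketch of the idea, not an efficient implementation.

```python
from itertools import chain, combinations

def optimal_disruption(channels):
    """Smallest set of items whose removal/generalization disrupts
    every inference channel; cost = number of items touched."""
    items = sorted(set(chain.from_iterable(channels)))
    for size in range(len(items) + 1):            # cheapest solutions first
        for candidate in combinations(items, size):
            if all(set(candidate) & ch for ch in channels):
                return set(candidate)

# Two channels sharing a quasi-identifier: removing it breaks both.
channels = [{"zip", "birthdate"}, {"zip", "admission_date"}]
print(sorted(optimal_disruption(channels)))  # ['zip']
```

Because candidates are enumerated smallest-first, the first hitting set found is guaranteed to have minimum cost, which mirrors the "lowest cost is optimal" criterion.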

Phase 2 of our research introduces heuristic-based improvements into our approach. We develop a method to construct a solution, called an inference disruption cover, and use the entropy of the concepts within the domain ontology to guide the selection of the facts in a disruption cover that are best suited for generalization.

In Phase 3, we extend our privacy model to incorporate personal privacy preferences and safety. We provide mechanisms to specify a patient's personal privacy restrictions as well as a clinician's safety criteria, and introduce privacy and safety labeling of data items. We develop conflict resolution strategies for cases where privacy and safety labels are contradictory; our strategy favors safety over personal privacy.
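The resolution rule itself (safety overrides personal privacy) can be captured in a few lines. The label vocabulary below is an illustrative assumption; only the precedence order follows the text.

```python
def resolve(privacy_label, safety_label):
    """Decide release of a data item carrying both labels."""
    if safety_label == "required":     # clinician marks item safety-critical
        return "release"               # safety is favored over privacy
    if privacy_label == "restricted":  # patient opted out of release
        return "withhold"
    return "release"

print(resolve("restricted", "required"))  # 'release': safety wins the conflict
print(resolve("restricted", "none"))      # 'withhold': privacy honored
```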

Lastly, in Phase 4, we propose a graphical tool to support patients' understanding of their privacy settings. Our tool provides brief tutorials about health data types, regulations, and typical healthcare data sharing. We also allow patients to view their medical data and the data inferred from it using domain knowledge, which helps them understand the impact of releasing their data through a Health Information Exchange or for secondary use. Our graphical interface allows patients to request that specific data items be blocked from release. An important aspect of our approach is that it sets the foundation for the creation of patient-specific privacy policies.

In summary, the primary contribution of this work is a sound and complete framework capable of efficiently detecting and disrupting healthcare-focused inference violations. We extend the privacy model to incorporate patient-specific privacy and safety preferences. Finally, our proof-of-concept prototype implementation supports privacy-preserving data release and real-time policy composition.