Date of Award
Open Access Dissertation
Computer Science and Engineering
Despite the great achievements made by neural networks on tasks such as image classification, they are brittle and vulnerable to adversarial example (AE) attacks, which are crafted by adding human-imperceptible perturbations to inputs in order that a neural-network-based classifier incorrectly labels them. Along with the prevalence of deep learning techniques, the threat of AEs attracts increasingly attentions since it may lead to serious consequences in some vital applications such as disease diagnosis.
To defeat attacks based on AEs, both detection and defensive techniques attract the research community’s attention. Given an input image, the detection system outputs whether it is an AE, so that the target neural network can reject those adversarial inputs. A defense technique, given an AE, helps the target neural network make correct prediction by either rectifying the AE or fortifying the classifier itself. While many countermeasures against AEs have been proposed, recent studies show that the existing detection methods usually goes ineffective when facing adaptive AEs. In this work, we exploit AEs by identifying their noticeable characteristics.
First, we noticed that L2 adversarial perturbations are among the most effective but difficult-to-detect attacks. How to detect adaptive L2 AEs is still an open question. At the same time, we find that, by randomly erasing some pixels in an L2 AE and then restoring it with an inpainting technique, the AE, before and after the steps, tends to have different classification results, while a benign sample does not show this symptom. We thus propose a novel AE detection technique, Erase-and-Restore (E&R), that exploits the intriguing sensitivity of L2 attacks. Comprehensive experiments conducted on standard image datasets show that the proposed detector is effective and accurate. More importantly, our approach demonstrates strong resilience to adaptive attacks. We also interpret the detection technique through both visualization and quantification.
Second, previous work considers that it is challenging to properly alleviate the effect of the heavy corruptions caused by L0 attacks. However, we argue that the uncontrollable heavy perturbation is an inherent limitation of L0 AEs, and thwart such attacks. We thus propose a novel AE detector by converting the detection problem into a comparison problem. More concretely, given an image I, it is pre-processed to obtain another image I'. Then, a well-trained Siamese network automatically and precisely captures the discrepancies between I and I' to detect L0 perturbations. In addition, we show that the pre-processing technique used for detection can also work as an effective defense, which has a high probability of removing the adversarial influence of L0 perturbations. Thus, our system demonstrates not only high AE detection accuracies, but also a notable capability to correct the classification results.
Finally, we propose a comprehensive AE detector which systematically combines the two aforementioned detection methods to thwart all categories of widely discussed AEs, i.e., L0, L2, and L∞ attacks. By acquiring the both strengths from its assembly components, the new hybrid AE detector is not only able to distinguish various kinds of AEs, but also has a very low false positive rate on benign images. More significantly, through exploiting the noticeable characteristics of AEs, the proposed detector is highly resilient to adaptive attack, filling a critical gap in AE detection.
Zuo, F.(2021). Towards More Trustworthy Deep Learning: Accurate, Resilient, and Explainable Countermeasures Against Adversarial Examples. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/6879