Xin Zhang

Date of Award

Fall 2023

Document Type

Open Access Dissertation


Electrical Engineering

First Advisor

Xiaofeng Wang


Convolutional neural networks (CNNs) have gained increasing popularity and versatility in recent decades, finding applications in diverse domains including image recognition, natural language processing, and recommendation systems, as well as safety-critical areas such as autonomous driving, medical diagnostics, and military fields. However, the widespread use of CNNs in safety-critical applications has also given rise to concerns regarding their robustness. The low robustness of CNNs may lead to erroneous predictions and even significant losses of human life and property, especially when dealing with corner cases or boundary cases in real-world problem-solving. Most existing methods for evaluating CNNs still rely heavily on accuracy on the test set, with insufficient consideration of CNN robustness. This oversight is one of the reasons behind their low robustness. As a result, despite achieving high accuracy on the test set, CNNs may still lack trustworthiness, primarily because the quality of the test set itself is not adequately evaluated and the test set may be missing corner cases. Furthermore, extensive research indicates that CNNs are vulnerable to adversarial attacks, which pose a significant challenge to CNN robustness. These attacks can significantly reduce a CNN's accuracy by altering image features or attacking image labels, sometimes even preventing the completion of training. Therefore, it is imperative to prioritize the enhancement of CNN robustness. To analyze and improve the robustness of CNNs, in this dissertation, we first introduce a quantification scheme called D-Score for assessing CNN robustness. Then, we propose corresponding solutions for adversarial attacks that target image features and labels, respectively. Subsequently, we further employ the D-Score method to validate the robustness and correctness of these solutions. Specifically, we delve into the following issues.
The first problem concerns the comprehensive evaluation of CNNs and, based on the evaluation results, how to enhance their robustness to input image transformations, particularly translations and rescalings. Previous evaluations of CNNs mostly relied on the model's performance on the test set, but the reliability of such methods depends heavily on the quality of the test set, which is often overlooked in traditional training processes. There are also methods available to score the quality of the test set, but the score itself remains a black box, and the reasons behind the low quality of the test set are still unknown. To tackle these issues, we first propose a white-box diagnostic approach that uses mutation operators and image transformations to calculate the feature and attention distribution of the model, and further present a diagnosis score, namely D-Score, to reflect the model's robustness and fitness to a dataset. Then, we present a D-Score based data augmentation method to enhance the CNN's robustness to translations and rescalings. Comprehensive experiments on two widely used datasets and three commonly adopted CNNs demonstrate the effectiveness of our approach. Additionally, we present a refinement method for trained models, utilizing the deletion of mutation operators. This approach can decrease the involvement of neurons in computations while maintaining accuracy. Subsequent experiments confirm the efficacy of this refinement technique in mitigating overfitting in CNNs.
The second problem is related to multi-label classification, specifically addressing how to ensure effective training of CNNs even when some labels in the training set are missing. This setting is commonly defined as partial-label classification, where only a limited number of labels are annotated for each image while the rest are missing. Our objective is to enhance the model's robustness in handling partially missing training labels. To address this problem, we propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the existing classification networks. Then we quantitatively study the impact of missing labels on the performance of the classifier. Furthermore, by designing a novel loss function, we are able to relax the requirement, common to most existing approaches, that each instance must contain at least one positive label. Through comprehensive experiments on three large-scale multi-label image datasets, i.e., MS-COCO, NUS-WIDE, and Pascal VOC12, we show that our method can outperform existing missing-label learning approaches in most cases in terms of accuracy, and in some cases even approaches trained on fully labeled datasets. Furthermore, using the D-Score method, we demonstrate that our approach exhibits higher robustness compared to other benchmark methods.
The third problem also revolves around partial-label classification but focuses on the issue of label imbalance in a large-scale labeling space. To address this issue, we propose an innovative loss function that leverages statistical information from existing datasets to effectively alleviate the label imbalance problem, and we design a dynamic training scheme to reduce the dimension of the labeling space and further mitigate the imbalance. Extensive experiments on four large-scale public image datasets (COCO, NUS-WIDE, CUB, Open Images) demonstrate that our method outperforms the state-of-the-art methods, both in terms of accuracy (mAP) and robustness (D-Score).
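As a rough illustration of the partial-label setting described above, the sketch below computes a binary cross-entropy that simply skips missing labels. This is a common baseline and not the pseudo-label method or the loss function proposed in the dissertation; the function names `stable_sigmoid` and `masked_bce`, and the use of `None` to mark a missing annotation, are assumptions made for this example only.

```python
import math


def stable_sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def masked_bce(logits, labels):
    """Mean binary cross-entropy over annotated labels only.

    labels holds 1 (positive), 0 (negative), or None (missing annotation;
    a hypothetical encoding chosen for this sketch). Missing entries are
    ignored. Returns 0.0 when no label is annotated, so an instance
    without any positive (or any known) label is still valid input.
    """
    total, known = 0.0, 0
    for z, y in zip(logits, labels):
        if y is None:
            continue  # missing label: contributes nothing to the loss
        p = stable_sigmoid(z)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        known += 1
    return total / known if known else 0.0


if __name__ == "__main__":
    # Three classes for one image: positive, unannotated, negative.
    print(masked_bce([2.0, -1.0, 0.5], [1, None, 0]))
```

Ignoring missing entries keeps the network architecture unchanged, but it discards any information the unannotated classes might carry, which is the gap that pseudo-label approaches aim to fill.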


© 2024, Xin Zhang

Available for download on Sunday, May 03, 2026