Date of Award

Fall 2023

Document Type

Open Access Dissertation

Department

Electrical Engineering

First Advisor

Xiaofeng Wang

Second Advisor

Song Wang

Abstract

This proposal introduces several methods for multi-label image classification in different scenarios, including partial-label and annotation-free settings. Multi-label image classification is typically formulated as a partial-label learning problem because annotating all labels in every training image can be expensive. Existing partial-label learning approaches typically annotate each training image with only a subset of its labels, including the special case of annotating only one positive label per image. To further reduce the annotation burden and improve classifier performance, this proposal introduces a new partial-label setting in which only a subset of the training images are labeled, each with only one positive label, while the rest remain unlabeled. To address the challenge of learning from partially labeled and unlabeled training images, we propose an end-to-end deep network called PLMCL (Partial-Label Momentum Curriculum Learning). Our approach uses a momentum-based method to update the soft pseudo labels on each training image, taking into account the updating velocity of the pseudo labels, to avoid being trapped in low-confidence local minima, especially at the early stage of training, when there are neither observed labels nor confidence in the pseudo labels. Moreover, we introduce a confidence-aware scheduler to perform adaptive easy-to-hard learning for different labels. Our experiments show that PLMCL outperforms many state-of-the-art multi-label classification methods under various partial-label settings on three different datasets.
Despite the promising performance of PLMCL, some limitations remain. One is the lack of a convergence guarantee for the pseudo labels, together with the absence of theoretical support for optimality from the perspective of the pseudo labels. These limitations highlight the need for further research toward a more reliable convergence guarantee. We developed a game-theoretic framework to address the lack of a robust convergence guarantee in pseudo-label-based methods, including our PLMCL approach. This approach is more efficient in handling partially labeled and unlabeled training images. We introduced an end-to-end Generic Game-theoretic Network (G2NetPL) for partial-label learning, which formulates a two-player non-zero-sum non-cooperative game between the network and the soft pseudo labels associated with unobserved labels. The objective of the network is to minimize the loss function given the pseudo labels, while the pseudo labels aim to converge to 1 or 0, with a penalty for deviating from the network’s predictions. Additionally, we incorporated a confidence-aware scheduler to adaptively perform easy-to-hard learning for different labels. Our experiments show that G2NetPL outperforms many state-of-the-art multi-label classification methods under various partial-label settings on different datasets.
As a next step, we will study how our framework performs as the amount of annotated data is reduced. To that end, we propose a CLIP-based unsupervised learning method for annotation-free multi-label image classification. The method consists of three stages: initialization, training, and inference. In the initialization stage, we will use the CLIP model and extend it to multi-label prediction by computing a global-local image-text similarity aggregation. Our approach splits each image into snippets and uses CLIP to generate similarity vectors for both the entire image (global) and each snippet (local). We will then introduce a similarity aggregator to combine the global and local similarity vectors. In the training stage, we will use the aggregated similarity scores as initial pseudo labels and propose an optimization framework to train the classification network’s parameters and refine the pseudo labels for unobserved labels. During inference, only the classification network will be used to predict the labels of the input image.
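The initialization stage described above can be illustrated with a minimal sketch. The abstract does not specify the similarity aggregator, so everything below is an assumption for illustration only: the embeddings are taken as already computed (for example by CLIP's image and text encoders), and the hypothetical `aggregate_similarity` function, its `alpha` weight, its `temperature`, and the max-over-snippets rule are placeholders rather than the dissertation's actual method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cosine_similarity(image_emb, text_emb):
    """Cosine similarity between rows of image_emb (n, d) and text_emb (c, d)."""
    a = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    b = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return a @ b.T  # shape (n, c)


def aggregate_similarity(global_emb, snippet_embs, text_embs,
                         alpha=0.5, temperature=0.01):
    """Combine global and local image-text similarities into one score per label.

    ASSUMPTION: a weighted average of the global similarity vector and the
    per-label maximum over snippets. The source does not define the aggregator.
    """
    # Global view: one similarity vector for the whole image.
    g = softmax(cosine_similarity(global_emb[None, :], text_embs) / temperature)[0]
    # Local view: one similarity vector per snippet, shape (num_snippets, c).
    loc = softmax(cosine_similarity(snippet_embs, text_embs) / temperature, axis=1)
    # A label counts as locally present if any snippet matches it strongly.
    local = loc.max(axis=0)
    return alpha * g + (1.0 - alpha) * local


# Toy example with hand-made 3-d "embeddings" standing in for encoder outputs.
text_embs = np.eye(3)                       # three label prompts
global_emb = np.array([1.0, 0.0, 0.0])      # whole image resembles label 0
snippet_embs = np.array([[0.0, 1.0, 0.0],   # one snippet resembles label 1
                         [1.0, 0.0, 0.0]])  # another resembles label 0
scores = aggregate_similarity(global_emb, snippet_embs, text_embs)
```

In this toy case, label 1 is invisible in the global view but is picked up by one snippet, so it still receives a nonzero initial pseudo-label score. That is the motivation for combining both views, although the aggregator used in the dissertation may differ.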

Rights

© 2024, Rabab Ezzeldin Rabie Abdelfattah
