Yuhang Lu

Date of Award

Summer 2022

Document Type

Open Access Dissertation


Computer Science and Engineering

First Advisor

Song Wang


Semantic segmentation, which aims to group discrete pixels into connected regions, is a fundamental step in many high-level computer vision tasks. In recent years, Convolutional Neural Networks (CNNs) have made breakthrough progress on public semantic segmentation benchmarks. Their ability to learn from large-scale labeled datasets allows them to generalize to unseen images better than traditional non-learning-based methods. Nevertheless, this heavy dependency on labeled data also limits their application to tasks where high-quality ground-truth segmentation masks are scarce or difficult to acquire. In this dissertation, we study the problem of alleviating the data dependency of CNN-based segmentation, with a focus on leveraging prior knowledge of object shape.

Shape prior knowledge can provide rich, learning-free information about object boundaries if properly utilized. However, exploiting it is not trivial for CNN-based segmentation because of its pixel-wise classification nature. To address this problem, we propose novel methods that integrate three types of shape priors into CNN training: implicit, explicit, and class-agnostic priors. They range from specific objects with strong priors to general objects with weak priors. To demonstrate the practical value of our methods, we present each of them within a challenging real-world image segmentation task. 1) We propose a weakly supervised segmentation method to extract curve structures stamped on cultural heritage objects. It implicitly takes advantage of the prior knowledge of their thin and elongated shape to relax the training label from a pixel-wise curve mask to a single-pixel curve skeleton, and it outperforms fully supervised alternatives by at least 7.7% in F1 score. 2) We propose a one-shot segmentation method that learns to segment anatomical structures from X-ray images with only one labeled image, which is realized by explicitly modeling the shape and appearance prior knowledge of objects in the objective function of CNNs. It performs competitively with state-of-the-art fully supervised methods when using a single label, and can outperform them when a human-in-the-loop mechanism is incorporated. 3) Finally, we attempt to model shape priors in a universal form that is agnostic to object classes, where the knowledge can be distilled from a few labeled samples through a meta-learning strategy. Given a base model pretrained on an existing large-scale dataset, our method can adapt it to unseen domains with the help of a few labeled images and masks. Experimental results show that our method significantly improves the performance of base models in a variety of cross-domain segmentation tasks.