Date of Award


Document Type

Open Access Dissertation


Computer Science and Engineering

First Advisor

Song Wang


In this research, we mainly focus on the problem of estimating the 2D human pose from a monocular image and reconstructing the 3D human pose based on the 2D human pose. Here a 3D pose is the locations of the human joints in the 3D space and a 2D pose is the projection of a 3D pose on an image. Unlike many previous works that explicitly use hand-crafted physiological models, both our 2D pose estimation and 3D pose reconstruction approaches implicitly learn the structure of human body from human pose data.

This 3D pose reconstruction is an ill-posed problem without considering any prior knowledge. In this research, we propose a new approach, namely Pose Locality Constrained Representation (PLCR), to constrain the search space for the underlying 3D human pose and use it to improve 3D human pose reconstruction. In this approach, an over-complete pose dictionary is constructed by hierarchically clustering the 3D pose space into many subspaces. Then PLCR utilizes the structure of the over-complete dictionary to constrain the 3D pose solution to a set of highly-related subspaces. Finally, PLCR is combined into the matching-pursuit based algorithm for 3D human-pose reconstruction.

The 2D human pose used in 3D pose reconstruction can be manually annotated or automatically estimated from a single image. In this research, we develop a new learning-based 2D human pose estimation approach based on a Dual-Source Deep Convolutional Neural Networks (DS-CNN). The proposed DS-CNN model learns the appearance of each local body part and the relations between parts simultaneously, while most of existing approaches consider them as two separate steps. In our experiments, the proposed DS-CNN model produces superior or comparable performance against the state-of-the-art 2D human-pose estimation approaches based on pose priors learned from hand-crafted models or holistic perspectives.

Finally, we use our 2D human pose estimation approach to recognize human attributes by utilizing the strong correspondence between human attributes and human body parts. Then we probe if and when the CNN can find such correspondence by itself on human attribute recognition and bird species recognition. We find that there is direct correlation between the recognition accuracy and the correctness of the correspondence that the CNN finds.