Yang Mi

Date of Award

Summer 2020

Document Type

Open Access Dissertation


Computer Science and Engineering

First Advisor

Song Wang

Second Advisor

Michael N. Huhns


Moving-camera video content analysis aims to extract useful information from videos taken by moving cameras, including wearable cameras and handheld cameras. It is an essential problem in computer vision and plays an important role in many real-life applications, including understanding social difficulties and enhancing public security. In this work, we study three sub-problems of moving-camera video content analysis. The first two concern wearable-camera videos, a special type of moving-camera video: recognizing general actions and recognizing microactions. The third is estimating homographies along moving-camera videos.

Recognizing general actions in wearable-camera videos is a challenging task, because the motion features extracted from videos of the same action are mixed with the complex, continuous motion of the camera and may therefore show very large variation and inconsistency. It is very difficult to collect enough videos to cover all such variations and to use them to train action classifiers that generalize well. To address this, we develop a new approach that trains action classifiers on a relatively small set of fixed-camera videos with different views and then applies them to recognize actions in wearable-camera videos. We conduct experiments by training on a set of fixed-camera videos and testing on a set of wearable-camera videos, and obtain very promising results.

Microactions, such as small hand or head movements, can be difficult to recognize in practice, especially in wearable-camera videos, because only subtle body motion is present. To address this, we propose a new deep-learning based method that effectively learns midlayer CNN features to enhance microaction recognition. More specifically, we develop a new dual-branch network for microaction recognition: one branch uses the high-layer CNN features for classification, and the second branch, equipped with a novel subtle motion detector, further explores the midlayer CNN features for classification. For the experiments, we build a new microaction video dataset in which the micromotions of interest are mixed with other, larger general motions such as walking. Comprehensive experimental results verify that the proposed method yields new state-of-the-art performance on two microaction video datasets, while its performance on two general-action video datasets is also very promising.

A homography is an invertible mapping between two images of the same planar surface. When estimating homographies along moving-camera videos, estimation between non-adjacent frames can be very challenging if their camera view angles differ greatly. To handle this, we propose a new deep-learning based method for homography estimation along videos that exploits temporal dynamics across frames. More specifically, we develop a recurrent convolutional regression network consisting of a convolutional neural network and a recurrent neural network with long short-term memory cells, followed by a regression layer that estimates the parameters of the homography. For the experiments, we introduce a new approach to synthesize videos with known ground-truth homographies, and we evaluate the proposed method on both synthesized and real-world videos with good results.
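
To make the notion of a homography concrete, the following is a minimal NumPy sketch of how a 3x3 homography maps image points, how homographies between consecutive frames compose into one between non-adjacent frames, and how the mapping is inverted. It illustrates the definition only and is not the method proposed in this dissertation; the function name `apply_homography`, the matrices `H_01` and `H_12`, and the 320x240 frame size are assumptions chosen for illustration.

```python
import numpy as np


def apply_homography(H, points):
    """Map an (N, 2) array of pixel coordinates through a 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    # Lift to homogeneous coordinates: (x, y) -> (x, y, 1).
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homogeneous @ H.T
    # Divide by the third coordinate to return to pixel coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical homographies between consecutive frames (illustrative values).
H_01 = np.array([[1.0, 0.02, 5.0],
                 [-0.01, 1.0, 3.0],
                 [1e-4, 0.0, 1.0]])   # frame 0 -> frame 1
H_12 = np.array([[0.98, 0.0, -2.0],
                 [0.03, 1.01, 4.0],
                 [0.0, 2e-4, 1.0]])   # frame 1 -> frame 2

# The homography between the non-adjacent frames 0 and 2 is the matrix product.
H_02 = H_12 @ H_01

# Corners of an assumed 320x240 frame.
corners = np.array([[0, 0], [320, 0], [320, 240], [0, 240]], dtype=float)

via_chain = apply_homography(H_12, apply_homography(H_01, corners))
direct = apply_homography(H_02, corners)

# Invertibility: mapping back with the inverse matrix recovers the corners.
round_trip = apply_homography(np.linalg.inv(H_02), direct)
```

Mapping the corners frame by frame and mapping them once with the composed matrix should agree up to floating-point error, which is one reason known per-frame homographies can provide ground truth for any pair of frames in a synthesized video.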