Date of Award


Document Type

Open Access Dissertation


Computer Science and Engineering


College of Engineering and Computing

First Advisor

Chin-Tser Huang


In this age of ubiquitous communication, in which we can stay constantly connected with the rest of the world, we largely have one invention to thank: the Internet. But as Internet connectivity has grown in popularity, the Internet has also become a dangerous place where objects of malicious content and intent can hide in plain sight. In this dissertation, we investigate different ways to detect and capture malicious content hidden on the Internet. First, we propose an automated system that mimics high-risk browsing activities, such as clicking on suspicious online ads, and thereby collects malicious executable files for further analysis and diagnosis. Using this system, we crawled the Internet and collected a considerable number of malicious executables with very limited resources.
Malvertising has been one of the major recent threats to cyber security. Malvertisers apply a variety of techniques to evade detection, whereas ad networks apply inspection techniques to reveal malicious ads. However, both the malvertiser and the ad network operate under constraints of resources and time. In the second part of this dissertation, we propose a game-theoretic approach to formulate the problem of inspecting the malware that malvertisers insert into the Web-based advertising system.
During malware collection, we used the online multi-AV scanning service VirusTotal to scan and analyze the samples. VirusTotal only produces an aggregation of antivirus scan reports, whereas we need a multi-scanner solution that can accurately determine the maliciousness of a given sample. In the third part of this dissertation, we introduce three theoretical models that enable us to predict the accuracy of different combinations of scanners and to determine the optimum configuration of a multi-scanner detection system for maximum accuracy.
Malicious communication generated by malware can also reveal its presence. In the case of botnets, command and control (C&C) communication is a good candidate for such detection. Among the widely used C&C protocols, HTTP is becoming the most preferred. However, detecting HTTP-based C&C packets, which constitute a minuscule portion of everyday HTTP traffic, is a formidable task. In the final part of this dissertation, we present an anomaly-detection-based approach to detect HTTP-based C&C traffic using statistical features derived from client-generated HTTP request packets and DNS-server-generated response packets.
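The multi-scanner question raised in the third part (how accurate is a given combination of scanners?) can be made concrete with a small sketch. This is not one of the dissertation's three models, which the abstract does not specify. It assumes, purely for illustration, that each scanner gives a correct verdict independently with a known probability and that verdicts are combined by strict majority vote. The scanner names and accuracy values in the usage note are hypothetical.

```python
from itertools import combinations


def majority_vote_accuracy(accuracies):
    """Probability that a strict majority of scanners is correct.

    Assumption (illustrative only): scanners err independently, and
    accuracies[i] is the chance that scanner i labels a sample correctly.
    """
    n = len(accuracies)
    # dist[k] = probability that exactly k of the scanners seen so far are correct
    dist = [1.0]
    for p in accuracies:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)      # this scanner is wrong
            nxt[k + 1] += prob * p        # this scanner is correct
        dist = nxt
    return sum(prob for k, prob in enumerate(dist) if 2 * k > n)


def best_combination(scanners, size):
    """Return the subset of `size` scanner names with the highest majority-vote accuracy.

    `scanners` maps a (hypothetical) scanner name to its assumed accuracy.
    """
    return max(
        combinations(sorted(scanners), size),
        key=lambda names: majority_vote_accuracy([scanners[s] for s in names]),
    )
```

For example, under these assumptions three scanners that are each 90% accurate yield `majority_vote_accuracy([0.9, 0.9, 0.9])` of about 0.972, and `best_combination({"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6}, 3)` selects `("A", "B", "C")`. Real antivirus engines are correlated, so the independence assumption overstates the benefit of combining them.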